# Reasoning Complexity Classifier

A ModernBERT-base model fine-tuned to predict the reasoning complexity of educational text on a continuous 1–4 scale. Trained on FineWeb-Edu documents labeled by GPT-5-nano via the OpenAI Batch API (~$20 in credits).
## Model Description

This is a regression model (`num_labels=1`, `problem_type="regression"`) that outputs a continuous score. The score can be rounded to the nearest integer to obtain a discrete complexity level. Level 5 (Formal/Abstract reasoning) was excluded from training due to data scarcity; the model's effective range is 1.0–4.0.
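A minimal inference sketch, assuming the `transformers` and `torch` packages are installed and that the model is published under this card's repository id, `mdonigian/fineweb-edu-complexity-classifier`:

```python
def to_level(score: float) -> int:
    """Clip the raw regression output to [1, 4] and round to a discrete level."""
    return int(round(min(max(score, 1.0), 4.0)))

def predict_complexity(texts, model_id="mdonigian/fineweb-edu-complexity-classifier"):
    """Return (raw_score, discrete_level) pairs for a batch of texts."""
    # Heavy imports are kept local so the helper above stays dependency-free.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    # Mirror training-time preprocessing: truncate to 512 tokens, pad dynamically.
    enc = tok(texts, truncation=True, max_length=512, padding=True,
              return_tensors="pt")
    with torch.no_grad():
        scores = model(**enc).logits.squeeze(-1).tolist()  # shape (batch,)
    return [(s, to_level(s)) for s in scores]

if __name__ == "__main__":
    print(predict_complexity(["Water boils at 100 degrees C at sea level."]))
```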
## Complexity Levels
| Level | Name | Description | Example |
|---|---|---|---|
| 1 | Factual/Declarative | States facts with no reasoning | "The Pacific Ocean covers ~165 million km²." |
| 2 | Single-step reasoning | One inference or comparison | "Because boiling point decreases at altitude, water boils faster in Denver than Miami." |
| 3 | Multi-step reasoning | 2–4 chained logical steps | "Demand rose while supply held fixed → prices rose → consumer spending fell → GDP slowed." |
| 4 | Complex reasoning | 5+ steps, conditionals, competing factors | Medical differential diagnosis with branching conditions and exclusion criteria. |
## Training Details

### Data

- Source: FineWeb-Edu, a curated subset of Common Crawl filtered for educational content.
- Labeling: ~100,000 documents in total, reservoir-sampled at ~6,000 records per subject category, then labeled with GPT-5-nano via the OpenAI Batch API using structured output (integer 1–5).
- Splits: 80% train / 10% validation / 10% test (stratified by integer complexity level).
- Preprocessing: Texts truncated to 8,000 characters before labeling; tokenized to 512 tokens during training with dynamic padding.
- Level 5 exclusion: Rows labeled as level 5 were excluded from the training set.
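The split and level-5 filtering described above can be sketched in plain Python (a hypothetical helper, not the original preprocessing script):

```python
import random
from collections import defaultdict

def stratified_splits(rows, seed=0):
    """80/10/10 split stratified by integer complexity level.

    `rows` is an iterable of (text, level) pairs; level-5 rows are dropped
    to match the data-scarcity exclusion described above.
    """
    by_level = defaultdict(list)
    for text, level in rows:
        if level <= 4:  # exclude level 5
            by_level[level].append((text, level))

    rng = random.Random(seed)
    train, val, test = [], [], []
    # Splitting each level independently keeps the label distribution
    # identical across train/val/test (the "stratified" part).
    for level in sorted(by_level):
        items = by_level[level]
        rng.shuffle(items)
        n_train = int(len(items) * 0.8)
        n_val = int(len(items) * 0.1)
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test
```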
### Hyperparameters
| Parameter | Value |
|---|---|
| Base model | answerdotai/ModernBERT-base |
| Epochs | 3 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max token length | 512 |
| Optimizer | AdamW |
| Scheduler | Linear with warmup |
| AMP | bf16 (CUDA) |
| Loss | MSE |
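The table above maps directly onto a Hugging Face `TrainingArguments` configuration. A hedged sketch of that setup (the original training script is not published; `output_dir`, the dataset wiring, and a `compute_metrics` function reporting `eval_mae` are assumptions):

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments

# problem_type="regression" with num_labels=1 makes transformers apply
# MSE loss automatically, matching the table above.
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=1,
    problem_type="regression",
)

args = TrainingArguments(
    output_dir="complexity-regressor",   # assumed name
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",          # linear decay with warmup
    bf16=True,                           # AMP on CUDA
    eval_strategy="epoch",               # `evaluation_strategy` in older versions
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_mae",    # assumes compute_metrics returns "mae"
    greater_is_better=False,
)
```

With `load_best_model_at_end` and `metric_for_best_model="eval_mae"`, the epoch-2 checkpoint described below would be the one retained.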
### Training History
| Epoch | Train Loss | Val MAE | Val Acc (rounded) | Val Spearman r |
|---|---|---|---|---|
| 1 | 0.6002 | 0.5190 | 56.98% | 0.7533 |
| 2 | 0.3631 | 0.5040 | 58.43% | 0.7597 |
| 3 | 0.2040 | 0.5114 | 58.19% | 0.7485 |
The best checkpoint (by validation MAE) was saved at epoch 2.
## Evaluation Results
Evaluated on a held-out test set:
| Metric | Value |
|---|---|
| MSE | 0.4388 |
| MAE | 0.5063 |
| Rounded accuracy | 58.6% |
| Spearman r | 0.7527 |
Interpretation: The model achieves a Spearman correlation of ~0.75 with gold labels, indicating strong ordinal ranking ability. The MAE of ~0.51 means predictions are on average within half a level of the true score when treated as a continuous signal.
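These metrics can be reproduced from raw predictions with a short NumPy helper (a simplified sketch: the rank computation ignores ties, which `scipy.stats.spearmanr` handles properly by averaging tied ranks):

```python
import numpy as np

def regression_metrics(preds, golds):
    """MSE, MAE, rounded accuracy, and Spearman r for 1-4 scale predictions."""
    preds = np.asarray(preds, dtype=float)
    golds = np.asarray(golds, dtype=float)

    mse = float(np.mean((preds - golds) ** 2))
    mae = float(np.mean(np.abs(preds - golds)))
    # Clip to the effective range, round, and compare as discrete levels.
    rounded_acc = float(np.mean(np.clip(np.round(preds), 1, 4) == golds))

    # Spearman r = Pearson correlation of the ranks (no tie correction here).
    def ranks(x):
        order = np.argsort(x)
        r = np.empty(len(x), dtype=float)
        r[order] = np.arange(len(x), dtype=float)
        return r

    spearman = float(np.corrcoef(ranks(preds), ranks(golds))[0, 1])
    return {"mse": mse, "mae": mae,
            "rounded_acc": rounded_acc, "spearman": spearman}
```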
## Output Interpretation
| Raw score | Meaning |
|---|---|
| ~1.0 | Factual/Declarative |
| ~2.0 | Single-step reasoning |
| ~3.0 | Multi-step reasoning |
| ~4.0 | Complex reasoning |
Clip the raw float output to [1, 4], then round to the nearest integer to obtain a discrete level.
## Architecture
Based on answerdotai/ModernBERT-base:
- Layers: 22 transformer layers (alternating full and sliding attention)
- Hidden size: 768
- Attention heads: 12
- Intermediate size: 1,152
- Max position embeddings: 8,192
- Classifier pooling: mean
- Classifier activation: GELU
## Limitations
- Labels are silver-standard (GPT-5-nano), not human-annotated; label noise may affect the ~1.5% of ambiguous texts.
- Texts are truncated to 512 tokens; very long documents are judged on their first ~512 tokens only.
- Trained primarily on English educational web text; performance may degrade on other domains or languages.
## Intended Use

Designed for data curation pipelines that need to filter or balance training corpora by reasoning complexity, for example constructing curriculum-ordered datasets for language model training.
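As an illustration of that use case, a small hypothetical curation helper that buckets scored documents by discrete level and orders them easiest-first (the rounding mirrors the Output Interpretation section above):

```python
from collections import defaultdict

def bucket_by_level(scored_docs):
    """Group (text, raw_score) pairs into discrete complexity buckets 1-4."""
    buckets = defaultdict(list)
    for text, score in scored_docs:
        level = int(round(min(max(score, 1.0), 4.0)))  # clip then round
        buckets[level].append(text)
    return dict(buckets)

def curriculum_order(scored_docs):
    """Easiest-first document ordering for curriculum-style training."""
    return [text for text, _ in sorted(scored_docs, key=lambda pair: pair[1])]
```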