Reasoning Complexity Classifier

A ModernBERT-base model fine-tuned to predict the reasoning complexity of educational text on a continuous 1–4 scale. Trained on FineWeb-Edu documents labeled by GPT-5-nano via the OpenAI Batch API (~$20 in credits).

Model Description

This is a regression model (num_labels=1, problem_type="regression") that outputs a continuous score. The score can be rounded to the nearest integer to obtain a discrete complexity level. Level 5 (Formal/Abstract reasoning) was excluded from training due to data scarcity; the model's effective range is 1.0–4.0.

Complexity Levels

| Level | Name | Description | Example |
|-------|------|-------------|---------|
| 1 | Factual/Declarative | States facts with no reasoning | "The Pacific Ocean covers ~165 million km²." |
| 2 | Single-step reasoning | One inference or comparison | "Because boiling point decreases at altitude, water boils faster in Denver than in Miami." |
| 3 | Multi-step reasoning | 2–4 chained logical steps | "Demand rose while supply held fixed → prices rose → consumer spending fell → GDP slowed." |
| 4 | Complex reasoning | 5+ steps, conditionals, competing factors | Medical differential diagnosis with branching conditions and exclusion criteria. |

Training Details

Data

  • Source: FineWeb-Edu, a curated subset of Common Crawl filtered for educational content.
  • Labeling: ~100,000 documents reservoir-sampled from ~6,000 records per subject category, then labeled with GPT-5-nano via the OpenAI Batch API using structured output (integer 1–5).
  • Splits: 80% train / 10% validation / 10% test (stratified by integer complexity level).
  • Preprocessing: Texts truncated to 8,000 characters before labeling; tokenized to 512 tokens during training with dynamic padding.
  • Level 5 exclusion: Rows labeled as level 5 were excluded from the training set.
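The split procedure above can be sketched in plain Python. This is a minimal illustration of an 80/10/10 split stratified by integer label, not the project's actual pipeline; the `rows` structure, `key` callable, and seed are assumptions:

```python
import random
from collections import defaultdict

def stratified_split(rows, key, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split rows into train/val/test, preserving the distribution of key(row).

    Mirrors the card's 80/10/10 split stratified by integer complexity level.
    """
    by_level = defaultdict(list)
    for row in rows:
        by_level[key(row)].append(row)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for level_rows in by_level.values():
        rng.shuffle(level_rows)  # shuffle within each level, then slice
        n = len(level_rows)
        n_train = int(n * ratios[0])
        n_val = int(n * ratios[1])
        train.extend(level_rows[:n_train])
        val.extend(level_rows[n_train:n_train + n_val])
        test.extend(level_rows[n_train + n_val:])
    return train, val, test

# Per the card, level-5 rows are dropped before splitting:
# rows = [r for r in rows if r["label"] != 5]
```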

Hyperparameters

| Parameter | Value |
|-----------|-------|
| Base model | answerdotai/ModernBERT-base |
| Epochs | 3 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max token length | 512 |
| Optimizer | AdamW |
| Scheduler | Linear with warmup |
| AMP | bf16 (CUDA) |
| Loss | MSE |
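The warmup ratio translates into a concrete step count for the linear scheduler. A rough calculation, assuming ~80,000 training rows (80% of the ~100k labeled documents; the exact count after dropping level-5 rows is not stated):

```python
import math

train_rows = 80_000   # assumed: ~100k labeled docs * 0.8, before level-5 removal
batch_size = 32
epochs = 3
warmup_ratio = 0.1

steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * warmup_ratio)  # LR ramps 0 -> 2e-5 here, then decays linearly

print(steps_per_epoch, total_steps, warmup_steps)  # 2500 7500 750
```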

Training History

| Epoch | Train Loss | Val MAE | Val Acc (rounded) | Val Spearman r |
|-------|-----------|---------|-------------------|----------------|
| 1 | 0.6002 | 0.5190 | 56.98% | 0.7533 |
| 2 | 0.3631 | 0.5040 | 58.43% | 0.7597 |
| 3 | 0.2040 | 0.5114 | 58.19% | 0.7485 |

The best checkpoint (by validation MAE) was saved at epoch 2.

Evaluation Results

Evaluated on a held-out test set:

| Metric | Value |
|--------|-------|
| MSE | 0.4388 |
| MAE | 0.5063 |
| Rounded accuracy | 58.6% |
| Spearman r | 0.7527 |

Interpretation: The model achieves a Spearman correlation of ~0.75 with the gold labels, indicating strong ordinal ranking ability. The MAE of ~0.51 means predictions deviate from the true score by about half a level on average when the output is treated as a continuous signal.
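These metrics are easy to reproduce from raw predictions. A pure-Python sketch (the evaluation code itself is not published, so this is an illustration; Spearman r would come from e.g. scipy.stats.spearmanr and is omitted here):

```python
def regression_metrics(preds, golds):
    """MSE, MAE, and rounded accuracy for continuous 1-4 predictions."""
    n = len(preds)
    mse = sum((p - g) ** 2 for p, g in zip(preds, golds)) / n
    mae = sum(abs(p - g) for p, g in zip(preds, golds)) / n
    clip = lambda x: min(max(x, 1.0), 4.0)  # keep predictions in the effective range
    acc = sum(round(clip(p)) == round(g) for p, g in zip(preds, golds)) / n
    return {"mse": mse, "mae": mae, "rounded_acc": acc}

# Toy example with four predictions against gold levels 1-4:
print(regression_metrics([1.2, 2.6, 2.9, 3.8], [1.0, 2.0, 3.0, 4.0]))
```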

Output Interpretation

| Raw score | Meaning |
|-----------|---------|
| ~1.0 | Factual/Declarative |
| ~2.0 | Single-step reasoning |
| ~3.0 | Multi-step reasoning |
| ~4.0 | Complex reasoning |

Clip and round the raw float output to [1, 4] for a discrete level.
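Putting that together, a minimal inference sketch (the model id matches this repository; model loading is deferred into the function so the helpers stay importable without downloading weights):

```python
def to_level(score: float) -> int:
    """Clip the raw regression output to [1, 4], then round to a discrete level."""
    return round(min(max(score, 1.0), 4.0))

def score_texts(texts, model_id="mdonigian/fineweb-edu-complexity-classifier"):
    """Return continuous complexity scores for a list of texts."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(texts, truncation=True, max_length=512,
                       padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (batch, 1): raw regression scores
    return logits.squeeze(-1).tolist()

# Usage (downloads weights on first call):
# scores = score_texts(["The Pacific Ocean covers ~165 million km²."])
# levels = [to_level(s) for s in scores]
```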

Architecture

Based on answerdotai/ModernBERT-base:

  • Layers: 22 transformer layers (alternating full and sliding attention)
  • Hidden size: 768
  • Attention heads: 12
  • Intermediate size: 1,152
  • Max position embeddings: 8,192
  • Classifier pooling: mean
  • Classifier activation: GELU

Limitations

  • Labels are silver-standard (GPT-5-nano), not human-annotated; label noise is most likely in the roughly 1.5% of texts that are ambiguous.
  • Texts are truncated to 512 tokens; very long documents are judged on their first ~512 tokens only.
  • Trained primarily on English educational web text; performance may degrade on other domains or languages.
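One workaround for the 512-token truncation is to split long documents into overlapping windows and average the per-window scores. A sketch, using whitespace-separated words as a rough token proxy and accepting any scorer callable; the window and stride values are assumptions, not tuned settings:

```python
def chunked_score(text, scorer, window=350, stride=300):
    """Average a scorer over overlapping word windows of a long document.

    window=350 words keeps each chunk under the 512-token limit for typical
    English text; stride < window gives overlap between adjacent chunks.
    """
    words = text.split()
    if len(words) <= window:
        return scorer(" ".join(words))
    scores = []
    for start in range(0, len(words) - window + stride, stride):
        chunk = " ".join(words[start:start + window])
        scores.append(scorer(chunk))
    return sum(scores) / len(scores)
```

Averaging blurs within-document variation; taking the max per-window score is an alternative when the goal is to surface any complex passage.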

Intended Use

Designed for data curation pipelines that need to filter or balance training corpora by reasoning complexity, for example constructing curriculum-ordered datasets for language model training.
