
Logics-STEM-8B-SFT

📰 News


🔎 Overview

Logics-STEM-8B-SFT is an 8B-parameter open reasoning model for STEM (Science, Technology, Engineering, Mathematics) tasks. It is fine-tuned from Qwen3-8B using supervised fine-tuning (SFT) on Logics-STEM-SFT-Dataset-2.2M, a large-scale long chain-of-thought dataset covering math and broader STEM problems.

Logics-STEM-8B-SFT provides a strong SFT baseline for STEM reasoning: it performs well on STEM benchmarks and is competitive with many open-source RL-trained reasoners at a similar scale, while maintaining robust instruction following and answer formatting (especially on STEM multiple-choice tasks, which require outputting the chosen option letter in \boxed{}; see the prompt sketch below).
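To make the multiple-choice answer format concrete, the sketch below assembles a hypothetical prompt that asks the model to put the chosen option letter in \boxed{}; the question text and options are invented purely for illustration and are not from the evaluation suite.

# Hypothetical multiple-choice prompt; the question and options are illustrative only.
question = "Which quantity is conserved in every elastic collision?"
options = {"A": "Kinetic energy", "B": "Temperature", "C": "Entropy", "D": "Potential energy"}

mc_prompt = question + "\n"
for letter, text in options.items():
    mc_prompt += f"{letter}. {text}\n"
mc_prompt += "Please reason step by step, then put the letter of the correct option within \\boxed{}."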

This model can also serve as a high-quality starting point for second-stage reinforcement learning (e.g., RL with verifiable rewards) to further improve reasoning accuracy and robustness.
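If the checkpoint is used this way, a verifiable reward typically checks the final \boxed{} answer against a reference. The Acknowledgement below credits math-verify for answer verification; the minimal sketch here is only a simplified exact-match stand-in, with an illustrative extraction regex that ignores nested braces.

import re

def boxed_exact_match_reward(generation: str, reference: str) -> float:
    # Take the last \boxed{...} span in the generation and compare it to the reference answer.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", generation)
    if not matches:
        return 0.0
    return 1.0 if matches[-1].strip() == reference.strip() else 0.0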


📊 Experimental Results

Figure. The pass@1 performance of Logics-STEM-8B-RL on Math Evaluation Benchmark.

Figure. The pass@1 performance of Logics-STEM-8B-RL on STEM-related Evaluation Benchmark.


🚀 Quickstart

Transformers (recommended)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Logics-MLLM/Logics-STEM-8B-SFT"

# Load the tokenizer and the model weights in bfloat16, sharded across available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = """Let p be the least prime number for which there exists a positive integer n such that n^4 + 1 is divisible by p^2.
Find the least positive integer m such that m^4 + 1 is divisible by p^2.
Please put your final answer within \\boxed{}.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,  # required for temperature/top_p/top_k to take effect
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
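Since the model is fine-tuned from Qwen3-8B, it presumably inherits Qwen3's chat template. Under that assumption, the same request can also be wrapped as a chat message; the enable_thinking flag is the Qwen3 template option and is assumed, not confirmed, to apply to this checkpoint.

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Qwen3 template option; assumed to carry over to this checkpoint
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))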

📚 Citation

If you use Logics-STEM-8B-SFT, please cite the following technical report:

@misc{xu2026logicsstemempoweringllmreasoning,
      title={Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement}, 
      author={Mingyu Xu and Cheng Fang and Keyue Jiang and Yuqian Zheng and Yanghua Xiao and Baojian Zhou and Qifang Zhao and Suhang Zheng and Xiuwen Zhu and Jiyang Tang and Yongchi Zhao and Yijia Luo and Zhiqi Bai and Yuchi Xu and Wenbo Su and Wei Wang and Bing Zhao and Lin Qu and Xiaoxiao Xu},
      year={2026},
      eprint={2601.01562},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.01562}, 
}

πŸ™ Acknowledgement

We thank the open-source community and toolchain that enabled this work, including (non-exhaustively): the Qwen3 models (teacher, embedding, and base models), DeepSeek-R1 (data synthesis), ROLL (RLVR reward/infra), math-verify (answer verification), and the community datasets listed in the technical report.
