Logics-STEM-8B-SFT
News
- [2026.01.05] Release of our Technical Report.
- [2026.01.05] Release of the first versions of Logics-STEM-8B-SFT, Logics-STEM-8B-RL, and Logics-STEM-SFT-Dataset-2.2M.
Overview
Logics-STEM-8B-SFT is an 8B-parameter open reasoning model for STEM (Science, Technology, Engineering, Mathematics) tasks. It is fine-tuned from Qwen3-8B using supervised fine-tuning (SFT) on Logics-STEM-SFT-Dataset-2.2M, a large-scale long chain-of-thought dataset covering math and broader STEM problems.
Logics-STEM-8B-SFT is intended as a strong SFT baseline for STEM reasoning. On STEM benchmarks it is competitive with many open-source RL-trained reasoners of similar scale, while maintaining robust instruction following and answer formatting, especially on STEM multiple-choice tasks that require the option letter to be output in \boxed{}.
This model can also serve as a high-quality starting point for second-stage reinforcement learning (e.g., RL with verifiable rewards) to further improve reasoning accuracy and robustness.
Experimental Results
Figure. Pass@1 performance of Logics-STEM-8B-RL on math evaluation benchmarks.
Figure. Pass@1 performance of Logics-STEM-8B-RL on STEM-related evaluation benchmarks.
Quickstart
Transformers (recommended)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Logics-MLLM/Logics-STEM-8B-SFT"

# Load the tokenizer and the model in bfloat16, placed automatically on the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = """Let p be the least prime number for which there exists a positive integer n such that n^4 + 1 is divisible by p^2.
Find the least positive integer m such that m^4 + 1 is divisible by p^2.
Please put your final answer within \\boxed{}.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Enable sampling explicitly so that temperature, top_p, and top_k take effect.
out = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
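
Since the prompt asks for the final answer inside \boxed{}, downstream code usually needs to pull that answer out of the decoded text. The following is a minimal sketch of one way to do so, not the evaluation code used in the technical report. The helper name `extract_boxed` and the sample string are hypothetical. It takes the last \boxed{...} in the text and tracks brace depth so that nested LaTeX such as \frac{1}{2} is returned whole.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent.

    Hypothetical helper for illustration; not part of the model's tooling.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None

    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close of \boxed{ found.
                return "".join(chars)
        chars.append(ch)
    return None  # braces never balanced, e.g. generation was truncated


# Hypothetical decoded output for a multiple-choice question.
sample = "Comparing the options, only the second holds, so the answer is \\boxed{B}."
print(extract_boxed(sample))
```

Returning None for unbalanced braces makes truncated generations (for example, when max_new_tokens is reached mid-answer) easy to detect and count separately from wrong answers.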
Citation
If you use Logics-STEM-8B-SFT, please cite the following technical report:
@misc{xu2026logicsstemempoweringllmreasoning,
  title={Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement},
  author={Mingyu Xu and Cheng Fang and Keyue Jiang and Yuqian Zheng and Yanghua Xiao and Baojian Zhou and Qifang Zhao and Suhang Zheng and Xiuwen Zhu and Jiyang Tang and Yongchi Zhao and Yijia Luo and Zhiqi Bai and Yuchi Xu and Wenbo Su and Wei Wang and Bing Zhao and Lin Qu and Xiaoxiao Xu},
  year={2026},
  eprint={2601.01562},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2601.01562},
}
Acknowledgement
We thank the open-source community and toolchain that enabled this work, including (non-exhaustive): Qwen3 models (teacher-model, embedding-model, base-model), DeepSeek-R1 (data synthesis), ROLL (RLVR reward/infra), math-verify (answer verification), and community datasets listed in the technical report.