
Logics-STEM-8B-SFT

📰 News


🔎 Overview

Logics-STEM-8B-SFT is an 8B-parameter open reasoning model for STEM (Science, Technology, Engineering, Mathematics) tasks. It is fine-tuned from Qwen3-8B using supervised fine-tuning (SFT) on Logics-STEM-SFT-Dataset-2.2M, a large-scale long chain-of-thought dataset covering math and broader STEM problems.

Logics-STEM-8B-SFT provides a strong SFT baseline for STEM reasoning: it performs well on STEM benchmarks and is competitive with many open-source RL-trained reasoners at a similar scale, while maintaining robust instruction following and answer formatting (especially on STEM multiple-choice tasks, which require outputting the chosen option letter in \boxed{}; see the prompt sketch below).
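To make the multiple-choice answer format concrete, the sketch below assembles a hypothetical prompt that asks the model to put the chosen option letter in \boxed{}; the question text and options are invented purely for illustration and are not from the evaluation suite.

# Hypothetical multiple-choice prompt; the question and options are illustrative only.
question = "Which quantity is conserved in every elastic collision?"
options = {"A": "Kinetic energy", "B": "Temperature", "C": "Entropy", "D": "Potential energy"}

mc_prompt = question + "\n"
for letter, text in options.items():
    mc_prompt += f"{letter}. {text}\n"
mc_prompt += "Please reason step by step, then put the letter of the correct option within \\boxed{}."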

This model can also serve as a high-quality starting point for second-stage reinforcement learning (e.g., RL with verifiable rewards) to further improve reasoning accuracy and robustness.
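If the checkpoint is used this way, a verifiable reward typically checks the final \boxed{} answer against a reference. The Acknowledgement below credits math-verify for answer verification; the minimal sketch here is only a simplified exact-match stand-in, with an illustrative extraction regex that ignores nested braces.

import re

def boxed_exact_match_reward(generation: str, reference: str) -> float:
    # Take the last \boxed{...} span in the generation and compare it to the reference answer.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", generation)
    if not matches:
        return 0.0
    return 1.0 if matches[-1].strip() == reference.strip() else 0.0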


📊 Experimental Results

Figure. The pass@1 performance of Logics-STEM-8B-RL on Math Evaluation Benchmark.

Figure. The pass@1 performance of Logics-STEM-8B-RL on STEM-related Evaluation Benchmark.


🚀 Quickstart

Transformers (recommended)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Logics-MLLM/Logics-STEM-8B-SFT"

# Load the tokenizer and the model weights in bfloat16, sharded across available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = """Let p be the least prime number for which there exists a positive integer n such that n^4 + 1 is divisible by p^2.
Find the least positive integer m such that m^4 + 1 is divisible by p^2.
Please put your final answer within \\boxed{}.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,  # required for temperature/top_p/top_k to take effect
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
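Since the model is fine-tuned from Qwen3-8B, it presumably inherits Qwen3's chat template. Under that assumption, the same request can also be wrapped as a chat message; the enable_thinking flag is the Qwen3 template option and is assumed, not confirmed, to apply to this checkpoint.

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Qwen3 template option; assumed to carry over to this checkpoint
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))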

📚 Citation

If you use Logics-STEM-8B-SFT, please cite the following technical report:

@misc{xu2026logicsstemempoweringllmreasoning,
      title={Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement}, 
      author={Mingyu Xu and Cheng Fang and Keyue Jiang and Yuqian Zheng and Yanghua Xiao and Baojian Zhou and Qifang Zhao and Suhang Zheng and Xiuwen Zhu and Jiyang Tang and Yongchi Zhao and Yijia Luo and Zhiqi Bai and Yuchi Xu and Wenbo Su and Wei Wang and Bing Zhao and Lin Qu and Xiaoxiao Xu},
      year={2026},
      eprint={2601.01562},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.01562}, 
}

πŸ™ Acknowledgement

We thank the open-source community and toolchain that enabled this work, including (non-exhaustively): the Qwen3 models (teacher, embedding, and base models), DeepSeek-R1 (data synthesis), ROLL (RLVR reward/infra), math-verify (answer verification), and the community datasets listed in the technical report.
