Qwen3-4B-Instruct-SFT
A Qwen3-4B model fine-tuned for multi-turn tool-calling in customer support domains.
Overview
This model was supervised fine-tuned on 219 expert trajectories generated by Qwen3-235B-A22B-Thinking for the tau2-bench evaluation framework. It demonstrates tool-first behavior with structured JSON tool calls.
Training Details
- Base Model: Qwen/Qwen3-4B-Instruct
- Training Framework: slime with Megatron backend
- Dataset: 219 multi-turn trajectories across telecom, airline, and retail domains
- Training: 3 epochs, batch size 8, learning rate 2e-5
- Loss: SFT loss dropped from ~1.1 to ~0.18
Output Format
The model produces tool calls in inline JSON format:
<thinking>Analysis of the customer issue...</thinking>
{"name": "tool_name", "arguments": {"param": "value"}}
Intended Use
- Multi-turn agentic tasks requiring tool orchestration
- Customer support automation workflows
- Research on tool-calling LLM agents
Limitations
- Optimized for tau2-bench domains (telecom, airline, retail)
- Requires structured tool schemas in system prompt
- Not intended for general-purpose chat
Citation
If you use this model, please cite the tau2-bench paper:
@article{yao2024tau2bench,
title={tau2-Bench: Evaluating Conversational Agents in a Dual-Control Environment},
author={Yao, Shunyu and others},
journal={arXiv preprint arXiv:2506.07982},
year={2024}
}
- Downloads last month
- 16