Qwen3-4B-Instruct-SFT

A Qwen3-4B model fine-tuned for multi-turn tool-calling in customer support domains.

Overview

This model was supervised fine-tuned on 219 expert trajectories generated by Qwen3-235B-A22B-Thinking for the tau2-bench evaluation framework. It demonstrates tool-first behavior with structured JSON tool calls.

Training Details

  • Base Model: Qwen/Qwen3-4B-Instruct
  • Training Framework: slime with Megatron backend
  • Dataset: 219 multi-turn trajectories across telecom, airline, and retail domains
  • Training: 3 epochs, batch size 8, learning rate 2e-5
  • Loss: SFT loss dropped from ~1.1 to ~0.18

Output Format

The model produces tool calls in inline JSON format:

<thinking>Analysis of the customer issue...</thinking>
{"name": "tool_name", "arguments": {"param": "value"}}

Intended Use

  • Multi-turn agentic tasks requiring tool orchestration
  • Customer support automation workflows
  • Research on tool-calling LLM agents

Limitations

  • Optimized for tau2-bench domains (telecom, airline, retail)
  • Requires structured tool schemas in system prompt
  • Not intended for general-purpose chat

Citation

If you use this model, please cite the tau2-bench paper:

@article{yao2024tau2bench,
  title={tau2-Bench: Evaluating Conversational Agents in a Dual-Control Environment},
  author={Yao, Shunyu and others},
  journal={arXiv preprint arXiv:2506.07982},
  year={2024}
}
Downloads last month
16
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Jarrodbarnes/Qwen3-4B-Instruct-SFT

Finetunes
1 model

Dataset used to train Jarrodbarnes/Qwen3-4B-Instruct-SFT