Qwen3-4B-Instruct-SFT

A Qwen3-4B model fine-tuned for multi-turn tool-calling in customer support domains.

Overview

This model was supervised fine-tuned on 219 expert trajectories generated by Qwen3-235B-A22B-Thinking for the tau2-bench evaluation framework. It demonstrates tool-first behavior with structured JSON tool calls.

Training Details

Base Model: Qwen/Qwen3-4B-Instruct
Training Framework: slime with Megatron backend
Dataset: 219 multi-turn trajectories across telecom, airline, and retail domains
Training: 3 epochs, batch size 8, learning rate 2e-5
Loss: SFT loss dropped from ~1.1 to ~0.18

Output Format

The model produces tool calls in inline JSON format:

<thinking>Analysis of the customer issue...</thinking>
{"name": "tool_name", "arguments": {"param": "value"}}

Intended Use

Multi-turn agentic tasks requiring tool orchestration
Customer support automation workflows
Research on tool-calling LLM agents

Limitations

Optimized for tau2-bench domains (telecom, airline, retail)
Requires structured tool schemas in system prompt
Not intended for general-purpose chat

Citation

If you use this model, please cite the tau2-bench paper:

@article{yao2024tau2bench,
  title={tau2-Bench: Evaluating Conversational Agents in a Dual-Control Environment},
  author={Yao, Shunyu and others},
  journal={arXiv preprint arXiv:2506.07982},
  year={2024}
}

Downloads last month: 16

Safetensors

Model size

4B params

Tensor type

BF16

Model tree for Jarrodbarnes/Qwen3-4B-Instruct-SFT

Finetunes

1 model

Jarrodbarnes
/

Qwen3-4B-Instruct-SFT