---
base_model: Qwen/Qwen3-4B
library_name: transformers
tags:
- generated_from_trainer
- open-r1
- Text2SQL
- Reasoning
license: apache-2.0
language:
- en
---

# Model Information

This model is the reasoning model for the Text-to-SQL task introduced in [Think2SQL: Blueprinting Reward Density and Advantage Scaling for Effective Text-to-SQL Reasoning]().

This model is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), with its native thinking mode disabled, trained on the [BIRD](https://bird-bench.github.io/) dataset.
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

The model performs best with the system and user prompts shown below.
It is intended to be used with three inputs: the question, the evidence, and the database schema.

Qwen3 support requires `transformers >= 4.51.0`; make sure to update your installation via `pip install --upgrade transformers`.

```python
import transformers
import torch

model_id = "anonymous-2321/Think2SQL-4B"

# Load the model; device_map="auto" places the weights on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# System prompt: fixes the <reasoning>/<answer> output format.
system_message = """
You are a data science expert that provides well-reasoned and detailed responses. Your task is to understand the schema and generate a valid SQL query to answer the question.
You first think about the reasoning process as an internal monologue and then provide the user with the answer.
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
""".strip()

# User prompt: carries the three expected inputs (question, evidence, database schema).
user_message = """
Answer the following question with the SQL code. Use the piece of evidence and base your answer on the database schema.
Given the question, the evidence and the database schema, return in the <answer> tags only the SQL script that addresses the question.

Database Engine:
SQLite

Question:
Return the product name, sorted alphabetically and by price in descending order.


Evidence:


Database Schema:
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    price REAL NOT NULL
);

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL
);
"""

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
]

outputs = pipeline(
    messages,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
# The last entry of generated_text is the assistant's reply.
print(outputs[0]["generated_text"][-1])
```
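
Since the model receives the question, evidence, and database schema as separate inputs, in practice you may want helpers to fill the user-prompt template and to pull the final SQL out of the `<answer>` tags. Below is a minimal sketch of both; `build_user_message` and `extract_sql` are illustrative names, not part of the release:

```python
import re

# User-prompt template from the quick-start example, with slots for the three inputs.
USER_TEMPLATE = """\
Answer the following question with the SQL code. Use the piece of evidence and base your answer on the database schema.
Given the question, the evidence and the database schema, return in the <answer> tags only the SQL script that addresses the question.

Database Engine:
SQLite

Question:
{question}

Evidence:
{evidence}

Database Schema:
{schema}"""

def build_user_message(question: str, evidence: str, schema: str) -> str:
    """Assemble the user prompt from the three expected inputs."""
    return USER_TEMPLATE.format(question=question, evidence=evidence, schema=schema)

def extract_sql(completion: str) -> str | None:
    """Return the SQL inside the last <answer>...</answer> block, if present."""
    matches = re.findall(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return matches[-1].strip() if matches else None

# Usage with the pipeline output from the snippet above:
reply = outputs[0]["generated_text"][-1]["content"]
print(extract_sql(reply))
```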

## 📖 Overview

Think2SQL is a systematic study of injecting reasoning capabilities into Text-to-SQL through Reinforcement Learning with Verifiable Rewards (RLVR). It uncovers the critical interplay between reward density, advantage scaling, and model capacity, and proposes novel execution-guided dense rewards together with advantage-scaling strategies matched to model size. The resulting 4B-parameter model achieves reasoning capabilities competitive with state-of-the-art models, and the accompanying analysis offers a comprehensive guide to optimizing Text-to-SQL reasoning under computational constraints.

**Key Contributions:**
- Execution-guided dense reward function that outperforms binary signals (sketched below)
- Analysis of advantage scaling mechanics for models of different sizes
- Evaluation of cold start effects and supervised fine-tuning impact
- Pareto frontier mapping for training efficiency optimization
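
To make the reward-density idea concrete, here is a minimal sketch of what an execution-guided dense reward can look like: rather than a binary execution-match signal, it executes the predicted and gold queries against the database and grants partial credit for overlapping result sets (row-level F1 in this sketch). The exact reward used in Think2SQL is defined in the paper; the function name and the F1 formulation here are illustrative assumptions.

```python
import sqlite3
from contextlib import closing

def execution_reward(pred_sql: str, gold_sql: str, db_path: str) -> float:
    """Dense execution-guided reward sketch: row-level F1 between result sets.

    A binary reward returns 1.0 only on an exact execution match; this dense
    variant grants partial credit for partially correct queries, densifying
    the RL learning signal. (Illustrative formulation, not the paper's exact reward.)
    """
    with closing(sqlite3.connect(db_path)) as conn:
        try:
            pred_rows = conn.execute(pred_sql).fetchall()
        except sqlite3.Error:
            return 0.0  # invalid or failing SQL earns no reward
        gold_rows = conn.execute(gold_sql).fetchall()

    pred_set, gold_set = set(pred_rows), set(gold_rows)
    if not pred_set and not gold_set:
        return 1.0  # both queries correctly return an empty result
    if not pred_set or not gold_set:
        return 0.0

    overlap = len(pred_set & gold_set)
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```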