# DeepSeek-R1-Style Reasoning Model (3B)
A fine-tuned Llama 3.2 3B model trained to mimic DeepSeek-R1's reasoning patterns using structured `<think></think>` and `<answer></answer>` tags.
## Model Details
- Base Model: Llama 3.2 3B
- Training Method: Full-weight supervised fine-tuning
- Training Data: 8,000 high-quality reasoning examples from DeepSeek R1 outputs
- Training Hardware: NVIDIA H100 GPU
- Training Schedule: 3 epochs at max_length=10,000 tokens
- License: Llama 3.2 Community License
## Features
### Structured Reasoning
- Uses `<think></think>` tags for the internal reasoning process
- Provides `<answer></answer>` tags for final responses
- Shows step-by-step mathematical and logical reasoning
### Anti-Overthinking
- Trained with repetition penalty to avoid reasoning loops
- Includes stop conditions to prevent excessive meta-reasoning
- Trained toward decisive conclusion patterns (inference-time sketch below)
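The repetition penalty and stop conditions can also be enforced at inference time. A minimal sketch, assuming a `</answer>` stop string and a penalty of 1.2 (neither value is taken from the training run):

```python
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnAnswerClose(StoppingCriteria):
    """Stop generation once a closing </answer> tag has been emitted."""

    def __init__(self, tokenizer, prompt_len):
        self.tokenizer = tokenizer
        self.prompt_len = prompt_len  # number of prompt tokens to skip when decoding

    def __call__(self, input_ids, scores, **kwargs):
        generated = self.tokenizer.decode(
            input_ids[0][self.prompt_len:], skip_special_tokens=True
        )
        return "</answer>" in generated

# With `model`, `tokenizer`, and `inputs` prepared as in the Usage section below:
# outputs = model.generate(
#     **inputs,
#     max_new_tokens=512,
#     repetition_penalty=1.2,  # assumed value; discourages reasoning loops
#     stopping_criteria=StoppingCriteriaList(
#         [StopOnAnswerClose(tokenizer, inputs["input_ids"].shape[1])]
#     ),
# )
```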
### Mathematical Capabilities
- Algebra and equation solving
- Combinatorics and probability
- Set theory and logical operations
- Step-by-step derivations
## Usage
### Basic Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer from the fine-tuning output directory
tokenizer = AutoTokenizer.from_pretrained("./full_weight_out")
model = AutoModelForCausalLM.from_pretrained(
    "./full_weight_out",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate a response (inputs must be moved to the model's device)
prompt = "User: Solve: If x+2=7, what is x? Show your reasoning.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.3,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
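Responses arrive with the reasoning still wrapped in tags. A small helper to pull out just the final answer (assumes the `<answer>` block appears at most once):

```python
import re

def extract_answer(response: str) -> str:
    """Return the text inside the first <answer></answer> block, or the full response."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()

print(extract_answer(response))
```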
### With System Prompt (Recommended)
```python
system_prompt = """You are DeepSeek-R1, an AI assistant created by DeepSeek. You are helpful, harmless, and honest.

## Reasoning and Response Format

When faced with questions that require reasoning, problem-solving, mathematical calculations, logical analysis, or complex thinking:

1. **Use structured reasoning**: Wrap your internal reasoning process in <think></think> tags
2. **Provide clear answers**: After reasoning, give your final response in <answer></answer> tags
3. **Think step-by-step**: Break down complex problems into logical steps
4. **Show your work**: Make your reasoning process transparent and verifiable

## Anti-Overthinking Guidelines

**CRITICAL: Avoid repetitive reasoning loops!**

- If you find multiple correct answers, state them clearly and choose the best one
- Don't repeat the same reasoning more than twice
- When you reach a conclusion, state it and move on

Remember: Use <think></think> and <answer></answer> tags whenever the question involves reasoning, analysis, or problem-solving to show your thought process clearly. Be decisive and avoid repetitive loops!
"""

user_question = "If a-3b=-3, what is the value of 5-a+3b?"  # any question requiring reasoning
full_prompt = f"System: {system_prompt}\nUser: {user_question}\nAssistant:"
```
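Llama 3.2 tokenizers also ship a chat template, which may be a better fit if the fine-tuning data followed the native chat format; this alternative is an assumption, since the examples above use a plain `System:`/`User:`/`Assistant:` layout:

```python
# Alternative prompt construction via the tokenizer's built-in chat template.
# Only appropriate if the fine-tuning data used the native Llama 3.2 chat format.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_question},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512, temperature=0.3, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```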
## Example Outputs
### Mathematical Reasoning
**Input:** "If a-3b=-3, then the value of the algebraic expression 5-a+3b is?"

**Output:**

```
<think>
To solve this problem, we need to manipulate the given equation a - 3b = -3 to express a in terms of b. Then substitute this expression into the target expression 5 - a + 3b.
First, let's isolate a from the given equation:
a - 3b = -3
a = -3 + 3b
Now, substitute a = -3 + 3b into 5 - a + 3b:
5 - (-3 + 3b) + 3b
= 5 + 3 - 3b + 3b
= 8
Thus, regardless of the values of a and b, the result of 5 - a + 3b must always be 8.
</think>
<answer>
The value of the algebraic expression 5 - a + 3b is 8.
</answer>
```
### Combinatorics Problem
**Input:** "There are 3 girls and 5 boys standing in a row. If all the girls stand together, how many different arrangements are there?"

**Output:**

```
<think>
To solve this, we need to consider the girls as a single unit since they all need to stand together. This effectively reduces the number of units to arrange from 8 (3 girls + 5 boys) to 6 (1 group of girls + 5 boys).
The number of ways to arrange these 6 units is 6!.
However, within the group of girls, the 3 girls can be arranged among themselves in 3! ways.
Therefore, the total number of arrangements where all girls stand together is 6! × 3!.
Calculating:
6! = 720
3! = 6
Total arrangements = 720 × 6 = 4320
</think>
<answer>
The number of different arrangements where all girls stand together is 4320.
</answer>
```
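Both traces are easy to spot-check with a few lines of Python:

```python
from math import factorial

# Combinatorics example: 6 units (girl-block + 5 boys), times internal girl orderings
assert factorial(6) * factorial(3) == 4320

# Algebra example: 5 - a + 3b = 5 - (a - 3b) = 5 - (-3) = 8 whenever a - 3b = -3
a, b = 0, 1  # one pair satisfying a - 3b = -3
assert a - 3 * b == -3 and 5 - a + 3 * b == 8
```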
## Training Details
### Dataset
- Size: 8,000 examples
- Source: DeepSeek R1 reasoning outputs
- Format: Prompt-completion pairs with structured reasoning (illustrated after this list)
- Max Length: 10,000 tokens (to preserve complete reasoning chains)
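A record in this format might look like the following (a hypothetical illustration of the layout, not an actual row from the dataset):

```python
# Hypothetical prompt-completion record; the field names are assumptions
example = {
    "prompt": "User: If x+2=7, what is x? Show your reasoning.\nAssistant:",
    "completion": (
        " <think>\nSubtract 2 from both sides: x = 7 - 2 = 5.\n</think>\n"
        "<answer>\nx = 5\n</answer>"
    ),
}
```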
### Training Configuration
- Learning Rate: 2e-5
- Batch Size: 32 (via gradient accumulation)
- Optimizer: AdamW 8-bit (bitsandbytes)
- Precision: BF16
- Epochs: 3
- Hardware: NVIDIA H100 GPU
### Memory Optimizations
- Gradient checkpointing enabled
- Flash attention (xformers) when available
- Packed examples to reduce padding waste
- Conservative micro-batching (batch_size=1 per device); see the combined sketch below
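Putting the two lists together, a sketch of what the trainer setup might look like with Hugging Face `TrainingArguments` (the flag values mirror the lists above; the dataset wiring and example packing are assumptions):

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="./full_weight_out",
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=1,   # conservative micro-batching
    gradient_accumulation_steps=32,  # effective batch size of 32
    bf16=True,
    optim="adamw_bnb_8bit",          # AdamW 8-bit via bitsandbytes
    gradient_checkpointing=True,
)
# Assumes `model` and a pre-packed `packed_dataset` are prepared elsewhere:
# trainer = Trainer(model=model, args=args, train_dataset=packed_dataset)
# trainer.train()
```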
## Limitations
### Known Issues
- Letter counting: May struggle with character-level tasks (e.g., counting 'r's in "strawberry")
- Complex factual knowledge: Limited by 3B parameter size
- Very long reasoning: May occasionally produce incomplete responses despite training on long sequences
### Model Size Constraints
- 3B parameters: Smaller than full-scale reasoning models
- Context length: Trained on 10k tokens but may be less reliable at maximum length
- Computational requirements: Requires GPU for reasonable inference speed
## Ethical Considerations
- Model may generate incorrect information with confident-sounding reasoning
- Always verify mathematical calculations and factual claims
- Reasoning traces show the model's "thinking" but may contain errors
- Not suitable for high-stakes decision making without human oversight
## License
This model is released under the Llama 3.2 Community License. Please see the original Llama 3.2 license for terms and conditions.
## Acknowledgments
- Meta AI for the Llama 3.2 base model
- DeepSeek for pioneering structured reasoning approaches with DeepSeek-R1
- a-m-team for the AM-DeepSeek-R1-Distilled-1.4M dataset used as the source for training data
- Hugging Face for the transformers library and training infrastructure