# DeepSeek-R1-Style Reasoning Model (3B)
A fine-tuned Llama 3.2 3B model trained to mimic DeepSeek-R1's reasoning patterns using structured `<think></think>` and `<answer></answer>` tags.
## Model Details
- Base Model: Llama 3.2 3B
- Training Method: Full-weight supervised fine-tuning
- Training Data: 8,000 high-quality reasoning examples from DeepSeek R1 outputs
- Training Hardware: NVIDIA H100 GPU
- Training Schedule: 3 epochs at max_length=10,000 tokens
- License: Llama 3.2 Community License
## Features
### Structured Reasoning
- Uses `<think></think>` tags for the internal reasoning process
- Provides `<answer></answer>` tags for final responses
- Shows step-by-step mathematical and logical reasoning
### Anti-Overthinking
- Trained with repetition penalty to avoid reasoning loops
- Includes stop conditions to prevent excessive meta-reasoning
- Trained toward decisive conclusion patterns (inference-time sketch below)
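The repetition penalty and stop conditions can also be enforced at inference time. A minimal sketch, assuming a `</answer>` stop string and a penalty of 1.2 (neither value is taken from the training run):

```python
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnAnswerClose(StoppingCriteria):
    """Stop generation once a closing </answer> tag has been emitted."""

    def __init__(self, tokenizer, prompt_len):
        self.tokenizer = tokenizer
        self.prompt_len = prompt_len  # number of prompt tokens to skip when decoding

    def __call__(self, input_ids, scores, **kwargs):
        generated = self.tokenizer.decode(
            input_ids[0][self.prompt_len:], skip_special_tokens=True
        )
        return "</answer>" in generated

# With `model`, `tokenizer`, and `inputs` prepared as in the Usage section below:
# outputs = model.generate(
#     **inputs,
#     max_new_tokens=512,
#     repetition_penalty=1.2,  # assumed value; discourages reasoning loops
#     stopping_criteria=StoppingCriteriaList(
#         [StopOnAnswerClose(tokenizer, inputs["input_ids"].shape[1])]
#     ),
# )
```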
### Mathematical Capabilities
- Algebra and equation solving
- Combinatorics and probability
- Set theory and logical operations
- Step-by-step derivations
## Usage
### Basic Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer from the fine-tuning output directory
tokenizer = AutoTokenizer.from_pretrained("./full_weight_out")
model = AutoModelForCausalLM.from_pretrained(
    "./full_weight_out",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate a response (inputs must be moved to the model's device)
prompt = "User: Solve: If x+2=7, what is x? Show your reasoning.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.3,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
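Responses arrive with the reasoning still wrapped in tags. A small helper to pull out just the final answer (assumes the `<answer>` block appears at most once):

```python
import re

def extract_answer(response: str) -> str:
    """Return the text inside the first <answer></answer> block, or the full response."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()

print(extract_answer(response))
```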
### With System Prompt (Recommended)
```python
system_prompt = """You are DeepSeek-R1, an AI assistant created by DeepSeek. You are helpful, harmless, and honest.

## Reasoning and Response Format

When faced with questions that require reasoning, problem-solving, mathematical calculations, logical analysis, or complex thinking:

1. **Use structured reasoning**: Wrap your internal reasoning process in <think></think> tags
2. **Provide clear answers**: After reasoning, give your final response in <answer></answer> tags
3. **Think step-by-step**: Break down complex problems into logical steps
4. **Show your work**: Make your reasoning process transparent and verifiable

## Anti-Overthinking Guidelines

**CRITICAL: Avoid repetitive reasoning loops!**

- If you find multiple correct answers, state them clearly and choose the best one
- Don't repeat the same reasoning more than twice
- When you reach a conclusion, state it and move on

Remember: Use <think></think> and <answer></answer> tags whenever the question involves reasoning, analysis, or problem-solving to show your thought process clearly. Be decisive and avoid repetitive loops!
"""

user_question = "If a-3b=-3, what is the value of 5-a+3b?"  # any question requiring reasoning
full_prompt = f"System: {system_prompt}\nUser: {user_question}\nAssistant:"
```
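Llama 3.2 tokenizers also ship a chat template, which may be a better fit if the fine-tuning data followed the native chat format; this alternative is an assumption, since the examples above use a plain `System:`/`User:`/`Assistant:` layout:

```python
# Alternative prompt construction via the tokenizer's built-in chat template.
# Only appropriate if the fine-tuning data used the native Llama 3.2 chat format.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_question},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512, temperature=0.3, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```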
## Example Outputs
### Mathematical Reasoning
**Input:** "If a-3b=-3, then the value of the algebraic expression 5-a+3b is?"

**Output:**

```
<think>
To solve this problem, we need to manipulate the given equation a - 3b = -3 to express a in terms of b. Then substitute this expression into the target expression 5 - a + 3b.
First, let's isolate a from the given equation:
a - 3b = -3
a = -3 + 3b
Now, substitute a = -3 + 3b into 5 - a + 3b:
5 - (-3 + 3b) + 3b
= 5 + 3 - 3b + 3b
= 8
Thus, regardless of the values of a and b, the result of 5 - a + 3b must always be 8.
</think>
<answer>
The value of the algebraic expression 5 - a + 3b is 8.
</answer>
```
### Combinatorics Problem
**Input:** "There are 3 girls and 5 boys standing in a row. If all the girls stand together, how many different arrangements are there?"

**Output:**

```
<think>
To solve this, we need to consider the girls as a single unit since they all need to stand together. This effectively reduces the number of units to arrange from 8 (3 girls + 5 boys) to 6 (1 group of girls + 5 boys).
The number of ways to arrange these 6 units is 6!.
However, within the group of girls, the 3 girls can be arranged among themselves in 3! ways.
Therefore, the total number of arrangements where all girls stand together is 6! × 3!.
Calculating:
6! = 720
3! = 6
Total arrangements = 720 × 6 = 4320
</think>
<answer>
The number of different arrangements where all girls stand together is 4320.
</answer>
```
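Both traces are easy to spot-check with a few lines of Python:

```python
from math import factorial

# Combinatorics example: 6 units (girl-block + 5 boys), times internal girl orderings
assert factorial(6) * factorial(3) == 4320

# Algebra example: 5 - a + 3b = 5 - (a - 3b) = 5 - (-3) = 8 whenever a - 3b = -3
a, b = 0, 1  # one pair satisfying a - 3b = -3
assert a - 3 * b == -3 and 5 - a + 3 * b == 8
```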
## Training Details
### Dataset
- Size: 8,000 examples
- Source: DeepSeek R1 reasoning outputs
- Format: Prompt-completion pairs with structured reasoning (illustrated after this list)
- Max Length: 10,000 tokens (to preserve complete reasoning chains)
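A record in this format might look like the following (a hypothetical illustration of the layout, not an actual row from the dataset):

```python
# Hypothetical prompt-completion record; the field names are assumptions
example = {
    "prompt": "User: If x+2=7, what is x? Show your reasoning.\nAssistant:",
    "completion": (
        " <think>\nSubtract 2 from both sides: x = 7 - 2 = 5.\n</think>\n"
        "<answer>\nx = 5\n</answer>"
    ),
}
```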
### Training Configuration
- Learning Rate: 2e-5
- Batch Size: 32 (via gradient accumulation)
- Optimizer: AdamW 8-bit (bitsandbytes)
- Precision: BF16
- Epochs: 3
- Hardware: NVIDIA H100 GPU
### Memory Optimizations
- Gradient checkpointing enabled
- Flash attention (xformers) when available
- Packed examples to reduce padding waste
- Conservative micro-batching (batch_size=1 per device); see the combined sketch below
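Putting the two lists together, a sketch of what the trainer setup might look like with Hugging Face `TrainingArguments` (the flag values mirror the lists above; the dataset wiring and example packing are assumptions):

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="./full_weight_out",
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=1,   # conservative micro-batching
    gradient_accumulation_steps=32,  # effective batch size of 32
    bf16=True,
    optim="adamw_bnb_8bit",          # AdamW 8-bit via bitsandbytes
    gradient_checkpointing=True,
)
# Assumes `model` and a pre-packed `packed_dataset` are prepared elsewhere:
# trainer = Trainer(model=model, args=args, train_dataset=packed_dataset)
# trainer.train()
```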
## Limitations
### Known Issues
- Letter counting: May struggle with character-level tasks (e.g., counting 'r's in "strawberry")
- Complex factual knowledge: Limited by 3B parameter size
- Very long reasoning: May occasionally produce incomplete responses despite training on long sequences
### Model Size Constraints
- 3B parameters: Smaller than full-scale reasoning models
- Context length: Trained on 10k tokens but may be less reliable at maximum length
- Computational requirements: Requires GPU for reasonable inference speed
## Ethical Considerations
- Model may generate incorrect information with confident-sounding reasoning
- Always verify mathematical calculations and factual claims
- Reasoning traces show the model's "thinking" but may contain errors
- Not suitable for high-stakes decision making without human oversight
## License
This model is released under the Llama 3.2 Community License. Please see the original Llama 3.2 license for terms and conditions.
## Acknowledgments
- Meta AI for the Llama 3.2 base model
- DeepSeek for pioneering structured reasoning approaches with DeepSeek-R1
- a-m-team for the AM-DeepSeek-R1-Distilled-1.4M dataset used as the source for training data
- Hugging Face for the transformers library and training infrastructure