DeepSeek-R1-Style Reasoning Model (3B)

A fine-tuned Llama 3.2 3B model trained to mimic DeepSeek-R1's reasoning patterns using structured <think></think> and <answer></answer> tags.

Model Details

  • Base Model: Llama 3.2 3B
  • Training Method: Full-weight supervised fine-tuning
  • Training Data: 8,000 high-quality reasoning examples from DeepSeek R1 outputs
  • Training Hardware: NVIDIA H100 GPU
  • Training Schedule: 3 epochs at max_length=10,000 tokens
  • License: Llama 3.2 Community License

Features

✅ Structured Reasoning

  • Uses <think></think> tags for internal reasoning process
  • Provides <answer></answer> tags for final responses
  • Shows step-by-step mathematical and logical reasoning

✅ Anti-Overthinking

  • Trained with repetition penalty to avoid reasoning loops
  • Includes stop conditions to prevent excessive meta-reasoning
  • Decisive conclusion patterns

✅ Mathematical Capabilities

  • Algebra and equation solving
  • Combinatorics and probability
  • Set theory and logical operations
  • Step-by-step derivations

Usage

Basic Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("./full_weight_out")
model = AutoModelForCausalLM.from_pretrained(
    "./full_weight_out",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate response
prompt = "User: Solve: If x+2=7, what is x? Show your reasoning.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.3,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
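
Since the reasoning and the final answer are wrapped in tags, the two parts can be separated after generation. A minimal parsing sketch (the helper name and regex are illustrative, not part of the released code):

import re

def split_reasoning(text):
    """Extract the <think> and <answer> blocks from a generated response."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )

reasoning, final_answer = split_reasoning(response)
print("Reasoning:", reasoning)
print("Answer:", final_answer)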

With System Prompt (Recommended)

system_prompt = """You are DeepSeek-R1, an AI assistant created by DeepSeek. You are helpful, harmless, and honest.

## Reasoning and Response Format
When faced with questions that require reasoning, problem-solving, mathematical calculations, logical analysis, or complex thinking:

1. **Use structured reasoning**: Wrap your internal reasoning process in <think></think> tags
2. **Provide clear answers**: After reasoning, give your final response in <answer></answer> tags
3. **Think step-by-step**: Break down complex problems into logical steps
4. **Show your work**: Make your reasoning process transparent and verifiable

## Anti-Overthinking Guidelines
**CRITICAL: Avoid repetitive reasoning loops!**
- If you find multiple correct answers, state them clearly and choose the best one
- Don't repeat the same reasoning more than twice
- When you reach a conclusion, state it and move on

Remember: Use <think></think> and <answer></answer> tags whenever the question involves reasoning, analysis, or problem-solving to show your thought process clearly. Be decisive and avoid repetitive loops!
"""

user_question = "If x+2=7, what is x? Show your reasoning."
full_prompt = f"System: {system_prompt}\nUser: {user_question}\nAssistant:"
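
Generation then works exactly as in the basic example. The repetition_penalty value below is an illustrative setting that matches the anti-overthinking goal described above, not a published recommendation:

inputs = tokenizer(full_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.3,
    do_sample=True,
    repetition_penalty=1.1,  # illustrative value to discourage reasoning loops
    pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)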

Example Outputs

Mathematical Reasoning

Input: "If a-3b=-3, then the value of the algebraic expression 5-a+3b is?"

Output:

<think>
To solve this problem, we need to manipulate the given equation a - 3b = -3 to express a in terms of b. Then substitute this expression into the target expression 5 - a + 3b.

First, let's isolate a from the given equation:
a - 3b = -3
a = -3 + 3b

Now, substitute a = -3 + 3b into 5 - a + 3b:
5 - (-3 + 3b) + 3b
= 5 + 3 - 3b + 3b
= 8

Thus, regardless of the values of a and b, the result of 5 - a + 3b must always be 8.
</think>

<answer>
The value of the algebraic expression 5 - a + 3b is 8.
</answer>
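
The identity can be spot-checked numerically; the snippet below is only a sanity check of the arithmetic in the trace:

# For any b, choosing a = 3b - 3 satisfies a - 3b = -3; 5 - a + 3b should then be 8.
for b in [-2, 0, 1, 10]:
    a = 3 * b - 3
    assert a - 3 * b == -3
    assert 5 - a + 3 * b == 8
print("5 - a + 3b = 8 for every tested pair")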

Combinatorics Problem

Input: "There are 3 girls and 5 boys standing in a row. If all the girls stand together, how many different arrangements are there?"

Output:

<think>
To solve this, we need to consider the girls as a single unit since they all need to stand together. This effectively reduces the number of units to arrange from 8 (3 girls + 5 boys) to 6 (1 group of girls + 5 boys).

The number of ways to arrange these 6 units is 6!.
However, within the group of girls, the 3 girls can be arranged among themselves in 3! ways.

Therefore, the total number of arrangements where all girls stand together is 6! × 3!.

Calculating:
6! = 720
3! = 6
Total arrangements = 720 × 6 = 4320
</think>

<answer>
The number of different arrangements where all girls stand together is 4320.
</answer>
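
The arithmetic in the trace is easy to verify directly:

from math import factorial

# 6 units (the girl block plus 5 boys), times the 3! internal orderings of the girls
print(factorial(6) * factorial(3))  # 4320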

Training Details

Dataset

  • Size: 8,000 examples
  • Source: DeepSeek R1 reasoning outputs
  • Format: Prompt-completion pairs with structured reasoning
  • Max Length: 10,000 tokens (to preserve complete reasoning chains)

Training Configuration

  • Learning Rate: 2e-5
  • Effective Batch Size: 32 (micro-batch of 1 with gradient accumulation)
  • Optimizer: AdamW 8-bit (bitsandbytes)
  • Precision: BF16
  • Epochs: 3
  • Hardware: NVIDIA H100 GPU
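
For illustration, the hyperparameters above might map onto a Hugging Face TrainingArguments roughly as follows; this is a reconstruction, not the exact script used to train this model:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./full_weight_out",
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=1,   # micro-batch of 1 per device
    gradient_accumulation_steps=32,  # effective batch size of 32
    optim="adamw_bnb_8bit",          # AdamW 8-bit via bitsandbytes
    bf16=True,
    logging_steps=10,
)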

Memory Optimizations

  • Gradient checkpointing enabled
  • Flash attention (xformers) when available
  • Packed examples to reduce padding waste
  • Conservative micro-batching (batch_size=1 per device)
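
The packing step is not spelled out in this card; the sketch below only illustrates the general idea of concatenating tokenized examples (separated by EOS) into fixed-length blocks so that little of the 10,000-token window is wasted on padding:

def pack_examples(token_lists, eos_id, block_size=10_000):
    """Concatenate tokenized examples and slice the stream into fixed-size blocks."""
    stream = []
    for tokens in token_lists:
        stream.extend(tokens + [eos_id])
    # Drop any trailing partial block so every training example has the same length.
    return [
        stream[i:i + block_size]
        for i in range(0, len(stream) - block_size + 1, block_size)
    ]

# Toy usage with tiny blocks:
print(pack_examples([[1, 2, 3], [4, 5], [6, 7, 8, 9]], eos_id=0, block_size=4))
# [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]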

Limitations

Known Issues

  • Letter counting: May struggle with character-level tasks (e.g., counting 'r's in "strawberry")
  • Complex factual knowledge: Limited by 3B parameter size
  • Very long reasoning: May occasionally produce incomplete responses despite training on long sequences

Model Size Constraints

  • 3B parameters: Smaller than full-scale reasoning models
  • Context length: Trained on 10k tokens but may be less reliable at maximum length
  • Computational requirements: Requires GPU for reasonable inference speed

Ethical Considerations

  • Model may generate incorrect information with confident-sounding reasoning
  • Always verify mathematical calculations and factual claims
  • Reasoning traces show the model's "thinking" but may contain errors
  • Not suitable for high-stakes decision making without human oversight

License

This model is released under the Llama 3.2 Community License. Please see the original Llama 3.2 license for terms and conditions.

Acknowledgments

  • Meta AI for the Llama 3.2 base model
  • DeepSeek for pioneering structured reasoning approaches with DeepSeek-R1
  • a-m-team for the AM-DeepSeek-R1-Distilled-1.4M dataset used as the source for training data
  • Hugging Face for the transformers library and training infrastructure