Model Card: Galena-2B (Granite 3.3 Math & Physics)
Model Description
Galena-2B is a 2-billion-parameter language model specialized for mathematical reasoning and physics problem-solving. It is derived from IBM's Granite 3.3-2B Instruct base model through parameter-efficient fine-tuning (LoRA) on curated datasets focused on advanced calculations and physics concepts.
- Developed by: [Your Name/Organization]
- Base Model: IBM Granite 3.3-2B Instruct
- Model Type: Causal Language Model (Decoder-only Transformer)
- Language: English
- License: Apache 2.0
- Fine-tuned from: ibm-granite/granite-3.3-2b-instruct
Model Architecture
- Architecture: GraniteForCausalLM
- Parameters: 2.0B
- Layers: 40
- Hidden Size: 2048
- Attention Heads: 32 (query) / 8 (key-value, GQA)
- Intermediate Size: 8192
- Vocabulary Size: 49,159 tokens
- Context Window: 131,072 tokens (128k)
- Precision: bfloat16 (training & inference)
- Activation Function: SiLU (Swish)
Key Features
- Grouped Query Attention (GQA) for efficient inference
- RoPE Embeddings with extended context support (theta=10M)
- Attention & Logits Scaling for training stability
- Embedding Multiplier (12.0) and Residual Multiplier (0.22)
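These figures can be checked directly against the base model's published configuration. Below is a minimal sketch, assuming the field names exposed by the Transformers `GraniteConfig`; the Granite-specific multipliers are read with `getattr` in case an attribute is absent in a given library version.

```python
# Sketch: read the base model's config to confirm the architecture figures above.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("ibm-granite/granite-3.3-2b-instruct")

print(cfg.num_hidden_layers)        # layers (40)
print(cfg.hidden_size)              # hidden size (2048)
print(cfg.num_attention_heads)      # query heads (32)
print(cfg.num_key_value_heads)      # key-value heads for GQA (8)
print(cfg.intermediate_size)        # MLP intermediate size (8192)
print(cfg.vocab_size)               # vocabulary size
print(cfg.max_position_embeddings)  # context window (131,072 tokens)
print(cfg.rope_theta)               # RoPE theta for long-context support
print(getattr(cfg, "embedding_multiplier", None))  # 12.0
print(getattr(cfg, "residual_multiplier", None))   # 0.22
```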
Intended Use
Primary Use Cases
- Educational Applications: Teaching and learning advanced mathematics and physics
- Research Tools: Assisting with physics problem formulation and mathematical reasoning
- Conversational AI: Domain-specific chatbots for STEM topics
- Tool-Augmented Reasoning: Integration with calculators and symbolic math engines
Out-of-Scope Use
- Critical Decision Making: Not suitable for medical, legal, or safety-critical applications
- General-Purpose Conversational AI: Optimized for math/physics; may underperform on general topics
- Production Systems: This is a research/educational model without production guarantees
- Factual Information Retrieval: May hallucinate; always verify outputs
Training Data
The model was fine-tuned on a carefully curated set of 26,000 instruction-response pairs blended from two specialized datasets:
1. NVIDIA Nemotron-RL-Math (Advanced Calculations)
- Source: `nvidia/Nemotron-RL-math-advanced_calculations`
- Content: Complex mathematical problems with step-by-step reasoning traces
- Features: Tool-augmented reasoning, calculator integration, multi-step problem decomposition
- Format: Instruction-following with detailed solution traces
2. CAMEL-AI Physics Dataset
- Source: `camel-ai/physics`
- Content: Physics dialogue pairs covering diverse topics and subtopics
- Features: Conceptual explanations, problem-solving, physics principles
- Metadata: Topic and subtopic categorization for structured learning
Data Preparation
- Preprocessing: `scripts/prepare_math_physics.py` in the parent GRANITE repository (see the sketch after this list)
- Format Conversion: Unified into Granite's chat format (`<|user|>` / `<|assistant|>` tags)
- Output: `data/math_physics.jsonl` (26k examples)
- Token Length: Max sequence length capped at 512 tokens during training
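The sketch below shows the kind of conversion `prepare_math_physics.py` performs, based only on the format described above. The dataset column names (`problem`/`solution`, `message_1`/`message_2`) and the simple concatenate-and-cap blending are illustrative assumptions, not the actual script.

```python
# Sketch of converting blended math/physics examples into Granite's chat format.
# Column names and split names are assumptions; check the dataset schemas before running.
import json
from datasets import load_dataset

def to_chat(prompt: str, response: str) -> dict:
    # Wrap a prompt/response pair in the <|user|>/<|assistant|> tags used for training.
    return {"text": f"<|user|>\n{prompt}\n<|assistant|>\n{response}"}

records = []

math = load_dataset("nvidia/Nemotron-RL-math-advanced_calculations", split="train")
for row in math:
    records.append(to_chat(row["problem"], row["solution"]))      # assumed columns

physics = load_dataset("camel-ai/physics", split="train")
for row in physics:
    records.append(to_chat(row["message_1"], row["message_2"]))   # assumed columns

with open("data/math_physics.jsonl", "w", encoding="utf-8") as f:
    for rec in records[:26_000]:                                   # cap at 26k examples
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```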
Training Procedure
Training Hyperparameters
- Method: QLoRA (Quantized Low-Rank Adaptation)
- Base Model Precision: 4-bit quantization (NF4)
- LoRA Rank: Default (typically 8-16)
- LoRA Alpha: Default
- Target Modules: Query, Key, Value, Output projections
- Gradient Checkpointing: Enabled
- Mixed Precision: bfloat16
Training Configuration
```json
{
  "base_model": "ibm-granite/granite-3.3-2b-instruct",
  "dataset_path": "data/math_physics.jsonl",
  "output_dir": "outputs/granite-math-physics-lora",
  "use_4bit": true,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 4,
  "effective_batch_size": 4,
  "num_train_epochs": 1,
  "max_steps": 500,
  "max_seq_length": 512,
  "learning_rate": "2e-4 (default)",
  "batching_strategy": "padding",
  "optimizer": "paged_adamw_8bit",
  "bf16": true
}
```
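For orientation, here is a minimal sketch of how a QLoRA run with these settings is typically assembled with `transformers` and `peft`. The LoRA rank, alpha, and dropout values are assumptions (the card lists them as library defaults), and the project's own training script may differ.

```python
# Sketch of a QLoRA setup matching the configuration above.
# LoRA rank/alpha/dropout values are assumed defaults, not the project's exact settings.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit base weights
    bnb_4bit_quant_type="nf4",               # NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,   # bf16 compute
)

model = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.3-2b-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # also enables gradient checkpointing by default

lora_config = LoraConfig(
    r=8,                    # assumed default rank
    lora_alpha=16,          # assumed default alpha
    lora_dropout=0.05,      # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Training then proceeds through a standard `transformers` `Trainer` (or TRL `SFTTrainer`) loop using the `paged_adamw_8bit` optimizer and the batch, step, and sequence-length settings shown in the JSON above.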
Training Infrastructure
- Hardware: NVIDIA GeForce RTX 4060 (8GB VRAM)
- Software Stack:
- PyTorch 2.x
- Hugging Face Transformers 4.44+
- PEFT 0.11+
- bitsandbytes 0.43+
- CUDA 12.1
- Training Time: ~500 optimizer steps, roughly 1-2 hours on the listed GPU (`max_steps` caps the run; with an effective batch size of 4 this covers about 2,000 of the 26k examples rather than a full epoch)
- Checkpointing: LoRA adapters saved every N steps
Post-Training
- Adapter Merging: LoRA adapters merged back into base weights using `scripts/merge_lora.py` (see the sketch after this list)
- GGUF Conversion: Exported to F16 GGUF format via `llama.cpp/convert_hf_to_gguf.py`
- Formats Produced:
  - Hugging Face Transformers (safetensors)
  - GGUF F16 (llama.cpp compatible)
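As a reference for the adapter-merge step, a minimal sketch using `peft`'s `merge_and_unload` is shown below; paths follow the configuration above, but the project's `merge_lora.py` may differ in detail.

```python
# Sketch of merging LoRA adapters into the base model and exporting full weights.
# Paths are taken from the training config and model layout above; merge_lora.py may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.3-2b-instruct",
    torch_dtype=torch.bfloat16,
)
merged = PeftModel.from_pretrained(base, "outputs/granite-math-physics-lora")
merged = merged.merge_and_unload()          # fold the LoRA deltas into the base weights

merged.save_pretrained("models/math-physics/hf", safe_serialization=True)
AutoTokenizer.from_pretrained("ibm-granite/granite-3.3-2b-instruct").save_pretrained(
    "models/math-physics/hf"
)
```

The merged safetensors checkpoint is then passed to `llama.cpp/convert_hf_to_gguf.py` to produce the F16 GGUF file.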
Evaluation
Qualitative Assessment
In qualitative testing, the model shows improved performance over the base model on:
- Multi-step mathematical reasoning
- Physics problem explanation
- Calculator-augmented computation tasks
- Domain-specific terminology and notation
Limitations
- Limited Training Steps: Only 500 training steps; longer training may improve performance
- Domain Specialization: May sacrifice general capabilities for math/physics expertise
- Hallucination Risk: Can generate plausible but incorrect solutions
- Tool Integration: Expects calculator tools in reasoning traces; standalone performance may vary
- Context Window: Fine-tuned on 512-token sequences; full 128k context not extensively tested
Bias, Risks, and Limitations
Known Limitations
- Domain Specificity: Optimized for math/physics; general knowledge may be limited
- Factual Accuracy: No guarantee of correctness; outputs should be verified
- Training Data Bias: Inherits biases from Nemotron and CAMEL-AI datasets
- Base Model Limitations: Retains all limitations of Granite 3.3-2B Instruct
- Small Training Set: 26k examples may not cover all edge cases
Ethical Considerations
- Educational Use: Should supplement, not replace, human instruction
- Verification Required: Always validate mathematical and scientific outputs
- Accessibility: May use technical jargon inaccessible to beginners
- Dataset Provenance: Users should review source dataset licenses and terms
Recommendations
- Use as an educational aid, not a source of truth
- Implement output validation for critical applications
- Combine with symbolic computation tools for verification
- Monitor for hallucinations and incorrect reasoning
- Consider fine-tuning on domain-specific data for production use
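As one concrete pattern for the validation recommendations above, the sketch below checks a model-produced derivative against SymPy before accepting it; the candidate answer string stands in for parsed model output and is an illustrative assumption.

```python
# Sketch: verify a model-produced symbolic answer with SymPy before trusting it.
# The candidate answer string is a stand-in for text parsed from the model's response.
import sympy as sp

x = sp.symbols("x")
problem_expr = sp.sin(x) * sp.exp(x)

candidate_answer = "exp(x)*sin(x) + exp(x)*cos(x)"   # e.g. extracted from the model output
expected = sp.diff(problem_expr, x)

# A symbolic difference that simplifies to zero means the answers agree.
is_correct = sp.simplify(expected - sp.sympify(candidate_answer)) == 0
print("verified" if is_correct else "mismatch - do not trust the model output")
```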
Environmental Impact
- Hardware: NVIDIA RTX 4060 (8GB VRAM)
- Training Duration: ~500 steps (estimated 1-2 hours)
- Energy Consumption: Estimated <1 kWh for training
- Carbon Footprint: Minimal due to efficient LoRA training
Technical Specifications
Model Formats
| Format | Precision | Size | Compatible Frameworks |
|---|---|---|---|
| Hugging Face Transformers | bfloat16 | ~5.0 GB | PyTorch, Transformers, vLLM, TGI |
| GGUF F16 | float16 | ~4.7 GB | llama.cpp, Ollama, LM Studio |
System Requirements
Minimum (CPU Inference):
- RAM: 8 GB
- Storage: 10 GB free space
- CPU: Modern x86-64 with AVX2 support
Recommended (GPU Inference):
- GPU: 6+ GB VRAM (RTX 3060, A4000, or better)
- RAM: 16 GB
- CUDA 12.1+ or ROCm 5.7+
Loading & Inference
Before running inference, pull the artifacts into `models/math-physics/`:

```bash
python scripts/download_artifacts.py --artifact all
```
Transformers (Python):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged checkpoint in bfloat16, the precision used for training and inference.
model = AutoModelForCausalLM.from_pretrained(
    "models/math-physics/hf",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("models/math-physics/hf")
```
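Continuing from the snippet above, a short generation example; the prompt and decoding settings are illustrative, and `apply_chat_template` assumes the tokenizer ships Granite's chat template.

```python
# Illustrative generation; the prompt and decoding settings are examples, not defaults.
messages = [{"role": "user",
             "content": "A 2 kg mass accelerates at 3 m/s^2. What net force acts on it?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```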
llama.cpp (Command Line):
```bash
./llama-cli -m granite-math-physics-f16.gguf -p "Your prompt" -n 256
```
Citation
```bibtex
@software{galena_2b_2024,
  title  = {Galena-2B: Granite 3.3 Math & Physics Model},
  author = {Your Name},
  year   = {2024},
  url    = {https://github.com/yourusername/galena-2B},
  note   = {Fine-tuned from IBM Granite 3.3-2B Instruct on math and physics datasets}
}
```
Acknowledgments
- IBM Research for the Granite 3.3 foundation model
- NVIDIA for the Nemotron-RL-Math dataset
- CAMEL-AI for the physics dialogue dataset
- Hugging Face for training infrastructure and libraries
Contact
For questions, issues, or contributions:
- Repository: GitHub Issues
- Email: your.email@example.com
Changelog
Version 1.0 (2024-11-17)
- Initial release
- Fine-tuned on 26k math/physics examples
- 500 training steps with QLoRA
- Hugging Face and GGUF formats released