
Model Card: Galena-2B (Granite 3.3 Math & Physics)

Model Description

Galena-2B is a specialized 2-billion parameter language model optimized for mathematical reasoning and physics problem-solving. It is derived from IBM's Granite 3.3-2B Instruct base model through parameter-efficient fine-tuning (LoRA) on curated datasets focused on advanced calculations and physics concepts.

  • Developed by: [Your Name/Organization]
  • Base Model: IBM Granite 3.3-2B Instruct
  • Model Type: Causal Language Model (Decoder-only Transformer)
  • Language: English
  • License: Apache 2.0
  • Fine-tuned from: ibm-granite/granite-3.3-2b-instruct

Model Architecture

  • Architecture: GraniteForCausalLM
  • Parameters: 2.0B
  • Layers: 40
  • Hidden Size: 2048
  • Attention Heads: 32 (query) / 8 (key-value, GQA)
  • Intermediate Size: 8192
  • Vocabulary Size: 49,159 tokens
  • Context Window: 131,072 tokens (128k)
  • Precision: bfloat16 (training & inference)
  • Activation Function: SiLU (Swish)

Key Features

  • Grouped Query Attention (GQA) for efficient inference
  • RoPE Embeddings with extended context support (theta=10M)
  • Attention & Logits Scaling for training stability
  • Embedding Multiplier (12.0) and Residual Multiplier (0.22)
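
These scalars can be read straight off the model config. A quick inspection sketch (attribute names follow the GraniteConfig shipped in recent Transformers releases; verify against your installed version):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("ibm-granite/granite-3.3-2b-instruct")
print(config.embedding_multiplier)  # expected 12.0
print(config.residual_multiplier)   # expected 0.22
print(config.logits_scaling)        # divisor applied to LM-head logits
print(config.attention_multiplier)  # attention score scaling
print(config.rope_theta)            # 10,000,000 for extended context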

Intended Use

Primary Use Cases

  • Educational Applications: Teaching and learning advanced mathematics and physics
  • Research Tools: Assisting with physics problem formulation and mathematical reasoning
  • Conversational AI: Domain-specific chatbots for STEM topics
  • Tool-Augmented Reasoning: Integration with calculators and symbolic math engines

Out-of-Scope Use

  • Critical Decision Making: Not suitable for medical, legal, or safety-critical applications
  • General-Purpose Conversational AI: Optimized for math/physics; may underperform on general topics
  • Production Systems: This is a research/educational model without production guarantees
  • Factual Information Retrieval: May hallucinate; always verify outputs

Training Data

The model was fine-tuned on a curated set of 26,000 instruction-response pairs blended from two specialized datasets:

1. NVIDIA Nemotron-RL-Math (Advanced Calculations)

  • Source: nvidia/Nemotron-RL-math-advanced_calculations
  • Content: Complex mathematical problems with step-by-step reasoning traces
  • Features: Tool-augmented reasoning, calculator integration, multi-step problem decomposition
  • Format: Instruction-following with detailed solution traces

2. CAMEL-AI Physics Dataset

  • Source: camel-ai/physics
  • Content: Physics dialogue pairs covering diverse topics and subtopics
  • Features: Conceptual explanations, problem-solving, physics principles
  • Metadata: Topic and subtopic categorization for structured learning

Data Preparation

  • Preprocessing: scripts/prepare_math_physics.py in parent GRANITE repository
  • Format Conversion: Unified into Granite's chat format (<|user|>/<|assistant|> tags)
  • Output: data/math_physics.jsonl (26k examples)
  • Token Length: Max sequence length capped at 512 tokens during training
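
For illustration, a minimal sketch of the format conversion (the authoritative logic is in scripts/prepare_math_physics.py, which is not reproduced here; the tag layout assumes the simplified <|user|>/<|assistant|> scheme described above, and the sample pair is invented):

import json

def to_chat_example(question: str, answer: str) -> dict:
    # Wrap one instruction-response pair in the chat tags used during training.
    return {"text": f"<|user|>\n{question}\n<|assistant|>\n{answer}"}

with open("data/math_physics.jsonl", "a", encoding="utf-8") as f:
    example = to_chat_example(
        "A 2 kg mass accelerates at 3 m/s^2. What is the net force?",
        "F = ma = 2 kg * 3 m/s^2 = 6 N.",
    )
    f.write(json.dumps(example) + "\n")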

Training Procedure

Training Hyperparameters

  • Method: QLoRA (Quantized Low-Rank Adaptation)
  • Base Model Precision: 4-bit quantization (NF4)
  • LoRA Rank: Default (typically 8-16)
  • LoRA Alpha: Default
  • Target Modules: Query, Key, Value, Output projections
  • Gradient Checkpointing: Enabled
  • Mixed Precision: bfloat16
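
A minimal sketch of the QLoRA setup these hyperparameters describe (the rank, alpha, and module names below are assumptions, since the card reports library defaults):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit NF4 with bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.3-2b-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # enables gradient checkpointing

# Attach LoRA adapters to the attention projections.
lora_config = LoraConfig(
    r=16,           # assumed; the card reports "default (typically 8-16)"
    lora_alpha=32,  # assumed default
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)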

Training Configuration

{
    "base_model": "ibm-granite/granite-3.3-2b-instruct",
    "dataset_path": "data/math_physics.jsonl",
    "output_dir": "outputs/granite-math-physics-lora",
    "use_4bit": true,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "effective_batch_size": 4,
    "num_train_epochs": 1,
    "max_steps": 500,
    "max_seq_length": 512,
    "learning_rate": "2e-4 (default)",
    "batching_strategy": "padding",
    "optimizer": "paged_adamw_8bit",
    "bf16": true
}
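
These values map onto Hugging Face TrainingArguments roughly as follows (a sketch; the training script in the parent GRANITE repository is authoritative):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/granite-math-physics-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # effective batch size 4
    num_train_epochs=1,
    max_steps=500,                  # overrides num_train_epochs when set
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
)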

Training Infrastructure

  • Hardware: NVIDIA GeForce RTX 4060 (8GB VRAM)
  • Software Stack:
    • PyTorch 2.x
    • Hugging Face Transformers 4.44+
    • PEFT 0.11+
    • bitsandbytes 0.43+
    • CUDA 12.1
  • Training Time: ~500 optimizer steps (roughly 1-2 hours on this hardware); at an effective batch size of 4 this covers about 2,000 of the 26k examples, so max_steps ends the run well short of a full epoch
  • Checkpointing: LoRA adapters saved every N steps

Post-Training

  1. Adapter Merging: LoRA adapters merged back into base weights using scripts/merge_lora.py
  2. GGUF Conversion: Exported to F16 GGUF format via llama.cpp/convert_hf_to_gguf.py
  3. Formats Produced:
    • Hugging Face Transformers (safetensors)
    • GGUF F16 (llama.cpp compatible)
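
A minimal sketch of the merge step (scripts/merge_lora.py in the parent repository is authoritative; paths here are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Reload the base model in full bfloat16 (not 4-bit) so weights can be merged.
base = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.3-2b-instruct", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(base, "outputs/granite-math-physics-lora")
merged = merged.merge_and_unload()  # folds LoRA deltas into the base weights

merged.save_pretrained("models/math-physics/hf", safe_serialization=True)
AutoTokenizer.from_pretrained(
    "ibm-granite/granite-3.3-2b-instruct"
).save_pretrained("models/math-physics/hf")

The merged checkpoint can then be converted with llama.cpp's converter, e.g. python llama.cpp/convert_hf_to_gguf.py models/math-physics/hf --outtype f16.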

Evaluation

Qualitative Assessment

Relative to the base Granite 3.3-2B Instruct model, qualitative checks indicate improved performance on:

  • Multi-step mathematical reasoning
  • Physics problem explanation
  • Calculator-augmented computation tasks
  • Domain-specific terminology and notation

Limitations

  • Limited Training Steps: Only 500 training steps; longer training may improve performance
  • Domain Specialization: May sacrifice general capabilities for math/physics expertise
  • Hallucination Risk: Can generate plausible but incorrect solutions
  • Tool Integration: Expects calculator tools in reasoning traces; standalone performance may vary
  • Context Window: Fine-tuned on 512-token sequences; full 128k context not extensively tested

Bias, Risks, and Limitations

Known Limitations

  1. Domain Specificity: Optimized for math/physics; general knowledge may be limited
  2. Factual Accuracy: No guarantee of correctness; outputs should be verified
  3. Training Data Bias: Inherits biases from Nemotron and CAMEL-AI datasets
  4. Base Model Limitations: Retains all limitations of Granite 3.3-2B Instruct
  5. Small Training Set: 26k examples may not cover all edge cases

Ethical Considerations

  • Educational Use: Should supplement, not replace, human instruction
  • Verification Required: Always validate mathematical and scientific outputs
  • Accessibility: May use technical jargon inaccessible to beginners
  • Dataset Provenance: Users should review source dataset licenses and terms

Recommendations

  • Use as an educational aid, not a source of truth
  • Implement output validation for critical applications
  • Combine with symbolic computation tools for verification (see the sketch after this list)
  • Monitor for hallucinations and incorrect reasoning
  • Consider fine-tuning on domain-specific data for production use
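
As one concrete verification pattern, a model-produced antiderivative can be checked symbolically (a sketch using SymPy, which is a separate dependency):

import sympy as sp

x = sp.symbols("x")
claimed = sp.sin(x) ** 2 / 2  # model's claimed antiderivative of sin(x)*cos(x)

# Differentiating the claim and subtracting the integrand should give zero.
assert sp.simplify(sp.diff(claimed, x) - sp.sin(x) * sp.cos(x)) == 0
print("Antiderivative verified.")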

Environmental Impact

  • Hardware: NVIDIA RTX 4060 (8GB VRAM)
  • Training Duration: ~500 steps (estimated 1-2 hours)
  • Energy Consumption: Estimated <1 kWh for training
  • Carbon Footprint: Minimal due to efficient LoRA training

Technical Specifications

Model Formats

| Format | Precision | Size | Compatible Frameworks |
|--------|-----------|------|-----------------------|
| Hugging Face Transformers | bfloat16 | ~5.0 GB | PyTorch, Transformers, vLLM, TGI |
| GGUF F16 | float16 | ~4.7 GB | llama.cpp, Ollama, LM Studio |

System Requirements

Minimum (CPU Inference):

  • RAM: 8 GB
  • Storage: 10 GB free space
  • CPU: Modern x86-64 with AVX2 support

Recommended (GPU Inference):

  • GPU: 6+ GB VRAM (RTX 3060, A4000, or better)
  • RAM: 16 GB
  • CUDA 12.1+ or ROCm 5.7+

Loading & Inference

Before running inference, pull the artifacts into models/math-physics/:

python scripts/download_artifacts.py --artifact all

Transformers (Python):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged fine-tuned checkpoint (bfloat16 safetensors).
# device_map="auto" places layers on available GPUs, falling back to CPU.
model = AutoModelForCausalLM.from_pretrained(
    "models/math-physics/hf",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("models/math-physics/hf")
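
A short generation example using the tokenizer's chat template (a sketch; the prompt and sampling settings are illustrative):

messages = [{"role": "user",
             "content": "A ball is dropped from 20 m. How long until it hits the ground? Take g = 9.8 m/s^2."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))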

llama.cpp (Command Line):

./llama-cli -m granite-math-physics-f16.gguf -p "Your prompt" -n 256
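
The same file also works with llama.cpp's OpenAI-compatible server (a sketch; flags can vary between llama.cpp versions):

./llama-server -m granite-math-physics-f16.gguf -c 4096 --port 8080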

Citation

@software{galena_2b_2024,
  title = {Galena-2B: Granite 3.3 Math & Physics Model},
  author = {Your Name},
  year = {2024},
  url = {https://github.com/yourusername/galena-2B},
  note = {Fine-tuned from IBM Granite 3.3-2B Instruct on math and physics datasets}
}

Acknowledgments

  • IBM Research for the Granite 3.3 foundation model
  • NVIDIA for the Nemotron-RL-Math dataset
  • CAMEL-AI for the physics dialogue dataset
  • Hugging Face for training infrastructure and libraries

Contact

For questions, issues, or contributions, please open an issue or pull request on the project repository (see the Citation section for the URL).

Changelog

Version 1.0 (2024-11-17)

  • Initial release
  • Fine-tuned on 26k math/physics examples
  • 500 training steps with QLoRA
  • Hugging Face and GGUF formats released