
Model Card: Galena-2B (Granite 3.3 Math & Physics)

Model Description

Galena-2B is a specialized 2-billion parameter language model optimized for mathematical reasoning and physics problem-solving. It is derived from IBM's Granite 3.3-2B Instruct base model through parameter-efficient fine-tuning (LoRA) on curated datasets focused on advanced calculations and physics concepts.

  • Developed by: [Your Name/Organization]
  • Base Model: IBM Granite 3.3-2B Instruct
  • Model Type: Causal Language Model (Decoder-only Transformer)
  • Language: English
  • License: Apache 2.0
  • Fine-tuned from: ibm-granite/granite-3.3-2b-instruct

Model Architecture

  • Architecture: GraniteForCausalLM
  • Parameters: 2.0B
  • Layers: 40
  • Hidden Size: 2048
  • Attention Heads: 32 (query) / 8 (key-value, GQA)
  • Intermediate Size: 8192
  • Vocabulary Size: 49,159 tokens
  • Context Window: 131,072 tokens (128k)
  • Precision: bfloat16 (training & inference)
  • Activation Function: SiLU (Swish)

Key Features

  • Grouped Query Attention (GQA) for efficient inference
  • RoPE Embeddings with extended context support (theta=10M)
  • Attention & Logits Scaling for training stability
  • Embedding Multiplier (12.0) and Residual Multiplier (0.22)
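
These scalars can be read straight off the model config. A quick inspection sketch (attribute names follow the GraniteConfig shipped in recent Transformers releases; verify against your installed version):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("ibm-granite/granite-3.3-2b-instruct")
print(config.embedding_multiplier)  # expected 12.0
print(config.residual_multiplier)   # expected 0.22
print(config.logits_scaling)        # divisor applied to LM-head logits
print(config.attention_multiplier)  # attention score scaling
print(config.rope_theta)            # 10,000,000 for extended context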

Intended Use

Primary Use Cases

  • Educational Applications: Teaching and learning advanced mathematics and physics
  • Research Tools: Assisting with physics problem formulation and mathematical reasoning
  • Conversational AI: Domain-specific chatbots for STEM topics
  • Tool-Augmented Reasoning: Integration with calculators and symbolic math engines

Out-of-Scope Use

  • Critical Decision Making: Not suitable for medical, legal, or safety-critical applications
  • General-Purpose Conversational AI: Optimized for math/physics; may underperform on general topics
  • Production Systems: This is a research/educational model without production guarantees
  • Factual Information Retrieval: May hallucinate; always verify outputs

Training Data

The model was fine-tuned on a curated set of 26,000 instruction-response pairs blended from two specialized datasets:

1. NVIDIA Nemotron-RL-Math (Advanced Calculations)

  • Source: nvidia/Nemotron-RL-math-advanced_calculations
  • Content: Complex mathematical problems with step-by-step reasoning traces
  • Features: Tool-augmented reasoning, calculator integration, multi-step problem decomposition
  • Format: Instruction-following with detailed solution traces

2. CAMEL-AI Physics Dataset

  • Source: camel-ai/physics
  • Content: Physics dialogue pairs covering diverse topics and subtopics
  • Features: Conceptual explanations, problem-solving, physics principles
  • Metadata: Topic and subtopic categorization for structured learning

Data Preparation

  • Preprocessing: scripts/prepare_math_physics.py in parent GRANITE repository
  • Format Conversion: Unified into Granite's chat format (<|user|>/<|assistant|> tags)
  • Output: data/math_physics.jsonl (26k examples)
  • Token Length: Max sequence length capped at 512 tokens during training
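
For illustration, a minimal sketch of the format conversion (the authoritative logic is in scripts/prepare_math_physics.py, which is not reproduced here; the tag layout assumes the simplified <|user|>/<|assistant|> scheme described above, and the sample pair is invented):

import json

def to_chat_example(question: str, answer: str) -> dict:
    # Wrap one instruction-response pair in the chat tags used during training.
    return {"text": f"<|user|>\n{question}\n<|assistant|>\n{answer}"}

with open("data/math_physics.jsonl", "a", encoding="utf-8") as f:
    example = to_chat_example(
        "A 2 kg mass accelerates at 3 m/s^2. What is the net force?",
        "F = ma = 2 kg * 3 m/s^2 = 6 N.",
    )
    f.write(json.dumps(example) + "\n")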

Training Procedure

Training Hyperparameters

  • Method: QLoRA (Quantized Low-Rank Adaptation)
  • Base Model Precision: 4-bit quantization (NF4)
  • LoRA Rank: Default (typically 8-16)
  • LoRA Alpha: Default
  • Target Modules: Query, Key, Value, Output projections
  • Gradient Checkpointing: Enabled
  • Mixed Precision: bfloat16
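
A minimal sketch of the QLoRA setup these hyperparameters describe (the rank, alpha, and module names below are assumptions, since the card reports library defaults):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit NF4 with bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.3-2b-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # enables gradient checkpointing

# Attach LoRA adapters to the attention projections.
lora_config = LoraConfig(
    r=16,           # assumed; the card reports "default (typically 8-16)"
    lora_alpha=32,  # assumed default
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)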

Training Configuration

{
    "base_model": "ibm-granite/granite-3.3-2b-instruct",
    "dataset_path": "data/math_physics.jsonl",
    "output_dir": "outputs/granite-math-physics-lora",
    "use_4bit": true,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "effective_batch_size": 4,
    "num_train_epochs": 1,
    "max_steps": 500,
    "max_seq_length": 512,
    "learning_rate": "2e-4 (default)",
    "batching_strategy": "padding",
    "optimizer": "paged_adamw_8bit",
    "bf16": true
}
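
These values map onto Hugging Face TrainingArguments roughly as follows (a sketch; the training script in the parent GRANITE repository is authoritative):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/granite-math-physics-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # effective batch size 4
    num_train_epochs=1,
    max_steps=500,                  # overrides num_train_epochs when set
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
)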

Training Infrastructure

  • Hardware: NVIDIA GeForce RTX 4060 (8GB VRAM)
  • Software Stack:
    • PyTorch 2.x
    • Hugging Face Transformers 4.44+
    • PEFT 0.11+
    • bitsandbytes 0.43+
    • CUDA 12.1
  • Training Time: ~500 optimizer steps (roughly 1-2 hours on this hardware); at an effective batch size of 4 this covers about 2,000 of the 26k examples, so max_steps ends the run well short of a full epoch
  • Checkpointing: LoRA adapters saved every N steps

Post-Training

  1. Adapter Merging: LoRA adapters merged back into base weights using scripts/merge_lora.py
  2. GGUF Conversion: Exported to F16 GGUF format via llama.cpp/convert_hf_to_gguf.py
  3. Formats Produced:
    • Hugging Face Transformers (safetensors)
    • GGUF F16 (llama.cpp compatible)
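
A minimal sketch of the merge step (scripts/merge_lora.py in the parent repository is authoritative; paths here are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Reload the base model in full bfloat16 (not 4-bit) so weights can be merged.
base = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.3-2b-instruct", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(base, "outputs/granite-math-physics-lora")
merged = merged.merge_and_unload()  # folds LoRA deltas into the base weights

merged.save_pretrained("models/math-physics/hf", safe_serialization=True)
AutoTokenizer.from_pretrained(
    "ibm-granite/granite-3.3-2b-instruct"
).save_pretrained("models/math-physics/hf")

The merged checkpoint can then be converted with llama.cpp's converter, e.g. python llama.cpp/convert_hf_to_gguf.py models/math-physics/hf --outtype f16.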

Evaluation

Qualitative Assessment

Relative to the base Granite 3.3-2B Instruct model, qualitative checks indicate improved performance on:

  • Multi-step mathematical reasoning
  • Physics problem explanation
  • Calculator-augmented computation tasks
  • Domain-specific terminology and notation

Limitations

  • Limited Training Steps: Only 500 training steps; longer training may improve performance
  • Domain Specialization: May sacrifice general capabilities for math/physics expertise
  • Hallucination Risk: Can generate plausible but incorrect solutions
  • Tool Integration: Expects calculator tools in reasoning traces; standalone performance may vary
  • Context Window: Fine-tuned on 512-token sequences; full 128k context not extensively tested

Bias, Risks, and Limitations

Known Limitations

  1. Domain Specificity: Optimized for math/physics; general knowledge may be limited
  2. Factual Accuracy: No guarantee of correctness; outputs should be verified
  3. Training Data Bias: Inherits biases from Nemotron and CAMEL-AI datasets
  4. Base Model Limitations: Retains all limitations of Granite 3.3-2B Instruct
  5. Small Training Set: 26k examples may not cover all edge cases

Ethical Considerations

  • Educational Use: Should supplement, not replace, human instruction
  • Verification Required: Always validate mathematical and scientific outputs
  • Accessibility: May use technical jargon inaccessible to beginners
  • Dataset Provenance: Users should review source dataset licenses and terms

Recommendations

  • Use as an educational aid, not a source of truth
  • Implement output validation for critical applications
  • Combine with symbolic computation tools for verification (see the sketch after this list)
  • Monitor for hallucinations and incorrect reasoning
  • Consider fine-tuning on domain-specific data for production use
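
As one concrete verification pattern, a model-produced antiderivative can be checked symbolically (a sketch using SymPy, which is a separate dependency):

import sympy as sp

x = sp.symbols("x")
claimed = sp.sin(x) ** 2 / 2  # model's claimed antiderivative of sin(x)*cos(x)

# Differentiating the claim and subtracting the integrand should give zero.
assert sp.simplify(sp.diff(claimed, x) - sp.sin(x) * sp.cos(x)) == 0
print("Antiderivative verified.")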

Environmental Impact

  • Hardware: NVIDIA RTX 4060 (8GB VRAM)
  • Training Duration: ~500 steps (estimated 1-2 hours)
  • Energy Consumption: Estimated <1 kWh for training
  • Carbon Footprint: Minimal due to efficient LoRA training

Technical Specifications

Model Formats

| Format | Precision | Size | Compatible Frameworks |
|--------|-----------|------|-----------------------|
| Hugging Face Transformers | bfloat16 | ~5.0 GB | PyTorch, Transformers, vLLM, TGI |
| GGUF F16 | float16 | ~4.7 GB | llama.cpp, Ollama, LM Studio |

System Requirements

Minimum (CPU Inference):

  • RAM: 8 GB
  • Storage: 10 GB free space
  • CPU: Modern x86-64 with AVX2 support

Recommended (GPU Inference):

  • GPU: 6+ GB VRAM (RTX 3060, A4000, or better)
  • RAM: 16 GB
  • CUDA 12.1+ or ROCm 5.7+

Loading & Inference

Before running inference, pull the artifacts into models/math-physics/:

python scripts/download_artifacts.py --artifact all

Transformers (Python):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged fine-tuned checkpoint (bfloat16 safetensors).
# device_map="auto" places layers on available GPUs, falling back to CPU.
model = AutoModelForCausalLM.from_pretrained(
    "models/math-physics/hf",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("models/math-physics/hf")
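
A short generation example using the tokenizer's chat template (a sketch; the prompt and sampling settings are illustrative):

messages = [{"role": "user",
             "content": "A ball is dropped from 20 m. How long until it hits the ground? Take g = 9.8 m/s^2."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))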

llama.cpp (Command Line):

./llama-cli -m granite-math-physics-f16.gguf -p "Your prompt" -n 256
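
The same file also works with llama.cpp's OpenAI-compatible server (a sketch; flags can vary between llama.cpp versions):

./llama-server -m granite-math-physics-f16.gguf -c 4096 --port 8080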

Citation

@software{galena_2b_2024,
  title = {Galena-2B: Granite 3.3 Math & Physics Model},
  author = {Your Name},
  year = {2024},
  url = {https://github.com/yourusername/galena-2B},
  note = {Fine-tuned from IBM Granite 3.3-2B Instruct on math and physics datasets}
}

Acknowledgments

  • IBM Research for the Granite 3.3 foundation model
  • NVIDIA for the Nemotron-RL-Math dataset
  • CAMEL-AI for the physics dialogue dataset
  • Hugging Face for training infrastructure and libraries

Contact

For questions, issues, or contributions, please open an issue or pull request on the project repository (see the Citation section for the URL).

Changelog

Version 1.0 (2024-11-17)

  • Initial release
  • Fine-tuned on 26k math/physics examples
  • 500 training steps with QLoRA
  • Hugging Face and GGUF formats released