# Pacific-Prime 1.5B

A 1.5B-parameter language model with Mu-Guided Attention and a Token-Routed MLP (mixture of experts).
## Overview

Pacific-Prime is a 1.5B-parameter causal language model featuring several architectural innovations:

- Mu-Guided Attention: dynamic attention biasing via a learned mu parameter
- Token-Routed MLP: a 4-expert mixture with mu-influenced routing
- INL Dynamics: velocity tracking for temporal coherence
- Grouped Query Attention (GQA): efficient KV caching with 8 KV heads
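The mu-influenced routing idea can be pictured with a small sketch. This is not the Pacific-Prime implementation (that lives in the `complexity_deep` package); the module names, shapes, and the soft mixture below are illustrative assumptions only.

```python
# Conceptual sketch only -- NOT the actual Pacific-Prime code. A 4-expert
# token-routed MLP in which a learned scalar signal "mu" biases the router
# logits, so mu nudges which expert each token is sent to.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenRoutedMLPSketch(nn.Module):
    def __init__(self, hidden_size=2048, intermediate_size=5632, num_experts=4):
        super().__init__()
        self.router = nn.Linear(hidden_size, num_experts)
        self.mu_bias = nn.Linear(1, num_experts)  # assumed form of the mu influence
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x, mu):
        # x: (batch, seq, hidden); mu: (batch, seq, 1) guidance signal
        logits = self.router(x) + self.mu_bias(mu)   # mu-influenced routing
        weights = F.softmax(logits, dim=-1)          # soft mixture for readability
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (b, s, h, E)
        return (expert_out * weights.unsqueeze(-2)).sum(dim=-1)
```

A production MoE would typically use hard top-1/top-2 expert selection for efficiency; the dense mixture here just keeps the sketch short.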
## Model Variants
| Variant | Description | Status |
|---|---|---|
| Base | Pre-trained on FineWeb-Edu (1M steps) | Available |
| SFT | Instruction-tuned on Alpaca (52K) | Available |
## SFT Model (Instruction-Tuned)

Status: Experimental. Fine-tuning is in progress; this variant is not yet production-ready.

This is an instruction-tuned version of the base model, fine-tuned on the Alpaca dataset.
### Fine-tuning Details
| Attribute | Value |
|---|---|
| Base Model | Pacific-Prime 1.5B v0.13.0 |
| Parameters | ~1.52B |
| Fine-tuning | SFT (Supervised Fine-Tuning) |
| Dataset | Alpaca (52K examples) |
| Epochs | 3 |
| Format | Alpaca (Instruction/Response) |
| Precision | F32 |
| Hardware | RTX 5090 32GB |
### Prompt Format (Alpaca)

```
### Instruction:
Your question or task here.

### Response:
```
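A small helper for building prompts in this format can be convenient. The function below is hypothetical (it is not shipped with the model); the optional `### Input:` section follows the standard Alpaca template used by the SFT data.

```python
# Hypothetical convenience helper (not part of the released package) for
# building Alpaca-style prompts. The optional "### Input:" section is the
# standard Alpaca convention for tasks that come with extra context.
def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    if input_text:
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return f"### Instruction:\n{instruction}\n\n### Response:\n"
```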
### Usage

```python
from complexity_deep import DeepForCausalLM, DeepConfig
from tokenizers import Tokenizer
import torch

# Load model and tokenizer
model = DeepForCausalLM.from_pretrained("Pacific-Prime/pacific-prime")
tokenizer = Tokenizer.from_file("tokenizer.json")

# Format prompt (Alpaca style)
prompt = """### Instruction:
What is the capital of France?

### Response:
"""

input_ids = torch.tensor([tokenizer.encode(prompt).ids])
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8)
print(tokenizer.decode(output[0].tolist()))
```
### CLI

```bash
mu-generate -m ./model -p "### Instruction:
Explain machine learning in simple terms.

### Response:"
```
### SFT Limitations

- Experimental: early fine-tuning attempt
- Small dataset: only 52K examples (500K+ is recommended for models at the 1.5B scale)
- English-focused: Alpaca is primarily English
- Format-dependent: works best with the Alpaca prompt format
### Next Steps
- Continue fine-tuning with SlimOrca (~500K examples)
- Add French instruction data
- Evaluate on benchmarks
## Architecture

### Model Configuration
| Parameter | Value |
|---|---|
| Hidden Size | 2048 |
| Intermediate Size | 5632 |
| Layers | 24 |
| Attention Heads | 16 |
| KV Heads (GQA) | 8 |
| Max Position | 2048 |
| Vocab Size | 32,000 |
| Parameters | ~1.5B |
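For reference, here is how the values in the table might be passed to `DeepConfig`. The class is exported by `complexity_deep`, but the exact field names below are assumptions modeled on common HF-style configs; `config.json` in the repository is authoritative.

```python
# Illustrative only: instantiating a config with the table's values.
# Field names are assumptions; consult config.json for the real keys.
from complexity_deep import DeepConfig

config = DeepConfig(
    hidden_size=2048,
    intermediate_size=5632,
    num_hidden_layers=24,
    num_attention_heads=16,
    num_key_value_heads=8,       # GQA: two query heads share each KV head
    max_position_embeddings=2048,
    vocab_size=32000,
)
```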
### Innovations (v0.13.0)

- KQV order: industry-standard projection ordering, as in Qwen, Llama, and GPT
- Mu-Guided KQV (INL 2025): mu biases the K, Q, and V projections
- Mu-Guided Expert Routing: mu influences MLP expert selection
- Mu Residual Highway: accumulated context across layers
- Token-Routed MLP with mu override
- INL Dynamics with velocity tracking
- Grouped Query Attention (GQA) (see the attention sketch after this list)
- RoPE positional embeddings
- QK Normalization
- Flash Attention via SDPA
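As a rough sketch of the attention-side items (GQA, QK normalization, SDPA), the module below shows how 16 query heads can share 8 KV heads with normalized queries/keys and PyTorch's SDPA kernel. It is not the Pacific-Prime implementation and omits RoPE, mu-guidance, and KV caching.

```python
# Conceptual sketch of GQA + QK normalization + SDPA, not the real model code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQASelfAttentionSketch(nn.Module):
    def __init__(self, hidden=2048, n_heads=16, n_kv_heads=8):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = hidden // n_heads
        self.q_proj = nn.Linear(hidden, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(hidden, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(hidden, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, hidden, bias=False)
        self.q_norm = nn.RMSNorm(self.head_dim)  # QK normalization (PyTorch >= 2.4)
        self.k_norm = nn.RMSNorm(self.head_dim)

    def forward(self, x):
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = self.q_norm(q), self.k_norm(k)
        # GQA: each KV head serves n_heads // n_kv_heads query heads
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # SDPA / Flash path
        return self.o_proj(out.transpose(1, 2).reshape(b, s, -1))
```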
### INL Dynamics Parameters

```json
{
  "dynamics_alpha": 0.9,
  "dynamics_beta": 0.1,
  "dynamics_gate": 0.5,
  "dynamics_dt": 0.1,
  "dynamics_controller_hidden": 64
}
```
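The role these values could play is easiest to see as a velocity-tracking update. The snippet below is a speculative reading aid, not the published INL Dynamics module: it treats `alpha`/`beta` as a leaky integrator over hidden-state deltas and `dt`/`gate` as a gated Euler step.

```python
# Speculative illustration of velocity tracking -- not the actual INL module.
import torch

def inl_step(h, h_prev, velocity, alpha=0.9, beta=0.1, gate=0.5, dt=0.1):
    # Track how quickly the hidden state is changing.
    velocity = alpha * velocity + beta * (h - h_prev)
    # Gated Euler-style correction nudges h along its recent trajectory.
    return h + gate * dt * velocity, velocity
```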
## Training

### Pre-training
- Dataset: FineWeb-Edu (educational web content)
- Steps: 1,000,000
- Batch Size: variable, via gradient accumulation (see the sketch after this list)
- Optimizer: AdamW
- Hardware: H100 80GB
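"Variable batch size" refers to the standard gradient-accumulation pattern sketched below. The learning rate, schedule, and accumulation factor used for Pacific-Prime are not published, so the numbers are placeholders, and `model`/`dataloader` stand in for the actual training setup.

```python
# Generic gradient-accumulation loop (illustrative hyperparameters only).
# Assumes an HF-style model whose forward pass returns an object with .loss;
# `model` and `dataloader` come from your own training setup.
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
accum_steps = 8  # effective batch = micro-batch size * accum_steps

optimizer.zero_grad()
for step, batch in enumerate(dataloader):
    loss = model(**batch).loss / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```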
## Generation Example

## Benchmarks
## Files

| File | Description | Size |
|---|---|---|
| `model.safetensors` | Model weights (F32) | ~6GB |
| `config.json` | Architecture configuration | 1KB |
| `tokenizer.json` | BPE tokenizer (32K vocab) | 2MB |
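If you just want to inspect these files without loading the model class, the `safetensors` and `tokenizers` libraries can read them directly (assuming the repository has been downloaded locally, e.g. with `huggingface_hub.snapshot_download`):

```python
# Inspect the raw files listed above without instantiating the model.
import json
from safetensors.torch import load_file
from tokenizers import Tokenizer

state_dict = load_file("model.safetensors")        # tensor name -> torch.Tensor
with open("config.json") as f:
    config = json.load(f)                          # architecture hyperparameters
tokenizer = Tokenizer.from_file("tokenizer.json")  # 32K-vocab BPE tokenizer

n_params = sum(t.numel() for t in state_dict.values())
print(f"{len(state_dict)} tensors, {n_params / 1e9:.2f}B parameters")
```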
## Installation

```bash
pip install complexity-deep
```

Or from source:

```bash
git clone https://github.com/Complexity-ML/complexity-deep
cd complexity-deep
pip install -e .
```
## Inference

### With mu-inference (Recommended)

```bash
pip install mu-inference

# Generate
mu-generate -m Pacific-Prime/pacific-prime -p "### Instruction:\nExplain gravity.\n\n### Response:"

# Serve API
mu-serve -m Pacific-Prime/pacific-prime --port 8000
```
### With complexity-deep

```python
from complexity_deep import DeepForCausalLM
import torch

model = DeepForCausalLM.from_pretrained("Pacific-Prime/pacific-prime")
model.eval()

# Your generation code here (see the Usage section above for a full example)
```
## Links
## License

CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0). Permitted uses:

- Academic research
- Personal projects
- Non-commercial use

Commercial use requires permission.
## Acknowledgments
- FineWeb-Edu dataset by HuggingFace
- Alpaca dataset by Stanford
- INL Dynamics research (2025)
## Citation

```bibtex
@misc{pacific-prime-2025,
  title={Pacific-Prime: A 1.5B Parameter Language Model with Mu-Guided Attention},
  author={Boris},
  year={2025},
  url={https://huggingface.co/Pacific-Prime/pacific-prime}
}
```