Pacific-Prime 1.5B

A 1.5B parameter language model with Mu-Guided Attention and Token-Routed MLP (Mixture of Experts)

Overview

Pacific-Prime is a 1.5B-parameter causal language model built around several architectural innovations:

  • Mu-Guided Attention: Dynamic attention biasing via learned mu parameter
  • Token-Routed MLP: 4-expert mixture with mu-influenced routing
  • INL Dynamics: Velocity tracking for temporal coherence
  • Grouped Query Attention (GQA): Efficient KV caching with 8 KV heads (a generic sketch follows this list)
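
The GQA bullet above can be illustrated with a short, generic sketch. Nothing here comes from the model's source; it only shows the standard way 16 query heads share 8 KV heads (the head counts from the configuration table below), with arbitrary batch and sequence sizes and placeholder tensor names.

import torch
import torch.nn.functional as F

# Generic GQA illustration (not Pacific-Prime's actual code).
batch, seq_len, head_dim = 1, 16, 128        # head_dim = 2048 hidden / 16 heads
num_q_heads, num_kv_heads = 16, 8            # values from the config table
group_size = num_q_heads // num_kv_heads     # 2 query heads share each KV head

q = torch.randn(batch, num_q_heads, seq_len, head_dim)
k = torch.randn(batch, num_kv_heads, seq_len, head_dim)   # only 8 KV heads are cached
v = torch.randn(batch, num_kv_heads, seq_len, head_dim)

# Repeat each KV head so shapes line up with the 16 query heads.
k = k.repeat_interleave(group_size, dim=1)
v = v.repeat_interleave(group_size, dim=1)

# Causal attention via PyTorch SDPA (the card lists Flash Attention / SDPA).
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 16, 16, 128])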

Model Variants

Variant | Description                           | Status
Base    | Pre-trained on FineWeb-Edu (1M steps) | Available
SFT     | Instruction-tuned on Alpaca (52K)     | Available

SFT Model (Instruction-Tuned)

Status: Experimental - Fine-tuning in progress, not yet production-ready.

Instruction-tuned variant of the base model, fine-tuned on the Alpaca dataset.

Fine-tuning Details

Attribute   | Value
Base Model  | Pacific-Prime 1.5B v0.13.0
Parameters  | ~1.52B
Fine-tuning | SFT (Supervised Fine-Tuning)
Dataset     | Alpaca (52K examples)
Epochs      | 3
Format      | Alpaca (Instruction/Response)
Precision   | F32
Hardware    | RTX 5090 (32 GB)

Prompt Format (Alpaca)

### Instruction:
Your question or task here.

### Response:

Usage

from complexity_deep import DeepForCausalLM, DeepConfig
from tokenizers import Tokenizer
import torch

# Load model
model = DeepForCausalLM.from_pretrained("Pacific-Prime/pacific-prime")
tokenizer = Tokenizer.from_file("tokenizer.json")

# Format prompt (Alpaca style)
prompt = """### Instruction:
What is the capital of France?

### Response:
"""

input_ids = torch.tensor([tokenizer.encode(prompt).ids])
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8)
print(tokenizer.decode(output[0].tolist()))

CLI

mu-generate -m ./model -p "### Instruction:
Explain machine learning in simple terms.

### Response:"

SFT Limitations

  • Experimental: Early fine-tuning attempt
  • Small dataset: Only 52K examples (recommend 500K+ for 1.5B models)
  • English-focused: Alpaca is primarily English
  • Format-dependent: Works best with Alpaca prompt format

Next Steps

  • Continue fine-tuning with SlimOrca (~500K examples)
  • Add French instruction data
  • Evaluate on benchmarks

Architecture

Model Configuration

Parameter         | Value
Hidden Size       | 2048
Intermediate Size | 5632
Layers            | 24
Attention Heads   | 16
KV Heads (GQA)    | 8
Max Position      | 2048
Vocab Size        | 32,000
Parameters        | ~1.5B

Innovations (v0.13.0)

  1. KQV Order - industry-standard projection ordering, as in Qwen, Llama, and GPT
  2. Mu-Guided KQV (INL 2025) - mu biases K, Q, AND V projections
  3. Mu-Guided Expert Routing - mu influences MLP expert selection (items 2 and 3 are sketched after this list)
  4. Mu Residual Highway - accumulated context across layers
  5. Token-Routed MLP with mu override
  6. INL Dynamics with velocity tracking
  7. Grouped Query Attention (GQA)
  8. RoPE positional embeddings
  9. QK Normalization
  10. Flash Attention (SDPA)
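
The card does not spell out how mu is computed or applied, so the following is only a hypothetical sketch of how a per-token mu signal could bias the K/Q/V projections (item 2) and the 4-expert router (items 3 and 5). Every name and design choice here (MuGuidedBlockSketch, mu_head, mu_to_q/k/v, mu_to_router, the tanh and softmax steps) is an assumption for illustration, not the complexity_deep implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MuGuidedBlockSketch(nn.Module):
    """Hypothetical illustration only; not the actual Pacific-Prime code."""

    def __init__(self, hidden=2048, intermediate=5632, num_experts=4):
        super().__init__()
        self.q_proj = nn.Linear(hidden, hidden, bias=False)
        self.k_proj = nn.Linear(hidden, hidden, bias=False)
        self.v_proj = nn.Linear(hidden, hidden, bias=False)
        # mu modeled as a learned per-token scalar (an assumption).
        self.mu_head = nn.Linear(hidden, 1)
        # Learned directions along which mu shifts each projection.
        self.mu_to_q = nn.Parameter(torch.zeros(hidden))
        self.mu_to_k = nn.Parameter(torch.zeros(hidden))
        self.mu_to_v = nn.Parameter(torch.zeros(hidden))
        # Token-routed MLP: one router over 4 experts, nudged by mu.
        self.router = nn.Linear(hidden, num_experts)
        self.mu_to_router = nn.Parameter(torch.zeros(num_experts))
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, intermediate), nn.SiLU(),
                          nn.Linear(intermediate, hidden))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        mu = torch.tanh(self.mu_head(x))                  # (batch, seq, 1)
        # mu biases the Q, K and V projections.
        q = self.q_proj(x) + mu * self.mu_to_q
        k = self.k_proj(x) + mu * self.mu_to_k
        v = self.v_proj(x) + mu * self.mu_to_v
        # (attention over q, k, v with RoPE, GQA and QK norm is omitted here)
        # mu nudges the router logits for expert selection.
        logits = self.router(x) + mu * self.mu_to_router  # (batch, seq, 4)
        weights = F.softmax(logits, dim=-1)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)
        return (expert_out * weights.unsqueeze(-2)).sum(dim=-1), (q, k, v)

# Tiny smoke test with reduced sizes.
y, _ = MuGuidedBlockSketch(hidden=64, intermediate=128)(torch.randn(1, 8, 64))
print(y.shape)  # torch.Size([1, 8, 64])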

INL Dynamics Parameters

{
  "dynamics_alpha": 0.9,
  "dynamics_beta": 0.1,
  "dynamics_gate": 0.5,
  "dynamics_dt": 0.1,
  "dynamics_controller_hidden": 64
}
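
The update rule behind these numbers is not given in this card. Purely as an illustration of the kind of velocity-tracking residual such parameters could drive, here is a hypothetical sketch: the class name, the EMA-style velocity update, and the controller wiring are all assumptions, with only the parameter names and values taken from the JSON above.

import torch
import torch.nn as nn

class VelocityTrackerSketch(nn.Module):
    """Hypothetical velocity-tracking residual; not the actual INL Dynamics."""

    def __init__(self, hidden=2048, alpha=0.9, beta=0.1, gate=0.5,
                 dt=0.1, controller_hidden=64):
        super().__init__()
        self.alpha, self.beta, self.gate, self.dt = alpha, beta, gate, dt
        # Small controller MLP sized by dynamics_controller_hidden.
        self.controller = nn.Sequential(
            nn.Linear(hidden, controller_hidden), nn.Tanh(),
            nn.Linear(controller_hidden, hidden),
        )

    def forward(self, h, prev_h, prev_v):
        # EMA-style estimate of how fast the hidden state is changing.
        v = self.alpha * prev_v + self.beta * (h - prev_h)
        # Gated, dt-scaled correction keeps the update small and stable.
        return h + self.gate * self.dt * self.controller(v), v

h = torch.randn(1, 8, 2048)
h_out, v = VelocityTrackerSketch()(h, torch.zeros_like(h), torch.zeros_like(h))
print(h_out.shape, v.shape)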

Training

Pre-training

  • Dataset: FineWeb-Edu (educational web content)
  • Steps: 1,000,000
  • Batch Size: Variable, via gradient accumulation (see the sketch after this list)
  • Optimizer: AdamW
  • Hardware: H100 80GB
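
The card only names AdamW and a variable effective batch via gradient accumulation; the project's actual training script is not shown. The toy loop below illustrates that combination only: the model, data, learning rate, weight decay, gradient clipping, and the accumulation factor of 8 are all placeholders, not Pacific-Prime's settings.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy AdamW + gradient-accumulation pattern (placeholder model and data).
vocab, hidden, accum_steps = 32000, 64, 8    # effective batch = micro-batch * 8
model = nn.Sequential(nn.Embedding(vocab, hidden), nn.Linear(hidden, vocab))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

for step in range(32):
    tokens = torch.randint(0, vocab, (2, 128))            # toy micro-batch
    logits = model(tokens[:, :-1])                         # next-token prediction
    loss = F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
    (loss / accum_steps).backward()                        # accumulate averaged grads
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        optimizer.zero_grad()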

(Figures: training loss, generation example, benchmarks)

Files

File              | Description                | Size
model.safetensors | Model weights (F32)        | ~6 GB
config.json       | Architecture configuration | 1 KB
tokenizer.json    | BPE tokenizer (32K vocab)  | 2 MB

Installation

pip install complexity-deep

Or from source:

git clone https://github.com/Complexity-ML/complexity-deep
cd complexity-deep
pip install -e .

Inference

With mu-inference (Recommended)

pip install mu-inference

# Generate
mu-generate -m Pacific-Prime/pacific-prime -p "### Instruction:\nExplain gravity.\n\n### Response:"

# Serve API
mu-serve -m Pacific-Prime/pacific-prime --port 8000

With complexity-deep

from complexity_deep import DeepForCausalLM
from tokenizers import Tokenizer
import torch

model = DeepForCausalLM.from_pretrained("Pacific-Prime/pacific-prime")
model.eval()

# Generate with the Alpaca prompt format (same pattern as the Usage section)
tokenizer = Tokenizer.from_file("tokenizer.json")
prompt = "### Instruction:\nExplain gravity.\n\n### Response:\n"
input_ids = torch.tensor([tokenizer.encode(prompt).ids])
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8)
print(tokenizer.decode(output[0].tolist()))

License

CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0)

  • Academic research: permitted
  • Personal projects: permitted
  • Other non-commercial use: permitted
  • Commercial use: requires permission

Acknowledgments

  • FineWeb-Edu dataset by HuggingFace
  • Alpaca dataset by Stanford
  • INL Dynamics research (2025)

Citation

@misc{pacific-prime-2025,
  title={Pacific-Prime: A 1.5B Parameter Language Model with Mu-Guided Attention},
  author={Boris},
  year={2025},
  url={https://huggingface.co/Pacific-Prime/pacific-prime}
}