RippleGPT: Context-Aware Code Completion via Decay-Biased Attention 🌊

RippleGPT is a modern Transformer architecture optimized for code completion tasks. It replaces learned positional embeddings with a Decay-Biased Attention Mechanism (Ripple Field / ALiBi-style) and utilizes Multiplicative Gating (SwiGLU) for improved signal flow.
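
The two building blocks are easy to sketch. Below is a minimal, illustrative PyTorch version of a decay-biased attention score (an ALiBi-style linear penalty on token distance) and a SwiGLU feed-forward block. The names `decay_biased_scores`, `decay_rate`, and `SwiGLU` are placeholders for illustration, not the repository's actual `RippleHead` API.

import torch
import torch.nn.functional as F

def decay_biased_scores(q, k, decay_rate):
    # q, k: (batch, heads, T, head_dim); decay_rate: (heads,) per-head slope.
    # In ALiBi each head typically gets a different slope; here it is just a parameter.
    T = q.size(-2)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5        # (B, H, T, T)
    dist = torch.arange(T)[:, None] - torch.arange(T)[None, :]  # i - j >= 0 for past tokens
    bias = -decay_rate.view(1, -1, 1, 1) * dist.clamp(min=0)    # linear penalty grows with distance
    causal = torch.full((T, T), float("-inf")).triu(1)          # block future positions
    return scores + bias + causal                               # softmax of this gives attention weights

class SwiGLU(torch.nn.Module):
    # Multiplicative gating: silu(x W_gate) * (x W_up), projected back to model width.
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = torch.nn.Linear(dim, hidden, bias=False)
        self.up = torch.nn.Linear(dim, hidden, bias=False)
        self.down = torch.nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

Because the bias depends only on relative distance, nothing in the score function is tied to the training length, which is what makes length extrapolation possible.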


🎯 What RippleGPT IS (and is NOT)

| ✅ Is | ❌ Is NOT |
|-------|-----------|
| Context-aware code completion engine | Long-context Q&A assistant |
| Excellent at structural understanding (indentation, scope, flow) | Good at factual recall from distant context |
| Extrapolation-native (train 512 → infer 2048+) | Memory-efficient (uses O(T²) attention) |
| Sample-efficient (18% fewer params than GPT) | Infinite-memory chatbot |

🧪 The Core Innovation

Standard Transformers fail when the context exceeds their training length. RippleGPT instead improves on longer contexts:

| Context Window | Ratio | Loss | Perplexity | vs Training |
|----------------|-------|------|------------|-------------|
| 512 (Training) | 1.0x  | 0.83 | 2.29       | Baseline    |
| 1024           | 2.0x  | 0.73 | 2.08       | -9.1% ✅    |
| 2048           | 4.0x  | 0.70 | 2.00       | -12.5% ✅   |

Key Finding: The model performs better at 4x training context. This is contextual synergy, not just "stable extrapolation".
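
As a sanity check, the Perplexity column is simply exp(loss), so it can be recomputed from the reported losses (small differences come from rounding of the loss values):

import math

for context, loss in [(512, 0.83), (1024, 0.73), (2048, 0.70)]:
    print(f"context {context}: perplexity ~ {math.exp(loss):.2f}")
# context 512: perplexity ~ 2.29
# context 1024: perplexity ~ 2.08
# context 2048: perplexity ~ 2.01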

The Trade-Off: Structural vs Factual Memory

The Ripple Field creates a "memory horizon" of ~25-35 lines. Beyond this, factual recall fails:

| Task | Example | Performance |
|------|---------|-------------|
| Structural | "What's the next line of code?" | ✅ Excellent |
| Factual | "What password was defined 50 lines ago?" | ❌ Fails |

This is ideal for code completion (local context matters most) but unsuitable for document Q&A.
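
To build intuition for where the horizon comes from: with a linear distance penalty, a token's relative (pre-softmax) weight falls off exponentially with how far back it sits. The sketch below is purely illustrative; the slope (0.01 per token) and the tokens-per-line estimate are made-up numbers, not values measured from the trained model.

import math

slope = 0.01              # hypothetical decay slope per token (illustrative only)
tokens_per_line = 10      # rough assumption for tokenized code

for lines_back in (5, 15, 35, 50):
    distance = lines_back * tokens_per_line
    rel_weight = math.exp(-slope * distance)   # exp(-m * d), before softmax competition
    print(f"{lines_back:>2} lines back -> relative weight ~ {rel_weight:.3f}")

# With these toy numbers the weight drops from ~0.61 at 5 lines to ~0.03 at 35 lines,
# which is the qualitative shape behind the needle-test results reported below.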

⚠️ Technical Note: Memory Complexity

┌───────────────────────────────────────────────────────────────┐
│  RFC-001 OPTIMIZATIONS: Memory-Aware Ripple Attention         │
├───────────────────────────────────────────────────────────────┤
│  Phase 1 (SDPA): 83% memory reduction via fused operations    │
│  Phase 2 (Sliding Window): O(T×w) → 10,000+ token contexts    │
│                                                               │
│  Benchmarks (window=512):                                     │
│  • T=2000:  153ms → 74ms  (2.1x faster)                       │
│  • T=5000:  648ms → 210ms (3.1x faster)                       │
│  • T=10000: OOM → 324ms   (∞ gain!)                           │
│                                                               │
│  ✅ ADVANTAGE: Length extrapolation, fast convergence         │
│  ✅ NEW: Sliding window for infinite context                  │
└───────────────────────────────────────────────────────────────┘
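
Sliding-window attention restricts each query to the previous w keys, which is what turns the O(T²) score matrix into O(T×w) work. Below is a naive, self-contained sketch of the masking logic, not the repository's fused SDPA path; a production version would compute only the banded scores blockwise rather than materialising the full T×T matrix as this toy does.

import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window):
    # q, k, v: (batch, heads, T, head_dim)
    T = q.size(-2)
    i = torch.arange(T)[:, None]
    j = torch.arange(T)[None, :]
    keep = (j <= i) & (j > i - window)           # causal AND within the last `window` tokens
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~keep, float("-inf"))
    return F.softmax(scores, dim=-1) @ v         # (batch, heads, T, head_dim)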

📊 Performance Summary

Training: 17M-parameter model trained on a 50MB code dataset for 10K iterations

  • Best validation loss: 0.72 (from random initialization at 7.88)
  • Training time: ~2 hours on Apple M-Series

Extrapolation: Trained on 512 tokens, tested up to 2048

  • Perplexity improves with longer context (-12.5% at 4x)

Needle Test: Factual recall accuracy by distance

  • 15 lines: 67% accurate | 35+ lines: 0% accurate

🚀 Quick Start

import torch
from src.model import RippleGPT, RippleConfig

# 1. Initialize (Full attention for short contexts)
config = RippleConfig(vocab_size=2260, block_size=512, n_layer=8, n_head=8, n_embd=512)
model = RippleGPT(config)

# 2. OR: Enable Sliding Window for 10k+ token contexts
config = RippleConfig(
    vocab_size=2260, block_size=512, n_layer=8, n_head=8, n_embd=512,
    attention_window=512  # Enables O(T×512) memory!
)
model = RippleGPT(config)

# 3. Inference (Works on lengths > 512!)
idx = torch.zeros((1, 1), dtype=torch.long)
generated = model.generate(idx, max_new_tokens=500)

🔬 Scientific Validation

# 1. Prepare code dataset
python validation/memory/prepare_large_data.py --size 50

# 2. Train model (block_size=512)
python validation/memory/train_large.py --config medium

# 3. Test extrapolation (definitive ALiBi validation)
python validation/memory/extrapolation_test.py --config medium --max-context 2048

# 4. Test factual memory (Needle in a Haystack)
python validation/memory/needle_test.py --config medium --depths 5 10 15 20 25 30 35 40 50 100

📂 Repository Structure

├── src/
│   ├── model.py          # Core architecture (RippleHead + SwiGLU MLP)
│   └── config.py         # Configuration dataclass
├── train.py              # Training script
├── sample.py             # Text generation script
├── validation/
│   ├── code/             # Code completion validation
│   └── memory/           # Memory & extrapolation tests
│       ├── needle_test.py         # "Needle in a Haystack" test
│       ├── extrapolation_test.py  # Context extrapolation validation
│       └── train_large.py         # Large-scale training script
└── tests/                # Unit tests

📜 Citation

If you find this architecture useful, please cite this repository.

@misc{tavernari2026ripplegpt,
  author       = {Tavernari, Victor Carvalho},
  title        = {RippleGPT: High-Efficiency Sequence Modeling via Decay-Biased Attention},
  year         = {2026},
  howpublished = {\url{https://github.com/Tavernari/RippleGPT}},
  publisher    = {GitHub},
  note         = {GitHub repository}
}