# RippleGPT: Context-Aware Code Completion via Decay-Biased Attention 🌊
RippleGPT is a modern Transformer architecture optimized for code completion tasks. It replaces learned positional embeddings with a Decay-Biased Attention Mechanism (Ripple Field / ALiBi-style) and utilizes Multiplicative Gating (SwiGLU) for improved signal flow.
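For intuition, here is a minimal PyTorch sketch of the two ideas. It is illustrative only, not the code in `src/model.py`; the slope value and layer sizes are arbitrary assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def decay_bias(T: int, slope: float = 0.1) -> torch.Tensor:
    """ALiBi-style additive bias: each attention score is penalized in proportion
    to the query-key distance, so no learned positional embeddings are needed."""
    pos = torch.arange(T)
    dist = (pos[:, None] - pos[None, :]).clamp(min=0).float()
    return -slope * dist  # added to Q @ K^T / sqrt(d) before the softmax

class SwiGLU(nn.Module):
    """Multiplicative gating MLP: silu(x @ W_gate) * (x @ W_up), projected back down."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```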
## 🎯 What RippleGPT IS (and is NOT)
| ✅ Is | ❌ Is NOT |
|---|---|
| Context-aware code completion engine | Long-context Q&A assistant |
| Excellent at structural understanding (indentation, scope, flow) | Good at factual recall from distant context |
| Extrapolation-native (train 512 → infer 2048+) | Memory-efficient (uses O(T²) attention) |
| Sample-efficient (18% fewer params than GPT) | Infinite-memory chatbot |
## 🧪 The Core Innovation
Standard Transformers fail when context exceeds training length. RippleGPT thrives on longer contexts:
| Context Window | Ratio | Loss | Perplexity | vs Training |
|---|---|---|---|---|
| 512 (Training) | 1.0x | 0.83 | 2.29 | Baseline |
| 1024 | 2.0x | 0.73 | 2.08 | -9.1% ✅ |
| 2048 | 4.0x | 0.70 | 2.00 | -12.5% ✅ |
Key Finding: The model performs better at 4x training context. This is contextual synergy, not just "stable extrapolation".
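A comparison like the one above can be reproduced with a small evaluation loop: measure mean next-token cross-entropy at each context length and convert it to perplexity (ppl = exp(loss), matching the table). The sketch below is generic, it assumes the model returns logits of shape `(B, T, vocab)` when called on token ids and that `val_tokens` is a held-out 1-D token tensor; the repository's `extrapolation_test.py` is the authoritative version.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def eval_perplexity(model, tokens: torch.Tensor, context_len: int, n_batches: int = 50):
    """Mean next-token loss and perplexity at a given context length.

    `tokens` is a 1-D LongTensor of held-out data; assumes model(x) returns
    logits of shape (B, T, vocab).
    """
    losses = []
    for _ in range(n_batches):
        i = torch.randint(0, tokens.numel() - context_len - 1, (1,)).item()
        x = tokens[i : i + context_len].unsqueeze(0)          # (1, T) inputs
        y = tokens[i + 1 : i + context_len + 1].unsqueeze(0)  # next-token targets
        logits = model(x)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
        losses.append(loss.item())
    mean_loss = sum(losses) / len(losses)
    return mean_loss, math.exp(mean_loss)

# e.g. compare the training length against 4x extrapolation:
# for T in (512, 1024, 2048):
#     print(T, eval_perplexity(model, val_tokens, T))
```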
### The Trade-Off: Structural vs Factual Memory
The Ripple Field creates a "memory horizon" of ~25-35 lines. Beyond this, factual recall fails:
| Task | Example | Performance |
|---|---|---|
| Structural | "What's the next line of code?" | ✅ Excellent |
| Factual | "What password was defined 50 lines ago?" | ❌ Fails |
This is ideal for code completion (local context matters most) but unsuitable for document Q&A.
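The horizon follows directly from the linear decay: the bias subtracted from an attention score grows with distance until a far-away token's weight is negligible no matter how relevant it is. The slope and tokens-per-line figures below are illustrative assumptions, not values taken from RippleGPT:

```python
import math

# Illustrative numbers only: neither the slope nor tokens-per-line
# comes from the RippleGPT codebase.
slope, tokens_per_line = 0.05, 12

for lines_back in (5, 15, 25, 35, 50):
    distance = lines_back * tokens_per_line
    penalty = slope * distance        # bias subtracted from the attention score
    rel_weight = math.exp(-penalty)   # weight relative to an equally relevant nearby token
    print(f"{lines_back:>3} lines back: bias -{penalty:5.1f} -> relative weight {rel_weight:.1e}")
```

With these illustrative numbers, a token ~35 lines back is down-weighted by roughly nine orders of magnitude, consistent with the recall cliff reported in the needle test below.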
## ⚠️ Technical Note: Memory Complexity
**RFC-001 Optimizations: Memory-Aware Ripple Attention**

- Phase 1 (SDPA): 83% memory reduction via fused operations
- Phase 2 (Sliding Window): O(T×w) memory → 10,000+ token contexts!

Benchmarks (window=512):
- T=2000: 153ms → 74ms (2.1x faster)
- T=5000: 648ms → 210ms (3.1x faster)
- T=10000: OOM → 324ms (∞ gain!)

- ✅ ADVANTAGE: Length extrapolation, fast convergence
- ✅ NEW: Sliding window for infinite context
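As a rough sketch of how the sliding-window path can be expressed, the mask below keeps an ALiBi-style decay bias for the last `window` keys of each query and sets everything else to `-inf`, so only O(T×w) score entries stay active. This is not the RFC-001 implementation, and a real O(T×w)-memory kernel would compute attention in banded chunks rather than materialize the full (T, T) mask shown here; the slope is an arbitrary placeholder.

```python
import torch
import torch.nn.functional as F

def ripple_window_mask(T: int, window: int, slope: float) -> torch.Tensor:
    """Additive score mask: decay bias inside the causal window, -inf elsewhere."""
    pos = torch.arange(T)
    dist = pos[:, None] - pos[None, :]             # query index minus key index
    bias = -slope * dist.clamp(min=0).float()      # ALiBi-style linear decay
    visible = (dist >= 0) & (dist < window)        # causal AND within the window
    return bias.masked_fill(~visible, float("-inf"))

# With PyTorch's fused SDPA (the Phase 1-style path), this is just the additive
# attn_mask; q, k, v have shape (batch, heads, T, head_dim):
# out = F.scaled_dot_product_attention(
#     q, k, v, attn_mask=ripple_window_mask(T, window=512, slope=0.1))
```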
## 📊 Performance Summary
**Training:** 17M-parameter model trained on a 50 MB code dataset for 10K iterations
- Best validation loss: 0.72 (down from 7.88 at random initialization)
- Training time: ~2 hours on Apple M-series

**Extrapolation:** trained on 512-token contexts, evaluated up to 2048 tokens
- Perplexity improves with longer context (-12.5% at 4x)

**Needle Test:** factual recall accuracy by distance to the planted fact
- 15 lines back: 67% accurate | 35+ lines back: 0% accurate
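For reference, a needle-in-a-haystack probe for code can be built by planting a fact a fixed number of lines before the query and checking whether the completion reproduces it. The variable names, filler lines, and planted value below are made up for illustration; the repository's `needle_test.py` may construct its prompts differently.

```python
def build_needle_prompt(depth_lines: int) -> tuple[str, str]:
    """Return (prompt, expected) with the 'needle' planted depth_lines above the query."""
    expected = "hunter2_s3cr3t"                                   # hypothetical planted fact
    needle = f'PASSWORD = "{expected}"'
    filler = [f"x_{i} = {i} * 2  # filler line" for i in range(depth_lines)]
    query = "# Repeat the password defined above:\nPASSWORD_AGAIN = "
    return "\n".join([needle, *filler, query]), expected

# Accuracy at a given depth: encode the prompt, call model.generate(...),
# decode, and count how often the completion contains `expected`.
```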
## 🚀 Quick Start
```python
import torch
from src.model import RippleGPT, RippleConfig

# 1. Initialize (Full attention for short contexts)
config = RippleConfig(vocab_size=2260, block_size=512, n_layer=8, n_head=8, n_embd=512)
model = RippleGPT(config)

# 2. OR: Enable Sliding Window for 10k+ token contexts
config = RippleConfig(
    vocab_size=2260, block_size=512, n_layer=8, n_head=8, n_embd=512,
    attention_window=512,  # Enables O(T×512) memory!
)
model = RippleGPT(config)

# 3. Inference (Works on lengths > 512!)
idx = torch.zeros((1, 1), dtype=torch.long)
generated = model.generate(idx, max_new_tokens=500)
```
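Continuing from the snippet above: because the decay bias needs no positional table, prompts longer than `block_size` can be fed directly. This is only a shape sanity check, it assumes `generate()` accepts prompts longer than `block_size` and returns the prompt plus the new tokens, and it uses random token IDs, so the output is not meaningful text.

```python
# Prompt longer than the 512-token training context (extrapolation path).
long_prompt = torch.randint(0, config.vocab_size, (1, 1024), dtype=torch.long)
out = model.generate(long_prompt, max_new_tokens=64)
print(out.shape)  # torch.Size([1, 1088]) if generate() returns prompt + new tokens

# Parameter count for the configuration above:
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```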
## 🔬 Scientific Validation
```bash
# 1. Prepare code dataset
python validation/memory/prepare_large_data.py --size 50

# 2. Train model (block_size=512)
python validation/memory/train_large.py --config medium

# 3. Test extrapolation (definitive ALiBi validation)
python validation/memory/extrapolation_test.py --config medium --max-context 2048

# 4. Test factual memory (Needle in a Haystack)
python validation/memory/needle_test.py --config medium --depths 5 10 15 20 25 30 35 40 50 100
```
## 📁 Repository Structure
```
├── src/
│   ├── model.py                  # Core architecture (RippleHead + SwiGLU MLP)
│   └── config.py                 # Configuration dataclass
├── train.py                      # Training script
├── sample.py                     # Text generation script
├── validation/
│   ├── code/                     # Code completion validation
│   └── memory/                   # Memory & extrapolation tests
│       ├── needle_test.py        # "Needle in a Haystack" test
│       ├── extrapolation_test.py # Context extrapolation validation
│       └── train_large.py        # Large-scale training script
└── tests/                        # Unit tests
```
## 📖 Citation
If you find this architecture useful, please cite this repository.
```bibtex
@misc{tavernari2026ripplegpt,
  author       = {Tavernari, Victor Carvalho},
  title        = {RippleGPT: High-Efficiency Sequence Modeling via Decay-Biased Attention},
  year         = {2026},
  howpublished = {\url{https://github.com/Tavernari/RippleGPT}},
  publisher    = {GitHub},
  note         = {GitHub repository}
}
```