RippleGPT: Context-Aware Code Completion via Decay-Biased Attention 🌊

RippleGPT is a modern Transformer architecture optimized for code completion tasks. It replaces learned positional embeddings with a Decay-Biased Attention Mechanism (Ripple Field / ALiBi-style) and utilizes Multiplicative Gating (SwiGLU) for improved signal flow.
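
The two building blocks are easy to sketch. Below is a minimal, illustrative PyTorch version of a decay-biased attention score (an ALiBi-style linear penalty on token distance) and a SwiGLU feed-forward block. The names `decay_biased_scores`, `decay_rate`, and `SwiGLU` are placeholders for illustration, not the repository's actual `RippleHead` API.

import torch
import torch.nn.functional as F

def decay_biased_scores(q, k, decay_rate):
    # q, k: (batch, heads, T, head_dim); decay_rate: (heads,) per-head slope.
    # In ALiBi each head typically gets a different slope; here it is just a parameter.
    T = q.size(-2)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5        # (B, H, T, T)
    dist = torch.arange(T)[:, None] - torch.arange(T)[None, :]  # i - j >= 0 for past tokens
    bias = -decay_rate.view(1, -1, 1, 1) * dist.clamp(min=0)    # linear penalty grows with distance
    causal = torch.full((T, T), float("-inf")).triu(1)          # block future positions
    return scores + bias + causal                               # softmax of this gives attention weights

class SwiGLU(torch.nn.Module):
    # Multiplicative gating: silu(x W_gate) * (x W_up), projected back to model width.
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = torch.nn.Linear(dim, hidden, bias=False)
        self.up = torch.nn.Linear(dim, hidden, bias=False)
        self.down = torch.nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

Because the bias depends only on relative distance, nothing in the score function is tied to the training length, which is what makes length extrapolation possible.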


🎯 What RippleGPT IS (and is NOT)

| ✅ Is | ❌ Is NOT |
|-------|-----------|
| Context-aware code completion engine | Long-context Q&A assistant |
| Excellent at structural understanding (indentation, scope, flow) | Good at factual recall from distant context |
| Extrapolation-native (train 512 → infer 2048+) | Memory-efficient (uses O(T²) attention) |
| Sample-efficient (18% fewer params than GPT) | Infinite-memory chatbot |

🧪 The Core Innovation

Standard Transformers fail when the context exceeds their training length. RippleGPT instead improves on longer contexts:

| Context Window | Ratio | Loss | Perplexity | vs Training |
|----------------|-------|------|------------|-------------|
| 512 (Training) | 1.0x  | 0.83 | 2.29       | Baseline    |
| 1024           | 2.0x  | 0.73 | 2.08       | -9.1% ✅    |
| 2048           | 4.0x  | 0.70 | 2.00       | -12.5% ✅   |

Key Finding: The model performs better at 4x training context. This is contextual synergy, not just "stable extrapolation".
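
As a sanity check, the Perplexity column is simply exp(loss), so it can be recomputed from the reported losses (small differences come from rounding of the loss values):

import math

for context, loss in [(512, 0.83), (1024, 0.73), (2048, 0.70)]:
    print(f"context {context}: perplexity ~ {math.exp(loss):.2f}")
# context 512: perplexity ~ 2.29
# context 1024: perplexity ~ 2.08
# context 2048: perplexity ~ 2.01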

The Trade-Off: Structural vs Factual Memory

The Ripple Field creates a "memory horizon" of ~25-35 lines. Beyond this, factual recall fails:

| Task | Example | Performance |
|------|---------|-------------|
| Structural | "What's the next line of code?" | ✅ Excellent |
| Factual | "What password was defined 50 lines ago?" | ❌ Fails |

This is ideal for code completion (local context matters most) but unsuitable for document Q&A.
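
To build intuition for where the horizon comes from: with a linear distance penalty, a token's relative (pre-softmax) weight falls off exponentially with how far back it sits. The sketch below is purely illustrative; the slope (0.01 per token) and the tokens-per-line estimate are made-up numbers, not values measured from the trained model.

import math

slope = 0.01              # hypothetical decay slope per token (illustrative only)
tokens_per_line = 10      # rough assumption for tokenized code

for lines_back in (5, 15, 35, 50):
    distance = lines_back * tokens_per_line
    rel_weight = math.exp(-slope * distance)   # exp(-m * d), before softmax competition
    print(f"{lines_back:>2} lines back -> relative weight ~ {rel_weight:.3f}")

# With these toy numbers the weight drops from ~0.61 at 5 lines to ~0.03 at 35 lines,
# which is the qualitative shape behind the needle-test results reported below.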

⚠️ Technical Note: Memory Complexity

┌───────────────────────────────────────────────────────────────┐
│  RFC-001 OPTIMIZATIONS: Memory-Aware Ripple Attention         │
├───────────────────────────────────────────────────────────────┤
│  Phase 1 (SDPA): 83% memory reduction via fused operations    │
│  Phase 2 (Sliding Window): O(T×w) → 10,000+ token contexts    │
│                                                               │
│  Benchmarks (window=512):                                     │
│  • T=2000:  153ms → 74ms  (2.1x faster)                       │
│  • T=5000:  648ms → 210ms (3.1x faster)                       │
│  • T=10000: OOM → 324ms   (∞ gain!)                           │
│                                                               │
│  ✅ ADVANTAGE: Length extrapolation, fast convergence         │
│  ✅ NEW: Sliding window for infinite context                  │
└───────────────────────────────────────────────────────────────┘
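
Sliding-window attention restricts each query to the previous w keys, which is what turns the O(T²) score matrix into O(T×w) work. Below is a naive, self-contained sketch of the masking logic, not the repository's fused SDPA path; a production version would compute only the banded scores blockwise rather than materialising the full T×T matrix as this toy does.

import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window):
    # q, k, v: (batch, heads, T, head_dim)
    T = q.size(-2)
    i = torch.arange(T)[:, None]
    j = torch.arange(T)[None, :]
    keep = (j <= i) & (j > i - window)           # causal AND within the last `window` tokens
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~keep, float("-inf"))
    return F.softmax(scores, dim=-1) @ v         # (batch, heads, T, head_dim)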

📊 Performance Summary

Training: 17M-parameter model trained on a 50MB code dataset for 10K iterations

  • Best validation loss: 0.72 (from random initialization at 7.88)
  • Training time: ~2 hours on Apple M-Series

Extrapolation: Trained on 512 tokens, tested up to 2048

  • Perplexity improves with longer context (-12.5% at 4x)

Needle Test: Factual recall accuracy by distance

  • 15 lines: 67% accurate | 35+ lines: 0% accurate

🚀 Quick Start

import torch
from src.model import RippleGPT, RippleConfig

# 1. Initialize (Full attention for short contexts)
config = RippleConfig(vocab_size=2260, block_size=512, n_layer=8, n_head=8, n_embd=512)
model = RippleGPT(config)

# 2. OR: Enable Sliding Window for 10k+ token contexts
config = RippleConfig(
    vocab_size=2260, block_size=512, n_layer=8, n_head=8, n_embd=512,
    attention_window=512  # Enables O(T×512) memory!
)
model = RippleGPT(config)

# 3. Inference (Works on lengths > 512!)
idx = torch.zeros((1, 1), dtype=torch.long)
generated = model.generate(idx, max_new_tokens=500)

🔬 Scientific Validation

# 1. Prepare code dataset
python validation/memory/prepare_large_data.py --size 50

# 2. Train model (block_size=512)
python validation/memory/train_large.py --config medium

# 3. Test extrapolation (definitive ALiBi validation)
python validation/memory/extrapolation_test.py --config medium --max-context 2048

# 4. Test factual memory (Needle in a Haystack)
python validation/memory/needle_test.py --config medium --depths 5 10 15 20 25 30 35 40 50 100

📂 Repository Structure

├── src/
│   ├── model.py          # Core architecture (RippleHead + SwiGLU MLP)
│   └── config.py         # Configuration dataclass
├── train.py              # Training script
├── sample.py             # Text generation script
├── validation/
│   ├── code/             # Code completion validation
│   └── memory/           # Memory & extrapolation tests
│       ├── needle_test.py         # "Needle in a Haystack" test
│       ├── extrapolation_test.py  # Context extrapolation validation
│       └── train_large.py         # Large-scale training script
└── tests/                # Unit tests

📜 Citation

If you find this architecture useful, please cite this repository.

@misc{tavernari2026ripplegpt,
  author       = {Tavernari, Victor Carvalho},
  title        = {RippleGPT: High-Efficiency Sequence Modeling via Decay-Biased Attention},
  year         = {2026},
  howpublished = {\url{https://github.com/Tavernari/RippleGPT}},
  publisher    = {GitHub},
  note         = {GitHub repository}
}