chrisxx committed on
Commit
8746765
·
1 Parent(s): a8aa75d

Add Neural Pong application files

Browse files
DEPLOYMENT.md ADDED
@@ -0,0 +1,68 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Hugging Face Space Setup
2
+
3
+ This folder contains everything needed to deploy the Neural Pong demo to Hugging Face Spaces.
4
+
5
+ ## Files Structure
6
+
7
+ - `app.py` - Main Flask application (modified from play_pong.py, removed single-user limitation)
8
+ - `Dockerfile` - Docker configuration for HF Spaces
9
+ - `requirements.txt` - Python dependencies
10
+ - `README.md` - Space description and metadata
11
+ - `static/index.html` - Frontend web interface
12
+ - `configs/inference.yaml` - Model configuration
13
+ - `src/` - Source code for model loading and inference
14
+
15
+ ## Important Notes
16
+
17
+ ### Dependencies Fixed
18
+
19
+ ✅ **No external git dependencies**: The app now imports `sample` directly from `src.inference.sampling` instead of going through training code, avoiding the `muon-optimizer` git dependency.
20
+
21
+ ✅ **No data files needed**: The app uses `fixed2frame` directly instead of calling `get_loader`, so it doesn't need the training data files (`frames.npy`, `actions.npy`).
22
+
23
+ ✅ **Minimal codebase**: Only inference-related code is included. All training scripts and utilities have been removed:
24
+ - Removed: `src/trainers/`, `src/main.py`, `src/main_dmd.py`
25
+ - Removed: Unused dataset files, alternative models, custom norm
26
+ - Removed: Matplotlib dependencies (not needed for inference)
27
+ - **Total: 15 Python files** (down from 25+)
28
+
29
+ See `SOURCE_FILES.md` for a complete list of included files.
30
+
31
+ ### Checkpoint Path
32
+
33
+ The `configs/inference.yaml` file currently references a local checkpoint path:
34
+ ```yaml
35
+ checkpoint: "experiments/radiant-forest-398/ckpt-step=053700-metric=0.00092727.pt"
36
+ ```
37
+
38
+ **Before deploying**, you need to either:
39
+
40
+ 1. **Upload checkpoint to Hugging Face Hub** and update the path to load from Hub
41
+ 2. **Include the checkpoint file** in this directory and update the path
42
+ 3. **Use HF Spaces storage/secrets** to store the checkpoint
43
+
44
+ ### Changes Made
45
+
46
+ - Removed single-user limitation (all users can connect simultaneously)
47
+ - Simplified frontend to remove busy state handling
48
+ - Updated port to use environment variable (defaults to 7860 for HF Spaces)
49
+ - Created Dockerfile for containerized deployment
50
+
51
+ ## Deployment Steps
52
+
53
+ 1. Upload this folder to a Hugging Face Space repository
54
+ 2. Update the checkpoint path in `configs/inference.yaml` to point to your model
55
+ 3. Ensure the Space has GPU access enabled
56
+ 4. The Space will automatically build and deploy
57
+
58
+ ## Testing Locally
59
+
60
+ To test locally with Docker:
61
+
62
+ ```bash
63
+ docker build -t neural-pong .
64
+ docker run -p 7860:7860 neural-pong
65
+ ```
66
+
67
+ Then visit http://localhost:7860
68
+
Dockerfile ADDED
@@ -0,0 +1,22 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ FROM python:3.11-slim
2
+
3
+ WORKDIR /app
4
+
5
+ # Install system dependencies
6
+ RUN apt-get update && apt-get install -y \
7
+ build-essential \
8
+ && rm -rf /var/lib/apt/lists/*
9
+
10
+ # Copy requirements and install Python dependencies
11
+ COPY requirements.txt .
12
+ RUN pip install --no-cache-dir -r requirements.txt
13
+
14
+ # Copy application code
15
+ COPY . .
16
+
17
+ # Expose port (HF Spaces will map this)
18
+ EXPOSE 7860
19
+
20
+ # Run the Flask app
21
+ CMD ["python", "app.py"]
22
+
QUICKSTART.md ADDED
@@ -0,0 +1,97 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🚀 Quick Setup Guide for Hugging Face Space
2
+
3
+ Your Neural Pong demo is ready to deploy! Follow these steps:
4
+
5
+ ## Step 1: Create Your Hugging Face Space
6
+
7
+ 1. **Go to Hugging Face Spaces:** https://huggingface.co/spaces
8
+ 2. **Click "Create new Space"**
9
+ 3. **Fill in the details:**
10
+ - **Space name:** `neural-pong` (or your preferred name)
11
+ - **SDK:** Select **"Docker"** ⚠️ Important!
12
+ - **Hardware:** Select **"GPU"** → **"T4 small"** (or larger)
13
+ - **Visibility:** Public or Private
14
+ 4. **Click "Create Space"**
15
+
16
+ ## Step 2: Upload Files Using Git
17
+
18
+ ```bash
19
+ cd /share/u/wendler/code/toy-wm-hf-space
20
+
21
+ # Initialize git (if not already done)
22
+ git init
23
+
24
+ # Add all files
25
+ git add .
26
+
27
+ # Commit
28
+ git commit -m "Initial commit: Neural Pong demo"
29
+
30
+ # Add your Space as remote (replace YOUR_USERNAME and SPACE_NAME)
31
+ git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
32
+
33
+ # Push everything
34
+ git push -u origin main
35
+ ```
36
+
37
+ **Note:** The checkpoint file is 225MB, so Git is recommended over web upload.
38
+
39
+ ## Step 3: Wait for Build
40
+
41
+ 1. After pushing, Hugging Face will automatically start building
42
+ 2. Go to your Space page → **"Logs"** tab to watch progress
43
+ 3. Build time: **5-15 minutes** (installing PyTorch, etc.)
44
+
45
+ ## Step 4: Test Your Space
46
+
47
+ 1. Once build completes, visit your Space URL
48
+ 2. You should see the Pong interface
49
+ 3. Wait for model to load (loading spinner)
50
+ 4. Click **"Start Stream"**
51
+ 5. Use **Arrow Keys** or **WASD** to play!
52
+
53
+ ## Quick Commands
54
+
55
+ ```bash
56
+ # Run the setup script
57
+ ./setup.sh
58
+
59
+ # Check files are ready
60
+ ls -la
61
+
62
+ # Test Docker build locally (optional)
63
+ docker build -t neural-pong .
64
+ docker run -p 7860:7860 neural-pong
65
+ ```
66
+
67
+ ## Troubleshooting
68
+
69
+ ### Build Fails?
70
+ - Check **"Logs"** tab for errors
71
+ - Verify checkpoint path in `configs/inference.yaml`
72
+ - Ensure GPU is selected in Space settings
73
+
74
+ ### Model Won't Load?
75
+ - Verify checkpoint exists: `checkpoints/ckpt-step=053700-metric=0.00092727.pt`
76
+ - Check the path in `configs/inference.yaml`
77
+ - Look for errors in the Logs tab
78
+
79
+ ## What's Included
80
+
81
+ ✅ **app.py** - Flask application (no single-user limitation)
82
+ ✅ **checkpoints/** - Model checkpoint (225MB)
83
+ ✅ **src/** - All necessary source code (15 Python files)
84
+ ✅ **static/index.html** - Frontend interface
85
+ ✅ **configs/inference.yaml** - Model configuration
86
+ ✅ **Dockerfile** - Container configuration
87
+ ✅ **requirements.txt** - Python dependencies
88
+
89
+ ## Need More Help?
90
+
91
+ - See `SETUP_GUIDE.md` for detailed instructions
92
+ - See `DEPLOYMENT.md` for technical details
93
+ - Check Hugging Face Spaces docs: https://huggingface.co/docs/hub/spaces
94
+
95
+ ---
96
+
97
+ **Ready?** Run `./setup.sh` to get started! 🚀
README.md CHANGED
@@ -1,10 +1,42 @@
1
  ---
2
- title: Pong
3
- emoji: 📊
4
- colorFrom: purple
5
- colorTo: gray
6
  sdk: docker
7
  pinned: false
 
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ title: Neural Pong
3
+ emoji: 🎮
4
+ colorFrom: blue
5
+ colorTo: purple
6
  sdk: docker
7
  pinned: false
8
+ license: mit
9
  ---
10
 
11
+ # Neural Pong
12
+
13
+ A real-time Pong game where frames are generated by a diffusion model trained with rectified flow matching. Control the blue paddle using arrow keys or WASD to play!
14
+
15
+ ## Features
16
+
17
+ - **Real-time frame generation**: Uses a frame-autoregressive transformer with diffusion sampling
18
+ - **Interactive gameplay**: Control the paddle with keyboard inputs
19
+ - **Configurable parameters**: Adjust FPS and diffusion steps
20
+ - **Low-latency streaming**: Achieves ~20 FPS with 4 diffusion steps
21
+
22
+ ## How to Play
23
+
24
+ 1. Wait for the model to load (you'll see a loading spinner)
25
+ 2. Click "Start Stream" to begin generating frames
26
+ 3. Use **Arrow Keys** or **WASD** to control the blue paddle:
27
+ - **Up/W**: Move paddle up
28
+ - **Down/S**: Move paddle down
29
+ 4. Adjust the FPS and diffusion steps using the controls
30
+ 5. Click "Stop Stream" when done
31
+
32
+ ## Technical Details
33
+
34
+ This demo uses a small transformer model trained with rectified flow matching to simulate Pong game frames conditioned on user inputs. The model generates 24×24 pixel frames in real-time using diffusion sampling with configurable steps.
35
+
36
+ ## Model Architecture
37
+
38
+ - Frame-autoregressive transformer
39
+ - Rectified flow matching training
40
+ - Caching for efficient inference
41
+ - GPU-accelerated generation
42
+
RUN_SETUP.md ADDED
@@ -0,0 +1,58 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Git Setup Complete - Next Steps
2
+
3
+ I've created a setup script for you. Here's what to do:
4
+
5
+ ## Run the Setup Script
6
+
7
+ ```bash
8
+ cd /share/u/wendler/code/toy-wm-hf-space
9
+ chmod +x setup-git.sh
10
+ ./setup-git.sh
11
+ ```
12
+
13
+ This script will:
14
+ 1. ✅ Initialize git (if needed)
15
+ 2. ✅ Remove old SSH remote
16
+ 3. ✅ Add HTTPS remote: `https://huggingface.co/spaces/wendlerc/pong`
17
+ 4. ✅ Stage all files
18
+ 5. ✅ Create initial commit (if needed)
19
+ 6. ✅ Ensure branch is named `main`
20
+ 7. ✅ Show you the push command
21
+
22
+ ## After Running the Script
23
+
24
+ The script will show you the exact command to push. It will be:
25
+ ```bash
26
+ git push -u origin main
27
+ ```
28
+
29
+ ## Before Pushing - Important!
30
+
31
+ **Make sure your Space exists:**
32
+ 1. Go to: https://huggingface.co/spaces/wendlerc/pong
33
+ 2. If it doesn't exist, create it:
34
+ - Go to: https://huggingface.co/spaces
35
+ - Click "Create new Space"
36
+ - Name: `pong`
37
+ - SDK: **Docker**
38
+ - Hardware: **GPU (T4 small)**
39
+ - Click "Create Space"
40
+
41
+ ## Then Push
42
+
43
+ ```bash
44
+ git push -u origin main
45
+ ```
46
+
47
+ You'll be prompted for your Hugging Face credentials (username and access token).
48
+
49
+ ## If You Need an Access Token
50
+
51
+ 1. Go to: https://huggingface.co/settings/tokens
52
+ 2. Create a new token with "write" permissions
53
+ 3. Use it as your password when pushing
54
+
55
+ ---
56
+
57
+ **The setup script is ready!** Just run `./setup-git.sh` and follow the instructions.
58
+
SETUP_GUIDE.md ADDED
@@ -0,0 +1,244 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Step-by-Step Setup Guide for Hugging Face Space
2
+
3
+ This guide will walk you through deploying your Neural Pong demo to Hugging Face Spaces.
4
+
5
+ ## Prerequisites
6
+
7
+ - A Hugging Face account (sign up at https://huggingface.co/join)
8
+ - The model checkpoint file (`ckpt-step=053700-metric=0.00092727.pt`)
9
+ - Git installed on your machine (for uploading files)
10
+
11
+ ---
12
+
13
+ ## Step 1: Prepare Your Checkpoint File
14
+
15
+ First, you need to decide how to handle the model checkpoint. You have two main options:
16
+
17
+ ### Option A: Include Checkpoint in Repository (Simplest)
18
+
19
+ 1. **Locate your checkpoint file:**
20
+ ```bash
21
+ # Check if the file exists
22
+ ls /share/u/wendler/code/toy-wm/experiments/radiant-forest-398/ckpt-step=053700-metric=0.00092727.pt
23
+ ```
24
+
25
+ 2. **Copy it to the hf-space directory:**
26
+ ```bash
27
+ mkdir -p /share/u/wendler/code/toy-wm/hf-space/checkpoints
28
+ cp /share/u/wendler/code/toy-wm/experiments/radiant-forest-398/ckpt-step=053700-metric=0.00092727.pt \
29
+ /share/u/wendler/code/toy-wm/hf-space/checkpoints/
30
+ ```
31
+
32
+ 3. **Update the config file** to point to the new location:
33
+ ```yaml
34
+ checkpoint: "checkpoints/ckpt-step=053700-metric=0.00092727.pt"
35
+ ```
36
+
37
+ ### Option B: Upload to Hugging Face Hub (Better for Large Files)
38
+
39
+ 1. **Install Hugging Face Hub:**
40
+ ```bash
41
+ pip install huggingface-hub
42
+ ```
43
+
44
+ 2. **Login to Hugging Face:**
45
+ ```bash
46
+ huggingface-cli login
47
+ ```
48
+
49
+ 3. **Create a model repository and upload:**
50
+ ```bash
51
+ # Create a repository (replace YOUR_USERNAME with your HF username)
52
+ huggingface-cli repo create YOUR_USERNAME/neural-pong-checkpoint --type model
53
+
54
+ # Upload the checkpoint
55
+ huggingface-cli upload YOUR_USERNAME/neural-pong-checkpoint \
56
+ /share/u/wendler/code/toy-wm/experiments/radiant-forest-398/ckpt-step=053700-metric=0.00092727.pt \
57
+ ckpt-step=053700-metric=0.00092727.pt
58
+ ```
59
+
60
+ 4. **Modify the checkpoint loading code** to download from Hub (we'll do this in Step 2)
61
+
62
+ ---
63
+
64
+ ## Step 2: Update Configuration Files
65
+
66
+ ### If using Option A (checkpoint in repo):
67
+
68
+ Update `configs/inference.yaml`:
69
+ ```yaml
70
+ checkpoint: "checkpoints/ckpt-step=053700-metric=0.00092727.pt"
71
+ ```
72
+
73
+ ### If using Option B (HF Hub):
74
+
75
+ We'll need to modify the app.py to download the checkpoint. Let me know if you want to go this route.
76
+
77
+ ---
78
+
79
+ ## Step 3: Create a Hugging Face Space
80
+
81
+ 1. **Go to Hugging Face Spaces:** https://huggingface.co/spaces
82
+
83
+ 2. **Click "Create new Space"**
84
+
85
+ 3. **Fill in the details:**
86
+ - **Space name:** `neural-pong` (or your preferred name)
87
+ - **SDK:** Select **Docker**
88
+ - **Hardware:** Select **GPU** (T4 small or larger)
89
+ - **Visibility:** Public or Private (your choice)
90
+
91
+ 4. **Click "Create Space"**
92
+
93
+ ---
94
+
95
+ ## Step 4: Upload Files to the Space
96
+
97
+ You have two options:
98
+
99
+ ### Option A: Using Git (Recommended)
100
+
101
+ 1. **Initialize git in your hf-space directory:**
102
+ ```bash
103
+ cd /share/u/wendler/code/toy-wm/hf-space
104
+ git init
105
+ git add .
106
+ git commit -m "Initial commit"
107
+ ```
108
+
109
+ 2. **Add the Hugging Face remote:**
110
+ ```bash
111
+ # Replace YOUR_USERNAME and SPACE_NAME with your values
112
+ git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
113
+ ```
114
+
115
+ 3. **Push to Hugging Face:**
116
+ ```bash
117
+ git push -u origin main
118
+ ```
119
+
120
+ ### Option B: Using Web Interface
121
+
122
+ 1. **Go to your Space page** on Hugging Face
123
+ 2. **Click "Files" tab**
124
+ 3. **Click "Add file" → "Upload files"**
125
+ 4. **Drag and drop all files** from the `hf-space` directory
126
+ 5. **Click "Commit changes"**
127
+
128
+ **Note:** For large checkpoint files, Git is recommended as the web interface has size limits.
129
+
130
+ ---
131
+
132
+ ## Step 5: Configure the Space
133
+
134
+ 1. **Go to your Space settings** (click the gear icon)
135
+
136
+ 2. **Important settings:**
137
+ - **Hardware:** Ensure GPU is selected (T4 small minimum)
138
+ - **Environment variables:** None needed for basic setup
139
+ - **Storage:** If using Option B, you might want persistent storage
140
+
141
+ 3. **Save settings**
142
+
143
+ ---
144
+
145
+ ## Step 6: Wait for Build and Deployment
146
+
147
+ 1. **After pushing files**, Hugging Face will automatically:
148
+ - Build the Docker image
149
+ - Install dependencies
150
+ - Start your application
151
+
152
+ 2. **Monitor the build:**
153
+ - Go to your Space page
154
+ - Click "Logs" tab to see build progress
155
+ - Look for any errors
156
+
157
+ 3. **Expected build time:** 5-15 minutes depending on dependencies
158
+
159
+ ---
160
+
161
+ ## Step 7: Test Your Space
162
+
163
+ 1. **Once the build completes**, your Space will be live
164
+ 2. **Visit your Space URL:** `https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME`
165
+ 3. **Test the application:**
166
+ - Wait for model to load (loading spinner)
167
+ - Click "Start Stream"
168
+ - Use arrow keys or WASD to control paddle
169
+ - Verify frames are generating correctly
170
+
171
+ ---
172
+
173
+ ## Troubleshooting
174
+
175
+ ### Build Fails
176
+
177
+ - **Check logs** in the Space's "Logs" tab
178
+ - **Common issues:**
179
+ - Missing dependencies in `requirements.txt`
180
+ - Dockerfile syntax errors
181
+ - Checkpoint file not found (check path in `inference.yaml`)
182
+
183
+ ### Model Won't Load
184
+
185
+ - **Check checkpoint path** in `configs/inference.yaml`
186
+ - **Verify checkpoint file exists** in the repository
187
+ - **Check GPU availability** in Space settings
188
+
189
+ ### Port Issues
190
+
191
+ - The app uses port 7860 (HF Spaces default)
192
+ - If you see port errors, check the `PORT` environment variable
193
+
194
+ ### Out of Memory
195
+
196
+ - **Reduce batch size** or model size
197
+ - **Upgrade to larger GPU** in Space settings
198
+ - **Check if checkpoint is too large** (consider Option B)
199
+
200
+ ---
201
+
202
+ ## Quick Reference Commands
203
+
204
+ ```bash
205
+ # Navigate to hf-space directory
206
+ cd /share/u/wendler/code/toy-wm/hf-space
207
+
208
+ # Check files are ready
209
+ ls -la
210
+
211
+ # Test Docker build locally (optional)
212
+ docker build -t neural-pong .
213
+ docker run -p 7860:7860 neural-pong
214
+
215
+ # Git setup (if using Git)
216
+ git init
217
+ git add .
218
+ git commit -m "Initial commit"
219
+ git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
220
+ git push -u origin main
221
+ ```
222
+
223
+ ---
224
+
225
+ ## Next Steps
226
+
227
+ After successful deployment:
228
+
229
+ 1. **Share your Space** with others
230
+ 2. **Monitor usage** in the Space analytics
231
+ 3. **Update as needed** by pushing new commits
232
+ 4. **Consider adding:**
233
+ - Better error handling
234
+ - More configuration options
235
+ - Performance optimizations
236
+
237
+ ---
238
+
239
+ ## Need Help?
240
+
241
+ - Check Hugging Face Spaces docs: https://huggingface.co/docs/hub/spaces
242
+ - Review your Space logs for errors
243
+ - Test locally with Docker first to catch issues early
244
+
SETUP_STEPS.md ADDED
@@ -0,0 +1,201 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Step-by-Step Setup for Hugging Face Space
2
+
3
+ Follow these steps to deploy your Neural Pong demo to Hugging Face Spaces.
4
+
5
+ ## ✅ Pre-flight Check
6
+
7
+ Your directory structure looks good! Here's what you have:
8
+
9
+ ```
10
+ toy-wm-hf-space/
11
+ ├── app.py ✅ Main Flask application
12
+ ├── Dockerfile ✅ Docker configuration
13
+ ├── requirements.txt ✅ Python dependencies
14
+ ├── README.md ✅ Space metadata
15
+ ├── checkpoints/ ✅ Model checkpoint (225MB)
16
+ │ └── ckpt-step=053700-metric=0.00092727.pt
17
+ ├── configs/
18
+ │ └── inference.yaml ✅ Model config (checkpoint path correct)
19
+ ├── static/
20
+ │ └── index.html ✅ Frontend
21
+ └── src/ ✅ All source code (15 files)
22
+ ```
23
+
24
+ ## Step 1: Create Your Hugging Face Space
25
+
26
+ 1. **Go to:** https://huggingface.co/spaces
27
+ 2. **Click:** "Create new Space" button
28
+ 3. **Fill in:**
29
+ - **Space name:** `neural-pong` (or your choice)
30
+ - **SDK:** **Docker** ⚠️ Must be Docker!
31
+ - **Hardware:** **GPU** → **T4 small** (minimum)
32
+ - **Visibility:** Public or Private
33
+ 4. **Click:** "Create Space"
34
+
35
+ You'll get a URL like: `https://huggingface.co/spaces/YOUR_USERNAME/neural-pong`
36
+
37
+ ## Step 2: Initialize Git Repository
38
+
39
+ ```bash
40
+ cd /share/u/wendler/code/toy-wm-hf-space
41
+
42
+ # Initialize git (if not already done)
43
+ git init
44
+
45
+ # Add all files
46
+ git add .
47
+
48
+ # Make initial commit
49
+ git commit -m "Initial commit: Neural Pong demo"
50
+ ```
51
+
52
+ ## Step 3: Connect to Your Hugging Face Space
53
+
54
+ Replace `YOUR_USERNAME` and `SPACE_NAME` with your actual values:
55
+
56
+ ```bash
57
+ # Add your Space as remote
58
+ git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
59
+
60
+ # Push everything
61
+ git push -u origin main
62
+ ```
63
+
64
+ **Example:**
65
+ ```bash
66
+ git remote add origin https://huggingface.co/spaces/johndoe/neural-pong
67
+ git push -u origin main
68
+ ```
69
+
70
+ **Note:** The checkpoint is 225MB, so this may take a few minutes to upload.
71
+
72
+ ## Step 4: Monitor the Build
73
+
74
+ 1. **Go to your Space page** on Hugging Face
75
+ 2. **Click the "Logs" tab** to watch the build progress
76
+ 3. **Wait 5-15 minutes** for:
77
+ - Docker image build
78
+ - Dependency installation (PyTorch, Flask, etc.)
79
+ - Model loading
80
+
81
+ ## Step 5: Test Your Space
82
+
83
+ Once the build completes:
84
+
85
+ 1. **Visit your Space URL** (e.g., `https://huggingface.co/spaces/YOUR_USERNAME/neural-pong`)
86
+ 2. **You should see:**
87
+ - Loading spinner while model loads
88
+ - Controls for FPS and diffusion steps
89
+ - "Start Stream" button
90
+ 3. **Test the game:**
91
+ - Click "Start Stream"
92
+ - Use Arrow Keys or WASD to control paddle
93
+ - Verify frames are generating
94
+
95
+ ## Troubleshooting
96
+
97
+ ### Build Fails?
98
+
99
+ **Check the Logs tab for errors:**
100
+
101
+ - **Missing dependencies?** → Check `requirements.txt`
102
+ - **Checkpoint not found?** → Verify path in `configs/inference.yaml`
103
+ - **GPU errors?** → Ensure GPU is enabled in Space settings
104
+ - **Port errors?** → Should use port 7860 automatically
105
+
106
+ ### Model Won't Load?
107
+
108
+ 1. **Verify checkpoint path** in `configs/inference.yaml`:
109
+ ```yaml
110
+ checkpoint: "checkpoints/ckpt-step=053700-metric=0.00092727.pt"
111
+ ```
112
+
113
+ 2. **Check checkpoint exists:**
114
+ ```bash
115
+ ls -lh checkpoints/ckpt-step=053700-metric=0.00092727.pt
116
+ ```
117
+
118
+ 3. **Look for errors** in the Logs tab
119
+
120
+ ### Out of Memory?
121
+
122
+ - **Upgrade GPU** in Space settings (T4 medium or larger)
123
+ - **Reduce batch size** if applicable
124
+ - **Check checkpoint size** (225MB is reasonable)
125
+
126
+ ## Testing Locally (Optional)
127
+
128
+ Before deploying, you can test locally with Docker:
129
+
130
+ ```bash
131
+ cd /share/u/wendler/code/toy-wm-hf-space
132
+
133
+ # Build Docker image
134
+ docker build -t neural-pong .
135
+
136
+ # Run container
137
+ docker run -p 7860:7860 --gpus all neural-pong
138
+
139
+ # Visit http://localhost:7860
140
+ ```
141
+
142
+ **Note:** Requires Docker and NVIDIA Docker runtime for GPU support.
143
+
144
+ ## Updating Your Space
145
+
146
+ After making changes:
147
+
148
+ ```bash
149
+ git add .
150
+ git commit -m "Your update message"
151
+ git push origin main
152
+ ```
153
+
154
+ Hugging Face will automatically rebuild and redeploy.
155
+
156
+ ## File Checklist
157
+
158
+ Before pushing, verify:
159
+
160
+ - ✅ `app.py` exists and is executable
161
+ - ✅ `Dockerfile` exists
162
+ - ✅ `requirements.txt` has all dependencies
163
+ - ✅ `checkpoints/ckpt-step=053700-metric=0.00092727.pt` exists (225MB)
164
+ - ✅ `configs/inference.yaml` has correct checkpoint path
165
+ - ✅ `static/index.html` exists
166
+ - ✅ `src/` directory has all necessary files
167
+
168
+ ## Quick Reference
169
+
170
+ **Your Space URL format:**
171
+ ```
172
+ https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
173
+ ```
174
+
175
+ **Git remote format:**
176
+ ```bash
177
+ git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
178
+ ```
179
+
180
+ **Key files:**
181
+ - `app.py` - Main application (port 7860)
182
+ - `Dockerfile` - Container config
183
+ - `requirements.txt` - Dependencies
184
+ - `configs/inference.yaml` - Model config
185
+
186
+ ## Next Steps After Deployment
187
+
188
+ 1. **Share your Space** with others
189
+ 2. **Monitor usage** in Space analytics
190
+ 3. **Update as needed** by pushing new commits
191
+ 4. **Consider adding:**
192
+ - Better error handling
193
+ - Performance metrics
194
+ - More configuration options
195
+
196
+ ---
197
+
198
+ **Ready to deploy?** Follow Step 1 above! 🚀
199
+
200
+ For more details, see `QUICKSTART.md` or `DEPLOYMENT.md`.
201
+
SOURCE_FILES.md ADDED
@@ -0,0 +1,63 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Source Files Included in Hugging Face Space
2
+
3
+ This document lists all the source files included in the deployment. Only inference-related code is included - all training code has been removed.
4
+
5
+ ## File Structure
6
+
7
+ ```
8
+ src/
9
+ ├── __init__.py # Package init
10
+ ├── config.py # Configuration classes (Config, TransformerConfig, etc.)
11
+ ├── datasets/
12
+ │ ├── __init__.py # Datasets package init
13
+ │ └── pong1m.py # Dataset utilities (only fixed2frame used)
14
+ ├── inference/
15
+ │ ├── __init__.py # Inference package init
16
+ │ └── sampling.py # Diffusion sampling function
17
+ ├── models/
18
+ │ ├── __init__.py # Models package init
19
+ │ └── dit_dforce.py # CausalDit model and get_model function
20
+ ├── nn/
21
+ │ ├── __init__.py # NN package init
22
+ │ ├── attn.py # Attention mechanisms and KVCache
23
+ │ ├── geglu.py # GEGLU activation
24
+ │ ├── patch.py # Patch/UnPatch for image tokens
25
+ │ └── pe.py # Positional encodings (RoPE, FrameRoPE, etc.)
26
+ └── utils/
27
+ ├── __init__.py # Utils package init
28
+ └── checkpoint.py # Model loading utilities
29
+ ```
30
+
31
+ ## Total: 15 Python files
32
+
33
+ ## Files Removed (Training Code)
34
+
35
+ - ❌ `src/main.py` - Training script
36
+ - ❌ `src/main_dmd.py` - Training script
37
+ - ❌ `src/trainers/` - All training code (5 files)
38
+ - ❌ `src/datasets/pong1m_embedding.py` - Not used
39
+ - ❌ `src/datasets/pong1m_gpt.py` - Not used
40
+ - ❌ `src/models/dit.py` - Alternative model (not used)
41
+ - ❌ `src/nn/norm.py` - Custom norm (not used, PyTorch LayerNorm used instead)
42
+ - ❌ `src/utils/logging.py` - Logging utilities (not needed for inference)
43
+ - ❌ `src/config/` - Empty directory
44
+
45
+ ## Dependencies Removed
46
+
47
+ - ✅ Removed `matplotlib` imports (not needed for inference)
48
+ - ✅ Removed `muon-optimizer` dependency (only used in training)
49
+ - ✅ Removed training data file dependencies
50
+
51
+ ## Verification
52
+
53
+ All necessary classes and functions are included:
54
+ - ✅ `load_model_from_config` - Model loading
55
+ - ✅ `sample` - Diffusion sampling
56
+ - ✅ `fixed2frame` - Frame conversion
57
+ - ✅ `Config` - Configuration parsing
58
+ - ✅ `CausalDit` - Model class
59
+ - ✅ `KVCache` - KV caching for inference
60
+ - ✅ All NN components (Attention, GEGLU, Patch, Positional Encodings)
61
+
62
+ The codebase is now minimal and contains only what's needed for inference.
63
+
START_HERE.md ADDED
@@ -0,0 +1,78 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🚀 Your Hugging Face Space is Ready!
2
+
3
+ Everything is set up and ready to deploy. Here's what you need to do:
4
+
5
+ ## Quick Start (3 Steps)
6
+
7
+ ### 1. Create Your Space
8
+ - Go to: https://huggingface.co/spaces
9
+ - Click "Create new Space"
10
+ - Name: `neural-pong`
11
+ - SDK: **Docker** ⚠️
12
+ - Hardware: **GPU (T4 small)**
13
+
14
+ ### 2. Push Your Code
15
+ ```bash
16
+ cd /share/u/wendler/code/toy-wm-hf-space
17
+
18
+ git init
19
+ git add .
20
+ git commit -m "Initial commit"
21
+
22
+ # Replace with your actual Space URL
23
+ git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
24
+ git push -u origin main
25
+ ```
26
+
27
+ ### 3. Wait & Test
28
+ - Watch build in "Logs" tab (5-15 min)
29
+ - Visit your Space URL when done
30
+ - Click "Start Stream" and play!
31
+
32
+ ## ✅ What's Ready
33
+
34
+ - ✅ **app.py** - Flask app (port 7860, no user limits)
35
+ - ✅ **Dockerfile** - Container config
36
+ - ✅ **requirements.txt** - All dependencies
37
+ - ✅ **checkpoints/** - Model file (225MB)
38
+ - ✅ **configs/inference.yaml** - Config (checkpoint path correct)
39
+ - ✅ **static/index.html** - Frontend
40
+ - ✅ **src/** - All source code (15 files, cleaned)
41
+
42
+ ## 📚 Documentation
43
+
44
+ - **SETUP_STEPS.md** - Detailed step-by-step guide
45
+ - **QUICKSTART.md** - Quick reference
46
+ - **DEPLOYMENT.md** - Technical details
47
+ - **README.md** - Space description
48
+
49
+ ## 🔍 Verify Before Pushing
50
+
51
+ ```bash
52
+ # Check checkpoint exists
53
+ ls -lh checkpoints/ckpt-step=053700-metric=0.00092727.pt
54
+
55
+ # Check config path
56
+ grep checkpoint configs/inference.yaml
57
+
58
+ # Check main files
59
+ ls app.py Dockerfile requirements.txt
60
+ ```
61
+
62
+ ## 💡 Tips
63
+
64
+ - **Large file upload:** The checkpoint is 225MB, Git is recommended
65
+ - **Build time:** 5-15 minutes (PyTorch installation)
66
+ - **GPU required:** Make sure GPU is enabled in Space settings
67
+ - **Port:** Automatically uses 7860 (HF Spaces default)
68
+
69
+ ## 🆘 Need Help?
70
+
71
+ 1. Check **SETUP_STEPS.md** for detailed instructions
72
+ 2. Check Space **Logs** tab for build errors
73
+ 3. Verify all files are present (see checklist above)
74
+
75
+ ---
76
+
77
+ **Ready?** Start with Step 1 above! 🎮
78
+
TROUBLESHOOTING.md ADDED
@@ -0,0 +1,90 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Troubleshooting: Repository Not Found
2
+
3
+ The error "Repository not found" usually means one of these issues:
4
+
5
+ ## Issue 1: Space Doesn't Exist Yet
6
+
7
+ **You need to create the Space on Hugging Face first!**
8
+
9
+ 1. Go to: https://huggingface.co/spaces
10
+ 2. Click "Create new Space"
11
+ 3. Fill in:
12
+ - **Space name:** `pong` (or your choice)
13
+ - **SDK:** Docker
14
+ - **Hardware:** GPU (T4 small)
15
+ 4. Click "Create Space"
16
+
17
+ **Then** come back and push your code.
18
+
19
+ ## Issue 2: Wrong Remote URL
20
+
21
+ Your remote is currently set to SSH: `git@hf.co:spaces/wendlerc/pong`
22
+
23
+ **For Hugging Face Spaces, use HTTPS instead:**
24
+
25
+ ```bash
26
+ cd /share/u/wendler/code/toy-wm-hf-space
27
+
28
+ # Remove the old remote
29
+ git remote remove origin
30
+
31
+ # Add the correct HTTPS remote
32
+ git remote add origin https://huggingface.co/spaces/wendlerc/pong
33
+
34
+ # Now try pushing
35
+ git push -u origin main
36
+ ```
37
+
38
+ ## Issue 3: Wrong Space Name
39
+
40
+ Make sure the Space name matches exactly. If you created it with a different name, update the URL:
41
+
42
+ ```bash
43
+ # Check what remote you have
44
+ git remote -v
45
+
46
+ # Update to correct URL (replace with your actual Space name)
47
+ git remote set-url origin https://huggingface.co/spaces/wendlerc/YOUR_SPACE_NAME
48
+ ```
49
+
50
+ ## Quick Fix Commands
51
+
52
+ ```bash
53
+ cd /share/u/wendler/code/toy-wm-hf-space
54
+
55
+ # 1. Check current remote
56
+ git remote -v
57
+
58
+ # 2. Remove old remote
59
+ git remote remove origin
60
+
61
+ # 3. Add HTTPS remote (replace wendlerc/pong with your actual Space)
62
+ git remote add origin https://huggingface.co/spaces/wendlerc/pong
63
+
64
+ # 4. Verify remote
65
+ git remote -v
66
+
67
+ # 5. Push
68
+ git push -u origin main
69
+ ```
70
+
71
+ ## Verify Your Space Exists
72
+
73
+ 1. Go to: https://huggingface.co/spaces/wendlerc
74
+ 2. Check if `pong` Space exists
75
+ 3. If not, create it first!
76
+
77
+ ## Alternative: Use Hugging Face CLI
78
+
79
+ If you have `huggingface-cli` installed:
80
+
81
+ ```bash
82
+ # Login
83
+ huggingface-cli login
84
+
85
+ # Create Space (if it doesn't exist)
86
+ huggingface-cli repo create wendlerc/pong --type space --sdk docker
87
+ ```
88
+
89
+ Then push your code.
90
+
app.py ADDED
@@ -0,0 +1,480 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Pong backend (GPU, eager) for Hugging Face Spaces.
4
+ Broadcasts readiness via Socket.IO so the frontend can auto-hide a loading overlay once the model is ready.
5
+ """
6
+
7
+ # Eventlet must be imported first and monkey-patched before other imports
8
+ import eventlet
9
+ eventlet.monkey_patch()
10
+
11
+ import sys
12
+ import os
13
+ import time
14
+ import threading
15
+ import base64
16
+ import traceback
17
+ from contextlib import contextmanager
18
+ from io import BytesIO
19
+
20
+ import torch as t
21
+ import torch._dynamo as _dynamo
22
+ import numpy as np
23
+ from PIL import Image
24
+ from flask import Flask, request, jsonify, send_from_directory
25
+ from flask_cors import CORS
26
+ from flask_socketio import SocketIO, emit
27
+
28
+ # --------------------------
29
+ # Project imports
30
+ # --------------------------
31
+ project_root = os.path.dirname(os.path.abspath(__file__))
32
+ if project_root not in sys.path:
33
+ sys.path.insert(0, project_root)
34
+
35
+ from src.utils.checkpoint import load_model_from_config
36
+ from src.inference.sampling import sample
37
+ from src.datasets.pong1m import fixed2frame
38
+ from src.config import Config
39
+
40
+ # --------------------------
41
+ # App setup
42
+ # --------------------------
43
+ app = Flask(__name__, static_folder='static')
44
+ CORS(app)
45
+ # Configure SocketIO - use eventlet for proper WebSocket support
46
+ socketio = SocketIO(
47
+ app,
48
+ cors_allowed_origins="*",
49
+ async_mode='eventlet',
50
+ logger=False,
51
+ engineio_logger=False,
52
+ ping_timeout=60,
53
+ ping_interval=25,
54
+ max_http_buffer_size=1e8 # Allow larger messages
55
+ )
56
+
57
+ # --------------------------
58
+ # Globals
59
+ # --------------------------
60
+ model = None
61
+ pred2frame = None
62
+ device = None
63
+
64
+ server_ready = False # <--- readiness flag
65
+
66
+ stream_lock = threading.Lock()
67
+ stream_thread = None
68
+ stream_running = False
69
+ latest_action = 1 # 0=init, 1=nothing, 2=up, 3=down
70
+ target_fps = 30
71
+ frame_index = 0
72
+
73
+ noise_buf = None # (1,1,3,24,24) on GPU
74
+ action_buf = None # (1,1) long on GPU
75
+ cpu_png_buffer = None # BytesIO; reused
76
+
77
+ step_once = None
78
+
79
+ # --------------------------
80
+ # Perf (new API)
81
+ # --------------------------
82
+ t.backends.cudnn.benchmark = True
83
+ t.backends.cudnn.conv.fp32_precision = "tf32"
84
+ t.backends.cuda.matmul.fp32_precision = "high"
85
+
86
+ # --------------------------
87
+ # Debug helpers
88
+ # --------------------------
89
+ def _shape(x):
90
+ try:
91
+ return f"{tuple(x.shape)} | {x.dtype} | {x.device}"
92
+ except Exception:
93
+ return "<?>"
94
+
95
+ def _shape_attr(obj, name):
96
+ try:
97
+ ten = getattr(obj, name, None)
98
+ return None if ten is None else _shape(ten)
99
+ except Exception:
100
+ return None
101
+
102
+ def _fail(msg, extra=None):
103
+ lines = [f"[GEN ERROR] {msg}"]
104
+ if extra:
105
+ for k, v in extra.items():
106
+ lines.append(f" - {k}: {v}")
107
+ raise RuntimeError("\n".join(lines))
108
+
109
+ @contextmanager
110
+ def log_step_debug(action_tensor=None, noise_tensor=None):
111
+ try:
112
+ yield
113
+ except Exception as e:
114
+ tb = traceback.format_exc(limit=6)
115
+ _fail("Step failed",
116
+ extra={
117
+ "action": _shape(action_tensor),
118
+ "noise": _shape(noise_tensor),
119
+ "model.device": str(device),
120
+ "cache.keys": _shape_attr(getattr(model, "cache", None), "keys"),
121
+ "cache.values": _shape_attr(getattr(model, "cache", None), "values"),
122
+ "frame_index": str(frame_index),
123
+ "exception": f"{type(e).__name__}: {e}",
124
+ "trace": tb.strip()
125
+ })
126
+
127
+ # --------------------------
128
+ # Utilities
129
+ # --------------------------
130
+ def _ensure_cuda():
131
+ if not t.cuda.is_available():
132
+ raise RuntimeError("CUDA GPU required; torch.cuda.is_available() is False.")
133
+ return t.device("cuda:0")
134
+
135
+ def _png_base64_from_uint8(frame_uint8) -> str:
136
+ global cpu_png_buffer
137
+ if cpu_png_buffer is None:
138
+ cpu_png_buffer = BytesIO()
139
+ else:
140
+ cpu_png_buffer.seek(0)
141
+ cpu_png_buffer.truncate(0)
142
+ Image.fromarray(frame_uint8).save(cpu_png_buffer, format="PNG")
143
+ return base64.b64encode(cpu_png_buffer.getvalue()).decode()
144
+
145
+ def _reset_cache_fresh():
146
+ model.cache.reset()
147
+
148
+ def _broadcast_ready():
149
+ """Tell all clients whether the server is ready."""
150
+ socketio.emit('server_status', {'ready': server_ready, 'busy': False})
151
+
152
+ # --------------------------
153
+ # Model init (pure eager) & warmup
154
+ # --------------------------
155
+ def initialize_model():
156
+ global model, pred2frame, device
157
+ global noise_buf, action_buf, step_once, server_ready
158
+
159
+ t_start = time.time()
160
+ print("Loading model and preparing GPU runtime...")
161
+ device = _ensure_cuda()
162
+
163
+ config_path = os.path.join(project_root, "configs/inference.yaml")
164
+
165
+ cfg = Config.from_yaml(config_path)
166
+ checkpoint_path = cfg.model.checkpoint
167
+
168
+ model = load_model_from_config(config_path, checkpoint_path=checkpoint_path, strict=False)
169
+ model.to(device) # Move model to GPU before activating cache
170
+ model.eval()
171
+
172
+ model.activate_caching(1, 300) # Cache will now be created on the same device as model
173
+
174
+ # Use fixed2frame directly instead of get_loader to avoid loading data files
175
+ globals()["pred2frame"] = fixed2frame
176
+
177
+ H = W = 24
178
+ noise_buf = t.empty((1, 1, 3, H, W), device=device)
179
+ action_buf = t.empty((1, 1), dtype=t.long, device=device)
180
+
181
+ @_dynamo.disable
182
+ def _step(model_, action_scalar_long: int, n_steps: int, cfg: float, clamp: bool):
183
+ # Match the notebook logic exactly: create fresh noise each time
184
+ noise = t.randn(1, 1, 3, 24, 24, device=device)
185
+ action_buf.fill_(int(action_scalar_long))
186
+
187
+ assert action_buf.shape == (1, 1) and action_buf.dtype == t.long and action_buf.device == device, \
188
+ f"action_buf wrong: { _shape(action_buf) }"
189
+ assert noise.shape == (1, 1, 3, 24, 24) and noise.device == device, \
190
+ f"noise wrong: { _shape(noise) }"
191
+
192
+ # Debug: Check cache state before sampling
193
+ if model_.cache is not None:
194
+ cache_loc = model_.cache.local_location
195
+ if cache_loc == 0:
196
+ # Cache is empty, this should be fine for the first frame
197
+ pass
198
+ elif cache_loc > 0:
199
+ # Check if cache has valid data
200
+ k_test, v_test = model_.cache.get(0)
201
+ if k_test.shape[1] == 0:
202
+ print(f"Warning: Cache returned empty tensors at frame {frame_index}, resetting...")
203
+ _reset_cache_fresh()
204
+
205
+ # Sample with the fresh noise (matching notebook: sample(model, noise, actions[:, aidx:aidx+1], ...))
206
+ z = sample(model_, noise, action_buf, num_steps=n_steps, cfg=cfg, negative_actions=None)
207
+
208
+ # Update cache location after sample (matching notebook: model.cache.update_global_location(1))
209
+ model_.cache.update_global_location(1)
210
+
211
+ if clamp:
212
+ z = t.clamp(z, -1, 1)
213
+ return z
214
+
215
+ globals()["step_once"] = _step
216
+ print("Mode: eager (no torch.compile)")
217
+
218
+ # Warmup
219
+ _reset_cache_fresh()
220
+ with t.inference_mode(), t.autocast(device_type="cuda", dtype=t.bfloat16):
221
+ for _ in range(4):
222
+ with log_step_debug(action_tensor=action_buf, noise_tensor=noise_buf):
223
+ _ = step_once(model, action_scalar_long=1, n_steps=4, cfg=0.0, clamp=True)
224
+
225
+ server_ready = True
226
+ print(f"Model ready on {device}")
227
+ _broadcast_ready()
228
+ return model, pred2frame
229
+
230
+ # --------------------------
231
+ # Fixed-FPS streaming worker
232
+ # --------------------------
233
+ class FrameScheduler(threading.Thread):
234
+ def __init__(self, fps=30, n_steps=8, cfg=0.0, clamp=True):
235
+ super().__init__(daemon=True)
236
+ self.frame_period = 1.0 / max(1, int(fps))
237
+ self.n_steps = int(n_steps)
238
+ self.cfg = float(cfg)
239
+ self.clamp = bool(clamp)
240
+ self._stop = threading.Event()
241
+ # FPS tracking
242
+ self.frame_times = []
243
+ self.last_frame_time = None
244
+
245
+ def stop(self):
246
+ self._stop.set()
247
+
248
+ def run(self):
249
+ global frame_index, latest_action
250
+ next_tick = time.perf_counter()
251
+ while not self._stop.is_set():
252
+ start = time.perf_counter()
253
+ if start - next_tick > self.frame_period * 0.75:
254
+ next_tick = start + self.frame_period
255
+ continue
256
+ try:
257
+ with stream_lock:
258
+ action = int(latest_action)
259
+ with t.inference_mode(), t.autocast(device_type="cuda", dtype=t.bfloat16):
260
+ with log_step_debug(action_tensor=action_buf, noise_tensor=noise_buf):
261
+ z = step_once(model, action_scalar_long=action,
262
+ n_steps=self.n_steps, cfg=self.cfg, clamp=self.clamp)
263
+ frames_btchw = pred2frame(z)
264
+ # Debug: check what pred2frame returns
265
+ if frame_index < 3:
266
+ print(f"Frame {frame_index}: z range [{z.min().item():.3f}, {z.max().item():.3f}], "
267
+ f"frames_btchw dtype={frames_btchw.dtype}, range [{frames_btchw.min().item()}, {frames_btchw.max().item()}]")
268
+
269
+ frame_arr = frames_btchw[0, 0].permute(1, 2, 0).contiguous()
270
+ if isinstance(frame_arr, t.Tensor):
271
+ frame_np = frame_arr.to("cpu", non_blocking=True).numpy()
272
+ else:
273
+ frame_np = frame_arr.astype(np.uint8, copy=False)
274
+ img_b64 = _png_base64_from_uint8(frame_np)
275
+
276
+ # Calculate achieved FPS
277
+ current_time = time.perf_counter()
278
+ if self.last_frame_time is not None:
279
+ frame_delta = current_time - self.last_frame_time
280
+ self.frame_times.append(frame_delta)
281
+ # Keep only last 30 frames for moving average
282
+ if len(self.frame_times) > 30:
283
+ self.frame_times.pop(0)
284
+ avg_frame_time = sum(self.frame_times) / len(self.frame_times)
285
+ achieved_fps = 1.0 / avg_frame_time if avg_frame_time > 0 else 0
286
+ else:
287
+ achieved_fps = 0
288
+ self.last_frame_time = current_time
289
+
290
+ socketio.emit('frame', {'frame': img_b64,
291
+ 'frame_index': frame_index,
292
+ 'action': action,
293
+ 'fps': achieved_fps})
294
+ frame_index += 1
295
+ except Exception as e:
296
+ print("Generation error:", repr(e))
297
+ socketio.emit('error', {'message': str(e)})
298
+ next_tick += self.frame_period
299
+ now = time.perf_counter()
300
+ sleep_for = next_tick - now
301
+ if sleep_for > 0:
302
+ time.sleep(sleep_for)
303
+
304
+ # --------------------------
305
+ # Routes
306
+ # --------------------------
307
+ @app.route('/')
308
+ def index():
309
+ return send_from_directory('static', 'index.html')
310
+
311
+ @app.errorhandler(500)
312
+ def handle_500(e):
313
+ """Handle WSGI errors gracefully"""
314
+ import traceback
315
+ print(f"Flask error handler caught: {e}")
316
+ traceback.print_exc()
317
+ return jsonify({'error': 'Internal server error'}), 500
318
+
319
+ @app.route('/api/health', methods=['GET'])
320
+ def health():
321
+ return jsonify({
322
+ 'status': 'ok',
323
+ 'ready': server_ready,
324
+ 'model_loaded': model is not None,
325
+ 'device': str(device) if device else None,
326
+ 'stream_running': stream_running,
327
+ 'target_fps': target_fps
328
+ })
329
+
330
+ @app.route('/api/generate', methods=['POST'])
331
+ def generate_frames():
332
+ try:
333
+ if not server_ready:
334
+ return jsonify({'success': False, 'error': 'Server not ready'}), 503
335
+
336
+ data = request.json or {}
337
+ actions_list = data.get('actions', [1])
338
+ n_steps = int(data.get('n_steps', 8))
339
+ cfg = float(data.get('cfg', 0))
340
+ clamp = bool(data.get('clamp', True))
341
+
342
+ if len(actions_list) == 0 or actions_list[0] != 0:
343
+ actions_list = [0] + actions_list
344
+
345
+ _reset_cache_fresh()
346
+
347
+ frames_png = []
348
+ with t.inference_mode(), t.autocast(device_type="cuda", dtype=t.bfloat16):
349
+ for a in actions_list:
350
+ with log_step_debug(action_tensor=action_buf, noise_tensor=noise_buf):
351
+ z = step_once(model, action_scalar_long=int(a), n_steps=n_steps, cfg=cfg, clamp=clamp)
352
+ f_btchw = pred2frame(z)
353
+ f_arr = f_btchw[0, 0].permute(1, 2, 0).contiguous()
354
+ if isinstance(f_arr, t.Tensor):
355
+ if f_arr.dtype != t.uint8:
356
+ f_arr = f_arr.to(t.uint8)
357
+ f_np = f_arr.to("cpu", non_blocking=True).numpy()
358
+ else:
359
+ f_np = f_arr.astype(np.uint8, copy=False)
360
+ frames_png.append(_png_base64_from_uint8(f_np))
361
+
362
+ return jsonify({'success': True, 'frames': frames_png, 'num_frames': len(frames_png)})
363
+
364
+ except Exception as e:
365
+ print("Batch generation error:", repr(e))
366
+ return jsonify({'success': False, 'error': str(e)}), 500
367
+
368
+ # --------------------------
369
+ # Socket events & helpers
370
+ # --------------------------
371
+ def start_stream(n_steps=8, cfg=0.0, fps=30, clamp=True):
372
+ global stream_thread, stream_running, frame_index, target_fps, latest_action
373
+ if not server_ready:
374
+ _broadcast_ready()
375
+ raise RuntimeError("Server not ready")
376
+ with stream_lock:
377
+ stop_stream()
378
+ target_fps = int(fps)
379
+ frame_index = 0
380
+ _reset_cache_fresh()
381
+ latest_action = 0 # first action = 0 (init)
382
+ stream_thread = FrameScheduler(fps=target_fps, n_steps=n_steps, cfg=cfg, clamp=clamp)
383
+ stream_running = True
384
+ stream_thread.start()
385
+
386
+ def stop_stream():
387
+ global stream_thread, stream_running
388
+ if stream_thread is not None:
389
+ stream_thread.stop()
390
+ stream_thread.join(timeout=1.0)
391
+ stream_thread = None
392
+ stream_running = False
393
+
394
+ @socketio.on_error_default
395
+ def default_error_handler(e):
396
+ print(f"SocketIO error: {e}")
397
+ import traceback
398
+ traceback.print_exc()
399
+
400
+ @socketio.on('connect')
401
+ def handle_connect():
402
+ try:
403
+ sid = request.sid
404
+ print(f'Client connected: {sid}')
405
+
406
+ # Immediately tell the new client current readiness
407
+ emit('server_status', {
408
+ 'ready': server_ready,
409
+ 'busy': False
410
+ })
411
+ emit('connected', {
412
+ 'status': 'connected',
413
+ 'model_loaded': model is not None,
414
+ 'ready': server_ready
415
+ })
416
+ except Exception as e:
417
+ print(f"Error in handle_connect: {e}")
418
+ import traceback
419
+ traceback.print_exc()
420
+
421
+ @socketio.on('disconnect')
422
+ def handle_disconnect(*args):
423
+ sid = request.sid
424
+ print(f'Client disconnected: {sid}')
425
+ # Note: We don't stop the stream on disconnect since multiple users can be connected
426
+
427
+ @socketio.on('start_stream')
428
+ def handle_start_stream(data):
429
+ try:
430
+ if not server_ready:
431
+ # Tell client to keep showing spinner
432
+ emit('server_status', {'ready': server_ready, 'busy': False})
433
+ return
434
+
435
+ n_steps = int(data.get('n_steps', 8))
436
+ cfg = float(data.get('cfg', 0))
437
+ fps = int(data.get('fps', 30))
438
+ clamp = bool(data.get('clamp', True))
439
+ print(f"Starting stream @ {fps} FPS (n_steps={n_steps}, cfg={cfg}, clamp={clamp})")
440
+ try:
441
+ start_stream(n_steps=n_steps, cfg=cfg, fps=fps, clamp=clamp)
442
+ emit('stream_started', {'status': 'ok'})
443
+ except Exception as e:
444
+ print(f"Error starting stream: {e}")
445
+ import traceback
446
+ traceback.print_exc()
447
+ emit('error', {'message': str(e)})
448
+ except Exception as e:
449
+ print(f"Error in handle_start_stream: {e}")
450
+ import traceback
451
+ traceback.print_exc()
452
+ emit('error', {'message': f'Failed to start stream: {str(e)}'})
453
+
454
+ @socketio.on('action')
455
+ def handle_action(data):
456
+ global latest_action
457
+ action = int(data.get('action', 1))
458
+ with stream_lock:
459
+ latest_action = action
460
+ emit('action_ack', {'received': action, 'will_apply_to_frame_index': frame_index})
461
+
462
+ @socketio.on('stop_stream')
463
+ def handle_stop_stream():
464
+ print('Stopping stream')
465
+ stop_stream()
466
+
467
+ # --------------------------
468
+ # Entrypoint
469
+ # --------------------------
470
+ if __name__ == '__main__':
471
+ # Start model initialization in background thread so server starts immediately
472
+ init_thread = threading.Thread(target=initialize_model, daemon=True)
473
+ init_thread.start()
474
+
475
+ # Use PORT environment variable for Hugging Face Spaces, default to 7860
476
+ port = int(os.environ.get('PORT', 7860))
477
+ print(f"Starting Flask server on http://0.0.0.0:{port}")
478
+ print("Model will load in background...")
479
+ socketio.run(app, host='0.0.0.0', port=port, debug=False, allow_unsafe_werkzeug=True, use_reloader=False)
480
+
checkpoints/ckpt-step=053700-metric=0.00092727.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f3813cf639d5370bb90be4bc3974de5b6858a9cb4216458f757c0d415537d0d6
3
+ size 235359093
cleanup.sh ADDED
@@ -0,0 +1,40 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # Cleanup script - removes temporary files and scripts from toy-wm-hf-space
3
+
4
+ set -e
5
+
6
+ echo "🧹 Cleaning up temporary files..."
7
+ echo ""
8
+
9
+ CLEANUP_DIR="/share/u/wendler/code/toy-wm-hf-space"
10
+
11
+ if [ ! -d "$CLEANUP_DIR" ]; then
12
+ echo "⚠️ Directory $CLEANUP_DIR doesn't exist, skipping cleanup"
13
+ exit 0
14
+ fi
15
+
16
+ cd "$CLEANUP_DIR"
17
+
18
+ echo "Removing temporary scripts..."
19
+ rm -f push.sh push-now.sh push-force.sh setup-git.sh setup.sh fix-remote.sh fix-and-push.sh push-to-hf.py 2>/dev/null || true
20
+
21
+ echo "Removing temporary documentation files..."
22
+ rm -f RUN_SETUP.md TROUBLESHOOTING.md SETUP_STEPS.md SETUP_GUIDE.md QUICKSTART.md START_HERE.md 2>/dev/null || true
23
+
24
+ echo "Keeping essential files:"
25
+ echo " ✅ app.py"
26
+ echo " ✅ Dockerfile"
27
+ echo " ✅ requirements.txt"
28
+ echo " ✅ README.md"
29
+ echo " ✅ DEPLOYMENT.md"
30
+ echo " ✅ SOURCE_FILES.md"
31
+ echo " ✅ .gitignore"
32
+ echo " ✅ All source code and checkpoints"
33
+
34
+ echo ""
35
+ echo "✅ Cleanup complete!"
36
+ echo ""
37
+ echo "The toy-wm-hf-space directory still contains your files"
38
+ echo "but temporary scripts have been removed."
39
+
40
+
configs/inference.yaml ADDED
@@ -0,0 +1,50 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ model:
2
+ model_id: "dit_dforce"
3
+ width: 24
4
+ height: 24
5
+ T: 1000
6
+ in_channels: 3
7
+ n_window: 30
8
+ patch_size: 3
9
+ n_heads: 12
10
+ d_model: 384
11
+ n_blocks: 8
12
+ C: 5000
13
+ bidirectional: false
14
+ nocompile: false
15
+ checkpoint: "checkpoints/ckpt-step=053700-metric=0.00092727.pt"
16
+ # "experiments/dulcet-disco-547/ckpt-step=000800-metric=0.00384521.pt"
17
+ #"experiments/radiant-forest-398/ckpt-step=053700-metric=0.00092727.pt"
18
+ # "experiments/polished-paper-531/model.pt"
19
+ #"experiments/polished-paper-531/ckpt-step=000200-metric=0.00251065.pt"
20
+ #"experiments/polished-paper-531/ckpt-step=000800-metric=0.00450636.pt"
21
+ #checkpoint: "experiments/radiant-forest-398/ckpt-step=053700-metric=0.00092727.pt"
22
+ # few frame 2-step distilled model experiments/smart-waterfall-528/model.pt
23
+ # few frame 1-step distilled model experiments/blooming-flower-530/model.pt
24
+ #"experiments/rich-meadow-488/model.pt"
25
+ # "experiments/rich-meadow-488/ckpt-step=003700-metric=0.00309512.pt"
26
+ #checkpoint: "experiments/radiant-forest-398/ckpt-step=053700-metric=0.00092727.pt"
27
+ #checkpoint: "experiments/glad-water-486/model.pt"
28
+ #checkpoint: "experiments/dutiful-river-427/ckpt-step=011600-metric=0.00229805.pt"
29
+ #checkpoint: "experiments/iconic-paper-421/ckpt-step=001600-metric=0.00471355.pt"
30
+ #"experiments/radiant-forest-398/ckpt-step=053700-metric=0.00092727.pt"
31
+ #checkpoint: "experiments/frosty-sunset-395/ckpt-step=002100-metric=0.00160125.pt"
32
+ #checkpoint: "experiments/vivid-sea-390/ckpt-step=000700-metric=0.01958773.pt"
33
+
34
+ train:
35
+ lr1: 0.0002
36
+ lr2: 1.5e-6
37
+ betas: [0.9, 0.95]
38
+ weight_decay: 1.0e-5
39
+ max_steps: 20000
40
+ batch_size: 16
41
+ noclip: false
42
+ duration: 1
43
+ fps: 31
44
+ debug: false
45
+ p_pretrain: 0.95
46
+
47
+ wandb:
48
+ name: null
49
+ project: "toy-wm"
50
+ run_name: "causal-layers8-heads12-d384"
push-and-cleanup.sh ADDED
@@ -0,0 +1,69 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # Complete script: Push from pong directory and clean up
3
+
4
+ set -e
5
+
6
+ echo "🚀 Neural Pong - Complete Push and Cleanup"
7
+ echo "==========================================="
8
+ echo ""
9
+
10
+ # Step 1: Push from pong directory
11
+ echo "Step 1: Pushing from /share/u/wendler/code/pong"
12
+ echo "-----------------------------------------------"
13
+ cd /share/u/wendler/code/pong
14
+
15
+ if [ ! -d ".git" ]; then
16
+ echo "❌ Error: Not a git repository in pong directory"
17
+ exit 1
18
+ fi
19
+
20
+ # Stage and commit
21
+ git add .
22
+ if ! git diff --cached --quiet; then
23
+ git commit -m "Add Neural Pong application files" || git commit --amend --no-edit
24
+ fi
25
+
26
+ # Push
27
+ BRANCH=$(git branch --show-current 2>/dev/null || echo "main")
28
+ echo "Pushing to origin/$BRANCH..."
29
+ if ! git push -u origin $BRANCH 2>&1; then
30
+ echo "Trying force push..."
31
+ git push -u origin $BRANCH --force
32
+ fi
33
+
34
+ echo ""
35
+ echo "✅ Successfully pushed to Hugging Face Spaces!"
36
+ echo ""
37
+
38
+ # Step 2: Cleanup temporary files
39
+ echo "Step 2: Cleaning up temporary files"
40
+ echo "-----------------------------------"
41
+ CLEANUP_DIR="/share/u/wendler/code/toy-wm-hf-space"
42
+
43
+ if [ -d "$CLEANUP_DIR" ]; then
44
+ cd "$CLEANUP_DIR"
45
+
46
+ echo "Removing temporary scripts..."
47
+ rm -f push.sh push-now.sh push-force.sh setup-git.sh setup.sh \
48
+ fix-remote.sh fix-and-push.sh push-to-hf.py 2>/dev/null || true
49
+
50
+ echo "Removing temporary documentation..."
51
+ rm -f RUN_SETUP.md TROUBLESHOOTING.md SETUP_STEPS.md \
52
+ SETUP_GUIDE.md QUICKSTART.md START_HERE.md 2>/dev/null || true
53
+
54
+ echo "✅ Cleanup complete!"
55
+ else
56
+ echo "⚠️ Cleanup directory not found, skipping"
57
+ fi
58
+
59
+ echo ""
60
+ echo "==========================================="
61
+ echo "✅ All done!"
62
+ echo "==========================================="
63
+ echo ""
64
+ echo "🌐 Your Space: https://huggingface.co/spaces/wendlerc/pong"
65
+ echo "📁 Working directory: /share/u/wendler/code/pong"
66
+ echo ""
67
+ echo "The build should start automatically. Check the Logs tab for progress."
68
+
69
+
push.sh ADDED
@@ -0,0 +1,64 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # Push script for /share/u/wendler/code/pong
3
+ # This will push all files to Hugging Face Spaces
4
+
5
+ set -e
6
+
7
+ cd /share/u/wendler/code/pong
8
+
9
+ echo "🚀 Pushing Neural Pong to Hugging Face Spaces..."
10
+ echo ""
11
+
12
+ # Check if we're in a git repo
13
+ if [ ! -d ".git" ]; then
14
+ echo "❌ Error: Not a git repository"
15
+ exit 1
16
+ fi
17
+
18
+ # Stage all files
19
+ echo "📁 Staging files..."
20
+ git add .
21
+
22
+ # Check status
23
+ echo ""
24
+ echo "📋 Files to be committed:"
25
+ git status --short | head -20
26
+
27
+ # Commit changes
28
+ echo ""
29
+ if git diff --cached --quiet; then
30
+ echo "✅ No changes to commit"
31
+ else
32
+ echo "💾 Committing changes..."
33
+ git commit -m "Add Neural Pong application files" || git commit --amend --no-edit
34
+ echo "✅ Changes committed"
35
+ fi
36
+
37
+ # Check remote
38
+ echo ""
39
+ echo "🔗 Checking remote..."
40
+ git remote -v
41
+
42
+ # Check branch
43
+ BRANCH=$(git branch --show-current 2>/dev/null || echo "main")
44
+ echo "🌿 Current branch: $BRANCH"
45
+
46
+ # Push
47
+ echo ""
48
+ echo "📤 Pushing to Hugging Face Spaces..."
49
+ if git push -u origin $BRANCH 2>&1; then
50
+ echo ""
51
+ echo "✅ Successfully pushed!"
52
+ else
53
+ echo ""
54
+ echo "⚠️ Push failed, trying force push..."
55
+ git push -u origin $BRANCH --force
56
+ echo ""
57
+ echo "✅ Force pushed successfully!"
58
+ fi
59
+
60
+ echo ""
61
+ echo "🌐 Your Space is available at:"
62
+ echo " https://huggingface.co/spaces/wendlerc/pong"
63
+ echo ""
64
+ echo "The build should start automatically. Check the Logs tab for progress."
requirements.txt ADDED
@@ -0,0 +1,25 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Core ML framework
2
+ torch>=2.0.0
3
+ torchvision>=0.15.0
4
+
5
+ # Web framework
6
+ flask>=3.1.0
7
+ flask-cors>=6.0.0
8
+ flask-socketio>=5.5.0
9
+ eventlet>=0.40.0
10
+
11
+ # Data processing
12
+ numpy>=1.24.0
13
+ pillow>=10.0.0
14
+ einops>=0.7.0
15
+
16
+ # Configuration
17
+ pyyaml>=6.0
18
+ omegaconf>=2.3.0
19
+
20
+ # Type hints
21
+ jaxtyping>=0.2.0
22
+
23
+ # Hugging Face Hub (for model loading if needed)
24
+ huggingface-hub>=0.20.0
25
+
setup.sh ADDED
@@ -0,0 +1,55 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # Quick setup script for Hugging Face Space deployment
3
+
4
+ set -e
5
+
6
+ echo "🚀 Neural Pong - Hugging Face Space Setup"
7
+ echo "=========================================="
8
+ echo ""
9
+
10
+ # Check if we're in the right directory
11
+ if [ ! -f "app.py" ] || [ ! -f "Dockerfile" ]; then
12
+ echo "❌ Error: Please run this script from the toy-wm-hf-space directory"
13
+ exit 1
14
+ fi
15
+
16
+ # Check if checkpoint exists
17
+ if [ ! -f "checkpoints/ckpt-step=053700-metric=0.00092727.pt" ]; then
18
+ echo "❌ Error: Checkpoint file not found!"
19
+ echo " Expected: checkpoints/ckpt-step=053700-metric=0.00092727.pt"
20
+ exit 1
21
+ fi
22
+
23
+ echo "✅ Checkpoint file found"
24
+ echo "✅ All required files present"
25
+ echo ""
26
+
27
+ # Check if git is initialized
28
+ if [ ! -d ".git" ]; then
29
+ echo "📦 Initializing git repository..."
30
+ git init
31
+ echo "✅ Git initialized"
32
+ else
33
+ echo "✅ Git repository already initialized"
34
+ fi
35
+
36
+ echo ""
37
+ echo "📋 Next steps:"
38
+ echo ""
39
+ echo "1. Create a Hugging Face Space:"
40
+ echo " - Go to https://huggingface.co/spaces"
41
+ echo " - Click 'Create new Space'"
42
+ echo " - Name: neural-pong (or your choice)"
43
+ echo " - SDK: Docker"
44
+ echo " - Hardware: GPU (T4 small or larger)"
45
+ echo ""
46
+ echo "2. Add the remote and push:"
47
+ echo " git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME"
48
+ echo " git add ."
49
+ echo " git commit -m 'Initial commit'"
50
+ echo " git push -u origin main"
51
+ echo ""
52
+ echo "3. Wait for build (5-15 minutes)"
53
+ echo ""
54
+ echo "📖 For detailed instructions, see SETUP_GUIDE.md"
55
+ echo ""
src/__init__.py ADDED
File without changes
src/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (162 Bytes). View file
 
src/config.py ADDED
@@ -0,0 +1,59 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from dataclasses import dataclass, field
2
+ from typing import List, Optional
3
+ import yaml
4
+ from omegaconf import OmegaConf
5
+
6
+ @dataclass
7
+ class TransformerConfig:
8
+ model_id : str = None
9
+ width : int = 24
10
+ height : int = 24
11
+ T : int = 1000
12
+ in_channels : int = 3
13
+ n_window : int = 7
14
+ patch_size : int = 2
15
+ n_heads : int = 4
16
+ d_model : int = 64
17
+ n_blocks : int = 12
18
+ n_heads : int = 12
19
+ d_model : int = 384
20
+ patch_size : int = 1
21
+ bidirectional : bool = True
22
+ nocompile : bool = False
23
+ checkpoint : str = None
24
+
25
+
26
+ @dataclass
27
+ class TrainingConfig:
28
+ lr1 : float = 0.002
29
+ lr2 : float = 3e-5
30
+ betas : tuple = (0.9, 0.95)
31
+ weight_decay : float = 1e-5
32
+ max_steps : int = 26000
33
+ batch_size : int = 32
34
+ noclip : bool = False
35
+ duration : int = 1
36
+ fps : int = 7
37
+ in_channels : int = 3
38
+ debug : bool = False
39
+
40
+
41
+ @dataclass
42
+ class WANDBConfig:
43
+ name : str = "toy-wm"
44
+ project : str = None
45
+ run_name : str = None
46
+
47
+ @dataclass
48
+ class Config:
49
+ model: TransformerConfig
50
+ train: TrainingConfig
51
+ wandb: WANDBConfig
52
+
53
+ @classmethod
54
+ def from_yaml(cls, path):
55
+ with open(path) as f:
56
+ raw_cfg = yaml.safe_load(f)
57
+
58
+ cfg = OmegaConf.create(raw_cfg)
59
+ return OmegaConf.structured(cls(**cfg))
src/datasets/__init__.py ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ # Datasets module
2
+
src/datasets/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (171 Bytes). View file
 
src/datasets/__pycache__/pong1m.cpython-311.pyc ADDED
Binary file (4.54 kB). View file
 
src/datasets/pong1m.py ADDED
@@ -0,0 +1,62 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from torch.utils.data import TensorDataset, DataLoader
2
+ from torch import nn
3
+ import torch as t
4
+ import numpy as np
5
+ from einops import rearrange
6
+
7
+ mean = t.tensor([[[[[0.0352]],
8
+ [[0.1046]],
9
+ [[0.1046]]]]])
10
+ std = t.tensor([[[[[0.1066]],
11
+ [[0.0995]],
12
+ [[0.0995]]]]])
13
+
14
+ def fixed2frame(y, lam=1e-6):
15
+ y = y.clamp(-1, 1) * 0.5 + 0.5
16
+ frames = (y * 255.0).round().byte()
17
+ return frames
18
+
19
+ def z2frame(y, lam=1e-6, mean=mean, std=std):
20
+ y = y*std.to(y.dtype).to(y.device) + mean.to(y.dtype).to(y.device)
21
+ frames = (y.clamp(0, 1) * 255.0).round().byte()
22
+ return frames
23
+
24
+ def get_loader(batch_size=64, fps=30, duration=5, shuffle=True, debug=False, mode="-1,1", mean=mean, std=std, drop_duration=False):
25
+ frames = t.from_numpy(np.load("./datasets/pong1M/frames.npy"))
26
+ actions = t.from_numpy(np.load("./datasets/pong1M/actions.npy"))
27
+ height, width, channels = frames.shape[-3:]
28
+ n = frames.shape[0]//(fps*duration)
29
+ frames = frames[:n*fps*duration]
30
+ frames = frames.reshape(n, fps*duration, height, width, channels)
31
+ frames = frames.permute(0, 1, 4, 2, 3)
32
+ actions = actions[:n*fps*duration]
33
+ actions = actions.reshape(-1, fps*duration)
34
+ b, dur, c, h, w = frames.shape
35
+ if mode == "-1,1":
36
+ z = rearrange(frames, "b dur c h w -> (b dur h w) c")
37
+ mask = (z == t.tensor([6, 24, 24], dtype=z.dtype)).all(dim=1)
38
+ z = (z.float()/255.0 - 0.5)*2
39
+ z[mask] = 0
40
+ z = rearrange(z, "(b dur h w) c -> b dur c h w", b=b, dur=dur, c=c, h=h, w=w)
41
+ frames = z
42
+ pred2frame = fixed2frame
43
+ elif mode == "z":
44
+ frames = frames.float()/255.0
45
+ frames = (frames - mean) / (std + 1e-6)
46
+ pred2frame = z2frame
47
+ else:
48
+ raise ValueError(f"Invalid mode: {mode}")
49
+
50
+ firstf = frames[0]
51
+ firsta = actions[0]
52
+ if debug:
53
+ frames = 0*frames + firstf[None]
54
+ actions = 0*actions + firsta[None]
55
+ frames = 0*frames + frames[:,0].unsqueeze(1)
56
+ if drop_duration:
57
+ dataset = TensorDataset(frames[:, 0], actions[:,0]*0)
58
+ else:
59
+ dataset = TensorDataset(frames, actions)
60
+ loader = DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)
61
+ print(f"{frames.shape[0]//batch_size} batches")
62
+ return loader, pred2frame
src/inference/__init__.py ADDED
@@ -0,0 +1 @@
 
 
1
+ from .sampling import sample, sample_with_grad
src/inference/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (258 Bytes). View file
 
src/inference/__pycache__/sampling.cpython-311.pyc ADDED
Binary file (2.13 kB). View file
 
src/inference/sampling.py ADDED
@@ -0,0 +1,23 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import torch as t
2
+
3
@t.no_grad()
def sample(v, z, actions, num_steps=10, cfg=0, negative_actions=None):
    # Inference-only entry point: identical to sample_with_grad but with
    # autograd disabled to save memory during generation.
    return sample_with_grad(v, z, actions, num_steps, cfg, negative_actions)
6
+
7
def sample_with_grad(v, z, actions, num_steps=10, cfg=0, negative_actions=None):
    """Integrate the velocity model `v` along a shifted flow-matching
    schedule, starting from `z`, conditioned on `actions`.

    Args:
        v: velocity network, called as v(z, actions, t_cond); must expose
            a `.device` attribute.
        z: initial latent/noise tensor, shape (batch, ...).
        actions: per-frame conditioning actions.
        num_steps: number of Euler integration steps.
        cfg: classifier-free guidance strength; 0 disables guidance.
        negative_actions: actions for the negative branch of guidance;
            when None, zeros are used as the unconditional input.

    Returns:
        The integrated sample (gradients flow through the steps).
    """
    device = v.device
    # Timesteps run 1 -> 0, warped by t -> 3t/(2t+1) (keeps endpoints fixed).
    schedule = 1 - t.linspace(0, 1, num_steps + 1, device=device)
    schedule = 3 * schedule / (2 * schedule + 1)
    state = z.clone().to(device)
    for t_now, t_next in zip(schedule[:-1], schedule[1:]):
        t_cond = t_now.repeat(state.shape[0], 1)
        velocity = v(state.to(device), actions.to(device), t_cond.to(device))
        if cfg > 0:
            # Classifier-free guidance: push away from the negative branch.
            if negative_actions is not None:
                neg_cond = negative_actions.to(device)
            else:
                neg_cond = t.zeros_like(actions, dtype=t.long, device=device)
            v_neg = v(state.to(device), neg_cond, t_cond.to(device))
            velocity = v_neg + cfg * (velocity - v_neg)
        state = state + (t_now - t_next) * velocity  # Euler step
    return state
src/models/__init__.py ADDED
File without changes
src/models/dit_dforce.py ADDED
@@ -0,0 +1,206 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import torch as t
2
+ from torch import nn
3
+ import torch.nn.functional as F
4
+
5
+ from ..nn.attn import Attention, AttentionEinOps, KVCache
6
+ from ..nn.patch import Patch, UnPatch
7
+ from ..nn.geglu import GEGLU
8
+ from ..nn.pe import FrameRoPE, NumericEncoding, RoPE
9
+ from jaxtyping import Float, Bool, Int
10
+ from torch import Tensor
11
+ from typing import Optional
12
+
13
+ import math
14
+
15
def modulate(x, shift, scale):
    """AdaLN-style modulation: scale `x` by (1 + scale), then add `shift`."""
    return shift + x * (scale + 1)
17
+
18
class CausalBlock(nn.Module):
    """One DiT transformer block: adaLN-modulated self-attention followed by
    an adaLN-modulated GEGLU MLP, with optional KV caching for
    frame-by-frame autoregressive generation."""

    def __init__(self, layer_idx, d_model, expansion, n_heads, rope=None):
        super().__init__()
        self.layer_idx = layer_idx
        self.d_model = d_model
        self.expansion = expansion
        self.n_heads = n_heads
        self.norm1 = nn.LayerNorm(d_model)
        # NOTE(review): both branches construct the same AttentionEinOps —
        # the MPS check is currently a no-op kept while flex-attention is
        # suspected buggy (see trailing comment).
        if t.backends.mps.is_available():
            self.selfattn = AttentionEinOps(d_model, n_heads, rope=rope)
        else:
            self.selfattn = AttentionEinOps(d_model, n_heads, rope=rope) # there is a problem with flexattn i think
        self.norm2 = nn.LayerNorm(d_model)
        self.geglu = GEGLU(d_model, expansion*d_model, d_model)

        # Produces the six adaLN parameters (shift/scale/gate for attention
        # and for the MLP) from the conditioning vector.
        self.modulation = nn.Sequential(
            nn.SiLU(),
            nn.Linear(d_model, 6 * d_model, bias=True),
        )

    def forward(self, z, cond, mask_self, cache: Optional[KVCache] = None):
        # z:    (batch, dur*seq, d) token stream
        # cond: (batch, dur*seq, d) per-token conditioning (time + action)
        mu1, sigma1, c1, mu2, sigma2, c2 = self.modulation(cond).chunk(6, dim=-1)
        residual = z
        z = modulate(self.norm1(z), mu1, sigma1)
        if cache is not None:
            k, v = cache.get(self.layer_idx)
            offset = cache.global_location # this enables to include rope and ln into the cache
            # Deliberately overridden: RoPE is re-applied over the whole
            # cached window each step to stay closer to training behaviour.
            offset = 0 # this is for reapplying rope again and again to stay more similar to training
            z, k_new, v_new = self.selfattn(z, z, mask=mask_self, k_cache=k, v_cache=v, offset=offset)
            cache.extend(self.layer_idx, k_new, v_new)
        else:
            z, _, _ = self.selfattn(z, z, mask=mask_self)

        z = residual + c1*z  # gated residual (attention path)

        residual = z
        z = modulate(self.norm2(z), mu2, sigma2)
        z = self.geglu(z)
        z = residual + c2*z  # gated residual (MLP path)
        return z
60
+
61
+
62
class CausalDit(nn.Module):
    """Causal diffusion transformer over video frames.

    Each frame is patchified into `toks_per_frame` tokens (patches plus
    learned register tokens); frames attend causally to previous frames
    within a window of `n_window` frames. Conditioning is per-frame
    action embedding plus diffusion-time embedding, injected via adaLN.
    """

    def __init__(self, height, width, n_window, d_model, T=1000, in_channels=3,
                 patch_size=2, n_heads=8, expansion=4, n_blocks=6,
                 n_registers=1, n_actions=4, bidirectional=False,
                 debug=False,
                 legacy=False,
                 frame_rope=False,
                 rope_C=10000,
                 rope_tmax=None):
        super().__init__()
        self.height = height
        self.width = width
        self.n_window = n_window
        self.d_model = d_model
        self.n_heads = n_heads
        self.d_head = self.d_model // self.n_heads
        self.n_blocks = n_blocks
        self.expansion = expansion
        self.n_registers = n_registers
        self.T = T
        self.patch_size = patch_size
        self.debug = debug
        self.legacy = legacy
        self.bidirectional = bidirectional
        self.frame_rope = frame_rope
        # Tokens per frame = spatial patches + register tokens.
        self.toks_per_frame = (height//patch_size)*(width//patch_size) + n_registers
        self.rope_C = rope_C
        if frame_rope:
            # Frame-level rotary PE + a learned spatial grid PE per patch.
            print("Using frame rope")
            print(self.toks_per_frame)
            self.rope_seq = FrameRoPE(d_model//n_heads, self.n_window, self.toks_per_frame, C=rope_C)
            self.grid_pe = nn.Parameter(t.randn(self.toks_per_frame - n_registers, d_model) * 1/d_model**0.5)
        else:
            # Token-level rotary PE over the flattened (frame, token) sequence.
            if rope_tmax is None:
                rope_tmax = self.n_window*self.toks_per_frame
            self.rope_seq = RoPE(d_model//n_heads, rope_tmax, C=rope_C)
            self.grid_pe = None
        self.rope_tmax = rope_tmax

        self.blocks = nn.ModuleList([CausalBlock(lidx, d_model, expansion, n_heads, rope=self.rope_seq) for lidx in range(n_blocks)])
        self.patch = Patch(in_channels=in_channels, out_channels=d_model, patch_size=patch_size)
        self.norm = nn.LayerNorm(d_model)
        self.unpatch = UnPatch(height, width, in_channels=d_model, out_channels=in_channels, patch_size=patch_size)
        self.action_emb = nn.Embedding(n_actions, d_model)
        self.registers = nn.Parameter(t.randn(n_registers, d_model) * 1/d_model**0.5)
        self.time_emb = NumericEncoding(dim=d_model, n_max=T)
        self.time_emb_mixer = nn.Linear(d_model, d_model)
        # Final-layer adaLN: shift and scale only (no gate).
        self.modulation = nn.Sequential(
            nn.SiLU(),
            nn.Linear(d_model, 2 * d_model, bias=True),
        )
        self.cache = None

    def activate_caching(self, batch_size, max_frames=None, cache_rope=False):
        """Enable rolling KV caching for autoregressive generation.

        When `max_frames` is given, the rotary PE tables are rebuilt to
        cover max_frames*toks_per_frame positions and shared into every
        block. NOTE(review): `cache_rope` is currently unused, and this
        method does not call cache.update_global_location — presumably the
        generation loop commits frames; verify against the caller.
        """
        self.cache = KVCache(batch_size, self.n_blocks, self.n_heads, self.d_head, self.toks_per_frame, self.n_window, dtype=self.dtype, device=self.device)
        if max_frames is not None:
            self.rope_seq = RoPE(self.d_head, max_frames*self.toks_per_frame, C=self.rope_C)
            print(self.rope_seq.sins.shape, self.rope_seq.coss.shape)
            self.rope_seq.to(self.device)
            self.rope_seq.to(self.dtype)
            for idx, block in enumerate(self.blocks):
                print("updating rope for block", idx)
                print(self.blocks[idx].selfattn.rope.sins.shape, self.blocks[idx].selfattn.rope.coss.shape)
                self.blocks[idx].selfattn.rope = self.rope_seq
                print(self.blocks[idx].selfattn.rope.sins.shape, self.blocks[idx].selfattn.rope.coss.shape)
    def deactivate_caching(self):
        # Drop the cache; subsequent forwards run uncached.
        self.cache = None

    def forward(self,
                z: Float[Tensor, "batch dur channels height width"],
                actions: Float[Tensor, "batch dur"],
                ts: Int[Tensor, "batch dur"]):
        """Predict the velocity field for noisy frames `z` at diffusion
        times `ts`, conditioned on per-frame `actions`."""

        # Broadcast a single timestep across the whole clip if needed.
        if ts.shape[1] == 1:
            ts = ts.repeat(1, z.shape[1])

        a = self.action_emb(actions) # batch dur d
        # Map continuous ts in [0, 1) to discrete embedding indices [0, T).
        ts_scaled = (ts * self.T).clamp(0, self.T - 1).long()
        cond = self.time_emb_mixer(self.time_emb(ts_scaled)) + a
        # One conditioning vector per token of the frame.
        cond = cond.repeat_interleave(self.toks_per_frame, dim=1)
        z = self.patch(z) # batch dur seq d
        if self.grid_pe is not None:
            z = z + self.grid_pe[None, None]
        # Append register tokens to every frame.
        zr = t.cat((z, self.registers[None, None].repeat([z.shape[0], z.shape[1], 1, 1])), dim=2)# z plus registers
        if self.bidirectional:
            mask_self = None
        else:
            mask_self = self.causal_mask
        batch, durzr, seqzr, d = zr.shape
        zr = zr.reshape(batch, -1, d) # batch durseq d

        for block in self.blocks:
            zr = block(zr, cond, mask_self, cache=self.cache)
        mu, sigma = self.modulation(cond).chunk(2, dim=-1)
        zr = modulate(self.norm(zr), mu, sigma)
        zr = zr.reshape(batch, durzr, seqzr, d)
        # Drop register tokens before unpatchifying back to pixel space.
        out = self.unpatch(zr[:, :, :-self.n_registers])
        return out # batch dur channels height width

    @property
    def causal_mask(self):
        # Frame-level causal mask expanded to token level: token i of frame
        # f may attend to all tokens of frames <= f. Returned with True
        # meaning "masked out".
        size = self.n_window
        m_self = t.tril(t.ones((size, size), dtype=t.int8, device=self.device)) #- t.tril(t.ones((size, size), dtype=t.int8, device=self.device), diagonal=-self.n_window)
        m_self = t.kron(m_self, t.ones((self.toks_per_frame, self.toks_per_frame), dtype=t.int8, device=self.device))
        m_self = m_self.to(bool)
        return ~ m_self # we want to mask out the ones

    @property
    def device(self):
        # Device of the first parameter (assumes all parameters co-located).
        return self.parameters().__next__().device

    @property
    def dtype(self):
        # Dtype of the first parameter.
        return self.parameters().__next__().dtype
178
+
179
+
180
def get_model(height, width, n_window=5, d_model=64, T=100, n_blocks=2, patch_size=2, n_heads=8, bidirectional=False, in_channels=3, frame_rope=False, C=10000):
    """Factory: build a CausalDit with the given geometry and capacity."""
    model = CausalDit(
        height,
        width,
        n_window,
        d_model,
        T,
        in_channels=in_channels,
        n_blocks=n_blocks,
        patch_size=patch_size,
        n_heads=n_heads,
        bidirectional=bidirectional,
        frame_rope=frame_rope,
        rope_C=C,
    )
    return model
182
+
183
if __name__ == "__main__":
    # Smoke test 1: full-sequence forward pass without a KV cache.
    print("running w/o cache")
    dit = CausalDit(20, 20, 100, 64, 5, n_blocks=2)
    z = t.rand((2, 6, 3, 20, 20))
    actions = t.randint(4, (2, 6))
    ts = t.rand((2, 6))
    out = dit(z, actions, ts)
    print(z.shape)
    print(out.shape)

    # Smoke test 2: frame-by-frame forward with caching enabled; iterates
    # past the window size (10) to exercise the rolling ring buffer.
    print("running w cache")
    dit = CausalDit(20, 20, 10, 64, 5, n_blocks=2)
    dit.activate_caching(2)
    print(dit.cache.toks_per_frame)
    print(dit.cache.size)
    for i in range(30):
        print(dit.cache.local_loc)
        print(dit.cache.global_loc)
        z = t.rand((2, 1, 3, 20, 20))
        actions = t.randint(4, (2, 1))
        ts = t.rand((2, 1))
        out = dit(z, actions, ts)
        print(i, z.shape)
        print(i, out.shape)
src/nn/__init__.py ADDED
File without changes
src/nn/attn.py ADDED
@@ -0,0 +1,473 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from torch import nn
2
+ from torch.nn import functional as F
3
+ import torch as t
4
+ import einops
5
+ from jaxtyping import Float, Bool
6
+ from torch import Tensor
7
+ from typing import Optional
8
+ from torch.nn.attention.flex_attention import flex_attention
9
+
10
+
11
class KVCache(nn.Module):
    """
    Rolling KV cache implemented as a ring buffer.
    - Shapes:
        keys/values per extend(): (batch_size, T, n_heads, d_head)
    - Internal storage:
        (n_layers, batch_size, size, n_heads, d_head) where size = toks_per_frame * n_window
    - Semantics:
        Call `extend(layer_idx, k, v)` once per layer for the *same* frame.
        Call `update_global_location(n_frames)` once after all layers to commit the frame(s).
    """
    def __init__(self, batch_size, n_layers, n_heads, d_head, toks_per_frame, n_window, *, dtype=None, device=None, enforce_layer_order=True):
        super().__init__()
        self.batch_size = batch_size
        self.n_layers = n_layers
        self.n_heads = n_heads
        self.d_head = d_head
        self.toks_per_frame = toks_per_frame
        self.n_window = n_window
        # Ring capacity in tokens: one full attention window of frames.
        self.size = (toks_per_frame * n_window) #toks_per_frame # (toks_per_frame * n_window)

        # Pointers / counters
        self.curr_layer = 0   # which layer are we writing for this frame
        self.global_loc = 0   # total tokens ever committed
        self.local_loc = 0    # valid tokens in buffer (<= size)
        self._write_ptr = 0   # ring-buffer write pointer (index of next commit position)

        # Storage (registered as buffers so .to()/.state_dict() work).
        dtype = dtype if dtype is not None else t.float32
        self.register_buffer('keys', t.zeros(n_layers, batch_size, self.size, n_heads, d_head, dtype=dtype, device=device))
        self.register_buffer('values', t.zeros(n_layers, batch_size, self.size, n_heads, d_head, dtype=dtype, device=device))

        # Misc
        self.enforce_layer_order = enforce_layer_order

    # -------------- Public API --------------
    def get(self, layer_idx):
        """Return (K, V) for given layer in chronological order: shape (B, L, H, D) where L = local_loc."""
        self._check_layer(layer_idx)
        if self.local_loc == 0:
            # return empty views
            empty = self.keys[layer_idx, :, :0]
            return empty, empty

        # Oldest committed token sits local_loc slots behind the write pointer.
        start = (self._write_ptr - self.local_loc) % self.size
        if start + self.local_loc <= self.size:
            # contiguous slice
            k = self.keys[layer_idx, :, start:start + self.local_loc]
            v = self.values[layer_idx, :, start:start + self.local_loc]
        else:
            # wrap: concatenate two slices to maintain chronological order
            first = self.size - start
            k = t.cat([
                self.keys[layer_idx, :, start:self.size],
                self.keys[layer_idx, :, 0:(self.local_loc - first)]
            ], dim=1)
            v = t.cat([
                self.values[layer_idx, :, start:self.size],
                self.values[layer_idx, :, 0:(self.local_loc - first)]
            ], dim=1)
        return k, v

    @t.no_grad()
    def extend(self, layer_idx, keys, values):
        """
        Stage (but do not commit) tokens for the current frame for the given layer.
        Call update_global_location(n_frames) to commit after all layers wrote.
        """
        assert keys.shape == values.shape, f"keys and values shapes must match, got {keys.shape} vs {values.shape}"
        self._check_layer(layer_idx)

        # Expected shape: (B, T, H, D)
        B, T, H, D = keys.shape
        assert B == self.batch_size, f"batch mismatch: expected {self.batch_size}, got {B}"
        assert H == self.n_heads and D == self.d_head, f"heads/d_head mismatch: expected {(self.n_heads, self.d_head)}, got {(H, D)}"
        assert T > 0 and T <= self.size, f"T must be in 1..{self.size}, got {T}"
        # Optional: if you only ever append whole frames:
        # assert T == self.toks_per_frame, f"T must equal toks_per_frame ({self.toks_per_frame}), got {T}"

        # Cast to buffer dtype/device if needed
        if keys.dtype != self.keys.dtype or keys.device != self.keys.device:
            keys = keys.to(dtype=self.keys.dtype, device=self.keys.device)
        if values.dtype != self.values.dtype or values.device != self.values.device:
            values = values.to(dtype=self.values.dtype, device=self.values.device)

        # Write into the ring at the *current* write_ptr (uncommitted until update_global_location)
        i0 = self._write_ptr
        i1 = (self._write_ptr + T) % self.size
        if i0 < i1:
            self.keys[layer_idx, :, i0:i1] = keys
            self.values[layer_idx, :, i0:i1] = values
        else:
            # wraps: split write
            split = self.size - i0
            self.keys[layer_idx, :, i0:self.size] = keys[:, :split]
            self.values[layer_idx, :, i0:self.size] = values[:, :split]
            self.keys[layer_idx, :, 0:i1] = keys[:, split:]
            self.values[layer_idx, :, 0:i1] = values[:, split:]

        # Advance expected layer (but do *not* advance write_ptr/local_len here)
        self.curr_layer = (self.curr_layer + 1) % self.n_layers

    @t.no_grad()
    def update_global_location(self, n_frames):
        """
        Commit staged writes for n_frames (advances the ring write pointer once per frame).
        Keep calling extend(layer_idx, ...) for each layer before you call this.
        """
        assert n_frames >= 0, f"n_frames must be >= 0, got {n_frames}"
        tokens = n_frames * self.toks_per_frame
        if tokens == 0:
            return
        assert tokens <= self.size, f"Cannot commit {tokens} tokens (> buffer size {self.size})."

        self.global_loc += tokens
        # Update valid length (never exceeds capacity)
        self.local_loc = min(self.size, self.local_loc + tokens)
        # Advance write pointer
        self._write_ptr = (self._write_ptr + tokens) % self.size

    @t.no_grad()
    def reset(self, zero_memory: bool = True):
        # Return the cache to its initial empty state; optionally zero the
        # underlying tensors (not strictly required for correctness since
        # local_loc gates what is read back).
        self.global_loc = 0
        self.local_loc = 0
        self.curr_layer = 0
        self._write_ptr = 0
        if zero_memory:
            self.keys.zero_()
            self.values.zero_()

    # -------------- Convenience / Introspection --------------
    @property
    def local_location(self):
        return self.local_loc

    @property
    def global_location(self):
        return self.global_loc

    @property
    def device(self):
        return self.keys.device

    @property
    def dtype(self):
        return self.keys.dtype

    def get_recent(self, layer_idx, last_T):
        """Return the most recent last_T tokens for a layer (chronological)."""
        self._check_layer(layer_idx, allow_any=True)
        last_T = min(last_T, self.local_loc)
        if last_T == 0:
            empty = self.keys[layer_idx, :, :0]
            return empty, empty
        start = (self._write_ptr - last_T) % self.size
        if start + last_T <= self.size:
            k = self.keys[layer_idx, :, start:start + last_T]
            v = self.values[layer_idx, :, start:start + last_T]
        else:
            first = self.size - start
            k = t.cat([self.keys[layer_idx, :, start:self.size], self.keys[layer_idx, :, 0:(last_T - first)]], dim=1)
            v = t.cat([self.values[layer_idx, :, start:self.size], self.values[layer_idx, :, 0:(last_T - first)]], dim=1)
        return k, v

    # -------------- Internal checks --------------
    def _check_layer(self, layer_idx, allow_any=False):
        # Validates the layer index and (optionally) that layers are visited
        # in round-robin order, which the staged-write protocol requires.
        assert 0 <= layer_idx < self.n_layers, f"layer_idx out of range: 0..{self.n_layers-1}, got {layer_idx}"
        if self.enforce_layer_order and not allow_any:
            assert layer_idx == (self.curr_layer % self.n_layers), \
                f"Layer order mismatch: expected {self.curr_layer % self.n_layers}, got {layer_idx}"
181
+
182
+
183
class KVCacheMine(nn.Module): # this does not work because it destroys the cache of later timesteps when the earlier ones overflow and move to the left. --> fix as an exercise.
    # NOTE(review): kept intentionally as a known-broken alternative to
    # KVCache (see the comment above). Do not use in production paths.
    def __init__(self, batch_size, n_layers, n_heads, d_head, toks_per_frame, n_window):
        """
        This is a rolling KVCache
        """
        super().__init__()
        self.batch_size = batch_size
        self.n_heads = n_heads
        self.d_head = d_head
        self.toks_per_frame = toks_per_frame
        self.n_window = n_window
        self.size = toks_per_frame * n_window#5*n_window#(n_window + 1)
        self.n_layers = n_layers
        self.curr_layer = 0   # expected next layer index (round-robin)
        self.global_loc = 0   # total tokens ever added
        self.local_loc = 0    # valid tokens currently held (<= size)
        self.register_buffer('keys', t.zeros(n_layers, batch_size, self.size, n_heads, d_head))
        self.register_buffer('values', t.zeros(n_layers, batch_size, self.size, n_heads, d_head))

    def get(self, layer_idx):
        # Return the valid prefix of the cache for this layer.
        assert layer_idx == self.curr_layer, f"layer idx should be the same as our internal counter but we got {layer_idx} and internal is {self.curr_layer}."
        return self.keys[layer_idx, :, :self.local_loc], self.values[layer_idx, :, :self.local_loc]

    def extend(self, layer_idx, keys, values):
        # Append new K/V; when full, shift everything left by one frame.
        # BUG (acknowledged above): the left-shift happens per-layer while
        # other layers may still reference older offsets, corrupting state.
        assert keys.shape == values.shape, f"keys and values shapes must match {self.keys.shape} != {self.values.shape}"
        assert layer_idx == self.curr_layer, f"layer idx should be the same as our internal counter but we got {layer_idx} and internal is {self.curr_layer}."
        assert self.local_loc <= self.size, f"the cache size should be between 0 and {self.size}"
        local_loc = self.local_loc
        if local_loc == self.size:
            # move to the left
            local_loc -= keys.shape[1]
            assert local_loc >= 0, f"the cache update {keys.shape[1]} was larger than the cache {self.size}, that's not supported for now."
            assert local_loc % self.toks_per_frame == 0, f"the number of elements in the cache {local_loc} must be a multiple of the number of tokens per frame {self.toks_per_frame}"
            self.keys[layer_idx, :, :local_loc] = self.keys[layer_idx, :, self.toks_per_frame:local_loc+self.toks_per_frame].clone()
            self.values[layer_idx, :, :local_loc] = self.values[layer_idx, :, self.toks_per_frame:local_loc+self.toks_per_frame].clone()
            #self.keys[layer_idx, :, self.toks_per_frame:local_loc+self.toks_per_frame] = self.keys[layer_idx, :, -local_loc:].clone()
            #self.values[layer_idx, :, self.toks_per_frame:local_loc+self.toks_per_frame] = self.values[layer_idx, :, -local_loc:].clone()

        assert local_loc + keys.shape[1] <= self.size, f"{local_loc + keys.shape[1]} out of bounds {self.size}"
        self.keys[layer_idx, :, local_loc:local_loc + keys.shape[1]] = keys
        self.values[layer_idx, :, local_loc:local_loc + keys.shape[1]] = values
        self.curr_layer = (self.curr_layer + 1) % self.n_layers

    def update_global_location(self, n_frames):
        # Commit n_frames worth of tokens after all layers have extended.
        self.global_loc += n_frames * self.toks_per_frame
        if self.local_loc < self.size:
            self.local_loc += n_frames * self.toks_per_frame
        assert self.local_loc <= self.size, f"the local loc {self.local_loc} should never be bigger than {self.size}, something went wrong."

    def reset(self):
        # Clear all counters and zero the stored tensors.
        self.global_loc = 0
        self.local_loc = 0
        self.curr_layer = 0
        self.keys.zero_()
        self.values.zero_()

    @property
    def local_location(self):
        return self.local_loc

    @property
    def global_location(self):
        return self.global_loc

    @property
    def device(self):
        return self.keys.device

    @property
    def dtype(self):
        return self.keys.dtype
254
+
255
+
256
class AttentionEinOps(nn.Module):
    """Multi-head attention implemented with explicit einsums.

    Uses QK-LayerNorm (per-head) instead of the usual 1/sqrt(d_head)
    scaling — note there is deliberately no scale factor on the logits.
    Supports an external KV cache for autoregressive decoding; when a
    cache is supplied, the mask is dropped and queries attend to the
    full cached history.
    """
    IGNORE: Float[Tensor, ""]

    def __init__(self, d_model, n_heads, rope=None):
        super().__init__()
        assert d_model % n_heads == 0, f"{d_model} must be divisble by {n_heads}"
        self.d_head = d_model // n_heads
        d_head = self.d_head
        # Per-head projection weights, shape (n_heads, d_model, d_head).
        self.W_Q = nn.Parameter(t.empty((n_heads, d_model, d_head)))
        self.W_K = nn.Parameter(t.empty((n_heads, d_model, d_head)))
        self.W_V = nn.Parameter(t.empty((n_heads, d_model, d_head)))
        self.W_O = nn.Parameter(t.empty((n_heads, d_head, d_model)))
        self.b_Q = nn.Parameter(t.zeros((n_heads, d_head)))
        self.b_K = nn.Parameter(t.zeros((n_heads, d_head)))
        self.b_V = nn.Parameter(t.zeros((n_heads, d_head)))
        self.b_O = nn.Parameter(t.zeros((d_model)))
        nn.init.normal_(self.W_Q, 1/d_model**0.5)
        nn.init.normal_(self.W_K, 1/d_model**0.5)
        nn.init.normal_(self.W_V, 1/d_model**0.5)
        nn.init.normal_(self.W_O, 1/d_head**0.5)
        self.register_buffer("IGNORE", t.tensor(float('-inf'), dtype=t.float32))
        self.rope = rope
        # QK-LayerNorms (applied after rope — see comment in forward).
        self.ln1 = nn.LayerNorm(d_head)
        self.ln2 = nn.LayerNorm(d_head)


    def forward(
        self,
        x_q: Float[Tensor, "batch posq d_model"],
        x_kv: Float[Tensor, "batch posk d_model"],
        mask: Bool[Tensor, "posq posk"] = None, # True entries are masked out
        k_cache: Optional[Float[Tensor, "batch posk n_head d_head"]] = None,
        v_cache: Optional[Float[Tensor, "batch posk n_head d_head"]] = None,
        offset: int = 0
    ) -> Float[Tensor, "batch posq d_model"]:
        """Returns (output, k_new, v_new); k_new/v_new are the freshly
        projected K/V of x_kv so the caller can append them to its cache.
        NOTE(review): the `offset` parameter is accepted but never used —
        the caller (CausalBlock) re-applies rope over the full history
        instead; confirm before removing.
        """
        assert (k_cache is None and v_cache is None) or (k_cache is not None and v_cache is not None), "k_cache and v_cache go together."
        d_head = self.d_head
        if k_cache is not None and v_cache is not None:
            q = einops.einsum(x_q, self.W_Q, 'b s d, n d h -> b s n h') + self.b_Q
            k_new = einops.einsum(x_kv, self.W_K, 'b s d, n d h -> b s n h') + self.b_K
            v_new = einops.einsum(x_kv, self.W_V, 'b s d, n d h -> b s n h') + self.b_V

            k = t.cat([k_cache, k_new], dim=1)
            v = t.cat([v_cache, v_new], dim=1)

            if self.rope is not None:
                # Rope the query at its absolute position; re-rope the whole
                # (cached + new) key sequence from position 0.
                q = self.rope(q, offset=k_cache.shape[1])
                k = self.rope(k, offset=0)
            q = self.ln1(q) # this should be before rope
            k = self.ln2(k)
            # Cached path: attend to everything in the window, no mask.
            mask = None
        else:
            q = einops.einsum(x_q, self.W_Q, 'b s d, n d h -> b s n h') + self.b_Q
            k = einops.einsum(x_kv, self.W_K, 'b s d, n d h -> b s n h') + self.b_K
            v = einops.einsum(x_kv, self.W_V, 'b s d, n d h -> b s n h') + self.b_V
            if self.rope is not None:
                q = self.rope(q)
                k = self.rope(k)
            q = self.ln1(q)
            k = self.ln2(k) # this learns much faster using layernorm here
            k_new = k
            v_new = v

        # Unscaled dot-product logits (QK-LayerNorm replaces 1/sqrt(d)).
        attention = einops.einsum(q, k, 'b sq n h, b sk n h -> b n sq sk')
        if mask is not None and k_cache is not None:
            # NOTE(review): unreachable — mask is set to None above whenever
            # k_cache is provided; kept for safety if that changes.
            attention = t.where(mask[k_cache.shape[1]:k_cache.shape[1]+q.shape[1], :k.shape[1]], self.IGNORE, attention)
        elif mask is not None:
            if attention.shape[-1] != mask.shape[-1] or attention.shape[-2] != mask.shape[-2]:
                # Crop an oversized mask to the actual attention shape.
                mask = mask[:attention.shape[-1], :attention.shape[-2]]
            attention = t.where(mask, self.IGNORE, attention)
        probas = attention.softmax(dim=3)
        z = einops.einsum(probas, v, 'b n sq sk, b sk n h -> b sq n h')
        out = einops.einsum(z, self.W_O, 'b s n h, n h d -> b s n d')
        out = out.sum(dim=2) + self.b_O  # combine heads + output bias
        return out, k_new, v_new
334
+
335
+
336
class Attention(nn.Module):
    """Flex-/SDPA-based attention variant.

    NOTE(review): this class is currently disabled — __init__ raises
    NotImplementedError immediately, so everything below it is dead code.
    CausalBlock uses AttentionEinOps instead.
    """
    IGNORE: Float[Tensor, ""]

    def __init__(self, d_model, n_heads, rope=None, use_flex_attention=False):
        raise NotImplementedError("Attention is not implemented yet")
        super().__init__()
        assert d_model % n_heads == 0, f"{d_model} must be divisble by {n_heads}"
        self.d_head = d_model // n_heads
        d_head = self.d_head
        self.W_Q = nn.Parameter(t.empty((n_heads, d_model, d_head)))
        self.W_K = nn.Parameter(t.empty((n_heads, d_model, d_head)))
        self.W_V = nn.Parameter(t.empty((n_heads, d_model, d_head)))
        self.W_O = nn.Parameter(t.empty((n_heads, d_head, d_model)))
        #self.b_Q = nn.Parameter(t.zeros((n_heads, d_head)))
        #self.b_K = nn.Parameter(t.zeros((n_heads, d_head)))
        #self.b_V = nn.Parameter(t.zeros((n_heads, d_head)))
        #self.b_O = nn.Parameter(t.zeros((d_model)))
        nn.init.normal_(self.W_Q, 1/d_model**0.5)
        nn.init.normal_(self.W_K, 1/d_model**0.5)
        nn.init.normal_(self.W_V, 1/d_model**0.5)
        nn.init.normal_(self.W_O, 1/d_head**0.5)
        self.register_buffer("IGNORE", t.tensor(float('-inf'), dtype=t.float32))
        self.rope = rope
        self.use_flex_attention = use_flex_attention
        self.ln1 = nn.LayerNorm(d_head)
        self.ln2 = nn.LayerNorm(d_head)


    def forward(
        self,
        x_q: Float[Tensor, "batch posq d_model"],
        x_kv: Float[Tensor, "batch posk d_model"],
        mask: Bool[Tensor, "posq posk"] = None, # True entries are masked out
        k_cache: Optional[Float[Tensor, "batch posk n_head d_head"]] = None,
        v_cache: Optional[Float[Tensor, "batch posk n_head d_head"]] = None,
    ) -> Float[Tensor, "batch posq d_model"]:
        assert (k_cache is None and v_cache is None) or (k_cache is not None and v_cache is not None), "k_cache and v_cache go together."
        d_head = self.d_head
        if k_cache is not None and v_cache is not None:
            raise NotImplementedError("kv cache not implemented yet")
            # FIXME: dead code below; `x` is undefined (NameError) — should
            # be `x_q` if this branch is ever enabled.
            q = einops.einsum(x, self.W_Q, 'b s d, n d h -> b s n h')
            k_new = einops.einsum(x_kv, self.W_K, 'b s d, n d h -> b s n h')
            v_new = einops.einsum(x_kv, self.W_V, 'b s d, n d h -> b s n h')
            k = t.cat([k_cache, k_new], dim=1)
            v = t.cat([v_cache, v_new], dim=1)
        else:
            q = einops.einsum(x_q, self.W_Q, 'b s d, n d h -> b s n h')
            k = einops.einsum(x_kv, self.W_K, 'b s d, n d h -> b s n h')
            v = einops.einsum(x_kv, self.W_V, 'b s d, n d h -> b s n h')

        # QK-LayerNorm, here applied before rope (opposite order to
        # AttentionEinOps' cached path).
        q = self.ln1(q)
        k = self.ln2(k)
        if self.rope is not None:
            q = self.rope(q)
            k = self.rope(k)

        # Convert to (batch, num_heads, seq_len, head_dim) format for flex_attention
        q_perm = q.permute(0, 2, 1, 3) # (batch, n_heads, posq, d_head)
        k_perm = k.permute(0, 2, 1, 3) # (batch, n_heads, posk, d_head)
        v_perm = v.permute(0, 2, 1, 3) # (batch, n_heads, posk, d_head)

        # Ensure tensors are contiguous to avoid flex_attention indexing bugs
        q_perm = q_perm.contiguous()
        k_perm = k_perm.contiguous()
        v_perm = v_perm.contiguous()

        if self.use_flex_attention:
            # Handle mask using score_mod if needed
            if mask is not None:
                # Store mask and IGNORE for use in score_mod closure
                mask_tensor = mask # (posq, posk)
                ignore_val = self.IGNORE
                def score_mod(score, b, h, q_idx, kv_idx):
                    # score_mod operates on individual scalar scores
                    # Apply mask: where mask is True, set to -inf
                    # Use torch ops that work in compiled context
                    mask_val = mask_tensor[q_idx, kv_idx]
                    return t.where(mask_val, ignore_val, score)
                z = flex_attention(q_perm, k_perm, v_perm, score_mod=score_mod)
            else:
                z = flex_attention(q_perm, k_perm, v_perm)
        else:
            # SDPA fallback: flash kernel only when maskless and low-precision.
            condi = mask is None and not self.dtype == t.float32
            with t.backends.cuda.sdp_kernel(
                enable_flash=condi,
                enable_math=not condi,
                enable_mem_efficient=not condi
            ):
                z = F.scaled_dot_product_attention(
                    q_perm, k_perm, v_perm,
                    attn_mask = mask.logical_not() if mask is not None else None,
                    dropout_p = 0.0,
                    is_causal = False,
                    scale = 1.0  # no 1/sqrt(d): QK-LayerNorm handles scale
                )
        z = z.permute(0, 2, 1, 3) # Back to (batch, posq, n_heads, d_head)
        out = einops.einsum(z, self.W_O, 'b s n h, n h d -> b s n d')
        out = out.sum(dim=2)
        # NOTE(review): returns z (not k/v) in the second slot — signature
        # differs from AttentionEinOps; callers must not rely on k_new/v_new.
        return out, z, None

    @property
    def dtype(self):
        # Dtype of the first parameter.
        return self.parameters().__next__().dtype

    @property
    def device(self):
        # Device of the first parameter.
        return self.parameters().__next__().device
444
+
445
+
446
if __name__ == "__main__":
    # Ad-hoc equivalence check between two attention implementations.
    # FIXME: `AttentionSlow` is not defined anywhere in this file, and
    # `Attention.__init__` raises NotImplementedError — this script will
    # fail at runtime as-is; kept for reference only.
    from .pe import RoPE
    import inspect
    rope = RoPE(256//8, 10000)
    dtype = t.float32
    rope = rope.to(dtype)
    attn_slow = AttentionSlow(d_model=256, n_heads=8, rope=rope)
    attn = Attention(d_model=256, n_heads=8, rope=rope)
    attn.load_state_dict(attn_slow.state_dict(), strict=False)
    attn.to(dtype)
    attn_slow.to(dtype)
    x = t.randn(1, 1000, 256, dtype=dtype)*10
    xkv = t.randn(1, 1000, 256, dtype=dtype)*10
    mask = t.randint(0, 2, (1000, 1000), dtype=t.bool)
    y, z, _ = attn(x, xkv, mask=mask)
    y_slow, z_slow, _ = attn_slow(x, xkv, mask=mask)
    #assert t.allclose(z, z_slow, atol=1e-5), f"Attention and AttentionSlow should be the same: {(z - z_slow).abs().max()}"
    #assert t.allclose(y, y_slow, atol=1e-5), f"Attention and AttentionSlow should be the same: {(y - y_slow).abs().max()}"
    print("Attention and AttentionSlow are the same")

    # Backprop a dummy loss and dump gradient stats for both modules.
    loss = t.nn.functional.mse_loss(y, y_slow)
    loss.backward()
    print("-"*100)
    for n, p in attn.named_parameters():
        print(n, p.grad.shape, p.grad.max(), p.grad.min())
    print("-"*100)
    for n, p in attn_slow.named_parameters():
        print(n, p.grad.shape, p.grad.max(), p.grad.min())
src/nn/geglu.py ADDED
@@ -0,0 +1,20 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from torch import nn
2
+
3
class GEGLU(nn.Module):
    """Gated feed-forward block: ``down(up_proj(x) * SiLU(up_gate(x)))``.

    Despite the name, the gate nonlinearity is SiLU (SwiGLU-style).
    All linear biases are zero-initialised, so a zero input maps to a
    zero output at initialisation.
    """

    def __init__(self, d_in, d_mid, d_out):
        super().__init__()
        self.d_in = d_in
        self.d_mid = d_mid
        self.d_out = d_out
        # Submodule creation order is kept stable so RNG-dependent weight
        # init is reproducible across refactors.
        self.up_proj = nn.Linear(d_in, d_mid, bias=True)
        self.up_proj.bias.data.zero_()
        self.up_gate = nn.Linear(d_in, d_mid, bias=True)
        self.up_gate.bias.data.zero_()
        self.down = nn.Linear(d_mid, d_out, bias=True)
        self.down.bias.data.zero_()
        self.nonlin = nn.SiLU()

    def forward(self, x):
        """Apply the gated MLP: (..., d_in) -> (..., d_out)."""
        gate = self.nonlin(self.up_gate(x))
        hidden = self.up_proj(x) * gate
        return self.down(hidden)
src/nn/patch.py ADDED
@@ -0,0 +1,80 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from torch import nn
2
+ from einops import rearrange
3
+ import torch as t
4
+
5
class Patch(nn.Module):  # adapted from https://github.com/cloneofsimo/minRF
    """Embed a batch of video frames into per-frame patch tokens.

    Input:  (batch, dur, in_channels, H, W)
    Output: (batch, dur, (H/patch_size) * (W/patch_size), out_channels)
    """

    def __init__(self, in_channels=3, out_channels=64, patch_size=2):
        super().__init__()
        self.patch_size = patch_size
        self.in_channels = in_channels
        self.out_channels = out_channels
        dim = out_channels
        # GroupNorm(32, ...) requires dim // 2 >= 32, hence the branch.
        if dim % 32 == 0 and dim > 32:
            layers = [
                nn.Conv2d(in_channels, dim // 2, kernel_size=5, padding=2, stride=1),
                nn.SiLU(),
                nn.GroupNorm(32, dim // 2),
                nn.Conv2d(dim // 2, dim // 2, kernel_size=5, padding=2, stride=1),
                nn.SiLU(),
                nn.GroupNorm(32, dim // 2),
            ]
        else:
            layers = [
                nn.Conv2d(in_channels, dim // 2, kernel_size=5, padding=2, stride=1),
                nn.SiLU(),
                nn.Conv2d(dim // 2, dim // 2, kernel_size=5, padding=2, stride=1),
                nn.SiLU(),
            ]
        self.init_conv_seq = nn.Sequential(*layers)

        self.x_embedder = nn.Linear(patch_size * patch_size * dim // 2, dim, bias=True)
        nn.init.constant_(self.x_embedder.bias, 0)

    def forward(self, x):
        """Convolve, patchify and linearly embed each frame."""
        batch, dur, c, h, w = x.shape
        frames = x.reshape(-1, c, h, w)  # fold time into the batch axis
        frames = self.init_conv_seq(frames)
        tokens = self.patchify(frames)
        tokens = self.x_embedder(tokens)
        return tokens.reshape(batch, dur, -1, self.out_channels)

    def patchify(self, x):
        """(B, C, H, W) -> (B, (H/p)*(W/p), C*p*p) non-overlapping patches."""
        B, C, H, W = x.size()
        p = self.patch_size
        x = x.view(B, C, H // p, p, W // p, p)
        # (B, H/p, W/p, C, p, p) -> flatten patch content, then patch grid.
        x = x.permute(0, 2, 4, 1, 3, 5).flatten(-3).flatten(1, 2)
        return x
53
+
54
class UnPatch(nn.Module):
    """Project patch tokens back into image frames (inverse of Patch).

    Input:  (batch, dur, (H/patch_size) * (W/patch_size), in_channels)
    Output: (batch, dur, out_channels, height, width)
    """

    def __init__(self, height, width, in_channels=64, out_channels=3, patch_size=2):
        super().__init__()
        self.width = width
        self.height = height
        self.patch_size = patch_size
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.unpatch = nn.Linear(in_channels, out_channels*patch_size**2)

    def forward(self, x):
        """Expand each token into a p x p pixel patch and reassemble frames."""
        x = self.unpatch(x)
        batch, dur, seq, d = x.shape
        frames = self.unpatchify(x.reshape(-1, seq, d))  # fold time into batch
        return frames.reshape(batch, dur, self.out_channels, self.height, self.width)

    def unpatchify(self, x):
        """(N, (H/p)*(W/p), c*p*p) -> (N, c, H, W)."""
        c = self.out_channels
        p = self.patch_size
        h = self.height // p
        w = self.width // p
        x = x.reshape(shape=(x.shape[0], h, w, p, p, c))
        # Move channels out front and interleave the patch grid with the
        # intra-patch pixel axes.
        x = t.einsum("nhwpqc->nchpwq", x)
        return x.reshape(shape=(x.shape[0], c, h * p, w * p))
src/nn/pe.py ADDED
@@ -0,0 +1,77 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import torch as t
2
+ import torch.nn as nn
3
+ import math
4
+
5
+ from jaxtyping import Float, Bool, Int
6
+ from torch import Tensor
7
+ from typing import Optional
8
+
9
class NumericEncoding(nn.Module):
    """Sinusoidal lookup table for encoding integers.

    pe[n, 2i] = sin(n * C^(-2i/dim)), pe[n, 2i+1] = cos(n * C^(-2i/dim)).
    The full (n_max, dim) table is precomputed and registered as a buffer.
    """

    def __init__(self, C = 1e4, dim = 64, n_max = 10000):
        super().__init__()
        freqs = t.exp(- math.log(C) * t.arange(0, dim, 2)/dim)
        angles = t.arange(n_max)[:, None] * freqs[None, :]
        table = t.empty((n_max, dim))
        table[:, 0::2] = t.sin(angles)  # sin on even feature slots
        table[:, 1::2] = t.cos(angles)  # cos on odd feature slots
        self.register_buffer("pe", table)

    def forward(self, num):
        """
        expects integers between 0 and n_max
        """
        assert num.dtype == t.int32 or num.dtype == t.int64, f"wrong dtype {num.dtype}"
        return self.pe[num]
27
+
28
+
29
class RoPE(nn.Module):
    """Rotary position embedding over the last (d_head) axis.

    The `sins`/`coss` buffers have shape (1, n_ctx, 1, d_head) so they
    broadcast over the batch and head axes.
    """

    def __init__(self, d_head, n_ctx, C=10000):
        super().__init__()
        thetas = t.exp(-math.log(C)*t.arange(0,d_head,2)/d_head)
        # Duplicate each frequency so the (even, odd) pair shares an angle.
        thetas = thetas.repeat([2,1]).T.flatten()
        all_thetas = t.arange(n_ctx).unsqueeze(1) * thetas.unsqueeze(0)
        self.register_buffer('sins', t.sin(all_thetas).unsqueeze(0).unsqueeze(2))
        self.register_buffer('coss', t.cos(all_thetas).unsqueeze(0).unsqueeze(2))

    def forward(self, key_or_query: Float[Tensor, "batch sequence n_head d_head"],
                offset: int = 0):
        """Rotate `key_or_query`; `offset` shifts the position index."""
        x = key_or_query
        # Pairwise rotation companion: (even, odd) slots become (-odd, even).
        rotated = t.empty(x.shape, device=x.device, dtype=x.dtype)
        even = t.arange(0, x.shape[-1], 2)
        odd = t.arange(1, x.shape[-1], 2)
        rotated[:, :, :, even] = -x[:, :, :, odd]
        rotated[:, :, :, odd] = x[:, :, :, even]
        assert x.shape[1] >= 1, f"x.shape[1] must be >= 1, got {x.shape}"
        seq = x.shape[1]
        return self.coss[:, offset:offset+seq]*x + self.sins[:, offset:offset+seq]*rotated
52
+
53
+
54
class FrameRoPE(nn.Module):
    """RoPE variant where every token within a frame shares one position.

    The sequence axis is treated as `dur * toks_per_frame`; each run of
    `toks_per_frame` tokens is rotated with the angle of its frame index
    instead of its raw token index.
    """

    def __init__(self, d_head, n_ctx, toks_per_frame, C=10000):
        super().__init__()
        thetas = t.exp(-math.log(C)*t.arange(0,d_head,2)/d_head)
        # Duplicate each frequency so the (even, odd) pair shares an angle.
        thetas = thetas.repeat([2,1]).T.flatten()
        all_thetas = t.arange(n_ctx).unsqueeze(1) * thetas.unsqueeze(0)
        self.register_buffer('sins', t.sin(all_thetas).unsqueeze(0).unsqueeze(2))
        self.register_buffer('coss', t.cos(all_thetas).unsqueeze(0).unsqueeze(2))
        self.toks_per_frame = toks_per_frame

    def forward(self, key_or_query: Float[Tensor, "batch dur*seq n_head d_head"]):
        """Rotate `key_or_query` using one shared angle per frame."""
        x = key_or_query
        # Pairwise rotation companion: (even, odd) slots become (-odd, even).
        rotated = t.empty(x.shape, dtype=x.dtype, device=x.device)
        even = t.arange(0, x.shape[-1], 2)
        odd = t.arange(1, x.shape[-1], 2)
        rotated[:, :, :, even] = -x[:, :, :, odd]
        rotated[:, :, :, odd] = x[:, :, :, even]
        # Position indices 0,0,...,1,1,... — one per frame, repeated per token.
        idcs = t.arange(0, x.shape[1]//self.toks_per_frame, device=x.device)
        idcs = idcs[:, None].repeat(1, self.toks_per_frame).flatten()
        return self.coss[:,idcs]*x + self.sins[:,idcs]*rotated
src/utils/__init__.py ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ from .logging import log_video
2
+ from .checkpoint import load_model_from_config
src/utils/checkpoint.py ADDED
@@ -0,0 +1,283 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import re
3
+ import json
4
+ import time
5
+ import shutil
6
+ from pathlib import Path
7
+ from tempfile import NamedTemporaryFile
8
+ from typing import Optional, Dict, Any, List
9
+
10
+ import torch as t
11
+ from torch import nn
12
+
13
+ from ..models.dit_dforce import get_model
14
+ from ..config import Config
15
+
16
+ import yaml
17
+
18
+
19
def load_model_from_config(config_path: str, checkpoint_path: str = None, strict: bool = True) -> nn.Module:
    """Build the DiT model described by a YAML config and optionally load weights.

    Args:
        config_path: YAML file parsed by ``Config.from_yaml``; its ``model``
            section supplies all architecture hyperparameters.
        checkpoint_path: Explicit checkpoint to load. When None, falls back to
            ``model.checkpoint`` from the config; when both are None the model
            is returned with freshly initialised weights.
        strict: Forwarded to ``load_state_dict``; set False to tolerate
            missing/unexpected keys.

    Returns:
        The constructed (and possibly weight-loaded) model.
    """
    print(f"loading {config_path}")
    cmodel = Config.from_yaml(config_path).model
    model = get_model(cmodel.height, cmodel.width,
                      n_window=cmodel.n_window,
                      patch_size=cmodel.patch_size,
                      n_heads=cmodel.n_heads,d_model=cmodel.d_model,
                      n_blocks=cmodel.n_blocks,
                      T=cmodel.T,
                      in_channels=cmodel.in_channels,
                      bidirectional=cmodel.bidirectional)
    if checkpoint_path is None and cmodel.checkpoint is not None:
        checkpoint_path = cmodel.checkpoint
    if checkpoint_path is not None:
        # NOTE(review): weights_only=False unpickles arbitrary objects — only
        # load checkpoints from trusted sources.
        state_dict = t.load(checkpoint_path, weights_only=False)
        # Training checkpoints (see CheckpointManager) wrap weights under "model".
        if "model" in state_dict:
            state_dict = state_dict["model"]
        # Strip "_orig_mod." prefixes — presumably added by torch.compile;
        # note detection inspects the first key only.
        if "_orig_mod." in list(state_dict.keys())[0]:
            state_dict = {k.replace("_orig_mod.", ""): v for k, v in state_dict.items() if k.startswith("_orig_mod.")}
        model.load_state_dict(state_dict, strict=strict)
        print('loaded state dict')
    return model
41
+
42
+
43
+
44
class CheckpointManager:
    """
    Manage top-K checkpoints by a metric. On each save:
    - Write a new checkpoint atomically
    - Keep only the top-K files by metric (max or min)
    - Delete files not in top-K
    - Maintain a small JSON index for quick reloads
    Also scans the directory on init to reconstruct state.

    Filenames are of the form: ckpt-step=<step>-metric=<metric>.pt
    """

    # Matches the filename format produced by save(); named groups recover
    # step and metric when rebuilding state from a directory scan.
    CKPT_PATTERN = re.compile(
        r"^ckpt-step=(?P<step>\d+)-metric=(?P<metric>[+-]?\d+(?:\.\d+)?(?:e[+-]?\d+)?)\.pt$"
    )

    def __init__(
        self,
        dirpath: str | Path,
        k: int = 5,
        mode: str = "max",  # or "min"
        metric_name: str = "score",
        is_main_process: bool = True,
        index_filename: str = "ckpt_index.json",
    ):
        self.dir = Path(dirpath)
        self.dir.mkdir(parents=True, exist_ok=True)
        assert mode in {"max", "min"}
        self.k = int(k)
        self.mode = mode
        self.metric_name = metric_name
        self.is_main = bool(is_main_process)
        self.index_path = self.dir / index_filename

        # entries: list of {path(str), step(int), metric(float), ts(float)}
        self.entries: List[Dict[str, Any]] = []

        # Rebuild state: load the JSON index, pick up any checkpoint files the
        # index does not know about, then enforce the top-K invariant.
        self._load_index()
        self._scan_and_merge()
        self._prune_and_persist()

    # ---------- Public API ----------

    @property
    def best(self) -> Optional[Dict[str, Any]]:
        # Entries are kept sorted best-first by _prune_and_persist.
        return self.entries[0] if self.entries else None

    @property
    def paths(self) -> List[str]:
        # Paths of the currently kept (top-K) checkpoints, best first.
        return [e["path"] for e in self.entries]

    @property
    def should_save(self) -> bool:
        """Use inside DDP loops to gate saving to rank-0 only."""
        return self.is_main

    def save(
        self,
        *,
        metric: float,
        step: int,
        model: Optional[nn.Module] = None,
        optimizer: Optional[t.optim.Optimizer] = None,
        scheduler: Optional[Any] = None,
        extra: Optional[Dict[str, Any]] = None,  # NOTE(review): currently unused
        state_dict: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        """
        Save a checkpoint and keep only top-K by metric.

        Provide either `state_dict` or a `model` (optionally optimizer/scheduler).
        The saved file always contains:
          - 'model', 'optimizer', 'scheduler' (if provided)
          - 'step', metric_name, 'timestamp', 'manager'
        Returns info about the saved file and whether it made the top-K.
        """
        if not self.should_save:
            return {"saved": False, "kept": False, "reason": "not main process"}

        # Assemble the payload from the individual objects when the caller
        # did not pass a prebuilt state_dict.
        if state_dict is None:
            state_dict = {}
            if model is not None:
                state_dict["model"] = model.state_dict()
            if optimizer is not None:
                state_dict["optimizer"] = optimizer.state_dict()
            if scheduler is not None:
                # Some schedulers (e.g., OneCycleLR) have state_dict
                try:
                    state_dict["scheduler"] = scheduler.state_dict()
                except Exception:
                    pass

        ts = time.time()
        filename = f"ckpt-step={int(step):06d}-metric={float(metric):.8f}.pt"
        fpath = self.dir / filename

        # Attach metadata for convenience
        payload = {
            **state_dict,
            "step": int(step),
            self.metric_name: float(metric),
            "timestamp": ts,
            "manager": {
                "mode": self.mode,
                "k": self.k,
                "metric_name": self.metric_name,
                "filename": filename,
            },
        }

        # Atomic write: save to a temp file in the same directory, then
        # rename over the final path so readers never see a partial file.
        with NamedTemporaryFile(dir=self.dir, delete=False) as tmp:
            tmp_path = Path(tmp.name)
            try:
                t.save(payload, tmp_path)
                os.replace(tmp_path, fpath)  # atomic on POSIX
            finally:
                # Clean up the temp file if the replace did not happen.
                if tmp_path.exists():
                    try:
                        tmp_path.unlink()
                    except Exception:
                        pass

        # Update entries and prune
        new_entry = {
            "path": str(fpath),
            "step": int(step),
            "metric": float(metric),
            "ts": ts,
        }
        self.entries.append(new_entry)
        kept = self._prune_and_persist()  # returns True if new file in top-K

        return {"saved": True, "kept": kept, "path": str(fpath), "best": self.best}

    # ---------- Internal helpers ----------

    def _sort_key(self, e: Dict[str, Any]):
        # For MAX: better first => sort by (-metric, step)
        # For MIN: better first => sort by (metric, step)
        return ((-e["metric"], e["step"]) if self.mode == "max" else (e["metric"], e["step"]))

    def _load_index(self):
        # Restore entries from the JSON index; tolerate a missing or
        # corrupted index (the directory scan will recover entries).
        if not self.index_path.exists():
            self.entries = []
            return
        try:
            data = json.loads(self.index_path.read_text())
            entries = data.get("entries", [])
            # Drop missing files
            self.entries = [e for e in entries if Path(e["path"]).exists()]
            # Normalize types
            for e in self.entries:
                e["metric"] = float(e["metric"])
                e["step"] = int(e["step"])
                e["ts"] = float(e.get("ts", time.time()))
        except Exception:
            # If index is corrupted, fall back to empty and rescan
            self.entries = []

    def _scan_and_merge(self):
        """Scan directory for checkpoint files and merge with current entries."""
        seen = {Path(e["path"]).name for e in self.entries}
        for p in self.dir.glob("ckpt-step=*-metric=*.pt"):
            name = p.name
            if name in seen:
                continue
            m = self.CKPT_PATTERN.match(name)
            if not m:
                continue
            step = int(m.group("step"))
            try:
                metric = float(m.group("metric"))
            except ValueError:
                continue
            # File mtime stands in for the original save timestamp.
            self.entries.append(
                {"path": str(p), "step": step, "metric": metric, "ts": p.stat().st_mtime}
            )

    def _prune_and_persist(self) -> bool:
        """Sort by metric, keep top-K, delete the rest. Return True if newest file is kept."""
        if not self.entries:
            self._persist_index()
            return False

        # Sort best-first
        self.entries.sort(key=self._sort_key)

        # Determine which to keep and which to delete
        keep = self.entries[: self.k]
        drop = self.entries[self.k :]

        keep_paths = {e["path"] for e in keep}
        newest_path = max(self.entries, key=lambda e: e["ts"])["path"]
        newest_kept = newest_path in keep_paths

        # Delete files not in top-K
        for e in drop:
            try:
                Path(e["path"]).unlink(missing_ok=True)
            except Exception:
                pass

        # Commit the top-K
        self.entries = keep
        self._persist_index()
        return newest_kept

    def _persist_index(self):
        # Write the index atomically (tmp file + rename), mirroring save().
        data = {
            "k": self.k,
            "mode": self.mode,
            "metric_name": self.metric_name,
            "entries": self.entries,
            "updated_at": time.time(),
        }
        tmp = self.index_path.with_suffix(".json.tmp")
        tmp.write_text(json.dumps(data, indent=2))
        os.replace(tmp, self.index_path)
264
+
265
+ # ---------------------- Example usage ----------------------
266
if __name__ == "__main__":
    # Example (single process). In DDP, construct with is_main_process=(rank==0).
    mgr = CheckpointManager("checkpoints", k=5, mode="max", metric_name="val_acc")

    model = nn.Linear(10, 2)
    opt = t.optim.AdamW(model.parameters(), lr=1e-3)

    # Fake loop
    for epoch in range(10):
        metric = 0.5 + 0.1 * t.rand(1).item()  # pretend validation accuracy
        info = mgr.save(metric=metric, step=epoch, model=model, optimizer=opt)
        # Format the best metric separately: applying ":.4f" directly to the
        # conditional expression raises TypeError when mgr.best is None.
        best_metric = f"{mgr.best['metric']:.4f}" if mgr.best else "n/a"
        print(
            f"epoch {epoch:02d} metric={metric:.4f} saved={info['saved']} kept={info['kept']} "
            f"best_metric={best_metric}"
        )

    print("Top-K paths:", mgr.paths)
    print("Best:", mgr.best)
static/index.html ADDED
@@ -0,0 +1,162 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
<!doctype html>
<html>
<head>
  <meta charset="utf-8" />
  <title>Pong</title>
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <!-- Socket.IO client library (CDN) -->
  <script src="https://cdn.socket.io/4.5.4/socket.io.min.js"></script>
  <style>
    html, body { margin:0; height:100%; background:#111; color:#eee; font-family: system-ui, sans-serif; }
    #overlay {
      position: fixed; inset: 0; display: flex; align-items: center; justify-content: center;
      background: rgba(0,0,0,0.8); z-index: 9999; transition: opacity 200ms ease;
    }
    #overlay.hidden { opacity: 0; pointer-events: none; }
    .spinner {
      width: 64px; height: 64px; border: 6px solid #444; border-top-color: #09f; border-radius: 50%;
      animation: spin 0.9s linear infinite;
    }
    @keyframes spin { to { transform: rotate(360deg); } }
    #statusText { margin-top: 12px; color: #aaa; text-align: center; font-size: 14px; white-space: pre-line; }
    #app { padding: 16px; }
    button { padding: 8px 12px; background:#09f; color:#fff; border:none; border-radius:8px; cursor:pointer; }
    button:disabled { opacity: .5; cursor: not-allowed; }
    img#frame { image-rendering: pixelated; width: 240px; height: 240px; background:#222; display:block; margin-top:12px; }
  </style>
</head>
<body>
  <div id="overlay">
    <div>
      <div class="spinner"></div>
      <div id="statusText">Loading model…</div>
    </div>
  </div>

  <div id="app">
    <h1>Pong</h1>
    <div style="margin-bottom: 12px;">
      <label style="display: block; margin-bottom: 8px;">
        FPS: <input type="number" id="fpsInput" value="20" min="1" max="30" step="1" style="width: 60px; padding: 4px; margin-left: 8px;" />
        <span style="color: #aaa; font-size: 12px; margin-left: 8px;">frames per second</span>
      </label>
      <label style="display: block; margin-bottom: 8px;">
        Steps: <input type="number" id="stepsInput" value="4" min="1" max="10" step="1" style="width: 60px; padding: 4px; margin-left: 8px;" />
        <span style="color: #aaa; font-size: 12px; margin-left: 8px;">diffusion steps</span>
      </label>
    </div>
    <div>
      <button id="startBtn" disabled>Start Stream</button>
      <button id="stopBtn" disabled>Stop Stream</button>
    </div>
    <img id="frame" alt="Latest frame" />
    <div id="actionDisplay" style="margin-top: 12px; font-size: 16px; font-family: monospace;">
      Action: <span id="actionValue">-</span>
    </div>
    <div id="fpsDisplay" style="margin-top: 8px; font-size: 16px; font-family: monospace;">
      Achieved FPS: <span id="fpsValue">-</span>
    </div>
    <div>
      This is the output of a small frame-autoregressive transformer trained with rectified flow matching to simulate pong frames conditioned on user inputs for the blue paddle. It should reach 20 FPS when using 4 steps for generation unless something else is running on my machine.
    </div>
  </div>

  <script>
    // If you serve socket.io client at /socket.io/socket.io.js you can use global io():
    const socket = io({ transports: ['websocket', 'polling'] });

    const overlay = document.getElementById('overlay');
    const statusText = document.getElementById('statusText');
    const startBtn = document.getElementById('startBtn');
    const stopBtn = document.getElementById('stopBtn');
    const frameImg = document.getElementById('frame');

    // Toggle the loading overlay and the stream controls based on readiness.
    function setStatus(isReady) {
      if (!isReady) {
        // Model is still loading
        overlay.classList.remove('hidden');
        startBtn.disabled = true;
        stopBtn.disabled = true;
        statusText.textContent = 'Loading model…';
      } else {
        // Server is ready and available
        overlay.classList.add('hidden');
        startBtn.disabled = false;
        stopBtn.disabled = false;
        statusText.textContent = 'Ready';
      }
    }

    // Initial state: assume not ready (show spinner)
    setStatus(false);

    socket.on('connect', () => {
      // server will immediately emit 'server_status' with current readiness
      console.log('connected');
    });

    // Backend broadcasts readiness changes
    socket.on('server_status', (payload) => {
      const ready = !!(payload && payload.ready);
      console.log('Server status:', { ready });
      setStatus(ready);
    });

    // Start/stop controls (parseInt with explicit radix 10)
    startBtn.addEventListener('click', () => {
      const fps = parseInt(document.getElementById('fpsInput').value, 10) || 12;
      const n_steps = parseInt(document.getElementById('stepsInput').value, 10) || 1;
      socket.emit('start_stream', { n_steps: n_steps, cfg: 0.0, fps: fps, clamp: true });
    });
    stopBtn.addEventListener('click', () => {
      socket.emit('stop_stream');
    });

    const actionValue = document.getElementById('actionValue');
    const fpsValue = document.getElementById('fpsValue');

    // Incoming frames
    socket.on('frame', ({ frame, frame_index, action, fps }) => {
      frameImg.src = `data:image/png;base64,${frame}`;
      // Display action: 0=START, 1=NOOP, 2=UP, 3=DOWN
      const actionLabels = ['START','NOOP', 'UP', 'DOWN'];
      actionValue.textContent = `${action} (${actionLabels[action] || 'UNKNOWN'})`;
      // Display achieved FPS
      if (fps !== undefined) {
        fpsValue.textContent = fps.toFixed(1);
      }
    });

    socket.on('error', (e) => {
      console.warn('server error', e);
      // The server_status event will handle showing the appropriate overlay
      // Just log the error for now
      if (e && e.message) {
        console.error('Server error message:', e.message);
      }
    });

    // Keyboard controls for paddle
    // Actions: 1=NOOP, 2=UP, 3=DOWN (0=START is only sent by the server)
    document.addEventListener('keydown', (e) => {
      let action = null;
      if (e.key === 'ArrowUp' || e.key === 'w' || e.key === 'W') {
        action = 2; // UP
      } else if (e.key === 'ArrowDown' || e.key === 's' || e.key === 'S') {
        action = 3; // DOWN
      }
      if (action !== null) {
        socket.emit('action', { action });
        e.preventDefault();
      }
    });

    document.addEventListener('keyup', (e) => {
      if (['ArrowUp', 'ArrowDown', 'w', 'W', 's', 'S'].includes(e.key)) {
        socket.emit('action', { action: 1 }); // NOOP when key released
        e.preventDefault();
      }
    });
  </script>
</body>
</html>