NON_WORKING_matrix_game_2

Paused

Julian Bilcke committed on Aug 13

Commit

6abca92

1 Parent(s): eb94d89

replace files

Files changed (7)

Dockerfile +1 -2
README.md +57 -107
api_engine.py +363 -0
api_server.py +649 -0
api_utils.py +202 -0
requirements.txt +2 -0
run_api_on_hf.py +148 -0

Dockerfile CHANGED

@@ -51,5 +51,4 @@ EXPOSE 8080
 ENV PORT 8080
-# Run the HF space launcher script which sets up the correct paths
-CMD ["python3", "run_hf_space.py"]
+CMD ["python3", "run_api_on_hf.py"]

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: "Matrix"
 emoji: 🐟
 colorFrom: blue
 colorTo: blue
@@ -11,113 +11,71 @@ app_port: 8080
 disable_embedding: false
 ---
-<!-- markdownlint-disable first-line-h1 -->
-<!-- markdownlint-disable html -->
-<!-- markdownlint-disable no-duplicate-header -->
-# Matrix-Game: Interactive World Foundation Model
-<font size=7><div align='center' >  [[🤗 Huggingface](https://huggingface.co/Skywork/Matrix-Game)] [[📖 Technical Report](https://github.com/SkyworkAI/Matrix-Game/blob/main/assets/report.pdf)] [[🚀 Project Website](https://matrix-game-homepage.github.io/)] </div></font>
-<div align="center">
-  <img src="assets/videos/demo.gif" alt="teaser" />
-</div>
 ## 📝 Overview
-**Matrix-Game** is a 17B-parameter interactive world foundation model for controllable game world generation.
-## ✨ Key Features
-- 🎯 **Feature 1**: **Interactive Generation.**  A diffusion-based image-to-world model that generates high-quality videos conditioned on keyboard and mouse inputs, enabling fine-grained control and dynamic scene evolution.
-- 🚀 **Feature 2**: **GameWorld Score.** A comprehensive benchmark for evaluating Minecraft world models across four key dimensions, including visual quality, temporal quality, action controllability, and physical rule understanding.
-- 💡 **Feature 3**: **Matrix-Game Dataset** A large-scale Minecraft dataset with fine-grained action annotations, supporting scalable training for interactive and physically grounded world modeling.
-## 🔥 Latest Updates
-* [2025-05] 🎉 Initial release of Matrix-Game Model
-## 🚀 Performance Comparison
-### GameWorld Score Benchmark Comparison
-| Model     | Image Quality ↑ | Aesthetic Quality ↑ | Temporal Cons. ↑ | Motion Smooth. ↑ | Keyboard Acc. ↑ | Mouse Acc. ↑ | 3D Cons. ↑ |
-|-----------|------------------|-------------|-------------------|-------------------|------------------|---------------|-------------|
-| Oasis     | 0.65             | 0.48        | 0.94              | **0.98**          | 0.77             | 0.56          | 0.56        |
-| MineWorld | 0.69             | 0.47        | 0.95              | **0.98**          | 0.86             | 0.64          | 0.51        |
-| **Ours**  | **0.72**         | **0.49**    | **0.97**          | **0.98**          | **0.95**         | **0.95**      | **0.76**    |
-**Metric Descriptions**:
-- **Image Quality** / **Aesthetic**: Visual fidelity and perceptual appeal of generated frames
-- **Temporal Consistency** / **Motion Smoothness**: Temporal coherence and smoothness between frames
-- **Keyboard Accuracy** / **Mouse Accuracy**: Accuracy in following user control signals
-- **3D Consistency**: Geometric stability and physical plausibility over time
-  Please check our [GameWorld](https://github.com/SkyworkAI/Matrix-Game/tree/main/GameWorldScore) benchmark for detailed implementation.
-### Human Evaluation
-![Human Win Rate](assets/imgs/human_win_rate.png)
-> Double-blind human evaluation by two independent groups across four key dimensions: **Overall Quality**, **Controllability**, **Visual Quality**, and **Temporal Consistency**.
-> Scores represent the percentage of pairwise comparisons in which each method was preferred. Matrix-Game consistently outperforms prior models across all metrics and both groups.
-## 🚀 Quick Start
 ```
-# clone the repository:
-git clone https://github.com/SkyworkAI/Matrix-Game.git
-cd Matrix-Game
-# install dependencies:
 pip install -r requirements.txt
-# install apex and FlashAttention-3
-# Our project also depends on [apex](https://github.com/NVIDIA/apex) and [FlashAttention-3](https://github.com/Dao-AILab/flash-attention)
-# Run batch inference to generate videos
-bash run_inference.sh
-# Run interactive websocket server
-python server.py --model_root ./models/matrixgame
 ```
-## Interactive WebSocket Server
-We've implemented a real-time interactive WebSocket server that uses the Matrix-Game model to generate game frames based on keyboard and mouse inputs:
-### Features:
-- **Real-time Generation**: Frames are generated on-the-fly based on user inputs
-- **Keyboard & Mouse Control**: Move through the virtual world using WASD keys and mouse movements
-- **Multiple Scenes**: Choose from different environments (forest, desert, beach, hills, etc.)
-- **Fallback Mode**: Automatically falls back to demo mode when GPU resources are unavailable
-### Usage:
-```bash
-# Basic startup
-python server.py
-# With custom model paths
-python server.py --model_root ./models/matrixgame --port 8080
-# With individual model component paths
-python server.py --dit_path ./custom/dit --vae_path ./custom/vae --textenc_path ./custom/textenc
 ```
-### Connection:
-- WebSocket endpoint: ws://localhost:8080/ws
-- Web client: http://localhost:8080/
-### System Requirements:
-- NVIDIA GPU with CUDA support
-- 24GB+ VRAM recommended for smooth frame generation
-## 🔧 Hardware Requirements
-- **GPU**:
-  - NVIDIA A100/H100
-- **VRAM**:
-  - Requires **≥80GB of GPU memory** for a single 65-frame video inference.
 ## ⭐ Acknowledgements
@@ -125,25 +83,17 @@ python server.py --dit_path ./custom/dit --vae_path ./custom/vae --textenc_path
 We would like to express our gratitude to:
 - [Diffusers](https://github.com/huggingface/diffusers) for their excellent diffusion model framework
-- [HunyuanVideo](https://github.com/Tencent/HunyuanVideo) for their strong base model
-- [MineDojo](https://minedojo.org/knowledge_base) for their Minecraft video dataset
 - [MineRL](https://github.com/minerllabs/minerl) for their excellent gym framework
 - [Video-Pre-Training](https://github.com/openai/Video-Pre-Training) for their accurate Inverse Dynamics Model
-- [GameFactory](https://github.com/KwaiVGI/GameFactory) for their idea of action control module
-We are grateful to the broader research community for their open exploration and contributions to the field of interactive world generation.
 ## 📄 License
 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
-## 📎 Citation
-If you find this project useful, please cite our paper:
-```bibtex
-@article{zhang2025matrixgame,
-  title     = {Matrix-Game: Interactive World Foundation Model},
-  author    = {Yifan Zhang and Chunli Peng and Boyang Wang and Puyi Wang and Qingcheng Zhu and Zedong Gao and Eric Li and Yang Liu and Yahui Zhou},
-  journal   = {arXiv},
-  year      = {2025}
-}
 ```

 ---
+title: "Matrix Game 2"
 emoji: 🐟
 colorFrom: blue
 colorTo: blue
 disable_embedding: false
 ---
+<p align="center">
+<h1 align="center">Matrix-Game 2.0</h1>
+<h3 align="center">An Open-Source, Real-Time, and Streaming Interactive World Model</h3>
+</p>
+<font size=7><div align='center' >  [[🤗 HuggingFace](https://huggingface.co/Skywork/Matrix-Game-2.0)] [[📖 Technical Report](https://matrix-game-v2.github.io/static/pdf/report.pdf)] [[🚀 Project Website](https://matrix-game-v2.github.io/)] </div></font>
+https://github.com/user-attachments/assets/336b0d4a-64f5-4e5c-9b60-6212ddb261c0
 ## 📝 Overview
+**Matrix-Game-2.0** is an interactive world foundation model for real-time long video generation. Built on an auto-regressive, diffusion-based image-to-world framework, it generates long videos in real time (25 FPS) conditioned on keyboard and mouse inputs, enabling fine-grained control and dynamic scene evolution.
+## 🤗 Matrix-Game-2.0 Model
+We provide three sets of pretrained model weights: universal scenes, a GTA driving scene, and a TempleRun game scene. Please refer to our HuggingFace page for these resources.
+## Requirements
+We tested this repo on the following setup:
+* NVIDIA GPU with at least 24 GB of memory (A100 and H100 are tested).
+* Linux operating system.
+* 64 GB RAM.
+## Installation
+Create a conda environment and install dependencies:
 ```
+conda create -n matrix-game-2.0 python=3.10 -y
+conda activate matrix-game-2.0
 pip install -r requirements.txt
+# install apex and FlashAttention
+# Our project also depends on [FlashAttention](https://github.com/Dao-AILab/flash-attention)
+git clone https://github.com/SkyworkAI/Matrix-Game.git
+cd Matrix-Game-2
+python setup.py develop
 ```
+## Quick Start
+### Download checkpoints
+```
+huggingface-cli download Skywork/Matrix-Game-2.0 --local-dir Matrix-Game-2.0
 ```
+### Inference
+After downloading pretrained models, you can use the following command to generate an interactive video with random action trajectories:
+```
+python inference.py \
+    --config_path configs/inference_yaml/{your-config}.yaml \
+    --checkpoint_path {path-to-the-checkpoint} \
+    --img_path {path-to-the-input-image} \
+    --output_folder outputs \
+    --num_output_frames 150 \
+    --seed 42 \
+    --pretrained_model_path {path-to-the-vae-folder}
+```
+Alternatively, use the script `inference_streaming.py` to generate interactive videos from your own input actions and images:
+```
+python inference_streaming.py \
+    --config_path configs/inference_yaml/{your-config}.yaml \
+    --checkpoint_path {path-to-the-checkpoint} \
+    --output_folder outputs \
+    --seed 42 \
+    --pretrained_model_path {path-to-the-vae-folder}
+```
+### Tips
+- In the current version, upward camera movement may cause brief rendering glitches (e.g., black screens). If this happens, adjust the movement slightly or change direction. A fix is planned for a future update.
 ## ⭐ Acknowledgements
 We would like to express our gratitude to:
 - [Diffusers](https://github.com/huggingface/diffusers) for their excellent diffusion model framework
+- [SkyReels-V2](https://github.com/SkyworkAI/SkyReels-V2) for their strong base model
+- [Self-Forcing](https://github.com/guandeh17/Self-Forcing) for their excellent work
+- [GameFactory](https://github.com/KwaiVGI/GameFactory) for their idea of action control module
 - [MineRL](https://github.com/minerllabs/minerl) for their excellent gym framework
 - [Video-Pre-Training](https://github.com/openai/Video-Pre-Training) for their accurate Inverse Dynamics Model
 ## 📄 License
 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+## Citation
+If you find this codebase useful for your research, please kindly cite our paper:
+```
 ```

api_engine.py ADDED

	@@ -0,0 +1,363 @@

+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+MatrixGame Engine
+This module handles the core rendering and model inference for the MatrixGame project.
+"""
+import os
+import logging
+import argparse
+import time
+import torch
+import numpy as np
+from PIL import Image
+import cv2
+from einops import rearrange
+from diffusers.utils import load_image
+from diffusers.video_processor import VideoProcessor
+from typing import Dict, List, Tuple, Any, Optional, Union
+from huggingface_hub import snapshot_download
+# MatrixGame specific imports
+from matrixgame.sample.pipeline_matrixgame import MatrixGameVideoPipeline
+from matrixgame.model_variants import get_dit
+from matrixgame.vae_variants import get_vae
+from matrixgame.encoder_variants import get_text_enc
+from matrixgame.model_variants.matrixgame_dit_src import MGVideoDiffusionTransformerI2V
+from matrixgame.sample.flow_matching_scheduler_matrixgame import FlowMatchDiscreteScheduler
+from teacache_forward import teacache_forward
+# Import utility functions
+from api_utils import (
+    visualize_controls,
+    frame_to_jpeg,
+    load_scene_frames,
+    logger
+)
+class MatrixGameEngine:
+    """
+    Core engine for MatrixGame model inference and frame generation.
+    """
+    def __init__(self, args: Optional[argparse.Namespace] = None):
+        """
+        Initialize the MatrixGame engine with configuration parameters.
+        Args:
+            args: Optional parsed command line arguments for model configuration
+        """
+        # Set default parameters if args not provided
+        # Ensure frame dimensions are compatible with VAE downsampling (8x) and patch size [1,2,2]
+        # Dimensions must be divisible by vae_scale_factor * patch_size = 8 * 2 = 16
+        default_width = getattr(args, 'frame_width', 640)
+        default_height = getattr(args, 'frame_height', 368)  # Changed from 360 to 368 (368/16=23)
+        # Ensure compatibility with VAE and patch size
+        vae_patch_factor = 16  # vae_scale_factor (8) * patch_size (2) for both H and W
+        self.frame_width = (default_width // vae_patch_factor) * vae_patch_factor
+        self.frame_height = (default_height // vae_patch_factor) * vae_patch_factor
+        self.fps = getattr(args, 'fps', 16)
+        self.inference_steps = getattr(args, 'inference_steps', 20)
+        self.guidance_scale = getattr(args, 'guidance_scale', 6.0)
+        self.num_pre_frames = getattr(args, 'num_pre_frames', 3)
+        # Initialize state
+        self.frame_count = 0
+        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+        self.weight_dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
+        # Model paths from environment or args
+        self.vae_path = os.environ.get("VAE_PATH", "./models/matrixgame/vae/")
+        self.dit_path = os.environ.get("DIT_PATH", "./models/matrixgame/dit/")
+        self.textenc_path = os.environ.get("TEXTENC_PATH", "./models/matrixgame")
+        # Cache scene initial frames
+        self.scenes = {
+            'forest': load_scene_frames('forest', self.frame_width, self.frame_height),
+            'desert': load_scene_frames('desert', self.frame_width, self.frame_height),
+            'beach': load_scene_frames('beach', self.frame_width, self.frame_height),
+            'hills': load_scene_frames('hills', self.frame_width, self.frame_height),
+            'river': load_scene_frames('river', self.frame_width, self.frame_height),
+            'icy': load_scene_frames('icy', self.frame_width, self.frame_height),
+            'mushroom': load_scene_frames('mushroom', self.frame_width, self.frame_height),
+            'plain': load_scene_frames('plain', self.frame_width, self.frame_height)
+        }
+        # Cache initial images for model input
+        self.scene_initial_images = {}
+        # Initialize MatrixGame pipeline
+        self.model_loaded = False
+        if torch.cuda.is_available():
+            try:
+                self._init_models()
+                self.model_loaded = True
+                logger.info("MatrixGame models loaded successfully")
+            except Exception as e:
+                logger.error(f"Failed to initialize MatrixGame models: {str(e)}")
+                logger.info("Falling back to frame cycling mode")
+        else:
+            logger.warning("CUDA not available. Using frame cycling mode only.")
+    def _init_models(self):
+        """Initialize MatrixGame models (VAE, text encoder, transformer)"""
+        # Initialize flow matching scheduler
+        self.scheduler = FlowMatchDiscreteScheduler(
+            shift=15.0,
+            reverse=True,
+            solver="euler"
+        )
+        # Initialize VAE
+        try:
+            self.vae = get_vae("matrixgame", self.vae_path, self.weight_dtype)
+            self.vae.requires_grad_(False)
+            self.vae.eval()
+            self.vae.enable_tiling()
+            logger.info("VAE model loaded successfully")
+        except Exception as e:
+            logger.error(f"Error loading VAE model: {str(e)}")
+            raise
+        # Initialize DIT (Transformer)
+        try:
+            # Check if DIT model exists locally, if not download from Hugging Face
+            if not os.path.exists(self.dit_path) or not os.path.isdir(self.dit_path):
+                logger.info(f"DIT model not found at {self.dit_path}, downloading from Hugging Face...")
+                try:
+                    # Download the DIT subdirectory from Skywork/Matrix-Game-2.0
+                    downloaded_path = snapshot_download(
+                        repo_id="Skywork/Matrix-Game-2.0",
+                        allow_patterns="dit/*",
+                        local_dir=os.path.dirname(self.dit_path) if os.path.dirname(self.dit_path) else "./models/matrixgame"
+                    )
+                    # Point to the dit subdirectory
+                    self.dit_path = os.path.join(downloaded_path, "dit")
+                    logger.info(f"Successfully downloaded DIT model to {self.dit_path}")
+                except Exception as e:
+                    logger.error(f"Failed to download DIT model from Hugging Face: {str(e)}")
+                    raise
+            dit = MGVideoDiffusionTransformerI2V.from_pretrained(self.dit_path)
+            dit.requires_grad_(False)
+            dit.eval()
+            logger.info("DIT model loaded successfully")
+        except Exception as e:
+            logger.error(f"Error loading DIT model: {str(e)}")
+            raise
+        # Initialize text encoder
+        try:
+            self.text_enc = get_text_enc('matrixgame', self.textenc_path, weight_dtype=self.weight_dtype, i2v_type='refiner')
+            logger.info("Text encoder loaded successfully")
+        except Exception as e:
+            logger.error(f"Error loading text encoder: {str(e)}")
+            raise
+        # Initialize pipeline
+        try:
+            self.pipeline = MatrixGameVideoPipeline(
+                vae=self.vae.vae,
+                text_encoder=self.text_enc,
+                transformer=dit,
+                scheduler=self.scheduler,
+            ).to(self.weight_dtype).to(self.device)
+            logger.info("Pipeline initialized successfully")
+        except Exception as e:
+            logger.error(f"Error initializing pipeline: {str(e)}")
+            raise
+        # Configure teacache for the transformer
+        self.pipeline.transformer.__class__.enable_teacache = True
+        self.pipeline.transformer.__class__.cnt = 0
+        self.pipeline.transformer.__class__.num_steps = self.inference_steps
+        self.pipeline.transformer.__class__.accumulated_rel_l1_distance = 0
+        self.pipeline.transformer.__class__.rel_l1_thresh = 0.075
+        self.pipeline.transformer.__class__.previous_modulated_input = None
+        self.pipeline.transformer.__class__.previous_residual = None
+        self.pipeline.transformer.__class__.forward = teacache_forward
+        # Preprocess initial images for all scenes
+        for scene_name, frames in self.scenes.items():
+            if frames:
+                # Use first frame as initial image
+                self.scene_initial_images[scene_name] = self._preprocess_image(frames[0])
+    def _preprocess_image(self, image_array: np.ndarray) -> torch.Tensor:
+        """
+        Preprocess an image for the model.
+        Args:
+            image_array: Input image as numpy array
+        Returns:
+            torch.Tensor: Preprocessed image tensor
+        """
+        # Convert numpy array to PIL Image if needed
+        if isinstance(image_array, np.ndarray):
+            image = Image.fromarray(image_array)
+        else:
+            image = image_array
+        # Preprocess for VAE
+        vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1) if hasattr(self, 'vae') else 8
+        video_processor = VideoProcessor(vae_scale_factor=vae_scale_factor)
+        initial_image = video_processor.preprocess(image, height=self.frame_height, width=self.frame_width)
+        # Add past frames for stability (use same frame repeated)
+        past_frames = initial_image.repeat(self.num_pre_frames, 1, 1, 1)
+        initial_image = torch.cat([initial_image, past_frames], dim=0)
+        return initial_image
+    def generate_frame(self, scene_name: str, keyboard_condition: Optional[List] = None,
+                      mouse_condition: Optional[List] = None) -> bytes:
+        """
+        Generate the next frame based on current conditions using MatrixGame model.
+        Args:
+            scene_name: Name of the current scene
+            keyboard_condition: Keyboard input state
+            mouse_condition: Mouse input state
+        Returns:
+            bytes: JPEG bytes of the frame
+        """
+        # Check if model is loaded
+        if not self.model_loaded or not torch.cuda.is_available():
+            # Fall back to frame cycling for demo mode or if models failed to load
+            return self._fallback_frame(scene_name, keyboard_condition, mouse_condition)
+        else:
+            # Use MatrixGame model for frame generation
+            try:
+                # Get initial image for this scene
+                initial_image = self.scene_initial_images.get(scene_name)
+                if initial_image is None:
+                    # Use forest as default if we don't have an initial image for this scene
+                    initial_image = self.scene_initial_images.get('forest')
+                    if initial_image is None:
+                        # If we still don't have an initial image, fall back to frame cycling
+                        logger.error(f"No initial image available for scene {scene_name}")
+                        return self._fallback_frame(scene_name, keyboard_condition, mouse_condition)
+                # Prepare input tensors (move to device and format correctly)
+                if keyboard_condition is None:
+                    keyboard_condition = [[0, 0, 0, 0, 0, 0]]
+                if mouse_condition is None:
+                    mouse_condition = [[0, 0]]
+                # Convert conditions to tensors
+                keyboard_tensor = torch.tensor(keyboard_condition, dtype=torch.float32)
+                mouse_tensor = torch.tensor(mouse_condition, dtype=torch.float32)
+                # Move to device and convert to correct dtype
+                keyboard_tensor = keyboard_tensor.to(self.weight_dtype).to(self.device)
+                mouse_tensor = mouse_tensor.to(self.weight_dtype).to(self.device)
+                # Get the first frame from the scene for semantic conditioning
+                scene_frames = self.scenes.get(scene_name, self.scenes['forest'])
+                if not scene_frames:
+                    return self._fallback_frame(scene_name, keyboard_condition, mouse_condition)
+                semantic_image = Image.fromarray(scene_frames[0])
+                # Get PIL image version of the frame for visualization
+                for scene_frame in scene_frames:
+                    if isinstance(scene_frame, np.ndarray):
+                        semantic_image = Image.fromarray(scene_frame)
+                        break
+                # Generate a single frame with the model
+                # Use fewer inference steps for interactive frame generation
+                with torch.no_grad():
+                    # Create args object for pipeline
+                    from types import SimpleNamespace
+                    args = SimpleNamespace()
+                    args.num_pre_frames = self.num_pre_frames
+                    # Generate a minimal clip and keep only its first frame
+                    # video_length=1 keeps per-request latency low for interactive use
+                    video = self.pipeline(
+                        height=self.frame_height,
+                        width=self.frame_width,
+                        video_length=1,  # Generate a very short video for speed (must be 1 or multiple of 4)
+                        mouse_condition=mouse_tensor,
+                        keyboard_condition=keyboard_tensor,
+                        initial_image=initial_image,
+                        num_inference_steps=self.inference_steps,
+                        guidance_scale=self.guidance_scale,
+                        embedded_guidance_scale=None,
+                        data_type="video",
+                        vae_ver='884-16c-hy',
+                        enable_tiling=True,
+                        generator=torch.Generator(device=self.device).manual_seed(42),
+                        i2v_type='refiner',
+                        semantic_images=semantic_image,
+                        args=args
+                    ).videos[0]
+                # Convert video tensor to numpy array (use first frame)
+                video_frame = video[0].permute(1, 2, 0).cpu().numpy()
+                video_frame = (video_frame * 255).astype(np.uint8)
+                frame = video_frame
+                # Increment frame counter
+                self.frame_count += 1
+            except Exception as e:
+                logger.error(f"Error generating frame with MatrixGame model: {str(e)}")
+                # Fall back to cycling demo frames if model generation fails
+                return self._fallback_frame(scene_name, keyboard_condition, mouse_condition)
+        # Add visualization of input controls
+        frame = visualize_controls(
+            frame, keyboard_condition, mouse_condition,
+            self.frame_width, self.frame_height
+        )
+        # Convert frame to JPEG
+        return frame_to_jpeg(frame, self.frame_height, self.frame_width)
+    def _fallback_frame(self, scene_name: str, keyboard_condition: Optional[List] = None,
+                       mouse_condition: Optional[List] = None) -> bytes:
+        """
+        Generate a fallback frame when model generation fails.
+        Args:
+            scene_name: Name of the current scene
+            keyboard_condition: Keyboard input state
+            mouse_condition: Mouse input state
+        Returns:
+            bytes: JPEG bytes of the frame
+        """
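+        # Prefer the requested scene, then 'forest'; either may be missing or empty
+        scene_frames = self.scenes.get(scene_name) or self.scenes.get('forest') or []
+        if scene_frames:
+            frame_idx = self.frame_count % len(scene_frames)
+            frame = scene_frames[frame_idx].copy()
+        else:
+            # No cached frames at all: use a black frame rather than dividing by zero
+            frame = np.zeros((self.frame_height, self.frame_width, 3), dtype=np.uint8)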
+        self.frame_count += 1
+        # Add fallback mode indicator
+        cv2.putText(frame, "Fallback mode",
+                  (10, self.frame_height - 20),
+                  cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
+        # Add visualization of input controls
+        frame = visualize_controls(
+            frame, keyboard_condition, mouse_condition,
+            self.frame_width, self.frame_height
+        )
+        # Convert frame to JPEG
+        return frame_to_jpeg(frame, self.frame_height, self.frame_width)
+    def get_valid_scenes(self) -> List[str]:
+        """
+        Get a list of valid scene names.
+        Returns:
+            List[str]: List of valid scene names
+        """
+        return list(self.scenes.keys())
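
Review note on the frame-size handling in `MatrixGameEngine.__init__`: the requested width and height are floored to a multiple of 16, which the code comments derive as VAE downsampling (8) times the spatial patch size (2). The rounding can be sketched in isolation as below. `align_down` is a hypothetical helper name used only for illustration; it does not appear in the commit.

```python
def align_down(value: int, factor: int = 16) -> int:
    """Floor `value` to the nearest multiple of `factor` (hypothetical helper)."""
    return (value // factor) * factor


# 16 = vae_scale_factor (8) * spatial patch size (2), per the engine comments
for requested in (640, 368, 360):
    print(requested, "->", align_down(requested))
```

Under this rule a requested height of 360 would be floored to 352, which is presumably why the default height in the commit is 368 (23 × 16) instead.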

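Review note on the download branch of `_init_models` in `api_engine.py`: `os.path.dirname` treats a trailing slash as an empty last component, so with the default `DIT_PATH` of `./models/matrixgame/dit/` the computed `local_dir` is the `dit` directory itself, not its parent. The sketch below reproduces only the path arithmetic with the standard library and performs no download. `resolve_dit_paths` is a hypothetical name, and the sketch assumes `snapshot_download` returns the `local_dir` it was given.

```python
import os
from typing import Tuple


def resolve_dit_paths(dit_path: str) -> Tuple[str, str]:
    """Mirror the path arithmetic of the download branch (hypothetical helper)."""
    # Same fallback as the commit: use a default when dirname() is empty
    local_dir = os.path.dirname(dit_path) or "./models/matrixgame"
    # Assumption: snapshot_download returns the local_dir it was given
    return local_dir, os.path.join(local_dir, "dit")


print(resolve_dit_paths("./models/matrixgame/dit/"))  # trailing slash (the default)
print(resolve_dit_paths("./models/matrixgame/dit"))   # no trailing slash
```

With the trailing slash the resulting model path ends in `dit/dit`; without it, the path ends in a single `dit`. Whether that matters depends on the layout of the downloaded repository, which this diff does not show.
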
api_server.py ADDED

	@@ -0,0 +1,649 @@

+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+MatrixGame Websocket Gaming Server
+This script implements a websocket server for the MatrixGame project,
+allowing real-time streaming of game frames based on player inputs.
+"""
+import asyncio
+import json
+import logging
+import os
+import pathlib
+import time
+import uuid
+import base64
+import argparse
+from typing import Dict, List, Any, Optional
+from aiohttp import web, WSMsgType
+# Import the game engine
+from api_engine import MatrixGameEngine
+from api_utils import logger, parse_model_args, setup_gpu_environment
+class GameSession:
+    """
+    Represents a user's gaming session.
+    Each WebSocket connection gets its own session with separate queues.
+    """
+    def __init__(self, user_id: str, ws: web.WebSocketResponse, game_manager):
+        self.user_id = user_id
+        self.ws = ws
+        self.game_manager = game_manager
+        # Create action queue for this user session
+        self.action_queue = asyncio.Queue()
+        # Session creation time
+        self.created_at = time.time()
+        self.last_activity = time.time()
+        # Game state
+        self.current_scene = "forest"  # Default scene
+        self.is_streaming = False
+        self.stream_task = None
+        # Current input state
+        self.keyboard_state = [0, 0, 0, 0, 0, 0]  # forward, back, left, right, jump, attack
+        self.mouse_state = [0, 0]  # x, y
+        self.background_tasks = []
+    async def start(self):
+        """Start all the queue processors for this session"""
+        self.background_tasks = [
+            asyncio.create_task(self._process_action_queue()),
+        ]
+        logger.info(f"Started game session for user {self.user_id}")
+    async def stop(self):
+        """Stop all background tasks for this session"""
+        # Stop streaming if active
+        if self.is_streaming and self.stream_task:
+            self.is_streaming = False
+            self.stream_task.cancel()
+            try:
+                await self.stream_task
+            except asyncio.CancelledError:
+                pass
+        # Cancel other background tasks
+        for task in self.background_tasks:
+            task.cancel()
+        try:
+            # Wait for tasks to complete cancellation
+            await asyncio.gather(*self.background_tasks, return_exceptions=True)
+        except asyncio.CancelledError:
+            pass
+        logger.info(f"Stopped game session for user {self.user_id}")
+    async def _process_action_queue(self):
+        """Process game actions from the queue"""
+        while True:
+            data = await self.action_queue.get()
+            try:
+                action_type = data.get('action')
+                if action_type == 'start_stream':
+                    result = await self._handle_start_stream(data)
+                elif action_type == 'stop_stream':
+                    result = await self._handle_stop_stream(data)
+                elif action_type == 'keyboard_input':
+                    result = await self._handle_keyboard_input(data)
+                elif action_type == 'mouse_input':
+                    result = await self._handle_mouse_input(data)
+                elif action_type == 'change_scene':
+                    result = await self._handle_scene_change(data)
+                else:
+                    result = {
+                        'action': action_type,
+                        'requestId': data.get('requestId'),
+                        'success': False,
+                        'error': f'Unknown action: {action_type}'
+                    }
+                # Send response back to the client
+                await self.ws.send_json(result)
+                # Update last activity time
+                self.last_activity = time.time()
+            except Exception as e:
+                logger.error(f"Error processing action for user {self.user_id}: {str(e)}")
+                try:
+                    await self.ws.send_json({
+                        'action': data.get('action'),
+                        'requestId': data.get('requestId', 'unknown'),
+                        'success': False,
+                        'error': f'Error processing action: {str(e)}'
+                    })
+                except Exception as send_error:
+                    logger.error(f"Error sending error response: {send_error}")
+            finally:
+                self.action_queue.task_done()
+    async def _handle_start_stream(self, data: Dict) -> Dict:
+        """Handle request to start streaming frames"""
+        if self.is_streaming:
+            return {
+                'action': 'start_stream',
+                'requestId': data.get('requestId'),
+                'success': False,
+                'error': 'Stream already active'
+            }
+        fps = data.get('fps', 16)
+        self.is_streaming = True
+        self.stream_task = asyncio.create_task(self._stream_frames(fps))
+        return {
+            'action': 'start_stream',
+            'requestId': data.get('requestId'),
+            'success': True,
+            'message': f'Streaming started at {fps} FPS'
+        }
+    async def _handle_stop_stream(self, data: Dict) -> Dict:
+        """Handle request to stop streaming frames"""
+        if not self.is_streaming:
+            return {
+                'action': 'stop_stream',
+                'requestId': data.get('requestId'),
+                'success': False,
+                'error': 'No active stream to stop'
+            }
+        self.is_streaming = False
+        if self.stream_task:
+            self.stream_task.cancel()
+            try:
+                await self.stream_task
+            except asyncio.CancelledError:
+                pass
+            self.stream_task = None
+        return {
+            'action': 'stop_stream',
+            'requestId': data.get('requestId'),
+            'success': True,
+            'message': 'Streaming stopped'
+        }
+    async def _handle_keyboard_input(self, data: Dict) -> Dict:
+        """Handle keyboard input from client"""
+        key = data.get('key', '')
+        pressed = data.get('pressed', False)
+        # Map key to keyboard state index
+        key_map = {
+            'w': 0, 'forward': 0,
+            's': 1, 'back': 1, 'backward': 1,
+            'a': 2, 'left': 2,
+            'd': 3, 'right': 3,
+            'space': 4, 'jump': 4,
+            'shift': 5, 'attack': 5, 'ctrl': 5
+        }
+        if key.lower() in key_map:
+            key_idx = key_map[key.lower()]
+            self.keyboard_state[key_idx] = 1 if pressed else 0
+        return {
+            'action': 'keyboard_input',
+            'requestId': data.get('requestId'),
+            'success': True,
+            'keyboardState': self.keyboard_state
+        }
+    async def _handle_mouse_input(self, data: Dict) -> Dict:
+        """Handle mouse movement/input from client"""
+        mouse_x = data.get('x', 0)
+        mouse_y = data.get('y', 0)
+        # Update mouse state; values are stored as floats as received (no normalization is applied here)
+        self.mouse_state = [float(mouse_x), float(mouse_y)]
+        return {
+            'action': 'mouse_input',
+            'requestId': data.get('requestId'),
+            'success': True,
+            'mouseState': self.mouse_state
+        }
+    async def _handle_scene_change(self, data: Dict) -> Dict:
+        """Handle scene change requests"""
+        scene_name = data.get('scene', 'forest')
+        valid_scenes = self.game_manager.valid_scenes
+        if scene_name not in valid_scenes:
+            return {
+                'action': 'change_scene',
+                'requestId': data.get('requestId'),
+                'success': False,
+                'error': f'Invalid scene: {scene_name}. Valid scenes are: {", ".join(valid_scenes)}'
+            }
+        self.current_scene = scene_name
+        return {
+            'action': 'change_scene',
+            'requestId': data.get('requestId'),
+            'success': True,
+            'scene': scene_name
+        }
+    async def _stream_frames(self, fps: int):
+        """Stream frames to the client at the specified FPS"""
+        frame_interval = 1.0 / fps  # Time between frames in seconds
+        try:
+            while self.is_streaming:
+                start_time = time.time()
+                # Generate frame based on current keyboard and mouse state
+                keyboard_condition = [self.keyboard_state]
+                mouse_condition = [self.mouse_state]
+                # Use the engine to generate the next frame
+                frame_bytes = self.game_manager.engine.generate_frame(
+                    self.current_scene, keyboard_condition, mouse_condition
+                )
+                # Encode as base64 for sending in JSON
+                frame_base64 = base64.b64encode(frame_bytes).decode('utf-8')
+                # Send frame to client
+                await self.ws.send_json({
+                    'action': 'frame',
+                    'frameData': frame_base64,
+                    'timestamp': time.time()
+                })
+                # Calculate sleep time to maintain FPS
+                elapsed = time.time() - start_time
+                sleep_time = max(0, frame_interval - elapsed)
+                await asyncio.sleep(sleep_time)
+        except asyncio.CancelledError:
+            logger.info(f"Frame streaming cancelled for user {self.user_id}")
+        except Exception as e:
+            logger.error(f"Error in frame streaming for user {self.user_id}: {str(e)}")
+            if self.ws.closed:
+                logger.info(f"WebSocket closed for user {self.user_id}")
+                return
+            # Notify client of error
+            try:
+                await self.ws.send_json({
+                    'action': 'frame_error',
+                    'error': f'Streaming error: {str(e)}'
+                })
+            except Exception:
+                pass
+            # Stop streaming
+            self.is_streaming = False
+class GameManager:
+    """
+    Manages all active gaming sessions and shared resources.
+    """
+    def __init__(self, args: argparse.Namespace):
+        self.sessions = {}
+        self.session_lock = asyncio.Lock()
+        # Initialize game engine
+        self.engine = MatrixGameEngine(args)
+        # Load valid scenes from engine
+        self.valid_scenes = self.engine.get_valid_scenes()
+    async def create_session(self, user_id: str, ws: web.WebSocketResponse) -> GameSession:
+        """Create a new game session"""
+        async with self.session_lock:
+            # Create a new session for this user
+            session = GameSession(user_id, ws, self)
+            await session.start()
+            self.sessions[user_id] = session
+            return session
+    async def delete_session(self, user_id: str) -> None:
+        """Delete a game session and clean up resources"""
+        async with self.session_lock:
+            if user_id in self.sessions:
+                session = self.sessions[user_id]
+                await session.stop()
+                del self.sessions[user_id]
+                logger.info(f"Deleted game session for user {user_id}")
+    def get_session(self, user_id: str) -> Optional[GameSession]:
+        """Get a game session if it exists"""
+        return self.sessions.get(user_id)
+    async def close_all_sessions(self) -> None:
+        """Close all active sessions (used during shutdown)"""
+        async with self.session_lock:
+            for user_id, session in list(self.sessions.items()):
+                await session.stop()
+            self.sessions.clear()
+            logger.info("Closed all active game sessions")
+    @property
+    def session_count(self) -> int:
+        """Get the number of active sessions"""
+        return len(self.sessions)
+    def get_session_stats(self) -> Dict:
+        """Get statistics about active sessions"""
+        stats = {
+            'total_sessions': len(self.sessions),
+            'active_scenes': {},
+            'streaming_sessions': 0
+        }
+        # Count sessions by scene and streaming status
+        for session in self.sessions.values():
+            scene = session.current_scene
+            stats['active_scenes'][scene] = stats['active_scenes'].get(scene, 0) + 1
+            if session.is_streaming:
+                stats['streaming_sessions'] += 1
+        return stats
+# Create global game manager
+game_manager = None
+async def status_handler(request: web.Request) -> web.Response:
+    """Handler for API status endpoint"""
+    # Get session statistics
+    session_stats = game_manager.get_session_stats()
+    return web.json_response({
+        'product': 'MatrixGame WebSocket Server',
+        'version': '1.0.0',
+        'active_sessions': session_stats,
+        'available_scenes': game_manager.valid_scenes
+    })
+async def root_handler(request: web.Request) -> web.Response:
+    """Handler for serving the client at the root path"""
+    client_path = pathlib.Path(__file__).parent / 'client' / 'index.html'
+    with open(client_path, 'r') as file:
+        html_content = file.read()
+    return web.Response(text=html_content, content_type='text/html')
+async def websocket_handler(request: web.Request) -> web.WebSocketResponse:
+    """Handle WebSocket connections with robust error handling"""
+    logger.info(f"WebSocket connection attempt - PATH: {request.path}, QUERY: {request.query_string}")
+    # Log request headers at debug level only (could contain sensitive information)
+    logger.debug(f"WebSocket request headers: {dict(request.headers)}")
+    # Prepare a WebSocket response with appropriate settings
+    ws = web.WebSocketResponse(
+        max_msg_size=1024*1024*10,  # 10MB max message size
+        timeout=60.0,
+        heartbeat=30.0  # Add heartbeat to keep connection alive
+    )
+    # Check if WebSocket protocol is supported
+    if not ws.can_prepare(request):
+        logger.error("Cannot prepare WebSocket: WebSocket protocol not supported")
+        return web.Response(status=400, text="WebSocket protocol not supported")
+    try:
+        logger.info("Preparing WebSocket connection...")
+        await ws.prepare(request)
+        # Generate a unique user ID for this connection
+        user_id = str(uuid.uuid4())
+        # Get client IP address
+        peername = request.transport.get_extra_info('peername')
+        if peername is not None:
+            client_ip = peername[0]
+        else:
+            client_ip = request.headers.get('X-Forwarded-For', 'unknown').split(',')[0].strip()
+        # Log connection success
+        logger.info(f"Client {user_id} connecting from IP: {client_ip} - WebSocket connection established")
+        # Track whether the session was created so it can be cleaned up on failure
+        is_session_created = False
+        try:
+            # Store the user ID in the websocket for easy access
+            ws.user_id = user_id
+            # Create a new session for this user
+            logger.info(f"Creating game session for user {user_id}")
+            user_session = await game_manager.create_session(user_id, ws)
+            is_session_created = True
+            logger.info(f"Game session created for user {user_id}")
+        except Exception as session_error:
+            logger.error(f"Error creating game session: {str(session_error)}", exc_info=True)
+            if not ws.closed:
+                await ws.close(code=1011, message=f"Server error: {str(session_error)}".encode())
+            if is_session_created:
+                await game_manager.delete_session(user_id)
+            return ws
+    except Exception as e:
+        logger.error(f"Error establishing WebSocket connection: {str(e)}", exc_info=True)
+        if not ws.closed and ws.prepared:
+            await ws.close(code=1011, message=f"Server error: {str(e)}".encode())
+        return ws
+    # Send initial welcome message
+    try:
+        await ws.send_json({
+            'action': 'welcome',
+            'userId': user_id,
+            'message': 'Welcome to the MatrixGame WebSocket server!',
+            'scenes': game_manager.valid_scenes
+        })
+        logger.info(f"Sent welcome message to user {user_id}")
+    except Exception as welcome_error:
+        logger.error(f"Error sending welcome message: {str(welcome_error)}")
+        if not ws.closed:
+            await ws.close(code=1011, message=b"Failed to send welcome message")
+        await game_manager.delete_session(user_id)
+        return ws
+    try:
+        async for msg in ws:
+            if msg.type == WSMsgType.TEXT:
+                try:
+                    data = json.loads(msg.data)
+                    action = data.get('action')
+                    logger.debug(f"Received {action} message from user {user_id}")
+                    if action == 'ping':
+                        # Respond to ping immediately
+                        await ws.send_json({
+                            'action': 'pong',
+                            'requestId': data.get('requestId'),
+                            'timestamp': time.time()
+                        })
+                    else:
+                        # Route game actions to the session's action queue
+                        await user_session.action_queue.put(data)
+                except json.JSONDecodeError:
+                    logger.error(f"Invalid JSON from user {user_id}: {msg.data}")
+                    if not ws.closed:
+                        await ws.send_json({
+                            'error': 'Invalid JSON message',
+                            'success': False
+                        })
+                except Exception as e:
+                    logger.error(f"Error processing WebSocket message for user {user_id}: {str(e)}")
+                    if not ws.closed:
+                        await ws.send_json({
+                            'action': data.get('action') if 'data' in locals() else 'unknown',
+                            'success': False,
+                            'error': f'Error processing message: {str(e)}'
+                        })
+            elif msg.type == WSMsgType.ERROR:
+                logger.error(f"WebSocket error for user {user_id}: {ws.exception()}")
+                break
+            elif msg.type == WSMsgType.CLOSE:
+                logger.info(f"WebSocket close received for user {user_id} (code: {msg.data}, message: {msg.extra})")
+                break
+            elif msg.type == WSMsgType.CLOSING:
+                logger.info(f"WebSocket closing for user {user_id}")
+                break
+            elif msg.type == WSMsgType.CLOSED:
+                logger.info(f"WebSocket already closed for user {user_id}")
+                break
+    except Exception as ws_error:
+        logger.error(f"Unexpected WebSocket error for user {user_id}: {str(ws_error)}", exc_info=True)
+    finally:
+        # Cleanup session
+        try:
+            logger.info(f"Cleaning up session for user {user_id}")
+            await game_manager.delete_session(user_id)
+            logger.info(f"Connection closed for user {user_id}")
+        except Exception as cleanup_error:
+            logger.error(f"Error during session cleanup for user {user_id}: {str(cleanup_error)}")
+    return ws
+async def init_app(args, base_path="") -> web.Application:
+    """Initialize the web application"""
+    global game_manager
+    # Initialize game manager with command line args
+    game_manager = GameManager(args)
+    app = web.Application(
+        client_max_size=1024**2*10  # 10MB max size
+    )
+    # Add cleanup logic
+    async def cleanup(app):
+        logger.info("Shutting down server, closing all sessions...")
+        await game_manager.close_all_sessions()
+    app.on_shutdown.append(cleanup)
+    # Add routes with CORS headers for WebSockets
+    # Configure CORS for all routes
+    @web.middleware
+    async def cors_middleware(request, handler):
+        if request.method == 'OPTIONS':
+            # Handle preflight requests
+            resp = web.Response()
+            resp.headers['Access-Control-Allow-Origin'] = '*'
+            resp.headers['Access-Control-Allow-Methods'] = 'GET, POST, OPTIONS'
+            resp.headers['Access-Control-Allow-Headers'] = 'Content-Type, X-Requested-With'
+            return resp
+        # Normal request, call the handler
+        resp = await handler(request)
+        # Add CORS headers to the response
+        resp.headers['Access-Control-Allow-Origin'] = '*'
+        resp.headers['Access-Control-Allow-Methods'] = 'GET, POST, OPTIONS'
+        resp.headers['Access-Control-Allow-Headers'] = 'Content-Type, X-Requested-With'
+        return resp
+    app.middlewares.append(cors_middleware)
+    # Add a debug endpoint to help diagnose WebSocket issues
+    async def debug_handler(request):
+        client_ip = request.remote
+        headers = dict(request.headers)
+        server_host = request.host
+        debug_info = {
+            "client_ip": client_ip,
+            "server_host": server_host,
+            "headers": headers,
+            "request_path": request.path,
+            "server_time": time.time(),
+            "base_path": base_path,
+            "websocket_route": f"{base_path}/ws",
+            "all_routes": [route.name for route in app.router.routes() if route.name],
+            "server_info": {
+                "active_sessions": game_manager.session_count,
+                "available_scenes": game_manager.valid_scenes
+            }
+        }
+        return web.json_response(debug_info)
+    # Set up routes with the base_path
+    # Add multiple WebSocket routes to ensure compatibility
+    logger.info(f"Setting up WebSocket route at {base_path}/ws")
+    app.router.add_get(f'{base_path}/ws', websocket_handler, name='ws_handler')
+    # Also add WebSocket route at the root for Hugging Face compatibility
+    if base_path:
+        logger.info("Adding additional WebSocket route at /ws")
+        app.router.add_get('/ws', websocket_handler, name='ws_root_handler')
+    # Add routes for API and debug endpoints
+    app.router.add_get(f'{base_path}/api/status', status_handler, name='status_handler')
+    app.router.add_get(f'{base_path}/api/debug', debug_handler, name='debug_handler')
+    # Serve the client at both the base path and root path for compatibility
+    app.router.add_get(f'{base_path}/', root_handler, name='root_handler')
+    # Always serve at the root path for Hugging Face Spaces compatibility
+    if base_path:
+        app.router.add_get('/', root_handler, name='root_handler_no_base')
+    # Set up static file serving for the client assets
+    app.router.add_static(f'{base_path}/assets', pathlib.Path(__file__).parent / 'client', name='static_handler')
+    # Add static file serving at root for compatibility
+    if base_path:
+        app.router.add_static('/assets', pathlib.Path(__file__).parent / 'client', name='static_handler_no_base')
+    return app
+def parse_args() -> argparse.Namespace:
+    """Parse server-specific command line arguments"""
+    parser = argparse.ArgumentParser(description="MatrixGame WebSocket Server")
+    parser.add_argument("--host", type=str, default="0.0.0.0", help="Host IP to bind to")
+    parser.add_argument("--port", type=int, default=8080, help="Port to listen on")
+    parser.add_argument("--path", type=str, default="", help="Base path for the server (for proxy setups)")
+    # Parse server args first
+    server_args, remaining_args = parser.parse_known_args()
+    # Parse model args and combine
+    model_args = parse_model_args()
+    # Combine all args
+    combined_args = argparse.Namespace(**vars(server_args), **vars(model_args))
+    return combined_args
+if __name__ == '__main__':
+    # Configure GPU environment
+    setup_gpu_environment()
+    # Parse command line arguments
+    args = parse_args()
+    # Initialize and start the server; run_app accepts the init coroutine and
+    # builds the application inside the event loop that serves it
+    logger.info(f"Starting MatrixGame WebSocket Server at {args.host}:{args.port}")
+    web.run_app(init_app(args, base_path=args.path), host=args.host, port=args.port)
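
The handlers above all follow the same request/response shape: the client sends a JSON object with an `action` and a `requestId`, and the session echoes both back with a `success` flag. The following is a minimal client-side sketch of that shape, not part of the commit. `make_request` and `apply_key` are hypothetical helper names, the key aliases are copied from `_handle_keyboard_input`, and the six-slot keyboard state is an assumption based on the indices 0 to 5 used there.

```python
import json
import uuid

# Key aliases copied from _handle_keyboard_input in api_server.py.
KEY_MAP = {
    'w': 0, 'forward': 0,
    's': 1, 'back': 1, 'backward': 1,
    'a': 2, 'left': 2,
    'd': 3, 'right': 3,
    'space': 4, 'jump': 4,
    'shift': 5, 'attack': 5, 'ctrl': 5,
}


def make_request(action: str, **fields) -> str:
    """Serialize one client message (hypothetical helper).

    The requestId lets the client match the server's reply to this request.
    """
    payload = {'action': action, 'requestId': str(uuid.uuid4()), **fields}
    return json.dumps(payload)


def apply_key(state: list, key: str, pressed: bool) -> list:
    """Return a new keyboard state after one key event (hypothetical helper).

    Unknown keys leave the state unchanged, mirroring the server handler.
    """
    new_state = list(state)
    idx = KEY_MAP.get(key.lower())
    if idx is not None:
        new_state[idx] = 1 if pressed else 0
    return new_state


if __name__ == '__main__':
    print(make_request('keyboard_input', key='w', pressed=True))
    print(apply_key([0] * 6, 'W', True))
```

A real client would send the string returned by `make_request` over the `/ws` route and wait for a reply carrying the same `requestId`.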

api_utils.py ADDED

	@@ -0,0 +1,202 @@

+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+MatrixGame Utility Functions
+This module contains helper functions and utilities for the MatrixGame project.
+"""
+import os
+import logging
+import argparse
+import torch
+import numpy as np
+import cv2
+from PIL import Image
+from typing import Dict, List, Tuple, Any, Optional, Union
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+def setup_gpu_environment():
+    """
+    Configure the GPU environment and log GPU information.
+    Returns:
+        bool: True if CUDA is available, False otherwise
+    """
+    # Enable expandable segments in the CUDA caching allocator to reduce memory fragmentation
+    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
+    # Check if CUDA is available and log information
+    if torch.cuda.is_available():
+        gpu_count = torch.cuda.device_count()
+        gpu_info = []
+        for i in range(gpu_count):
+            gpu_name = torch.cuda.get_device_name(i)
+            gpu_memory = torch.cuda.get_device_properties(i).total_memory / (1024**3)  # Convert to GB
+            gpu_info.append(f"GPU {i}: {gpu_name} ({gpu_memory:.2f} GB)")
+        logger.info(f"CUDA is available. Found {gpu_count} GPU(s):")
+        for info in gpu_info:
+            logger.info(f"  {info}")
+        return True
+    else:
+        logger.warning("CUDA is not available. Running in CPU-only mode.")
+        return False
+def parse_model_args() -> argparse.Namespace:
+    """
+    Parse command line arguments for model paths and configuration.
+    Returns:
+        argparse.Namespace: Parsed arguments
+    """
+    parser = argparse.ArgumentParser(description="MatrixGame Model Configuration")
+    # Model paths
+    parser.add_argument("--model_root", type=str, default="./models/matrixgame",
+                        help="Root directory for model files")
+    parser.add_argument("--dit_path", type=str, default=None,
+                        help="Path to DIT model. If not provided, will use MODEL_ROOT/dit/")
+    parser.add_argument("--vae_path", type=str, default=None,
+                        help="Path to VAE model. If not provided, will use MODEL_ROOT/vae/")
+    parser.add_argument("--textenc_path", type=str, default=None,
+                        help="Path to text encoder model. If not provided, will use MODEL_ROOT")
+    # Model settings
+    parser.add_argument("--inference_steps", type=int, default=20,
+                        help="Number of inference steps for frame generation (lower is faster)")
+    parser.add_argument("--guidance_scale", type=float, default=6.0,
+                        help="Guidance scale for generation")
+    parser.add_argument("--frame_width", type=int, default=640,
+                        help="Width of the generated frames")
+    parser.add_argument("--frame_height", type=int, default=360,
+                        help="Height of the generated frames")
+    parser.add_argument("--num_pre_frames", type=int, default=3,
+                        help="Number of pre-frames for conditioning")
+    parser.add_argument("--fps", type=int, default=16,
+                        help="Frames per second for video")
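+    # Use parse_known_args so server flags (--host, --port, --path) on the same
+    # command line do not make this parser exit with "unrecognized arguments"
+    args, _ = parser.parse_known_args()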
+    # Set environment variables for model paths if provided
+    if args.model_root:
+        os.environ.setdefault("MODEL_ROOT", args.model_root)
+    if args.dit_path:
+        os.environ.setdefault("DIT_PATH", args.dit_path)
+    else:
+        os.environ.setdefault("DIT_PATH", os.path.join(os.environ.get("MODEL_ROOT", "./models/matrixgame"), "dit/"))
+    if args.vae_path:
+        os.environ.setdefault("VAE_PATH", args.vae_path)
+    else:
+        os.environ.setdefault("VAE_PATH", os.path.join(os.environ.get("MODEL_ROOT", "./models/matrixgame"), "vae/"))
+    if args.textenc_path:
+        os.environ.setdefault("TEXTENC_PATH", args.textenc_path)
+    else:
+        os.environ.setdefault("TEXTENC_PATH", os.environ.get("MODEL_ROOT", "./models/matrixgame"))
+    return args
+def visualize_controls(frame: np.ndarray, keyboard_condition: List, mouse_condition: List,
+                       frame_width: int, frame_height: int) -> np.ndarray:
+    """
+    Visualize keyboard and mouse controls on the frame.
+    Args:
+        frame: The video frame to visualize on
+        keyboard_condition: Keyboard state as a list
+        mouse_condition: Mouse state as a list
+        frame_width: Width of the frame
+        frame_height: Height of the frame
+    Returns:
+        np.ndarray: Frame with visualized controls
+    """
+    # Clone the frame to avoid modifying the original
+    frame = frame.copy()
+    # If we have keyboard/mouse conditions, visualize them on the frame
+    if keyboard_condition:
+        # Visualize keyboard inputs
+        keys = ["W", "S", "A", "D", "JUMP", "ATTACK"]
+        for i, key_pressed in enumerate(keyboard_condition[0]):
+            color = (0, 255, 0) if key_pressed else (100, 100, 100)
+            cv2.putText(frame, keys[i], (20 + i*100, 30),
+                      cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
+    if mouse_condition:
+        # Visualize mouse movement
+        mouse_x, mouse_y = mouse_condition[0]
+        # Scale mouse values for visualization
+        offset_x = int(mouse_x * 100)
+        offset_y = int(mouse_y * 100)
+        center_x, center_y = frame_width // 2, frame_height // 2
+        cv2.circle(frame, (center_x + offset_x, center_y - offset_y), 10, (255, 0, 0), -1)
+        cv2.putText(frame, f"Mouse: {mouse_x:.2f}, {mouse_y:.2f}",
+                   (frame_width - 250, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 0, 0), 2)
+    return frame
+def frame_to_jpeg(frame: np.ndarray, frame_height: int, frame_width: int) -> bytes:
+    """
+    Convert a frame to JPEG bytes.
+    Args:
+        frame: The video frame to convert
+        frame_height: Height of the frame for fallback
+        frame_width: Width of the frame for fallback
+    Returns:
+        bytes: JPEG bytes of the frame
+    """
+    success, buffer = cv2.imencode('.jpg', frame)
+    if not success:
+        logger.error("Failed to encode frame as JPEG")
+        # Return a blank frame
+        blank = np.ones((frame_height, frame_width, 3), dtype=np.uint8) * 100
+        success, buffer = cv2.imencode('.jpg', blank)
+    return buffer.tobytes()
+def load_scene_frames(scene_name: str, frame_width: int, frame_height: int) -> List[np.ndarray]:
+    """
+    Load initial frames for a scene from asset directory.
+    Args:
+        scene_name: Name of the scene
+        frame_width: Width to resize frames to
+        frame_height: Height to resize frames to
+    Returns:
+        List[np.ndarray]: List of frames as numpy arrays
+    """
+    frames = []
+    scene_dir = f"./GameWorldScore/asset/init_image/{scene_name}"
+    if os.path.exists(scene_dir):
+        image_files = sorted([f for f in os.listdir(scene_dir) if f.endswith('.png') or f.endswith('.jpg')])
+        for img_file in image_files:
+            try:
+                img_path = os.path.join(scene_dir, img_file)
+                img = Image.open(img_path).convert("RGB")
+                img = img.resize((frame_width, frame_height))
+                frames.append(np.array(img))
+            except Exception as e:
+                logger.error(f"Error loading image {img_file}: {str(e)}")
+    # If no frames were loaded, create a default colored frame with text
+    if not frames:
+        frame = np.ones((frame_height, frame_width, 3), dtype=np.uint8) * 100
+        # Add scene name as text
+        cv2.putText(frame, f"Scene: {scene_name}", (50, 180),
+                   cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
+        frames.append(frame)
+    return frames
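
Because `parse_args` in `api_server.py` and `parse_model_args` here both read the same command line, each parser has to tolerate the other's flags. The following is a small sketch of one way to do that with `argparse.ArgumentParser.parse_known_args`. It is an illustration under simplifying assumptions, not the commit's code: the function names `parse_server_flags`, `parse_model_flags`, and `parse_all` are hypothetical, only a few of the real flags are reproduced, and `argv` is passed explicitly so the example can be run without touching `sys.argv`.

```python
import argparse
from typing import List, Tuple


def parse_server_flags(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Parse server flags and return (namespace, leftover arguments)."""
    parser = argparse.ArgumentParser(description="server flags")
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--port", type=int, default=8080)
    # parse_known_args collects unknown flags instead of exiting with an error
    return parser.parse_known_args(argv)


def parse_model_flags(argv: List[str]) -> argparse.Namespace:
    """Parse model flags, ignoring anything this parser does not define."""
    parser = argparse.ArgumentParser(description="model flags")
    parser.add_argument("--inference_steps", type=int, default=20)
    parser.add_argument("--fps", type=int, default=16)
    args, _unknown = parser.parse_known_args(argv)
    return args


def parse_all(argv: List[str]) -> argparse.Namespace:
    """Combine both namespaces; the two parsers must not share flag names."""
    server_args, remaining = parse_server_flags(argv)
    model_args = parse_model_flags(remaining)
    return argparse.Namespace(**vars(server_args), **vars(model_args))


if __name__ == "__main__":
    print(parse_all(["--port", "9000", "--fps", "8"]))
```

Passing the leftover arguments from the first parser to the second keeps each flag owned by exactly one parser, which also avoids duplicate keys when the two namespaces are merged.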

requirements.txt CHANGED

@@ -1,3 +1,5 @@
 torch>=2.4.0
 torchvision>=0.19.0
 opencv-python>=4.9.0.80

+flash-attn @ https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
 torch>=2.4.0
 torchvision>=0.19.0
 opencv-python>=4.9.0.80

run_api_on_hf.py ADDED

	@@ -0,0 +1,148 @@

+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Hugging Face Space launcher for MatrixGame WebSocket Server
+This script launches the server with the appropriate configuration for Hugging Face Spaces.
+"""
+import os
+import sys
+import subprocess
+import logging
+import asyncio
+from aiohttp import web
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+def install_apex():
+    """Install NVIDIA Apex at runtime, preferring a build with CUDA extensions"""
+    # Absolute path to the clone; pip runs there via cwd, so the process
+    # working directory never changes (even if the clone step fails)
+    apex_dir = os.path.abspath("apex")
+    try:
+        logger.info("Installing NVIDIA Apex...")
+        # Clone the Apex repository unless a previous run already did
+        if not os.path.isdir(apex_dir):
+            subprocess.check_call([
+                "git", "clone", "https://github.com/NVIDIA/apex", apex_dir
+            ])
+        # Try to install with CUDA extensions first
+        try:
+            logger.info("Attempting to install Apex with CUDA extensions...")
+            subprocess.check_call([
+                sys.executable, "-m", "pip", "install", "-v",
+                "--disable-pip-version-check", "--no-cache-dir",
+                "--no-build-isolation", "--global-option=--cpp_ext",
+                "--global-option=--cuda_ext", "./"
+            ], cwd=apex_dir)
+            logger.info("Apex installed successfully with CUDA extensions!")
+        except subprocess.CalledProcessError as e:
+            logger.warning(f"Failed to install Apex with CUDA extensions: {e}")
+            logger.info("Falling back to Python-only build...")
+            # Fall back to Python-only build
+            subprocess.check_call([
+                sys.executable, "-m", "pip", "install", "-v",
+                "--disable-pip-version-check", "--no-build-isolation",
+                "--no-cache-dir", "./"
+            ], cwd=apex_dir)
+            logger.info("Apex installed successfully (Python-only build)!")
+    except subprocess.CalledProcessError as e:
+        logger.error(f"Failed to install Apex. Error: {e}")
+        # Don't fail the entire startup if Apex installation fails
+        logger.warning("Continuing without Apex...")
+    except Exception as e:
+        logger.error(f"Unexpected error during Apex installation: {e}")
+        logger.warning("Continuing without Apex...")
+install_apex()
+from api_server import init_app, parse_args
+async def run_async():
+    """Run the server using the async API directly for better stability"""
+    # Get the port from environment variable in Hugging Face Space
+    port = int(os.environ.get("PORT", 7860))
+    # Determine if this is running in a Hugging Face Space
+    is_hf_space = os.environ.get("SPACE_ID") is not None
+    # Set the base path if in a space
+    base_path = ""
+    if is_hf_space:
+        # In Hugging Face Spaces, we're usually behind a proxy
+        space_id = os.environ.get('SPACE_ID', '')
+        # Use empty base path for better WebSocket compatibility
+        # WebSockets often have trouble with subpaths in proxied environments
+        base_path = ""  # or f"/{space_id}" if needed
+        logger.info(f"Running in Hugging Face Space {space_id}")
+    logger.info(f"Initializing application with base_path='{base_path}', port={port}")
+    # Parse default args and override for HF Space
+    args = parse_args()
+    args.port = port
+    args.host = "0.0.0.0"
+    args.path = base_path
+    # Initialize the application
+    app = await init_app(args, base_path=base_path)
+    # Log all routes for debugging
+    routes = sorted([f"{route.method} {route.resource}" for route in app.router.routes() if hasattr(route, 'resource')])
+    logger.info(f"Registered {len(routes)} routes:")
+    for route in routes:
+        logger.info(f" - {route}")
+    # Start the server
+    logger.info(f"Starting server on 0.0.0.0:{port}")
+    runner = web.AppRunner(app)
+    await runner.setup()
+    site = web.TCPSite(runner, '0.0.0.0', port)
+    await site.start()
+    # Keep the server running
+    logger.info("Server started, running indefinitely...")
+    while True:
+        await asyncio.sleep(3600)  # Sleep for an hour
+def main():
+    """Run using the subprocess method (fallback)"""
+    # Get the port from environment variable in Hugging Face Space
+    port = int(os.environ.get("PORT", 7860))
+    # Determine if this is running in a Hugging Face Space
+    is_hf_space = os.environ.get("SPACE_ID") is not None
+    # Pass the appropriate path if in a space
+    path_arg = ""
+    if is_hf_space:
+        # In a space, we're usually behind a proxy, so we need to handle base path
+        # We use empty base path for better WebSocket compatibility
+        path_arg = ""  # or f"--path /{os.environ.get('SPACE_ID', '')}" if needed
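+    # Construct and run the command as an argument list (no shell needed).
+    # The server module added in this commit is api_server.py.
+    cmd = [sys.executable, "api_server.py", "--host", "0.0.0.0", "--port", str(port)]
+    if path_arg:
+        cmd.extend(path_arg.split())
+    print(f"Running command: {' '.join(cmd)}")
+    subprocess.run(cmd)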
+if __name__ == "__main__":
+    # First try to run using the async API
+    try:
+        logger.info("Starting server using async API")
+        asyncio.run(run_async())
+    except Exception as e:
+        logger.error(f"Failed to run using async API: {e}", exc_info=True)
+        logger.info("Falling back to subprocess method")
+        main()
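
`run_async` keeps the process alive with an hourly sleep loop and never cleans up the runner. One alternative is to wait on an `asyncio.Event` and run cleanup in a `finally` block. The sketch below is an illustration, not the commit's code: `serve_until_stopped` and `demo` are hypothetical names, and the `start` and `cleanup` callables are stand-ins. In the launcher they would presumably wrap `runner.setup()` plus `site.start()`, and `runner.cleanup()`.

```python
import asyncio
from typing import Awaitable, Callable, List


async def serve_until_stopped(
    start: Callable[[], Awaitable[None]],
    cleanup: Callable[[], Awaitable[None]],
    stop_event: asyncio.Event,
) -> None:
    """Start a server, block until stop_event is set, then always clean up."""
    await start()
    try:
        # Waiting on an Event avoids the periodic wake-ups of a sleep loop
        await stop_event.wait()
    finally:
        await cleanup()


async def demo() -> List[str]:
    """Drive the helper with stand-in coroutines and record the call order."""
    calls: List[str] = []
    stop_event = asyncio.Event()

    async def start() -> None:
        calls.append("start")

    async def cleanup() -> None:
        calls.append("cleanup")

    # Simulate a shutdown request arriving shortly after startup
    asyncio.get_running_loop().call_later(0.01, stop_event.set)
    await serve_until_stopped(start, cleanup, stop_event)
    return calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a deployment, the event could be set from a signal handler so that a container stop request triggers the same cleanup path.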