NitroGen: An Open Foundation Model for Generalist Gaming Agents
Paper • 2601.02427 • Published
Note: Vision-action model.
- Trained on gameplay footage that shows the controller overlay together with its inputs; not reinforcement learning.
- Controller extraction uses SIFT to detect the controller layout (zero-shot, compared against YOLO).
- Synthetic data depicting controllers with pressed buttons is generated to supervise input detection.
- SeqFormer is trained on these generated images to learn to detect the pressed buttons.
- A diffusion model based on https://arxiv.org/abs/2503.14734 generates the inputs, using game images as conditioning variables for the denoising.
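The SIFT-based controller detection step could be sketched as standard feature matching plus a homography, as in OpenCV. This is a minimal illustration, not the paper's pipeline: the template image, the 0.75 ratio test, and the RANSAC threshold are all assumptions.

```python
import cv2
import numpy as np

def locate_controller(frame, template, min_matches=10):
    """Locate a controller-overlay template inside a frame via SIFT matching.

    Returns the template's four corner positions in the frame, or None.
    (Illustrative sketch; parameters are assumptions, not the paper's values.)
    """
    sift = cv2.SIFT_create()
    kp_t, des_t = sift.detectAndCompute(template, None)
    kp_f, des_f = sift.detectAndCompute(frame, None)
    if des_t is None or des_f is None:
        return None

    # Lowe's ratio test over 2-NN matches to keep distinctive correspondences.
    matches = cv2.BFMatcher().knnMatch(des_t, des_f, k=2)
    good = [p[0] for p in matches if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    if len(good) < min_matches:
        return None

    src = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_f[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None

    # Map the template's corners into frame coordinates.
    h, w = template.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(corners, H).reshape(-1, 2)
```

Because it matches a known template rather than learning classes, this approach needs no labeled training set, which is presumably why the note calls it zero-shot relative to YOLO.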
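The last step, generating inputs with an image-conditioned diffusion model, can be illustrated with a toy DDPM-style reverse loop. Everything here is a stand-in: the linear noise schedule, the 8-dimensional "action" vector, and the `denoiser` (a linear map over the noisy sample concatenated with a frame embedding) are hypothetical placeholders for the model in the referenced paper.

```python
import numpy as np

# Toy linear beta schedule (assumed values, not from the paper).
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t, t, frame_embedding, W):
    """Stand-in epsilon-predictor: a linear map over [x_t, conditioning].

    In the real model this would be a network conditioned on game frames.
    """
    return W @ np.concatenate([x_t, frame_embedding])

def sample_action(frame_embedding, W, dim=8, seed=0):
    """Run the DDPM reverse process to sample an action vector,
    conditioning every denoising step on the frame embedding."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)
    for t in reversed(range(T)):
        eps = denoiser(x, t, frame_embedding, W)
        a, ab = alphas[t], alpha_bars[t]
        # Standard DDPM posterior mean update.
        x = (x - (1.0 - a) / np.sqrt(1.0 - ab) * eps) / np.sqrt(a)
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(dim)
    return x
```

The key point the note makes is that the image enters as a conditioning variable of the denoiser at every step, so the sampled inputs depend on what is on screen.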