OmniVGGT Logo

OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer


Model Details

Model type: Spatial Foundation Model for 3D Geometry Reconstruction

Model date: November 2025

Paper: OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer

Code: https://github.com/Livioni/OmniVGGT-official

Authors:

  • Haosong Peng*, Hao Li*, Yalun Dai, Yushi Lan, Yihang Luo, Tianyu Qi, Zhengshen Zhang, Yufeng Zhan†, Junfei Zhang†, Wenchao Xu†, Ziwei Liu

* Equal Contribution, † Corresponding Author

Model Description

OmniVGGT is a spatial foundation model that can effectively exploit an arbitrary number of auxiliary geometric modalities (depth maps, camera intrinsics, and camera poses) to produce high-quality 3D geometric results. It achieves state-of-the-art performance across various downstream tasks and further improves performance on robot manipulation tasks.

Direct Use

Quick Start

import torch
from omnivggt.models.omnivggt import OmniVGGT

# Load model
model = OmniVGGT()
model.load_state_dict(torch.load('path/to/model.pth', map_location='cpu'))  # replace with the actual checkpoint path
model.eval()

# Prepare inputs
inputs = {
    'images': images,  # torch.Tensor [B, N, 3, H, W]
    'extrinsics': extrinsics,  # optional
    'intrinsics': intrinsics,  # optional
    'depth': depth,  # optional
    'mask': mask,  # optional
}

# Run inference
with torch.no_grad():
    predictions = model(**inputs)
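
The `images` tensor in the snippet above is assumed to already exist. Below is a minimal sketch of assembling it from a folder of RGB files; the resize resolution and normalization here are illustrative assumptions, and the repository's `inference.py` defines the actual preprocessing.

import torch
from pathlib import Path
from PIL import Image
from torchvision import transforms

# Assumed preprocessing: resize to a fixed square resolution and scale to [0, 1].
# 518 is only a guess based on VGGT-family models; check the repository for the real value.
preprocess = transforms.Compose([
    transforms.Resize((518, 518)),
    transforms.ToTensor(),
])

image_paths = sorted(Path('path/to/images/').glob('*.jpg'))
frames = [preprocess(Image.open(p).convert('RGB')) for p in image_paths]  # each frame: [3, H, W]
images = torch.stack(frames).unsqueeze(0)  # [B=1, N, 3, H, W], matching the expected input shape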

Command Line Usage

# Basic usage - only images required
python inference.py --image_folder path/to/images/

# With auxiliary camera and depth information
python inference.py \
    --image_folder path/to/images/ \
    --camera_folder path/to/cameras/ \
    --depth_folder path/to/depths/

Technical Specifications

Requirements

  • Python 3.10+
  • PyTorch 2.7.0+
  • CUDA-compatible GPU (recommended)
  • 8GB+ RAM
  • 4GB+ GPU memory

Installation

conda create -n omnivggt python=3.10
conda activate omnivggt
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
git clone https://github.com/Livioni/OmniVGGT-official
cd OmniVGGT-official
pip install -r requirements.txt
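
A quick sanity check after installation (a minimal sketch; the `omnivggt` import path follows the Quick Start above):

import torch
from omnivggt.models.omnivggt import OmniVGGT  # import path as used in the Quick Start

print(torch.__version__)          # expect 2.7.0
print(torch.cuda.is_available())  # True on a machine with a CUDA-compatible GPU
model = OmniVGGT()                # instantiating the model confirms the package is importable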

Citation

@article{peng2025omnivggt,
  title={OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer},
  author={Peng, Haosong and Li, Hao and Dai, Yalun and Lan, Yushi and Luo, Yihang and Qi, Tianyu and Zhang, Zhengshen and Zhan, Yufeng and Zhang, Junfei and Xu, Wenchao and Liu, Ziwei},
  journal={arXiv preprint arXiv:2511.10560},
  year={2025}
}

Base Model

OmniVGGT is fine-tuned from facebook/VGGT-1B.