OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams

OmniStream is a unified streaming visual backbone that perceives, reconstructs, and acts on diverse visual inputs. It combines causal spatiotemporal attention with 3D rotary positional embeddings (3D-RoPE), which lets it process video streams online, frame by frame, reusing a persistent KV-cache instead of recomputing past frames.
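The paragraph above names three ingredients: causal spatiotemporal attention, 3D-RoPE, and a persistent KV-cache. The NumPy sketch below shows one common way they fit together. It rests on assumptions that are not taken from the OmniStream code: attention is causal at frame granularity (a token sees every token in its own frame and in earlier frames), 3D-RoPE splits each head's channels into three equal chunks rotated by time, row, and column index, and a single-head, single-layer attention stands in for the full backbone. All names (`rope_3d`, `StreamingAttention`, and so on) are hypothetical.

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Standard 1D RoPE: rotate channel pairs (2i, 2i+1) by pos * base**(-2i/d)."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)   # (d/2,)
    ang = pos[:, None] * inv_freq                   # (N, d/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def rope_3d(x, t, h, w):
    """Assumed 3D-RoPE: split channels into (time, row, column) chunks, rotate each."""
    d = x.shape[-1] // 3
    return np.concatenate(
        [rope_1d(x[:, :d], t), rope_1d(x[:, d:2 * d], h), rope_1d(x[:, 2 * d:], w)],
        axis=-1,
    )

def frame_positions(frame_idx, gh, gw):
    """(t, h, w) coordinates of the gh*gw patch tokens of one frame, row-major."""
    h = np.repeat(np.arange(gh), gw).astype(float)
    w = np.tile(np.arange(gw), gh).astype(float)
    t = np.full(gh * gw, float(frame_idx))
    return t, h, w

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def project(x, pos, Wq, Wk, Wv):
    """Linear Q/K/V projections; RoPE is applied to queries and keys only."""
    t, h, w = pos
    return rope_3d(x @ Wq, t, h, w), rope_3d(x @ Wk, t, h, w), x @ Wv

def full_causal(frames, W, gh, gw):
    """Offline reference: all T frames at once under a frame-level causal mask."""
    T, N, D = frames.shape
    pos = [np.concatenate(c) for c in zip(*(frame_positions(f, gh, gw) for f in range(T)))]
    q, k, v = project(frames.reshape(T * N, D), pos, *W)
    fid = np.repeat(np.arange(T), N)                # frame index of every token
    scores = q @ k.T / np.sqrt(D)
    scores = np.where(fid[:, None] >= fid[None, :], scores, -np.inf)
    return (softmax(scores) @ v).reshape(T, N, D)

class StreamingAttention:
    """Online version: one frame per step, rotated keys/values kept in a cache."""
    def __init__(self, W, gh, gw):
        self.W, self.gh, self.gw = W, gh, gw
        self.k_cache, self.v_cache = [], []

    def step(self, frame):
        pos = frame_positions(len(self.k_cache), self.gh, self.gw)
        q, k, v = project(frame, pos, *self.W)
        self.k_cache.append(k)
        self.v_cache.append(v)
        K, V = np.concatenate(self.k_cache), np.concatenate(self.v_cache)
        return softmax(q @ K.T / np.sqrt(frame.shape[-1])) @ V

rng = np.random.default_rng(0)
T, gh, gw, D = 4, 2, 3, 12   # D divisible by 6 so each RoPE chunk has an even width
frames = rng.standard_normal((T, gh * gw, D))
W = tuple(rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

stream = StreamingAttention(W, gh, gw)
online = np.stack([stream.step(f) for f in frames])
offline = full_causal(frames, W, gh, gw)
print(np.allclose(online, offline))  # True
```

Because positions are absolute and RoPE is applied before a key enters the cache, cached keys never need to be re-rotated as new frames arrive, and each frame's output depends only on itself and earlier frames. That is why the online and offline results match. A real backbone stacks many such layers with multiple heads, MLPs, and normalization, none of which changes this argument.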

Sample Usage

The snippet below extracts features from a 16-frame clip with OmniStream. It imports OmnistreamMultiFrameTransformer from model.py, so that file from the official repository must be importable in your environment (for example, placed in your working directory).

```python
# Requires model.py from the official OmniStream repository on your Python path.
from model import OmnistreamMultiFrameTransformer
from transformers import AutoImageProcessor
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load processor and model
processor = AutoImageProcessor.from_pretrained("StreamFormer/OmniStream")
model = OmnistreamMultiFrameTransformer.from_pretrained("StreamFormer/OmniStream").to(device)
model.eval()

# Dummy input: 16 RGB frames of 512x512, shape (Time, Height, Width, Channels), uint8 in [0, 255]
fake_pixel = np.random.randint(0, 256, size=(16, 512, 512, 3), dtype=np.uint8)
fake_input = processor(images=list(fake_pixel), return_tensors="pt").to(device)

# Processor output is (Time, Channels, H, W); add a batch dim -> (Batch, Time, Channels, H, W)
fake_input["pixel_values"] = fake_input["pixel_values"].unsqueeze(0).float()

with torch.no_grad():
    output = model(**fake_input, return_dict=True)

print(output.keys())
print(output["last_hidden_state"].shape)  # last layer's hidden states
print(output["pooler_output"].shape)      # CLS token
print(output["patch_start_idx"])          # index of the first patch of each frame
```
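The sample feeds a single clip, turning the processor output of shape (Time, Channels, H, W) into (1, Time, Channels, H, W) with `unsqueeze(0)`. To pass several clips at once, one option is to flatten all frames clip by clip before calling the processor and reshape afterwards. The helper `to_video_batch` below is hypothetical (it is not part of OmniStream or transformers), and whether OmnistreamMultiFrameTransformer accepts a batch size above 1 is an assumption to verify against model.py.

```python
import numpy as np

def to_video_batch(pixel_values, batch_size, num_frames):
    """Reshape processor output (B*T, C, H, W) into (B, T, C, H, W).

    Works for NumPy arrays and torch tensors alike, since both provide .reshape().
    Assumes frames were flattened clip by clip: all T frames of clip 0, then clip 1, ...
    """
    if pixel_values.shape[0] != batch_size * num_frames:
        raise ValueError(
            f"expected {batch_size * num_frames} frames, got {pixel_values.shape[0]}"
        )
    return pixel_values.reshape(batch_size, num_frames, *pixel_values.shape[1:])

# Example: 2 clips of 8 frames each, using a small dummy array in place of real processor output.
flat = np.zeros((16, 3, 32, 32), dtype=np.float32)
video = to_video_batch(flat, batch_size=2, num_frames=8)
print(video.shape)  # (2, 8, 3, 32, 32)
```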

Citation

@article{yan2026omnistream,
  title={OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams}, 
  author={Yibin Yan and Jilan Xu and Shangzhe Di and Haoning Wu and Weidi Xie},
  journal={arXiv preprint arXiv:2603.12265},
  year={2026},
  url={https://arxiv.org/abs/2603.12265}
}

Model Details

Model size: 0.3B parameters; tensor type: F32; weights format: Safetensors.