Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods
Abstract
Following major advances in text and image generation, video generation has surged, producing highly realistic and controllable sequences. Alongside this progress, these models raise serious concerns about misinformation, making reliable detection of synthetic videos increasingly crucial. Image-based detectors are fundamentally limited because they operate per frame and ignore temporal dynamics, while supervised video detectors generalize poorly to unseen generators, a critical drawback given the rapid emergence of new models. These challenges motivate zero-shot approaches, which avoid synthetic training data and instead score content against real-data statistics, enabling training-free, model-agnostic detection. We introduce STALL, a simple, training-free, theoretically justified detector that provides likelihood-based scoring for videos, jointly modeling spatial and temporal evidence within a probabilistic framework. We evaluate STALL on two public benchmarks and introduce ComGenVid, a new benchmark built with state-of-the-art generative models. STALL consistently outperforms prior image- and video-based baselines. Code and data are available at https://omerbenhayun.github.io/stall-video.
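To make the zero-shot idea concrete, here is a minimal illustrative sketch of likelihood-based detection against real-data statistics. Note the assumptions: the abstract does not specify STALL's actual statistic, feature extractor, or density model, so the diagonal-Gaussian model, the concatenated spatial/temporal feature representation, and all function names below are hypothetical stand-ins, not the paper's method.

```python
import numpy as np

# Hypothetical setup: each video is summarized by a fixed-length feature
# vector assumed to concatenate spatial (per-frame) and temporal
# (frame-difference) statistics. STALL's real statistic may differ.

def fit_real_statistics(real_feats):
    """Fit a diagonal Gaussian to features of real videos only
    (no synthetic data is needed, hence 'training-free' detection)."""
    mu = real_feats.mean(axis=0)
    var = real_feats.var(axis=0) + 1e-6  # variance floor for stability
    return mu, var

def log_likelihood(feats, mu, var):
    """Per-video Gaussian log-likelihood under the real-data model."""
    return -0.5 * (np.log(2 * np.pi * var) + (feats - mu) ** 2 / var).sum(axis=1)

# Toy demonstration with synthetic stand-in features.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 16))  # stand-in "real" features
fake = rng.normal(0.8, 1.5, size=(500, 16))  # stand-in "generated" features

mu, var = fit_real_statistics(real)

# Calibrate a threshold on real data (5% false-positive rate), then flag
# videos whose likelihood under the real model falls below it.
threshold = np.percentile(log_likelihood(real, mu, var), 5)
flagged = log_likelihood(fake, mu, var) < threshold
print(f"flagged {flagged.mean():.0%} of synthetic samples")
```

The key property this sketch shares with the zero-shot framing in the abstract is that the detector never sees generated videos during fitting, so it is agnostic to which generator produced the content.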
Community
How can we distinguish generated videos from real ones? In the current AI era, generated videos can be remarkable, barely distinguishable by the human eye. Can you imagine how this could be misused? It is therefore essential to develop reliable practices for detecting, and even marking, such videos. We develop a unique, theory-backed method to catch them.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- EA-Swin: An Embedding-Agnostic Swin Transformer for AI-Generated Video Detection (2026)
- SynthForensics: A Multi-Generator Benchmark for Detecting Synthetic Video Deepfakes (2026)
- RealStats: A Rigorous Real-Only Statistical Framework for Fake Image Detection (2026)
- Novel Semantic Prompting for Zero-Shot Action Recognition (2026)
- Consistency-Preserving Diverse Video Generation (2026)
- When Detectors Forget Forensics: Blocking Semantic Shortcuts for Generalizable AI-Generated Image Detection (2026)
- SemanticMoments: Training-Free Motion Similarity via Third Moment Features (2026)