Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods
Abstract
Following major advances in text and image generation, video generation has surged, producing highly realistic and controllable sequences. Alongside this progress, these models raise serious concerns about misinformation, making reliable detection of synthetic videos increasingly crucial. Image-based detectors are fundamentally limited because they operate per frame and ignore temporal dynamics, while supervised video detectors generalize poorly to unseen generators, a critical drawback given the rapid emergence of new models. These challenges motivate zero-shot approaches, which avoid synthetic training data and instead score content against real-data statistics, enabling training-free, model-agnostic detection. We introduce STALL, a simple, training-free, theoretically justified detector that provides likelihood-based scoring for videos, jointly modeling spatial and temporal evidence within a probabilistic framework. We evaluate STALL on two public benchmarks and introduce ComGenVid, a new benchmark built with state-of-the-art generative models. STALL consistently outperforms prior image- and video-based baselines. Code and data are available at https://omerbenhayun.github.io/stall-video.
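To make the zero-shot idea concrete, here is a minimal illustrative sketch of likelihood-based detection against real-data statistics. Note the assumptions: the abstract does not specify STALL's actual statistic, feature extractor, or density model, so the diagonal-Gaussian model, the concatenated spatial/temporal feature representation, and all function names below are hypothetical stand-ins, not the paper's method.

```python
import numpy as np

# Hypothetical setup: each video is summarized by a fixed-length feature
# vector assumed to concatenate spatial (per-frame) and temporal
# (frame-difference) statistics. STALL's real statistic may differ.

def fit_real_statistics(real_feats):
    """Fit a diagonal Gaussian to features of real videos only
    (no synthetic data is needed, hence 'training-free' detection)."""
    mu = real_feats.mean(axis=0)
    var = real_feats.var(axis=0) + 1e-6  # variance floor for stability
    return mu, var

def log_likelihood(feats, mu, var):
    """Per-video Gaussian log-likelihood under the real-data model."""
    return -0.5 * (np.log(2 * np.pi * var) + (feats - mu) ** 2 / var).sum(axis=1)

# Toy demonstration with synthetic stand-in features.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 16))  # stand-in "real" features
fake = rng.normal(0.8, 1.5, size=(500, 16))  # stand-in "generated" features

mu, var = fit_real_statistics(real)

# Calibrate a threshold on real data (5% false-positive rate), then flag
# videos whose likelihood under the real model falls below it.
threshold = np.percentile(log_likelihood(real, mu, var), 5)
flagged = log_likelihood(fake, mu, var) < threshold
print(f"flagged {flagged.mean():.0%} of synthetic samples")
```

The key property this sketch shares with the zero-shot framing in the abstract is that the detector never sees generated videos during fitting, so it is agnostic to which generator produced the content.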
Community
How can we distinguish generated videos from real ones? In the current AI era, generated videos can be remarkable, barely distinguishable by the human eye. Can you imagine how this could be misused? It is therefore essential to develop reliable practices for detecting, and even marking, such videos. We develop a unique, theory-backed method to catch them.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- EA-Swin: An Embedding-Agnostic Swin Transformer for AI-Generated Video Detection (2026)
- SynthForensics: A Multi-Generator Benchmark for Detecting Synthetic Video Deepfakes (2026)
- RealStats: A Rigorous Real-Only Statistical Framework for Fake Image Detection (2026)
- Novel Semantic Prompting for Zero-Shot Action Recognition (2026)
- Consistency-Preserving Diverse Video Generation (2026)
- When Detectors Forget Forensics: Blocking Semantic Shortcuts for Generalizable AI-Generated Image Detection (2026)
- SemanticMoments: Training-Free Motion Similarity via Third Moment Features (2026)