---
license: apache-2.0
datasets:
- AlexFierro9/Kinetics400
- imagenet-1k
- HuggingFaceM4/something_something_v2
language:
- en
pipeline_tag: video-classification
extra_gated_fields:
  Name: text
  Company/Organization: text
  Country: text
  E-Mail: text
---
# VideoMamba
|
|
## Model Details
|
|
VideoMamba is a purely SSM-based model for video understanding.
|
|
- **Developed by:** [OpenGVLab](https://github.com/OpenGVLab)
- **Model type:** An efficient video backbone built on a bidirectional state space model (SSM); a toy sketch of the idea follows this list.
- **License:** Apache-2.0 (per the `license` field above)
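
For intuition, the sketch below illustrates the bidirectional-scan idea in a few lines of PyTorch: one recurrent pass runs left-to-right over the token sequence, a second runs right-to-left, and the two are fused so every token carries context from both directions. This is a toy stand-in, not code from the VideoMamba repository; a simple exponential-decay recurrence replaces the real selective SSM.

```python
# Toy illustration of bidirectional scanning (NOT the actual VideoMamba code).
# A plain linear recurrence h_t = decay * h_{t-1} + x_t stands in for the
# selective SSM used by the real model.
import torch


def toy_bidirectional_scan(tokens: torch.Tensor, decay: float = 0.9) -> torch.Tensor:
    """tokens: (seq_len, dim) patch embeddings; returns fused bidirectional features."""

    def scan(x: torch.Tensor) -> torch.Tensor:
        h = torch.zeros_like(x[0])
        states = []
        for t in range(x.shape[0]):
            h = decay * h + x[t]  # recurrent state update
            states.append(h)
        return torch.stack(states)

    forward = scan(tokens)                   # left-to-right context
    backward = scan(tokens.flip(0)).flip(0)  # right-to-left context
    return forward + backward                # fuse both directions


x = torch.randn(16, 192)  # 16 patch tokens, 192-dim embeddings
print(toy_bidirectional_scan(x).shape)  # torch.Size([16, 192])
```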
|
|
|
|
### Model Sources
|
|
- **Repository:** https://github.com/OpenGVLab/VideoMamba
- **Paper:** https://arxiv.org/abs/2403.06977
|
|
## Uses
|
|
The primary use of VideoMamba is research on image and video tasks with an SSM-based backbone, e.g., image classification, action recognition, long-term video understanding, and video-text retrieval.
The primary intended users of the model are researchers and hobbyists in computer vision, machine learning, and artificial intelligence.
|
|
## How to Get Started with the Model
|
|
- Replace the backbone for your video task with VideoMamba, defined in [videomamba.py](https://github.com/OpenGVLab/VideoMamba/blob/main/videomamba/video_sm/models/videomamba.py).
- Then load this checkpoint and start training; a minimal loading sketch follows.
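
The sketch below shows one way to build the backbone and load a checkpoint, assuming the VideoMamba repository is cloned and its dependencies (PyTorch, mamba-ssm, causal-conv1d) are installed. The import path, the constructor name `videomamba_middle`, its `num_frames` keyword, and the checkpoint filename are assumptions for illustration; verify each against `videomamba.py` and the files in this repository.

```python
# Hypothetical loading sketch -- verify names against
# videomamba/video_sm/models/videomamba.py before use.
import torch
from videomamba import videomamba_middle  # assumes videomamba.py is on sys.path

model = videomamba_middle(num_frames=8)  # constructor and kwargs are assumptions
state_dict = torch.load("videomamba_checkpoint.pth", map_location="cpu")  # placeholder filename
model.load_state_dict(state_dict, strict=False)  # strict=False tolerates head mismatches
model.eval()

clip = torch.randn(1, 3, 8, 224, 224)  # (batch, channels, frames, height, width)
with torch.no_grad():
    logits = model(clip)
print(logits.shape)
```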
### Citation Information
```
@misc{li2024videomamba,
  title={VideoMamba: State Space Model for Efficient Video Understanding},
  author={Kunchang Li and Xinhao Li and Yi Wang and Yinan He and Yali Wang and Limin Wang and Yu Qiao},
  year={2024},
  eprint={2403.06977},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```