
Causal World Modeling for Robot Control

LingBot-VA focuses on:

  • Autoregressive Video-Action World Modeling: architecturally unifies visual dynamics prediction and action inference within a single interleaved sequence, while maintaining their conceptual distinction.
  • High-Efficiency Execution: a dual-stream Mixture-of-Transformers (MoT) architecture with asynchronous execution and KV caching.
  • Long-Horizon Performance and Generalization: substantial improvements in sample efficiency, long-horizon success rates, and generalization to novel scenes.
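To make the first two points concrete, here is a toy sketch of interleaved video-action autoregressive decoding with a KV cache. Everything here (names, shapes, the single-head attention) is an illustrative assumption, not the actual LingBot-VA implementation: the point is only that video and action tokens share one causal sequence, and that each token's keys/values are computed once and cached.

```python
# Toy interleaved video-action decoding loop with a KV cache.
# Illustrative only -- not the LingBot-VA architecture or API.
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hidden size of the toy model

def attend(q, k_cache, v_cache):
    """Single-head causal attention of query q over all cached keys/values."""
    scores = k_cache @ q / np.sqrt(D)      # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v_cache               # (D,)

def step(x, k_cache, v_cache):
    """One decoding step: cache this token's K/V once, then attend."""
    k_cache.append(x.copy())               # keys/values are appended once
    v_cache.append(x.copy())               # and never recomputed
    return attend(x, np.stack(k_cache), np.stack(v_cache))

k_cache, v_cache = [], []
trajectory = []
obs = rng.normal(size=D)                   # initial visual observation
for t in range(3):
    # 1) video token: predict the next visual latent from the cached context
    video_latent = step(obs, k_cache, v_cache)
    # 2) action token: infer the action from the same interleaved sequence
    action = step(video_latent, k_cache, v_cache)
    trajectory.append(action)
    obs = video_latent                     # roll the world model forward

print(len(k_cache))  # 6 cached tokens: (video, action) x 3 control steps
```

Because both token types live in one causal sequence, the action head conditions on the freshly predicted visual dynamics at every step, while the cache keeps per-step cost linear in sequence length rather than quadratic.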

Model Sources


📦 Model Download

  • Pretrained Checkpoints for Post-Training
| Model Name | Hugging Face Repository | ModelScope Repository | Description |
| --- | --- | --- | --- |
| lingbot-va-base | 🤗 robbyant/lingbot-va-base | 🤖 Robbyant/lingbot-va-base | LingBot-VA w/ shared backbone |
| lingbot-va-posttrain-robotwin | 🤗 robbyant/lingbot-va-posttrain-robotwin | 🤖 Robbyant/lingbot-va-posttrain-robotwin | LingBot-VA-Posttrain-Robotwin w/ shared backbone |

📚 Citation

@article{lingbot-va2026,
  title={Causal World Modeling for Robot Control},
  author={Li, Lin and Zhang, Qihang and Luo, Yiming and Yang, Shuai and Wang, Ruilin and Han, Fei and Yu, Mingrui and Gao, Zelin and Xue, Nan and Zhu, Xing and Shen, Yujun and Xu, Yinghao},
  journal={arXiv preprint arXiv:[xxxx]},
  year={2026}
}

🪪 License

This project is released under the Apache License 2.0. See LICENSE file for details.

🧩 Acknowledgments

This work builds upon several excellent open-source projects:

  • Wan-Video - Vision transformer backbone
  • MoT - Mixture-of-Transformers architecture
  • The broader open-source computer vision and robotics communities

For questions, discussions, or collaborations:
