Anurag Yadav

harryadav3

AI & ML interests

None yet

Organizations

None yet

harryadav3's collections (10)

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 139
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Paper • 2510.14528 • Published Oct 16, 2025 • 111
MinerU: An Open-Source Solution for Precise Document Content Extraction

Paper • 2409.18839 • Published Sep 27, 2024 • 37

WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

Paper • 2509.06501 • Published Sep 8, 2025 • 79
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

Paper • 2509.13312 • Published Sep 16, 2025 • 105

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 306
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30, 2025 • 544

3D and 4D World Modeling: A Survey

Paper • 2509.07996 • Published Sep 4, 2025 • 58
StealthAttack: Robust 3D Gaussian Splatting Poisoning via Density-Guided Illusions

Paper • 2510.02314 • Published Oct 2, 2025 • 60

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

Paper • 2508.21148 • Published Aug 28, 2025 • 140
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 306

Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation

Paper • 2510.01284 • Published Sep 30, 2025 • 34
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

Paper • 2410.17799 • Published Oct 23, 2024 • 8

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 316
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263

Reconstruction Alignment Improves Unified Multimodal Models

Paper • 2509.07295 • Published Sep 8, 2025 • 40
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth

Paper • 2509.03867 • Published Sep 4, 2025 • 210

videogeneration

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Paper • 2509.08519 • Published Sep 10, 2025 • 128
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 139

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 228
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications

Paper • 2508.16279 • Published Aug 22, 2025 • 53
Scaling Agents via Continual Pre-training

Paper • 2509.13310 • Published Sep 16, 2025 • 117
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5, 2025 • 122
