Jacob LIU

JacobLIU1024

Jacob-liu1996

AI & ML interests

None yet

Recent Activity

liked a model 10 days ago

openbmb/MiniCPM-SALA

liked a dataset 13 days ago

openbmb/UltraData-Math

liked a model 18 days ago

openbmb/MiniCPM-o-4_5

Organizations

None yet

liked a model 10 days ago

openbmb/MiniCPM-SALA

Text Generation • Updated 10 days ago • 4.66k • 470

liked a dataset 13 days ago

openbmb/UltraData-Math

Viewer • Updated 1 day ago • 181M • 40.8k • 239

liked a model 18 days ago

openbmb/MiniCPM-o-4_5

Any-to-Any • Updated 8 days ago • 77.6k • 870

liked 2 models about 1 month ago

openbmb/AgentCPM-Report

Text Generation • 8B • Updated 11 days ago • 5.01k • 243

openbmb/AgentCPM-Explore

Text Generation • Updated Jan 18 • 922 • 325

upvoted 3 articles 2 months ago

Article

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

Mar 20, 2024 • 109

Article

Cosmopedia: how to build large-scale synthetic datasets for pre-training large language models (Chinese-language version)

Mar 20, 2024

Article

We Got Claude to Fine-Tune an Open Source LLM

Dec 4, 2025 • 599

liked 2 Spaces 3 months ago

FineWeb: decanting the web for the finest text data at scale

🍷

1.3k

Generate a curated web‑text dataset for LLM training

The Smol Training Playbook

📚

The secrets to building world-class LLMs

upvoted a collection 4 months ago

Qwen3

Collection

84 items • Updated Dec 31, 2025 • 1.67k

liked a model 5 months ago

openbmb/VoxCPM-0.5B

Text-to-Speech • Updated Sep 19, 2025 • 786 • 766

liked 2 models 6 months ago

openbmb/MiniCPM4.1-8B

Text Generation • Updated Oct 24, 2025 • 22.4k • 383

openbmb/MiniCPM-V-4_5

Image-Text-to-Text • Updated Dec 18, 2025 • 54.7k • 1.07k

upvoted a collection 6 months ago

MiniCPM-o & MiniCPM-V

Collection

Multimodal models with leading performance. • 31 items • Updated 13 days ago • 67

liked a model over 1 year ago

Salesforce/blip-image-captioning-base

Image-to-Text • Updated Feb 3, 2025 • 2.38M • 842
