arxiv:2511.16397
Qiu Jiantao
qiujiantao
AI & ML interests
None yet
Recent Activity
authored a paper about 20 hours ago
Unsupervised Topic Models are Data Mixers for Pre-training Language Models
authored a paper about 20 hours ago
AICC: Parse HTML Finer, Make Models Better -- A 7.3T AI-Ready Corpus Built by a Model-Based HTML Parser
liked a model 8 days ago
opendatalab/MinerU-HTML