wo-datacraft's Collections
Toolkit - Papers
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper
•
2511.18538
•
Published
•
251
Neural Machine Translation by Jointly Learning to Align and Translate
Paper
•
1409.0473
•
Published
•
7
Attention Is All You Need
Paper
•
1706.03762
•
Published
•
105
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper
•
1810.04805
•
Published
•
24
Hierarchical Reasoning Model
Paper
•
2506.21734
•
Published
•
46
Scaling Laws for Neural Language Models
Paper
•
2001.08361
•
Published
•
9
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper
•
1910.01108
•
Published
•
21
Language Models are Few-Shot Learners
Paper
•
2005.14165
•
Published
•
18
LoRA: Low-Rank Adaptation of Large Language Models
Paper
•
2106.09685
•
Published
•
54
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Paper
•
2005.11401
•
Published
•
14
Training language models to follow instructions with human feedback
Paper
•
2203.02155
•
Published
•
24
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Paper
•
2101.03961
•
Published
•
13
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Paper
•
2208.07339
•
Published
•
5
PaLM: Scaling Language Modeling with Pathways
Paper
•
2204.02311
•
Published
•
3
A Survey on Large Language Model based Autonomous Agents
Paper
•
2308.11432
•
Published
•
3
GPT-4 Technical Report
Paper
•
2303.08774
•
Published
•
7
Large Language Models are Zero-Shot Reasoners
Paper
•
2205.11916
•
Published
•
3
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
Paper
•
2312.16171
•
Published
•
37
Toolformer: Language Models Can Teach Themselves to Use Tools
Paper
•
2302.04761
•
Published
•
12
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper
•
2405.04434
•
Published
•
24
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
•
2501.12948
•
Published
•
429
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
•
2505.03335
•
Published
•
188
Qwen3 Technical Report
Paper
•
2505.09388
•
Published
•
317
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
Paper
•
2505.00675
•
Published
•
3
Small Language Models are the Future of Agentic AI
Paper
•
2506.02153
•
Published
•
22
gpt-oss-120b & gpt-oss-20b Model Card
Paper
•
2508.10925
•
Published
•
12
Large Language Diffusion Models
Paper
•
2502.09992
•
Published
•
123
Less is More: Recursive Reasoning with Tiny Networks
Paper
•
2510.04871
•
Published
•
497
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
Paper
•
2510.07192
•
Published
•
5
A Survey of Vibe Coding with Large Language Models
Paper
•
2510.12399
•
Published
•
48
Denoising Diffusion Probabilistic Models
Paper
•
2006.11239
•
Published
•
6
Denoising Diffusion Implicit Models
Paper
•
2010.02502
•
Published
•
4
Score-Based Generative Modeling through Stochastic Differential Equations
Paper
•
2011.13456
•
Published
•
2
Learning Transferable Visual Models From Natural Language Supervision
Paper
•
2103.00020
•
Published
•
19
Hierarchical Text-Conditional Image Generation with CLIP Latents
Paper
•
2204.06125
•
Published
•
3
Classifier-Free Diffusion Guidance
Paper
•
2207.12598
•
Published
•
4