Collections
Discover the best community collections!
Collections including paper arxiv:2305.07759
-
vincentkoc/tiny_qa_benchmark
Viewer • Updated • 52 • 50 • 1 -
vincentkoc/tiny_qa_benchmark_pp
Viewer • Updated • 662 • 1.7k • 1 -
Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation
Paper • 2505.12058 • Published • 6 -
roneneldan/TinyStories
Viewer • Updated • 2.14M • 57k • 790
-
AgentInstruct: Toward Generative Teaching with Agentic Flows
Paper • 2407.03502 • Published • 51 -
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper • 2406.08464 • Published • 71 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 259 -
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 31
-
Textbooks Are All You Need
Paper • 2306.11644 • Published • 149 -
Textbooks Are All You Need II: phi-1.5 technical report
Paper • 2309.05463 • Published • 88 -
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Paper • 2305.07759 • Published • 36 -
Scaling Synthetic Data Creation with 1,000,000,000 Personas
Paper • 2406.20094 • Published • 104
-
Better Synthetic Data by Retrieving and Transforming Existing Datasets
Paper • 2404.14361 • Published • 2 -
Generative AI for Synthetic Data Generation: Methods, Challenges and the Future
Paper • 2403.04190 • Published • 1 -
Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published • 31 -
A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models
Paper • 2404.14445 • Published