Collections including paper arxiv:2402.09371

- SELF: Language-Driven Self-Evolution for Large Language Model
  Paper • 2310.00533 • Published • 2
- GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
  Paper • 2310.00576 • Published • 2
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
  Paper • 2305.13169 • Published • 3
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 15

- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 24
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 15
- Triple-Encoders: Representations That Fire Together, Wire Together
  Paper • 2402.12332 • Published • 2
- BERTs are Generative In-Context Learners
  Paper • 2406.04823 • Published • 1

- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 15
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
  Paper • 2402.10379 • Published • 31
- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 42

- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 24
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 15
- A Thorough Examination of Decoding Methods in the Era of LLMs
  Paper • 2402.06925 • Published • 1
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108

- Measuring the Effects of Data Parallelism on Neural Network Training
  Paper • 1811.03600 • Published • 2
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Paper • 1804.04235 • Published • 2
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  Paper • 1905.11946 • Published • 3
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 65

- In-Context Language Learning: Architectures and Algorithms
  Paper • 2401.12973 • Published • 4
- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 15
- Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers
  Paper • 2412.12276 • Published • 15