Collections

Collections including paper arxiv:2504.16929

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23 • 29
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 21
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Paper • 2504.15785 • Published Apr 22 • 21
OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21 • 35

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23 • 29

microsoft/bitnet-b1.58-2B-4T

Text Generation • 0.8B • Updated May 1 • 7.84k • 1.22k
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14 • 15
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct

Text Generation • 8B • Updated Apr 17 • 140 • 15
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 63

Foundational Deep Learning - Architecture

Forgetting Transformer: Softmax Attention with a Forget Gate

Paper • 2503.02130 • Published Mar 3 • 32
L^2M: Mutual Information Scaling Law for Long-Context Language Modeling

Paper • 2503.04725 • Published Mar 6 • 21
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13 • 171
I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23 • 29

Theory, Conceptualization, Paradigms

Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12 • 47
I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23 • 29
Chain-of-Model Learning for Language Model

Paper • 2505.11820 • Published May 17 • 121

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23 • 29

Theory and Representation learning

about 1 month ago

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23 • 29
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens

Paper • 2508.05305 • Published Aug 7 • 46
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

Paper • 2511.04217 • Published Nov 6 • 15

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10 • 43
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14 • 85

Redundancy Principles for MLLMs Benchmarks

Paper • 2501.13953 • Published Jan 20 • 29
Autonomy-of-Experts Models

Paper • 2501.13074 • Published Jan 22 • 44
Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12 • 47
Large Language Diffusion Models

Paper • 2502.09992 • Published Feb 14 • 123

deepseek-ai/DeepSeek-R1

Text Generation • 685B • Updated Mar 27 • 1.21M • 12.9k
deepseek-ai/DeepSeek-V3

Text Generation • 685B • Updated Mar 27 • 721k • 4k
mistralai/Mistral-Small-24B-Instruct-2501

24B • Updated Jul 28 • 426k • 948
deepseek-ai/Janus-Pro-1B

Any-to-Any • Updated Feb 1 • 8.81k • 465
