Active filters: grpo
snap-stanford/humanlm-opinion • Text Generation • 8B • Updated • 83 • 9
LightningRodLabs/Trump-Forecaster • Text Generation • Updated • 103 • 4
LightningRodLabs/Golf-Forecaster • Text Generation • Updated • 26 • 4
webxos/microclaw-for-openclaw-version-2026.2.17 • Text Generation • Updated • 139 • 2
Image-Text-to-Text • 4B • Updated • 21 • 2
lightx2v/Wan2.1-T2V-1.3B-longcat-step500 • Text-to-Video • Updated • 91 • 4
mradermacher/MetaphorStar-3B-GGUF • Reinforcement Learning • 3B • Updated • 669 • 1
mradermacher/MetaphorStar-3B-i1-GGUF • Reinforcement Learning • 3B • Updated • 2.58k • 1
Text Generation • 0.1B • Updated • 5
8B • Updated
sergiopaniego/Qwen2-0.5B-GRPO-test • Updated
Novaciano/ESP-NSFW-GRPO-1B-Sin_Censura-GGUF • 1B • Updated • 68 • 4
nbd22/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora • Updated
sergiopaniego/Qwen2-0.5B-GRPO • Updated
philschmid/qwen-2.5-3b-r1-countdown • Text Generation • 3B • Updated • 9 • 8
spinech/qwen-2.5-3b-r1-countdown • Text Generation • 3B • Updated • 4
Dongwei/Qwen2.5-1.5B-Open-R1-GRPO • Text Generation • 2B • Updated • 3 • 1
spinech/qwen2.5-3b-r1-rearc-stage1 • Text Generation • 3B • Updated • 9
Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO • Text Generation • 8B • Updated • 11 • 1
MasterControlAIML/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured • Text Generation • 2B • Updated • 11 • 5
mradermacher/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured-GGUF • 2B • Updated • 198 • 2
hyunw3/qwen-2.5-0.5b-r1-countdown • Text Generation • 0.5B • Updated
hyunw3/qwen-2.5-0.5b-r1-countdown_lr1.0e-6 • Text Generation • 0.5B • Updated • 4
mgaimm/qwen-2.5-3b-r1-countdown • Text Generation • 3B • Updated • 1
MasterControlAIML/DeepSeek-R1-Qwen-2.5-1.5b-Latest-Unstructured-To-Structured • Text Generation • Updated • 18 • 5
tuyentx/qwen-2.5-3b-r1-countdown • Text Generation • 3B • Updated
pablo-chocobar/qwen-2.5-3b-r1-countdown • Text Generation • 3B • Updated • 2