Hamish Ivison
hamishivi
AI & ML interests
NLP :)
Recent Activity
updated a model about 20 hours ago
hamishivi/tmax-qwen3.5-4b-sft-20260313-mlx
published a model about 20 hours ago
hamishivi/tmax-qwen3.5-4b-sft-20260313-mlx
updated a model 1 day ago
hamishivi/random_rewards_8401_step2500
Organizations
RLVE
Models for "RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments" - https://arxiv.org/abs/2511.07317
Large-Scale Data Selection for Instruction Tuning
Datasets and models associated with the paper "Large-Scale Data Selection for Instruction Tuning" (https://arxiv.org/abs/2503.01807)
models 240
hamishivi/tmax-qwen3.5-4b-sft-20260313-mlx
Text Generation • 4B • Updated • 52
hamishivi/random_rewards_8401_step2500
8B • Updated • 30
hamishivi/random_rewards_step1_5k
8B • Updated • 33
hamishivi/random_rewards_step2k
8B • Updated • 33
hamishivi/step500_test
196k • Updated • 30
hamishivi/random_rewards_step1000
8B • Updated • 27
hamishivi/rl_rag_random_rewards_step500
8B • Updated • 28
hamishivi/1412_rl_rag_open_judge_citation_step2500
8B • Updated • 3
hamishivi/1412_rl_rag_open_judge_citation_step_2000
8B • Updated • 1
hamishivi/1412_rl_rag_open_judge_citation_1237_step1500
8B • Updated • 4
datasets 199
hamishivi/rlenv-appworld-eval
Viewer • Updated • 57 • 31
hamishivi/rlenv-appworld-train
Viewer • Updated • 90 • 29
hamishivi/rlenv-appworld-eval-nothink
Viewer • Updated • 57 • 7
hamishivi/rlenv-appworld-train-nothink
Viewer • Updated • 90 • 10
hamishivi/rlenv-guess-number-nothink
Viewer • Updated • 100 • 25
hamishivi/rlenv-counter-nothink
Viewer • Updated • 100 • 21
hamishivi/agent-task-combined
Preview • Updated • 146
hamishivi/rlenv-guess-number
Viewer • Updated • 100 • 22
hamishivi/rlenv-counter
Viewer • Updated • 100 • 10
hamishivi/rlenv-wordle-nothink
Viewer • Updated • 2k • 99