siddhant singh

siddhantsingh

https://huggingface.co/ProspectFutures

AI & ML interests

Interested in dependency management and quasi-simulative, bidirectional world foundation models.

Recent Activity

liked a model 25 days ago

Qwen/Qwen3-Omni-30B-A3B-Instruct

reacted to KingNish's post with 🔥 26 days ago

Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine-tuning? We ran head-to-head tests on Qwen3-4B (10k+ high-quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient-norm spikes made training unstable. MuonClip (Kimi K2's clipping) stabilizes long pretraining runs, yet in our small-scale fine-tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.

Takeaway: for small-scale fine-tuning, the hybrid is practical and reliable.

Next step: scale to larger models/datasets to see if Muon's spikes become catastrophic or if clipping wins out.

Full blog link: https://huggingface.co/blog/KingNish/optimizer-part1
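The hybrid split the post describes (Muon for 2D layers, AdamW for 1D layers) can be illustrated with a small routing helper. This is a minimal sketch, not the post author's code: the function name `split_for_hybrid` and the example parameter names and shapes are hypothetical, and the rule used here (exactly two dimensions goes to Muon, everything else to AdamW) is an assumption based only on the one-line description in the post.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def split_for_hybrid(shapes: Dict[str, Shape]) -> Tuple[List[str], List[str]]:
    """Route 2D weight matrices to Muon and all other parameters to AdamW.

    Assumption: "2D layers" means parameters whose shape has exactly two
    dimensions; the post does not spell out the exact rule.
    """
    muon_names: List[str] = []
    adamw_names: List[str] = []
    for name, shape in shapes.items():
        if len(shape) == 2:
            muon_names.append(name)   # weight matrices
        else:
            adamw_names.append(name)  # biases, norm gains, other non-2D tensors
    return muon_names, adamw_names


# Hypothetical parameter names and shapes, for illustration only.
example_shapes: Dict[str, Shape] = {
    "layers.0.attn.q_proj.weight": (64, 64),
    "layers.0.mlp.up_proj.weight": (256, 64),
    "layers.0.input_norm.weight": (64,),
    "layers.0.attn.q_proj.bias": (64,),
}

muon_names, adamw_names = split_for_hybrid(example_shapes)
print("Muon: ", muon_names)
print("AdamW:", adamw_names)
```

With a PyTorch model, the `shapes` dictionary could be built from `model.named_parameters()`, and the two name lists would then select the parameters handed to each optimizer. Many Muon setups also keep embeddings and the output head on AdamW even though those tensors are 2D; whether the post's experiments did so is not stated.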

liked a model about 1 month ago

facebook/mms-tts-rus


Organizations

liked a model 25 days ago

Qwen/Qwen3-Omni-30B-A3B-Instruct

Any-to-Any • 35B • Updated Sep 22, 2025 • 181k • 791

liked 3 models about 1 month ago

liked a dataset about 1 month ago

nvidia/PhysicalAI-Autonomous-Vehicle-Cosmos-Drive-Dreams

Updated Jun 15, 2025 • 81.4k • 33

liked 5 models about 1 month ago

Qwen/Qwen-Image-Edit-2509

Image-to-Image • Updated Sep 22, 2025 • 440k • 1.04k

openai/gpt-oss-20b

Text Generation • 22B • Updated Aug 26, 2025 • 6.62M • 4.17k

openai/gpt-oss-120b

Text Generation • 120B • Updated Aug 26, 2025 • 3.45M • 4.31k

openai/whisper-large-v3

Automatic Speech Recognition • 2B • Updated Aug 12, 2024 • 6.99M • 5.27k

moonshotai/Kimi-K2-Thinking

Text Generation • Updated Nov 8, 2025 • 323k • 1.59k

liked a model about 2 months ago

baidu/ERNIE-4.5-VL-28B-A3B-Thinking

Image-Text-to-Text • 30B • Updated 14 days ago • 631 • 515

liked a Space over 1 year ago

Live Portrait

🤪

3.68k

Apply the motion of a video on a portrait

liked a dataset over 2 years ago

euirim/goodwiki

Viewer • Updated Sep 11, 2023 • 44.8k • 121 • 53

liked a model over 2 years ago

jondurbin/airoboros-7b-gpt4-1.2

Text Generation • Updated Jun 22, 2023 • 914 • 28
