The models come in Thinking and Instruct versions and use a new architecture, allowing them to run ~10x faster inference than Qwen3-32B. 💜 Step-by-step guide: https://docs.unsloth.ai/models/qwen3-next
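For anyone who wants to try it before reading the full guide, here is a minimal sketch of loading one of the checkpoints with plain Hugging Face transformers. The repo name is an assumption based on Qwen's usual naming, and an 80B MoE model will need multiple GPUs; the linked guide covers the Unsloth-specific path.

```python
# Minimal sketch, not the guide's exact steps: load a Qwen3-Next
# checkpoint with transformers. The model id below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"  # assumed repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # take bf16/fp16 from the checkpoint config
    device_map="auto",    # shard across available GPUs
)

messages = [{"role": "user", "content": "Explain your architecture in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```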
I run 20 AI coding agents locally on my desktop workstation at 400+ tokens/sec with MiniMax-M2. It's a drop-in Sonnet replacement in my Cursor, Claude Code, Droid, Kilo and Cline setups: peaks of 11k tok/s input and 433 tok/s output, 1B+ tok/m generated, all with a 196k context window. I've been running it with this config for 6 days now.
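The agents all talk to the local model through an OpenAI-compatible endpoint, so swapping Sonnet out is just a base-URL change. A minimal sketch, assuming a local server on port 8000; the URL, port, and model id are assumptions, not my actual config:

```python
# Sketch: point any OpenAI-compatible agent/client at a locally served
# MiniMax-M2. Base URL, port, and model id below are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server address
    api_key="not-needed-locally",         # local servers typically ignore this
)

resp = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",  # assumed model id as registered with the server
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```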
Today max performance held stable at 490.2 tokens/sec across 48 concurrent clients on MiniMax-M2.
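For context on how a number like that is measured: fire N concurrent requests at the endpoint and divide total output tokens by wall-clock time. A rough sketch, with the same assumed endpoint and model id as above, not my actual benchmark harness:

```python
# Sketch of an aggregate-throughput measurement: N concurrent clients,
# output tokens/sec computed from the API's usage stats.
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="local")

async def one_client() -> int:
    resp = await client.chat.completions.create(
        model="MiniMaxAI/MiniMax-M2",  # assumed model id
        messages=[{"role": "user", "content": "Summarize the TCP handshake."}],
        max_tokens=256,
    )
    return resp.usage.completion_tokens

async def main(n_clients: int = 48) -> None:
    start = time.perf_counter()
    token_counts = await asyncio.gather(*(one_client() for _ in range(n_clients)))
    elapsed = time.perf_counter() - start
    print(f"{sum(token_counts) / elapsed:.1f} output tok/s across {n_clients} clients")

asyncio.run(main())
```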