Open to Work
Tarun Reddi
PRO
Teen-Different
1 follower · 16 following
https://redditarun.github.io/
_TeenDifferent
REDDITARUN
tarunreddi
AI & ML interests
Generative AI, Modular AI Systems, Reinforcement Learning
Recent Activity
Posted an update 1 day ago
Interesting... I looked into Apple's DiffuCoder, and the masked diffusion approach is actually reaching SOTA parity, basically showing that a global MDLM can work for code: https://arxiv.org/pdf/2506.20639

But then you look at the Tiny-A2D results and it's the complete opposite: BD3LM (block diffusion) clearly outperforms MDLM, and both the MDLM and BD3LM models fall well short of the AR baselines. https://github.com/ZHZisZZ/dllm/tree/main/examples/a2d

Digging into the why, I think it comes down to the adaptation method. Tiny-A2D just SFT'd an AR model to force it into diffusion. Asking a model wired for left-to-right causal attention to suddenly think bidirectionally is a massive shock, and it struggles to unlearn that strong AR inductive bias. That would explain why BD3LM worked better in their case: since it generates in chunks, it preserves some sequential order and acts like a bridge or crutch that feels more natural to the original Qwen weights.

Contrast that with Apple. They didn't just SFT; they pre-trained/adapted on 130B tokens, fundamentally rewiring the model to understand global dependencies from the ground up.

My theory: if we want MDLM to actually work, we can't just SFT. We need that heavy adaptation or a full pre-training phase to break the causal priors; otherwise the model just gets confused.
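To make the "bridge" argument concrete, here is a rough sketch of which positions each setup lets a token attend to. It is a simplification under stated assumptions: the function names are hypothetical, and the block-diffusion case is reduced to a plain block-causal pattern (full attention inside a block, causal across blocks), which may not match the exact masks used in the BD3LM or Tiny-A2D training code.

```python
def attention_mask(seq_len, mode, block_size=1):
    """Return mask[i][j] == True when position i may attend to position j.

    Illustrative only: "ar" is a causal mask, "mdlm" is fully bidirectional,
    and "bd3lm" is a simplified block-causal mask.
    """
    if mode == "ar":
        return [[j <= i for j in range(seq_len)] for i in range(seq_len)]
    if mode == "mdlm":
        return [[True] * seq_len for _ in range(seq_len)]
    if mode == "bd3lm":
        # A token sees its own block and every earlier block.
        return [
            [j // block_size <= i // block_size for j in range(seq_len)]
            for i in range(seq_len)
        ]
    raise ValueError(f"unknown mode: {mode}")


def newly_visible(base, other):
    """Count (i, j) pairs visible in `other` but hidden in `base`."""
    return sum(
        1
        for row_b, row_o in zip(base, other)
        for b, o in zip(row_b, row_o)
        if o and not b
    )


ar = attention_mask(4, "ar")
block = attention_mask(4, "bd3lm", block_size=2)
full = attention_mask(4, "mdlm")

# Pairs the AR-pretrained weights never saw during causal training.
print(newly_visible(ar, block), newly_visible(ar, full))
```

Under this simplified view, the block-causal mask exposes far fewer previously unseen "look-ahead" pairs than the fully bidirectional one, which is consistent with the idea that block diffusion sits closer to what the original AR weights were trained on.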
Posted an update 4 days ago
STRAW: Sample-Tuned Rank-Augmented Weights

Dropped a write-up on a weird direction I've been exploring lately. It's basically a lightweight, low-rank hypernetwork that rewrites a model's weights per sample instead of running everything through a frozen graph.

I'm calling it STRAW (Sample-Tuned Rank-Augmented Weights): a tiny modulator that generates dynamic LoRA-style updates on the fly, so the model morphs its own parameters based on the input's "vibe."

The good part is that it actually trains stably and outperforms a static baseline in low-data regimes. Not SOTA (obviously), but the fact that real-time weight modulation didn't collapse was the whole point. The next step is making the modulator actually understand geometry instead of just reacting to pixels.

This is still early, more "research log" than polished result, but it opens up a fun direction toward liquid-ish, context-shaped networks instead of rigid ones.

Blog: https://teendifferent.substack.com/p/sample-tuned-rank-augmented-weights
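For readers who want the gist of "per-sample LoRA-style updates" in code, here is a minimal NumPy sketch. Everything in it is an assumption made for illustration: the post does not describe the actual architecture, so the modulator here is just a pair of random linear maps, the input is a flat vector rather than an image, and names like `straw_forward`, `M_a`, and `M_b` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 4, 2

# Frozen base weight of a single linear layer.
W = rng.normal(size=(d_out, d_in))

# Hypothetical modulator: two linear maps that turn a sample into the
# low-rank factors A (rank x d_in) and B (d_out x rank).
M_a = rng.normal(size=(rank * d_in, d_in)) * 0.1
M_b = rng.normal(size=(d_out * rank, d_in)) * 0.1


def straw_forward(x, scale=1.0):
    """Apply the layer with a sample-specific low-rank weight update.

    Returns the layer output and the update `delta` generated for x.
    """
    A = (M_a @ x).reshape(rank, d_in)
    B = (M_b @ x).reshape(d_out, rank)
    delta = B @ A  # rank of delta is at most `rank`
    return (W + scale * delta) @ x, delta


x = rng.normal(size=d_in)
y, delta = straw_forward(x)
print(y.shape, np.linalg.matrix_rank(delta))
```

The point of the sketch is only the shape of the idea: the base weight stays fixed, while a small side network produces a different low-rank correction for every input. With `scale=0.0` the layer falls back to the static baseline.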
Posted an update 17 days ago
Revisiting LIME (Local Interpretable Model-agnostic Explanations) in the age of foundation models feels archaic, but the intuition holds: accuracy isn't a proxy for understanding.

I ran an experiment pitting CNNs against Transformers, using a custom SLIC perturbation pipeline to see what they were actually looking at. The results say models are lazy students.

• ViT didn't see a "Jeep"; it recognized a "muddy road" and used a dataset shortcut to guess the vehicle.
• EfficientNet hallucinated a "toaster" just because it saw a white counter.

High confidence based on background noise is a liability. If you aren't visually auditing your decision boundaries, you're just hoping for the best.

Full breakdown of the "Clever Hans" effect below. 👇
https://teendifferent.substack.com/p/your-features-arent-what-you-think
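As a rough illustration of the perturb-and-fit loop behind LIME on images, here is a small NumPy sketch. It is not the pipeline from the post: a regular grid stands in for SLIC superpixels, the proximity kernel is a simplified exponential on the fraction of hidden segments, the image is a 2D grayscale array, and `grid_segments` and `lime_explain` are hypothetical names.

```python
import numpy as np


def grid_segments(h, w, cells):
    """Label each pixel with a grid-cell id (a stand-in for SLIC superpixels)."""
    rows = np.arange(h) * cells // h
    cols = np.arange(w) * cells // w
    return rows[:, None] * cells + cols[None, :]


def lime_explain(image, predict, segments, n_samples=200,
                 kernel_width=0.25, seed=0):
    """Fit a weighted linear surrogate over on/off segment masks.

    `predict` maps a 2D image to a scalar score. Returns one coefficient
    per segment; larger magnitude means more influence on the score.
    """
    rng = np.random.default_rng(seed)
    n_seg = int(segments.max()) + 1
    z = rng.integers(0, 2, size=(n_samples, n_seg))
    z[0] = 1  # keep the unperturbed image in the sample set
    baseline = image.mean()  # hidden segments are filled with the mean

    scores = np.empty(n_samples)
    for k in range(n_samples):
        keep = z[k][segments].astype(bool)  # per-pixel keep mask
        scores[k] = predict(np.where(keep, image, baseline))

    # Samples closer to the original image get more weight.
    dist = 1.0 - z.mean(axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)

    # Weighted least squares with an intercept column.
    X = np.hstack([z, np.ones((n_samples, 1))])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(X * sw, scores[:, None] * sw, rcond=None)
    return coef[:-1, 0]
```

Calling `lime_explain` with a real classifier's class probability as `predict` and real superpixel labels as `segments` gives the kind of per-region attribution the post describes; a high coefficient on a background region is the "Clever Hans" warning sign.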
Organizations
Teen-Different's models (1)
Teen-Different/F.E.A.S.T • Object Detection • Updated Mar 30