AI & ML interests

Accelerating DL

Recent Activity

Jingya updated a model 14 minutes ago
optimum/bge-m3-neuron-bs16-sl1024
Jingya published a model 44 minutes ago
optimum/bge-m3-neuron-bs16-sl1024
Jingya published a model about 2 hours ago
optimum/bge-base-en-v1.5-neuronx-bs-32

IlyasMoutawwakil
posted an update 14 days ago
Transformers v5 just landed! 🚀
It significantly unifies and reduces modeling code across architectures, while opening the door to a whole new class of performance optimizations.

My favorite new feature? 🤔
The new dynamic weight loader + converter. Here's why 👇

Over the last few months, the core Transformers maintainers built an incredibly fast weight loader, capable of converting tensors on the fly while loading them in parallel threads. This means we're no longer constrained by how parameters are laid out inside the safetensors weight files.

In practice, this unlocks two big things:
- Much more modular modeling code. You can now clearly see how architectures build on top of each other (DeepSeek v2 → v3, Qwen v2 → v3 → MoE, etc.). This makes shared bottlenecks obvious and lets us optimize the right building blocks once, for all model families.
- Performance optimizations beyond what torch.compile can do alone. torch.compile operates on the computation graph, but it can't change parameter layouts. With the new loader, we can restructure weights at load time: fusing MoE expert projections, merging attention QKV projections, and enabling more compute-dense kernels that simply weren't possible before (see the sketch below).
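To make that concrete, here's a minimal sketch of the QKV-merging idea. This is hypothetical illustration code, not the actual Transformers loader API:

```python
import torch

def fuse_qkv(state_dict: dict, prefix: str) -> dict:
    """Hypothetical load-time conversion: merge separate Q/K/V projection
    weights into one tensor so attention runs a single, denser matmul."""
    q = state_dict.pop(f"{prefix}.q_proj.weight")
    k = state_dict.pop(f"{prefix}.k_proj.weight")
    v = state_dict.pop(f"{prefix}.v_proj.weight")
    # Assuming equal Q/K/V sizes, one [3 * hidden, hidden] weight replaces
    # three [hidden, hidden] weights, regardless of how the tensors were
    # laid out in the safetensors file.
    state_dict[f"{prefix}.qkv_proj.weight"] = torch.cat([q, k, v], dim=0)
    return state_dict
```

The same trick applies to MoE experts: stacking per-expert projection weights into one batched tensor enables grouped matmuls instead of many small ones.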

Personally, I'm honored to have contributed in this direction, including the work on optimizing MoE implementations and making modeling code more torch-exportable, so these optimizations can be ported cleanly across runtimes.

Overall, Transformers v5 is a strong signal of where the community and industry are converging: Modularity and Performance, without sacrificing Flexibility.

Transformers v5 makes its signature from_pretrained an entrypoint where you can mix and match:
- Parallelism
- Quantization
- Custom kernels
- Flash/Paged attention
- Continuous batching
- ...
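For example, a single call can combine several of these. A sketch, assuming a recent transformers v5 install; the exact kwargs depend on your version, hardware, and installed extras:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",                          # illustrative model id
    dtype=torch.bfloat16,
    device_map="auto",                        # dispatch across available devices
    attn_implementation="flash_attention_2",  # requires flash-attn installed
    # tp_plan="auto",                         # tensor parallelism on multi-GPU setups
)
```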

Kudos to everyone involved! I highly recommend:
Release notes: https://github.com/huggingface/transformers/releases/tag/v5.0.0
Blog post: https://huggingface.co/blog/transformers-v5
IlyasMoutawwakil
posted an update 19 days ago
After 2 months of refinement, I'm happy to announce that a lot of Transformers' modeling code is now significantly more torch-compile & export-friendly 🔥

Why it had to be done 👇
PyTorch's Dynamo compiler is increasingly becoming the default interoperability layer for ML systems. Anything that relies on torch.export or torch.compile, from model optimization to cross-framework integrations, benefits directly when models can be captured as a single Dynamo-traced graph!

Transformers models are now easier to:
โš™๏ธ Compile end-to-end with torch.compile backends
๐Ÿ“ฆ Export reliably via torch.export and torch.onnx.export
๐Ÿš€ Deploy to ONNX / ONNX Runtime, Intel Corporation's OpenVINO, NVIDIA AutoDeploy (TRT-LLM), AMD's Quark, Meta's Executorch and more hardware-specific runtimes.

This work aims at unblocking entire TorchDynamo-based toolchains that rely on exporting Transformers across runtimes and accelerators.
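If you want to try it yourself, here is a minimal smoke test. A sketch, assuming a recent transformers/PyTorch install; some models may still need config tweaks:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative; any small causal LM works for a quick test
tokenizer = AutoTokenizer.from_pretrained(model_id)
# use_cache=False keeps the traced graph free of dynamic KV-cache state.
model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=False).eval()

inputs = tokenizer("Hello, world!", return_tensors="pt")

# End-to-end compilation with a torch.compile backend...
compiled = torch.compile(model)
_ = compiled(**inputs)

# ...or capture a single exported graph with torch.export.
exported = torch.export.export(
    model,
    args=(),
    kwargs={"input_ids": inputs["input_ids"],
            "attention_mask": inputs["attention_mask"]},
)
print(exported)
```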

We are doubling down on Transformers' commitment to be a first-class citizen of the PyTorch ecosystem: more exportable, more optimizable, and easier to deploy everywhere.

There are definitely some edge cases we still haven't addressed, so don't hesitate to try compiling / exporting your favorite transformers and to open issues / PRs.

PR in the comments! More updates coming soon!
badaoui
posted an update 3 months ago
Building high-performance, reproducible kernels for AMD ROCm just got a lot easier.

I've put together a guide on building, testing, and sharing ROCm-compatible kernels using the Hugging Face kernel-builder and kernels libraries, so you can focus on optimizing performance rather than spending time on setup.

Learn how to:

- Use Nix for reproducible builds
- Integrate kernels as native PyTorch operators
- Share your kernels on the Hub for anyone to use with kernels.get_kernel() (see the sketch below)
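Once published, consuming a kernel is a single call. A minimal sketch, assuming a CUDA or ROCm device; the repo id and op name below follow the kernels documentation example and may differ for your kernel:

```python
import torch
from kernels import get_kernel

# Fetch a pre-built kernel straight from the Hub.
activation = get_kernel("kernels-community/activation")

x = torch.randn(16, 64, dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # op names are defined by each kernel repo
```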

We use the 🏆 award-winning RadeonFlow GEMM kernel as a practical example.

📜 Check out the full guide here: https://huggingface.co/blog/build-rocm-kernels
pagezyhf
posted an update 3 months ago
🚀 Big news for AI builders!

We're thrilled to announce that the Qwen3-VL family of vision-language models is now available on Azure AI Foundry, thanks to our collaboration with Microsoft.

We bring open-source innovation to enterprise-grade AI infrastructure, making it easier than ever for enterprises to deploy and scale the latest and greatest models from Hugging Face securely within Azure.

🔍 Highlights:

- Deploy Qwen3-VL instantly via managed endpoints
- Built-in governance, telemetry, and lifecycle management
- True multimodal reasoning: vision, language, and code understanding
- State-of-the-art performance, outperforming closed-source models like Gemini 2.5 Pro and GPT-5
- Available in both *Instruct* and *Thinking* modes, across 24 model sizes

👉 Get started today: search for Qwen3-VL in the Hugging Face Collection on Azure AI Foundry.
pagezyhf
posted an update 5 months ago
What's your biggest headache deploying Hugging Face models to the cloud, and how can we fix it for you?
pagezyhf
posted an update 5 months ago
๐Ÿค Collaborating with AMD to ensure Hugging Face Transformers runs smoothly on AMD GPUs!

We run daily CI on AMD MI325 to track the health of the most important model architectures, and we've just made our internal dashboard public.

By making this easily accessible, we hope to spark community contributions and improve support for everyone!
badaoui
posted an update 5 months ago
🚀 Optimum libraries keep growing, and Optimum v2 is just around the corner!

I recently added ONNX export support for a bunch of new models in the optimum-onnx library, including: DeepSeek-V3, Cohere, Nemotron, Arcee, StableLM… and more!

⚡ With ONNX export, you can run your favorite models faster and more efficiently across different hardware backends, making deployment and experimentation much smoother.
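As a sketch of what that looks like through the ONNX Runtime integration (the model id is illustrative, and export support depends on the architecture):

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "stabilityai/stablelm-2-1_6b"  # illustrative; any supported architecture
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the PyTorch checkpoint to ONNX on the fly,
# then runs inference through ONNX Runtime.
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```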

💡 Have a model you'd love to see supported? Contributions are super welcome; let's make Optimum even better together!

#ONNX #Optimum #HuggingFace #OpenSource #AI