| Model | Task | Parameters | Downloads | Likes |
|---|---|---|---|---|
| embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead | Image-Text-to-Text | 2B | 777 | 6 |
| embedl/Cosmos-Reason2-2B-NVFP4A16 | Image-Text-to-Text | 2B | 322 | 1 |
| embedl/Cosmos-Reason2-2B-W4A16 | Image-Text-to-Text | 2B | 9.17k | 6 |
| embedl/Cosmos-Reason2-2B-W4A16-Edge2 | Image-Text-to-Text | 2B | 17.4k | 11 |

Embedl
Embedl develops advanced tools and algorithms for Edge AI. Our mission is to make AI models run faster, more energy-efficiently, and more reliably across diverse hardware platforms, while significantly reducing development time.
We help teams deploy high-performance AI on real-world, resource-constrained devices.
Embedl Models (Community)
Pre-optimized models that can be used off-the-shelf or customized for specific hardware targets supported by the embedl-models package.
First release highlights:
- The fastest Small Language Models (SLMs) using FlashHead, a novel architectural improvement to the language-model head
- Works with popular models like Llama, Gemma, and Qwen
- Provides speedups on top of:
  - Quantization
  - Flash Attention
  - Other standard optimizations
Device: NVIDIA Jetson Thor
| Model | Generation speed (tokens/s) |
|---|---|
| embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16 | 100 |
| Llama-3.2-3B-Instruct-W4A16* | 80 |
| RedHatAI/Llama-3.2-3B-Instruct-FP8 | 64 |
| meta-llama/Llama-3.2-3B-Instruct | 37 |
*An Embedl-quantized model used as a benchmarking baseline. It is similar to the FlashHead-W4A16 model but does not include FlashHead or the custom generation loop.
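
To make the table easier to compare at a glance, the short sketch below turns the reported generation speeds into relative speedups of the FlashHead model over each other entry. The numbers are copied from the table; the dictionary layout and the helper name `relative_speedups` are illustrative assumptions and not part of the embedl-models package.

```python
# Generation speeds in tokens/s, copied from the Jetson Thor table above.
speeds = {
    "embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16": 100,
    "Llama-3.2-3B-Instruct-W4A16": 80,
    "RedHatAI/Llama-3.2-3B-Instruct-FP8": 64,
    "meta-llama/Llama-3.2-3B-Instruct": 37,
}


def relative_speedups(speeds, reference):
    """Hypothetical helper: reference speed divided by each other model's speed."""
    ref = speeds[reference]
    return {name: ref / s for name, s in speeds.items() if name != reference}


FLASHHEAD = "embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16"
speedups = relative_speedups(speeds, FLASHHEAD)

for name, ratio in speedups.items():
    print(f"{ratio:.2f}x faster than {name}")
```

With the figures above, the FlashHead model comes out at about 1.25x the speed of the W4A16 baseline, about 1.56x the FP8 model, and about 2.70x the unquantized model. The 1.25x figure is the part attributable to FlashHead and the custom generation loop, since the baseline uses the same quantization.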
Contact
Headquarters (Sweden)
Gamla Almedalsvägen 39
412 63 Gothenburg, Sweden
Email: contact@embedl.com
Edge Inference Benchmarks
On-device benchmarks across devices and models.