Jonna Matthiesen
Recent Activity
posted an update about 4 hours ago
⚡ FlashHead benchmarks for Llama 3.2, Gemma 3, and Qwen3 are now on https://huggingface.co/spaces/embedl/Edge-Inference-Benchmarks !
These are some of the models used in the FlashHead paper - now easier to explore and compare interactively.
Jetson AGX Thor (tok/s, batch=1):
- Llama-3.2-1B: 77 → 285 (FlashHead+W4A16, 3.7x)
- Llama-3.2-3B: 34 → 112 (3.3x)
- Gemma-3-1B: 79 → 153 (1.9x)
- Qwen3-1.7B: 49 → 189 (3.8x)
- Qwen3-0.6B: 140 → 177 (1.3x)
Accuracy matches baseline on MMLU-Pro, IFEval, BBH, TruthfulQA, GSM8K.
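The speedup factors quoted in the list are the ratio of optimized throughput to baseline throughput. A minimal sketch of that arithmetic follows, using the tok/s values copied from the post. The names `THROUGHPUT`, `speedup`, and `speedup_table` are hypothetical and introduced only for illustration, and the sketch assumes every row compares the baseline against FlashHead+W4A16, which the post states explicitly only for the first entry.

```python
# Tokens/s at batch=1 on Jetson AGX Thor, copied from the post.
# Each tuple is (baseline, optimized). Assumption: "optimized" means
# FlashHead+W4A16 for every row, as labeled on the first entry.
THROUGHPUT = {
    "Llama-3.2-1B": (77, 285),
    "Llama-3.2-3B": (34, 112),
    "Gemma-3-1B": (79, 153),
    "Qwen3-1.7B": (49, 189),
    "Qwen3-0.6B": (140, 177),
}


def speedup(baseline: float, optimized: float) -> float:
    """Return the throughput ratio optimized / baseline."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return optimized / baseline


def speedup_table(results: dict) -> dict:
    """Map each model name to its speedup, rounded to two decimals."""
    return {
        model: round(speedup(base, opt), 2)
        for model, (base, opt) in results.items()
    }


if __name__ == "__main__":
    for model, factor in speedup_table(THROUGHPUT).items():
        print(f"{model}: {factor:.2f}x")
```

Because the inputs here are the rounded tok/s figures from the post, a recomputed ratio can differ slightly in the last digit from the factor the post reports, which was presumably derived from unrounded measurements.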
updated a model about 5 hours ago
embedl/Qwen3-1.7B-FlashHead-W4A16
updated a model about 5 hours ago
embedl/gemma-3-1b-it-FlashHead-W4A16