Article: How to Build a vLLM Plugin: A Guide to the general_plugins Entry Point (6 days ago)
Article: FlashHead: Accelerating Language Model Inference ~ *Efficient drop-in replacement for the classification head* (Mar 11)
Article: Benchmarks + Report: Optimized Cosmos-Reason2 (Qwen3-VL) for on-device inference on 8GB RAM (Jetson Orin Nano Super) (Feb 28)