Article: How to Build a vLLM Plugin: A Guide to the general_plugins Entry Point (6 days ago)
Article: FlashHead: Accelerating Language Model Inference ~ *Efficient drop-in replacement for the classification head* (Mar 11)
Article: Benchmarks + Report: Optimized Cosmos-Reason2 (Qwen3-VL) for on-device inference on 8GB RAM (Jetson Orin Nano Super) (Feb 28)