Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
JonnaMatΒ 
posted an update 5 days ago
Post
1586
🀯 Edge-Grade Vision Reasoning. Now Practically Lossless. 🀯

Introducing
πŸ‘‰ embedl/Cosmos-Reason2-2B-W4A16-Edge2
Optimized for Jetson Orin Nano Super and AGX Orin
nvidia
.

πŸš„ Try it out on Jetson (image+video+text):
docker run --rm -it \
  --network host \
  --shm-size=8g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --runtime=nvidia \
  --name=vllm-serve \
  -e HF_TOKEN=hf_*** \
  -e HF_HOME=/root/.cache/huggingface \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve "embedl/Cosmos-Reason2-2B-W4A16-Edge2" \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.75 \
    --max-num-seqs 2


πŸ€“ What is Edge2? Most weights β†’ INT4 | Activations β†’ FP16 | Select sensitive layers β†’ kept in FP16.
Edge2 preserves precision where it matters most; while keeping the model small and fast enough for edge GPUs. 😎
In this post