MagmaAI (Multimodal AI Agents)

posted an update 2 months ago

Post

3701

Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr

alvarobartt

posted an update 3 months ago

Post

3248

💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.

1 reply

·

jw2yang

updated a dataset about 1 year ago

MagmaAI/Magma-AITW-SoM

Viewer • Updated Apr 29, 2025 • 19k • 66 • 2

jw2yang

published a dataset about 1 year ago

MagmaAI/Magma-AITW-SoM

Viewer • Updated Apr 29, 2025 • 19k • 66 • 2

jw2yang

updated a dataset about 1 year ago

MagmaAI/Magma-Mind2Web-SoM

Viewer • Updated Apr 29, 2025 • 7.21k • 125 • 2

jw2yang

published a dataset about 1 year ago

MagmaAI/Magma-Mind2Web-SoM

Viewer • Updated Apr 29, 2025 • 7.21k • 125 • 2

jw2yang

updated a dataset about 1 year ago

MagmaAI/Magma-Video-ToM

Viewer • Updated Apr 12, 2025 • 2.21M • 585 • 3

jw2yang

published a dataset about 1 year ago

MagmaAI/Magma-Video-ToM

Viewer • Updated Apr 12, 2025 • 2.21M • 585 • 3

jw2yang

updated a dataset about 1 year ago

MagmaAI/Magma-OXE-ToM

Viewer • Updated Apr 6, 2025 • 6.13M • 450 • 3

jw2yang

published 2 datasets about 1 year ago

MagmaAI/Magma-OXE-ToM

Viewer • Updated Apr 6, 2025 • 6.13M • 450 • 3

MagmaAI/Magma-820K

Updated Mar 9, 2025 • 26 • 5

jw2yang

updated a dataset about 1 year ago

MagmaAI/Magma-820K

Updated Mar 9, 2025 • 26 • 5

alvarobartt

posted an update about 1 year ago

Post

3644

🔥 Agents can do anything! @microsoft Research just announced the release of Magma 8B!

Magma is a new Visual Language Model (VLM) with 8B parameters for multi-modal agents designed to handle complex interactions across virtual and real environments; and it's MIT licensed!

Magma comes with exciting new features such as:
- Introduces the Set-of-Mark and Trace-of-Mark techniques for fine-tuning
- Leverages a large amount of unlabeled video data to learn the spatial-temporal grounding and planning
- A strong generalization and ability to be fine-tuned for other agentic tasks
- SOTA in different multi-modal benchmarks spanning across UI navigation, robotics manipulation, image / video understanding and spatial understanding and reasoning
- Generates goal-driven visual plans and actions for agentic use cases

Model: microsoft/Magma-8B
Technical Report: Magma: A Foundation Model for Multimodal AI Agents (2502.13130)

jw2yang

authored 5 papers over 1 year ago

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Paper • 2501.05452 • Published Jan 9, 2025 • 15

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

Paper • 2412.10345 • Published Dec 13, 2024 • 2

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

Paper • 2412.09585 • Published Dec 12, 2024 • 11

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published Dec 5, 2024 • 62

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Paper • 2410.10818 • Published Oct 14, 2024 • 16

alvarobartt

posted an update over 1 year ago

Post

3042

🤗 Serving Meta Llama 3.1 405B on Google Cloud is now possible via the Hugging Face Deep Learning Containers (DLCs) for Text Generation Inference (TGI)

In this post, we showcase how to deploy https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 on an A3 instance with 8 x H100 GPUs on Vertex AI

Thanks to the Hugging Face DLCs for TGI and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier. And we’re not going to stop here – stay tuned as we enable more experiences to build AI with open models on Google Cloud!

Read the full post at https://huggingface.co/blog/llama31-on-vertex-ai

jw2yang

authored a paper almost 2 years ago

OmniParser for Pure Vision Based GUI Agent

Paper • 2408.00203 • Published Aug 1, 2024 • 24

AI & ML interests

Team members 2

MagmaAI's activity