metadata
language:
  - en
license: apache-2.0
tags:
  - mobile
  - edge-ai
  - vision-language
  - multimodal
  - quantized
  - gguf
pipeline_tag: image-text-to-text

MiniCPM-V 2.6 - Mobile Vision-Language Model (GGUF)

OpenBMB's MiniCPM-V 2.6, a vision-language model that takes images and text as input and generates text. This build is quantized to GGUF for mobile deployment.

Property     Value
Base         openbmb/MiniCPM-V-2_6
Parameters   ~2.8 billion
Size         ~1.4 GB (GGUF)
Format       GGUF (llama.cpp)
License      Apache 2.0
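
As a rough sanity check on the table, the file size and parameter count together imply an average storage cost per weight. The sketch below assumes "~1.4 GB" means 1.4 × 10^9 bytes and ignores file metadata and any separately stored vision components; `bits_per_weight` is a hypothetical helper name, not part of any library.

```python
def bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Average number of bits stored per parameter."""
    return file_size_bytes * 8 / n_params


# Figures taken from the table; decimal gigabytes are an assumption.
SIZE_BYTES = 1.4e9
N_PARAMS = 2.8e9

print(f"{bits_per_weight(SIZE_BYTES, N_PARAMS):.1f} bits per weight")
```

With the table's figures this works out to about 4 bits per weight, which would be consistent with a 4-bit quantization, although the card does not name the quantization type.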

Why This Model?

It runs vision-language inference directly on a phone. Image understanding, visual question answering (VQA), and visual chatbots all work on-device.

Performance

  • ~18 tok/s on Samsung S20 FE CPU
  • ~2.1 GB peak memory use
  • ~93% quality retention vs base model
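
The figures above translate into rough latency and memory budgets. This is a minimal sketch, assuming the ~18 tok/s rate holds for the whole response and ignoring image encoding and prompt processing time, which the list does not report. The function names and the 0.5 GB headroom are hypothetical choices for illustration.

```python
TOKENS_PER_SECOND = 18.0  # reported rate on a Samsung S20 FE CPU
PEAK_MEMORY_GB = 2.1      # reported peak memory use


def generation_seconds(n_tokens: int,
                       tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Estimated time to generate n_tokens at a constant decoding rate."""
    return n_tokens / tokens_per_second


def fits_in_memory(free_gb: float, headroom_gb: float = 0.5) -> bool:
    """True if free memory covers the reported peak plus an assumed safety margin."""
    return free_gb >= PEAK_MEMORY_GB + headroom_gb


# Example: a 90-token caption on a device with 3 GB of free memory.
print(f"{generation_seconds(90):.1f} s, fits: {fits_in_memory(3.0)}")
```

Actual numbers will vary with the device, thermal state, and prompt length, so treat these as order-of-magnitude estimates.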

Use Cases

  • Visual Q&A on mobile devices
  • Image captioning from camera photos
  • Document understanding (scan + analyze)
  • Multimodal chatbots
  • Accessibility features (describe images)

Quick Start

huggingface-cli download dispatchAI/MiniCPM-V-4.6-mobile --local-dir ./models
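# Recent llama.cpp builds run vision models through llama-mtmd-cli, which also needs the vision projector.
# mmproj.gguf is an assumed filename; use the projector file that ships with the model.
./build/bin/llama-mtmd-cli -m ./models/model.gguf --mmproj ./models/mmproj.gguf --image photo.jpg -p "Describe this image"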
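
The same invocation can be scripted from Python. This is a minimal sketch that only wraps the command line; `build_command` and `describe_image` are hypothetical helper names, and the optional `mmproj` argument is there for llama.cpp builds that expect the vision projector as a separate file. Pass whichever binary path your llama.cpp build provides.

```python
import subprocess
from typing import List, Optional


def build_command(binary: str, model: str, image: str,
                  prompt: str = "Describe this image",
                  mmproj: Optional[str] = None) -> List[str]:
    """Assemble the argument list for a llama.cpp multimodal CLI call."""
    cmd = [binary, "-m", model, "--image", image, "-p", prompt]
    if mmproj is not None:
        # Only added when a separate vision projector file is supplied.
        cmd += ["--mmproj", mmproj]
    return cmd


def describe_image(binary: str, model: str, image: str, **kwargs) -> str:
    """Run the CLI and return its standard output as text."""
    result = subprocess.run(
        build_command(binary, model, image, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or quotes, and `check=True` raises `subprocess.CalledProcessError` if the binary exits with an error.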