Part of the Router-Suggest collection: finetuned checkpoints of VLMs for multimodal auto-completion.
This model generates conversational responses conditioned on both textual and visual context. It is suitable for:
The model is not intended for:
Example usage with Hugging Face Transformers:
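```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

processor = AutoProcessor.from_pretrained("devichand/MiniCPM_V_Noimg_ImgChat-7B")
model = AutoModelForVision2Seq.from_pretrained("devichand/MiniCPM_V_Noimg_ImgChat-7B")

# Placeholder path (an assumption, not part of the original card): use your own image file.
your_image = Image.open("path/to/image.jpg").convert("RGB")
# Encode the image and the text prompt together as PyTorch tensors.
inputs = processor(images=your_image,
                   text="Describe the image.",
                   return_tensors="pt")
outputs = model.generate(**inputs)
print(processor.decode(outputs[0]))
```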