Image-Text-to-Text
Transformers
Safetensors
English
idefics2
multimodal
vision
text-generation-inference
Instructions to use HuggingFaceM4/idefics2-8b-base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use HuggingFaceM4/idefics2-8b-base with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-text-to-text", model="HuggingFaceM4/idefics2-8b-base")# Load model directly from transformers import AutoProcessor, AutoModelForMultimodalLM processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b-base") model = AutoModelForMultimodalLM.from_pretrained("HuggingFaceM4/idefics2-8b-base") - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use HuggingFaceM4/idefics2-8b-base with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "HuggingFaceM4/idefics2-8b-base" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "HuggingFaceM4/idefics2-8b-base", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/HuggingFaceM4/idefics2-8b-base
- SGLang
How to use HuggingFaceM4/idefics2-8b-base with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "HuggingFaceM4/idefics2-8b-base" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "HuggingFaceM4/idefics2-8b-base", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "HuggingFaceM4/idefics2-8b-base" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "HuggingFaceM4/idefics2-8b-base", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use HuggingFaceM4/idefics2-8b-base with Docker Model Runner:
docker model run hf.co/HuggingFaceM4/idefics2-8b-base
Update README.md
Browse files
README.md
CHANGED
|
@@ -400,7 +400,7 @@ Additionally, we identified behaviors that increase security risks that already
|
|
| 400 |
|
| 401 |
It's important to note that these security concerns are currently limited by the model's occasional inability to accurately read text within images.
|
| 402 |
|
| 403 |
-
We emphasize that the model would often encourage the user to exercise caution about the model's generation or flag how problematic the initial query can be in the first place. For instance, when insistently prompted to write a racist comment, the model would answer that query before pointing out "This type of stereotyping and dehumanization has been used throughout history to justify discrimination and oppression against people of color. By making light of such a serious issue, this meme perpetuates harmful stereotypes and contributes to the ongoing struggle for racial equality and social justice.".
|
| 404 |
|
| 405 |
However, certain formulations can circumvent (i.e. "jail-break") these cautionary prompts, emphasizing the need for critical thinking and discretion when engaging with the model's outputs. While jail-breaking text LLMs is an active research area, jail-breaking vision-language models has recently emerged as a new challenge as vision-language models become more capable and prominent. The addition of the vision modality not only introduces new avenues for injecting malicious prompts but also raises questions about the interaction between vision and language vulnerabilities.
|
| 406 |
|
|
|
|
| 400 |
|
| 401 |
It's important to note that these security concerns are currently limited by the model's occasional inability to accurately read text within images.
|
| 402 |
|
| 403 |
+
We emphasize that the model would often encourage the user to exercise caution about the model's generation or flag how problematic the initial query can be in the first place. For instance, when insistently prompted to write a racist comment, the model would answer that query before pointing out "*This type of stereotyping and dehumanization has been used throughout history to justify discrimination and oppression against people of color. By making light of such a serious issue, this meme perpetuates harmful stereotypes and contributes to the ongoing struggle for racial equality and social justice.*".
|
| 404 |
|
| 405 |
However, certain formulations can circumvent (i.e. "jail-break") these cautionary prompts, emphasizing the need for critical thinking and discretion when engaging with the model's outputs. While jail-breaking text LLMs is an active research area, jail-breaking vision-language models has recently emerged as a new challenge as vision-language models become more capable and prominent. The addition of the vision modality not only introduces new avenues for injecting malicious prompts but also raises questions about the interaction between vision and language vulnerabilities.
|
| 406 |
|