LeroyDyer committed on Apr 8, 2024

Commit d4b7c4e (verified) · 1 Parent(s): 15519b0

Update README.md

Files changed (1)

README.md +25 -20

@@ -36,25 +36,6 @@ transformers.AutoModelForCausalLM.from_pretrained
 ```
 ### Model Description
-This is the model card of a 🤗 transformers model that has been pushed to the Hub.
-Previous vision models have been 50/50, as the multimodal model actually requires a lot of memory, GPU, and hard-drive space to create;
-the past versions were attempts to merge the capabilities into the main Mistral model whilst still retaining its Mistral tag.
-After reading many Hugging Face articles:
-The backbone issue is the main obstacle when creating multimodal models:
-with the advent of tiny models, we are able to leverage the decoder's abilities as a single expert (of sorts) within the model,
-by reducing its size to a fully trained tiny model.
-This will only produce decodings, not conversations, so it needs to be smart and respond with defined answers. In general it will produce captions, but as a domain-based model it may be specialized in medicine, art, etc.
-The main LLM still needs to retain these models within it, hence the backbone method of instantiating a VisionEncoderDecoder model instead of a LLaVA model, which still needs wrangling to work correctly without spoiling the original transformers installation.
-Previous experiments proved that the Mistral large model could be used as a decoder, but the total model jumped to 13B parameters; when applying the tiny model instead, the total was only affected by the weight of that model, 248M.
 This is an experiment in vision - the model has been created as a mistral/VisionEncoder/Decoder
 Customized from:
@@ -80,6 +61,25 @@ Encoder:
 - **Language(s) (NLP):** [English]
+## Summary
+This is the model card of a 🤗 transformers model that has been pushed to the Hub.
+Previous vision models have been 50/50, as the multimodal model actually requires a lot of memory, GPU, and hard-drive space to create;
+the past versions were attempts to merge the capabilities into the main Mistral model whilst still retaining its Mistral tag.
+After reading many Hugging Face articles:
+The backbone issue is the main obstacle when creating multimodal models:
+with the advent of tiny models, we are able to leverage the decoder's abilities as a single expert (of sorts) within the model,
+by reducing its size to a fully trained tiny model.
+This will only produce decodings, not conversations, so it needs to be smart and respond with defined answers. In general it will produce captions, but as a domain-based model it may be specialized in medicine, art, etc.
+The main LLM still needs to retain these models within it, hence the backbone method of instantiating a VisionEncoderDecoder model instead of a LLaVA model, which still needs wrangling to work correctly without spoiling the original transformers installation.
+Previous experiments proved that the Mistral large model could be used as a decoder, but the total model jumped to 13B parameters; when applying the tiny model instead, the total was only affected by the weight of that model, 248M.
 ## How to Get Started with the Model
@@ -168,7 +168,12 @@ loss = model(pixel_values=pixel_values, labels=labels).loss
 ```
-### Model Architecture and Objective
+### Model Architecture
+Here is how you create such a model:
 ``` python