---
license: mit
language:
- en
library_name: transformers
tags:
- art
- medical
- biology
- code
- chemistry
metrics:
- code_eval
- chrf
- charcut_mt
- cer
- brier_score
- bleurt
- bertscore
- accuracy
pipeline_tag: image-text-to-text
---

# MULTI-MODAL-MODEL
## LeroyDyer/Mixtral_AI_Vision-Instruct_X

This model is currently in test mode.

# Vision/multimodal capabilities

If you want to use the vision functionality:

* You must use the latest version of [Koboldcpp](https://github.com/LostRuins/koboldcpp).

To use the multimodal **vision** capabilities of this model, you need to load the specified **mmproj** file, which can be found inside this model repo ([LeroyDyer/Mixtral_AI_Vision-Instruct_X](https://huggingface.co/LeroyDyer/Mixtral_AI_Vision-Instruct_X)).

* You can load the **mmproj** file using the corresponding section of the interface:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/UX6Ubss6EPNAT3SKGMLe0.png)

## Choosing an mmproj file

* For 4-bit loading, use the 4-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0
* For 8-bit loading, use the 8-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q8_0
* For f16 (full-precision) loading, use the f16 mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-f16
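
The same model/mmproj pairing can also be done from the command line with llama.cpp's `llava-cli`. This is a sketch only: the local GGUF filenames below are assumptions, and you should substitute whichever quantization you actually downloaded.

```shell
# Hypothetical local filenames - substitute the files you actually downloaded.
# Pair the 4-bit model GGUF with the matching 4-bit projector file:
llava-cli \
  -m Mixtral_AI_Vision-Instruct_X-Q4_0.gguf \
  --mmproj mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0.gguf \
  --image ./photo.jpg \
  -p "Describe this image."
```

The key point is that the quantization level of the `--mmproj` file should match the quantization of the main model file.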
## Extended capabilities

Capabilities extended from the following source models:

```
* mistralai/Mistral-7B-Instruct-v0.1 - Prime base
* ChaoticNeutrals/Eris-LelantaclesV2-7b - role play
* ChaoticNeutrals/Eris_PrimeV3-Vision-7B - vision
* rvv-karma/BASH-Coder-Mistral-7B - coding
* Locutusque/Hercules-3.1-Mistral-7B - unhinging
* KoboldAI/Mistral-7B-Erebus-v3 - NSFW
* Locutusque/Hyperion-2.1-Mistral-7B - chat
* Severian/Nexus-IKM-Mistral-7B-Pytorch - thinking
* NousResearch/Hermes-2-Pro-Mistral-7B - generalizing
* mistralai/Mistral-7B-Instruct-v0.2 - base
* Nitral-AI/ProdigyXBioMistral_7B - medical
* Nitral-AI/Infinite-Mika-7b - 128k context expansion enforcement
* Nous-Yarn-Mistral-7b-128k - 128k context expansion
* yanismiraoui/Yarn-Mistral-7b-128k-sharded
* ChaoticNeutrals/Eris_Prime-V2-7B - role play
```

# Image-Text-to-Text

## Using transformers

```python
from transformers import AutoProcessor, LlavaForConditionalGeneration
from transformers import BitsAndBytesConfig
import torch
import requests
from PIL import Image

# Load the model in 4-bit to reduce memory usage
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto"
)

image1 = Image.open(requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw)
image2 = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)

# One prompt per image; <image> marks where the image is inserted
prompts = [
    "USER: <image>\nWhat are the things I should be cautious about when I visit this place? What should I bring with me?\nASSISTANT:",
    "USER: <image>\nPlease describe this image\nASSISTANT:",
]

inputs = processor(prompts, images=[image1, image2], padding=True, return_tensors="pt").to("cuda")

# Generate a response for each prompt/image pair
output = model.generate(**inputs, max_new_tokens=200)
for text in processor.batch_decode(output, skip_special_tokens=True):
    print(text)
```

## Using pipeline

```python
from transformers import pipeline
from PIL import Image
import requests

model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"
pipe = pipeline("image-to-text", model=model_id)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"

image = Image.open(requests.get(url, stream=True).raw)
question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"
prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
    f"###Human: <image>\n{question}###Assistant:"
)

outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(outputs)
```
## Mistral Chat Templating

### Instruction format

To leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [/INST] tokens. The very first instruction should begin with a beginning-of-sentence token id; subsequent instructions should not. The assistant generation will be ended by the end-of-sentence token id.
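
The format above can be sketched as a small helper that assembles the raw prompt string by hand. This is an illustration only: `build_mistral_prompt` is a hypothetical helper, not part of this repo, and in practice `tokenizer.apply_chat_template` (shown next) does this for you.

```python
def build_mistral_prompt(turns):
    """Assemble a raw Mistral-instruct prompt string.

    turns: list of (user, assistant) pairs; set assistant to None
    for the final turn the model should complete.
    """
    prompt = "<s>"  # beginning-of-sentence token appears only once, before the first instruction
    for user, assistant in turns:
        prompt += f"[INST] {user} [/INST]"
        if assistant is not None:
            # each completed assistant turn ends with the end-of-sentence token
            prompt += f" {assistant}</s>"
    return prompt

print(build_mistral_prompt([
    ("Hello, how are you?", "I'm doing great."),
    ("Tell me a joke.", None),
]))
```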
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

print(tokenizer.apply_chat_template(chat, tokenize=False))
```

# Text-to-Text

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")
tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```