How to merge models into an MoE?
Hi,
Just curious: how do you create custom Mixtral-style models? Do they all have to be Mistral derivatives of the same size?
Thanks!
@Yhyu13
Before you try this, make sure you have a lot of RAM or a big swap file (relying on swap will make the merge take forever).
I think you can do this with llama models, but all the models used have to be of the same type and size.
So you can't mix llama models with mistral models, or 7B with 13B.
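As a rough illustration of the "same type and size" rule, here is a minimal Python sketch that compares a few fields from each model's `config.json` before you attempt a merge. The list of keys and the helper name `find_mismatches` are my own assumptions, not something mergekit exposes, and the example configs are typed in by hand for illustration; in practice you would load each model's real `config.json`.

```python
# Config fields assumed to need matching values across all experts.
# This list is an assumption for illustration, not mergekit's own check.
SHAPE_KEYS = (
    "model_type",
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "intermediate_size",
    "vocab_size",
)


def find_mismatches(configs):
    """Return {key: {model_name: value}} for every key whose values differ."""
    mismatches = {}
    for key in SHAPE_KEYS:
        values = {name: cfg.get(key) for name, cfg in configs.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


# Hand-written example configs (illustrative stand-ins for real config.json files).
mistral_a = {
    "model_type": "mistral",
    "hidden_size": 4096,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "intermediate_size": 14336,
    "vocab_size": 32000,
}
mistral_b = dict(mistral_a)  # a second 7B Mistral fine-tune with the same shape
llama_13b = {
    "model_type": "llama",
    "hidden_size": 5120,
    "num_hidden_layers": 40,
    "num_attention_heads": 40,
    "intermediate_size": 13824,
    "vocab_size": 32000,
}

# Two same-shape Mistral models: nothing to report.
print(find_mismatches({"mistral_a": mistral_a, "mistral_b": mistral_b}))

# Mistral 7B next to a 13B llama: type and every size field differ.
for key, values in find_mismatches(
    {"mistral_a": mistral_a, "llama_13b": llama_13b}
).items():
    print(key, values)
```

An empty result only means these particular fields agree; it does not guarantee the merge itself will succeed.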
git clone https://github.com/cg123/mergekit
cd mergekit
git switch mixtral
git pull
# Use python venv or conda
pip install -e .
# if you want to use the --load-in-4bit or --load-in-8bit flag
pip install scipy bitsandbytes
Yes, I totally agree. Thanks for your reply.