Errors During Training for the Original Implementation and the Fixes for the Errors
#7
pinned
by v2ray - opened
https://huggingface.co/v2ray/dbrx-base-fixed
The original DBRX implementation code has a few bugs that only affect training, which I fixed in my re-upload.
I re-uploaded because the changes require the weight files to be converted, so if anyone wants to use the fix, you need to re-download the entire weights!
The issues - How I fixed them:
- Error when using gradient checkpointing - Fixed by using positional arguments instead, because _gradient_checkpointing_func doesn't support kwargs.
- VRAM usage go zoom and CUDA Out of Memory when backpropping through the MLP layer - Fixed by separating the experts' weights into different tensors instead of using a single tensor for all the experts. IDK why this fixed it, but maybe it's because torch is trying to compute the gradient for every expert at once, which shouldn't happen since it's a MoE model.
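For anyone who wants to see the two fixes in miniature, here is a rough PyTorch sketch. It is not the code from the re-upload: FusedExperts, SplitExperts, and the tiny shapes are made up for illustration, and it assumes _gradient_checkpointing_func behaves like torch.utils.checkpoint.checkpoint in reentrant mode, which rejects keyword arguments in the PyTorch versions I know of. The gradient comparison at the end only shows one possible contributor to the memory blow-up (a full-size gradient for the fused tensor even when a single expert is used); the post itself does not confirm the cause.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class FusedExperts(nn.Module):
    """Hypothetical layer: every expert lives in one big parameter tensor."""

    def __init__(self, num_experts: int, hidden: int, ffn: int):
        super().__init__()
        self.w = nn.Parameter(torch.randn(num_experts, ffn, hidden) * 0.02)

    def forward(self, x: torch.Tensor, expert_idx: int) -> torch.Tensor:
        return x @ self.w[expert_idx].t()


class SplitExperts(nn.Module):
    """Hypothetical layer: one independent parameter tensor per expert."""

    def __init__(self, fused: FusedExperts):
        super().__init__()
        # "Weight conversion": slice the fused tensor into separate copies.
        self.w = nn.ParameterList(
            [nn.Parameter(w_e.clone()) for w_e in fused.w.detach().unbind(0)]
        )

    def forward(self, x: torch.Tensor, expert_idx: int) -> torch.Tensor:
        return x @ self.w[expert_idx].t()


torch.manual_seed(0)
fused = FusedExperts(num_experts=4, hidden=8, ffn=16)
split = SplitExperts(fused)
x = torch.randn(2, 8, requires_grad=True)

# Fix 1: reentrant checkpointing is expected to reject keyword arguments...
try:
    checkpoint(split, x, expert_idx=1, use_reentrant=True)
    kwargs_rejected = False
except (ValueError, TypeError):
    kwargs_rejected = True

# ...so the same call is made with positional arguments instead.
out = checkpoint(split, x, 1, use_reentrant=True)
out.sum().backward()

# Fix 2: with split weights, only the expert that was used gets a gradient.
split_grads = [p.grad is not None for p in split.w]

# With the fused tensor, the gradient covers all experts (zeros for unused ones).
fused(x, 1).sum().backward()
fused_grad_shape = tuple(fused.w.grad.shape)
```

After running this, split_grads marks only expert 1 as having a gradient, while fused.w.grad has the full (num_experts, ffn, hidden) shape. In a model the size of DBRX, that difference per expert call could add up, but treat it as a guess rather than a diagnosis.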
Hey, thanks for this.
I won't fix this on my side since you've already done it, and I'll try to keep this repo 1:1 with the original.
Nice work tho!