New language possible?
Hi,
When it says the model supports the 10 directions of translation (x language pairs), is it still possible to re-train the model on a completely new language and get good results? Or does the underlying pre-trained model rely on those original languages, so that any additional training or fine-tuning in a new language wouldn't work?
If it's possible to use this model for a new language pair (English<>X), what steps would that involve? Thanks.
Hi,
Thanks for your interest!
Yes, it is still possible to re-train the model on a completely new language! You can first fine-tune the model on monolingual data in your target language and then fine-tune it on parallel data. This process should give you good translation performance. A good monolingual fine-tuning strategy could also include data from the languages that ALMA already supports, sampled at small ratios, to avoid catastrophic forgetting.
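To make the sampling-ratio idea concrete, here is a minimal sketch of how a mixed batch schedule for the monolingual stage might be built. It is only an illustration, not the ALMA training code: the function names, the `replay_ratio` value of 0.1, the placeholder code `"xx"` for the new language, and the list of already-supported language codes are all assumptions.

```python
import random
from collections import Counter


def mixing_weights(new_lang, old_langs, replay_ratio=0.1):
    """Return per-language sampling probabilities.

    `replay_ratio` (an assumed value, tune it for your data) is the total
    probability mass shared equally by the already-supported languages;
    the remainder goes to the new language.
    """
    if not 0.0 <= replay_ratio < 1.0:
        raise ValueError("replay_ratio must be in [0, 1)")
    weights = {new_lang: 1.0 - replay_ratio}
    for lang in old_langs:
        weights[lang] = replay_ratio / len(old_langs)
    return weights


def sample_language_schedule(weights, num_batches, seed=0):
    """Pick which language each monolingual batch is drawn from."""
    rng = random.Random(seed)
    langs = list(weights)
    probs = [weights[lang] for lang in langs]
    return rng.choices(langs, weights=probs, k=num_batches)


# "xx" is a placeholder for the new language; the other codes are
# illustrative stand-ins for languages the model already supports.
weights = mixing_weights("xx", ["de", "cs", "is", "zh", "ru"], replay_ratio=0.1)
schedule = sample_language_schedule(weights, num_batches=1000)
counts = Counter(schedule)
print(counts.most_common())
```

Each entry in `schedule` says which language's monolingual corpus the next batch should come from, so most batches use the new language while a small share replays the old ones. The parallel-data fine-tuning stage would follow separately.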
Thanks!
To save time and resources, I started from a model that someone else had already fine-tuned on the target language (in addition to English), then fine-tuned it on parallel data. However, because this does not start from the base ALMA model, it sacrifices the other language pairs, if that matters for your use case.
Hi! How many tokens should I have to add a new language?