Tags: Text Generation, Transformers, Safetensors, English, code, helion-osc, mathematics, reasoning, algorithm, causal-lm, conversational, bitsandbytes
Instructions to use DeepXR/Helion-OSC with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use DeepXR/Helion-OSC with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="DeepXR/Helion-OSC")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("DeepXR/Helion-OSC", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use DeepXR/Helion-OSC with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "DeepXR/Helion-OSC"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "DeepXR/Helion-OSC",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
    docker model run hf.co/DeepXR/Helion-OSC
- SGLang
How to use DeepXR/Helion-OSC with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "DeepXR/Helion-OSC" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "DeepXR/Helion-OSC",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "DeepXR/Helion-OSC" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "DeepXR/Helion-OSC",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use DeepXR/Helion-OSC with Docker Model Runner:
    docker model run hf.co/DeepXR/Helion-OSC
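The vLLM server (port 8000) and the SGLang server (port 30000) accept the same OpenAI-compatible `POST /v1/chat/completions` request that the curl examples send. The sketch below shows one way to make that call from Python using only the standard library. The helper names `build_chat_request` and `chat` are hypothetical. The reply parsing assumes the standard OpenAI chat-completions response shape (`choices[0].message.content`).

```python
import json
import urllib.request

# Hypothetical helpers: a stdlib-only client for the OpenAI-compatible
# /v1/chat/completions route exposed by `vllm serve` (port 8000 above)
# or `sglang.launch_server` (port 30000 above).


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request that the curl examples send."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text.

    Assumes the standard OpenAI response shape: choices[0].message.content.
    Requires a running server; network errors propagate as URLError/OSError.
    """
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

For example, `chat("http://localhost:30000", "DeepXR/Helion-OSC", "What is the capital of France?")` would target the SGLang server started above. Switching to port 8000 would target vLLM instead.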
    # Helion-OSC Docker Image
    FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04

    # Set environment variables
    ENV DEBIAN_FRONTEND=noninteractive
    ENV PYTHONUNBUFFERED=1
    ENV CUDA_HOME=/usr/local/cuda
    ENV PATH=${CUDA_HOME}/bin:${PATH}
    ENV LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}

    # Install system dependencies (Ubuntu 22.04 ships Python 3.10)
    RUN apt-get update && apt-get install -y \
        python3.10 \
        python3-pip \
        git \
        wget \
        curl \
        vim \
        && rm -rf /var/lib/apt/lists/*

    # Upgrade pip
    RUN pip3 install --upgrade pip setuptools wheel

    # Set working directory
    WORKDIR /app

    # Copy requirements first so the dependency layer stays cached when only code changes
    COPY requirements.txt .

    # Install Python dependencies
    RUN pip3 install --no-cache-dir -r requirements.txt

    # Install additional dependencies for the API server
    # (quote "uvicorn[standard]" so the shell never treats the brackets as a glob)
    RUN pip3 install --no-cache-dir \
        fastapi \
        "uvicorn[standard]" \
        pydantic \
        python-multipart

    # Copy application files
    COPY . .

    # Create directories for models, cache, and outputs
    RUN mkdir -p /app/models /app/cache /app/outputs

    # Set the Hugging Face cache directory. HF_HOME is honored by both
    # huggingface_hub and Transformers; TRANSFORMERS_CACHE is deprecated.
    ENV HF_HOME=/app/cache

    # Expose port for API server
    EXPOSE 8000

    # Health check (expects api_server.py to answer GET /health)
    HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
        CMD curl -f http://localhost:8000/health || exit 1

    # Default command (can be overridden)
    CMD ["python3", "api_server.py", "--host", "0.0.0.0", "--port", "8000"]
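The `HEALTHCHECK` and `CMD` instructions assume two things about `api_server.py`. It must accept `--host` and `--port` flags, and it must answer `GET /health` with a non-error status. `curl -f` exits non-zero on any HTTP status of 400 or above, which marks the container unhealthy.

That script is not shown here. The image installs FastAPI and uvicorn, so the real server presumably uses them. The sketch below is a hypothetical, dependency-free stand-in built on Python's `http.server`, useful for checking the container wiring:

- It implements only the contract the Dockerfile relies on, not model serving.
- The `{"status": "ok"}` response body is an assumption.
- `probe` is a hypothetical rough equivalent of the `curl -f` health command.

```python
import argparse
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical stand-in for api_server.py: it serves only GET /health and
# parses --host/--port, matching what HEALTHCHECK and CMD expect.


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Any status >= 400 makes `curl -f` exit non-zero.
            self.send_error(404, "Not Found")

    def log_message(self, format: str, *args) -> None:
        # Silence per-request logging; the health probe runs every 30 s.
        pass


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the same flags the Dockerfile's CMD passes."""
    parser = argparse.ArgumentParser(description="Minimal health-check server")
    parser.add_argument("--host", default="0.0.0.0")
    parser.add_argument("--port", type=int, default=8000)
    return parser.parse_args(argv)


def make_server(host: str, port: int) -> ThreadingHTTPServer:
    """Create (but do not start) the HTTP server."""
    return ThreadingHTTPServer((host, port), HealthHandler)


def probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a non-error status, roughly like `curl -f`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # HTTPError and URLError are both OSError subclasses.
        return False


def main(argv=None) -> None:
    """Serve until interrupted, as the container's CMD would."""
    args = parse_args(argv)
    with make_server(args.host, args.port) as server:
        server.serve_forever()
```

To use the sketch as the container's entry point, add the usual `if __name__ == "__main__": main()` guard at the end of the file. `ThreadingHTTPServer` handles each request in its own thread, so a slow health probe cannot block other requests.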