Tags: Text Generation · Transformers · Safetensors · English · qwen2 · qwq · reasoning · conversational · text-generation-inference
Instructions to use prithivMLmods/QwQ-Math-IO-500M with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use prithivMLmods/QwQ-Math-IO-500M with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/QwQ-Math-IO-500M")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/QwQ-Math-IO-500M")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/QwQ-Math-IO-500M")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use prithivMLmods/QwQ-Math-IO-500M with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "prithivMLmods/QwQ-Math-IO-500M"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/QwQ-Math-IO-500M",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker
docker model run hf.co/prithivMLmods/QwQ-Math-IO-500M
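Once an OpenAI-compatible server is running, the curl call above can also be made from Python. A minimal sketch using only the standard library (the host and port match the `vllm serve` defaults; pass a different `base_url` for other servers, e.g. SGLang on port 30000):

```python
import json
from urllib.request import Request, urlopen

# The same chat payload as the curl example above.
payload = {
    "model": "prithivMLmods/QwQ-Math-IO-500M",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
    ],
}

def ask(payload, base_url="http://localhost:8000"):
    """POST a chat-completion request and return the assistant's reply text."""
    req = Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# print(ask(payload))  # requires the server from the previous step to be up
```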
- SGLang
How to use prithivMLmods/QwQ-Math-IO-500M with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "prithivMLmods/QwQ-Math-IO-500M" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/QwQ-Math-IO-500M",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "prithivMLmods/QwQ-Math-IO-500M" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/QwQ-Math-IO-500M",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Docker Model Runner
How to use prithivMLmods/QwQ-Math-IO-500M with Docker Model Runner:
docker model run hf.co/prithivMLmods/QwQ-Math-IO-500M
Quick Links
QwQ-Math-IO-500M [ Qwen Base ]
QwQ-Math-IO-500M is a fine-tuned variant of Qwen2.5-0.5B, optimized for mathematical problem-solving, input-output reasoning, and general text generation. The model has 494 million parameters and is stored in FP16 for efficient inference. It builds on the Qwen2.5 architecture, with further training to improve structured reasoning, complex mathematical operations, and multilingual support.
Key Features
- Base Model: Derived from Qwen/Qwen2.5-0.5B.
- Finetuned on Instruction and Math Data: Built upon Qwen2.5-0.5B-Instruct with specialized datasets for better instruction-following and mathematical reasoning.
- Specialization:
- Advanced mathematical problem-solving and reasoning.
- Enhanced input-output tasks for structured outputs (JSON, tables).
- Support for long-form content generation.
- Multilingual capabilities (over 29 languages).
- Optimized for Long Context: Supports input contexts up to 128K tokens with generation capability up to 8K tokens.
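The structured-output claim above can be exercised by asking for JSON and validating the reply before using it downstream. A minimal sketch; the prompt wording and the `extract_json` helper are illustrative, not part of the model card, and in practice the reply would come from the pipeline shown earlier:

```python
import json
import re

def extract_json(reply: str):
    """Pull the first {...} block out of a model reply and parse it.
    Returns None if no parseable JSON object is found."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

prompt = (
    "Solve 2x + 5 = 15 and answer ONLY with a JSON object "
    'of the form {"x": <number>}.'
)

# A well-behaved reply looks like this; feed `prompt` through the model
# and pass the generated text in its place.
example_reply = 'Sure, here is the answer: {"x": 5}'
print(extract_json(example_reply))  # → {'x': 5}
```

Validating rather than trusting the raw completion matters for a 500M-parameter model, which will not always emit clean JSON.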
Running the Model
To run the model using the Transformers library:
```python
# Install necessary libraries:
# pip install transformers torch

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/QwQ-Math-IO-500M")
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/QwQ-Math-IO-500M",
    torch_dtype=torch.float16,
    device_map="auto",
)

input_text = "Solve the equation: 2x + 5 = 15."
# Move inputs to wherever device_map placed the model (not hardcoded "cuda").
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Limitations
- Bias and Fairness: Despite fine-tuning efforts, biases from the training data may persist. Users should critically assess model outputs.
- Contextual Understanding: While optimized for long contexts, the model may still occasionally misinterpret highly ambiguous prompts.
- Mathematical Accuracy: Although fine-tuned for math tasks, complex or highly specialized problems may require verification.
- Real-Time Knowledge: The model's knowledge is limited to its training data and does not include real-time or post-training updates.
- Safety Considerations: Safety alignment has been performed, but users should monitor outputs to avoid inappropriate content.
- Resource Requirements: Running the model efficiently requires a GPU with sufficient memory.
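The memory requirement in the last point can be estimated from the model card's own numbers (494M parameters, FP16). A back-of-envelope sketch, counting weights only:

```python
# Weights-only VRAM estimate: FP16 stores each parameter in 2 bytes.
params = 494_000_000     # parameter count from the model card
bytes_per_param = 2      # FP16
weights_gb = params * bytes_per_param / 1024**3
print(f"{weights_gb:.2f} GB")  # → 0.92 GB
```

Actual usage will be higher once activations and the KV cache are included, especially at long context lengths, but the weights themselves fit comfortably on most consumer GPUs.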
Intended Use Cases
- Mathematical Assistance: Solving equations, performing calculations, and explaining mathematical concepts.
- Conversational AI: Enhanced dialogue capabilities with nuanced understanding and context retention.
- Educational Assistance: Generating detailed explanations, tutorials, and step-by-step guides.
- Content Creation: Assisting in writing blogs, articles, and creative content.
- Multilingual Applications: Supporting content generation and translation across multiple languages.
- Data Generation: Producing structured outputs such as JSON and tables for various applications.