Text Generation
Transformers
Safetensors
PEFT
English
qwen2
text-to-sql
text2sql
sql
qlora
unsloth
qwen2.5
conversational
text-generation-inference
Instructions to use junmingg/qwen2.5-coder-7b-text2sql with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use junmingg/qwen2.5-coder-7b-text2sql with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="junmingg/qwen2.5-coder-7b-text2sql") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("junmingg/qwen2.5-coder-7b-text2sql") model = AutoModelForCausalLM.from_pretrained("junmingg/qwen2.5-coder-7b-text2sql") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - PEFT
How to use junmingg/qwen2.5-coder-7b-text2sql with PEFT:
Task type is invalid.
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use junmingg/qwen2.5-coder-7b-text2sql with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "junmingg/qwen2.5-coder-7b-text2sql" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "junmingg/qwen2.5-coder-7b-text2sql", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/junmingg/qwen2.5-coder-7b-text2sql
- SGLang
How to use junmingg/qwen2.5-coder-7b-text2sql with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "junmingg/qwen2.5-coder-7b-text2sql" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "junmingg/qwen2.5-coder-7b-text2sql", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "junmingg/qwen2.5-coder-7b-text2sql" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "junmingg/qwen2.5-coder-7b-text2sql", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Unsloth Studio
How to use junmingg/qwen2.5-coder-7b-text2sql with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for junmingg/qwen2.5-coder-7b-text2sql to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for junmingg/qwen2.5-coder-7b-text2sql to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for junmingg/qwen2.5-coder-7b-text2sql to start chatting
Load model with FastModel
pip install unsloth from unsloth import FastModel model, tokenizer = FastModel.from_pretrained( model_name="junmingg/qwen2.5-coder-7b-text2sql", max_seq_length=2048, ) - Docker Model Runner
How to use junmingg/qwen2.5-coder-7b-text2sql with Docker Model Runner:
docker model run hf.co/junmingg/qwen2.5-coder-7b-text2sql
| license: apache-2.0 | |
| base_model: Qwen/Qwen2.5-Coder-7B-Instruct | |
| datasets: | |
| - b-mc2/sql-create-context | |
| language: | |
| - en | |
| library_name: transformers | |
| pipeline_tag: text-generation | |
| tags: | |
| - text-to-sql | |
| - text2sql | |
| - sql | |
| - qlora | |
| - peft | |
| - unsloth | |
| - qwen2.5 | |
| # Qwen2.5-Coder-7B Text-to-SQL (QLoRA) | |
| A QLoRA fine-tune of [`Qwen/Qwen2.5-Coder-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) | |
| that turns a **SQL schema + a natural-language question** into **a single SQL query**. Trained on a single | |
| RTX 3090 with [Unsloth](https://github.com/unslothai/unsloth) + TRL, completion-only loss. | |
| > Trained and evaluated by `junmingg`. Code + reproducible eval harness: | |
| > https://github.com/junmingg/Unsloth-Qwen2.5-Coder-7b-Text-to-SQL-SFT | |
| ## Results (held-out 500 examples) | |
| | Model | Exact match | Semantic equiv. | SQL validity | | |
| |---|---|---|---| | |
| | Base `Qwen2.5-Coder-7B-Instruct` (zero-shot) | 3.8% | 67.0% | 100.0% | | |
| | **+ QLoRA (this model)** | **78.8%** | **86.2%** | **99.2%** | | |
|  | |
| - **Exact match** = canonicalized string match (via `sqlglot`); a strict *lower bound* β different-but-correct | |
| SQL fails it. | |
| - **Semantic equivalence** = an independent LLM judge (GLM-5-Turbo) decides whether the predicted SQL is | |
| equivalent to the gold for the question. This is the headline correctness metric. | |
| - **SQL validity** = fraction of predictions that parse under `sqlglot`. | |
| Fine-tuning lifted exact-match by **+75.0 points** and semantic-equivalence by **+19.2 points** on prompts the | |
| model never saw during training (0/500 test prompts appear in the training data). The large exact-match jump | |
| is the model learning the dataset's SQL conventions (quoting, value formatting); the semantic jump is genuine | |
| correctness beyond formatting. | |
| ### Label-noise ablation | |
| `sql-create-context` carries a little label noise β 0.32% of train golds (and 0.60% of test golds) won't | |
| parse under `sqlglot` (WikiSQL artifacts). A controlled variant | |
| ([`junmingg/qwen2.5-coder-7b-text2sql-filtered`](https://huggingface.co/junmingg/qwen2.5-coder-7b-text2sql-filtered)) | |
| drops those 80 noisy rows from the training set β **same recipe, same held-out test set**: | |
| | Model | Exact match | Semantic equiv. | SQL validity | | |
| |---|---|---|---| | |
| | Base (zero-shot) | 3.8% | 67.0% | 100.0% | | |
| | SFT β full 25k | 78.8% | 86.2% | 99.2% | | |
| | SFT β filtered 24.9k | 78.6% | 85.4% | **99.8%** | | |
|  | |
| Cleaning 0.32% of train labels nudged **validity up, accuracy unchanged** (within run-to-run noise). On the | |
| valid-reference subset the filtered model reaches 100% validity. Full write-up in the | |
| [repo README](https://github.com/junmingg/Unsloth-Qwen2.5-Coder-7b-Text-to-SQL-SFT). | |
| ## Usage | |
| The model was trained with a specific system prompt and ChatML format β **use the same formatting** for best | |
| results. | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| import torch | |
| REPO = "junmingg/qwen2.5-coder-7b-text2sql" | |
| model = AutoModelForCausalLM.from_pretrained(REPO, torch_dtype=torch.bfloat16, device_map="auto") | |
| tok = AutoTokenizer.from_pretrained(REPO) | |
| SYSTEM = ("You are a precise text-to-SQL engine. Given a SQL schema and a natural-language " | |
| "question, respond with a single valid SQL query and nothing else.") | |
| schema = "CREATE TABLE head (age INTEGER, name TEXT)" | |
| question = "How many heads of the departments are older than 56?" | |
| messages = [ | |
| {"role": "system", "content": SYSTEM}, | |
| {"role": "user", "content": f"Schema:\n{schema}\n\nQuestion: {question}"}, | |
| ] | |
| inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device) | |
| out = model.generate(inputs, max_new_tokens=256, do_sample=False) | |
| print(tok.decode(out[0][inputs.shape[1]:], skip_special_tokens=True)) | |
| # -> SELECT COUNT(*) FROM head WHERE age > 56 | |
| ``` | |
| LoRA adapters only (smaller): `junmingg/qwen2.5-coder-7b-text2sql-lora`. | |
| GGUF quants (`q4_k_m`, `q8_0`) for llama.cpp / Ollama: `junmingg/qwen2.5-coder-7b-text2sql-GGUF`. | |
| ## Training | |
| | | | | |
| |---|---| | |
| | Base | `Qwen/Qwen2.5-Coder-7B-Instruct` (4-bit QLoRA) | | |
| | Data | [`b-mc2/sql-create-context`](https://huggingface.co/datasets/b-mc2/sql-create-context) (~78.6k rows total). Shuffled (seed 42), then a **25k train subset** + a fixed **500 held-out test** set. The remaining ~53k rows were intentionally unused β 1 epoch over 25k already saturates this task; adding the rest gives negligible lift while ~doubling/tripling train time. | | |
| | LoRA | r=16, Ξ±=16, dropout=0, all linear layers | | |
| | Schedule | 1 epoch (~1,557 steps), effective batch 16, lr 2e-4 cosine, warmup 0.03 | | |
| | Precision / optim | bf16, `adamw_8bit`, `max_seq_length=2048` | | |
| | Loss | completion-only (prompt masked; loss on the SQL answer only) | | |
| | Hardware / time | 1Γ RTX 3090, ~59 min | | |
| ## Evaluation methodology | |
| Both base and fine-tuned models are evaluated with **identical** prompting and greedy decoding on the **same** | |
| held-out 500-example test set (disjoint from train; verified 0 prompt overlap). Semantic equivalence is scored | |
| by an independent LLM judge (GLM-5-Turbo). Reporting both exact-match and semantic-equivalence is deliberate: | |
| exact-match is a lower bound, semantic-equivalence is the fairer correctness signal. The full harness is in the | |
| linked repo (`python -m src.eval`, `python -m src.judge`). | |
| ## Limitations | |
| - Trained on synthetic, largely single-table schemas (`sql-create-context`, derived from WikiSQL + Spider); | |
| not evaluated for SQL-injection safety or complex multi-join queries. | |
| - The dataset contains a small amount of label noise (~0.3β0.6% of gold answers are themselves unparseable β | |
| WikiSQL artifacts); the model's rare "invalid" outputs are predominantly faithful reproductions of those | |
| malformed references rather than novel errors. | |
| - Outputs should be validated/parameterized before execution against a real database. | |
| ## License & attribution | |
| Apache-2.0 (inherited from the base model). Training data: `b-mc2/sql-create-context` (CC-BY-4.0). | |