---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- code
- code-review
- programming
- qwen2.5
- bug-detection
- mlx
datasets:
- sahil2801/CodeAlpaca-20k
language:
- en
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: CodeLens-7B
  results: []
---
# [CodeLens-7B-MLX](https://huggingface.co/xunker/CodeLens-7B-MLX)

MLX version of [sriksven/CodeLens-7B](https://huggingface.co/sriksven/CodeLens-7B) in various oQ levels and dtypes.

Directory                                      | oQ Level | dtype    | Size
----------------------------------------------|----------|----------|------
[CodeLens-7B-oQ4-bf16](CodeLens-7B-oQ4-bf16/) | 4-bit    | bfloat16 | 4.2GB
[CodeLens-7B-oQ4-fp16](CodeLens-7B-oQ4-fp16/) | 4-bit    | fp16     | 4.2GB
[CodeLens-7B-oQ5-bf16](CodeLens-7B-oQ5-bf16/) | 5-bit    | bfloat16 | 5.1GB
[CodeLens-7B-oQ5-fp16](CodeLens-7B-oQ5-fp16/) | 5-bit    | fp16     | 5.1GB
[CodeLens-7B-oQ6-bf16](CodeLens-7B-oQ6-bf16/) | 6-bit    | bfloat16 | 5.9GB
[CodeLens-7B-oQ6-fp16](CodeLens-7B-oQ6-fp16/) | 6-bit    | fp16     | 5.9GB
[CodeLens-7B-oQ8-bf16](CodeLens-7B-oQ8-bf16/) | 8-bit    | bfloat16 | 7.5GB
[CodeLens-7B-oQ8-fp16](CodeLens-7B-oQ8-fp16/) | 8-bit    | fp16     | 7.5GB
## Why choose FP16 over BFLOAT16/BF16?

On older Apple Silicon (M1 and M2), fp16 can be faster. Here are the details from [Muhammad Raza](https://muhammadraza.me/2026/gguf-vs-mlx-decision-guide/#two-traps-that-will-flip-your-results):

> A lot of MLX builds ship as bf16, and **on the M1 and M2 that data type does not get the accelerated path that fp16 does**. During prefill those weights run un-accelerated and the penalty multiplies across every input token, which is part of why some “MLX is slow” reports come from older hardware. [...]
>
> If you are on an M1 or M2 and MLX feels sluggish, check this before you blame the format.
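The rule in the quote can be written as a small lookup from chip to directory. The sketch below is a hypothetical helper (`pick_variant` is not part of oMLX or mlx-lm) that assumes the directory naming shown in the table. Defaulting to bf16 on M3 and later chips is an assumption: the quoted guide only says bf16 lacks the accelerated path on M1 and M2.

```python
# Hypothetical helper, not part of oMLX or mlx-lm: maps a chip name and an
# oQ level to one of the directory names listed in the table above.
AVAILABLE_BITS = (4, 5, 6, 8)  # oQ levels published in this repo


def pick_variant(chip: str, bits: int = 6) -> str:
    """Return a directory name such as 'CodeLens-7B-oQ6-fp16'."""
    if bits not in AVAILABLE_BITS:
        raise ValueError(f"no oQ{bits} build; choose one of {AVAILABLE_BITS}")
    parts = chip.strip().upper().split()
    family = parts[0] if parts else ""  # "M1 Pro" -> "M1"
    # M1/M2: bf16 misses the accelerated path, so prefer fp16 (per the quote).
    # Newer chips: defaulting to bf16 is an assumption, not a measured result.
    dtype = "fp16" if family in ("M1", "M2") else "bf16"
    return f"CodeLens-7B-oQ{bits}-{dtype}"


print(pick_variant("M1 Pro"))     # CodeLens-7B-oQ6-fp16
print(pick_variant("M3 Max", 4))  # CodeLens-7B-oQ4-bf16
```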
### Test Results

Using the oQ6 builds, here are the results from oMLX 0.4.4 on a MacBook Pro 2021 (M1 Pro).

**tl;dr**:
Prompt Processing Tokens Per Second (ppTPS, aka "prefill speed") is about 60% higher with FP16, so Time to First Token (TTFT) is roughly 38% shorter.
However, Token Generation (tgTPS) improves only moderately, by about 1.3 tok/s (around 5-6%).
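The tl;dr percentages can be recomputed from the single-request tables. This sketch copies the ppTPS, tgTPS and TTFT values for both prompt lengths and reports FP16's change relative to BF16; the names `ROWS` and `compare` are illustrative only.

```python
# (ppTPS, tgTPS, TTFT ms) per dtype, copied from the single-request tables
# (oQ6, oMLX 0.4.4, M1 Pro).
ROWS = {
    "pp 4096 / tg 128": {"bf16": (165.6, 25.7, 24727.9), "fp16": (269.0, 27.0, 15226.8)},
    "pp 16384 / tg 128": {"bf16": (146.5, 20.8, 111811.1), "fp16": (235.4, 22.1, 69595.4)},
}


def compare(bf16, fp16):
    """Return (% ppTPS gain, % tgTPS gain, % TTFT reduction) of fp16 vs bf16."""
    pp_gain = (fp16[0] / bf16[0] - 1.0) * 100.0
    tg_gain = (fp16[1] / bf16[1] - 1.0) * 100.0
    ttft_cut = (1.0 - fp16[2] / bf16[2]) * 100.0
    return pp_gain, tg_gain, ttft_cut


for name, row in ROWS.items():
    pp, tg, ttft = compare(row["bf16"], row["fp16"])
    print(f"{name}: ppTPS +{pp:.1f}%, tgTPS +{tg:.1f}%, TTFT -{ttft:.1f}%")
```

Both prompt lengths give roughly +61-62% ppTPS, +5-6% tgTPS (about 1.3 tok/s in absolute terms) and a TTFT about 38% shorter.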
#### BFLOAT16

##### Single request results

Test              | TTFT (ms) | TPOT (ms) | ppTPS (tok/s) | tgTPS (tok/s) | E2E (s) | Throughput (tok/s) | Peak Mem
------------------|-----------|-----------|---------------|---------------|---------|--------------------|---------
pp 4096 / tg 128  | 24727.9   | 39.3      | 165.6         | 25.7          | 29.7    | 142.1              | 6.84 GB
pp 16384 / tg 128 | 111811.1  | 48.4      | 146.5         | 20.8          | 118.0   | 140.0              | 7.69 GB
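The columns in these tables appear to be tied together by simple arithmetic. The sketch below checks three relationships inferred from the numbers themselves; they are not taken from oMLX documentation, so treat them as an assumption: ppTPS as prompt tokens divided by TTFT, E2E as TTFT plus one TPOT per generated token after the first, and Throughput as total tokens divided by E2E. `derive` is a hypothetical name.

```python
# Assumed relationships (inferred from the table values, not from oMLX docs):
#   ppTPS      ~ prompt_tokens / TTFT
#   E2E        ~ TTFT + (gen_tokens - 1) * TPOT
#   Throughput ~ (prompt_tokens + gen_tokens) / E2E
def derive(prompt_tokens: int, gen_tokens: int, ttft_ms: float, tpot_ms: float):
    pp_tps = prompt_tokens / (ttft_ms / 1000.0)
    e2e_s = (ttft_ms + (gen_tokens - 1) * tpot_ms) / 1000.0
    throughput = (prompt_tokens + gen_tokens) / e2e_s
    return pp_tps, e2e_s, throughput


# BF16 rows: (pp, tg, TTFT ms, TPOT ms) -> reported (ppTPS, E2E s, Throughput)
BF16_ROWS = [
    ((4096, 128, 24727.9, 39.3), (165.6, 29.7, 142.1)),
    ((16384, 128, 111811.1, 48.4), (146.5, 118.0, 140.0)),
]

for inputs, reported in BF16_ROWS:
    derived = derive(*inputs)
    print(" | ".join(f"{d:.1f} (table {r})" for d, r in zip(derived, reported)))
```

Each derived value lands within 0.1 of the reported one, which suggests Throughput counts both prompt and generated tokens per second of end-to-end time.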
##### Batch results

Batch       | tgTPS (tok/s) | ppTPS (tok/s) | Avg TTFT (ms) | E2E (s) | Speedup
------------|---------------|---------------|---------------|---------|--------
1x baseline | 25.7          | 165.6         | 24727.9       | 29.7    | 1.00x
2x          | 30.0          | 164.0         | 12487.3       | 21.0    | 1.17x
4x          | 31.9          | 239.7         | 16885.7       | 33.1    | 1.24x
#### FP16

##### Single request results

Test              | TTFT (ms) | TPOT (ms) | ppTPS (tok/s) | tgTPS (tok/s) | E2E (s) | Throughput (tok/s) | Peak Mem
------------------|-----------|-----------|---------------|---------------|---------|--------------------|---------
pp 4096 / tg 128  | 15226.8   | 37.3      | 269.0         | 27.0          | 20.0    | 211.6              | 6.84 GB
pp 16384 / tg 128 | 69595.4   | 45.6      | 235.4         | 22.1          | 75.4    | 219.0              | 7.69 GB
##### Batch results

Batch       | tgTPS (tok/s) | ppTPS (tok/s) | Avg TTFT (ms) | E2E (s) | Speedup
------------|---------------|---------------|---------------|---------|--------
1x baseline | 27.0          | 269.0         | 15226.8       | 20.0    | 1.00x
2x          | 36.9          | 266.6         | 7681.4        | 14.6    | 1.37x
4x          | 38.0          | 363.3         | 11084.7       | 24.8    | 1.41x
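The Speedup column in both batch tables matches aggregate tgTPS at each batch size divided by the 1x baseline tgTPS. That reading is inferred from the numbers, not from oMLX documentation; the sketch below (with the hypothetical name `speedups`) reproduces the column for both dtypes.

```python
# Assumed definition: Speedup = tgTPS at batch N / tgTPS at the 1x baseline.
BATCH_TG_TPS = {
    "bf16": {1: 25.7, 2: 30.0, 4: 31.9},
    "fp16": {1: 27.0, 2: 36.9, 4: 38.0},
}


def speedups(tg_by_batch: dict[int, float]) -> dict[int, float]:
    """Ratio of each batch's aggregate tgTPS to the batch-1 tgTPS, 2 decimals."""
    baseline = tg_by_batch[1]
    return {n: round(tps / baseline, 2) for n, tps in tg_by_batch.items()}


for dtype, table in BATCH_TG_TPS.items():
    print(dtype, speedups(table))
```

Read this way, FP16 also scales somewhat better under batching on this machine (1.41x vs 1.24x at 4x).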
## Hardware and Software

These were converted to MLX using [oMLX](https://github.com/jundot/omlx) [0.4.4](https://github.com/jundot/omlx/releases/tag/v0.4.4) on a 32GB MacBook Pro 2021 (M1 Pro). I cleared all my RAM so you don't have to.
## License

Apache 2.0, as per [original model](https://huggingface.co/sriksven/CodeLens-7B).