| | --- |
| | language: |
| | - en |
| | - zh |
| | license: other |
| | license_name: glm-4-license |
| | pipeline_tag: text-generation |
| | tags: |
| | - mlx |
| | - glm4 |
| | - moe |
| | - prism |
| | - abliterated |
| | - 8bit |
| | - quantized |
| | - apple-silicon |
| | library_name: mlx |
| | base_model: Ex0bit/GLM-4.7-Flash-PRISM |
| | --- |
| | |
| | <p align="center"> |
| | <a href="https://vmlx.net"> |
| | <img src="vmlx-logo.png" alt="vMLX" width="120"> |
| | </a> |
| | </p> |
| | |
| | # GLM-4.7-Flash-PRISM — MLX 8-bit |
| |
|
| | MLX 8-bit quantized version of [Ex0bit/GLM-4.7-Flash-PRISM](https://huggingface.co/Ex0bit/GLM-4.7-Flash-PRISM) for efficient local inference on Apple Silicon. |
| |
|
| | - **Quantization**: 8-bit (8.5 bits per weight, group size 64, affine mode) |
| | - **Architecture**: GLM-4 MoE Lite — 47 layers, 64 routed experts, 4 active per token |
| | - **Context**: 202K tokens |
| | - **Size**: ~30 GB |
| |
|
| | ## Usage |
| |
|
| | ```python |
| | from mlx_lm import load, generate |
| | |
| | model, tokenizer = load("shieldstackllc/GLM-4.7-Flash-PRISM-mlx-8bit") |
| | response = generate(model, tokenizer, prompt="Hello!", verbose=True) |
| | ``` |
| |
|
| | Or with [vMLX](https://vmlx.net) for native macOS inference. |
| |
|
| | ## About |
| |
|
| | This model is an abliterated (uncensored) variant of GLM-4.7-Flash, a Mixture-of-Experts language model by Zhipu AI / THUDM. The abliteration was done by [Ex0bit](https://huggingface.co/Ex0bit) as part of the PRISM series. MLX quantization by [vMLX](https://vmlx.net). |
| |
|
| | ## Also Available |
| |
|
| | - [GLM-4.7-Flash-PRISM MLX 4-bit](https://huggingface.co/shieldstackllc/GLM-4.7-Flash-PRISM-mlx-4bit) (~16 GB) |
| |
|
| | ## Made for vMLX |
| |
|
| | This model was converted and optimized for [vMLX](https://vmlx.net) — a free, open source macOS native MLX inference engine for Apple Silicon. Download vMLX to run this model locally with zero configuration. |
| |
|
| | ## Credits |
| |
|
| | - **Base model**: [THUDM/GLM-4](https://github.com/THUDM/GLM-4) by Zhipu AI |
| | - **Abliteration**: [Ex0bit/GLM-4.7-Flash-PRISM](https://huggingface.co/Ex0bit/GLM-4.7-Flash-PRISM) |
| | - **MLX conversion**: [vMLX](https://vmlx.net) — Run AI locally on Mac. No compromises. |
| |
|
| | ## Contact |
| |
|
| | For questions, issues, or collaboration: **admin@vmlx.net** |
| |
|