HackIDLE-NIST-Coder v1.1 (MLX 4-bit)

HackIDLE-NIST-Coder is a NIST-focused local model built from Qwen2.5-Coder-7B-Instruct and fine-tuned on a NIST cybersecurity corpus.

This repo is the MLX 4-bit build for Apple Silicon.

Use it as a helper. Do not treat it as a source of truth for exact control names, RMF step lists, or reference-architecture component names without checking the source publication.

What went into v1.1

Version 1.1 was trained on 530,912 examples from 596 NIST publications.

Compared with the first release, v1.1 added:

  • 7,206 training examples
  • 28 additional NIST documents
  • CSWP coverage, including CSF 2.0, Zero Trust, and Post-Quantum Cryptography material
  • cleanup of 6,150 malformed DOI links (a sketch of this kind of normalization follows this list)
  • removal of known broken-link markers in the training corpus
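
The cleanup logic itself is not published in this card. As an illustration only, a normalization pass along these lines would repair the kinds of DOI malformations described above; the specific patterns are assumptions, not the actual cleanup code:

import re

# Hypothetical malformations: doubled "https://doi.org/" prefixes and
# bare "doi:10.xxxx/..." references that never got turned into URLs.
DOUBLED_PREFIX = re.compile(r"(?:https?://doi\.org/)+")
BARE_DOI = re.compile(r"\bdoi:\s*(10\.\d{4,9}/\S+)", re.IGNORECASE)

def normalize_doi_links(text: str) -> str:
    # Collapse repeated https://doi.org/ prefixes into a single one
    text = DOUBLED_PREFIX.sub("https://doi.org/", text)
    # Rewrite bare doi:10.xxxx/... references as resolvable URLs
    return BARE_DOI.sub(r"https://doi.org/\1", text)

print(normalize_doi_links("See doi:10.6028/NIST.SP.800-53r5 for details."))
# -> See https://doi.org/10.6028/NIST.SP.800-53r5 for details.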

Training notes

  • Base model: mlx-community/Qwen2.5-Coder-7B-Instruct-4bit
  • Fine-tuning method: LoRA with MLX (a command sketch follows this list)
  • Training iterations: 1,000, plus checkpoint recovery work
  • Final training loss: 1.420
  • Best validation loss: 1.512
  • Trainable parameters: 11.5M
  • Hardware used: M4 Max
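
The exact training invocation is not part of this card. Assuming the standard mlx-lm LoRA tooling, a run matching the settings above would look roughly like the following; the data and adapter paths are placeholders:

# Placeholder paths; the real dataset layout used for v1.1 is not published here.
mlx_lm.lora \
  --model mlx-community/Qwen2.5-Coder-7B-Instruct-4bit \
  --train \
  --data ./nist_dataset \
  --iters 1000 \
  --adapter-path ./adapters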

Current eval status

I ran a small local smoke eval on April 22, 2026 against etgohome/hackidle-nist-coder:latest. In that local Ollama install, the latest tag matched the v1.1 line.

Result: 1/5 cases passed.

The model stayed in-domain and handled a rough FIPS 140-2 vs. FIPS 140-3 comparison. It still missed exact grounding on:

  • SP 800-207 reference-architecture component names
  • the full SP 800-37 Rev. 2 RMF sequence
  • the exact CM-6 control name and description
  • publication selection and logging/audit grounding for a contractor remote-access planning prompt

That is the key limitation: the model can sound close while still being wrong on exact NIST structure.
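
For context on what "smoke eval" means here: a minimal keyword-based check against a local Ollama install looks like the sketch below. The two cases are placeholders in the same spirit as the eval, not the actual five prompts or graders:

import requests

# Placeholder cases: each pairs a prompt with strings the answer must contain.
CASES = [
    ("List the seven steps of the NIST RMF per SP 800-37 Rev. 2.",
     ["Prepare", "Categorize", "Select", "Implement", "Assess", "Authorize", "Monitor"]),
    ("What is the exact name of control CM-6 in SP 800-53?",
     ["Configuration Settings"]),
]

def ask(prompt: str) -> str:
    # Ollama's local generate endpoint; stream=False returns one JSON object
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "etgohome/hackidle-nist-coder:latest",
              "prompt": prompt, "stream": False},
        timeout=300,
    )
    r.raise_for_status()
    return r.json()["response"]

passed = 0
for prompt, expected in CASES:
    answer = ask(prompt)
    ok = all(s.lower() in answer.lower() for s in expected)
    passed += ok
    print("PASS" if ok else "FAIL", "-", prompt)

print(f"{passed}/{len(CASES)} cases passed")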

Good uses

This model is useful for:

  • brainstorming where to start in NIST
  • drafting first-pass explanations
  • surfacing likely document families
  • turning NIST-flavored questions into something a human can verify
  • local experimentation with domain fine-tuning on Apple Silicon

It is not reliable enough yet for:

  • exact control names
  • exact framework step ordering
  • exact reference-architecture component naming
  • answers that need source-level correctness on the first pass

Installation

pip install mlx-lm

Usage

from mlx_lm import load, generate

# Download (if needed) and load the 4-bit model and tokenizer from the Hub
model, tokenizer = load("ethanolivertroy/HackIDLE-NIST-Coder-v1.1-MLX-4bit")

prompt = "Which NIST docs would you read before drafting a zero trust migration plan?"

# Wrap the question in the model's chat template before generating
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=500)
print(response)
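
For quick one-off checks without writing a script, the mlx-lm CLI generator should also work against this repo (flag names per current mlx-lm; verify against your installed version):

mlx_lm.generate \
  --model ethanolivertroy/HackIDLE-NIST-Coder-v1.1-MLX-4bit \
  --prompt "Which NIST docs would you read before drafting a zero trust migration plan?" \
  --max-tokens 500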

License

The base model is Qwen2.5-Coder-7B-Instruct, released under Apache 2.0. The NIST source publications used for the dataset are public domain U.S. government works. This model card uses Apache 2.0 for the model artifact and documents the NIST data source separately.

Citation

@misc{hackidle_nist_coder_v11_mlx,
  title = {HackIDLE-NIST-Coder v1.1 MLX 4-bit},
  author = {Troy, Ethan Oliver},
  year = {2025},
  version = {1.1},
  url = {https://huggingface.co/ethanolivertroy/HackIDLE-NIST-Coder-v1.1-MLX-4bit}
}