Patrick Hill PRO

pbhappliedsystems

AI & ML interests

PBH Applied Systems publishes evaluated open-weight GGUF models for practical AI deployment, with an emphasis on quantized inference, agentic workflows, structured outputs, tool use, and production reliability. Every model published under this organization is converted, evaluated, and documented by PBH Applied Systems using its proprietary `quant_eval` framework.

The evaluation process compares full-precision and quantized variants across agent-adjacent task families including structured JSON output, tool dispatch, multi-turn state retention, mixed natural language plus JSON responses, multiple-choice extraction, fuzz-style constraint adherence, and multi-step planning.

These model cards are designed to support deployment decisions, not just model discovery. Each card documents practical behavior, quantization trade-offs, failure modes, recommended use cases, hardware requirements, and guardrails for production use.

Try the live PBH Applied Systems AI Agent Demo: https://pbhappliedsystems.com/assistant.html

The demo lets visitors interact with evaluated quantized open-weight models across reasoning, document intelligence, and code automation workflows running on private GPU infrastructure.

Recent Activity

posted an update 4 days ago
## quant-eval Agent Arena: Now Live

After several months of building, the quant-eval Agent Arena is live: https://huggingface.co/spaces/pbhappliedsystems/quant-eval-agent-arena

**What it is:** A side-by-side ReAct agent comparison platform running 9 independently evaluated GGUF models. Select any two models, pick an agent template, submit a query, and watch both agents reason through it in real time, with quant_eval v7.21 behavioral scores displayed alongside every response.

**Three agent templates:**
- 〔R〕 Reasoning & Analysis
- 〔D〕 Document Intelligence
- 〔C〕 Code & Automation

**The models (all Q4_K_M GGUF):**
- Qwen2.5-3B / 7B / 14B-Instruct-1M / 32B
- Ministral-3-14B-Instruct-2512
- Ministral-3-14B-Reasoning-2512
- Phi-4-reasoning-plus
- Mistral-Nemo-Instruct-2407
- Qwen3.6-27B

**What quant_eval v7.21 measures:** 42 fixture cases across 8 task families: json_multistep, stateful_followup, toolcall_only, mixed_brief_json, toolcall, json, fuzz, mcq. Every model is evaluated at both F16 and Q4_K_M precision where hardware permits. The difference between the F16 and Q4_K_M results is the quantization impact report.

**Stack:** Gradio + llama-cpp-python (GGUF, CUDA) + custom lightweight ReAct loop + ZeroGPU (H200)

All 18 model cards with full evaluation data are published at: https://huggingface.co/pbhappliedsystems

Feedback welcome, especially from anyone running evaluations on open-weight quantized models. This is the public-facing surface of a consulting and evaluation practice; the full agent demo is at https://pbhappliedsystems.com/assistant.html
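To make the F16 versus Q4_K_M comparison in the post concrete, here is a minimal Python sketch of a per-family score delta. It is not the `quant_eval` implementation, which is proprietary and not shown on this page. The function names, the 0-to-1 score scale, and every score value are assumptions for illustration; only the task family names come from the post.

```python
from statistics import mean

# Task family names are taken from the post; everything else here is hypothetical.
TASK_FAMILIES = [
    "json_multistep", "stateful_followup", "toolcall_only", "mixed_brief_json",
    "toolcall", "json", "fuzz", "mcq",
]


def quantization_delta(f16_scores, q4_scores):
    """Return per-family (quantized minus full-precision) score differences."""
    mismatched = set(f16_scores) ^ set(q4_scores)
    if mismatched:
        raise ValueError(f"families scored at only one precision: {sorted(mismatched)}")
    return {
        family: round(q4_scores[family] - f16_scores[family], 4)
        for family in f16_scores
    }


def summarize(deltas):
    """Report the mean delta and the family that lost the most under quantization."""
    worst = min(deltas, key=deltas.get)
    return {
        "mean_delta": round(mean(deltas.values()), 4),
        "worst_family": worst,
        "worst_delta": deltas[worst],
    }


if __name__ == "__main__":
    # Placeholder pass rates, NOT real evaluation results.
    f16 = {family: 1.0 for family in TASK_FAMILIES}
    q4 = dict(f16, json_multistep=0.75, toolcall=0.5)
    deltas = quantization_delta(f16, q4)
    print(deltas)
    print(summarize(deltas))
```

A negative delta means the quantized variant scored lower than the full-precision one on that family. Reporting the worst family alongside the mean keeps a regression in one family from being hidden by the average.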

Organizations

None yet