---
title: FitCheck
emoji: ✅
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
python_version: "3.12"
pinned: false
license: mit
short_description: Honest, plain answers about what AI your computer can run
models:
  - nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
---

# FitCheck

**What AI can your computer actually run?** And the other way round: **what computer do you need for the AI you want to run?**

Tell FitCheck about your machine in plain words. It answers honestly (real models, real memory figures, real licenses, real copy-paste commands), from chatbots to object detection, image generation, speech, and robotics.

## Why it's trustworthy

- **A deterministic engine does the math, not an AI.** Verdicts come from a transparent rules engine over `catalogue.json`: 83 real models verified against the Hugging Face API. Nothing in the verdict can be hallucinated.
- **Model sizes are exact.** For GGUF models the weights figure is the actual file size in bytes from the Hub, not a params-times-bits estimate. Chat memory uses each model's real architecture (GQA-aware), and every estimate includes a 0.58 GB safety buffer (the 95% load-success margin fitted from ~19,500 community measurements).
- **Provenance on every number.** The UI says whether a figure is an exact file size, a vendor-published number, community-reported, or estimated.
- **Licenses up front.** AGPL, non-commercial, and gated models are labelled on every card, before you build your project on one.
- **Speed estimates with receipts, not vibes.** For LLMs, FitCheck predicts decode tokens/sec from your memory bandwidth (decode is bandwidth-bound) and shows where your machine lands among **real community benchmark runs** ([LocalScore](https://www.localscore.ai)) on an interactive roofline chart.
  A learned predictor, following IBM's [LLM-Pilot methodology](https://arxiv.org/abs/2410.02425) (gradient boosting over hardware features, validated leave-one-accelerator-out), replaces the analytical estimate **only if it beats it on hardware it never saw**; otherwise the labelled baseline ships. Vision and diffusion models are compute-bound, not bandwidth-bound, so they keep memory-only verdicts rather than showing fake speed numbers.
- **Conservative by design.** Three plain bands (Runs great / Tight, but works / Won't fit) that would rather under-promise than over-promise.

## What's inside

1. **The catalogue**: `scripts/curation.json` (hand-picked models across LLM, vision-language, vision, image/video generation, speech, music, embeddings, forecasting) enriched by `scripts/refresh_catalogue.py` from public Hub endpoints into `catalogue.json`. Refreshed nightly; baked in at build time so the running app is fully offline.
2. **The engine** (`engine/`): pure Python memory math and honest banding. Also answers the reverse question: minimum vs comfortable hardware tiers for a goal ("Help me pick one" mode).
3. **The model brick** (`model_brick.py`): NVIDIA Nemotron 3 Nano 4B running in-Space on ZeroGPU (hybrid Mamba-2, accelerated by prebuilt hub kernels), explaining the engine's numbers in plain words. It never does the math; if it states a figure that isn't in the engine's facts, the gate logs it.
4. **The frontend** (`static/`): hand-built HTML/CSS/JS, no framework, served by Gradio server mode (`gr.Server`).

Optional extra: paste any Hugging Face model id and FitCheck walks its finetune/quantized lineage to a known base ("if the base runs, your finetune runs"). This is the one clearly-labelled online feature.

## Run it locally

```
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
python app.py
```

The activation line above is for Windows; on macOS or Linux, use `source .venv/bin/activate` instead.

Open http://127.0.0.1:7860/ (add `?go` for an instant sample result).
Locally the explainer reports the model isn't loaded (it only loads on the Space); everything else works fully offline.

Built for the [Build Small hackathon](https://huggingface.co/build-small-hackathon) (Backyard AI track).
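
For readers who want to see the shape of the reasoning described under "Why it's trustworthy", here is a minimal Python sketch. It is not the code in `engine/`. The `ModelSpec` fields, the fp16 KV cache, the decimal-GB convention, and the `comfort_ratio` threshold are all assumptions made for illustration. The KV-cache formula covers plain attention layers only, so hybrid models such as Mamba-2 would need different accounting. The 0.58 GB buffer, the three band names, and the idea that decode speed is bounded by memory bandwidth come from this README.

```python
from dataclasses import dataclass

SAFETY_BUFFER_GB = 0.58  # safety buffer quoted in this README
BYTES_PER_GB = 1e9       # assumption: decimal gigabytes


@dataclass
class ModelSpec:
    """Hypothetical record; these are not the real catalogue.json field names."""
    weights_bytes: int          # exact file size, e.g. of a GGUF file
    n_layers: int               # number of attention layers
    n_kv_heads: int             # GQA: usually fewer than the query heads
    head_dim: int
    kv_bytes_per_elem: int = 2  # assumption: fp16 KV cache


def kv_cache_bytes(model: ModelSpec, context_tokens: int) -> int:
    """Keys plus values (factor 2) for every layer, KV head, and token."""
    per_token = (2 * model.n_layers * model.n_kv_heads
                 * model.head_dim * model.kv_bytes_per_elem)
    return per_token * context_tokens


def required_gb(model: ModelSpec, context_tokens: int) -> float:
    """Weights + KV cache + fixed safety buffer, in GB."""
    total = model.weights_bytes + kv_cache_bytes(model, context_tokens)
    return total / BYTES_PER_GB + SAFETY_BUFFER_GB


def band(required: float, available: float, comfort_ratio: float = 0.8) -> str:
    """Map a requirement onto the three bands; comfort_ratio is a made-up threshold."""
    if required > available:
        return "Won't fit"
    if required <= comfort_ratio * available:
        return "Runs great"
    return "Tight, but works"


def decode_tokens_per_sec_ceiling(model: ModelSpec, bandwidth_gb_s: float) -> float:
    """Bandwidth-bound ceiling: each decoded token streams all weights once."""
    return bandwidth_gb_s * BYTES_PER_GB / model.weights_bytes


# Illustrative numbers only; this is not a real catalogue entry.
example = ModelSpec(weights_bytes=4_920_000_000, n_layers=32,
                    n_kv_heads=8, head_dim=128)
need = required_gb(example, context_tokens=8192)
for available_gb in (6, 8, 16):
    print(f"{available_gb} GB available, {need:.2f} GB needed: {band(need, available_gb)}")
print(f"decode ceiling at 100 GB/s: {decode_tokens_per_sec_ceiling(example, 100):.1f} tok/s")
```

With these made-up numbers the requirement comes to about 6.57 GB, so a 6 GB machine lands in "Won't fit", 8 GB in "Tight, but works", and 16 GB in "Runs great". The decode figure is an upper bound rather than a prediction, because real runs also pay for KV-cache reads and compute overhead.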