---
title: FitCheck
emoji: ✅
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
python_version: '3.12'
pinned: false
license: mit
short_description: Honest, plain answers about what AI your computer can run
models:
  - nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
---
FitCheck
What AI can your computer actually run? And the other way round: what computer do you need for the AI you want to run?
Tell FitCheck about your machine in plain words. It answers honestly — real models, real memory figures, real licenses, real copy-paste commands — from chatbots to object detection, image generation, speech, and robotics.
Why it's trustworthy
- A deterministic engine does the math, not an AI. Verdicts come from a transparent rules engine over catalogue.json: 83 real models verified against the Hugging Face API. Nothing in the verdict can be hallucinated.
- Model sizes are exact. For GGUF models the weights figure is the actual file size in bytes from the Hub, not a params-times-bits estimate. Chat memory uses each model's real architecture (GQA-aware), and every estimate includes a 0.58 GB safety buffer (the 95% load-success margin fitted from ~19,500 community measurements).
- Provenance on every number. The UI says whether a figure is an exact file size, a vendor-published number, community-reported, or estimated.
- Licenses up front. AGPL, non-commercial, and gated models are labelled on every card — before you build your project on one.
- Speed estimates with receipts, not vibes. For LLMs, FitCheck predicts decode tokens/sec from your memory bandwidth (decode is bandwidth-bound) and shows where your machine lands among real community benchmark runs (LocalScore) on an interactive roofline chart. A learned predictor, following IBM's LLM-Pilot methodology (gradient boosting over hardware features, validated leave-one-accelerator-out), replaces the analytical estimate only if it beats it on hardware it never saw; otherwise the labelled analytical baseline ships. Vision and diffusion models are compute-bound rather than bandwidth-bound, so FitCheck honestly gives them memory verdicts only instead of fake speed numbers.
- Conservative by design. Three plain bands (Runs great / Tight, but works / Won't fit) that would rather under-promise than over-promise.
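The memory side of a chat-model verdict can be sketched in a few lines. This is a minimal illustration, not FitCheck's engine: it assumes the weights figure is an exact file size in bytes, a KV cache sized with the key/value head count (which is what makes it GQA-aware), a 16-bit KV cache, decimal gigabytes, and a hypothetical 80% headroom rule for the "Runs great" band. The model dimensions in the example are made up, not a catalogue entry.

```python
from dataclasses import dataclass

GB = 1e9                 # assumption: decimal gigabytes
SAFETY_BUFFER_GB = 0.58  # buffer figure quoted in this README
HEADROOM = 0.8           # hypothetical threshold, not FitCheck's actual rule


@dataclass
class ChatModel:
    weights_bytes: int  # exact weights size, e.g. a GGUF file size from the Hub
    n_layers: int
    n_kv_heads: int     # GQA: key/value heads, usually fewer than query heads
    head_dim: int


def kv_cache_bytes(m: ChatModel, context_tokens: int, bytes_per_elem: int = 2) -> int:
    """K and V caches: one head_dim vector per layer, per KV head, per token."""
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * context_tokens * bytes_per_elem


def estimate_gb(m: ChatModel, context_tokens: int) -> float:
    """Weights + KV cache + the fixed safety buffer, in GB."""
    return (m.weights_bytes + kv_cache_bytes(m, context_tokens)) / GB + SAFETY_BUFFER_GB


def band(required_gb: float, available_gb: float, headroom: float = HEADROOM) -> str:
    """Map an estimate onto the three plain bands, erring on the conservative side."""
    if required_gb <= headroom * available_gb:
        return "Runs great"
    if required_gb <= available_gb:
        return "Tight, but works"
    return "Won't fit"


# Illustrative, made-up dimensions.
model = ChatModel(weights_bytes=4_920_000_000, n_layers=32, n_kv_heads=8, head_dim=128)
need = estimate_gb(model, context_tokens=8192)
print(f"{need:.2f} GB")  # 6.57 GB
for ram in (6, 8, 16):
    print(ram, band(need, ram))
# 6 Won't fit
# 8 Tight, but works
# 16 Runs great
```

Using n_kv_heads instead of the query-head count is what keeps the estimate from overstating memory for GQA models: if this hypothetical model had 32 query heads, a naive estimate would make the KV cache four times larger.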
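The bandwidth argument behind the LLM speed estimate fits in one function: for a dense model at batch size 1, every new token has to stream the weights (and the KV cache) through memory once, so memory bandwidth divided by bytes read per token gives a ceiling on decode tokens/sec. This is a simplified roofline-style bound, not FitCheck's calibrated formula; the efficiency parameter is a hypothetical knob standing in for whatever fraction of peak bandwidth a real runtime reaches.

```python
def decode_tokens_per_sec(
    bandwidth_gb_s: float,
    weights_gb: float,
    kv_cache_gb: float = 0.0,
    efficiency: float = 1.0,  # hypothetical fraction of peak bandwidth achieved
) -> float:
    """Bandwidth-bound ceiling on decode speed for a dense model at batch size 1."""
    bytes_per_token_gb = weights_gb + kv_cache_gb  # data read for each generated token
    if bytes_per_token_gb <= 0:
        raise ValueError("model size must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return efficiency * bandwidth_gb_s / bytes_per_token_gb


# A 4 GB model on a machine with 100 GB/s of memory bandwidth:
print(decode_tokens_per_sec(100, 4.0))                  # 25.0
print(decode_tokens_per_sec(100, 4.0, efficiency=0.5))  # 12.5
```

Doubling the bandwidth doubles the ceiling, while doubling the model size halves it. Compute-bound workloads such as diffusion do not follow this ratio, which is why the same formula is not applied to them.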
What's inside
- The catalogue: scripts/curation.json (hand-picked models across LLM, vision-language, vision, image/video generation, speech, music, embeddings, forecasting) enriched by scripts/refresh_catalogue.py from public Hub endpoints into catalogue.json. Refreshed nightly and baked in at build time, so the running app is fully offline.
- The engine (engine/): pure Python memory math and honest banding. It also answers the reverse question: minimum vs comfortable hardware tiers for a goal ("Help me pick one" mode).
- The model brick (model_brick.py): NVIDIA Nemotron 3 Nano 4B running in-Space on ZeroGPU (hybrid Mamba-2, accelerated by prebuilt Hub kernels), explaining the engine's numbers in plain words. It never does the math; if it states a figure that isn't in the engine's facts, the gate logs it.
- The frontend (static/): hand-built HTML/CSS/JS, no framework, served by Gradio server mode (gr.Server). Optional extra: paste any Hugging Face model id and FitCheck walks its finetune/quantized lineage to a known base ("if the base runs, your finetune runs"). This is the one clearly labelled online feature.
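The lineage walk can be sketched as a breadth-first search up a model's base-model links until it reaches a model the catalogue already knows. This sketch assumes each repo's parents come from something like the base_model field in Hugging Face model-card metadata (which may list several parents, e.g. for merges); here a plain dictionary supplies them so the example runs offline, and all repo ids and helper names are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Optional


def find_known_base(
    model_id: str,
    parents_of: Callable[[str], Iterable[str]],
    known: set[str],
    max_hops: int = 10,
) -> Optional[list[str]]:
    """Return the path from model_id up to the nearest known model, or None."""
    queue = deque([[model_id]])
    seen = {model_id}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current in known:
            return path
        if len(path) - 1 >= max_hops:  # don't expand past max_hops links
            continue
        for parent in parents_of(current):
            if parent not in seen:  # guards against cycles in messy metadata
                seen.add(parent)
                queue.append(path + [parent])
    return None


# Hypothetical repo ids: a quantized upload of a finetune of a catalogued base.
PARENTS = {
    "someone/my-finetune-GGUF": ["someone/my-finetune"],
    "someone/my-finetune": ["example-org/base-7b"],
}
KNOWN = {"example-org/base-7b"}

path = find_known_base("someone/my-finetune-GGUF", lambda m: PARENTS.get(m, []), KNOWN)
print(" -> ".join(path))
# someone/my-finetune-GGUF -> someone/my-finetune -> example-org/base-7b
```

Breadth-first order means the nearest known ancestor wins, and the verdict for that base then stands in for the finetune, in line with the "if the base runs, your finetune runs" rule.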
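The reverse question (minimum vs comfortable hardware for a goal) amounts to running the banding backwards: scan memory tiers from small to large and report the first that fits at all and the first that fits with headroom. The tier list and the 80% headroom rule below are hypothetical placeholders, not FitCheck's actual tiers or thresholds.

```python
from typing import Optional, Sequence

TIERS_GB = (8, 12, 16, 24, 32, 48, 64, 96, 128)  # hypothetical memory tiers
HEADROOM = 0.8  # hypothetical "Runs great" threshold


def hardware_tiers(
    required_gb: float,
    tiers: Sequence[float] = TIERS_GB,
    headroom: float = HEADROOM,
) -> tuple[Optional[float], Optional[float]]:
    """Return (minimum, comfortable) tiers; None means no listed tier qualifies."""
    ordered = sorted(tiers)
    minimum = next((t for t in ordered if required_gb <= t), None)
    comfortable = next((t for t in ordered if required_gb <= headroom * t), None)
    return minimum, comfortable


print(hardware_tiers(6.57))  # (8, 12)
print(hardware_tiers(70))    # (96, 96)
print(hardware_tiers(200))   # (None, None)
```

Returning None rather than the largest tier keeps the answer honest when nothing in the list is big enough.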
Run it locally
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
python app.py
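The activation command above is for Windows; on macOS or Linux, run source .venv/bin/activate instead. Then open http://127.0.0.1:7860/ (add ?go for an instant sample result). Locally, the explainer reports that the model isn't loaded (it only loads on the Space); everything else works fully offline.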
Built for the Build Small hackathon (Backyard AI track).