---
title: Modular Mind 🧠
emoji: π
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
short_description: We modulate the mind and communicate!
license: mit
tags:
- track:wood
- sponsor:modal
- achievement:offgrid
- achievement:welltuned
---

Social Media Post and Demo Video: https://www.linkedin.com/posts/dean-byrne-02a28b191_modular-mind-boss-fight-for-hugging-face-ugcPost-7472410483615084544-yUeC/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAC0RumIBxlIKTkKv5tF-hb2OU7TdZ19kxcQ

Model weights are in the Space's directory.

# ⚔️ Modular Mind: Boss Fight

A mini **Dark-Souls-style** duel where the boss is controlled by a **Modular Mind**:
a handful of tiny neural *specialists* that communicate through a **shared latent**
(a `RecursiveLink` bridge) and a *coordinator* that reads the latent to choose the
boss's next move. **The brain was trained by self-play reinforcement learning**: its
tactics emerged from playing thousands of duels, and nothing is scripted.

You play the **Fire Knight**. Defeat the **Demon Slime**.

## How the Modular Mind works

This is the [ModularMind-on-V2](https://github.com/your-username/ModularMind) concept at *specialist scale*: instead of one
monolithic policy, six small networks each handle one concern and talk to each other
through a latent channel.

```
game state ─▶ ┌──────────────┐     each specialist emits a latent
              │ Aggressor    │──┐  (LatentProjection) and, if it OWNS
              │ Stalker      │──┤  an action, a "drive" for that action
              │ Survivor     │──┤
              │ Baiter       │──┤        ┌───────────────┐       ┌─────────────┐
              │ Punisher (M) │──┼─ sum ─▶│ RecursiveLink │──────▶│ Coordinator │─▶ action
              │ Enrage (M)   │──┘        │ ReGLU+residual│ shared│  read-out   │
              └──────────────┘           └───────────────┘ latent└─────────────┘
```

- **Four action-owning specialists** push their move's score directly:
  **Aggressor → CLEAVE**, **Stalker → APPROACH**, **Survivor → RETREAT**, **Baiter → IDLE**.
- **Two modulators (M)**, **Punisher** ("the player is open!") and **Enrage**
  ("we're low on HP, go berserk"), **own no action**. Their *only* way to affect the
  fight is the latent they write into the shared `RecursiveLink`, which the coordinator
  turns into modulation. So training has to *learn to use the latent channel*, which is
  the whole point of the architecture.
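
The data flow above can be sketched in a few lines of numpy. This is a minimal illustration, not the code in `modular_mind.py`: the feature count, latent width, parameter names (`proj`, `drive`, `W_gate`, `W_val`, `W_out`) and the random initialisation are all assumptions made for the example. It shows the one property that matters: modulators can only move the action scores through the shared latent.

```python
import numpy as np

ACTIONS = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]

# Specialist name -> index of the action it owns (None = modulator, owns nothing).
OWNERS = {
    "Aggressor": 0, "Stalker": 1, "Survivor": 2, "Baiter": 3,
    "Punisher": None, "Enrage": None,
}


def init_params(n_features=8, latent_dim=6, seed=0):
    """Random parameters for the sketch (sizes are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    params = {"proj": {}, "drive": {}}
    for name, owned in OWNERS.items():
        # Every specialist writes a latent vector.
        params["proj"][name] = rng.normal(0.0, 0.3, (n_features, latent_dim))
        if owned is not None:
            # Only action owners get a direct "drive" for their action.
            params["drive"][name] = rng.normal(0.0, 0.3, n_features)
    params["W_gate"] = rng.normal(0.0, 0.3, (latent_dim, latent_dim))
    params["W_val"] = rng.normal(0.0, 0.3, (latent_dim, latent_dim))
    params["W_out"] = rng.normal(0.0, 0.3, (latent_dim, len(ACTIONS)))
    return params


def forward(params, state):
    """Return (action logits, shared latent) for one game-state feature vector."""
    drives = np.zeros(len(ACTIONS))
    latent_sum = np.zeros(params["W_gate"].shape[0])
    for name, owned in OWNERS.items():
        latent_sum += state @ params["proj"][name]
        if owned is not None:
            drives[owned] = state @ params["drive"][name]
    # Bridge: ReGLU (relu(x W_gate) * (x W_val)) plus a residual connection.
    gate = np.maximum(latent_sum @ params["W_gate"], 0.0)
    shared = latent_sum + gate * (latent_sum @ params["W_val"])
    # Coordinator read-out: the shared latent modulates the direct drives.
    modulation = shared @ params["W_out"]
    return drives + modulation, shared


params = init_params()
state = np.linspace(-1.0, 1.0, 8)  # stand-in for real game features
logits, shared = forward(params, state)
print(ACTIONS[int(np.argmax(logits))])
```

Zeroing a modulator's projection in `params["proj"]` changes the logits even though that specialist has no drive of its own, which is the behaviour the architecture relies on.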

The right-hand panel shows all of this live: each specialist's activity, the shared
latent bridge, and the coordinator's modulation, for every decision the boss makes.

## What emerged from training

Trained on a reward that values *dealing damage* and *pressuring in range* over
playing it safe (landing a cleave ≫ whiffing, and stalling or staying out of range is
penalised), the boss learned an **aggressive pressure** style:

- **closes the distance** when you're far or at mid-range,
- **cleaves on contact**: once you're in range and it's off cooldown it commits to a
  lunging swing essentially every time (verified: in-range attack-rate ≈ **0.8–1.0**),
- **retreats only when it can't swing** (mid-cooldown) to reset spacing,
- **blocks your punish**: a **Defender** specialist raises a guard (negating ~90% of
  your melee) when you swing at it and it can't cleave back,
- **punishes your recovery** and gets **even more aggressive at low HP**: the Enrage
  modulator raises CLEAVE through the shared latent.

It reaches a **~55–65% win rate** against a near-optimal scripted dodger (avg reward
+8, up from −12 before the reward was tuned for aggression). Against a human it's a
fair, readable fight: **dodge the red telegraph, then punish the recovery.**

> The earlier version of the brain learned a degenerate "space forever, never commit"
> policy that *technically* won but barely attacked, so the trainer now selects the
> checkpoint on **win-rate + in-range attack-rate**, and a non-attacking policy can no
> longer be saved. (`behavior()` in `train.py` measures this directly.)
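
A selection rule of this kind might look like the sketch below. It is not the logic in `train.py`: the function names, the plain sum used as the score, and the `min_attack_rate` threshold of 0.5 are all assumptions chosen to illustrate the idea of refusing to save a passive policy.

```python
def checkpoint_score(win_rate, attack_rate, min_attack_rate=0.5):
    """Score a checkpoint, or return None if it attacks too rarely to be saved.

    min_attack_rate is a hypothetical threshold, not a value from train.py.
    """
    if attack_rate < min_attack_rate:
        return None
    return win_rate + attack_rate


def select_best(candidates):
    """Pick the highest-scoring candidate that passes the attack-rate gate."""
    best, best_score = None, None
    for cand in candidates:
        score = checkpoint_score(cand["win_rate"], cand["attack_rate"])
        if score is not None and (best_score is None or score > best_score):
            best, best_score = cand, score
    return best


candidates = [
    {"name": "spacer", "win_rate": 0.90, "attack_rate": 0.05},   # wins, never commits
    {"name": "pressure", "win_rate": 0.60, "attack_rate": 0.90},
]
print(select_best(candidates)["name"])
```

With these made-up numbers the passive "spacer" checkpoint is rejected despite its higher win rate, and the aggressive one is kept.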

## It learns from your fights (online finetuning)

The model is tiny, so a gradient step takes microseconds and the boss can finetune from
real play **on the free CPU**. Each HARD-tier fight is logged (state, action, HP per boss
decision) and POSTed to `/learn`; the server rebuilds per-decision rewards (damage
dealt − taken, + kill / − death), computes REINFORCE returns, and takes **one Adam
step** ([`mm_grad.py`](mm_grad.py), numpy backprop verified against PyTorch to ~1e-8).
A frozen copy of the sim-trained weights anchors the update so it can't drift into
nonsense; the adapted weights feed straight back into the live boss.

- **On by default, in-memory.** Set `MM_ONLINE=0` to disable.
- **Persistent across restarts:** add Space secrets `HF_TOKEN` (write) and
  `MM_DATASET_REPO` (e.g. `you/boss-fight-online`) and the adapted weights are pushed
  to / pulled from that Dataset. Only HARD-tier fights train (keeps the data on-policy).
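
The reward reconstruction and return computation described above can be sketched as follows. This is an illustration only, not the code in `online.py`: the log layout (one HP sample before the first decision and one after each decision), the bonus sizes, and the discount `gamma` are assumptions. The gradient and the anchored Adam step are left out.

```python
import numpy as np


def rebuild_rewards(boss_hp, player_hp, kill_bonus=5.0, death_penalty=5.0):
    """Per-decision reward from the boss's point of view.

    boss_hp / player_hp: HP before the first decision and after each decision,
    so N decisions give N + 1 samples (an assumed log format).
    """
    boss_hp = np.asarray(boss_hp, dtype=float)
    player_hp = np.asarray(player_hp, dtype=float)
    dealt = player_hp[:-1] - player_hp[1:]   # damage the boss dealt
    taken = boss_hp[:-1] - boss_hp[1:]       # damage the boss took
    rewards = dealt - taken
    if len(rewards) == 0:
        return rewards
    if player_hp[-1] <= 0:
        rewards[-1] += kill_bonus            # boss killed the player
    if boss_hp[-1] <= 0:
        rewards[-1] -= death_penalty         # boss died
    return rewards


def discounted_returns(rewards, gamma=0.99):
    """REINFORCE return G_t = r_t + gamma * G_{t+1}, computed back to front."""
    rewards = np.asarray(rewards, dtype=float)
    returns = np.zeros_like(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rewards = rebuild_rewards(boss_hp=[100, 100, 80, 80], player_hp=[100, 90, 90, 0])
print(rewards, discounted_returns(rewards, gamma=1.0))
```

In a REINFORCE update each return weights the log-probability gradient of the action taken at that decision, so decisions that preceded a kill are reinforced and decisions that preceded damage taken are discouraged.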

## Difficulty = the trained brain's decision-noise

The difficulty selector doesn't change the boss's stats; it runs the **same trained
Modular Mind at a different *mistake-rate*** (`explore` = probability of a random legal
action). Against a near-optimal scripted dodger:

| Tier | Mistake-rate | Boss win-rate | Feel |
|------|------|------|------|
| **Easy** | 50% | ~0.35 | erratic, leaves big openings, beatable |
| **Normal** | 22% | ~0.65 | competent pressure, occasional slip |
| **Hard** | 4% | ~0.95 | closes in, blocks your punish, relentless |

The browser sends the chosen tier with every decision; the server routes it through
`modular_mind.decide` with that tier's mistake-rate. *(Originally Easy/Normal were
genuinely undertrained checkpoints, but once the boss learned to BLOCK it dominated
the sim almost immediately, so there's no longer a weak checkpoint; decision-noise is
the controllable, honest dial. `train.py` still snapshots checkpoints if you want them.)*
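
The mistake-rate dial amounts to an epsilon-style choice on top of the trained policy. The sketch below is not the body of `modular_mind.decide`: the function name `choose_action`, the `legal` mask, and the assumption that the non-noisy branch takes the highest-scoring legal action are all illustrative. Only the three rates come from the table.

```python
import numpy as np

# Mistake-rates per tier, taken from the table above.
MISTAKE_RATE = {"easy": 0.50, "normal": 0.22, "hard": 0.04}


def choose_action(logits, legal, tier, rng):
    """Return an action index: random legal action with the tier's probability,
    otherwise the best-scoring legal action (greedy choice is an assumption)."""
    logits = np.asarray(logits, dtype=float)
    legal = np.asarray(legal, dtype=bool)
    if rng.random() < MISTAKE_RATE[tier]:
        return int(rng.choice(np.flatnonzero(legal)))
    masked = np.where(legal, logits, -np.inf)   # illegal actions can never win
    return int(np.argmax(masked))


rng = np.random.default_rng(0)
logits = [0.1, 2.0, -1.0, 0.5]
legal = [True, False, True, True]               # e.g. action 1 is on cooldown
print([choose_action(logits, legal, "hard", rng) for _ in range(10)])
```

Because the random branch also draws from the legal set, the boss on Easy still never takes an illegal action; it simply picks a worse legal one about half the time.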

## Controls

| Key | Action |
|-----|--------|
| ← → | Move |
| Space | Roll / dodge (i-frames, costs stamina) |
| J | Attack |
| K (hold) | **Block**: cuts incoming damage to 20% and drains stamina; if stamina hits 0 your guard **breaks** into a stagger |
| M / speaker icon | Mute / unmute music + SFX |

Background music and combat SFX play during the fight (a random track per fight,
plus per-action sound effects). Audio is served statically from `audio/`.

You begin each fight with a one-time **Aegis shield** (a cyan bubble + the *Aegis*
HUD bar): for the first few seconds it absorbs **all** damage so you can learn the
controls. Once it fades it does **not** come back.

> Click **Enter the Fog**, then click the game once so it has keyboard focus.

## Architecture / files

| File | Role |
|------|------|
| `app.py` | Gradio Space: serves the game, exposes the trained brain at `/decide` |
| `modular_mind.py` | **numpy** inference of the trained Modular Mind (no torch at runtime) |
| `mm_torch.py` | the trainable Modular Mind (specialists + RecursiveLink + coordinator) |
| `train.py` | self-play **REINFORCE** trainer → `mm_weights.npz` |
| `mm_grad.py` | pure-numpy forward+backward (REINFORCE gradient), verified vs torch; powers the online learner |
| `online.py` | finetunes the HARD brain from real player fights; optional HF-Dataset persistence |
| `duel_sim.py` | the headless duel simulator (the RL environment) |
| `features.py` | shared feature/action definitions (single source of truth) |
| `web/` | the HTML5 canvas game (60fps render; calls `/decide` at decision points) |
| `audio/` | background music (`mp3/`) and sound effects (`sfx/`), served statically |
| `serve_local.py` | run the whole thing locally **without gradio** (stdlib + numpy) |

The model is **tiny** (~4.5k parameters) and inference is pure numpy, so the Space
needs only `gradio` + `numpy` and starts instantly.

## Run / retrain locally

```bash
pip install -r requirements.txt   # gradio + numpy (runtime)
python app.py                     # the Space, locally
# or, with no gradio at all:
python serve_local.py             # http://localhost:7861
# retrain the boss (needs torch):
pip install torch
python train.py                   # -> mm_weights.npz (+ train_log.json)
```

## Credits

- **Sprites:** *Fire Knight* and *Demon Slime* free asset packs by **LuizMelo**
  (itch.io). Please check their licenses for your own use.
- **Audio:** background music tracks (`audio/mp3`) and combat SFX (`audio/sfx`).
  Check the licenses of your audio packs before publishing.
- **Brain:** the *Modular Mind* concept (latent-communicating specialists via
  `RecursiveLink`), trained here at specialist scale by self-play RL.

Built for a Hugging Face hackathon.