---
title: Modular Mind 🧠
emoji: 💭
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
short_description: We modulate the mind and communicate!
license: mit
tags:
  - track:wood
  - sponsor:modal
  - achievement:offgrid
  - achievement:welltuned
---

Social media post and demo video: https://www.linkedin.com/posts/dean-byrne-02a28b191_modular-mind-boss-fight-for-hugging-face-ugcPost-7472410483615084544-yUeC/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAC0RumIBxlIKTkKv5tF-hb2OU7TdZ19kxcQ

Model weights are in the Space's directory.

# ⚔️ Modular Mind: Boss Fight

A mini **Dark-Souls-style** duel where the boss is controlled by a **Modular Mind**: a handful of tiny neural *specialists* that communicate through a **shared latent** (a `RecursiveLink` bridge), plus a *coordinator* that reads the latent to choose the boss's next move. **The brain was trained by self-play reinforcement learning**: its tactics emerged from playing thousands of duels, and nothing is scripted.

You play the **Fire Knight**. Defeat the **Demon Slime**.

## How the Modular Mind works

This is the [ModularMind-on-V2](https://github.com/your-username/ModularMind) concept at *specialist scale*: instead of one monolithic policy, six small networks each handle one concern and talk to each other through a latent channel.

```
 game state ─▶ ┌──────────────┐   each specialist emits a latent
               │ Aggressor    │──┐  (LatentProjection) and, if it OWNS
               │ Stalker      │──┤  an action, a "drive" for that action
               │ Survivor     │──┤
               │ Baiter       │──┤        ┌───────────────┐      ┌─────────────┐
               │ Punisher (M) │──┼─ sum ─▶│ RecursiveLink │─────▶│ Coordinator │─▶ action
               │ Enrage   (M) │──┘        │ ReGLU+residual│shared│  read-out   │
               └──────────────┘           └───────────────┘latent└─────────────┘
```

- **Four action-owning specialists** push their move's score directly: **Aggressor → CLEAVE**, **Stalker → APPROACH**, **Survivor → RETREAT**, **Baiter → IDLE**.
- **Two modulators (M)**, **Punisher** ("the player is open!") and **Enrage** ("we're low on HP, go berserk"), **own no action**. Their *only* way to affect the fight is the latent they write into the shared `RecursiveLink`, which the coordinator turns into modulation. So training has to *learn to use the latent channel*, which is the whole point of the architecture.

The right-hand panel shows all of this live: each specialist's activity, the shared latent bridge, and the coordinator's modulation, for every decision the boss makes.

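
The data flow described above can be sketched in a few lines of numpy. This is a minimal illustration, not the code in `modular_mind.py`: the feature and latent sizes, the random weights, the `tanh` latent projection, the exact ReGLU wiring inside the link, and the additive way modulation is combined with the drives are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

ACTIONS = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
# Hypothetical sizes; the real ones are defined by the project, not by this sketch.
N_FEATURES, LATENT = 8, 6

# Specialist name -> index of the action it owns (None marks a modulator).
OWNED_ACTION = {
    "Aggressor": 0, "Stalker": 1, "Survivor": 2, "Baiter": 3,
    "Punisher": None, "Enrage": None,
}

def init_specialist(owns_action):
    """Every specialist projects to the latent; only owners get a drive head."""
    params = {"W_latent": rng.normal(0.0, 0.3, (N_FEATURES, LATENT))}
    if owns_action:
        params["w_drive"] = rng.normal(0.0, 0.3, N_FEATURES)
    return params

specialists = {name: init_specialist(owned is not None)
               for name, owned in OWNED_ACTION.items()}
link = {key: rng.normal(0.0, 0.3, (LATENT, LATENT))
        for key in ("W_gate", "W_value", "W_out")}
W_coord = rng.normal(0.0, 0.3, (LATENT, len(ACTIONS)))

def recursive_link(z):
    """Assumed wiring: ReGLU (relu(zA) * (zB)), projected back, plus a residual."""
    gated = np.maximum(z @ link["W_gate"], 0.0) * (z @ link["W_value"])
    return z + gated @ link["W_out"]

def decide(state):
    drives = np.zeros(len(ACTIONS))
    summed = np.zeros(LATENT)
    for name, owned in OWNED_ACTION.items():
        p = specialists[name]
        summed += np.tanh(state @ p["W_latent"])   # every specialist writes a latent
        if owned is not None:
            drives[owned] += state @ p["w_drive"]  # owners also push their own action
    shared = recursive_link(summed)                # the shared latent bridge
    modulation = shared @ W_coord                  # coordinator read-out
    scores = drives + modulation                   # assumed: modulation is additive
    return ACTIONS[int(np.argmax(scores))], scores, modulation

state = rng.normal(size=N_FEATURES)
action, scores, modulation = decide(state)
print(action, np.round(scores, 3))
```

In this sketch Punisher and Enrage have no `w_drive`, so the only path from them to `scores` runs through `summed`, the link, and the coordinator read-out. That mirrors why training has to learn to use the latent channel.
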
## What emerged from training

Trained on a reward that values *dealing damage* and *pressuring in range* over playing it safe (landing a cleave ≫ whiffing, and stalling / staying out of range is penalised), the boss learned an **aggressive pressure** style:

- **closes the distance** when you're far or at mid-range,
- **cleaves on contact**: once you're in range and it's off cooldown, it commits to a lunging swing essentially every time (verified: in-range attack-rate ≈ **0.8–1.0**),
- **retreats only when it can't swing** (mid-cooldown) to reset spacing,
- **blocks your punish**: a **Defender** specialist raises a guard (negating ~90% of your melee) when you swing at it and it can't cleave back,
- **punishes your recovery** and gets **even more aggressive at low HP**: the Enrage modulator raises CLEAVE through the shared latent.

It reaches a **~55–65% win rate** against a near-optimal scripted dodger (avg reward +8, up from −12 before the reward was tuned for aggression). Against a human it's a fair, readable fight: **dodge the red telegraph, then punish the recovery.**

> The earlier version of the brain learned a degenerate "space forever, never commit"
> policy that *technically* won but barely attacked. The trainer therefore now selects the
> checkpoint on **win-rate + in-range attack-rate**, and a non-attacking policy can no
> longer be saved. (`behavior()` in `train.py` measures this directly.)

## It learns from your fights (online finetuning)

The model is tiny, so a gradient step takes microseconds and the boss finetunes from real play **on the free CPU**. Each HARD-tier fight is logged (state, action, HP per boss decision) and POSTed to `/learn`; the server rebuilds per-decision rewards (damage dealt − taken, + kill / − death), computes REINFORCE returns, and takes **one Adam step** ([`mm_grad.py`](mm_grad.py), numpy backprop verified against PyTorch to ~1e-8).
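
The reward rebuild and return computation described above could look roughly like the following. This is a hedged sketch, not the code in `online.py` or `mm_grad.py`: the log layout (one HP snapshot per boss decision plus a final one), the function names, the kill/death bonus sizes, and the discount `gamma` are all assumptions.

```python
import numpy as np

def rebuild_rewards(boss_hp, player_hp, kill_bonus=1.0, death_penalty=1.0):
    """Turn HP snapshots into per-decision rewards.

    Assumes n + 1 snapshots for n boss decisions: entry t is the HP when
    decision t was made, and the last entry is the HP at the end of the fight.
    """
    boss_hp = np.asarray(boss_hp, dtype=float)
    player_hp = np.asarray(player_hp, dtype=float)
    dealt = player_hp[:-1] - player_hp[1:]   # damage the boss dealt after each decision
    taken = boss_hp[:-1] - boss_hp[1:]       # damage the boss took after each decision
    rewards = dealt - taken
    if player_hp[-1] <= 0:                   # boss killed the player
        rewards[-1] += kill_bonus
    if boss_hp[-1] <= 0:                     # boss died
        rewards[-1] -= death_penalty
    return rewards

def discounted_returns(rewards, gamma=0.97):
    """REINFORCE return G_t = r_t + gamma * G_{t+1}, computed back to front."""
    returns = np.zeros_like(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Three boss decisions: a hit, a trade the boss loses, then the killing blow.
rewards = rebuild_rewards(boss_hp=[100, 100, 90, 90], player_hp=[100, 80, 80, 0])
returns = discounted_returns(rewards, gamma=0.5)
print(rewards)   # [ 20. -10.  81.]
print(returns)   # [35.25 30.5  81.  ]
```

Each return would then weight the log-probability gradient of the action taken at that decision before the single optimizer step.
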
A frozen copy of the sim-trained weights anchors the update so it can't drift into nonsense; the adapted weights feed straight back into the live boss.

- **On by default, in-memory.** Set `MM_ONLINE=0` to disable.
- **Persistent across restarts:** add Space secrets `HF_TOKEN` (write) and `MM_DATASET_REPO` (e.g. `you/boss-fight-online`) and the adapted weights are pushed to / pulled from that Dataset.

Only HARD-tier fights train, which keeps the data on-policy.

## Difficulty = the trained brain's decision-noise

The difficulty selector doesn't change the boss's stats. It runs the **same trained Modular Mind at a different *mistake-rate*** (`explore` = probability of a random legal action). Against a near-optimal scripted dodger:

| Tier | Mistake-rate | Boss win-rate | Feel |
|------|------|------|------|
| **Easy** | 50% | ~0.35 | erratic, leaves big openings; beatable |
| **Normal** | 22% | ~0.65 | competent pressure, occasional slip |
| **Hard** | 4% | ~0.95 | closes in, blocks your punish, relentless |

The browser sends the chosen tier with every decision; the server routes it through `modular_mind.decide` with that tier's mistake-rate.

*(Originally, Easy and Normal were genuinely undertrained checkpoints. But once the boss learned to BLOCK, it dominates the sim almost immediately, so there's no longer a weak checkpoint; decision-noise is the controllable, honest dial. `train.py` still snapshots checkpoints if you want them.)*

## Controls

| Key | Action |
|-----|--------|
| ← → | Move |
| Space | Roll / dodge (i-frames, costs stamina) |
| J | Attack |
| K (hold) | **Block**: cuts incoming damage to 20% and drains stamina; if stamina hits 0, your guard **breaks** into a stagger |
| M / 🔊 | Mute / unmute music + SFX |

Background music and combat SFX play during the fight (a random track per fight, plus per-action sound effects). Audio is served statically from `audio/`.

You begin each fight with a one-time **Aegis shield** (a cyan bubble + the *Aegis* HUD bar): for the first few seconds it absorbs **all** damage so you can learn the controls. Once it fades it does **not** come back.

> Click **Enter the Fog**, then click the game once so it has keyboard focus.

## Architecture / files

| File | Role |
|------|------|
| `app.py` | Gradio Space: serves the game, exposes the trained brain at `/decide` |
| `modular_mind.py` | **numpy** inference of the trained Modular Mind (no torch at runtime) |
| `mm_torch.py` | the trainable Modular Mind (specialists + RecursiveLink + coordinator) |
| `train.py` | self-play **REINFORCE** trainer → `mm_weights.npz` |
| `mm_grad.py` | pure-numpy forward+backward (REINFORCE gradient), verified vs torch; the online learner |
| `online.py` | finetunes the HARD brain from real player fights; optional HF-Dataset persistence |
| `duel_sim.py` | the headless duel simulator (the RL environment) |
| `features.py` | shared feature/action definitions (single source of truth) |
| `web/` | the HTML5 canvas game (60fps render; calls `/decide` at decision points) |
| `audio/` | background music (`mp3/`) and sound effects (`sfx/`), served statically |
| `serve_local.py` | run the whole thing locally **without gradio** (stdlib + numpy) |

The model is **tiny** (~4.5k parameters) and inference is pure numpy, so the Space needs only `gradio` + `numpy` and starts instantly.

## Run / retrain locally

```bash
pip install -r requirements.txt     # gradio + numpy (runtime)
python app.py                       # the Space, locally
# or, with no gradio at all:
python serve_local.py               # http://localhost:7861

# retrain the boss (needs torch):
pip install torch
python train.py                     # -> mm_weights.npz (+ train_log.json)
```

## Credits

- **Sprites:** *Fire Knight* and *Demon Slime* free asset packs by **LuizMelo** (itch.io). Please check their licenses for your own use.
- **Audio:** background music tracks (`audio/mp3`) and combat SFX (`audio/sfx`).
  Check the licenses of your audio packs before publishing.
- **Brain:** the *Modular Mind* concept (latent-communicating specialists via `RecursiveLink`), trained here at specialist scale by self-play RL.

Built for a Hugging Face hackathon.