---
title: Modular Mind 🧠
emoji: π
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
short_description: We modulate the mind and communicate!
license: mit
tags:
- track:wood
- sponsor:modal
- achievement:offgrid
- achievement:welltuned
---
Social Media Post and Demo Video: https://www.linkedin.com/posts/dean-byrne-02a28b191_modular-mind-boss-fight-for-hugging-face-ugcPost-7472410483615084544-yUeC/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAC0RumIBxlIKTkKv5tF-hb2OU7TdZ19kxcQ
Model weights are in the Space's directory.
# ⚔️ Modular Mind: Boss Fight
A mini **Dark-Souls-style** duel where the boss is controlled by a **Modular Mind**:
a handful of tiny neural *specialists* that communicate through a **shared latent**
(a `RecursiveLink` bridge) and a *coordinator* that reads the latent to choose the
boss's next move. **The brain was trained by self-play reinforcement learning**: its
tactics emerged from playing thousands of duels, and nothing is scripted.

You play the **Fire Knight**. Defeat the **Demon Slime**.
## How the Modular Mind works
This is the [ModularMind-on-V2](https://github.com/your-username/ModularMind) concept at *specialist scale*: instead of one
monolithic policy, six small networks each handle one concern and talk to each other
through a latent channel.
```
game state ──▶ ┌──────────────┐      each specialist emits a latent
               │ Aggressor    │──┐   (LatentProjection) and, if it OWNS
               │ Stalker      │──┤   an action, a "drive" for that action
               │ Survivor     │──┤
               │ Baiter       │──┤         ┌────────────────┐        ┌─────────────┐
               │ Punisher (M) │──┼─ sum ──▶│ RecursiveLink  │───────▶│ Coordinator │──▶ action
               │ Enrage (M)   │──┘         │ ReGLU+residual │ shared │ read-out    │
               └──────────────┘            └────────────────┘ latent └─────────────┘
```
- **Four action-owning specialists** push their move's score directly:
**Aggressor → CLEAVE**, **Stalker → APPROACH**, **Survivor → RETREAT**, **Baiter → IDLE**.
- **Two modulators (M)**, **Punisher** ("the player is open!") and **Enrage**
("we're low on HP, go berserk"), **own no action**. Their *only* way to affect the
fight is the latent they write into the shared `RecursiveLink`, which the coordinator
turns into modulation. So training has to *learn to use the latent channel*, which is
the whole point of the architecture.

The right-hand panel shows all of this live: each specialist's activity, the shared
latent bridge, and the coordinator's modulation, for every decision the boss makes.
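
To make the data flow concrete, here is a minimal numpy sketch of one decision. It is not the code in `modular_mind.py`: the layer sizes, the weight names (`W_h`, `W_z`, `w_drive`, `W_gate`, `W_val`, `W_read`), the random initialisation, and the choice to add the coordinator's modulation to the drives are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

ACTIONS = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
# Specialist name -> index of the action it owns (None marks a modulator).
SPECIALISTS = {"Aggressor": 0, "Stalker": 1, "Survivor": 2, "Baiter": 3,
               "Punisher": None, "Enrage": None}
STATE_DIM, HIDDEN, LATENT = 8, 8, 6  # illustrative sizes, not the real ones


def init_specialist(owns_action):
    """Random weights for one specialist (hypothetical layout)."""
    p = {"W_h": rng.normal(0.0, 0.3, (STATE_DIM, HIDDEN)),
         "W_z": rng.normal(0.0, 0.3, (HIDDEN, LATENT))}  # latent projection
    if owns_action:
        p["w_drive"] = rng.normal(0.0, 0.3, HIDDEN)  # score for the owned action
    return p


params = {name: init_specialist(owned is not None)
          for name, owned in SPECIALISTS.items()}
W_gate = rng.normal(0.0, 0.3, (LATENT, LATENT))
W_val = rng.normal(0.0, 0.3, (LATENT, LATENT))
W_read = rng.normal(0.0, 0.3, (LATENT, len(ACTIONS)))


def forward(state):
    """Return (action logits, shared latent) for one game state."""
    drives = np.zeros(len(ACTIONS))
    latent_sum = np.zeros(LATENT)
    for name, owned in SPECIALISTS.items():
        p = params[name]
        h = np.tanh(state @ p["W_h"])
        latent_sum += h @ p["W_z"]          # every specialist writes a latent
        if owned is not None:
            drives[owned] += h @ p["w_drive"]  # only owners push a drive
    # ReGLU block with a residual connection: x + relu(x W_gate) * (x W_val)
    gated = np.maximum(latent_sum @ W_gate, 0.0) * (latent_sum @ W_val)
    shared = latent_sum + gated
    modulation = shared @ W_read            # coordinator read-out (assumed additive)
    return drives + modulation, shared


logits, shared = forward(rng.normal(size=STATE_DIM))
print(ACTIONS[int(np.argmax(logits))])
```

In this sketch the Punisher and Enrage weights can only change the logits through `shared @ W_read`, which mirrors the constraint described above: if the read-out ignores the latent, the modulators have no effect.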
## What emerged from training
Trained on a reward that values *dealing damage* and *pressuring in range* over
playing it safe (landing a cleave ≫ whiffing, and stalling / staying out of range is
penalised), the boss learned an **aggressive pressure** style:

- **closes the distance** when you're far or at mid-range,
- **cleaves on contact**: once you're in range and it's off cooldown it commits to a
lunging swing essentially every time (verified: in-range attack-rate ≈ **0.8–1.0**),
- **retreats only when it can't swing** (mid-cooldown) to reset spacing,
- **blocks your punish**: a **Defender** specialist raises a guard (negating ~90% of
your melee) when you swing at it and it can't cleave back,
- **punishes your recovery** and gets **even more aggressive at low HP**: the Enrage
modulator raises CLEAVE through the shared latent.

It reaches a **~55–65% win rate** against a near-optimal scripted dodger (avg reward
+8, up from −12 before the reward was tuned for aggression). Against a human it's a
fair, readable fight: **dodge the red telegraph, then punish the recovery.**
> The earlier version of the brain learned a degenerate "space forever, never commit"
> policy that *technically* won but barely attacked, so the trainer now selects the
> checkpoint on **win-rate + in-range attack-rate**, and a non-attacking policy can no
> longer be saved. (`behavior()` in `train.py` measures this directly.)
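
A selection rule of this kind could look roughly like the sketch below. It is not the `behavior()` function from `train.py`: the tuple layout of a logged decision, the helper names, and the `min_attack_rate` threshold are hypothetical.

```python
def in_range_attack_rate(decisions):
    """Fraction of attack opportunities on which the boss actually cleaved.

    `decisions` is an iterable of (in_range, off_cooldown, action) tuples,
    an assumed log format used only for this example.
    """
    chances = [action for in_range, ready, action in decisions
               if in_range and ready]
    if not chances:
        return 0.0
    return sum(action == "CLEAVE" for action in chances) / len(chances)


def checkpoint_score(win_rate, attack_rate, min_attack_rate=0.5):
    """Score a checkpoint, or return None if the policy barely attacks.

    Returning None means "never save this checkpoint"; the threshold is
    an illustrative value, not the one used by the trainer.
    """
    if attack_rate < min_attack_rate:
        return None
    return win_rate + attack_rate
```

With a rule like this, a policy that wins by stalling gets no score at all, so it cannot displace a checkpoint that actually fights.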
## It learns from your fights (online finetuning)
The model is tiny, so a gradient step takes microseconds and the boss finetunes from
real play **on the free CPU**. Each HARD-tier fight is logged (state, action, HP per boss
decision) and POSTed to `/learn`; the server rebuilds per-decision rewards (damage
dealt − taken, + kill / − death), computes REINFORCE returns, and takes **one Adam
step** ([`mm_grad.py`](mm_grad.py), numpy backprop verified against PyTorch to ~1e-8).
A frozen copy of the sim-trained weights anchors the update so it can't drift into
nonsense; the adapted weights feed straight back into the live boss.

- **On by default, in-memory.** Set `MM_ONLINE=0` to disable.
- **Persistent across restarts:** add Space secrets `HF_TOKEN` (write) and
`MM_DATASET_REPO` (e.g. `you/boss-fight-online`) and the adapted weights are pushed
to / pulled from that Dataset. Only HARD-tier fights train (keeps the data on-policy).
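
The reward reconstruction and return computation described in this section can be sketched as follows. This is an illustration, not the code in `online.py`: the HP-array log layout, the bonus values, the discount `gamma`, and the L2 form of the anchor are all assumptions.

```python
import numpy as np


def rebuild_rewards(boss_hp, player_hp, kill_bonus=1.0, death_penalty=1.0):
    """Per-decision reward = damage dealt - damage taken, plus a terminal term.

    boss_hp / player_hp hold the HP at each boss decision followed by the
    final HP, so both have length T + 1 for T decisions (assumed layout).
    """
    boss_hp = np.asarray(boss_hp, dtype=float)
    player_hp = np.asarray(player_hp, dtype=float)
    dealt = player_hp[:-1] - player_hp[1:]   # damage the boss dealt
    taken = boss_hp[:-1] - boss_hp[1:]       # damage the boss took
    rewards = dealt - taken
    if player_hp[-1] <= 0:
        rewards[-1] += kill_bonus            # boss killed the player
    if boss_hp[-1] <= 0:
        rewards[-1] -= death_penalty         # boss died
    return rewards


def discounted_returns(rewards, gamma=0.99):
    """REINFORCE return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def anchored_gradient(loss_grad, weights, frozen_weights, anchor_strength=0.1):
    """Add an L2 pull toward the frozen weights (one possible anchor)."""
    return loss_grad + anchor_strength * (weights - frozen_weights)
```

The anchored gradient would then go to the optimizer for the single Adam step. An L2 pull is only one way to keep the online weights near the sim-trained ones; the source does not say which form the Space uses.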
## Difficulty = the trained brain's decision-noise
The difficulty selector doesn't change the boss's stats. It runs the **same trained
Modular Mind at a different *mistake-rate*** (`explore` = probability of a random legal
action). Against a near-optimal scripted dodger:

| Tier | Mistake-rate | Boss win-rate | Feel |
|------|------|------|------|
| **Easy** | 50% | ~0.35 | erratic, leaves big openings; beatable |
| **Normal** | 22% | ~0.65 | competent pressure, occasional slip |
| **Hard** | 4% | ~0.95 | closes in, blocks your punish, relentless |

The browser sends the chosen tier with every decision; the server routes it through
`modular_mind.decide` with that tier's mistake-rate. *(Originally Easy/Normal were
genuinely undertrained checkpoints, but once the boss learned to BLOCK it dominated
the sim almost immediately, so there is no longer a weak checkpoint; decision-noise is
the controllable, honest dial. `train.py` still snapshots checkpoints if you want them.)*
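
The mistake-rate dial amounts to epsilon-style action noise on top of the trained policy. A minimal sketch, assuming the policy's preferred action has already been computed; `decide_with_noise` and its arguments are hypothetical names, not the signature of `modular_mind.decide`:

```python
import random

# Mistake-rates per tier, taken from the difficulty table in this section.
MISTAKE_RATE = {"easy": 0.50, "normal": 0.22, "hard": 0.04}


def decide_with_noise(greedy_action, legal_actions, tier, rng=random):
    """Return the brain's pick, or a random legal action with prob. `explore`."""
    explore = MISTAKE_RATE[tier]
    if rng.random() < explore:
        return rng.choice(legal_actions)  # the "mistake"
    return greedy_action


# Usage: a seeded generator keeps a replay reproducible.
rng = random.Random(0)
legal = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
action = decide_with_noise("CLEAVE", legal, "normal", rng)
```

Because the random draw can land on the preferred action anyway, the share of decisions that actually differ from the brain's pick is somewhat lower than the nominal mistake-rate.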
## Controls
| Key | Action |
|-----|--------|
| Arrow keys | Move |
| Space | Roll / dodge (i-frames, costs stamina) |
| J | Attack |
| K (hold) | **Block**: cuts incoming damage to 20% and drains stamina; if stamina hits 0 your guard **breaks** into a stagger |
| M / speaker icon | Mute / unmute music + SFX |

Background music and combat SFX play during the fight (a random track per fight,
plus per-action sound effects). Audio is served statically from `audio/`.

You begin each fight with a one-time **Aegis shield** (a cyan bubble + the *Aegis*
HUD bar): for the first few seconds it absorbs **all** damage so you can learn the
controls. Once it fades it does **not** come back.

> Click **Enter the Fog**, then click the game once so it has keyboard focus.
## Architecture / files
| File | Role |
|------|------|
| `app.py` | Gradio Space: serves the game, exposes the trained brain at `/decide` |
| `modular_mind.py` | **numpy** inference of the trained Modular Mind (no torch at runtime) |
| `mm_torch.py` | the trainable Modular Mind (specialists + RecursiveLink + coordinator) |
| `train.py` | self-play **REINFORCE** trainer → `mm_weights.npz` |
| `mm_grad.py` | pure-numpy forward+backward (REINFORCE gradient), verified vs torch; used by the online learner |
| `online.py` | finetunes the HARD brain from real player fights; optional HF-Dataset persistence |
| `duel_sim.py` | the headless duel simulator (the RL environment) |
| `features.py` | shared feature/action definitions (single source of truth) |
| `web/` | the HTML5 canvas game (60fps render; calls `/decide` at decision points) |
| `audio/` | background music (`mp3/`) and sound effects (`sfx/`), served statically |
| `serve_local.py` | run the whole thing locally **without gradio** (stdlib + numpy) |

The model is **tiny** (~4.5k parameters) and inference is pure numpy, so the Space
needs only `gradio` + `numpy` and starts instantly.
## Run / retrain locally
```bash
pip install -r requirements.txt # gradio + numpy (runtime)
python app.py # the Space, locally
# or, with no gradio at all:
python serve_local.py # http://localhost:7861
# retrain the boss (needs torch):
pip install torch
python train.py # -> mm_weights.npz (+ train_log.json)
```
## Credits
- **Sprites:** *Fire Knight* and *Demon Slime* free asset packs by **LuizMelo**
(itch.io). Please check their licenses for your own use.
- **Audio:** background music tracks (`audio/mp3`) and combat SFX (`audio/sfx`).
Check the licenses of your audio packs before publishing.
- **Brain:** the *Modular Mind* concept (latent-communicating specialists via
`RecursiveLink`), trained here at specialist scale by self-play RL.

Built for a Hugging Face hackathon.