---
title: Modular Mind 🧠
emoji: 💭
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
short_description: We modulate the mind and communicate!
license: mit
tags:
- track:wood
- sponsor:modal
- achievement:offgrid
- achievement:welltuned
---
Social Media Post and Demo Video: https://www.linkedin.com/posts/dean-byrne-02a28b191_modular-mind-boss-fight-for-hugging-face-ugcPost-7472410483615084544-yUeC/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAC0RumIBxlIKTkKv5tF-hb2OU7TdZ19kxcQ
Model Weights are in the Space's directory.
# ⚔️ Modular Mind: Boss Fight
A mini **Dark-Souls-style** duel where the boss is controlled by a **Modular Mind**:
a handful of tiny neural *specialists* that communicate through a **shared latent**
(a `RecursiveLink` bridge) and a *coordinator* that reads the latent to choose the
boss's next move. **The brain was trained by self-play reinforcement learning**: its
tactics emerged from playing thousands of duels, and nothing is scripted.
You play the **Fire Knight**. Defeat the **Demon Slime**.
## How the Modular Mind works
This is the [ModularMind-on-V2](https://github.com/your-username/ModularMind) concept at *specialist scale*: instead of one
monolithic policy, six small networks each handle one concern and talk to each other
through a latent channel.
```
game state ─▶ ┌──────────────┐        each specialist emits a latent
              │ Aggressor    │──┐     (LatentProjection) and, if it OWNS
              │ Stalker      │──┤     an action, a "drive" for that action
              │ Survivor     │──┤
              │ Baiter       │──┤        ┌───────────────┐       ┌─────────────┐
              │ Punisher (M) │──┼─ sum ─▶│ RecursiveLink │──────▶│ Coordinator │─▶ action
              │ Enrage (M)   │──┘        │ ReGLU+residual│ shared│ read-out    │
              └──────────────┘           └───────────────┘ latent└─────────────┘
```
- **Four action-owning specialists** push their move's score directly:
**Aggressor → CLEAVE**, **Stalker → APPROACH**, **Survivor → RETREAT**, **Baiter → IDLE**.
- **Two modulators (M)**, **Punisher** ("the player is open!") and **Enrage**
("we're low on HP, go berserk"), **own no action**. Their *only* way to affect the
fight is the latent they write into the shared `RecursiveLink`, which the coordinator
turns into modulation. So training has to *learn to use the latent channel*, which is
the whole point of the architecture.
The right-hand panel shows all of this live: each specialist's activity, the shared
latent bridge, and the coordinator's modulation, for every decision the boss makes.
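
The data flow in the diagram can be sketched in a few lines of numpy. This is a minimal illustration, not the code in `modular_mind.py`: the layer sizes, the random weights, the purely linear specialists, and the names `recursive_link` and `decide` are all assumptions made for the example. It is meant to show one structural point, that the modulators can reach the action scores only through the shared latent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this sketch.
N_FEATURES, LATENT, HIDDEN = 8, 16, 32
ACTIONS = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
# Specialist name -> index of the action it owns (None marks a modulator).
SPECIALISTS = {"Aggressor": 0, "Stalker": 1, "Survivor": 2, "Baiter": 3,
               "Punisher": None, "Enrage": None}


def init(shape):
    """Small random weights; a real model would load trained ones."""
    return rng.normal(0.0, 0.1, size=shape)


params = {}
for name, owned in SPECIALISTS.items():
    params[name] = {"latent": init((LATENT, N_FEATURES))}
    if owned is not None:
        params[name]["drive"] = init(N_FEATURES)  # only action owners get a drive

link = {"gate": init((HIDDEN, LATENT)),
        "value": init((HIDDEN, LATENT)),
        "out": init((LATENT, HIDDEN))}
readout = init((len(ACTIONS), LATENT))  # coordinator read-out


def recursive_link(z):
    """ReGLU block with a residual: z + W_out (relu(W_gate z) * (W_value z))."""
    gated = np.maximum(link["gate"] @ z, 0.0) * (link["value"] @ z)
    return z + link["out"] @ gated


def decide(state):
    drives = np.zeros(len(ACTIONS))
    latent_sum = np.zeros(LATENT)
    for name, owned in SPECIALISTS.items():
        latent_sum += params[name]["latent"] @ state      # every specialist writes a latent
        if owned is not None:
            drives[owned] += params[name]["drive"] @ state  # owners also push their action
    shared = recursive_link(latent_sum)
    scores = drives + readout @ shared  # coordinator modulation added to the drives
    return ACTIONS[int(np.argmax(scores))], scores


action, scores = decide(rng.normal(size=N_FEATURES))
print(action, scores.round(3))
```

Because Punisher and Enrage have no `drive` entry, changing their weights moves the scores only via `latent_sum`, which is the property the training has to exploit.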
## What emerged from training
Trained on a reward that values *dealing damage* and *pressuring in range* over
playing it safe (landing a cleave ≫ whiffing, and stalling / staying out of range is
penalised), the boss learned an **aggressive pressure** style:
- **closes the distance** when you're far or at mid-range,
- **cleaves on contact**: once you're in range and it's off cooldown it commits to a
lunging swing essentially every time (verified: in-range attack-rate ≈ **0.8–1.0**),
- **retreats only when it can't swing** (mid-cooldown) to reset spacing,
- **blocks your punish**: a **Defender** specialist raises a guard (negating ~90% of
your melee) when you swing at it and it can't cleave back,
- **punishes your recovery** and gets **even more aggressive at low HP**: the Enrage
modulator raises CLEAVE through the shared latent.
It reaches a **~55–65% win rate** against a near-optimal scripted dodger (avg reward
+8, up from −12 before the reward was tuned for aggression). Against a human it's a
fair, readable fight: **dodge the red telegraph, then punish the recovery.**
> The earlier version of the brain learned a degenerate "space forever, never commit"
> policy that *technically* won but barely attacked, so the trainer now selects the
> checkpoint on **win-rate + in-range attack-rate**, and a non-attacking policy can no
> longer be saved. (`behavior()` in `train.py` measures this directly.)
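
The selection rule described in the note above could look roughly like the sketch below. It is an assumption about the shape of the logic, not the actual `behavior()` code in `train.py`: `EvalStats`, `selection_score`, `pick_checkpoint`, the `MIN_ATTACK_RATE` floor, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass

MIN_ATTACK_RATE = 0.5  # hypothetical floor; not a value taken from train.py


@dataclass
class EvalStats:
    wins: int              # duels the boss won
    duels: int             # duels played
    in_range_ready: int    # decisions with the player in reach and cleave off cooldown
    in_range_attacks: int  # how many of those decisions were CLEAVE

    @property
    def win_rate(self) -> float:
        return self.wins / self.duels if self.duels else 0.0

    @property
    def attack_rate(self) -> float:
        if not self.in_range_ready:
            return 0.0
        return self.in_range_attacks / self.in_range_ready


def selection_score(stats):
    """Return win-rate + in-range attack-rate, or None if the policy barely attacks."""
    if stats.attack_rate < MIN_ATTACK_RATE:
        return None  # a non-attacking policy is never eligible to be saved
    return stats.win_rate + stats.attack_rate


def pick_checkpoint(candidates):
    """candidates maps a checkpoint name to its EvalStats; returns the best name or None."""
    scored = {name: selection_score(s) for name, s in candidates.items()}
    eligible = {name: score for name, score in scored.items() if score is not None}
    return max(eligible, key=eligible.get) if eligible else None


# Invented numbers, for illustration only.
candidates = {
    "spacer": EvalStats(wins=70, duels=100, in_range_ready=40, in_range_attacks=2),
    "pressure": EvalStats(wins=60, duels=100, in_range_ready=200, in_range_attacks=180),
}
best = pick_checkpoint(candidates)
print(best)
```

In this toy comparison the "spacer" policy wins more duels but is rejected outright for its low attack-rate, so the "pressure" checkpoint is the one kept.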
## It learns from your fights (online finetuning)
The model is tiny, so a gradient step takes microseconds and the boss finetunes from real
play **on the free CPU**. Each HARD-tier fight is logged (state, action, HP per boss
decision) and POSTed to `/learn`; the server rebuilds per-decision rewards (damage
dealt − taken, + kill / − death), computes REINFORCE returns, and takes **one Adam
step** ([`mm_grad.py`](mm_grad.py), numpy backprop verified against PyTorch to ~1e-8).
A frozen copy of the sim-trained weights anchors the update so it can't drift into
nonsense; the adapted weights feed straight back into the live boss.
- **On by default, in-memory.** Set `MM_ONLINE=0` to disable.
- **Persistent across restarts:** add Space secrets `HF_TOKEN` (write) and
`MM_DATASET_REPO` (e.g. `you/boss-fight-online`) and the adapted weights are pushed
to / pulled from that Dataset. Only HARD-tier fights train (keeps the data on-policy).
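
The reward reconstruction and the REINFORCE returns can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in `online.py`: the log layout (one `(player_hp, boss_hp)` pair per boss decision plus a final pair for the end state), the bonus sizes, the discount `gamma`, and the function names are all hypothetical.

```python
def rebuild_rewards(hp_log, kill_bonus=10.0, death_penalty=10.0):
    """Per-decision rewards from the boss's point of view.

    hp_log: list of (player_hp, boss_hp), one entry per boss decision plus a
    final entry for the state when the fight ended (assumed layout).
    """
    rewards = []
    for (p0, b0), (p1, b1) in zip(hp_log, hp_log[1:]):
        dealt = max(p0 - p1, 0.0)  # damage the boss dealt to the player
        taken = max(b0 - b1, 0.0)  # damage the boss took
        rewards.append(dealt - taken)
    if rewards:
        final_player, final_boss = hp_log[-1]
        if final_player <= 0:
            rewards[-1] += kill_bonus      # boss killed the player
        elif final_boss <= 0:
            rewards[-1] -= death_penalty   # boss died
    return rewards


def discounted_returns(rewards, gamma=0.99):
    """REINFORCE return G_t = r_t + gamma * G_{t+1}, computed back to front."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


hp_log = [(100, 100), (100, 90), (70, 90), (0, 90)]
rewards = rebuild_rewards(hp_log)
returns = discounted_returns(rewards, gamma=1.0)
print(rewards, returns)
```

Each return would then weight the log-probability gradient of the action taken at that decision; the single Adam step and the frozen-weight anchor are left out of this sketch.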
## Difficulty = the trained brain's decision-noise
The difficulty selector doesn't change the boss's stats. It runs the **same trained
Modular Mind at a different *mistake-rate*** (`explore` = probability of a random legal
action). Against a near-optimal scripted dodger:
| Tier | Mistake-rate | Boss win-rate | Feel |
|------|------|------|------|
| **Easy** | 50% | ~0.35 | erratic, leaves big openings; beatable |
| **Normal** | 22% | ~0.65 | competent pressure, occasional slip |
| **Hard** | 4% | ~0.95 | closes in, blocks your punish, relentless |
The browser sends the chosen tier with every decision; the server routes it through
`modular_mind.decide` with that tier's mistake-rate. *(Originally Easy/Normal were
genuinely undertrained checkpoints, but once the boss learned to BLOCK it dominated
the sim almost immediately, so there's no longer a weak checkpoint; decision-noise is
the controllable, honest dial. `train.py` still snapshots checkpoints if you want them.)*
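
The mistake-rate dial amounts to epsilon-style action noise on top of the trained policy. A minimal sketch, assuming the rates from the table above; `noisy_decide` and `MISTAKE_RATE` are hypothetical names, and the real `modular_mind.decide` signature may differ:

```python
import random

# Mistake-rates per tier, taken from the table above.
MISTAKE_RATE = {"easy": 0.50, "normal": 0.22, "hard": 0.04}


def noisy_decide(greedy_action, legal_actions, tier, rng=random):
    """With probability `explore`, replace the brain's choice with a random legal action."""
    explore = MISTAKE_RATE[tier]
    if rng.random() < explore:
        return rng.choice(legal_actions)  # may still land on the greedy action by chance
    return greedy_action


legal = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
rng = random.Random(0)
print([noisy_decide("CLEAVE", legal, "easy", rng) for _ in range(8)])
```

Because the random draw can also pick the greedy action, the share of decisions that actually differ from the brain's choice is somewhat lower than the nominal mistake-rate.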
## Controls
| Key | Action |
|-----|--------|
| ← → | Move |
| Space | Roll / dodge (i-frames, costs stamina) |
| J | Attack |
| K (hold) | **Block**: cuts incoming damage to 20% and drains stamina; if stamina hits 0, your guard **breaks** into a stagger |
| M / 🔊 | Mute / unmute music + SFX |
Background music and combat SFX play during the fight (a random track per fight,
plus per-action sound effects). Audio is served statically from `audio/`.
You begin each fight with a one-time **Aegis shield** (a cyan bubble + the *Aegis*
HUD bar): for the first few seconds it absorbs **all** damage so you can learn the
controls. Once it fades it does **not** come back.
> Click **Enter the Fog**, then click the game once so it has keyboard focus.
## Architecture / files
| File | Role |
|------|------|
| `app.py` | Gradio Space: serves the game, exposes the trained brain at `/decide` |
| `modular_mind.py` | **numpy** inference of the trained Modular Mind (no torch at runtime) |
| `mm_torch.py` | the trainable Modular Mind (specialists + RecursiveLink + coordinator) |
| `train.py` | self-play **REINFORCE** trainer → `mm_weights.npz` |
| `mm_grad.py` | pure-numpy forward+backward (REINFORCE gradient), verified vs torch; powers the online learner |
| `online.py` | finetunes the HARD brain from real player fights; optional HF-Dataset persistence |
| `duel_sim.py` | the headless duel simulator (the RL environment) |
| `features.py` | shared feature/action definitions (single source of truth) |
| `web/` | the HTML5 canvas game (60fps render; calls `/decide` at decision points) |
| `audio/` | background music (`mp3/`) and sound effects (`sfx/`), served statically |
| `serve_local.py` | run the whole thing locally **without gradio** (stdlib + numpy) |
The model is **tiny** (~4.5k parameters) and inference is pure numpy, so the Space
needs only `gradio` + `numpy` and starts instantly.
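
One way to check the parameter count yourself is to sum the array sizes in the weights archive. The helper below is a sketch: `count_parameters` is a hypothetical name, and the demo runs on a throwaway archive because the array names inside `mm_weights.npz` are not documented here.

```python
import os
import tempfile

import numpy as np


def count_parameters(npz_path):
    """Map each array name in an .npz archive to its number of elements."""
    with np.load(npz_path) as weights:
        return {name: int(weights[name].size) for name in weights.files}


# Demo on a temporary archive; point the helper at mm_weights.npz for the real count.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_weights.npz")
    np.savez(demo_path, w=np.zeros((4, 3)), b=np.zeros(4))
    demo_sizes = count_parameters(demo_path)

print(demo_sizes, sum(demo_sizes.values()))
```

Running `sum(count_parameters("mm_weights.npz").values())` in the Space's directory should then report the model's total size.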
## Run / retrain locally
```bash
pip install -r requirements.txt # gradio + numpy (runtime)
python app.py # the Space, locally
# or, with no gradio at all:
python serve_local.py # http://localhost:7861
# retrain the boss (needs torch):
pip install torch
python train.py # -> mm_weights.npz (+ train_log.json)
```
## Credits
- **Sprites:** *Fire Knight* and *Demon Slime* free asset packs by **LuizMelo**
(itch.io). Please check their licenses for your own use.
- **Audio:** background music tracks (`audio/mp3`) and combat SFX (`audio/sfx`).
Check the licenses of your audio packs before publishing.
- **Brain:** the *Modular Mind* concept (latent-communicating specialists via
`RecursiveLink`), trained here at specialist scale by self-play RL.
Built for a Hugging Face hackathon.