Delete web/README.md
web/README.md DELETED (+0 -168)
@@ -1,168 +0,0 @@
---
title: "Quazim0t0's 🍄 Thousand Token Wood Entry"
emoji: 🍄
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: mit
---

# ⚔️ Modular Mind: Boss Fight

A mini **Dark-Souls-style** duel where the boss is controlled by a **Modular Mind**:
a handful of tiny neural *specialists* that communicate through a **shared latent**
(a `RecursiveLink` bridge) and a *coordinator* that reads the latent to choose the
boss's next move. **The brain was trained by self-play reinforcement learning**: its
tactics emerged from playing thousands of duels, and nothing is scripted.

You play the **Fire Knight**. Defeat the **Demon Slime**.

## How the Modular Mind works

This is the [ModularMind-on-V2](https://github.com/your-username/ModularMind) concept at *specialist scale*: instead of one
monolithic policy, six small networks each handle one concern and talk to each other
through a latent channel.

```
game state ─▶ ┌──────────────┐    each specialist emits a latent
              │ Aggressor    │──┐ (LatentProjection) and, if it OWNS
              │ Stalker      │──┤ an action, a "drive" for that action
              │ Survivor     │──┤
              │ Baiter       │──┤        ┌───────────────┐       ┌─────────────┐
              │ Punisher (M) │──┼─ sum ─▶│ RecursiveLink │──────▶│ Coordinator │─▶ action
              │ Enrage (M)   │──┘        │ ReGLU+residual│ shared│  read-out   │
              └──────────────┘           └───────────────┘ latent└─────────────┘
```

- **Four action-owning specialists** push their move's score directly:
  **Aggressor → CLEAVE**, **Stalker → APPROACH**, **Survivor → RETREAT**, **Baiter → IDLE**.
- **Two modulators (M)**, **Punisher** ("the player is open!") and **Enrage**
  ("we're low on HP, go berserk"), **own no action**. Their *only* way to affect the
  fight is the latent they write into the shared `RecursiveLink`, which the coordinator
  turns into modulation. So training has to *learn to use the latent channel*, which is
  the whole point of the architecture.

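The data flow described above can be sketched in a few lines of numpy. This is only an illustration under stated assumptions, not the code in `modular_mind.py`: the layer sizes, the parameter names (`W_in`, `W_latent`, `w_drive`, `W_gate`, `W_value`, `W_out`), the random weights, and the choice to add the coordinator's modulation to the drives are all hypothetical. The ReGLU step uses the usual definition, `relu(x @ W_gate) * (x @ W_value)`.

```python
import numpy as np

ACTIONS = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
# Specialist name -> index of the action it owns; None marks a modulator.
OWNED_ACTION = {"Aggressor": 0, "Stalker": 1, "Survivor": 2, "Baiter": 3,
                "Punisher": None, "Enrage": None}
N_FEATURES, HIDDEN, LATENT = 8, 6, 4  # hypothetical sizes


def init_params(seed=0):
    """Random stand-in weights (the real ones would come from training)."""
    rng = np.random.default_rng(seed)
    specialists = {}
    for name, action in OWNED_ACTION.items():
        p = {"W_in": rng.normal(0.0, 0.3, (N_FEATURES, HIDDEN)),
             "W_latent": rng.normal(0.0, 0.3, (HIDDEN, LATENT))}
        if action is not None:  # only action owners get a drive head
            p["w_drive"] = rng.normal(0.0, 0.3, HIDDEN)
        specialists[name] = p
    link = {k: rng.normal(0.0, 0.3, (LATENT, LATENT))
            for k in ("W_gate", "W_value", "W_out")}
    coordinator = rng.normal(0.0, 0.3, (LATENT, len(ACTIONS)))
    return specialists, link, coordinator


def forward(state, specialists, link, coordinator):
    """Return (action logits, shared latent) for one game state."""
    drives = np.zeros(len(ACTIONS))
    summed = np.zeros(LATENT)
    for name, action in OWNED_ACTION.items():
        p = specialists[name]
        hidden = np.tanh(state @ p["W_in"])
        summed += hidden @ p["W_latent"]      # every specialist writes a latent
        if action is not None:
            drives[action] = hidden @ p["w_drive"]
    # RecursiveLink stand-in: ReGLU block plus a residual connection.
    gated = np.maximum(summed @ link["W_gate"], 0.0) * (summed @ link["W_value"])
    shared = summed + gated @ link["W_out"]
    modulation = shared @ coordinator         # coordinator read-out
    return drives + modulation, shared


specialists, link, coordinator = init_params()
state = np.linspace(-1.0, 1.0, N_FEATURES)
logits, shared = forward(state, specialists, link, coordinator)
print(ACTIONS[int(np.argmax(logits))], logits.round(3))
```

In this sketch the modulators have no `w_drive` entry, so zeroing a modulator's `W_latent` is the only way to remove its influence on the logits, which mirrors the point made in the bullets.
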
The right-hand panel shows all of this live: each specialist's activity, the shared
latent bridge, and the coordinator's modulation, for every decision the boss makes.

## What emerged from training

Trained on a reward that values *dealing damage* and *pressuring in range* over
playing it safe (landing a cleave ≫ whiffing, and stalling / staying out of range is
penalised), the boss learned an **aggressive pressure** style:

- **closes the distance** when you're far or at mid-range,
- **cleaves on contact**: once you're in range and it's off cooldown it commits to a
  lunging swing essentially every time (verified: in-range attack-rate ≈ **0.8–1.0**),
- **retreats only when it can't swing** (mid-cooldown) to reset spacing,
- **blocks your punish**: a **Defender** specialist raises a guard (negating ~90% of
  your melee) when you swing at it and it can't cleave back,
- **punishes your recovery** and gets **even more aggressive at low HP**: the Enrage
  modulator raises CLEAVE through the shared latent.

It reaches a **~55–65% win rate** against a near-optimal scripted dodger (avg reward
+8, up from −12 before the reward was tuned for aggression). Against a human it's a
fair, readable fight: **dodge the red telegraph, then punish the recovery.**

> The earlier version of the brain learned a degenerate "space forever, never commit"
> policy that *technically* won but barely attacked, so the trainer now selects the
> checkpoint on **win-rate + in-range attack-rate**, and a non-attacking policy can no
> longer be saved. (`behavior()` in `train.py` measures this directly.)

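The checkpoint rule in the note above could look roughly like this. It is a sketch under assumptions: the `Candidate` record, the `select_checkpoint` helper, and the `MIN_ATTACK_RATE` cutoff are hypothetical names and values, and the real `behavior()` in `train.py` may measure or combine the two metrics differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    win_rate: float      # fraction of evaluation duels won
    attack_rate: float   # fraction of in-range decisions that attack


MIN_ATTACK_RATE = 0.5  # hypothetical cutoff below which a policy is never saved


def score(c: Candidate) -> float:
    """Selection metric from the note: win-rate + in-range attack-rate."""
    return c.win_rate + c.attack_rate


def select_checkpoint(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Pick the best-scoring candidate that actually attacks, or None."""
    eligible = [c for c in candidates if c.attack_rate >= MIN_ATTACK_RATE]
    return max(eligible, key=score, default=None)


spacer = Candidate("space-forever", win_rate=0.90, attack_rate=0.05)
presser = Candidate("pressure", win_rate=0.60, attack_rate=0.90)
best = select_checkpoint([spacer, presser])
print(best.name if best else "nothing saved")
```

With these example numbers the passive policy wins more often but is filtered out, so the attacking policy is the one that gets saved.
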
## It learns from your fights (online finetuning)

The model is tiny, so a gradient step takes microseconds, and the boss finetunes from real
play **on the free CPU**. Each HARD-tier fight is logged (state, action, HP per boss
decision) and POSTed to `/learn`; the server rebuilds per-decision rewards (damage
dealt − taken, + kill / − death), computes REINFORCE returns, and takes **one Adam
step** ([`mm_grad.py`](mm_grad.py), numpy backprop verified against PyTorch to ~1e-8).
A frozen copy of the sim-trained weights anchors the update so it can't drift into
nonsense; the adapted weights feed straight back into the live boss.

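As a rough illustration of the server-side steps just described, the sketch below rebuilds rewards from an HP log, turns them into discounted returns, and adds an anchor term to a gradient. Several things here are assumptions rather than facts about `online.py` or `mm_grad.py`: the function names, the bonus values, the discount `gamma`, the HP-log layout (one entry per decision plus a final entry), and the use of an L2 pull toward the frozen weights as the anchor. The Adam step itself is omitted.

```python
import numpy as np


def rewards_from_hp(boss_hp, player_hp, kill_bonus=5.0, death_penalty=5.0):
    """Per-decision reward: damage dealt minus damage taken, plus end bonus.

    Both logs hold the HP before each decision and one final entry, so a
    fight with T decisions gives T + 1 values per log (an assumed layout).
    """
    boss_hp = np.asarray(boss_hp, dtype=float)
    player_hp = np.asarray(player_hp, dtype=float)
    dealt = player_hp[:-1] - player_hp[1:]   # damage the boss dealt
    taken = boss_hp[:-1] - boss_hp[1:]       # damage the boss took
    rewards = dealt - taken
    if player_hp[-1] <= 0:
        rewards[-1] += kill_bonus
    if boss_hp[-1] <= 0:
        rewards[-1] -= death_penalty
    return rewards


def discounted_returns(rewards, gamma=0.97):
    """REINFORCE return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def anchored_gradient(policy_grad, weights, frozen_weights, strength=0.1):
    """Add the gradient of 0.5 * strength * ||w - w_frozen||^2 (assumed anchor)."""
    return policy_grad + strength * (weights - frozen_weights)


rewards = rewards_from_hp(boss_hp=[100, 100, 90], player_hp=[50, 30, 0])
returns = discounted_returns(rewards, gamma=0.5)
print(rewards, returns)
```

The anchor term grows with the distance from the frozen weights, so a single noisy fight can only move the live weights a limited amount.
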
- **On by default, in-memory.** Set `MM_ONLINE=0` to disable.
- **Persistent across restarts:** add Space secrets `HF_TOKEN` (write) and
  `MM_DATASET_REPO` (e.g. `you/boss-fight-online`) and the adapted weights are pushed
  to / pulled from that Dataset. Only HARD-tier fights train (keeps the data on-policy).

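One way the three settings above might be read at startup is sketched here. The `OnlineConfig` class and `load_online_config` helper are hypothetical; only the variable names `MM_ONLINE`, `HF_TOKEN`, and `MM_DATASET_REPO` come from the text, and the actual parsing in `online.py` may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OnlineConfig:
    enabled: bool                 # online finetuning on or off
    persistent: bool              # push / pull weights via a HF Dataset
    dataset_repo: Optional[str]   # e.g. "you/boss-fight-online"


def load_online_config(env: Mapping[str, str] = os.environ) -> OnlineConfig:
    """Online learning is on unless MM_ONLINE=0; persistence needs both secrets."""
    enabled = env.get("MM_ONLINE", "1") != "0"
    token = env.get("HF_TOKEN")
    repo = env.get("MM_DATASET_REPO")
    persistent = enabled and bool(token) and bool(repo)
    return OnlineConfig(enabled, persistent, repo if persistent else None)


print(load_online_config({}))
print(load_online_config({"HF_TOKEN": "hf_xxx", "MM_DATASET_REPO": "you/boss-fight-online"}))
```
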
## Difficulty = the trained brain's decision-noise

The difficulty selector doesn't change the boss's stats. It runs the **same trained
Modular Mind at a different *mistake-rate*** (`explore` = probability of a random legal
action). Against a near-optimal scripted dodger:

| Tier | Mistake-rate | Boss win-rate | Feel |
|------|------|------|------|
| **Easy** | 50% | ~0.35 | erratic, leaves big openings; beatable |
| **Normal** | 22% | ~0.65 | competent pressure, occasional slip |
| **Hard** | 4% | ~0.95 | closes in, blocks your punish, relentless |

The browser sends the chosen tier with every decision; the server routes it through
`modular_mind.decide` with that tier's mistake-rate. *(Originally Easy/Normal were
genuinely undertrained checkpoints, but once the boss learned to BLOCK it dominated
the sim almost immediately, so there's no longer a weak checkpoint; decision-noise is
the controllable, honest dial. `train.py` still snapshots checkpoints if you want them.)*

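The mistake-rate dial can be pictured as follows. This is a minimal sketch, not the body of `modular_mind.decide`: the `decide_with_noise` function and its arguments are hypothetical, and it assumes the trained brain otherwise picks its highest-scoring legal action. The three rates are the ones from the table.

```python
import random

MISTAKE_RATE = {"easy": 0.50, "normal": 0.22, "hard": 0.04}  # from the table


def decide_with_noise(scores, legal_actions, tier, rng):
    """With probability `explore`, pick a random legal action; else the best one.

    scores: dict mapping action name -> score from the trained brain.
    """
    explore = MISTAKE_RATE[tier]
    if rng.random() < explore:
        return rng.choice(legal_actions)
    return max(legal_actions, key=lambda a: scores[a])


rng = random.Random(0)
scores = {"CLEAVE": 2.0, "APPROACH": 0.5, "RETREAT": -1.0}
legal = ["CLEAVE", "APPROACH", "RETREAT"]
picks = [decide_with_noise(scores, legal, "hard", rng) for _ in range(2000)]
print(picks.count("CLEAVE") / len(picks))
```

Note that a random pick can still land on the best action, so the effective error rate is slightly lower than the nominal mistake-rate.
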
## Controls

| Key | Action |
|-----|--------|
| ← → | Move |
| Space | Roll / dodge (i-frames, costs stamina) |
| J | Attack |
| K (hold) | **Block**: cuts incoming damage to 20% and drains stamina; if stamina hits 0 your guard **breaks** into a stagger |
| M / 🔊 | Mute / unmute music + SFX |

Background music and combat SFX play during the fight (a random track per fight,
plus per-action sound effects). Audio is served statically from `audio/`.

You begin each fight with a one-time **Aegis shield** (a cyan bubble + the *Aegis*
HUD bar): for the first few seconds it absorbs **all** damage so you can learn the
controls. Once it fades it does **not** come back.

> Click **Enter the Fog**, then click the game once so it has keyboard focus.

## Architecture / files

| File | Role |
|------|------|
| `app.py` | Gradio Space: serves the game, exposes the trained brain at `/decide` |
| `modular_mind.py` | **numpy** inference of the trained Modular Mind (no torch at runtime) |
| `mm_torch.py` | the trainable Modular Mind (specialists + RecursiveLink + coordinator) |
| `train.py` | self-play **REINFORCE** trainer → `mm_weights.npz` |
| `mm_grad.py` | pure-numpy forward+backward (REINFORCE gradient), verified vs torch (the online learner) |
| `online.py` | finetunes the HARD brain from real player fights; optional HF-Dataset persistence |
| `duel_sim.py` | the headless duel simulator (the RL environment) |
| `features.py` | shared feature/action definitions (single source of truth) |
| `web/` | the HTML5 canvas game (60fps render; calls `/decide` at decision points) |
| `audio/` | background music (`mp3/`) and sound effects (`sfx/`), served statically |
| `serve_local.py` | run the whole thing locally **without gradio** (stdlib + numpy) |

The model is **tiny** (~4.5k parameters) and inference is pure numpy, so the Space
needs only `gradio` + `numpy` and starts instantly.

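To check a parameter count like the one quoted above, one could sum the array sizes stored in an `.npz` archive. The sketch below is a generic numpy example, not code from this repo: `count_parameters` is a hypothetical helper, and the demo archive with `W1` and `b1` is made up, since the array names inside `mm_weights.npz` are not given here.

```python
import io

import numpy as np


def count_parameters(npz_file) -> int:
    """Sum the element counts of every array stored in an .npz archive."""
    with np.load(npz_file) as data:
        return sum(int(data[name].size) for name in data.files)


# Demo on an in-memory archive with made-up arrays (8*6 weights + 6 biases).
buffer = io.BytesIO()
np.savez(buffer, W1=np.zeros((8, 6)), b1=np.zeros(6))
buffer.seek(0)
demo_total = count_parameters(buffer)
print(demo_total)
```

Passing a path such as `"mm_weights.npz"` instead of the buffer would work the same way, assuming the file holds only weight arrays.
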
## Run / retrain locally

```bash
pip install -r requirements.txt   # gradio + numpy (runtime)
python app.py                     # the Space, locally
# or, with no gradio at all:
python serve_local.py             # http://localhost:7861

# retrain the boss (needs torch):
pip install torch
python train.py                   # -> mm_weights.npz (+ train_log.json)
```

## Credits

- **Sprites:** *Fire Knight* and *Demon Slime* free asset packs by **LuizMelo**
  (itch.io). Please check their licenses for your own use.
- **Audio:** background music tracks (`audio/mp3`) and combat SFX (`audio/sfx`).
  Check the licenses of your audio packs before publishing.
- **Brain:** the *Modular Mind* concept (latent-communicating specialists via
  `RecursiveLink`), trained here at specialist scale by self-play RL.

Built for a Hugging Face hackathon.