---
title: Modular Mind 🧠
emoji: 💭
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
short_description: We modulate the mind and communicate!
license: mit
tags:
  - track:wood
  - sponsor:modal
  - achievement:offgrid
  - achievement:welltuned
---
Social Media Post and Demo Video: https://www.linkedin.com/posts/dean-byrne-02a28b191_modular-mind-boss-fight-for-hugging-face-ugcPost-7472410483615084544-yUeC/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAC0RumIBxlIKTkKv5tF-hb2OU7TdZ19kxcQ

Model weights are in the Space's directory.

# ⚔️ Modular Mind: Boss Fight

A mini **Dark-Souls-style** duel where the boss is controlled by a **Modular Mind**:
a handful of tiny neural *specialists* that communicate through a **shared latent**
(a `RecursiveLink` bridge) and a *coordinator* that reads the latent to choose the
boss's next move. **The brain was trained by self-play reinforcement learning**: its
tactics emerged from playing thousands of duels, and nothing is scripted.

You play the **Fire Knight**. Defeat the **Demon Slime**.

## How the Modular Mind works

This is the [ModularMind-on-V2](https://github.com/your-username/ModularMind) concept at *specialist scale*: instead of one
monolithic policy, six small networks each handle one concern and talk to each other
through a latent channel.

```
 game state ─▶ ┌──────────────┐   each specialist emits a latent
               │ Aggressor    │──┐  (LatentProjection) and, if it OWNS
               │ Stalker      │──┤  an action, a "drive" for that action
               │ Survivor     │──┤
               │ Baiter       │──┤        ┌───────────────┐       ┌─────────────┐
               │ Punisher (M) │──┼─ sum ─▶│ RecursiveLink │──────▶│ Coordinator │─▶ action
               │ Enrage   (M) │──┘        │ ReGLU+residual│ shared│  read-out   │
               └──────────────┘           └───────────────┘ latent└─────────────┘
```

- **Four action-owning specialists** push their move's score directly:
  **Aggressor → CLEAVE**, **Stalker → APPROACH**, **Survivor → RETREAT**, **Baiter → IDLE**.
- **Two modulators (M)**, **Punisher** ("the player is open!") and **Enrage**
  ("we're low on HP, go berserk"), **own no action**. Their *only* way to affect the
  fight is the latent they write into the shared `RecursiveLink`, which the coordinator
  turns into modulation. So training has to *learn to use the latent channel*, which is
  the whole point of the architecture.

The right-hand panel shows all of this live: each specialist's activity, the shared
latent bridge, and the coordinator's modulation, for every decision the boss makes.
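
The data flow in the diagram can be sketched in a few lines of numpy. This is a minimal illustration, not the code in `mm_torch.py` or `modular_mind.py`: the dimensions, the random weights, the `tanh` projection, the additive modulation, and every name below (`decide_sketch`, `params`, `link`, `readout`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, LATENT_DIM = 8, 16  # assumed sizes, for illustration only
ACTIONS = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
# Specialist -> index of the action it owns (None marks a modulator).
SPECIALISTS = {"Aggressor": 0, "Stalker": 1, "Survivor": 2, "Baiter": 3,
               "Punisher": None, "Enrage": None}


def init(shape):
    """Small random weights; a trained model would load real ones."""
    return rng.normal(0.0, 0.1, size=shape)


params = {
    name: {
        "latent": init((STATE_DIM, LATENT_DIM)),
        "drive": init(STATE_DIM) if owned is not None else None,
    }
    for name, owned in SPECIALISTS.items()
}
link = {"gate": init((LATENT_DIM, LATENT_DIM)),
        "value": init((LATENT_DIM, LATENT_DIM))}
readout = init((LATENT_DIM, len(ACTIONS)))


def decide_sketch(state):
    """One boss decision: specialists -> summed latent -> link -> read-out."""
    latents = []
    scores = np.zeros(len(ACTIONS))
    for name, owned in SPECIALISTS.items():
        p = params[name]
        latents.append(np.tanh(state @ p["latent"]))  # every specialist writes a latent
        if owned is not None:                          # only owners push a drive
            scores[owned] += state @ p["drive"]
    summed = np.sum(latents, axis=0)
    # ReGLU (relu(xW) * (xV)) with a residual connection back to the input.
    shared = summed + np.maximum(summed @ link["gate"], 0.0) * (summed @ link["value"])
    modulation = shared @ readout                      # coordinator reads the shared latent
    return ACTIONS[int(np.argmax(scores + modulation))], shared, modulation


action, shared, modulation = decide_sketch(np.ones(STATE_DIM))
print(action, shared.shape, modulation.shape)
```

In this sketch Punisher and Enrage can only change the chosen action through `shared`, which mirrors the constraint described above.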

## What emerged from training

Trained on a reward that values *dealing damage* and *pressuring in range* over
playing it safe (landing a cleave ≫ whiffing, and stalling / staying out of range is
penalised), the boss learned an **aggressive pressure** style:

- **closes the distance** when you're far or at mid-range,
- **cleaves on contact**: once you're in range and it's off cooldown, it commits to a
  lunging swing essentially every time (verified: in-range attack-rate ≈ **0.8–1.0**),
- **retreats only when it can't swing** (mid-cooldown) to reset spacing,
- **blocks your punish**: a **Defender** specialist raises a guard (negating ~90% of
  your melee) when you swing at it and it can't cleave back,
- **punishes your recovery** and gets **even more aggressive at low HP**: the Enrage
  modulator raises CLEAVE through the shared latent.

It reaches a **~55–65% win rate** against a near-optimal scripted dodger (avg reward
+8, up from −12 before the reward was tuned for aggression). Against a human it's a
fair, readable fight: **dodge the red telegraph, then punish the recovery.**

> The earlier version of the brain learned a degenerate "space forever, never commit"
> policy that *technically* won but barely attacked, so the trainer now selects the
> checkpoint on **win-rate + in-range attack-rate**, and a non-attacking policy can no
> longer be saved. (`behavior()` in `train.py` measures this directly.)
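
One way to picture that selection rule is a gate followed by a ranking. The sketch below is hypothetical and is not the logic in `train.py`: the function name `select_checkpoint`, the `min_attack_rate` threshold, the dictionary keys, and the sample numbers are all invented for illustration.

```python
def select_checkpoint(candidates, min_attack_rate=0.5):
    """Pick the best checkpoint among those that actually attack.

    candidates: list of dicts with "name", "win_rate" and "attack_rate"
    (in-range attack-rate). The 0.5 threshold is an assumed value.
    Returns None when every candidate is too passive to be saved.
    """
    eligible = [c for c in candidates if c["attack_rate"] >= min_attack_rate]
    if not eligible:
        return None
    # Rank by win-rate first, then by attack-rate as a tie-breaker.
    return max(eligible, key=lambda c: (c["win_rate"], c["attack_rate"]))


# Made-up numbers: the passive policy wins more often but is rejected.
candidates = [
    {"name": "passive", "win_rate": 0.90, "attack_rate": 0.05},
    {"name": "aggressive", "win_rate": 0.60, "attack_rate": 0.90},
]
best = select_checkpoint(candidates)
print(best["name"])
```

With a gate like this, a "space forever" policy is filtered out before win-rate is even compared.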

## It learns from your fights (online finetuning)

The model is tiny, so a gradient step takes microseconds, and the boss finetunes from
real play **on the free CPU**. Each HARD-tier fight is logged (state, action, HP per boss
decision) and POSTed to `/learn`; the server rebuilds per-decision rewards (damage
dealt − damage taken, + a bonus for a kill, − a penalty for a death), computes REINFORCE
returns, and takes **one Adam step** ([`mm_grad.py`](mm_grad.py), numpy backprop verified
against PyTorch to ~1e-8). A frozen copy of the sim-trained weights anchors the update
so it can't drift into nonsense; the adapted weights feed straight back into the live boss.
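
The reward and return steps can be sketched as follows. This is an illustration under stated assumptions, not the code in `online.py`: the log layout (one HP sample before the first decision and one after each decision), the bonus and penalty sizes, the discount `gamma`, and both function names are hypothetical.

```python
import numpy as np


def rewards_from_log(boss_hp, player_hp, kill_bonus=1.0, death_penalty=1.0):
    """Per-decision reward from the boss's point of view.

    boss_hp / player_hp hold N+1 samples for N decisions (assumed layout).
    Reward = damage dealt to the player minus damage taken by the boss,
    with a terminal bonus for a kill and a penalty for dying.
    """
    boss_hp = np.asarray(boss_hp, dtype=float)
    player_hp = np.asarray(player_hp, dtype=float)
    dealt = player_hp[:-1] - player_hp[1:]
    taken = boss_hp[:-1] - boss_hp[1:]
    rewards = dealt - taken
    if player_hp[-1] <= 0:
        rewards[-1] += kill_bonus
    if boss_hp[-1] <= 0:
        rewards[-1] -= death_penalty
    return rewards


def discounted_returns(rewards, gamma=0.99):
    """REINFORCE return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = np.zeros_like(rewards, dtype=float)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Two decisions: the boss deals 2 then 3 damage, takes 0 then 2, and kills the player.
r = rewards_from_log(boss_hp=[10, 10, 8], player_hp=[5, 3, 0])
print(r, discounted_returns(r, gamma=0.5))
```

Each return then weights the log-probability gradient of the action taken at that decision, which is the quantity a REINFORCE update steps along.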

- **On by default, in-memory.** Set `MM_ONLINE=0` to disable.
- **Persistent across restarts:** add Space secrets `HF_TOKEN` (write) and
  `MM_DATASET_REPO` (e.g. `you/boss-fight-online`) and the adapted weights are pushed
  to / pulled from that Dataset. Only HARD-tier fights train (keeps the data on-policy).

## Difficulty = the trained brain's decision-noise

The difficulty selector doesn't change the boss's stats; it runs the **same trained
Modular Mind at a different *mistake-rate*** (`explore` = probability of a random legal
action). Against a near-optimal scripted dodger:

| Tier | Mistake-rate | Boss win-rate | Feel |
|------|------|------|------|
| **Easy** | 50% | ~0.35 | erratic, leaves big openings; beatable |
| **Normal** | 22% | ~0.65 | competent pressure, occasional slip |
| **Hard** | 4% | ~0.95 | closes in, blocks your punish, relentless |

The browser sends the chosen tier with every decision; the server routes it through
`modular_mind.decide` with that tier's mistake-rate. *(Originally Easy/Normal were
genuinely undertrained checkpoints, but once the boss learned to BLOCK it dominated
the sim almost immediately, so there's no longer a weak checkpoint; decision-noise is
the controllable, honest dial. `train.py` still snapshots checkpoints if you want them.)*
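
The mistake-rate dial amounts to replacing the policy's choice with a random legal action some fraction of the time. A minimal sketch, using the rates from the table above; the function name `noisy_decide`, its signature, and the tier keys are hypothetical and do not describe the real `modular_mind.decide`.

```python
import random

# Mistake-rates taken from the difficulty table.
MISTAKE_RATE = {"easy": 0.50, "normal": 0.22, "hard": 0.04}


def noisy_decide(policy_action, legal_actions, tier, rng=random):
    """Return the policy's action, or a random legal one with probability `explore`."""
    explore = MISTAKE_RATE[tier]
    if rng.random() < explore:
        return rng.choice(legal_actions)
    return policy_action


# The trained policy always wants CLEAVE here; only the noise level differs.
rng = random.Random(0)
legal = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
for tier in MISTAKE_RATE:
    picks = [noisy_decide("CLEAVE", legal, tier, rng) for _ in range(10_000)]
    print(tier, round(picks.count("CLEAVE") / len(picks), 2))
```

Note that a random legal action can coincide with the policy's own choice, so the observed deviation rate in this sketch is somewhat below `explore`.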

## Controls

| Key | Action |
|-----|--------|
| ← → | Move |
| Space | Roll / dodge (i-frames, costs stamina) |
| J | Attack |
| K (hold) | **Block**: cuts incoming damage to 20% and drains stamina; if stamina hits 0 your guard **breaks** into a stagger |
| M / 🔊 | Mute / unmute music + SFX |

Background music and combat SFX play during the fight (a random track per fight,
plus per-action sound effects). Audio is served statically from `audio/`.

You begin each fight with a one-time **Aegis shield** (a cyan bubble + the *Aegis*
HUD bar): for the first few seconds it absorbs **all** damage so you can learn the
controls. Once it fades it does **not** come back.

> Click **Enter the Fog**, then click the game once so it has keyboard focus.

## Architecture / files

| File | Role |
|------|------|
| `app.py` | Gradio Space: serves the game, exposes the trained brain at `/decide` |
| `modular_mind.py` | **numpy** inference of the trained Modular Mind (no torch at runtime) |
| `mm_torch.py` | the trainable Modular Mind (specialists + RecursiveLink + coordinator) |
| `train.py` | self-play **REINFORCE** trainer → `mm_weights.npz` |
| `mm_grad.py` | pure-numpy forward+backward (REINFORCE gradient), verified vs torch; used by the online learner |
| `online.py` | finetunes the HARD brain from real player fights; optional HF-Dataset persistence |
| `duel_sim.py` | the headless duel simulator (the RL environment) |
| `features.py` | shared feature/action definitions (single source of truth) |
| `web/` | the HTML5 canvas game (60fps render; calls `/decide` at decision points) |
| `audio/` | background music (`mp3/`) and sound effects (`sfx/`), served statically |
| `serve_local.py` | run the whole thing locally **without gradio** (stdlib + numpy) |

The model is **tiny** (~4.5k parameters) and inference is pure numpy, so the Space
needs only `gradio` + `numpy` and starts instantly.

## Run / retrain locally

```bash
pip install -r requirements.txt          # gradio + numpy (runtime)
python app.py                            # the Space, locally
# or, with no gradio at all:
python serve_local.py                    # http://localhost:7861

# retrain the boss (needs torch):
pip install torch
python train.py                          # -> mm_weights.npz  (+ train_log.json)
```

## Credits

- **Sprites:** *Fire Knight* and *Demon Slime* free asset packs by **LuizMelo**
  (itch.io). Please check their licenses for your own use.
- **Audio:** background music tracks (`audio/mp3`) and combat SFX (`audio/sfx`).
  Check the licenses of your audio packs before publishing.
- **Brain:** the *Modular Mind* concept (latent-communicating specialists via
  `RecursiveLink`), trained here at specialist scale by self-play RL.

Built for a Hugging Face hackathon.