Quazim0t0 committed on
Commit cd0afc9 · verified · 1 Parent(s): 1b36a6d

Delete web/README.md

Files changed (1)
  1. web/README.md +0 -168
web/README.md DELETED
@@ -1,168 +0,0 @@
- ---
- title: "Quazim0t0's 🍄 Thousand Token Wood Entry"
- emoji: 🍄
- colorFrom: green
- colorTo: indigo
- sdk: gradio
- sdk_version: 6.15.2
- app_file: app.py
- pinned: false
- license: mit
- ---
-
- # ⚔️ Modular Mind: Boss Fight
-
- A mini **Dark-Souls-style** duel where the boss is controlled by a **Modular Mind**:
- a handful of tiny neural *specialists* that communicate through a **shared latent**
- (a `RecursiveLink` bridge) and a *coordinator* that reads the latent to choose the
- boss's next move. **The brain was trained by self-play reinforcement learning**: its
- tactics emerged from playing thousands of duels; nothing is scripted.
-
- You play the **Fire Knight**. Defeat the **Demon Slime**.
-
- ## How the Modular Mind works
-
- This is the [ModularMind-on-V2](https://github.com/your-username/ModularMind) concept at *specialist scale*: instead of one
- monolithic policy, six small networks each handle one concern and talk to each other
- through a latent channel.
-
- ```
- game state ─▶ ┌──────────────┐      each specialist emits a latent
-               │ Aggressor    │──┐   (LatentProjection) and, if it OWNS
-               │ Stalker      │──┤   an action, a "drive" for that action
-               │ Survivor     │──┤
-               │ Baiter       │──┤        ┌───────────────┐       ┌─────────────┐
-               │ Punisher (M) │──┼─ sum ─▶│ RecursiveLink │──────▶│ Coordinator │─▶ action
-               │ Enrage (M)   │──┘        │ ReGLU+residual│ shared│ read-out    │
-               └──────────────┘           └───────────────┘ latent└─────────────┘
- ```
-
- - **Four action-owning specialists** push their move's score directly:
-   **Aggressor → CLEAVE**, **Stalker → APPROACH**, **Survivor → RETREAT**, **Baiter → IDLE**.
- - **Two modulators (M)**, **Punisher** ("the player is open!") and **Enrage**
-   ("we're low on HP, go berserk"), **own no action**. Their *only* way to affect the
-   fight is the latent they write into the shared `RecursiveLink`, which the coordinator
-   turns into modulation. So training has to *learn to use the latent channel*, which is
-   the whole point of the architecture.
-
- The right-hand panel shows all of this live: each specialist's activity, the shared
- latent bridge, and the coordinator's modulation, for every decision the boss makes.
-
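The data flow described above (specialists write latents, the bridge sums them and applies ReGLU with a residual, the coordinator reads the shared latent out as per-action modulation) can be sketched in a few lines of plain Python. This is only an illustration, not the code in `modular_mind.py`: the latent width, the three-feature state, the random weights, and the names `make_mind` and `decide` are all assumptions made for the example.

```python
import random

ACTIONS = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
# Which action each specialist owns; the two modulators own none.
SPECIALISTS = {"Aggressor": "CLEAVE", "Stalker": "APPROACH", "Survivor": "RETREAT",
               "Baiter": "IDLE", "Punisher": None, "Enrage": None}
LATENT = 4  # hypothetical latent width


def matvec(W, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_mind(n_features, seed=0):
    """Build a toy mind with random (untrained) weights."""
    rng = random.Random(seed)
    return {
        "proj": {name: rand_matrix(rng, LATENT, n_features) for name in SPECIALISTS},
        "drive": {name: [rng.uniform(-0.5, 0.5) for _ in range(n_features)]
                  for name, owned in SPECIALISTS.items() if owned is not None},
        "gate": rand_matrix(rng, LATENT, LATENT),
        "value": rand_matrix(rng, LATENT, LATENT),
        "readout": rand_matrix(rng, len(ACTIONS), LATENT),
    }


def decide(mind, state):
    # 1. Every specialist (owners and modulators) writes a latent; the bridge sums them.
    summed = [0.0] * LATENT
    for name in SPECIALISTS:
        latent = matvec(mind["proj"][name], state)
        summed = [s + z for s, z in zip(summed, latent)]
    # 2. ReGLU with a residual: shared = summed + value(summed) * relu(gate(summed)).
    gate = [max(0.0, g) for g in matvec(mind["gate"], summed)]
    value = matvec(mind["value"], summed)
    shared = [s + v * g for s, v, g in zip(summed, value, gate)]
    # 3. The coordinator reads the shared latent out as one modulation per action,
    #    which is added to the owning specialist's direct drive.
    modulation = matvec(mind["readout"], shared)
    scores = {}
    for name, owned in SPECIALISTS.items():
        if owned is not None:
            drive = sum(w * x for w, x in zip(mind["drive"][name], state))
            scores[owned] = drive + modulation[ACTIONS.index(owned)]
    return max(scores, key=scores.get), scores


mind = make_mind(n_features=3)
action, scores = decide(mind, [0.2, 1.0, 0.5])  # made-up state features
print(action, scores)
```

In this sketch the Punisher and Enrage weights appear only in step 1, so the only path by which they can change the chosen action runs through the summed latent and the coordinator's read-out.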
- ## What emerged from training
-
- Trained on a reward that values *dealing damage* and *pressuring in range* over
- playing it safe (landing a cleave ≫ whiffing, and stalling / staying out of range is
- penalised), the boss learned an **aggressive pressure** style:
-
- - **closes the distance** when you're far or at mid-range,
- - **cleaves on contact**: once you're in range and it's off cooldown it commits to a
-   lunging swing essentially every time (verified: in-range attack-rate ≈ **0.8–1.0**),
- - **retreats only when it can't swing** (mid-cooldown) to reset spacing,
- - **blocks your punish**: a **Defender** specialist raises a guard (negating ~90% of
-   your melee) when you swing at it and it can't cleave back,
- - **punishes your recovery** and gets **even more aggressive at low HP**: the Enrage
-   modulator raises CLEAVE through the shared latent.
-
- It reaches a **~55–65% win rate** against a near-optimal scripted dodger (avg reward
- +8, up from −12 before the reward was tuned for aggression). Against a human it's a
- fair, readable fight: **dodge the red telegraph, then punish the recovery.**
-
- > The earlier version of the brain learned a degenerate "space forever, never commit"
- > policy that *technically* won but barely attacked, so the trainer now selects the
- > checkpoint on **win-rate + in-range attack-rate**, and a non-attacking policy can no
- > longer be saved. (`behavior()` in `train.py` measures this directly.)
-
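The checkpoint rule in the note above might look roughly like the following. This is a sketch under stated assumptions rather than the contents of `train.py`: the decision-log layout, the `min_attack_rate` threshold, and both function names are hypothetical, and the real `behavior()` may measure the attack rate differently.

```python
def in_range_attack_rate(decisions):
    """Fraction of 'could have swung' decisions where the boss chose CLEAVE.

    decisions: iterable of (in_range, off_cooldown, action) tuples (assumed layout).
    """
    chances = [action for in_range, ready, action in decisions if in_range and ready]
    if not chances:
        return 0.0
    return sum(action == "CLEAVE" for action in chances) / len(chances)


def select_checkpoint(candidates, min_attack_rate=0.5):
    """Pick the best checkpoint by win-rate + attack-rate, refusing passive policies.

    candidates: list of dicts with 'win_rate' and 'attack_rate' keys.
    min_attack_rate is an illustrative threshold, not a value from the project.
    """
    eligible = [c for c in candidates if c["attack_rate"] >= min_attack_rate]
    if not eligible:
        return None  # nothing is saved if every candidate refuses to attack
    return max(eligible, key=lambda c: c["win_rate"] + c["attack_rate"])


candidates = [
    {"name": "spacer", "win_rate": 0.90, "attack_rate": 0.05},   # wins by stalling
    {"name": "presser", "win_rate": 0.60, "attack_rate": 0.90},  # wins by fighting
]
print(select_checkpoint(candidates)["name"])
```

The hard floor on attack rate is what keeps a high-win-rate but passive policy from being selected; the summed score only ranks the policies that actually swing.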
- ## It learns from your fights (online finetuning)
-
- The model is tiny, so a gradient step takes microseconds, and the boss finetunes from real
- play **on the free CPU**. Each HARD-tier fight is logged (state, action, HP per boss
- decision) and POSTed to `/learn`; the server rebuilds per-decision rewards (damage
- dealt − taken, + kill / − death), computes REINFORCE returns, and takes **one Adam
- step** ([`mm_grad.py`](mm_grad.py), numpy backprop verified against PyTorch to ~1e-8).
- A frozen copy of the sim-trained weights anchors the update so it can't drift into
- nonsense; the adapted weights feed straight back into the live boss.
-
- - **On by default, in-memory.** Set `MM_ONLINE=0` to disable.
- - **Persistent across restarts:** add Space secrets `HF_TOKEN` (write) and
-   `MM_DATASET_REPO` (e.g. `you/boss-fight-online`) and the adapted weights are pushed
-   to / pulled from that Dataset. Only HARD-tier fights train (keeps the data on-policy).
-
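The reward reconstruction and return computation described above can be sketched as follows. The log layout (one `(boss_hp, player_hp)` pair per boss decision, preceded by the starting HP), the bonus sizes, the discount `gamma`, and the function names are assumptions for illustration; the actual `online.py` and `mm_grad.py` may differ.

```python
def rebuild_rewards(hp_log, kill_bonus=1.0, death_penalty=1.0):
    """Per-decision reward: damage dealt minus damage taken, plus terminal bonuses.

    hp_log: [(boss_hp, player_hp), ...] with the starting HP as the first entry.
    """
    rewards = []
    for (boss_before, player_before), (boss_after, player_after) in zip(hp_log, hp_log[1:]):
        dealt = player_before - player_after
        taken = boss_before - boss_after
        reward = dealt - taken
        if player_after <= 0:
            reward += kill_bonus      # the boss killed the player
        if boss_after <= 0:
            reward -= death_penalty   # the boss died
        rewards.append(reward)
    return rewards


def discounted_returns(rewards, gamma=0.99):
    """REINFORCE return G_t = r_t + gamma * G_{t+1}, computed back to front."""
    returns, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


hp_log = [(100, 100), (100, 80), (90, 80), (90, 0)]
rewards = rebuild_rewards(hp_log)
print(rewards, discounted_returns(rewards, gamma=0.5))
```

Each return would then weight the log-probability gradient of the action taken at that step before the single Adam update.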
- ## Difficulty = the trained brain's decision-noise
-
- The difficulty selector doesn't change the boss's stats; it runs the **same trained
- Modular Mind at a different *mistake-rate*** (`explore` = probability of a random legal
- action). Against a near-optimal scripted dodger:
-
- | Tier | Mistake-rate | Boss win-rate | Feel |
- |------|------|------|------|
- | **Easy** | 50% | ~0.35 | erratic, leaves big openings, beatable |
- | **Normal** | 22% | ~0.65 | competent pressure, occasional slip |
- | **Hard** | 4% | ~0.95 | closes in, blocks your punish, relentless |
-
- The browser sends the chosen tier with every decision; the server routes it through
- `modular_mind.decide` with that tier's mistake-rate. *(Originally Easy/Normal were
- genuinely undertrained checkpoints, but once the boss learned to BLOCK it dominates
- the sim almost immediately, so there's no longer a weak checkpoint; decision-noise is
- the controllable, honest dial. `train.py` still snapshots checkpoints if you want them.)*
-
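A minimal sketch of the mistake-rate dial, using the rates from the table above. The function name `choose_action` and its arguments are hypothetical; the real `modular_mind.decide` signature is not shown in this README.

```python
import random

# Mistake-rates taken from the difficulty table.
MISTAKE_RATE = {"easy": 0.50, "normal": 0.22, "hard": 0.04}


def choose_action(best_action, legal_actions, tier, rng=random):
    """With probability `explore`, play a random legal action; otherwise play the
    trained brain's preferred action."""
    explore = MISTAKE_RATE[tier]
    if rng.random() < explore:
        return rng.choice(legal_actions)
    return best_action


legal = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
print(choose_action("CLEAVE", legal, "normal"))
```

Because the random pick can land on the preferred action anyway, the observed rate of actual deviations is somewhat lower than the nominal mistake-rate.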
- ## Controls
-
- | Key | Action |
- |-----|--------|
- | ← → | Move |
- | Space | Roll / dodge (i-frames, costs stamina) |
- | J | Attack |
- | K (hold) | **Block**: cuts incoming damage to 20% and drains stamina; if stamina hits 0 your guard **breaks** into a stagger |
- | M / 🔊 | Mute / unmute music + SFX |
-
- Background music and combat SFX play during the fight (a random track per fight,
- plus per-action sound effects). Audio is served statically from `audio/`.
-
- You begin each fight with a one-time **Aegis shield** (a cyan bubble + the *Aegis*
- HUD bar): for the first few seconds it absorbs **all** damage so you can learn the
- controls. Once it fades it does **not** come back.
-
- > Click **Enter the Fog**, then click the game once so it has keyboard focus.
-
- ## Architecture / files
-
- | File | Role |
- |------|------|
- | `app.py` | Gradio Space: serves the game, exposes the trained brain at `/decide` |
- | `modular_mind.py` | **numpy** inference of the trained Modular Mind (no torch at runtime) |
- | `mm_torch.py` | the trainable Modular Mind (specialists + RecursiveLink + coordinator) |
- | `train.py` | self-play **REINFORCE** trainer → `mm_weights.npz` |
- | `mm_grad.py` | pure-numpy forward+backward (REINFORCE gradient), verified vs torch; the online learner |
- | `online.py` | finetunes the HARD brain from real player fights; optional HF-Dataset persistence |
- | `duel_sim.py` | the headless duel simulator (the RL environment) |
- | `features.py` | shared feature/action definitions (single source of truth) |
- | `web/` | the HTML5 canvas game (60fps render; calls `/decide` at decision points) |
- | `audio/` | background music (`mp3/`) and sound effects (`sfx/`), served statically |
- | `serve_local.py` | run the whole thing locally **without gradio** (stdlib + numpy) |
-
- The model is **tiny** (~4.5k parameters) and inference is pure numpy, so the Space
- needs only `gradio` + `numpy` and starts instantly.
-
- ## Run / retrain locally
-
- ```bash
- pip install -r requirements.txt  # gradio + numpy (runtime)
- python app.py                    # the Space, locally
- # or, with no gradio at all:
- python serve_local.py            # http://localhost:7861
-
- # retrain the boss (needs torch):
- pip install torch
- python train.py                  # -> mm_weights.npz (+ train_log.json)
- ```
-
- ## Credits
-
- - **Sprites:** *Fire Knight* and *Demon Slime* free asset packs by **LuizMelo**
-   (itch.io). Please check their licenses for your own use.
- - **Audio:** background music tracks (`audio/mp3`) and combat SFX (`audio/sfx`).
-   Check the licenses of your audio packs before publishing.
- - **Brain:** the *Modular Mind* concept (latent-communicating specialists via
-   `RecursiveLink`), trained here at specialist scale by self-play RL.
-
- Built for a Hugging Face hackathon.