ModuleMind

Quazim0t0 committed 6 days ago

Commit

cd0afc9

verified

1 Parent(s): 1b36a6d

Delete web/README.md

Files changed (1)

web/README.md +0 -168

web/README.md DELETED

@@ -1,168 +0,0 @@
----
-title: "Quazim0t0's 🍄 Thousand Token Wood Entry"
-emoji: 🍄
-colorFrom: green
-colorTo: indigo
-sdk: gradio
-sdk_version: 6.15.2
-app_file: app.py
-pinned: false
-license: mit
----
-# ⚔️ Modular Mind: Boss Fight
-A mini **Dark-Souls-style** duel where the boss is controlled by a **Modular Mind** —
-a handful of tiny neural *specialists* that communicate through a **shared latent**
-(a `RecursiveLink` bridge) and a *coordinator* that reads the latent to choose the
-boss's next move. **The brain was trained by self-play reinforcement learning** — its
-tactics emerged from playing thousands of duels, nothing is scripted.
-You play the **Fire Knight**. Defeat the **Demon Slime**.
-## How the Modular Mind works
-This is the [ModularMind-on-V2](https://github.com/your-username/ModularMind) concept at *specialist scale*: instead of one
-monolithic policy, six small networks each handle one concern and talk to each other
-through a latent channel.
-```
- game state ─▶ ┌──────────────┐   each specialist emits a latent
-               │ Aggressor    │──┐  (LatentProjection) and, if it OWNS
-               │ Stalker      │──┤   an action, a "drive" for that action
-               │ Survivor     │──┤
-               │ Baiter       │──┤        ┌───────────────┐      ┌─────────────┐
-               │ Punisher (M) │──┼─ sum ─▶│ RecursiveLink │─────▶│ Coordinator │─▶ action
-               │ Enrage   (M) │──┘        │ ReGLU+residual│ shared│  read-out   │
-               └──────────────┘           └───────────────┘ latent└─────────────┘
-```
-- **Four action-owning specialists** push their move's score directly:
-  **Aggressor → CLEAVE**, **Stalker → APPROACH**, **Survivor → RETREAT**, **Baiter → IDLE**.
-- **Two modulators (M)** — **Punisher** ("the player is open!") and **Enrage**
-  ("we're low on HP — go berserk") — **own no action**. Their *only* way to affect the
-  fight is the latent they write into the shared `RecursiveLink`, which the coordinator
-  turns into modulation. So training has to *learn to use the latent channel* — the whole
-  point of the architecture.
-The right-hand panel shows all of this live: each specialist's activity, the shared
-latent bridge, and the coordinator's modulation, for every decision the boss makes.
-## What emerged from training
-Trained on a reward that values *dealing damage* and *pressuring in range* over
-playing it safe (landing a cleave ≫ whiffing, and stalling / staying out of range is
-penalised), the boss learned an **aggressive pressure** style:
-- **closes the distance** when you're far or at mid-range,
-- **cleaves on contact** — once you're in range and it's off cooldown it commits to a
-  lunging swing essentially every time (verified: in-range attack-rate ≈ **0.8–1.0**),
-- **retreats only when it can't swing** (mid-cooldown) to reset spacing,
-- **blocks your punish** — a **Defender** specialist raises a guard (negating ~90% of
-  your melee) when you swing at it and it can't cleave back,
-- **punishes your recovery** and gets **even more aggressive at low HP** — the Enrage
-  modulator raises CLEAVE through the shared latent.
-It reaches a **~55–65% win rate** against a near-optimal scripted dodger (avg reward
-+8, up from −12 before the reward was tuned for aggression). Against a human it's a
-fair, readable fight: **dodge the red telegraph, then punish the recovery.**
-> The earlier version of the brain learned a degenerate "space forever, never commit"
-> policy that *technically* won but barely attacked — so the trainer now selects the
-> checkpoint on **win-rate + in-range attack-rate**, and a non-attacking policy can no
-> longer be saved. (`behavior()` in `train.py` measures this directly.)
-## It learns from your fights (online finetuning)
-The model is tiny, so a gradient step is microseconds — the boss finetunes from real
-play **on the free CPU**. Each HARD-tier fight is logged (state, action, HP per boss
-decision) and POSTed to `/learn`; the server rebuilds per-decision rewards (damage
-dealt − taken, + kill / − death), computes REINFORCE returns, and takes **one Adam
-step** ([`mm_grad.py`](mm_grad.py), numpy backprop verified against PyTorch to ~1e-8).
-A frozen copy of the sim-trained weights anchors the update so it can't drift into
-nonsense; the adapted weights feed straight back into the live boss.
-- **On by default, in-memory.** Set `MM_ONLINE=0` to disable.
-- **Persistent across restarts:** add Space secrets `HF_TOKEN` (write) and
-  `MM_DATASET_REPO` (e.g. `you/boss-fight-online`) and the adapted weights are pushed
-  to / pulled from that Dataset. Only HARD-tier fights train (keeps the data on-policy).
-## Difficulty = the trained brain's decision-noise
-The difficulty selector doesn't change the boss's stats — it runs the **same trained
-Modular Mind at a different *mistake-rate*** (`explore` = probability of a random legal
-action). Results against a near-optimal scripted dodger:
-| Tier | Mistake-rate | Boss win-rate | Feel |
-|------|------|------|------|
-| **Easy** | 50% | ~0.35 | erratic, leaves big openings — beatable |
-| **Normal** | 22% | ~0.65 | competent pressure, occasional slip |
-| **Hard** | 4% | ~0.95 | closes in, blocks your punish, relentless |
-The browser sends the chosen tier with every decision; the server routes it through
-`modular_mind.decide` with that tier's mistake-rate. *(Originally Easy/Normal were
-genuinely-undertrained checkpoints — but once the boss learned to BLOCK it dominates
-the sim almost immediately, so there's no longer a weak checkpoint; decision-noise is
-the controllable, honest dial. `train.py` still snapshots checkpoints if you want them.)*
-## Controls
-| Key | Action |
-|-----|--------|
-| ← → | Move |
-| Space | Roll / dodge (i-frames, costs stamina) |
-| J | Attack |
-| K (hold) | **Block** — cuts incoming damage to 20% and drains stamina; if stamina hits 0 your guard **breaks** into a stagger |
-| M / 🔊 | Mute / unmute music + SFX |
-Background music and combat SFX play during the fight (a random track per fight,
-plus per-action sound effects). Audio is served statically from `audio/`.
-You begin each fight with a one-time **Aegis shield** (a cyan bubble + the *Aegis*
-HUD bar): for the first few seconds it absorbs **all** damage so you can learn the
-controls. Once it fades it does **not** come back.
-> Click **Enter the Fog**, then click the game once so it has keyboard focus.
-## Architecture / files
-| File | Role |
-|------|------|
-| `app.py` | Gradio Space: serves the game, exposes the trained brain at `/decide` |
-| `modular_mind.py` | **numpy** inference of the trained Modular Mind (no torch at runtime) |
-| `mm_torch.py` | the trainable Modular Mind (specialists + RecursiveLink + coordinator) |
-| `train.py` | self-play **REINFORCE** trainer → `mm_weights.npz` |
-| `mm_grad.py` | pure-numpy forward+backward (REINFORCE gradient), verified vs torch — the online learner |
-| `online.py` | finetunes the HARD brain from real player fights; optional HF-Dataset persistence |
-| `duel_sim.py` | the headless duel simulator (the RL environment) |
-| `features.py` | shared feature/action definitions (single source of truth) |
-| `web/` | the HTML5 canvas game (60fps render; calls `/decide` at decision points) |
-| `audio/` | background music (`mp3/`) and sound effects (`sfx/`), served statically |
-| `serve_local.py` | run the whole thing locally **without gradio** (stdlib + numpy) |
-The model is **tiny** (~4.5k parameters) and inference is pure numpy, so the Space
-needs only `gradio` + `numpy` and starts instantly.
-## Run / retrain locally
-```bash
-pip install -r requirements.txt          # gradio + numpy (runtime)
-python app.py                            # the Space, locally
-# or, with no gradio at all:
-python serve_local.py                    # http://localhost:7861
-# retrain the boss (needs torch):
-pip install torch
-python train.py                          # -> mm_weights.npz  (+ train_log.json)
-```
-## Credits
-- **Sprites:** *Fire Knight* and *Demon Slime* free asset packs by **LuizMelo**
-  (itch.io). Please check their licenses for your own use.
-- **Audio:** background music tracks (`audio/mp3`) and combat SFX (`audio/sfx`).
-  Check the licenses of your audio packs before publishing.
-- **Brain:** the *Modular Mind* concept (latent-communicating specialists via
-  `RecursiveLink`), trained here at specialist scale by self-play RL.
-Built for a HuggingFace hackathon.
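
For readers of this diff: the deleted README describes difficulty as the same trained policy run at a different mistake-rate (`explore`, the probability of a random legal action). The sketch below illustrates that idea only. It is not the code in `modular_mind.py`: the function name `decide_with_noise`, the `TIER_EXPLORE` table, and the score-dictionary input are all hypothetical. Only the tier percentages and the action names come from the README.

```python
import random

# Mistake-rates per tier, copied from the README table (50%, 22%, 4%).
# The dictionary itself is a hypothetical layout, not the project's.
TIER_EXPLORE = {"easy": 0.50, "normal": 0.22, "hard": 0.04}


def decide_with_noise(scores, legal_actions, tier, rng=random):
    """Return a random legal action with probability `explore`,
    otherwise the legal action with the highest score.

    `scores` maps action name -> score; in this sketch it stands in
    for whatever the coordinator read-out produces.
    """
    if not legal_actions:
        raise ValueError("no legal actions")
    explore = TIER_EXPLORE[tier]
    if rng.random() < explore:
        # The "mistake": ignore the scores entirely.
        return rng.choice(legal_actions)
    # The trained choice: best-scoring action among the legal ones.
    return max(legal_actions, key=lambda a: scores[a])


if __name__ == "__main__":
    scores = {"CLEAVE": 2.0, "APPROACH": 0.5, "RETREAT": -1.0, "IDLE": 0.0}
    legal = ["APPROACH", "CLEAVE", "IDLE"]
    rng = random.Random(0)  # seeded so repeated runs match
    print([decide_with_noise(scores, legal, "easy", rng) for _ in range(5)])
```

Passing the random source in as `rng` keeps the function easy to test with a seeded or stubbed generator. Whether the real server does the same is not stated in the README.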