---
title: Modular Mind 🧠
emoji: 💭
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
short_description: We modulate the mind and communicate!
license: mit
tags:
  - track:wood
  - sponsor:modal
  - achievement:offgrid
  - achievement:welltuned
---

Social media post and demo video: https://www.linkedin.com/posts/dean-byrne-02a28b191_modular-mind-boss-fight-for-hugging-face-ugcPost-7472410483615084544-yUeC/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAC0RumIBxlIKTkKv5tF-hb2OU7TdZ19kxcQ

Model weights are in the Space's directory.

# ⚔️ Modular Mind: Boss Fight

A mini **Dark-Souls-style** duel where the boss is controlled by a **Modular Mind** —
a handful of tiny neural *specialists* that communicate through a **shared latent**
(a `RecursiveLink` bridge) and a *coordinator* that reads the latent to choose the
boss's next move. **The brain was trained by self-play reinforcement learning**: its
tactics emerged from playing thousands of duels; nothing is scripted.

You play the **Fire Knight**. Defeat the **Demon Slime**.

## How the Modular Mind works

This is the [ModularMind-on-V2](https://github.com/your-username/ModularMind) concept at *specialist scale*: instead of one
monolithic policy, six small networks each handle one concern and talk to each other
through a latent channel.

```
 game state ─▶ ┌──────────────┐   each specialist emits a latent
               │ Aggressor    │──┐  (LatentProjection) and, if it OWNS
               │ Stalker      │──┤   an action, a "drive" for that action
               │ Survivor     │──┤
               │ Baiter       │──┤        ┌───────────────┐      ┌─────────────┐
               │ Punisher (M) │──┼─ sum ─▶│ RecursiveLink │─────▶│ Coordinator │─▶ action
               │ Enrage   (M) │──┘        │ ReGLU+residual│ shared│  read-out   │
               └──────────────┘           └───────────────┘ latent└─────────────┘
```

- **Four action-owning specialists** push their move's score directly:
  **Aggressor → CLEAVE**, **Stalker → APPROACH**, **Survivor → RETREAT**, **Baiter → IDLE**.
- **Two modulators (M)** — **Punisher** ("the player is open!") and **Enrage**
  ("we're low on HP — go berserk") — **own no action**. Their *only* way to affect the
  fight is the latent they write into the shared `RecursiveLink`, which the coordinator
  turns into modulation. So training has to *learn to use the latent channel* — the whole
  point of the architecture.
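
The data flow described above can be sketched in a few lines of numpy. This is an illustrative sketch, not the code in `modular_mind.py`: the layer sizes, the `tanh` hidden layer, the weight names, and the choice to add the coordinator's modulation to the drives are all assumptions made for the example. What it does take from the description is the shape of the idea: every specialist writes a latent, only action owners write a drive, and the summed latent passes through a ReGLU-plus-residual link before the coordinator reads it.

```python
import numpy as np

ACTIONS = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
# Specialist -> index of the action it owns; None marks a modulator.
OWNED = {"Aggressor": 0, "Stalker": 1, "Survivor": 2, "Baiter": 3,
         "Punisher": None, "Enrage": None}
STATE_DIM, HIDDEN, LATENT = 8, 6, 4  # hypothetical sizes, not the real ones


def init_params(seed=0):
    """Random weights for illustration only (the real ones are trained)."""
    rng = np.random.default_rng(seed)
    params = {name: {"W_h": rng.normal(0, 0.5, (STATE_DIM, HIDDEN)),
                     "W_lat": rng.normal(0, 0.5, (HIDDEN, LATENT)),
                     "w_drive": rng.normal(0, 0.5, HIDDEN)}
              for name in OWNED}
    params["link"] = {k: rng.normal(0, 0.5, (LATENT, LATENT))
                      for k in ("W_gate", "W_val", "W_out")}
    params["coord"] = rng.normal(0, 0.5, (LATENT, len(ACTIONS)))
    return params


def forward(params, state):
    drives = np.zeros(len(ACTIONS))
    latent_sum = np.zeros(LATENT)
    for name, action in OWNED.items():
        sp = params[name]
        h = np.tanh(state @ sp["W_h"])
        latent_sum += h @ sp["W_lat"]            # every specialist writes a latent
        if action is not None:                   # only owners push a drive
            drives[action] += h @ sp["w_drive"]
    link = params["link"]
    # ReGLU: ReLU(x W_gate) * (x W_val), followed by a residual connection.
    gate = np.maximum(latent_sum @ link["W_gate"], 0.0)
    shared = latent_sum + (gate * (latent_sum @ link["W_val"])) @ link["W_out"]
    # Assumed read-out: the coordinator adds a modulation term to the drives.
    logits = drives + shared @ params["coord"]
    return logits, shared


params = init_params()
state = np.linspace(-1.0, 1.0, STATE_DIM)
logits, shared = forward(params, state)
print(ACTIONS[int(np.argmax(logits))], shared.round(3))
```

In this sketch a modulator's `w_drive` is never read, so Punisher and Enrage can change the chosen action only through `latent_sum`, which mirrors the constraint described above.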

The right-hand panel shows all of this live: each specialist's activity, the shared
latent bridge, and the coordinator's modulation, for every decision the boss makes.

## What emerged from training

Trained on a reward that values *dealing damage* and *pressuring in range* over
playing it safe (landing a cleave ≫ whiffing, and stalling / staying out of range is
penalised), the boss learned an **aggressive pressure** style:

- **closes the distance** when you're far or at mid-range,
- **cleaves on contact** — once you're in range and it's off cooldown it commits to a
  lunging swing essentially every time (verified: in-range attack-rate ≈ **0.8–1.0**),
- **retreats only when it can't swing** (mid-cooldown) to reset spacing,
- **blocks your punish** — a **Defender** specialist raises a guard (negating ~90% of
  your melee) when you swing at it and it can't cleave back,
- **punishes your recovery** and gets **even more aggressive at low HP** — the Enrage
  modulator raises CLEAVE through the shared latent.

It reaches a **~55–65% win rate** against a near-optimal scripted dodger (avg reward
+8, up from −12 before the reward was tuned for aggression). Against a human it's a
fair, readable fight: **dodge the red telegraph, then punish the recovery.**

> The earlier version of the brain learned a degenerate "space forever, never commit"
> policy that *technically* won but barely attacked — so the trainer now selects the
> checkpoint on **win-rate + in-range attack-rate**, and a non-attacking policy can no
> longer be saved. (`behavior()` in `train.py` measures this directly.)
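
The selection rule in the note above might look roughly like this. It is a sketch under assumptions: the additive score, the `min_attack_rate` threshold, and the function names are hypothetical and are not taken from `train.py`; only the idea of scoring on win-rate plus in-range attack-rate and refusing to save a non-attacking policy comes from the text.

```python
def checkpoint_score(win_rate, in_range_attack_rate, min_attack_rate=0.5):
    """Score a checkpoint, or return None if it attacks too rarely to be saved.

    min_attack_rate is a hypothetical threshold chosen for this example.
    """
    if in_range_attack_rate < min_attack_rate:
        return None
    return win_rate + in_range_attack_rate


def select_checkpoint(candidates):
    """candidates: iterable of (name, win_rate, in_range_attack_rate)."""
    best = None
    for name, win_rate, attack_rate in candidates:
        score = checkpoint_score(win_rate, attack_rate)
        if score is not None and (best is None or score > best[1]):
            best = (name, score)
    return best


candidates = [
    ("spacer", 0.90, 0.05),    # wins by stalling, barely attacks: rejected
    ("pressure", 0.60, 0.90),  # lower win-rate, but actually fights
]
print(select_checkpoint(candidates))
```

With these made-up numbers the stalling policy is rejected despite its higher win-rate, which is the behaviour the note describes.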

## It learns from your fights (online finetuning)

The model is tiny, so a gradient step takes microseconds, and the boss finetunes from real
play **on the free CPU**. Each HARD-tier fight is logged (state, action, HP per boss
decision) and POSTed to `/learn`; the server rebuilds per-decision rewards (damage
dealt − taken, + kill / − death), computes REINFORCE returns, and takes **one Adam
step** ([`mm_grad.py`](mm_grad.py), numpy backprop verified against PyTorch to ~1e-8).
A frozen copy of the sim-trained weights anchors the update so it can't drift into
nonsense; the adapted weights feed straight back into the live boss.

- **On by default, in-memory.** Set `MM_ONLINE=0` to disable.
- **Persistent across restarts:** add Space secrets `HF_TOKEN` (write) and
  `MM_DATASET_REPO` (e.g. `you/boss-fight-online`) and the adapted weights are pushed
  to / pulled from that Dataset. Only HARD-tier fights train (keeps the data on-policy).
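
The reward reconstruction and return computation described in this section can be sketched as follows. The bonus sizes, the discount factor, the log layout (HP at every decision plus the final state), and the L2 form of the anchor are assumptions for illustration; the source only states that rewards are damage dealt minus damage taken with a kill bonus and a death penalty, and that a frozen copy of the weights anchors the update.

```python
import numpy as np


def rebuild_rewards(boss_hp, player_hp, kill_bonus=5.0, death_penalty=5.0):
    """Per-decision rewards from the boss's point of view.

    boss_hp / player_hp: HP logged at each decision plus the final state
    (length T + 1). Bonus sizes are hypothetical.
    """
    boss_hp = np.asarray(boss_hp, dtype=float)
    player_hp = np.asarray(player_hp, dtype=float)
    dealt = player_hp[:-1] - player_hp[1:]   # damage the boss dealt
    taken = boss_hp[:-1] - boss_hp[1:]       # damage the boss took
    rewards = dealt - taken
    if player_hp[-1] <= 0:
        rewards[-1] += kill_bonus            # boss killed the player
    if boss_hp[-1] <= 0:
        rewards[-1] -= death_penalty         # boss died
    return rewards


def reinforce_returns(rewards, gamma=0.99):
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def anchor_gradient(weights, frozen, strength=0.01):
    """Gradient of strength * ||weights - frozen||^2 (one possible anchor)."""
    return 2.0 * strength * (weights - frozen)


rewards = rebuild_rewards(boss_hp=[100, 100, 90], player_hp=[50, 30, 0])
print(rewards, reinforce_returns(rewards, gamma=0.5))
```

In a REINFORCE step each return would weight the gradient of the log-probability of the action taken at that decision; the anchor gradient would be added on top so the adapted weights stay close to the sim-trained ones.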

## Difficulty = the trained brain's decision-noise

The difficulty selector doesn't change the boss's stats — it runs the **same trained
Modular Mind at a different *mistake-rate*** (`explore` = probability of a random legal
action). Against a near-optimal scripted dodger:

| Tier | Mistake-rate (`explore`) | Boss win-rate | Feel |
|------|------|------|------|
| **Easy** | 50% | ~35% | erratic, leaves big openings; beatable |
| **Normal** | 22% | ~65% | competent pressure, occasional slip |
| **Hard** | 4% | ~95% | closes in, blocks your punish, relentless |

The browser sends the chosen tier with every decision; the server routes it through
`modular_mind.decide` with that tier's mistake-rate. *(Originally, Easy and Normal were
genuinely undertrained checkpoints, but once the boss learned to BLOCK it began dominating
the sim almost immediately, so there is no longer a weak checkpoint; decision-noise is
the controllable, honest dial. `train.py` still snapshots checkpoints if you want them.)*
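
A minimal sketch of the decision-noise dial, assuming the trained policy otherwise picks its highest-scoring legal action. The function name, its arguments, and the greedy fallback are hypothetical and do not reproduce the signature of `modular_mind.decide`; the mistake-rates are the ones from the table above.

```python
import random

# Mistake-rates from the difficulty table.
MISTAKE_RATE = {"easy": 0.50, "normal": 0.22, "hard": 0.04}


def decide_with_noise(scores, legal, tier, rng=random):
    """Pick an action for the given tier.

    scores: dict mapping action name -> score from the trained brain
    legal:  list of actions currently allowed
    With probability `explore` a random legal action is taken (a mistake);
    otherwise the best-scoring legal action is chosen (assumed greedy).
    """
    explore = MISTAKE_RATE[tier]
    if rng.random() < explore:
        return rng.choice(legal)
    return max(legal, key=lambda action: scores[action])


scores = {"CLEAVE": 2.1, "APPROACH": 0.4, "RETREAT": -0.3, "IDLE": -1.0}
legal = list(scores)
rng = random.Random(0)
picks = [decide_with_noise(scores, legal, "hard", rng) for _ in range(2000)]
print(picks.count("CLEAVE") / len(picks))
```

Because the same scores feed every tier, lowering the difficulty changes only how often the boss deviates from its preferred move, not what that move is.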

## Controls

| Key | Action |
|-----|--------|
| ← → | Move |
| Space | Roll / dodge (i-frames, costs stamina) |
| J | Attack |
| K (hold) | **Block** — cuts incoming damage to 20% and drains stamina; if stamina hits 0 your guard **breaks** into a stagger |
| M / 🔊 | Mute / unmute music + SFX |

Background music and combat SFX play during the fight (a random track per fight,
plus per-action sound effects). Audio is served statically from `audio/`.

You begin each fight with a one-time **Aegis shield** (a cyan bubble + the *Aegis*
HUD bar): for the first few seconds it absorbs **all** damage so you can learn the
controls. Once it fades it does **not** come back.

> Click **Enter the Fog**, then click the game once so it has keyboard focus.

## Architecture / files

| File | Role |
|------|------|
| `app.py` | Gradio Space: serves the game, exposes the trained brain at `/decide` |
| `modular_mind.py` | **numpy** inference of the trained Modular Mind (no torch at runtime) |
| `mm_torch.py` | the trainable Modular Mind (specialists + RecursiveLink + coordinator) |
| `train.py` | self-play **REINFORCE** trainer → `mm_weights.npz` |
| `mm_grad.py` | pure-numpy forward+backward (REINFORCE gradient), verified vs torch — the online learner |
| `online.py` | finetunes the HARD brain from real player fights; optional HF-Dataset persistence |
| `duel_sim.py` | the headless duel simulator (the RL environment) |
| `features.py` | shared feature/action definitions (single source of truth) |
| `web/` | the HTML5 canvas game (60fps render; calls `/decide` at decision points) |
| `audio/` | background music (`mp3/`) and sound effects (`sfx/`), served statically |
| `serve_local.py` | run the whole thing locally **without gradio** (stdlib + numpy) |

The model is **tiny** (~4.5k parameters) and inference is pure numpy, so the Space
needs only `gradio` + `numpy` and starts instantly.

## Run / retrain locally

```bash
pip install -r requirements.txt          # gradio + numpy (runtime)
python app.py                            # the Space, locally
# or, with no gradio at all:
python serve_local.py                    # http://localhost:7861

# retrain the boss (needs torch):
pip install torch
python train.py                          # -> mm_weights.npz  (+ train_log.json)
```

## Credits

- **Sprites:** *Fire Knight* and *Demon Slime* free asset packs by **LuizMelo**
  (itch.io). Please check their licenses for your own use.
- **Audio:** background music tracks (`audio/mp3`) and combat SFX (`audio/sfx`).
  Check the licenses of your audio packs before publishing.
- **Brain:** the *Modular Mind* concept (latent-communicating specialists via
  `RecursiveLink`), trained here at specialist scale by self-play RL.

Built for a Hugging Face hackathon.