---
title: Modular Mind 🧠
emoji: π
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
short_description: We modulate the mind and communicate!
license: mit
tags:
- track:wood
- sponsor:modal
- achievement:offgrid
- achievement:welltuned
---
Social media post and demo video: https://www.linkedin.com/posts/dean-byrne-02a28b191_modular-mind-boss-fight-for-hugging-face-ugcPost-7472410483615084544-yUeC/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAC0RumIBxlIKTkKv5tF-hb2OU7TdZ19kxcQ

Model weights are in the Space's directory.
Modular Mind: Boss Fight
A mini Dark Souls-style duel where the boss is controlled by a Modular Mind:
a handful of tiny neural specialists that communicate through a shared latent
(a RecursiveLink bridge), plus a coordinator that reads the latent to choose the
boss's next move. The brain was trained by self-play reinforcement learning: its
tactics emerged from playing thousands of duels, and nothing is scripted.
You play the Fire Knight. Defeat the Demon Slime.
How the Modular Mind works
This is the ModularMind-on-V2 concept at specialist scale: instead of one monolithic policy, six small networks each handle one concern and talk to each other through a latent channel.
Data flow: game state → six specialists (Aggressor, Stalker, Survivor, Baiter, Punisher (M), Enrage (M)) → sum of their latents → RecursiveLink (ReGLU + residual) → shared latent → Coordinator read-out → action. Each specialist emits a latent (LatentProjection) and, if it OWNS an action, a "drive" for that action.
- Four action-owning specialists push their move's score directly: Aggressor → CLEAVE, Stalker → APPROACH, Survivor → RETREAT, Baiter → IDLE.
- Two modulators (M), Punisher ("the player is open!") and Enrage ("we're low on HP: go berserk"), own no action. Their only way to affect the fight is the latent they write into the shared RecursiveLink, which the coordinator turns into modulation. So training has to learn to use the latent channel, which is the whole point of the architecture.
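To make the data flow concrete, here is a minimal numpy sketch of one forward pass. It is not the code in modular_mind.py: the layer sizes, the tanh hidden layer, the hypothetical `init_params`/`forward` helpers and the exact ReGLU wiring of the RecursiveLink are all assumptions. Only the specialist names, the action ownership and the sum → RecursiveLink → coordinator path come from this README.

```python
import numpy as np

# Hypothetical layout: names follow this README; sizes and init are assumptions.
ACTIONS = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
OWNERS = {"Aggressor": 0, "Stalker": 1, "Survivor": 2, "Baiter": 3,
          "Punisher": None, "Enrage": None}  # None = modulator, owns no action
STATE_DIM, HIDDEN, LATENT = 8, 16, 12


def init_params(rng):
    """Random weights for a toy Modular Mind."""
    p = {}
    for name, action in OWNERS.items():
        spec = {"W1": rng.normal(0, 0.2, (STATE_DIM, HIDDEN)),
                "proj": rng.normal(0, 0.2, (HIDDEN, LATENT))}  # LatentProjection
        if action is not None:
            spec["drive"] = rng.normal(0, 0.2, HIDDEN)  # score for the owned action
        p[name] = spec
    # RecursiveLink as a ReGLU block: relu(x @ Wg) * (x @ Wv), projected by Wo
    p["link"] = {k: rng.normal(0, 0.2, (LATENT, LATENT)) for k in ("Wg", "Wv", "Wo")}
    p["coord"] = rng.normal(0, 0.2, (LATENT, len(ACTIONS)))  # coordinator read-out
    return p


def forward(p, state):
    """Return (action probabilities, shared latent) for one game state."""
    latent_sum = np.zeros(LATENT)
    drives = np.zeros(len(ACTIONS))
    for name, action in OWNERS.items():
        h = np.tanh(state @ p[name]["W1"])
        latent_sum += h @ p[name]["proj"]        # every specialist writes a latent
        if action is not None:                    # only owners push a score directly
            drives[action] += h @ p[name]["drive"]
    link = p["link"]
    reglu = np.maximum(0.0, latent_sum @ link["Wg"]) * (latent_sum @ link["Wv"])
    shared = latent_sum + reglu @ link["Wo"]     # residual connection
    logits = drives + shared @ p["coord"]        # coordinator modulation
    probs = np.exp(logits - logits.max())
    return probs / probs.sum(), shared
```

The design point shows up directly in the sketch: Punisher and Enrage have no `drive` entry, so their only route to the logits is `latent_sum` → RecursiveLink → coordinator.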
The right-hand panel shows all of this live: each specialist's activity, the shared latent bridge, and the coordinator's modulation, for every decision the boss makes.
What emerged from training
Trained on a reward that values dealing damage and pressuring in range over playing it safe (landing a cleave ≫ whiffing; stalling and staying out of range are penalised), the boss learned an aggressive pressure style:
- closes the distance when you're far or at mid-range,
- cleaves on contact: once you're in range and it's off cooldown, it commits to a lunging swing essentially every time (verified in-range attack rate ≈ 0.8–1.0),
- retreats only when it can't swing (mid-cooldown) to reset spacing,
- blocks your punish: a Defender specialist raises a guard (negating ~90% of your melee damage) when you swing at it and it can't cleave back,
- punishes your recovery and gets even more aggressive at low HP, as the Enrage modulator raises CLEAVE through the shared latent.
It reaches a ~55–65% win rate against a near-optimal scripted dodger (avg reward +8, up from −12 before the reward was tuned for aggression). Against a human it's a fair, readable fight: dodge the red telegraph, then punish the recovery.
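The reward ordering described above can be sketched as a per-decision function. Every constant here is hypothetical (the README only states that a landed cleave is worth far more than a whiff and that stalling or staying out of range costs something), and `DecisionOutcome`/`shaped_reward` are illustrative names, not the trainer's API.

```python
from dataclasses import dataclass

# Hypothetical weights: only their ordering comes from the README
# (landing a cleave >> whiffing; stalling or staying out of range is penalised).
HIT_REWARD = 1.0
WHIFF_PENALTY = -0.05
STALL_PENALTY = -0.02  # per decision spent idle or out of range


@dataclass
class DecisionOutcome:
    action: str               # "CLEAVE", "APPROACH", "RETREAT" or "IDLE"
    in_range: bool
    cleave_landed: bool = False


def shaped_reward(o: DecisionOutcome) -> float:
    """Reward for one boss decision under the assumed shaping."""
    r = 0.0
    if o.action == "CLEAVE":
        r += HIT_REWARD if o.cleave_landed else WHIFF_PENALTY
    elif o.action == "IDLE" or not o.in_range:
        r += STALL_PENALTY    # playing it safe is slightly costly
    return r
```

Keeping the whiff penalty small relative to the hit reward is what makes committing to a swing worth the risk; a large whiff penalty would push the policy back toward the passive spacing behaviour described next.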
The earlier version of the brain learned a degenerate "space forever, never commit" policy that technically won but barely attacked. So the trainer now selects the checkpoint on win rate + in-range attack rate, and a non-attacking policy can no longer be saved. (`behavior()` in `train.py` measures this directly.)
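A sketch of that checkpoint gate, assuming the in-range attack rate is the fraction of in-range, off-cooldown decisions where the boss chose CLEAVE, and that checkpoints are ranked by win rate plus that rate. `MIN_ATTACK_RATE`, `in_range_attack_rate` and `CheckpointSelector` are hypothetical; `behavior()` in `train.py` may compute things differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

MIN_ATTACK_RATE = 0.5  # hypothetical floor below which a checkpoint is never saved


def in_range_attack_rate(decisions: Sequence[Tuple[bool, bool, str]]) -> float:
    """decisions: (in_range, off_cooldown, action) per boss decision.

    Fraction of in-range, off-cooldown decisions where the boss chose CLEAVE.
    """
    chances = [a for in_range, ready, a in decisions if in_range and ready]
    if not chances:
        return 0.0
    return sum(a == "CLEAVE" for a in chances) / len(chances)


@dataclass
class CheckpointSelector:
    best_score: float = float("-inf")
    best_step: Optional[int] = None

    def consider(self, step: int, win_rate: float, attack_rate: float) -> bool:
        """Return True if this checkpoint should be saved as the new best."""
        if attack_rate < MIN_ATTACK_RATE:
            return False  # a passive "space forever" policy is rejected outright
        score = win_rate + attack_rate
        if score > self.best_score:
            self.best_score, self.best_step = score, step
            return True
        return False
```

The hard floor matters more than the combined score: without it, a policy with a high enough win rate could still be saved while almost never attacking.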
It learns from your fights (online finetuning)
The model is tiny, so a gradient step takes microseconds and the boss finetunes from real
play on the free CPU. Each HARD-tier fight is logged (state, action, and HP per boss
decision) and POSTed to /learn; the server rebuilds per-decision rewards (damage
dealt − damage taken, + kill / − death), computes REINFORCE returns, and takes one Adam
step (mm_grad.py, whose numpy backprop is verified against PyTorch to ~1e-8).
A frozen copy of the sim-trained weights anchors the update so it can't drift into
nonsense; the adapted weights feed straight back into the live boss.
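The online update can be sketched in pure numpy. To stay short, this uses a linear softmax policy (logits = state @ W) as a stand-in for the full Modular Mind, a discounted return, an L2 pull toward the frozen weights as the anchor, and a hand-written Adam step. The kill/death magnitudes, `GAMMA`, `ANCHOR` and all function names are assumptions, not what mm_grad.py or online.py actually contain.

```python
import numpy as np

KILL_BONUS = 10.0     # hypothetical
DEATH_PENALTY = 10.0  # hypothetical
GAMMA = 0.95          # hypothetical discount
ANCHOR = 1e-2         # hypothetical strength of the pull toward the frozen weights


def per_decision_rewards(boss_hp, player_hp, boss_won):
    """HP at each decision plus the final HP (length T+1) -> T rewards."""
    boss_hp = np.asarray(boss_hp, dtype=float)
    player_hp = np.asarray(player_hp, dtype=float)
    dealt = player_hp[:-1] - player_hp[1:]   # damage the boss dealt
    taken = boss_hp[:-1] - boss_hp[1:]       # damage the boss took
    rewards = dealt - taken
    rewards[-1] += KILL_BONUS if boss_won else -DEATH_PENALTY
    return rewards


def discounted_returns(rewards, gamma=GAMMA):
    returns = np.zeros_like(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def reinforce_grad(W, W_frozen, states, actions, returns, anchor=ANCHOR):
    """Gradient of -sum_t G_t log pi(a_t|s_t) + anchor/2 * ||W - W_frozen||^2."""
    probs = softmax(states @ W)                  # (T, A)
    onehot = np.eye(W.shape[1])[actions]         # (T, A)
    grad = -states.T @ ((onehot - probs) * returns[:, None])
    return grad + anchor * (W - W_frozen)


class Adam:
    def __init__(self, shape, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        self.m, self.v, self.t = np.zeros(shape), np.zeros(shape), 0
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps

    def step(self, w, g):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * g
        self.v = self.b2 * self.v + (1 - self.b2) * g * g
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


def online_update(W, W_frozen, opt, states, actions, boss_hp, player_hp, boss_won):
    """One logged fight -> one Adam step on the policy weights."""
    returns = discounted_returns(per_decision_rewards(boss_hp, player_hp, boss_won))
    grad = reinforce_grad(W, W_frozen, states, np.asarray(actions), returns)
    return opt.step(W, grad)
```

In place of a PyTorch comparison, `reinforce_grad` can be cross-checked against central finite differences of the same loss; the anchor term is what keeps repeated noisy single-fight steps from wandering far from the sim-trained weights.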
- On by default, in-memory. Set `MM_ONLINE=0` to disable.
- Persistent across restarts: add the Space secrets `HF_TOKEN` (write) and `MM_DATASET_REPO` (e.g. `you/boss-fight-online`), and the adapted weights are pushed to and pulled from that Dataset.
- Only HARD-tier fights train (this keeps the data on-policy).
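How these switches might be read at startup, as a sketch: `OnlineConfig`, `load_online_config` and `should_train` are hypothetical names, and treating any value other than `"0"` as enabled is an assumption about how `MM_ONLINE` is parsed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OnlineConfig:
    enabled: bool
    dataset_repo: Optional[str]
    token: Optional[str]

    @property
    def persistent(self) -> bool:
        # Weights are synced with a Dataset only when both secrets are present.
        return self.enabled and bool(self.dataset_repo) and bool(self.token)


def load_online_config(env: Optional[Mapping[str, str]] = None) -> OnlineConfig:
    env = os.environ if env is None else env
    return OnlineConfig(
        enabled=env.get("MM_ONLINE", "1") != "0",  # on unless explicitly "0"
        dataset_repo=env.get("MM_DATASET_REPO") or None,
        token=env.get("HF_TOKEN") or None,
    )


def should_train(cfg: OnlineConfig, tier: str) -> bool:
    # Only HARD-tier fights are used, keeping the data on-policy.
    return cfg.enabled and tier.upper() == "HARD"
```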
Difficulty = the trained brain's decision-noise
The difficulty selector doesn't change the boss's stats. It runs the same trained
Modular Mind at a different mistake rate (explore = probability of taking a random legal
action). Measured against a near-optimal scripted dodger:
| Tier | Mistake-rate | Boss win-rate | Feel |
|---|---|---|---|
| Easy | 50% | ~0.35 | erratic, leaves big openings β beatable |
| Normal | 22% | ~0.65 | competent pressure, occasional slip |
| Hard | 4% | ~0.95 | closes in, blocks your punish, relentless |
The browser sends the chosen tier with every decision; the server routes it through
modular_mind.decide with that tier's mistake rate. (Originally, Easy and Normal were
genuinely undertrained checkpoints, but once the boss learned to BLOCK it dominated
the sim almost immediately, so there's no longer a weak checkpoint; decision noise is
the controllable, honest dial. train.py still snapshots checkpoints if you want them.)
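The difficulty dial amounts to epsilon-greedy exploration over legal actions. A minimal sketch, assuming the boss otherwise takes its highest-probability legal action (the real `modular_mind.decide` may sample instead, and its signature is not shown here); `noisy_decide` and the lowercase tier keys are hypothetical.

```python
import numpy as np

# Mistake rates from the table above; the tier keys are assumptions.
MISTAKE_RATE = {"easy": 0.50, "normal": 0.22, "hard": 0.04}


def noisy_decide(probs, legal, tier, rng):
    """Pick the policy's best legal action, except with probability
    MISTAKE_RATE[tier] pick a uniformly random legal action instead."""
    legal = np.asarray(legal, dtype=bool)
    if rng.random() < MISTAKE_RATE[tier]:
        return int(rng.choice(np.flatnonzero(legal)))
    scores = np.where(legal, np.asarray(probs, dtype=float), -np.inf)
    return int(np.argmax(scores))
```

With Easy's 50% rate and four legal actions, roughly 0.5 × 3/4 = 37.5% of decisions differ from the greedy choice, since a random pick lands on the greedy action a quarter of the time; Hard's 4% gives about 3%.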
Controls
| Key | Action |
|---|---|
| ← → | Move |
| Space | Roll / dodge (i-frames, costs stamina) |
| J | Attack |
| K (hold) | Block: cuts incoming damage to 20% and drains stamina; if stamina hits 0, your guard breaks into a stagger |
| M | Mute / unmute music + SFX |
Background music and combat SFX play during the fight (a random track per fight,
plus per-action sound effects). Audio is served statically from audio/.
You begin each fight with a one-time Aegis shield (a cyan bubble + the Aegis HUD bar): for the first few seconds it absorbs all damage so you can learn the controls. Once it fades it does not come back.
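The defensive rules in the Controls table and the Aegis shield can be summarised as one damage-resolution function. This is a Python sketch of the rules as described, not the game's JavaScript in web/; the stamina cost per blocked hit and the Aegis duration are hypothetical, and modelling the stamina drain per hit (rather than per second held) is an assumption.

```python
from dataclasses import dataclass

BLOCK_DAMAGE_FRACTION = 0.20  # from the Controls table
BLOCK_STAMINA_COST = 25.0     # hypothetical stamina drained per blocked hit
AEGIS_SECONDS = 5.0           # hypothetical length of "the first few seconds"


@dataclass
class Player:
    hp: float = 100.0
    stamina: float = 100.0
    blocking: bool = False
    rolling: bool = False     # inside roll i-frames
    staggered: bool = False


def apply_hit(p: Player, damage: float, fight_time: float) -> float:
    """Apply one boss hit and return the damage actually taken."""
    if fight_time < AEGIS_SECONDS or p.rolling:
        return 0.0                        # Aegis or i-frames absorb everything
    if p.blocking and not p.staggered:
        damage *= BLOCK_DAMAGE_FRACTION   # the guard lets 20% through
        p.stamina = max(0.0, p.stamina - BLOCK_STAMINA_COST)
        if p.stamina == 0.0:              # guard break
            p.blocking, p.staggered = False, True
    p.hp -= damage
    return damage
```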
Click Enter the Fog, then click the game once so it has keyboard focus.
Architecture / files
| File | Role |
|---|---|
| app.py | Gradio Space: serves the game, exposes the trained brain at /decide |
| modular_mind.py | numpy inference of the trained Modular Mind (no torch at runtime) |
| mm_torch.py | the trainable Modular Mind (specialists + RecursiveLink + coordinator) |
| train.py | self-play REINFORCE trainer → mm_weights.npz |
| mm_grad.py | pure-numpy forward+backward (REINFORCE gradient), verified vs torch; the online learner |
| online.py | finetunes the HARD brain from real player fights; optional HF-Dataset persistence |
| duel_sim.py | the headless duel simulator (the RL environment) |
| features.py | shared feature/action definitions (single source of truth) |
| web/ | the HTML5 canvas game (60fps render; calls /decide at decision points) |
| audio/ | background music (mp3/) and sound effects (sfx/), served statically |
| serve_local.py | run the whole thing locally without gradio (stdlib + numpy) |
The model is tiny (~4.5k parameters) and inference is pure numpy, so the Space
needs only gradio + numpy and starts instantly.
Run / retrain locally
```bash
pip install -r requirements.txt   # gradio + numpy (runtime)
python app.py                     # the Space, locally
# or, with no gradio at all:
python serve_local.py             # http://localhost:7861
# retrain the boss (needs torch):
pip install torch
python train.py                   # -> mm_weights.npz (+ train_log.json)
```
Credits
- Sprites: Fire Knight and Demon Slime free asset packs by LuizMelo (itch.io). Please check their licenses for your own use.
- Audio: background music tracks (audio/mp3) and combat SFX (audio/sfx). Check the licenses of your audio packs before publishing.
- Brain: the Modular Mind concept (latent-communicating specialists via RecursiveLink), trained here at specialist scale by self-play RL.
Built for a Hugging Face hackathon.