title: Modular Mind 🧠
emoji: 💭
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
short_description: We modulate the mind and communicate!
license: mit
tags:
  - track:wood
  - sponsor:modal
  - achievement:offgrid
  - achievement:welltuned

Social media post and demo video: https://www.linkedin.com/posts/dean-byrne-02a28b191_modular-mind-boss-fight-for-hugging-face-ugcPost-7472410483615084544-yUeC/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAC0RumIBxlIKTkKv5tF-hb2OU7TdZ19kxcQ

Model weights are in the Space's directory.

⚔️ Modular Mind: Boss Fight

A mini Dark-Souls-style duel where the boss is controlled by a Modular Mind: a handful of tiny neural specialists that communicate through a shared latent (a RecursiveLink bridge) and a coordinator that reads the latent to choose the boss's next move. The brain was trained by self-play reinforcement learning; its tactics emerged from playing thousands of duels, and nothing is scripted.

You play the Fire Knight. Defeat the Demon Slime.

How the Modular Mind works

This is the ModularMind-on-V2 concept at specialist scale: instead of one monolithic policy, six small networks each handle one concern and talk to each other through a latent channel.

 game state ─▶ ┌──────────────┐   each specialist emits a latent
               │ Aggressor    │──┐  (LatentProjection) and, if it OWNS
               │ Stalker      │──┤   an action, a "drive" for that action
               │ Survivor     │──┤
               │ Baiter       │──┤        ┌───────────────┐       ┌─────────────┐
               │ Punisher (M) │──┼─ sum ─▶│ RecursiveLink │──────▶│ Coordinator │─▶ action
               │ Enrage   (M) │──┘        │ ReGLU+residual│ shared│  read-out   │
               └──────────────┘           └───────────────┘ latent└─────────────┘
  • Four action-owning specialists push their move's score directly: Aggressor → CLEAVE, Stalker → APPROACH, Survivor → RETREAT, Baiter → IDLE.
  • Two modulators (M), Punisher ("the player is open!") and Enrage ("we're low on HP: go berserk"), own no action. Their only way to affect the fight is the latent they write into the shared RecursiveLink, which the coordinator turns into modulation. So training has to learn to use the latent channel, which is the whole point of the architecture.
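The data flow described above can be sketched in a few lines of numpy. This is only an illustration under stated assumptions, not the code in modular_mind.py: the layer sizes, the tanh hidden layer, the random weights, and the exact form of the ReGLU-plus-residual link are all hypothetical, and only the four action-owning specialists and two modulators from the diagram are modelled.

```python
import numpy as np

rng = np.random.default_rng(0)

ACTIONS = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
# Specialist -> index of the action it owns (None marks a modulator).
OWNED_ACTION = {
    "Aggressor": 0, "Stalker": 1, "Survivor": 2, "Baiter": 3,
    "Punisher": None, "Enrage": None,
}
STATE_DIM, HIDDEN, LATENT = 8, 8, 6  # hypothetical sizes


def init_specialist(owns_action):
    """Random weights for one specialist (illustrative, untrained)."""
    params = {
        "w_in": rng.normal(0.0, 0.3, (HIDDEN, STATE_DIM)),
        "w_latent": rng.normal(0.0, 0.3, (LATENT, HIDDEN)),
    }
    if owns_action:
        params["w_drive"] = rng.normal(0.0, 0.3, HIDDEN)
    return params


specialists = {name: init_specialist(a is not None) for name, a in OWNED_ACTION.items()}
link = {
    "w_gate": rng.normal(0.0, 0.3, (LATENT, LATENT)),
    "w_value": rng.normal(0.0, 0.3, (LATENT, LATENT)),
}
w_readout = rng.normal(0.0, 0.3, (len(ACTIONS), LATENT))


def forward(state):
    """Return (action logits, shared latent) for one game state."""
    drives = np.zeros(len(ACTIONS))
    latent_sum = np.zeros(LATENT)
    for name, action in OWNED_ACTION.items():
        p = specialists[name]
        hidden = np.tanh(p["w_in"] @ state)
        latent_sum += p["w_latent"] @ hidden      # every specialist writes a latent
        if action is not None:
            drives[action] += p["w_drive"] @ hidden  # owners also push their action
    # Assumed link: ReGLU (ReLU gate times linear value) plus a residual connection.
    gate = np.maximum(link["w_gate"] @ latent_sum, 0.0)
    shared = latent_sum + gate * (link["w_value"] @ latent_sum)
    modulation = w_readout @ shared               # coordinator read-out
    return drives + modulation, shared


logits, shared = forward(rng.normal(size=STATE_DIM))
action = ACTIONS[int(np.argmax(logits))]
print(action, np.round(logits, 3))
```

In this sketch the modulators never touch `drives`, so they can change the chosen action only through `shared`, which mirrors the constraint described above.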

The right-hand panel shows all of this live: each specialist's activity, the shared latent bridge, and the coordinator's modulation, for every decision the boss makes.

What emerged from training

Trained on a reward that values dealing damage and pressuring in range over playing it safe (landing a cleave ≫ whiffing, and stalling / staying out of range is penalised), the boss learned an aggressive pressure style:

  • closes the distance when you're far or at mid-range,
  • cleaves on contact: once you're in range and it's off cooldown it commits to a lunging swing essentially every time (verified: in-range attack-rate ≈ 0.8–1.0),
  • retreats only when it can't swing (mid-cooldown) to reset spacing,
  • blocks your punish: a Defender specialist raises a guard (negating ~90% of your melee) when you swing at it and it can't cleave back,
  • punishes your recovery and gets even more aggressive at low HP: the Enrage modulator raises CLEAVE through the shared latent.

It reaches a ~55–65% win rate against a near-optimal scripted dodger (avg reward +8, up from −12 before the reward was tuned for aggression). Against a human it's a fair, readable fight: dodge the red telegraph, then punish the recovery.

An earlier version of the brain learned a degenerate "space forever, never commit" policy that technically won but barely attacked, so the trainer now selects the checkpoint on win-rate + in-range attack-rate, and a non-attacking policy can no longer be saved. (behavior() in train.py measures this directly.)
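One way to picture that selection rule is a score with a floor on attack rate. The sketch below is an assumption about how such a rule could look, not the logic in train.py: the `Checkpoint` record, the `MIN_ATTACK_RATE` floor, and the plain sum used as the score are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    name: str
    win_rate: float              # fraction of evaluation duels won
    in_range_attack_rate: float  # fraction of in-range decisions that attack


MIN_ATTACK_RATE = 0.5  # hypothetical floor below which a policy is never saved


def select_checkpoint(candidates, min_attack_rate=MIN_ATTACK_RATE):
    """Pick the best attacking checkpoint, or None if every candidate is passive."""
    eligible = [c for c in candidates if c.in_range_attack_rate >= min_attack_rate]
    if not eligible:
        return None
    # Assumed score: win-rate plus in-range attack-rate.
    return max(eligible, key=lambda c: c.win_rate + c.in_range_attack_rate)


candidates = [
    Checkpoint("spacer", win_rate=0.70, in_range_attack_rate=0.05),
    Checkpoint("brawler", win_rate=0.60, in_range_attack_rate=0.90),
    Checkpoint("mixed", win_rate=0.55, in_range_attack_rate=0.60),
]
best = select_checkpoint(candidates)
print(best.name)
```

Here the "spacer" wins most often but is filtered out by the floor, so the degenerate policy cannot be selected no matter how high its win rate is.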

It learns from your fights (online finetuning)

The model is tiny, so a gradient step takes microseconds and the boss can finetune from real play on the free CPU. Each HARD-tier fight is logged (state, action, HP per boss decision) and POSTed to /learn; the server rebuilds per-decision rewards (damage dealt − taken, + kill / − death), computes REINFORCE returns, and takes one Adam step (mm_grad.py, numpy backprop verified against PyTorch to ~1e-8). A frozen copy of the sim-trained weights anchors the update so it can't drift into nonsense; the adapted weights feed straight back into the live boss.
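The reward rebuild and return computation can be sketched as follows. This is a minimal illustration, assuming a log of (boss HP, player HP) snapshots; the discount, the kill and death bonus sizes, and the L2 form of the anchor are hypothetical choices, not values taken from online.py or mm_grad.py.

```python
GAMMA = 0.99          # hypothetical discount factor
KILL_BONUS = 10.0     # hypothetical bonus when the player dies
DEATH_PENALTY = 10.0  # hypothetical penalty when the boss dies


def rebuild_rewards(hp_log):
    """hp_log: (boss_hp, player_hp) before each boss decision, plus the final state."""
    rewards = []
    for (boss0, player0), (boss1, player1) in zip(hp_log, hp_log[1:]):
        dealt = player0 - player1  # damage the boss dealt this step
        taken = boss0 - boss1      # damage the boss took this step
        rewards.append(float(dealt - taken))
    if rewards:
        final_boss, final_player = hp_log[-1]
        if final_player <= 0:
            rewards[-1] += KILL_BONUS
        if final_boss <= 0:
            rewards[-1] -= DEATH_PENALTY
    return rewards


def discounted_returns(rewards, gamma=GAMMA):
    """REINFORCE return G_t = r_t + gamma * G_{t+1}, computed back to front."""
    returns, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def anchored_gradient(loss_grad, weights, frozen, strength=0.01):
    """Add an assumed L2 pull toward the frozen weights to a loss gradient."""
    return [g + strength * (w - f) for g, w, f in zip(loss_grad, weights, frozen)]


hp_log = [(100, 100), (100, 80), (90, 80), (90, 0)]
rewards = rebuild_rewards(hp_log)
returns = discounted_returns(rewards, gamma=0.5)
print(rewards, returns)
```

The anchor term grows with the distance from the frozen weights, so a handful of unusual fights can nudge the policy but cannot move it arbitrarily far from the sim-trained brain.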

  • On by default, in-memory. Set MM_ONLINE=0 to disable.
  • Persistent across restarts: add Space secrets HF_TOKEN (write) and MM_DATASET_REPO (e.g. you/boss-fight-online) and the adapted weights are pushed to / pulled from that Dataset. Only HARD-tier fights train (keeps the data on-policy).
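The two switches above could be read from the environment roughly like this. The helper name `online_config` and the returned dictionary are hypothetical; only the variable names MM_ONLINE, HF_TOKEN and MM_DATASET_REPO come from this README, and the real parsing in online.py may differ.

```python
import os


def online_config(env=os.environ):
    """Summarise the online-learning settings described above."""
    enabled = env.get("MM_ONLINE", "1") != "0"  # on by default
    token = env.get("HF_TOKEN")
    repo = env.get("MM_DATASET_REPO")
    # Persistence needs both secrets; otherwise weights stay in memory only.
    persistent = enabled and bool(token) and bool(repo)
    return {
        "enabled": enabled,
        "persistent": persistent,
        "dataset_repo": repo if persistent else None,
    }


print(online_config({}))
print(online_config({"HF_TOKEN": "hf_xxx", "MM_DATASET_REPO": "you/boss-fight-online"}))
```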

Difficulty = the trained brain's decision-noise

The difficulty selector doesn't change the boss's stats; it runs the same trained Modular Mind at a different mistake-rate (explore = probability of a random legal action). Against a near-optimal scripted dodger:

| Tier | Mistake-rate | Boss win-rate | Feel |
|---|---|---|---|
| Easy | 50% | ~0.35 | erratic, leaves big openings; beatable |
| Normal | 22% | ~0.65 | competent pressure, occasional slip |
| Hard | 4% | ~0.95 | closes in, blocks your punish, relentless |

The browser sends the chosen tier with every decision; the server routes it through modular_mind.decide with that tier's mistake-rate. (Originally Easy/Normal were genuinely undertrained checkpoints, but once the boss learned to BLOCK it dominated the sim almost immediately, so there is no longer a weak checkpoint; decision-noise is the controllable, honest dial. train.py still snapshots checkpoints if you want them.)
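The mistake-rate dial amounts to an epsilon-style choice between a random legal action and the brain's preferred one. The sketch below assumes the trained policy is otherwise greedy over action scores; `noisy_decide` is a hypothetical stand-in, not the signature of modular_mind.decide, and the rates are the ones listed in the difficulty table.

```python
import random

# Mistake-rates per tier, as listed in the difficulty table.
MISTAKE_RATE = {"easy": 0.50, "normal": 0.22, "hard": 0.04}


def noisy_decide(scores, legal_actions, explore, rng=random):
    """With probability `explore` pick a random legal action, else the best-scoring one."""
    if rng.random() < explore:
        return rng.choice(legal_actions)
    return max(legal_actions, key=lambda action: scores[action])


scores = {"CLEAVE": 2.1, "APPROACH": 0.4, "RETREAT": -0.3, "IDLE": -1.0}
legal = ["APPROACH", "RETREAT", "IDLE"]  # e.g. CLEAVE is on cooldown
rng = random.Random(0)
picks = [noisy_decide(scores, legal, MISTAKE_RATE["easy"], rng) for _ in range(10)]
print(picks)
```

Because the random branch samples only from legal actions, a "mistake" is always a valid move; it is just not the one the trained brain would have chosen.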

Controls

| Key | Action |
|---|---|
| ← → | Move |
| Space | Roll / dodge (i-frames, costs stamina) |
| J | Attack |
| K (hold) | Block: cuts incoming damage to 20% and drains stamina; if stamina hits 0 your guard breaks into a stagger |
| M / 🔊 | Mute / unmute music + SFX |

Background music and combat SFX play during the fight (a random track per fight, plus per-action sound effects). Audio is served statically from audio/.

You begin each fight with a one-time Aegis shield (a cyan bubble + the Aegis HUD bar): for the first few seconds it absorbs all damage so you can learn the controls. Once it fades it does not come back.

Click Enter the Fog, then click the game once so it has keyboard focus.

Architecture / files

| File | Role |
|---|---|
| app.py | Gradio Space: serves the game, exposes the trained brain at /decide |
| modular_mind.py | numpy inference of the trained Modular Mind (no torch at runtime) |
| mm_torch.py | the trainable Modular Mind (specialists + RecursiveLink + coordinator) |
| train.py | self-play REINFORCE trainer → mm_weights.npz |
| mm_grad.py | pure-numpy forward+backward (REINFORCE gradient), verified vs torch; the online learner |
| online.py | finetunes the HARD brain from real player fights; optional HF-Dataset persistence |
| duel_sim.py | the headless duel simulator (the RL environment) |
| features.py | shared feature/action definitions (single source of truth) |
| web/ | the HTML5 canvas game (60fps render; calls /decide at decision points) |
| audio/ | background music (mp3/) and sound effects (sfx/), served statically |
| serve_local.py | run the whole thing locally without gradio (stdlib + numpy) |

The model is tiny (~4.5k parameters) and inference is pure numpy, so the Space needs only gradio + numpy and starts instantly.

Run / retrain locally

pip install -r requirements.txt          # gradio + numpy (runtime)
python app.py                            # the Space, locally
# or, with no gradio at all:
python serve_local.py                    # http://localhost:7861

# retrain the boss (needs torch):
pip install torch
python train.py                          # -> mm_weights.npz  (+ train_log.json)

Credits

  • Sprites: Fire Knight and Demon Slime free asset packs by LuizMelo (itch.io). Please check their licenses for your own use.
  • Audio: background music tracks (audio/mp3) and combat SFX (audio/sfx). Check the licenses of your audio packs before publishing.
  • Brain: the Modular Mind concept (latent-communicating specialists via RecursiveLink), trained here at specialist scale by self-play RL.

Built for a HuggingFace hackathon.