ModuleMind

Quazim0t0

Update README.md

483de4f verified 2 days ago

	---
	title: Modular Mind 🧠
	emoji: 💭
	colorFrom: green
	colorTo: indigo
	sdk: gradio
	sdk_version: 6.15.2
	app_file: app.py
	pinned: false
	short_description: We modulate the mind and communicate!
	license: mit
	tags:
	- track:wood
	- sponsor:modal
	- achievement:offgrid
	- achievement:welltuned
	---
	Social media post and demo video: https://www.linkedin.com/posts/dean-byrne-02a28b191_modular-mind-boss-fight-for-hugging-face-ugcPost-7472410483615084544-yUeC/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAC0RumIBxlIKTkKv5tF-hb2OU7TdZ19kxcQ

	Model weights are in the Space's directory.

	# ⚔️ Modular Mind: Boss Fight

	A mini Dark Souls-style duel where the boss is controlled by a Modular Mind: a
	handful of tiny neural specialists that communicate through a shared latent
	(a `RecursiveLink` bridge), plus a coordinator that reads the latent to choose the
	boss's next move. The brain was trained by self-play reinforcement learning; its
	tactics emerged from thousands of duels, and nothing is scripted.

	You play the Fire Knight. Defeat the Demon Slime.

	## How the Modular Mind works

	This is the [ModularMind-on-V2](https://github.com/your-username/ModularMind) concept at specialist scale: instead of one
	monolithic policy, six small networks each handle one concern and talk to each other
	through a latent channel.

	```
	game state ─▶ ┌──────────────┐      each specialist emits a latent
	              │ Aggressor    │──┐   (LatentProjection) and, if it OWNS
	              │ Stalker      │──┤   an action, a "drive" for that action
	              │ Survivor     │──┤
	              │ Baiter       │──┤        ┌───────────────┐        ┌─────────────┐
	              │ Punisher (M) │──┼─ sum ─▶│ RecursiveLink │───────▶│ Coordinator │─▶ action
	              │ Enrage (M)   │──┘        │ ReGLU+residual│ shared │  read-out   │
	              └──────────────┘           └───────────────┘ latent └─────────────┘
	```

	- Four action-owning specialists push their move's score directly:
	Aggressor → CLEAVE, Stalker → APPROACH, Survivor → RETREAT, Baiter → IDLE.
	- Two modulators (M), Punisher ("the player is open!") and Enrage
	("we're low on HP, go berserk"), own no action. Their only way to affect the
	fight is the latent they write into the shared `RecursiveLink`, which the coordinator
	turns into modulation. So training has to learn to use the latent channel, and that is the whole
	point of the architecture.
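A minimal numpy sketch of one decision follows. It assumes hypothetical layer sizes and weight names (`N_FEATURES`, `HIDDEN`, `LATENT`, `W_h`, `W_lat`, `w_drive`, `W_gate`, `W_up`, `W_down`) and is not the code in `modular_mind.py` or `mm_torch.py`. It illustrates the data flow described above. Every specialist adds its latent to a sum, and only the four action owners also emit a drive. The `RecursiveLink` is modelled as a ReGLU block with a residual connection. The coordinator maps the shared latent to per-action modulation, which is added to the drives.

```python
import numpy as np

ACTIONS = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
# Action owned by each specialist; None marks a modulator (M) that owns no action.
SPECIALISTS = {
    "Aggressor": "CLEAVE",
    "Stalker": "APPROACH",
    "Survivor": "RETREAT",
    "Baiter": "IDLE",
    "Punisher": None,
    "Enrage": None,
}
# Hypothetical sizes, for illustration only.
N_FEATURES, HIDDEN, LATENT = 12, 16, 8


def relu(x):
    return np.maximum(x, 0.0)


def init_params(rng):
    p = {}
    for name, owned in SPECIALISTS.items():
        p[name] = {
            "W_h": rng.normal(0, 0.3, (N_FEATURES, HIDDEN)),
            "W_lat": rng.normal(0, 0.3, (HIDDEN, LATENT)),  # latent projection
        }
        if owned is not None:
            p[name]["w_drive"] = rng.normal(0, 0.3, HIDDEN)  # scalar drive for the owned action
    # RecursiveLink modelled as ReGLU + residual (hypothetical parameterisation).
    p["link"] = {k: rng.normal(0, 0.3, (LATENT, LATENT)) for k in ("W_gate", "W_up", "W_down")}
    # Coordinator read-out: shared latent -> per-action modulation.
    p["coord"] = rng.normal(0, 0.3, (LATENT, len(ACTIONS)))
    return p


def forward(p, state):
    drives = np.zeros(len(ACTIONS))
    latent = np.zeros(LATENT)
    for name, owned in SPECIALISTS.items():
        h = relu(state @ p[name]["W_h"])
        latent += h @ p[name]["W_lat"]  # every specialist writes to the bridge
        if owned is not None:
            drives[ACTIONS.index(owned)] += h @ p[name]["w_drive"]
    link = p["link"]
    # ReGLU: ReLU(x W_gate) * (x W_up), projected back down, plus the residual x.
    shared = latent + (relu(latent @ link["W_gate"]) * (latent @ link["W_up"])) @ link["W_down"]
    modulation = shared @ p["coord"]
    return drives + modulation, shared, modulation
```

For example, `logits, shared, mod = forward(init_params(rng), state)` followed by `ACTIONS[int(np.argmax(logits))]` picks a move. Punisher and Enrage have no `w_drive`, so their only path to the logits runs through `shared`. Zeroing a modulator's `W_lat` therefore removes its influence entirely.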

	The right-hand panel shows all of this live: each specialist's activity, the shared
	latent bridge, and the coordinator's modulation, for every decision the boss makes.

	## What emerged from training

	Trained on a reward that values dealing damage and pressuring in range over
	playing it safe (landing a cleave ≫ whiffing, and stalling / staying out of range is
	penalised), the boss learned an aggressive pressure style:

	- closes the distance when you're far or at mid-range,
	- cleaves on contact — once you're in range and it's off cooldown it commits to a
	lunging swing essentially every time (verified: in-range attack-rate ≈ 0.8–1.0),
	- retreats only when it can't swing (mid-cooldown) to reset spacing,
	- blocks your punish — a Defender specialist raises a guard (negating ~90% of
	your melee) when you swing at it and it can't cleave back,
	- punishes your recovery and gets even more aggressive at low HP — the Enrage
	modulator raises CLEAVE through the shared latent.
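The training reward described at the start of this section can be sketched as a per-step function. The weights and the `StepOutcome` fields are hypothetical. The README only states that landing a cleave is worth far more than whiffing, and that stalling or staying out of range is penalised.

```python
from dataclasses import dataclass

# Hypothetical weights; only their ordering reflects the README.
W_DEALT = 1.0               # reward per point of damage dealt
W_TAKEN = 0.5               # cost per point of damage taken
BONUS_LAND = 2.0            # landing a cleave...
PENALTY_WHIFF = 0.1         # ...is worth far more than a whiff costs
PENALTY_STALL = 0.05        # idling instead of pressuring
PENALTY_OUT_OF_RANGE = 0.05 # staying out of range


@dataclass
class StepOutcome:
    damage_dealt: float
    damage_taken: float
    cleave_landed: bool
    cleave_whiffed: bool
    in_range: bool
    idled: bool


def step_reward(o: StepOutcome) -> float:
    """Reward for one boss decision under the hypothetical shaping above."""
    r = W_DEALT * o.damage_dealt - W_TAKEN * o.damage_taken
    if o.cleave_landed:
        r += BONUS_LAND
    if o.cleave_whiffed:
        r -= PENALTY_WHIFF
    if o.idled:
        r -= PENALTY_STALL
    if not o.in_range:
        r -= PENALTY_OUT_OF_RANGE
    return r
```

Under any weights with this ordering, idling at range is the worst no-damage option. Committing to a swing in range is the best one.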

	It reaches a ~55–65% win rate against a near-optimal scripted dodger (avg reward
	+8, up from −12 before the reward was tuned for aggression). Against a human it's a
	fair, readable fight: dodge the red telegraph, then punish the recovery.

	> The earlier version of the brain learned a degenerate "space forever, never commit"
	> policy that technically won but barely attacked — so the trainer now selects the
	> checkpoint on win-rate + in-range attack-rate, and a non-attacking policy can no
	> longer be saved. (`behavior()` in `train.py` measures this directly.)
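A sketch of the checkpoint rule in the note above follows. It assumes a plain sum of the two rates and a hypothetical minimum attack-rate (`MIN_ATTACK_RATE`). The real criterion lives in `train.py` and its `behavior()` measurement, which this sketch does not reproduce.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical floor: the README says a non-attacking policy can't be saved
# but does not give the threshold.
MIN_ATTACK_RATE = 0.5


def checkpoint_score(win_rate: float, attack_rate: float) -> Optional[float]:
    """Score = win-rate + in-range attack-rate; None means 'not saveable'."""
    if attack_rate < MIN_ATTACK_RATE:
        return None
    return win_rate + attack_rate


def select_best(candidates: Iterable[Tuple[str, float, float]]) -> Optional[str]:
    """candidates: (name, win_rate, in_range_attack_rate). Returns the best saveable name."""
    best_name, best_score = None, float("-inf")
    for name, win, atk in candidates:
        score = checkpoint_score(win, atk)
        if score is not None and score > best_score:
            best_name, best_score = name, score
    return best_name
```

A passive "spacer" with a high win-rate but almost no attacks is rejected outright, even when it is the only candidate.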

	## It learns from your fights (online finetuning)

	The model is tiny, so a gradient step takes microseconds and the boss can finetune from real
	play on the free CPU. Each HARD-tier fight is logged (state, action, HP per boss
	decision) and POSTed to `/learn`; the server rebuilds per-decision rewards (damage
	dealt − taken, + kill / − death), computes REINFORCE returns, and takes **one Adam
	step** ([`mm_grad.py`](mm_grad.py), numpy backprop verified against PyTorch to ~1e-8).
	A frozen copy of the sim-trained weights anchors the update so it can't drift into
	nonsense; the adapted weights feed straight back into the live boss.
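The online update can be sketched end to end in numpy. To keep it short, the sketch uses a linear softmax policy instead of the full Modular Mind. Every constant (`GAMMA`, `KILL_BONUS`, `DEATH_PENALTY`, `ANCHOR`, the learning rate) is hypothetical. The anchor to the frozen sim-trained weights is modelled as an L2 penalty, and the mean-return baseline is an added assumption. `mm_grad.py` may handle either differently.

```python
import numpy as np

# Hypothetical constants; the README does not publish the real values.
GAMMA = 0.95          # discount for REINFORCE returns
KILL_BONUS = 10.0     # "+ kill"
DEATH_PENALTY = 10.0  # "- death"
ANCHOR = 1e-2         # pull toward the frozen sim-trained weights


def rewards_from_log(boss_hp, player_hp):
    """Per-decision reward = damage dealt - damage taken, plus a terminal kill/death term.
    Each list holds the HP at every boss decision plus one final entry."""
    boss_hp = np.asarray(boss_hp, dtype=float)
    player_hp = np.asarray(player_hp, dtype=float)
    r = (player_hp[:-1] - player_hp[1:]) - (boss_hp[:-1] - boss_hp[1:])
    if player_hp[-1] <= 0:
        r[-1] += KILL_BONUS
    if boss_hp[-1] <= 0:
        r[-1] -= DEATH_PENALTY
    return r


def discounted_returns(r, gamma=GAMMA):
    """G_t = r_t + gamma * G_{t+1}, computed backwards."""
    out, g = np.zeros_like(r), 0.0
    for t in range(len(r) - 1, -1, -1):
        g = r[t] + gamma * g
        out[t] = g
    return out


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def reinforce_grad(W, states, actions, returns, W_frozen):
    """Gradient of -sum_t A_t log pi(a_t|s_t) + ANCHOR/2 * ||W - W_frozen||^2
    for pi = softmax(s @ W), with A_t = G_t - mean(G)."""
    probs = softmax(states @ W)                  # (T, n_actions)
    onehot = np.eye(W.shape[1])[actions]         # (T, n_actions)
    adv = (returns - returns.mean())[:, None]    # mean baseline (assumption)
    grad = -states.T @ ((onehot - probs) * adv)  # (n_features, n_actions)
    return grad + ANCHOR * (W - W_frozen)


class Adam:
    def __init__(self, shape, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        self.m, self.v, self.t = np.zeros(shape), np.zeros(shape), 0
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps

    def step(self, W, g):
        """One bias-corrected Adam update; returns the new weights."""
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * g
        self.v = self.b2 * self.v + (1 - self.b2) * g * g
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return W - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

In this sketch, one logged fight becomes a single update: `W = opt.step(W, reinforce_grad(W, states, actions, discounted_returns(rewards_from_log(boss_hp, player_hp)), W_frozen))`.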

	- On by default, in-memory. Set `MM_ONLINE=0` to disable.
	- Persistent across restarts: add Space secrets `HF_TOKEN` (write) and
	`MM_DATASET_REPO` (e.g. `you/boss-fight-online`) and the adapted weights are pushed
	to / pulled from that Dataset. Only HARD-tier fights train (keeps the data on-policy).
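The settings above might be read and used as follows. This is a sketch. `load_config` parses `MM_ONLINE`, `HF_TOKEN` and `MM_DATASET_REPO`. The push and pull helpers use the real `huggingface_hub` calls `HfApi.upload_file` and `hf_hub_download` with `repo_type="dataset"`. The function names and the weights filename are hypothetical, not taken from `online.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

WEIGHTS_FILE = "mm_weights_online.npz"  # hypothetical filename inside the Dataset


@dataclass
class OnlineConfig:
    enabled: bool
    dataset_repo: Optional[str]
    token: Optional[str]

    @property
    def persistent(self) -> bool:
        # Persistence needs both a write token and a target Dataset repo.
        return self.enabled and bool(self.token) and bool(self.dataset_repo)


def load_config(env: Mapping[str, str] = os.environ) -> OnlineConfig:
    return OnlineConfig(
        enabled=env.get("MM_ONLINE", "1") != "0",  # on by default; MM_ONLINE=0 disables
        dataset_repo=env.get("MM_DATASET_REPO") or None,
        token=env.get("HF_TOKEN") or None,
    )


def should_train(cfg: OnlineConfig, tier: str) -> bool:
    # Only HARD-tier fights train, which keeps the data on-policy.
    return cfg.enabled and tier.upper() == "HARD"


def push_weights(cfg: OnlineConfig, local_path: str, filename: str = WEIGHTS_FILE) -> None:
    if not cfg.persistent:
        return None  # in-memory only
    from huggingface_hub import HfApi  # imported only when persistence is configured
    HfApi(token=cfg.token).upload_file(
        path_or_fileobj=local_path,
        path_in_repo=filename,
        repo_id=cfg.dataset_repo,
        repo_type="dataset",
    )


def pull_weights(cfg: OnlineConfig, filename: str = WEIGHTS_FILE) -> Optional[str]:
    """Return a local path to the persisted weights, or None if not configured."""
    if not cfg.persistent:
        return None
    from huggingface_hub import hf_hub_download
    return hf_hub_download(
        repo_id=cfg.dataset_repo, filename=filename, repo_type="dataset", token=cfg.token
    )
```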

	## Difficulty = the trained brain's decision-noise

	The difficulty selector doesn't change the boss's stats. It runs the **same trained
	Modular Mind at a different mistake-rate** (`explore` = probability of taking a random
	legal action). Win rates below are measured against a near-optimal scripted dodger:

	| Tier | Mistake-rate | Boss win-rate | Feel |
	|------|--------------|---------------|------|
	| Easy | 50% | ~0.35 | erratic, leaves big openings; beatable |
	| Normal | 22% | ~0.65 | competent pressure, occasional slip |
	| Hard | 4% | ~0.95 | closes in, blocks your punish, relentless |

	The browser sends the chosen tier with every decision; the server routes it through
	`modular_mind.decide` with that tier's mistake-rate. *(Originally, Easy/Normal were
	genuinely undertrained checkpoints. But once the boss learned to BLOCK, it dominated
	the sim almost immediately, so there's no longer a weak checkpoint; decision-noise is
	the controllable, honest dial. `train.py` still snapshots checkpoints if you want them.)*
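Decision-noise as described here is epsilon-greedy action selection. The following is a minimal sketch that assumes the brain otherwise takes its highest-scoring legal action. `decide_with_noise` is a hypothetical stand-in for `modular_mind.decide`; only the three mistake-rates come from the table.

```python
import random
from typing import Sequence

# Mistake-rates from the Difficulty table (explore = probability of a random legal action).
TIER_EXPLORE = {"easy": 0.50, "normal": 0.22, "hard": 0.04}


def decide_with_noise(scores: Sequence[float], legal: Sequence[int], tier: str,
                      rng: random.Random) -> int:
    """Greedy over legal actions, or a uniformly random legal action with prob explore."""
    explore = TIER_EXPLORE[tier.lower()]
    if rng.random() < explore:
        return rng.choice(list(legal))
    return max(legal, key=lambda a: scores[a])
```

A random pick can land on the greedy action, so the observed rate of non-greedy moves sits below `explore`. With four legal actions it is three quarters of it.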

	## Controls

	| Key | Action |
	|-----|--------|
	| ← → | Move |
	| Space | Roll / dodge (i-frames, costs stamina) |
	| J | Attack |
	| K (hold) | Block: cuts incoming damage to 20% and drains stamina; if stamina hits 0, your guard breaks into a stagger |
	| M / 🔊 | Mute / unmute music + SFX |

	Background music and combat SFX play during the fight (a random track per fight,
	plus per-action sound effects). Audio is served statically from `audio/`.

	You begin each fight with a one-time Aegis shield (a cyan bubble + the Aegis
	HUD bar): for the first few seconds it absorbs all damage so you can learn the
	controls. Once it fades it does not come back.
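The damage rules from the Controls table and the Aegis paragraph can be combined into one hypothetical resolver. The 20% block factor comes from the README. Three details are assumptions: the Aegis duration, the per-hit stamina cost, and draining stamina per blocked hit rather than continuously while K is held. The actual rules live in the game code and may differ. Recovery from the stagger is left out.

```python
from dataclasses import dataclass

BLOCK_DAMAGE_FRACTION = 0.20  # from the Controls table: block cuts damage to 20%
BLOCK_STAMINA_PER_HIT = 25.0  # hypothetical
AEGIS_SECONDS = 5.0           # hypothetical stand-in for "the first few seconds"


@dataclass
class Player:
    hp: float = 100.0
    stamina: float = 100.0
    blocking: bool = False
    staggered: bool = False


def apply_hit(p: Player, damage: float, fight_time: float) -> float:
    """Resolve one incoming hit and return the damage actually taken."""
    if fight_time < AEGIS_SECONDS:  # one-time Aegis shield absorbs everything
        return 0.0
    if p.blocking and not p.staggered:
        p.stamina = max(0.0, p.stamina - BLOCK_STAMINA_PER_HIT)
        if p.stamina == 0.0:  # guard breaks into a stagger
            p.blocking, p.staggered = False, True
        taken = damage * BLOCK_DAMAGE_FRACTION
    else:
        taken = damage
    p.hp -= taken
    return taken
```

Because the shield is keyed to time since the fight started, it cannot come back once it fades.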

	> Click **Enter the Fog**, then click the game once so it has keyboard focus.

	## Architecture / files

	| File | Role |
	|------|------|
	| `app.py` | Gradio Space: serves the game, exposes the trained brain at `/decide` |
	| `modular_mind.py` | numpy inference of the trained Modular Mind (no torch at runtime) |
	| `mm_torch.py` | the trainable Modular Mind (specialists + RecursiveLink + coordinator) |
	| `train.py` | self-play REINFORCE trainer → `mm_weights.npz` |
	| `mm_grad.py` | pure-numpy forward+backward (REINFORCE gradient), verified against torch; the online learner |
	| `online.py` | finetunes the HARD brain from real player fights; optional HF-Dataset persistence |
	| `duel_sim.py` | the headless duel simulator (the RL environment) |
	| `features.py` | shared feature/action definitions (single source of truth) |
	| `web/` | the HTML5 canvas game (60fps render; calls `/decide` at decision points) |
	| `audio/` | background music (`mp3/`) and sound effects (`sfx/`), served statically |
	| `serve_local.py` | run the whole thing locally without gradio (stdlib + numpy) |

	The model is tiny (~4.5k parameters) and inference is pure numpy, so the Space
	needs only `gradio` + `numpy` and starts instantly.
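`serve_local.py` shows that the game can be served without gradio. The following sketch illustrates that idea but is not its actual code. It builds a `/decide` endpoint on the standard library's `http.server` alone. The JSON shape (`features`, `tier`, `action`) and the injected `decide` callable are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Sequence

ACTIONS = ["CLEAVE", "APPROACH", "RETREAT", "IDLE"]
Decide = Callable[[Sequence[float], str], int]  # (features, tier) -> action index


def handle_decide(payload: dict, decide: Decide) -> dict:
    """Pure request handler, kept separate from HTTP so it is easy to test."""
    features = [float(x) for x in payload["features"]]
    tier = str(payload.get("tier", "normal")).lower()
    return {"action": ACTIONS[decide(features, tier)], "tier": tier}


def make_handler(decide: Decide):
    class DecideHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/decide":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                body = handle_decide(json.loads(self.rfile.read(length)), decide)
            except (KeyError, ValueError, TypeError, IndexError) as exc:
                self.send_error(400, str(exc))
                return
            data = json.dumps(body).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return DecideHandler


def serve(decide: Decide, port: int = 7861) -> None:
    HTTPServer(("localhost", port), make_handler(decide)).serve_forever()
```

Passing the brain in as `decide` keeps the server free of model code. Any numpy policy with that signature can be plugged in.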

	## Run / retrain locally

	```bash
	pip install -r requirements.txt # gradio + numpy (runtime)
	python app.py # the Space, locally
	# or, with no gradio at all:
	python serve_local.py # http://localhost:7861

	# retrain the boss (needs torch):
	pip install torch
	python train.py # -> mm_weights.npz (+ train_log.json)
	```

	## Credits

	- Sprites: Fire Knight and Demon Slime free asset packs by LuizMelo
	(itch.io). Please check their licenses for your own use.
	- Audio: background music tracks (`audio/mp3`) and combat SFX (`audio/sfx`).
	Check the licenses of your audio packs before publishing.
	- Brain: the Modular Mind concept (latent-communicating specialists via
	`RecursiveLink`), trained here at specialist scale by self-play RL.

	Built for a Hugging Face hackathon.