---
title: Ghostexec Environment Server
emoji: 📒
colorFrom: pink
colorTo: yellow
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
- openenv
---
# Ghostexec: The AI Chief-of-Staff Environment
Ghostexec is an [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compliant environment where an LLM acts as an executive chief-of-staff under pressure: triaging inbox crises, resolving calendar conflicts, protecting stakeholder relationships, and finishing critical tasks.
The agent gets a dense plain-text briefing, takes one structured action, and is scored on three coupled dimensions: conflict reduction, relationship quality, and task progress.
## Submission Package
| Item | Link |
|------|------|
| Public HF Space (required) | [modelbuilderhq/ghostexec](https://huggingface.co/spaces/modelbuilderhq/ghostexec) |
| OpenEnv manifest | [`openenv.yaml`](openenv.yaml) |
| Training notebook (Colab-ready) | [`notebooks/ghostexec_unsloth_grpo_hf_api.ipynb`](notebooks/ghostexec_unsloth_grpo_hf_api.ipynb) |
| Minimal training script (Unsloth + TRL) | [`scripts/train_sft_then_grpo.py`](scripts/train_sft_then_grpo.py) |
| Mini-blog (required) | [**BLOG.md on Hugging Face**](https://huggingface.co/spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md) |
| Demo video <2 minutes (required) | [**YouTube β€” Ghostexec demo**](https://youtu.be/g4IFZMEzfO8) |
## Why This Environment Is Competitive
- **Novel task composition**: combines language-heavy triage, social reasoning, scheduling constraints, and deadline management in a single trainable loop.
- **Non-trivial behavior**: valid JSON is necessary but not sufficient; the policy must choose useful actions on the right entity ids at the right time.
- **Dynamic world model**: mood shifts, conflict rebuilds, overdue penalties, and scenario drift events force adaptation over a trajectory.
- **Trainable reward signal**: dense step reward for learning plus bounded graders for evaluation.
- **Hackathon fit**: fully OpenEnv-packaged, hostable on HF Spaces, with reproducible training and visible before/after evidence.
### 1) Our Innovation
- The observation is a realistic text briefing, not a toy tabular state dump.
- Actions are schema-bound (`GhostexecAction`) and validated against live world ids.
- The world evolves after each step (conflict graph, stress, mood, time shifts).
- Drift events in scenario data test robustness to changing conditions.
**Task ladder**
| Task ID | Difficulty | Scenario |
|---------|------------|----------|
| `phase2_core` | easy | `scenarios/phase2_core.json` |
| `monday_morning` | medium | `scenarios/monday_morning.json` |
| `dinner_disaster` | hard | `scenarios/dinner_disaster.json` |
### 2) Overview
Ghostexec tells a familiar high-stakes story: too many urgent asks, not enough time, and every action has social + operational consequences.
The demo is easy to follow:
1. Show the same briefing the model sees.
2. Compare a weak action choice against a better one.
3. Show reward movement and improvements in policy behavior.
### 3) Improvement in Rewards
The repo includes persisted training artifacts and plot outputs:
- `output/reward_curve.png`
- `output/loss_curve.png`
- `output/baseline_comparison.png`
**Training evidence plots**
![Reward curve](output/reward_curve.png)
*Reward trend across training progression.*
![Loss curve](output/loss_curve.png)
*SFT/GRPO training loss over optimization steps.*
![Baseline comparison](output/baseline_comparison.png)
*Random vs frozen vs trained policy mean episode reward.*
**Current before/after metrics (from saved artifacts)**
| Metric | Baseline | Trained |
|--------|----------|---------|
| Mean step reward | `0.145` | `0.257` |
| Invalid action rate | `Not logged in saved artifacts` | `Not logged in saved artifacts` |
| Grader score | `Not logged in saved artifacts` | `Not logged in saved artifacts` |
### 4) Reward and Training Pipeline
Ghostexec uses a coherent weighted reward core plus bounded shaping:
\[
\text{weighted\_base} = 0.35 \cdot \text{conflict} + 0.35 \cdot \text{relationship} + 0.30 \cdot \text{task}
\]
It then applies structured adjustments (invalid-action penalties, do-nothing pressure, completion/catastrophic terms) with transparent breakdown fields.
Training is end-to-end and environment-connected (not static-only): an SFT warm start, then GRPO with the environment reward plus local shaping functions.
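As a rough illustration of the weighted core above (a sketch only; the authoritative logic lives in `server/reward.py`):

```python
def weighted_base(conflict: float, relationship: float, task: float) -> float:
    """Combine the three coupled score dimensions into one scalar,
    using the weights from the formula above."""
    return 0.35 * conflict + 0.35 * relationship + 0.30 * task

# Example: strong conflict reduction, neutral relationship, partial task progress.
print(round(weighted_base(0.8, 0.5, 0.4), 3))  # 0.575
```

The shaping terms (penalties, pressure, completion bonuses) are then added on top of this base, so the per-dimension contributions stay visible in the reward breakdown.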
## Quick Start
```bash
uv sync
uv run server --port 8000
```
Python client example:
```python
from ghostexec import GhostexecAction, GhostexecEnv

with GhostexecEnv(base_url="http://127.0.0.1:8000") as env:
    out = env.reset()
    print(out.observation.echoed_message[:400], "...")
    step = env.step(
        GhostexecAction(
            action_type="reply_email",
            email_id="e01",
            message_body="Acknowledged. Sending concise revised update before noon.",
        )
    )
    print("reward:", step.reward)
```
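The single-step example above extends naturally to a scripted rollout. The sketch below is illustrative, not part of the shipped client: `run_episode` is a hypothetical helper that assumes only the `reset()`/`step()` surface shown above, a numeric `reward` on each step result, and an optional `done` flag.

```python
def run_episode(env, policy, max_steps: int = 10) -> float:
    """Drive one episode: reset, then act until done or the step budget runs out.

    `env` needs the reset()/step() surface from the quick-start example;
    `policy` maps an observation to the next action. Returns total reward.
    """
    out = env.reset()
    total = 0.0
    for _ in range(max_steps):
        step = env.step(policy(out.observation))
        total += step.reward or 0.0
        out = step
        if getattr(step, "done", False):
            break
    return total
```

With a server running, `run_episode(env, my_policy)` drives the same loop the trainers use, which makes it easy to eyeball before/after behavior on a single scenario.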
## Reproducible Training Commands
```bash
uv run python scripts/train_sft_then_grpo.py \
--model-preset small_iter_fast \
--training-preset hackathon_turbo \
--env-url http://127.0.0.1:8000 \
--generate-sft-from-env \
--sft-samples 120 \
--max-sft-steps 60 \
--max-grpo-steps 120 \
--env-reward-scale 1.0 \
--local-reward-scale 0.35 \
--complexity-curriculum easy_to_full \
--curriculum-ramp-ratio 0.60
```
Generate post-train plots:
```bash
uv run python scripts/plot_training_report.py \
--trainer-history outputs/trainer_state.json \
--reward-csv outputs/reward_log.csv \
--baselines-json outputs/compliance_manifest.json \
--out-dir output
```
## OpenEnv and Space Deployment
```bash
openenv serve
openenv build
openenv validate --verbose
openenv push
```
If needed:
```bash
openenv push --repo-id your-username/ghostexec
```
## Environment API and Contract
- Core endpoints: `/reset`, `/step`, `/state`, `/schema`, `/health`, `/docs`, `/ws`
- Observation contains:
  - `echoed_message` (plain-text briefing)
  - optional metadata (step validity, reward breakdown, ids)
- Action schema: see `GhostexecAction` in [`models.py`](models.py).
Supported `action_type` values:
- `reply_email`
- `archive_email`
- `reschedule_meeting`
- `cancel_meeting`
- `complete_task`
- `delegate_task`
- `send_message`
- `do_nothing`
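For illustration, a client could guard against typos before sending a request. `check_action_type` below is a hypothetical helper, not part of the shipped package; the server remains the authoritative validator, including checks against live world ids.

```python
# Supported action_type values, mirroring the contract listed above.
SUPPORTED_ACTION_TYPES = frozenset({
    "reply_email", "archive_email", "reschedule_meeting", "cancel_meeting",
    "complete_task", "delegate_task", "send_message", "do_nothing",
})

def check_action_type(action_type: str) -> str:
    """Fail fast on unknown action types before hitting the server."""
    if action_type not in SUPPORTED_ACTION_TYPES:
        raise ValueError(f"unknown action_type: {action_type!r}")
    return action_type

print(check_action_type("reply_email"))  # reply_email
```

A guard like this keeps invalid-action penalties out of training rollouts that would otherwise waste steps on malformed requests.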
## Submission Readiness Checklist
- [x] OpenEnv latest-compatible environment with valid `openenv.yaml`
- [x] Public HF Space deployed and reachable
- [x] Minimal trainable script using Unsloth + TRL
- [x] Colab-ready notebook for reruns
- [x] Training evidence plots embedded in README
- [x] HF blog link — [spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md](https://huggingface.co/spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md)
- [x] <2 minute YouTube demo link — [youtu.be/g4IFZMEzfO8](https://youtu.be/g4IFZMEzfO8)
## Repository Structure
```text
ghostexec/
β”œβ”€β”€ openenv.yaml
β”œβ”€β”€ pyproject.toml
β”œβ”€β”€ models.py
β”œβ”€β”€ client.py
β”œβ”€β”€ graders.py
β”œβ”€β”€ scenarios/
β”œβ”€β”€ scripts/
β”œβ”€β”€ notebooks/
β”œβ”€β”€ tests/
β”œβ”€β”€ output/
└── server/
β”œβ”€β”€ app.py
β”œβ”€β”€ ghostexec_environment.py
└── reward.py
```
## Additional References
- [OpenEnv (Meta PyTorch)](https://github.com/meta-pytorch/OpenEnv)
- [OpenEnv Packaging and Deploying Docs](https://meta-pytorch.org/OpenEnv/auto_getting_started/environment-builder.html)
- [OpenEnv Hub](https://huggingface.co/openenv)
- [Environment Innovation Deep-Dive](environment-innovation/README.md)
## License
BSD-style license, as included in this repository, along with upstream OpenEnv lineage notices.