---
title: OpenCode Environment Server
emoji: 🛠️
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
short_description: OpenCode coding agent in an E2B sandbox with logprob capture
---
# OpenCode Environment for OpenEnv
`opencode_env` runs the OpenCode coding agent inside an isolated E2B sandbox against any OpenAI-compatible LLM endpoint, optionally capturing per-token logprobs for GRPO training.
🚀 **Try it live:** [AdithyaSK/opencode-env](https://huggingface.co/spaces/AdithyaSK/opencode-env)
The deployed Space exposes:

- Web UI at `/web` → pick endpoint, write task, hit Run, watch the live phase log + reward + logprobs.
- MCP tool API at `/mcp` → programmatic `run_rollout` calls.
- OpenAPI docs at `/docs`.
- Health at `/health` (probe sketch below).
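
A minimal liveness probe against the deployed Space, using only the Python standard library; the `/health` response body is not documented here, so the sketch just prints whatever comes back:

```python
import urllib.request

SPACE = "https://adithyask-opencode-env.hf.space"

# Any 2xx status means the server is up; the body format is not assumed.
with urllib.request.urlopen(f"{SPACE}/health", timeout=30) as resp:
    print(resp.status, resp.read().decode()[:200])
```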
The env is task-agnostic: every rollout is configured at call time with a uniform Task shape:

- `instruction` → prompt for the agent
- `setup` → list of bash commands run before the agent (pip install, git clone, file downloads, anything you need staged in the sandbox)
- `verify` → list of bash commands run after the agent (asserts, pytest invocations, score-file writes)
Reward = `passed_verify / total_verify`, unless any verify command writes a float to `/home/user/logs/verifier/reward.txt`, which overrides the computed value.
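
For instance, a task that grants partial credit by writing the override file might look like this (the bash inside `verify` is illustrative, not prescribed; the reward-file path is the one the server checks):

```python
# A uniform Task, expressed as run_rollout kwargs. The last verify command writes
# a float to the override file, replacing the default passed/total scoring.
task = dict(
    instruction="Write parse.py in the workdir that prints the row count of data.csv.",
    setup=[
        # stage a fixture file before the agent runs
        "printf 'a,b\\n1,2\\n3,4\\n' > /home/user/workdir/data.csv",
    ],
    verify=[
        "test -f /home/user/workdir/parse.py",
        # partial-credit override: 1.0 if the script runs cleanly, 0.25 otherwise
        "mkdir -p /home/user/logs/verifier && "
        "(cd /home/user/workdir && python parse.py >/dev/null 2>&1 "
        "&& echo 1.0 || echo 0.25) > /home/user/logs/verifier/reward.txt",
    ],
)
```

These kwargs spread straight into the tool call, e.g. `await env.call_tool("run_rollout", endpoint="openai", **task)`.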
## Quick Start
### Async (default: talk to the deployed Space)
```python
import asyncio
import os

from opencode_env import OpenCodeEnv
from opencode_env.client import _extract_text
from opencode_env.models import RolloutResult


async def main():
    SPACE = "https://adithyask-opencode-env.hf.space"
    async with OpenCodeEnv(base_url=SPACE) as env:
        await env.reset()
        # The MCP tool returns JSON; deserialize via the typed model.
        raw = await env.call_tool(
            "run_rollout",
            endpoint="openai",  # vllm | openai | hf_router
            api_key=os.environ["OPENAI_API_KEY"],  # or set as a Space secret
            instruction=(
                "Create binary_search.py exposing def binary_search(arr, target) -> int "
                "that returns the index of target in arr, or -1 if absent. Use a "
                "relative path."
            ),
            setup=[],
            verify=[
                "test -f /home/user/workdir/binary_search.py",
                "python -c \"import sys; sys.path.insert(0, '/home/user/workdir'); "
                "import binary_search; "
                "assert binary_search.binary_search([1,2,3], 2) == 1; print('OK')\"",
            ],
            template="opencode-rl",  # prebaked E2B template
            task_id="binary_search_v1",
        )
        result = RolloutResult.model_validate_json(_extract_text(raw))
        print("reward:", result.reward)
        print("turns:", len(result.proxy_turns))
        print("files:", list(result.files.keys()))
        print("wall:", result.wall_s, "s")


asyncio.run(main())
```
Expected output (~20s with the prebaked template):

```
reward: 1.0
turns: 3
files: ['/home/user/workdir/binary_search.py', ...]
wall: 19.8 s
```
### Sync wrapper
```python
from opencode_env import OpenCodeEnv

# .sync() returns a synchronous wrapper around the async client.
with OpenCodeEnv(base_url="https://adithyask-opencode-env.hf.space").sync() as env:
    env.reset()
    # MCP tools are reachable via env.call_tool(...) / env.step(...), sync-wrapped.
    # See the async example above for the full run_rollout signature.
```
Point `base_url` at `http://localhost:8000` to talk to a local container instead of the public Space.
### In-process primitive (no HTTP)
For trainers that want to drive a sandbox directly without an HTTP boundary:
```python
import os

from opencode_env import (
    OpenCodeConfig, OpenCodeSessionFactory, OpenCodeTask, E2BSandboxBackend,
)

factory = OpenCodeSessionFactory(
    config=OpenCodeConfig(
        provider="openai_compatible",
        base_url="https://api.openai.com/v1",
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o-mini",
    ),
    sandbox_backend=E2BSandboxBackend(),
    mode="transparent_proxy",  # captures per-token logprobs
)

session = factory.create(task=OpenCodeTask(instruction="..."))
session.wait_for_completion()
turns = session.fetch_proxy_trace()  # per-turn (tokens, logprobs)
session.close()
```
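
The trace is what GRPO-style trainers consume. Continuing from the `turns` above, a sketch that reduces each turn to one scalar log-probability; treating each turn as carrying parallel `tokens` and `logprobs` lists is an assumption here, not a documented schema:

```python
# Hypothetical reduction of the proxy trace for RL: sum of per-token logprobs per turn.
for i, turn in enumerate(turns):
    # attribute names assumed from the "(tokens, logprobs)" comment above
    total = sum(turn.logprobs)
    print(f"turn {i}: {len(turn.tokens)} tokens, sequence logprob {total:.2f}")
```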
## Building the Docker Image
The Dockerfile lives at `server/Dockerfile`. Use the `openenv` CLI from the env root:
```bash
cd envs/opencode_env

openenv validate               # check pyproject.toml + openenv.yaml + server/app.py + uv.lock
openenv build -t opencode-env  # builds the image (uses server/Dockerfile)

# run locally with E2B credentials
docker run -p 8000:8000 -e E2B_API_KEY=e2b_... opencode-env

# push to HF Spaces (Docker variant)
openenv push --repo-id <user>/opencode-env
```
Or build directly without the CLI:
```bash
docker build -t opencode-env -f envs/opencode_env/server/Dockerfile envs/opencode_env
```
The image:

- Runs `uvicorn server.app:app --host 0.0.0.0 --port 8000`.
- Exposes the MCP API at `/mcp` and `/step`, the Gradio UI at `/web`, health at `/health`, and OpenAPI docs at `/docs`.
- Reads `E2B_API_KEY` and (optionally) endpoint-specific env vars at runtime (see Environment Variables).
## The MCP Tool: `run_rollout`
Single tool, two ways to specify the LLM endpoint:

**Option A, endpoint shorthand (recommended):** pass `endpoint="vllm"` (or `"openai"` / `"hf_router"`). The server resolves `base_url`, `api_key`, and `model` from env vars plus catalog defaults. Any explicit field overrides the catalog.

**Option B, fully explicit:** pass `base_url` + `api_key` + `model` directly, as sketched below.
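
A sketch of an Option B call, as a drop-in replacement for the `call_tool` invocation in the Quick Start's `main()`; host, key, and model here are placeholders, not defaults:

```python
raw = await env.call_tool(
    "run_rollout",
    base_url="http://my-vllm-host:8000/v1",  # placeholder OAI-compatible server
    api_key="token-abc",                     # placeholder credential
    model="Qwen/Qwen3-4B-Instruct-2507",     # any model your server serves
    instruction="...",
    verify=["..."],
)
```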
| Arg | Type | Default | Notes |
|---|---|---|---|
| `endpoint` | `str` | `""` | One of `"vllm"` / `"openai"` / `"hf_router"`. |
| `base_url` / `api_key` / `model` | `str` | `""` | Override / supply explicitly. |
| `instruction` | `str` | required | Prompt passed to `opencode run`. |
| `setup` | `list[str]` | `[]` | Bash commands run before the agent. |
| `verify` | `list[str]` | `[]` | Bash commands run after the agent. |
| `task_id` | `str` | `""` | Echoed back in result. |
| `mode` | `str` | `"transparent_proxy"` | Or `"black_box"` (no logprobs). |
| `disable_thinking` | `bool \| None` | `None` (catalog default) | Inject `chat_template_kwargs.enable_thinking=false`. |
| `max_tokens_cap` | `int` | `4096` | Per-turn `max_tokens` clamp. |
| `top_logprobs` | `int` | `5` | HF Router cap is 5; OpenAI 0–20; vLLM unbounded. |
| `agent_timeout_s` | `float` | `600.0` | Hard wall budget for opencode. |
| `template` | `str` | `""` | E2B template name; `"opencode-rl"` skips ~2 min of install per rollout. |
Returns `RolloutResult` JSON with: `reward`, `setup_results[]`, `verify_results[]`, `proxy_turns[]`, `files{}`, `agent_log_tail`, `proxy_log_tail`, `wall_s`, `agent_exit_code`, `sandbox_id`, `error`.
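
A triage sketch over these fields, continuing from the `raw` result of the Quick Start; the per-item shape of `verify_results` is not pinned down above, so items are printed as-is:

```python
result = RolloutResult.model_validate_json(_extract_text(raw))

if result.error:
    # infrastructure or agent failure, as opposed to a merely failing verify
    print("rollout error:", result.error, "| agent exit:", result.agent_exit_code)
    print(result.agent_log_tail)
else:
    print(f"reward={result.reward} wall={result.wall_s}s sandbox={result.sandbox_id}")
    for v in result.verify_results:  # item structure intentionally not assumed
        print("verify:", v)
```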
## Two Operating Modes
| Mode | What it does | Best for |
|---|---|---|
| `transparent_proxy` (default) | In-sandbox proxy at `localhost:7000` forwards opencode's LLM calls to `base_url`, injects `logprobs=true`, and captures per-turn (messages, completion_tokens, logprobs) to `proxy_trace.jsonl`. | GRPO / RL training, observability, top-k distillation. |
| `black_box` | No proxy; opencode talks straight to `base_url`. | Smoke tests, eval, SFT data collection. |
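
In `transparent_proxy` mode, each line of `proxy_trace.jsonl` is one captured turn. A parsing sketch for a trace file retrieved from the sandbox, assuming the JSON keys mirror the tuple named in the table (`messages`, `completion_tokens`, `logprobs`), which is an assumption rather than a documented schema:

```python
import json

# Offline inspection of a proxy trace pulled out of the sandbox.
with open("proxy_trace.jsonl") as f:
    for n, line in enumerate(f):
        turn = json.loads(line)  # key names assumed, see note above
        lps = turn["logprobs"]
        print(f"turn {n}: {len(turn['messages'])} messages, "
              f"{len(turn['completion_tokens'])} completion tokens, "
              f"mean logprob {sum(lps) / max(len(lps), 1):.3f}")
```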
## Environment Variables
The server reads these at runtime. Local dev auto-loads them from a sibling `.env` file; on HF Spaces, set them as Space secrets.
| Variable | Required | Purpose |
|---|---|---|
| `E2B_API_KEY` | yes for any rollout | E2B sandbox credentials. |
| `MAX_CONCURRENT_ENVS` | no | Env-instance pool size. Default 4. |
| `ENABLE_WEB_INTERFACE` | no | Set `false` to disable the `/web` Gradio mount. Default `true`. |
| **vLLM endpoint** | | |
| `VLLM_URL` | required for `endpoint="vllm"` | OAI-compatible base URL. |
| `VLLM_API_KEY` | no | Defaults to `intercepted`. |
| `VLLM_MODEL` | no | Defaults to `Qwen/Qwen3.5-4B`. |
| **OpenAI endpoint** | | |
| `OPENAI_API_KEY` | required for `endpoint="openai"` | Standard OpenAI key. |
| `OPENAI_BASE_URL` | no | Defaults to `https://api.openai.com/v1`. |
| `OPENAI_MODEL` | no | Defaults to `gpt-4o-mini` (gpt-5.x and o-series refuse logprobs). |
| **HF Router endpoint** | | |
| `HF_ROUTER_API_KEY` | required for `endpoint="hf_router"` | HF user token. |
| `HF_ROUTER_BASE_URL` | no | Defaults to `https://router.huggingface.co/v1`. |
| `HF_ROUTER_MODEL` | no | Defaults to `Qwen/Qwen3-4B-Instruct-2507:nscale`. |
**Picking a provider suffix:** of the HF Router providers, only Together, Nscale, Scaleway, SambaNova, and Cerebras actually return logprobs. Avoid Novita, Hyperbolic, and Featherless (they silently drop the field) and Groq (HTTP 400).
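
A sample `.env` for local development; all values are placeholders, and only variables from the table above appear:

```bash
# .env: auto-loaded in local dev; on HF Spaces, use Space secrets instead.
E2B_API_KEY=e2b_...
MAX_CONCURRENT_ENVS=4

# configure whichever endpoint shorthands you plan to pass to run_rollout
OPENAI_API_KEY=sk-...
VLLM_URL=http://localhost:8001/v1
HF_ROUTER_API_KEY=hf_...
```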
## Pre-baked E2B Template
The first rollout in a fresh E2B sandbox spends ~2 min installing opencode and the proxy's Python deps. Build a one-time template that ships those pre-installed:
```bash
.venv/bin/python envs/opencode_env/sandbox/build_template.py
# → builds the `opencode-rl` template in your E2B account (~1m20s, one-time)
```
After this, pass `template="opencode-rl"` on every `run_rollout` call; each rollout drops to ~20–30s end-to-end.
## Project Structure
```
opencode_env/
├── README.md                   # this file
├── openenv.yaml                # OpenEnv space spec
├── pyproject.toml              # deps + ``server`` entrypoint
├── uv.lock                     # frozen deps (required by ``openenv validate``)
├── .gitignore / .dockerignore  # excludes .env / __pycache__
├── __init__.py                 # re-exports primitive + client + models
│
├── client.py                   # OpenCodeEnv(MCPToolClient)
├── models.py                   # RolloutResult / RolloutTurn / OpenCodeState
│
├── config.py                   # OpenCodeConfig (primitive)
├── harness.py                  # OpenCodeSession / OpenCodeSessionFactory (CLI-only)
├── opencode_runtime.py         # opencode.json builder + cmds
├── task.py                     # OpenCodeTask
│
├── server/
│   ├── __init__.py
│   ├── app.py                  # FastAPI factory; mounts Gradio at /web
│   ├── opencode_environment.py # MCPEnvironment with single ``run_rollout`` tool
│   ├── gradio_ui.py            # the /web Gradio Blocks UI
│   ├── catalog.py              # endpoint shorthand resolver
│   └── Dockerfile              # multi-stage uv build (used by ``openenv build``)
│
└── sandbox/
    ├── __init__.py
    ├── base.py                 # SandboxBackend / SandboxHandle Protocols
    ├── e2b.py                  # E2B implementation
    ├── interception.py         # in-sandbox FastAPI proxy (logprob capture)
    └── build_template.py       # one-time E2B template builder
```