DataBoySu commited on
Commit
dfd1faa
·
1 Parent(s): a4c032a

agent working

Browse files
Files changed (4) hide show
  1. README.md +394 -193
  2. inference.py +113 -23
  3. models.py +21 -1
  4. server/AML_env_environment.py +9 -2
README.md CHANGED
@@ -9,281 +9,482 @@ tags:
9
  - openenv
10
  ---
11
 
12
- # AML Investigator Environment
13
 
14
- A financial crime investigation environment for Reinforcement Learning agents.
15
- The agent must query a mock banking system (transactions, KYC records) under a strict API budget
16
- to investigate flagged accounts and submit a final fraud/clear decision.
17
 
18
- ## Quick Start
19
 
20
- The simplest way to use the Aml Env environment is through the `AmlEnv` class:
 
 
 
 
21
 
22
- ```python
23
- from AML_env import AmlAction, AmlEnv
24
 
25
- try:
26
- # Create environment from Docker image (built from root Dockerfile)
27
- env = AmlEnv.from_docker_image("aml-env:latest")
28
 
29
- # Reset to a specific task
30
- obs = env.reset(task="aml_easy")
31
- print(f"Alert: {obs.observation.alert_details}")
32
- print(f"Budget: {obs.observation.budget_remaining}")
33
 
34
- # Query transactions
35
- result = env.step(AmlAction(action={
36
- "action_type": "query_transactions",
37
- "account_id": "ACC-9001",
38
- "limit": 10,
39
- "offset": 0,
40
- }))
41
- print(f"Transactions: {result.observation.last_action_result}")
42
 
43
- # Submit final decision
44
- result = env.step(AmlAction(action={
45
- "action_type": "submit_decision",
46
- "decision": "CLEAR",
47
- "evidence_links": [],
48
- }))
49
- print(f"Done: {result.done}, Reward: {result.reward}")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
50
 
51
- finally:
52
- env.close()
53
  ```
54
 
55
- That's it! The `AmlEnv.from_docker_image()` method handles:
56
- - Starting the Docker container
57
- - Waiting for the server to be ready
58
- - Connecting to the environment
59
- - Container cleanup when you call `close()`
60
 
61
- ## Building the Docker Image
62
 
63
- Before using the environment, you need to build the Docker image:
64
 
65
- ```bash
66
- # From project root
67
- docker build -t aml-env:latest .
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
68
  ```
69
 
70
- ## Deploying to Hugging Face Spaces
71
 
72
- You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
73
 
74
- ```bash
75
- # From the environment directory (where openenv.yaml is located)
76
- openenv push
 
 
 
 
 
77
 
78
- # Or specify options
79
- openenv push --namespace my-org --private
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
80
  ```
81
 
82
- The `openenv push` command will:
83
- 1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
84
- 2. Prepare a custom build for Hugging Face Docker space (enables web interface)
85
- 3. Upload to Hugging Face (ensuring you're logged in)
86
 
87
- ### Prerequisites
88
 
89
- - Authenticate with Hugging Face: The command will prompt for login if not already authenticated
90
 
91
- ### Options
92
 
93
- - `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
94
- - `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
95
- - `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
96
- - `--private`: Deploy the space as private (default: public)
97
 
98
- ### Examples
99
 
100
- ```bash
101
- # Push to your personal namespace (defaults to username/env-name from openenv.yaml)
102
- openenv push
103
 
104
- # Push to a specific repository
105
- openenv push --repo-id my-org/my-env
 
106
 
107
- # Push with a custom base image
108
- openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
 
 
 
 
 
109
 
110
- # Push as a private space
111
- openenv push --private
112
 
113
- # Combine options
114
- openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
115
  ```
116
 
117
- After deployment, your space will be available at:
118
- `https://huggingface.co/spaces/<repo-id>`
119
 
120
- The deployed space includes:
121
- - **Web Interface** at `/web` - Interactive UI for exploring the environment
122
- - **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
123
- - **Health Check** at `/health` - Container health monitoring
124
- - **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
125
 
126
- ## Environment Details
127
 
128
- ### Action Space
129
- **AmlAction** wraps one of four tool calls (discriminated by `action_type`):
130
 
131
- | Tool | Fields | Description |
132
- |---|---|---|
133
- | `query_transactions` | `account_id`, `limit`, `offset` | Paginated transaction history for an account |
134
- | `search_transactions` | `account_id`, `keyword` | Search memo_text of transactions |
135
- | `get_kyc_record` | `entity_id` | Retrieve KYC data for an entity |
136
- | `submit_decision` | `decision` (`FRAUD`\|`CLEAR`), `evidence_links` | Final verdict — ends the episode |
137
 
138
- ### Observation Space
139
- **AmlObservation** is returned after every `reset()` and `step()`:
 
140
 
141
- | Field | Type | Description |
142
- |---|---|---|
143
- | `alert_details` | `str` | The investigation mission (constant per episode) |
144
- | `budget_remaining` | `int` | API calls left before forced termination |
145
- | `last_action` | `str \| None` | Name of the last tool called |
146
- | `last_action_result` | `Any` | Payload returned by the last tool |
147
- | `error_message` | `str \| None` | Error string if the last action failed |
148
- | `done` | `bool` | Whether the episode has ended |
149
- | `reward` | `float` | Per-step reward signal |
150
 
151
- ### Reward
152
- - **Per step:** `-0.02` (efficiency penalty discourages random looping)
153
- - **Submit FRAUD (correct):** grader returns `0.4`–`1.0` depending on evidence quality
154
- - **Submit CLEAR (correct false positive):** grader returns `1.0`
155
- - **Budget exhausted without submission:** episode ends with accumulated negative rewards
156
 
157
- ## Advanced Usage
 
 
158
 
159
- ### Connecting to an Existing Server
160
 
161
- If you already have a Aml Env environment server running, you can connect directly:
162
 
163
- ```python
164
- from AML_env import AmlEnv
165
 
166
- # Connect to existing server
167
- AML_envenv = AmlEnv(base_url="<ENV_HTTP_URL_HERE>")
168
 
169
- # Use as normal
170
- result = AML_envenv.reset()
171
- result = AML_envenv.step(AmlAction(message="Hello!"))
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
172
  ```
173
 
174
- Note: When connecting to an existing server, `AML_envenv.close()` will NOT stop the server.
175
 
176
- ### Using the Context Manager
177
 
178
- The client supports context manager usage for automatic connection management:
179
 
180
- ```python
181
- from AML_env import AmlAction, AmlEnv
182
 
183
- # Connect with context manager (auto-connects and closes)
184
- with AmlEnv(base_url="http://localhost:8000") as env:
185
- result = env.reset()
186
- print(f"Reset: {result.observation.echoed_message}")
187
- # Multiple steps with low latency
188
- for msg in ["Hello", "World", "!"]:
189
- result = env.step(AmlAction(message=msg))
190
- print(f"Echoed: {result.observation.echoed_message}")
191
  ```
192
 
193
- The client uses WebSocket connections for:
194
- - **Lower latency**: No HTTP connection overhead per request
195
- - **Persistent session**: Server maintains your environment state
196
- - **Efficient for episodes**: Better for many sequential steps
197
 
198
- ### Concurrent WebSocket Sessions
 
 
 
 
199
 
200
- The server supports multiple concurrent WebSocket connections. To enable this,
201
- modify `server/app.py` to use factory mode:
 
 
 
202
 
203
- ```python
204
- # In server/app.py - use factory mode for concurrent sessions
205
- app = create_app(
206
- AmlEnvironment, # Pass class, not instance
207
- AmlAction,
208
- AmlObservation,
209
- max_concurrent_envs=4, # Allow 4 concurrent sessions
210
- )
211
  ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
212
 
213
- Then multiple clients can connect simultaneously:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
214
 
215
  ```python
216
  from AML_env import AmlAction, AmlEnv
217
- from concurrent.futures import ThreadPoolExecutor
218
-
219
- def run_episode(client_id: int):
220
- with AmlEnv(base_url="http://localhost:8000") as env:
221
- result = env.reset()
222
- for i in range(10):
223
- result = env.step(AmlAction(message=f"Client {client_id}, step {i}"))
224
- return client_id, result.observation.message_length
225
-
226
- # Run 4 episodes concurrently
227
- with ThreadPoolExecutor(max_workers=4) as executor:
228
- results = list(executor.map(run_episode, range(4)))
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
229
  ```
230
 
231
- ## Development & Testing
232
 
233
- ### Direct Environment Testing
 
 
234
 
235
- Test the environment logic directly without starting the HTTP server:
236
 
237
  ```bash
238
- # From the server directory
239
- python3 server/AML_env_environment.py
 
 
 
240
  ```
241
 
242
- This verifies that:
243
- - Environment resets correctly
244
- - Step executes actions properly
245
- - State tracking works
246
- - Rewards are calculated correctly
247
 
248
- ### Running Locally
 
 
 
 
 
 
249
 
250
- Run the server locally for development:
251
 
252
  ```bash
253
- uvicorn server.app:app --reload
 
 
 
 
254
  ```
255
 
 
 
 
 
 
 
 
 
 
 
 
256
  ## Project Structure
257
 
258
  ```
259
  AML_env/
260
- ├── Dockerfile # Container image (root, HF Spaces compliant)
261
- ├── .dockerignore # Docker build exclusions
262
- ├── .hfignore # HF Space upload exclusions
263
- ├── .gitignore # Git exclusions
264
- ├── __init__.py # Package exports (AmlEnv, AmlAction, AmlObservation)
265
- ├── client.py # AmlEnv WebSocket client
266
- ├── models.py # Pydantic action/observation schemas
267
- ├── inference.py # Baseline RL agent (OpenAI client, [START]/[STEP]/[END] logs)
268
- ├── openenv.yaml # OpenEnv manifest (tasks, graders, port)
269
- ├── pyproject.toml # Project metadata and uv dependencies
270
- ├── uv.lock # Locked dependency graph
271
- ├── README.md # This file (also HF Space card)
272
  ├── data/
273
- │ ├── entities.json # 312 KYC entity records
274
- │ ├── accounts.json # 410 bank accounts
275
- │ └── transactions.json # 5,079 transactions (haystack + fraud scenarios)
 
276
  ├── graders/
277
- │ ├── __init__.py
278
- │ ├── aml_easy.py # "The False Positive" grader
279
- ── aml_medium.py # "The Smurf Network" grader
280
- └── aml_hard.py # "The Corporate Mirage" grader
281
  ├── server/
282
- │ ├── __init__.py
283
- │ ├── AML_env_environment.py # Core OpenEnv environment (reset/step/state)
284
- ── app.py # FastAPI server (CORS, create_app wrapper)
285
- └── requirements.txt # Pip fallback requirements
286
  └── tools/
287
- ├── haystack.py # Financial graph generator
288
- └── tasks.json # Manual fraud scenario definitions
289
  ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9
  - openenv
10
  ---
11
 
12
+ <div align="center">
13
 
14
+ # 🕵️ AML Investigator OpenEnv RL Environment
 
 
15
 
16
+ **A financial crime investigation environment for training and evaluating LLM agents**
17
 
18
+ [![OpenEnv](https://img.shields.io/badge/OpenEnv-compatible-6366f1?style=flat-square)](https://github.com/openenv)
19
+ [![FastAPI](https://img.shields.io/badge/FastAPI-async-009688?style=flat-square&logo=fastapi)](https://fastapi.tiangolo.com)
20
+ [![Pydantic](https://img.shields.io/badge/Pydantic-v2-e92063?style=flat-square)](https://docs.pydantic.dev)
21
+ [![Docker](https://img.shields.io/badge/Docker-ready-2496ED?style=flat-square&logo=docker)](https://www.docker.com)
22
+ [![HF Spaces](https://img.shields.io/badge/HuggingFace-Spaces-FFD21E?style=flat-square&logo=huggingface)](https://huggingface.co/spaces)
23
 
24
+ </div>
 
25
 
26
+ ---
 
 
27
 
28
+ ## What Is This?
 
 
 
29
 
30
+ Most RL benchmarks for language models test knowledge retrieval or reasoning in isolation. This environment tests something harder and more practical: **can an LLM agent act as a financial investigator?**
 
 
 
 
 
 
 
31
 
32
+ The agent is given a banking system alert and a budget of API calls. It must use tools to query transaction ledgers, search memo fields, pull KYC records, and finally submit a verdict — `FRAUD` or `CLEAR` — with evidence. The agent is rewarded for correctness and efficiency; it is penalized for every wasted call.
33
+
34
+ What makes this environment non-trivial:
35
+
36
+ - **The haystack is real noise.** 5,000+ transactions of legitimate payroll, utility bills, and vendor invoices surround every fraud signal.
37
+ - **Pagination is mandatory.** Corporate accounts hold 150–500 transactions. Dumping them all into context causes an OOM failure. The agent must learn to search and paginate strategically.
38
+ - **False flags are everywhere.** The hard task contains a $100 transfer to an entity with a watchlist name — designed specifically to bait the agent into wasting its budget.
39
+ - **KYC cross-referencing.** The hardest task cannot be solved by reading transactions alone. The agent must chain multiple `get_kyc_record` calls to trace hidden ownership loops.
40
+
41
+ ---
42
+
43
+ ## Architecture Overview
44
+
45
+ ```mermaid
46
+ graph TD
47
+ subgraph Agent["LLM Agent (inference.py)"]
48
+ P[Prompt + Alert Details]
49
+ T[Tool Selection via Pydantic JSON]
50
+ C[Sliding Context Window]
51
+ end
52
+
53
+ subgraph Server["OpenEnv Server (FastAPI)"]
54
+ E[AML Environment<br/>Reset / Step]
55
+ G[Grader<br/>aml_easy, aml_medium, aml_hard]
56
+ end
57
+
58
+ subgraph Data["Mock Banking Database /data"]
59
+ ENT[entities.json<br/>312 KYC Records]
60
+ ACC[accounts.json<br/>410 Bank Accounts]
61
+ TXN[transactions.json<br/>5,079 Transactions]
62
+ end
63
+
64
+ P -->|AmlAction JSON| E
65
+ E -->|AmlObservation| C
66
+ C --> T
67
+ T --> P
68
+ E <-->|O1 dict lookups| ENT
69
+ E <-->|O1 dict lookups| ACC
70
+ E <-->|O1 dict lookups| TXN
71
+ E -->|submit_decision| G
72
+ G -->|score 0.0-1.0| E
73
 
 
 
74
  ```
75
 
76
+ ---
 
 
 
 
77
 
78
+ ## The Episode Loop
79
 
80
+ Every investigation runs as a sequence of steps between agent and environment. The agent sees no state beyond what it has explicitly queried.
81
 
82
+ ```mermaid
83
+ sequenceDiagram
84
+ participant A as Agent
85
+ participant E as Environment
86
+ participant D as Data Layer
87
+
88
+ E-->>A: reset() -> AmlObservation<br/>(alert_details, budget=N)
89
+
90
+ loop Until submit_decision or budget=0
91
+ A->>E: step(AmlAction)
92
+ E->>D: dict lookup (O(1))
93
+ D-->>E: raw records
94
+ E-->>A: AmlObservation<br/>(last_action_result, budget-=1, reward-=0.02)
95
+ end
96
+
97
+ A->>E: step(submit_decision, evidence=[...])
98
+ E->>E: Run Grader
99
+ E-->>A: AmlObservation<br/>(done=True, reward=0.0-1.0)
100
  ```
101
 
102
+ ---
103
 
104
+ ## Action Space
105
 
106
+ The agent communicates exclusively through **typed Pydantic actions**. No regex parsing. No free-form text commands. Every action dispatches to exactly one tool.
107
+
108
+ | Action | Key Parameters | Purpose |
109
+ |---|---|---|
110
+ | `query_transactions` | `account_id`, `limit=10`, `offset=0` | Paginated ledger history. **Must paginate** for corporate accounts. |
111
+ | `search_transactions` | `account_id`, `keyword` | Filter `memo_text` fields. Cuts noise without burning pagination budget. |
112
+ | `get_kyc_record` | `entity_id` | Retrieve address, entity type, and corporate directors. |
113
+ | `submit_decision` | `decision: FRAUD\|CLEAR`, `evidence_links: List[str]` | Terminal action. Ends the episode and triggers the grader. |
114
 
115
+ > **Why Pydantic?** The LLM is the router. Strict schemas with `Field(description="...")` mean the model reads the tool contract, not a prompt full of prose instructions. Malformed output is caught at validation, not execution, preventing silent failures and hallucinated account IDs from crashing the environment.
116
+
117
+ ---
118
+
119
+ ## Observation Space
120
+
121
+ Every `reset()` and `step()` returns an `AmlObservation` containing the agent's full situational picture.
122
+
123
+ ```python
124
+ class AmlObservation(BaseModel):
125
+ alert_details: str # Investigation mission — constant per episode
126
+ budget_remaining: int # API calls left before forced termination
127
+ last_action: str | None # Name of the last tool called
128
+ last_action_result: Any # Exact payload returned by the last tool
129
+ error_message: str | None # Formatted error if the last call failed (not a crash)
130
+ done: bool # Whether the episode has ended
131
+ reward: float # Cumulative reward signal
132
  ```
133
 
134
+ > **Errors are data, not exceptions.** If the agent hallucinates `ACC-9999`, the environment catches the `KeyError`, formats it as `"Account 'ACC-9999' not found"`, and returns it as `error_message`. The container never crashes. The agent can read the error and self-correct on the next step.
 
 
 
135
 
136
+ ---
137
 
138
+ ## The Three Tasks
139
 
140
+ The environment ships with three investigation scenarios of escalating difficulty, each targeting a distinct AML typology.
141
 
142
+ ### Task 1 The False Positive `aml_easy`
 
 
 
143
 
144
+ > **Alert:** `ACC-101` (local construction company) transferred $50,000 to `ACC-909`, a newly registered entity in a high-risk jurisdiction.
145
 
146
+ The trap is the jurisdiction flag. A naive model panics and submits `FRAUD`. A well-reasoned agent reads the memo, pulls the KYC record, and discovers a legitimate equipment supplier.
 
 
147
 
148
+ ```mermaid
149
+ flowchart LR
150
+ A([Alert:<br/>ACC-101 to ACC-909<br/>$50,000]) --> B
151
 
152
+ subgraph Investigation
153
+ B[query_transactions<br/>ACC-101] --> C{Memo:<br/>'Heavy Machinery<br/>Purchase - Unit 4'}
154
+ C --> D[get_kyc_record<br/>ACC-909]
155
+ D --> E{Registered as:<br/>Global Tractor Sales Ltd}
156
+ E --> F[query_transactions<br/>ACC-909]
157
+ F --> G{50 inbound payments<br/>from global firms}
158
+ end
159
 
160
+ G --> H([submit_decision<br/>CLEAR])
 
161
 
162
+ style A fill:#ef4444,color:#fff
163
+ style H fill:#22c55e,color:#fff
164
  ```
165
 
166
+ **Reward:** `1.0` for `CLEAR`. The agent proves it can dismiss noise without over-indexing on surface-level signals.
 
167
 
168
+ ---
 
 
 
 
169
 
170
+ ### Task 2 — The Smurf Network `aml_medium`
171
 
172
+ > **Alert:** `ACC-200` (used car dealership) shows a spike in cash deposits over a 5-day window.
 
173
 
174
+ The agent must paginate through hundreds of normal car-sale transactions to surface 14 cash deposits — all for exactly $9,900 or $9,500, just below the $10,000 AML reporting threshold. The three sender accounts (`ACC-301`, `ACC-302`, `ACC-303`) were all opened on the same day with the same occupation listed: `Student`.
 
 
 
 
 
175
 
176
+ ```mermaid
177
+ flowchart TD
178
+ A([Alert:<br/>ACC-200 deposit velocity spike]) --> B
179
 
180
+ subgraph Investigation["Paginate -> Spot -> Cross-Reference"]
181
+ B[query_transactions<br/>ACC-200<br/>offset 0, 10, 20...] --> C{14 deposits<br/>$9,900 and $9,500<br/>below $10k threshold}
182
+ C --> D[get_kyc_record<br/>ACC-301, ACC-302, ACC-303]
183
+ D --> E{All 3 accounts:<br/>Opened same day<br/>Occupation: Student}
184
+ end
 
 
 
 
185
 
186
+ E --> F([submit_decision<br/>FRAUD<br/>evidence: ACC-301, ACC-302, ACC-303])
 
 
 
 
187
 
188
+ style A fill:#f97316,color:#fff
189
+ style F fill:#dc2626,color:#fff
190
+ ```
191
 
192
+ **Partial credit scoring:** The grader awards proportional reward based on how many of the three smurf accounts are included in `evidence_links`. Identifying 1 of 3 scores higher than 0 but lower than the full `1.0`.
193
 
194
+ ---
195
 
196
+ ### Task 3 — The Corporate Mirage `aml_hard`
 
197
 
198
+ > **Alert:** `ACC-500` (major logistics firm) transferred $2.5M to `ACC-700` (generic consulting agency).
 
199
 
200
+ This is the full haystack. `ACC-500` has 500+ transactions. `ACC-700` has hundreds of outbound payments to vendors, charities, and payroll. Hidden inside: 48 hours after receiving $2.5M, `ACC-700` moves $2.4M offshore. The ownership chain requires three chained KYC lookups to resolve.
201
+
202
+ **The false flag trap:** `ACC-500` also made a $100 payment to an entity named `Al-Qaeda Watchlist Target`. This is deliberate bait. Agents that investigate the $100 transfer instead of the $2.5M loop receive a score of `0.05`.
203
+
204
+ ```mermaid
205
+ flowchart TD
206
+ A([Alert:<br/>ACC-500 to ACC-700<br/>$2.5M]) --> B
207
+
208
+ subgraph Trap["The Bait - Do Not Take It"]
209
+ X["$100 transfer<br/>to Watchlist Target"]
210
+ end
211
+
212
+ subgraph Investigation["The Real Loop"]
213
+ B --> C["search_transactions<br/>ACC-700<br/>keyword: 'consulting'"]
214
+ C --> D{48hrs later:<br/>ACC-700 to ACC-888<br/>$2.4M offshore}
215
+ D --> E[get_kyc_record<br/>ACC-888]
216
+ E --> F{Director:<br/>Robert House}
217
+ F --> G[get_kyc_record<br/>ACC-500]
218
+ G --> H{Director:<br/>Apex Management Corp}
219
+ H --> I[get_kyc_record<br/>Apex Management Corp]
220
+ I --> J{CEO:<br/>Robert House same person}
221
+ end
222
+
223
+ A -.->|naive agent wastes budget| X
224
+ J --> K([submit_decision<br/>FRAUD<br/>evidence: ACC-500, ACC-700, ACC-888])
225
+
226
+ style A fill:#ef4444,color:#fff
227
+ style X fill:#6b7280,color:#fff,stroke-dasharray: 5 5
228
+ style K fill:#dc2626,color:#fff
229
+ style J fill:#fbbf24,color:#000
230
  ```
231
 
232
+ **Scoring:** Full `1.0` for identifying all three accounts with the circular KYC loop documented. `0.05` if the agent chases the false flag instead.
233
 
234
+ ---
235
 
236
+ ## Reward Structure
237
 
238
+ ```
239
+ Episode reward = Σ(step penalties) + terminal reward
240
 
241
+ Step penalty: −0.02 per API call (discourages random exploration)
242
+ FRAUD correct: +0.4 to +1.0 (scales with evidence quality)
243
+ CLEAR correct: +1.0 (false positives must be dismissed confidently)
244
+ Budget exhaust: 0.0 (no terminal reward — accumulated penalties only)
 
 
 
 
245
  ```
246
 
247
+ Budget scales with task difficulty:
 
 
 
248
 
249
+ | Task | Budget | Rationale |
250
+ |---|---|---|
251
+ | `aml_easy` | 5 calls | 4 tool calls are sufficient; any more suggests confusion |
252
+ | `aml_medium` | 12 calls | Pagination required; partial paths need room |
253
+ | `aml_hard` | 20 calls | Three KYC hops + pagination across two high-volume accounts |
254
 
255
+ ---
256
+
257
+ ## The Mock Knowledge Graph
258
+
259
+ The haystack is a procedurally generated slice of a fictional bank, seeded for reproducibility.
260
 
 
 
 
 
 
 
 
 
261
  ```
262
+ entities.json 312 records 80% Individual, 20% Corporate (with directors list)
263
+ accounts.json 410 records 95% Active, 5% Closed
264
+ transactions.json 5,079 rows Procedural noise + 3 injected fraud scenarios
265
+ ```
266
+
267
+ Transaction `memo_text` is typed by sender/receiver pair to simulate realistic commerce:
268
+
269
+ | Flow | Example Memos | Amount Range |
270
+ |---|---|---|
271
+ | Corporate → Individual | `Payroll`, `Salary Q3`, `Expense Reimbursement` | $2,000–$10,000 |
272
+ | Corporate → Corporate | `Server Hosting`, `Consulting Retainer`, `Invoice #XXXX` | $500–$50,000 |
273
+ | Individual → Corporate | `Utility Bill`, `Gym Membership`, `Coffee` | $5–$200 |
274
+ | Individual → Individual | `Dinner split`, `Rent share`, `Birthday gift` | $10–$500 |
275
+
276
+ Fraud scenarios are injected with camouflage: 5–10 "normal" bridging transactions connect each manual account to the procedural haystack so no fraud node appears as an isolated island in the graph.
277
+
278
+ ---
279
+
280
+ ## Core Engineering Principles
281
+
282
+ These principles govern how the environment is designed and why each decision was made.
283
+
284
+ <details>
285
+ <summary><strong>1. You don't design the control flow</strong></summary>
286
+
287
+ The `step()` function is a pure reactive state machine. If the agent queries the same account five times in a row, the environment returns the result five times. It never forces a sequence or nudges toward the solution path. The agent is in the driver's seat.
288
+
289
+ </details>
290
+
291
+ <details>
292
+ <summary><strong>2. Errors are data, not control flow</strong></summary>
293
+
294
+ Hallucinated account IDs, missing entity records, malformed queries — all are caught with `try/except`, formatted as human-readable strings, and returned as `error_message` in the observation. The container never crashes on bad agent output.
295
 
296
+ </details>
297
+
298
+ <details>
299
+ <summary><strong>3. The conversation is the database</strong></summary>
300
+
301
+ The environment is stateless between calls. The agent's only memory is the `AmlObservation` history it has accumulated. Every response includes `budget_remaining`, `last_action`, and the full `last_action_result` payload so nothing is lost between turns.
302
+
303
+ </details>
304
+
305
+ <details>
306
+ <summary><strong>4. No regex. Pydantic is the contract.</strong></summary>
307
+
308
+ Actions are strictly typed Pydantic models with `Field(description="...")` on every parameter. The LLM reads the schema to understand how to use each tool. Invalid JSON is caught at validation — not mid-execution.
309
+
310
+ </details>
311
+
312
+ <details>
313
+ <summary><strong>5. Pagination is an OOM prevention mechanism</strong></summary>
314
+
315
+ Corporate accounts have 150–500 transactions. Returning them all in one response would blow up the context window. The `query_transactions` tool enforces a `limit` parameter (default 10, max configurable). The agent must learn to paginate or use keyword search to find signals in high-volume accounts.
316
+
317
+ </details>
318
+
319
+ <details>
320
+ <summary><strong>6. Context compaction is layered</strong></summary>
321
+
322
+ The inference script maintains a sliding window over conversation history (last 4–5 steps). Internal chain-of-thought reasoning is routed to `stderr`, keeping `stdout` clean for the grader's `[START]`/`[STEP]`/`[END]` log parsing.
323
+
324
+ </details>
325
+
326
+ <details>
327
+ <summary><strong>7. The prompt is code, not config</strong></summary>
328
+
329
+ The `alert_details` string returned by `reset()` is the agent's mission statement. It defines the goal, names the flagged account, and sets the investigation frame. Vague alerts produce vague investigations.
330
+
331
+ </details>
332
+
333
+ ---
334
+
335
+ ## Quick Start
336
+
337
+ ### Prerequisites
338
+
339
+ ```bash
340
+ pip install faker # for haystack generation
341
+ docker build -t aml-env:latest .
342
+ ```
343
+
344
+ ### Running an Episode
345
 
346
  ```python
347
  from AML_env import AmlAction, AmlEnv
348
+
349
+ try:
350
+ env = AmlEnv.from_docker_image("aml-env:latest")
351
+
352
+ # Choose task: "aml_easy" | "aml_medium" | "aml_hard"
353
+ obs = env.reset(task="aml_medium")
354
+ print(f"Alert: {obs.observation.alert_details}")
355
+ print(f"Budget: {obs.observation.budget_remaining}")
356
+
357
+ # Page through transactions
358
+ result = env.step(AmlAction(action={
359
+ "action_type": "query_transactions",
360
+ "account_id": "ACC-200",
361
+ "limit": 10,
362
+ "offset": 0,
363
+ }))
364
+ print(result.observation.last_action_result)
365
+
366
+ # Search by keyword to cut noise
367
+ result = env.step(AmlAction(action={
368
+ "action_type": "search_transactions",
369
+ "account_id": "ACC-700",
370
+ "keyword": "consulting",
371
+ }))
372
+
373
+ # Pull KYC record
374
+ result = env.step(AmlAction(action={
375
+ "action_type": "get_kyc_record",
376
+ "entity_id": "ENT-0042",
377
+ }))
378
+
379
+ # Submit final verdict
380
+ result = env.step(AmlAction(action={
381
+ "action_type": "submit_decision",
382
+ "decision": "FRAUD",
383
+ "evidence_links": ["ACC-301", "ACC-302", "ACC-303"],
384
+ }))
385
+ print(f"Done: {result.done} | Reward: {result.reward:.3f}")
386
+
387
+ finally:
388
+ env.close()
389
  ```
390
 
391
+ ### Connect to an Existing Server
392
 
393
+ ```python
394
+ env = AmlEnv(base_url="http://localhost:8760")
395
+ ```
396
 
397
+ ### Regenerate the Haystack
398
 
399
  ```bash
400
+ # Procedural noise only
401
+ python tools/haystack.py
402
+
403
+ # Inject hand-written fraud scenarios
404
+ python tools/haystack.py --inject tools/tasks.json --output-dir data/
405
  ```
406
 
407
+ ---
 
 
 
 
408
 
409
+ ## Deployment
410
+
411
+ ### Local Development
412
+
413
+ ```bash
414
+ uvicorn server.app:app --reload --port 8760
415
+ ```
416
 
417
+ ### Hugging Face Spaces
418
 
419
  ```bash
420
+ # From environment directory
421
+ openenv push
422
+
423
+ # Private space with custom repo
424
+ openenv push --repo-id my-org/aml-investigator --private
425
  ```
426
 
427
+ After deployment, the space exposes:
428
+
429
+ | Endpoint | Description |
430
+ |---|---|
431
+ | `/web` | Interactive UI for manual exploration |
432
+ | `/docs` | Swagger / OpenAPI interface |
433
+ | `/ws` | WebSocket endpoint for low-latency agent sessions |
434
+ | `/health` | Container health check |
435
+
436
+ ---
437
+
438
  ## Project Structure
439
 
440
  ```
441
  AML_env/
442
+ ├── Dockerfile # HF Spaces compliant; exposes port 8760
443
+ ├── openenv.yaml # Task manifest: aml_easy, aml_medium, aml_hard
444
+ ├── models.py # Pydantic AmlAction + AmlObservation schemas
445
+ ├── client.py # AmlEnv WebSocket client
446
+ ├── inference.py # Baseline agent: asyncio, sliding window, stderr CoT
447
+
 
 
 
 
 
 
448
  ├── data/
449
+ │ ├── entities.json # 312 KYC entity records
450
+ │ ├── accounts.json # 410 bank accounts
451
+ │ └── transactions.json # 5,079 transactions (haystack + fraud)
452
+
453
  ├── graders/
454
+ │ ├── aml_easy.py # False positive — reward CLEAR, penalise over-flagging
455
+ │ ├── aml_medium.py # Smurf network partial credit per smurf account found
456
+ ── aml_hard.py # Corporate mirage 0.05 if false-flag bait taken
457
+
458
  ├── server/
459
+ │ ├── AML_env_environment.py # Core state machine: reset(), step(), budget, grader dispatch
460
+ │ ├── app.py # FastAPI wrapper with CORS
461
+ ── requirements.txt
462
+
463
  └── tools/
464
+ ├── haystack.py # Procedural KB generator (Faker + random)
465
+ └── tasks.json # Hand-written fraud scenario definitions
466
  ```
467
+
468
+ ---
469
+
470
+ ## Evaluation Log Format
471
+
472
+ The inference script emits strict single-line logs to `stdout` for automated grading:
473
+
474
+ ```
475
+ [START] {"task": "aml_hard", "budget": 20}
476
+ [STEP] {"action": "query_transactions", "reward": -0.02, "done": false, "budget": 19}
477
+ [STEP] {"action": "get_kyc_record", "reward": -0.02, "done": false, "budget": 18}
478
+ [STEP] {"action": "submit_decision", "reward": 0.85, "done": true, "budget": 17}
479
+ [END] {"total_reward": 0.79, "steps": 3, "decision": "FRAUD"}
480
+ ```
481
+
482
+ Internal chain-of-thought reasoning routes to `stderr` and is never visible to the grader.
483
+
484
+ ---
485
+
486
+ <div align="center">
487
+
488
+ Built with [OpenEnv](https://github.com/openenv) · Deployed on [Hugging Face Spaces](https://huggingface.co/spaces)
489
+
490
+ </div>
inference.py CHANGED
@@ -7,6 +7,7 @@ import os
7
  import json
8
  import textwrap
9
  import sys
 
10
  from typing import List, Optional
11
  from openai import OpenAI
12
 
@@ -39,6 +40,13 @@ SYSTEM_PROMPT = textwrap.dedent(
39
  2. {"action": {"action_type": "search_transactions", "account_id": "ACC-XXXX", "keyword": "invoice"}}
40
  3. {"action": {"action_type": "get_kyc_record", "entity_id": "ENT-XXXX"}}
41
  4. {"action": {"action_type": "submit_decision", "decision": "FRAUD", "evidence_links": ["ACC-1234"]}} (Use "CLEAR" for False Positives with empty evidence_links).
 
 
 
 
 
 
 
42
  """
43
  ).strip()
44
 
@@ -106,6 +114,68 @@ def _extract_text_from_completions_api(completion: object) -> str:
106
 
107
  raise ValueError("Completions API response text is empty")
108
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
109
  def log_start(task: str, env: str, model: str) -> None:
110
  print(f"[START] task={task} env={env} model={model}", flush=True)
111
 
@@ -119,6 +189,18 @@ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> No
119
  rewards_str = ",".join(f"{r:.2f}" for r in rewards)
120
  print(f"[END] success={str(success).lower()} steps={steps} score={score:.2f} rewards={rewards_str}", flush=True)
121
 
 
 
 
 
 
 
 
 
 
 
 
 
122
  def get_model_message(client: OpenAI, obs_dict: dict, history: List[str]) -> str:
123
  history_block = "\n".join(history[-5:]) if history else "No previous steps."
124
  user_prompt = f"Observation:\n{json.dumps(obs_dict, indent=2)}\n\nHistory:\n{history_block}\n\nProvide your next JSON action:"
@@ -130,10 +212,11 @@ def get_model_message(client: OpenAI, obs_dict: dict, history: List[str]) -> str
130
  {"role": "system", "content": SYSTEM_PROMPT},
131
  {"role": "user", "content": user_prompt},
132
  ],
133
- temperature=0.1,
134
- max_tokens=200,
 
135
  )
136
- return _extract_text_from_chat_completion(completion)
137
  except Exception as chat_exc:
138
  # Retry via Responses API for OpenAI-compatible providers that do not
139
  # populate chat.completions choices consistently.
@@ -142,18 +225,18 @@ def get_model_message(client: OpenAI, obs_dict: dict, history: List[str]) -> str
142
  model=MODEL_NAME,
143
  instructions=SYSTEM_PROMPT,
144
  input=user_prompt,
145
- max_output_tokens=200,
146
  )
147
- return _extract_text_from_responses_api(response)
148
  except Exception as responses_exc:
149
  try:
150
  completion = client.completions.create(
151
  model=MODEL_NAME,
152
  prompt=f"{SYSTEM_PROMPT}\n\n{user_prompt}",
153
- temperature=0.1,
154
  max_tokens=200,
155
  )
156
- return _extract_text_from_completions_api(completion)
157
  except Exception as completions_exc:
158
  print(
159
  (
@@ -163,7 +246,7 @@ def get_model_message(client: OpenAI, obs_dict: dict, history: List[str]) -> str
163
  file=sys.stderr,
164
  flush=True,
165
  )
166
- return FALLBACK_ACTION_JSON
167
 
168
  async def main() -> None:
169
  client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
@@ -177,6 +260,7 @@ async def main() -> None:
177
  steps_taken = 0
178
  score = 0.0
179
  success = False
 
180
 
181
  log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)
182
 
@@ -191,25 +275,26 @@ async def main() -> None:
191
  action_str = get_model_message(client, obs_dict, history)
192
 
193
  # Parse LLM string to Pydantic Model
 
194
  try:
195
- # Strip possible markdown backticks
196
- clean_str = action_str.replace("```json", "").replace("```", "").strip()
197
  action_json = json.loads(clean_str)
 
 
 
 
 
198
  action_obj = AmlAction.model_validate(action_json)
199
  error = None
200
  except Exception as e:
201
  # Errors are data! If the LLM writes bad JSON, we catch it and force a dummy action
202
  # so the environment can return a schema error to the LLM.
 
203
  error = f"JSON Parse/Schema Error: {str(e)}"
204
- action_obj = AmlAction.model_validate(
205
- {
206
- "action": {
207
- "action_type": "submit_decision",
208
- "decision": "CLEAR",
209
- "evidence_links": [],
210
- }
211
- }
212
- )
213
 
214
  obs = env.step(action_obj)
215
 
@@ -219,16 +304,21 @@ async def main() -> None:
219
  rewards.append(reward)
220
  steps_taken = step
221
 
222
- log_step(step=step, action=action_str.replace('\n', ''), reward=reward, done=done, error=error)
223
  history.append(f"Step {step}: Action: {action_str} -> Result: {obs.last_action_result} | Error: {obs.error_message}")
224
 
225
  if done:
226
  break
227
 
228
- # Calculate a baseline score for the stdout logs (Graders handle real scoring)
229
- score = sum(rewards) + 1.0 if "submit_decision" in (obs.last_action or "") else 0.0
 
 
 
 
 
230
  score = min(max(score, 0.01), 0.99)
231
- success = score > 0.5
232
 
233
  finally:
234
  log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
 
7
  import json
8
  import textwrap
9
  import sys
10
+ import re
11
  from typing import List, Optional
12
  from openai import OpenAI
13
 
 
40
  2. {"action": {"action_type": "search_transactions", "account_id": "ACC-XXXX", "keyword": "invoice"}}
41
  3. {"action": {"action_type": "get_kyc_record", "entity_id": "ENT-XXXX"}}
42
  4. {"action": {"action_type": "submit_decision", "decision": "FRAUD", "evidence_links": ["ACC-1234"]}} (Use "CLEAR" for False Positives with empty evidence_links).
43
+
44
+ Token-saving style rule:
45
+ - Think in caveman style (short, simple words).
46
+ - Never output prose. Output JSON only.
47
+
48
+ Data rule:
49
+ - get_kyc_record must use ENT-XXXX only, never ACC-XXXX.
50
  """
51
  ).strip()
52
 
 
114
 
115
  raise ValueError("Completions API response text is empty")
116
 
117
+
118
+ def _coerce_json_object(raw_text: str) -> str:
119
+ text = raw_text.strip()
120
+ if text.startswith("```"):
121
+ text = text.replace("```json", "").replace("```", "").strip()
122
+
123
+ if text.startswith("{") and text.endswith("}"):
124
+ return text
125
+
126
+ start = text.find("{")
127
+ end = text.rfind("}")
128
+ if start != -1 and end > start:
129
+ return text[start : end + 1]
130
+
131
+ return text
132
+
133
+
134
+ def _build_recovery_action_from_obs(obs_dict: dict) -> dict:
135
+ """Use a non-terminal fallback action when model output is malformed."""
136
+ alert = str(obs_dict.get("alert_details", "") or "")
137
+ match = re.search(r"ACC-\d+", alert)
138
+ if match:
139
+ return {
140
+ "action": {
141
+ "action_type": "query_transactions",
142
+ "account_id": match.group(0),
143
+ "limit": 10,
144
+ "offset": 0,
145
+ }
146
+ }
147
+ return {
148
+ "action": {
149
+ "action_type": "submit_decision",
150
+ "decision": "CLEAR",
151
+ "evidence_links": [],
152
+ }
153
+ }
154
+
155
+
156
+ def _ensure_valid_action_json(raw_text: str, obs_dict: dict) -> str:
157
+ """Guarantee a valid action JSON string for downstream parsing."""
158
+ candidate = _coerce_json_object(raw_text)
159
+ try:
160
+ payload = json.loads(candidate)
161
+ if not isinstance(payload, dict):
162
+ raise ValueError("top-level JSON is not an object")
163
+ action = payload.get("action")
164
+ if not isinstance(action, dict):
165
+ raise ValueError("missing 'action' object")
166
+ action_type = action.get("action_type")
167
+ if not isinstance(action_type, str):
168
+ raise ValueError("missing 'action_type' string")
169
+ return json.dumps(payload, ensure_ascii=True)
170
+ except Exception as exc:
171
+ recovery_json = _build_recovery_action_from_obs(obs_dict)
172
+ print(
173
+ f"[DEBUG] Non-JSON/invalid model action; using recovery action ({exc})",
174
+ file=sys.stderr,
175
+ flush=True,
176
+ )
177
+ return json.dumps(recovery_json, ensure_ascii=True)
178
+
179
  def log_start(task: str, env: str, model: str) -> None:
180
  print(f"[START] task={task} env={env} model={model}", flush=True)
181
 
 
189
  rewards_str = ",".join(f"{r:.2f}" for r in rewards)
190
  print(f"[END] success={str(success).lower()} steps={steps} score={score:.2f} rewards={rewards_str}", flush=True)
191
 
192
+
193
+ def log_thought(step: int, thought: Optional[object]) -> None:
194
+ """Print model thought to stderr so stdout contract stays validator-safe."""
195
+ if thought is None:
196
+ return
197
+ if isinstance(thought, dict):
198
+ compact = json.dumps(thought, ensure_ascii=True)
199
+ else:
200
+ compact = str(thought)
201
+ compact = compact.replace("\n", " ").strip()
202
+ print(f"[THOUGHT] step={step} thought={compact}", file=sys.stderr, flush=True)
203
+
204
  def get_model_message(client: OpenAI, obs_dict: dict, history: List[str]) -> str:
205
  history_block = "\n".join(history[-5:]) if history else "No previous steps."
206
  user_prompt = f"Observation:\n{json.dumps(obs_dict, indent=2)}\n\nHistory:\n{history_block}\n\nProvide your next JSON action:"
 
212
  {"role": "system", "content": SYSTEM_PROMPT},
213
  {"role": "user", "content": user_prompt},
214
  ],
215
+ temperature=0.0,
216
+ max_tokens=1000,
217
+ response_format={"type": "json_object"},
218
  )
219
+ return _ensure_valid_action_json(_extract_text_from_chat_completion(completion), obs_dict)
220
  except Exception as chat_exc:
221
  # Retry via Responses API for OpenAI-compatible providers that do not
222
  # populate chat.completions choices consistently.
 
225
  model=MODEL_NAME,
226
  instructions=SYSTEM_PROMPT,
227
  input=user_prompt,
228
+ max_output_tokens=1000,
229
  )
230
+ return _ensure_valid_action_json(_extract_text_from_responses_api(response), obs_dict)
231
  except Exception as responses_exc:
232
  try:
233
  completion = client.completions.create(
234
  model=MODEL_NAME,
235
  prompt=f"{SYSTEM_PROMPT}\n\n{user_prompt}",
236
+ temperature=0.0,
237
  max_tokens=200,
238
  )
239
+ return _ensure_valid_action_json(_extract_text_from_completions_api(completion), obs_dict)
240
  except Exception as completions_exc:
241
  print(
242
  (
 
246
  file=sys.stderr,
247
  flush=True,
248
  )
249
+ return _ensure_valid_action_json(FALLBACK_ACTION_JSON, obs_dict)
250
 
251
  async def main() -> None:
252
  client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
 
260
  steps_taken = 0
261
  score = 0.0
262
  success = False
263
+ had_parse_error = False
264
 
265
  log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)
266
 
 
275
  action_str = get_model_message(client, obs_dict, history)
276
 
277
  # Parse LLM string to Pydantic Model
278
+ action_for_log = action_str
279
  try:
280
+ clean_str = _coerce_json_object(action_str)
 
281
  action_json = json.loads(clean_str)
282
+ thought_for_log = action_json.get("thought")
283
+ if thought_for_log is None:
284
+ action_type = action_json.get("action", {}).get("action_type", "unknown")
285
+ thought_for_log = f"do {action_type} now"
286
+ log_thought(step=step, thought=thought_for_log)
287
  action_obj = AmlAction.model_validate(action_json)
288
  error = None
289
  except Exception as e:
290
  # Errors are data! If the LLM writes bad JSON, we catch it and force a dummy action
291
  # so the environment can return a schema error to the LLM.
292
+ had_parse_error = True
293
  error = f"JSON Parse/Schema Error: {str(e)}"
294
+ log_thought(step=step, thought="parse fail; use recovery action")
295
+ recovery_json = _build_recovery_action_from_obs(obs_dict)
296
+ action_obj = AmlAction.model_validate(recovery_json)
297
+ action_for_log = json.dumps(recovery_json, ensure_ascii=True)
 
 
 
 
 
298
 
299
  obs = env.step(action_obj)
300
 
 
304
  rewards.append(reward)
305
  steps_taken = step
306
 
307
+ log_step(step=step, action=action_for_log.replace('\n', ''), reward=reward, done=done, error=error)
308
  history.append(f"Step {step}: Action: {action_str} -> Result: {obs.last_action_result} | Error: {obs.error_message}")
309
 
310
  if done:
311
  break
312
 
313
+ # Keep score in open interval (0,1) and avoid false positives on parse failures.
314
+ if had_parse_error or obs.error_message:
315
+ score = 0.05
316
+ elif "submit_decision" in (obs.last_action or ""):
317
+ score = 0.75
318
+ else:
319
+ score = 0.25
320
  score = min(max(score, 0.01), 0.99)
321
+ success = (not had_parse_error) and (obs.error_message is None) and score > 0.5
322
 
323
  finally:
324
  log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
models.py CHANGED
@@ -11,7 +11,7 @@ The AML_env environment is a simple test environment that echoes back messages.
11
  """
12
 
13
  from openenv.core.env_server.types import Action, Observation
14
- from pydantic import Field
15
  from typing import List, Literal, Optional, Any, Union
16
 
17
  # ==========================================
@@ -47,8 +47,28 @@ class SubmitDecision(Action):
47
  decision: Literal["FRAUD", "CLEAR"] = Field(description="Your final verdict.")
48
  evidence_links: List[str] = Field(description="List of ACC-XXXX or ENT-XXXX IDs proving fraud.")
49
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
50
  # The master Action model using Union
51
  class AmlAction(Action):
 
 
 
 
 
52
  action: Union[QueryTransactions, SearchTransactions, GetKYCRecord, SubmitDecision] = Field(
53
  discriminator='action_type'
54
  )
 
11
  """
12
 
13
  from openenv.core.env_server.types import Action, Observation
14
+ from pydantic import BaseModel, Field
15
  from typing import List, Literal, Optional, Any, Union
16
 
17
  # ==========================================
 
47
  decision: Literal["FRAUD", "CLEAR"] = Field(description="Your final verdict.")
48
  evidence_links: List[str] = Field(description="List of ACC-XXXX or ENT-XXXX IDs proving fraud.")
49
 
50
+
51
+ # ==========================================
52
+ # OPTIONAL THOUGHT SCRATCHPAD
53
+ # ==========================================
54
+ class ThoughtProcess(BaseModel):
55
+ observation: str = Field(
56
+ description="Analyze what just happened and summarize useful clues from the last tool output."
57
+ )
58
+ plan: str = Field(
59
+ description="State the next investigation step and why it follows from the current evidence."
60
+ )
61
+ action: str = Field(
62
+ description="Explain which tool call you are about to make and with which key parameters."
63
+ )
64
+
65
  # The master Action model using Union
66
  class AmlAction(Action):
67
+ # Keep this optional so existing inference JSON remains compatible.
68
+ thought: Optional[ThoughtProcess] = Field(
69
+ default=None,
70
+ description="Optional ReAct-style scratchpad for model reasoning.",
71
+ )
72
  action: Union[QueryTransactions, SearchTransactions, GetKYCRecord, SubmitDecision] = Field(
73
  discriminator='action_type'
74
  )
server/AML_env_environment.py CHANGED
@@ -13,6 +13,7 @@ explore a massive transaction graph using a strict budget.
13
 
14
  import json
15
  import os
 
16
  from pathlib import Path
17
  from uuid import uuid4
18
 
@@ -73,9 +74,15 @@ class AmlEnvironment(Environment):
73
  # Sort transactions by timestamp to ensure deterministic pagination
74
  self.transactions_db = sorted(txn_list, key=lambda x: x.get("timestamp", ""))
75
 
76
- print(f"[AML-ENV] Loaded {len(self.entities_db)} entities, {len(self.accounts_db)} accounts, {len(self.transactions_db)} transactions.")
 
 
 
77
  except Exception as e:
78
- print(f"[AML-ENV ERROR] Failed to load data from {data_dir}. Ensure JSON files exist. Error: {e}")
 
 
 
79
  self.entities_db = {}
80
  self.accounts_db = {}
81
  self.transactions_db = []
 
13
 
14
  import json
15
  import os
16
+ import sys
17
  from pathlib import Path
18
  from uuid import uuid4
19
 
 
74
  # Sort transactions by timestamp to ensure deterministic pagination
75
  self.transactions_db = sorted(txn_list, key=lambda x: x.get("timestamp", ""))
76
 
77
+ print(
78
+ f"[AML-ENV] Loaded {len(self.entities_db)} entities, {len(self.accounts_db)} accounts, {len(self.transactions_db)} transactions.",
79
+ file=sys.stderr,
80
+ )
81
  except Exception as e:
82
+ print(
83
+ f"[AML-ENV ERROR] Failed to load data from {data_dir}. Ensure JSON files exist. Error: {e}",
84
+ file=sys.stderr,
85
+ )
86
  self.entities_db = {}
87
  self.accounts_db = {}
88
  self.transactions_db = []