Upload README.md with huggingface_hub
README.md
---
license: apache-2.0
base_model: poolside/Laguna-XS.2
tags:
- reinforcement-learning
- lora
- trading
- coding-agent
- verifiers
- prime-intellect
- poolside-hackathon
library_name: peft
---
# TradePool: a self-improving trading coding-agent (Laguna XS.2 LoRA)

**Poolside × Prime Intellect Research Hackathon, Foundations track.**

A LoRA adapter for `poolside/Laguna-XS.2`, trained with reinforcement learning so the
model becomes a **coding agent that writes causal crypto trading-strategy functions**,
scored by a leak-proof out-of-sample backtest.
## The idea in one line

> Trading discipline that normally lives as *prompt text* (a memory file of rules) is
> turned into **adapter weights** by rewarding disciplined, profitable behaviour on
> held-out market data. The verifier *is* the backtest.
## How it works

1. **Environment** (`verifiers`, v0 `SingleTurnEnv`, pushed to `stimulir/trade-pool`):
   the agent is given a Base-chain token's in-sample price history plus a library of causal
   indicators (RSI, MACD, MAs, z-score, Bollinger, volatility) and must write
   `def strategy(features, position) -> target_position`.
2. **Verifier / reward**: the strategy runs bar-by-bar over a **held-out** window
   (lookahead is structurally impossible; the function never sees future bars), scored by
   a weighted rubric:
   - OOS Sharpe (0.40) · beats buy-and-hold (0.20) · drawdown control (0.15) ·
     sane exposure (0.10) · transaction cost (0.05) · valid+actually-trades (0.10)
   - Hard gates → reward 0: invalid code, lookahead, NaN equity, **do-nothing strategies**.
3. **Training**: Prime Hosted RL (GRPO), `poolside/Laguna-XS.2`, 50 steps, batch 128,
   `rollouts_per_example=8`, `enable_thinking=false`. FREE hosted Laguna run.
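To make the bar-by-bar evaluation concrete, here is a minimal sketch of a strategy function and a causal backtest loop. It is not the code shipped in `trade_pool/`: the feature dictionary keys, the `causal_features` and `run_backtest` helpers, the long-only clamp to `[0, 1]`, and the `cost_per_unit` value are all assumptions made for illustration. Only the `strategy(features, position)` signature comes from the description above.

```python
from statistics import mean, pstdev


def causal_features(closes, t, window=5):
    """Hypothetical feature builder: uses only closes[0..t], never later bars."""
    hist = closes[max(0, t - window + 1): t + 1]
    mu = mean(hist)
    sigma = pstdev(hist)
    zscore = 0.0 if sigma == 0 else (closes[t] - mu) / sigma
    return {"close": closes[t], "ma": mu, "zscore": zscore}


def strategy(features, position):
    """Toy mean-reversion rule with the signature the environment asks for."""
    if features["zscore"] < -1.0:
        return 1.0       # price stretched below its mean: go long
    if features["zscore"] > 1.0:
        return 0.0       # price stretched above its mean: go flat
    return position      # otherwise keep the current position


def run_backtest(closes, strategy_fn, cost_per_unit=0.001):
    """Walk the bars in order; the position chosen at bar t earns the t -> t+1 return."""
    equity, position = 1.0, 0.0
    curve = [equity]
    for t in range(len(closes) - 1):
        target = strategy_fn(causal_features(closes, t), position)
        target = min(1.0, max(0.0, float(target)))  # assumed long-only exposure cap
        equity *= 1.0 - cost_per_unit * abs(target - position)  # pay for turnover
        position = target
        equity *= 1.0 + position * (closes[t + 1] / closes[t] - 1.0)
        curve.append(equity)
    return curve
```

Because the strategy is only ever handed features built from bars up to and including `t`, changing a later bar cannot alter an earlier decision, which is the property the verifier relies on.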
## Results

RL produced a steady reward climb on the training environment, peaking around step 13 and
settling slightly below the peak by the final step:

| Stage | Total reward |
|---|---|
| step ~0 (baseline) | ~0.15 |
| step ~8 | 0.19 |
| step ~11 | 0.28 |
| step ~13 (peak) | ~0.42 |
| step ~50 (final) | ~0.34–0.41 |

Every rubric component improved together (not single-metric gaming):
`reward_valid` 0.30 → ~0.70 (writes valid trading code far more often),
`reward_sharpe` 0.10 → 0.33, with the drawdown, exposure, and cost components all up.
A held-out-symbol eval on base Laguna scored `reward_valid` 0.75 / `reward_sharpe` 0.45,
confirming that the env was in a healthy trainable band before training.
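As a rough illustration of how per-component scores could roll up into the total reward, the sketch below applies the rubric weights listed under "How it works". The weights are taken from this README; everything else is an assumption: the key names other than `reward_valid` and `reward_sharpe` are hypothetical, components are assumed to lie in `[0, 1]`, and the hard gates are modelled as a single boolean flag.

```python
# Weights as listed in the rubric; key names other than reward_valid and
# reward_sharpe are hypothetical labels chosen for this sketch.
RUBRIC_WEIGHTS = {
    "reward_sharpe": 0.40,
    "reward_beats_hold": 0.20,
    "reward_drawdown": 0.15,
    "reward_exposure": 0.10,
    "reward_cost": 0.05,
    "reward_valid": 0.10,
}


def total_reward(components, gate_failed=False):
    """Weighted sum of component scores; any hard-gate failure zeroes the reward."""
    if gate_failed:
        return 0.0
    total = 0.0
    for name, weight in RUBRIC_WEIGHTS.items():
        # Missing components count as 0; scores are clamped to the assumed [0, 1] range.
        score = min(1.0, max(0.0, components.get(name, 0.0)))
        total += weight * score
    return total
```

The weights sum to 1.0, so a strategy that maxes out every component would score 1.0, while invalid code, lookahead, NaN equity, or a do-nothing strategy would score 0 regardless of the components.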
## The novel contribution: closing the self-improvement loop

- **Weights channel:** each RL iteration warm-starts from the prior adapter
  (`checkpoint_id`), which makes it a genuine parametric continuation.
- **Curriculum channel:** a reflection step reads the prior adapter's out-of-sample eval,
  shifts the next run's objective (sharpe → min-drawdown → balanced), and focuses on the
  weakest symbols, so the agent's own results drive its next curriculum.
- **Falsifiable proof ("memory is the adapter"):** the discipline block (distilled from
  618 real prior trading decisions) can be **stripped from the prompt**
  (`use_seed_principles=false`); if the trained adapter stays disciplined, the rules now
  live in the weights, not the prompt.
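The curriculum channel can be pictured with a small sketch like the one below. It is an illustration only, not the project's reflection step: the `reflect` function, the shape of the eval rows (`symbol` and `reward` keys), the `n_focus` parameter, and the rule of advancing exactly one objective per iteration are all assumptions. Only the objective sequence and the idea of focusing on the weakest symbols come from the description above.

```python
# Objective order as described above; how and when the real step advances is assumed.
OBJECTIVES = ["sharpe", "min-drawdown", "balanced"]


def reflect(eval_rows, current_objective, n_focus=2):
    """Pick the next objective and the weakest symbols from the prior eval.

    eval_rows is a hypothetical list of {"symbol": str, "reward": float} dicts.
    """
    rewards_by_symbol = {}
    for row in eval_rows:
        rewards_by_symbol.setdefault(row["symbol"], []).append(row["reward"])

    # Average out-of-sample reward per symbol.
    avg = {sym: sum(vals) / len(vals) for sym, vals in rewards_by_symbol.items()}

    # Lowest average first; ties broken alphabetically for determinism.
    weakest = sorted(avg, key=lambda sym: (avg[sym], sym))[:n_focus]

    # Advance one stage, staying on the last objective once it is reached.
    idx = OBJECTIVES.index(current_objective)
    next_objective = OBJECTIVES[min(idx + 1, len(OBJECTIVES) - 1)]

    return {"objective": next_objective, "focus_symbols": weakest}
```

The returned dictionary would then feed the next run's training config alongside the prior adapter's `checkpoint_id`, so both the weights and the curriculum carry over between iterations.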
## Files

- `trade_pool/`: the full `verifiers` environment (features, causal backtester, executor,
  rubric, data). It is installable, builds to a wheel, and bundles its own OHLCV tape.
- `adapter/`: the trained LoRA adapter weights for `poolside/Laguna-XS.2`.
- `configs/`: the RL training config(s).
- `reward_curve.txt`, `eval_*.json`: training + eval metrics.
## Reproduce

```bash
prime env push --path ./trade_pool --visibility PRIVATE # -> <you>/trade-pool
prime eval run <you>/trade-pool -m poolside/laguna-xs.2 -n 8 -r 1
prime train run configs/iter_1.toml # FREE hosted Laguna RL
prime deployments create <adapter_id> # serve the adapter
```
Built at the Poolside London hackathon, 29–30 May 2026. Team: **TradePool** (Tosin Dairo).