# observability-and-dashboard
## overview
Observability provides deep insight into runtime behavior, model usage, tool execution, memory quality, and rewards.
## dashboard-sections
### 1-live-thought-stream
- chronological reasoning notes
- model/router choice trace
- action confidence timeline
- override events
### 2-navigation-map
Graph of visited pages:
- nodes = URLs
- edges = transitions
- node color = relevance/confidence
- revisit highlighting
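The navigation map's data can be accumulated from a plain sequence of page transitions. A minimal sketch (function and field names are illustrative, not part of the dashboard):

```python
from collections import defaultdict

def build_navigation_map(transitions):
    """Build a directed graph from (from_url, to_url) pairs.

    Nodes are URLs; edge weights count transitions, and per-URL visit
    counts make revisits easy to highlight (count > 1).
    """
    edges = defaultdict(int)   # (src, dst) -> transition count
    visits = defaultdict(int)  # url -> visit count
    for i, (src, dst) in enumerate(transitions):
        edges[(src, dst)] += 1
        if i == 0:
            visits[src] += 1  # the starting page counts as one visit
        visits[dst] += 1
    revisited = {url for url, n in visits.items() if n > 1}
    return dict(edges), dict(visits), revisited
```

Node color (relevance/confidence) would come from a separate per-URL score map joined onto `visits` at render time.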
### 3-mcp-usage-panel
- tool call count by server
- avg latency by tool
- error rate and retries
- top successful tool chains
### 4-memory-viewer
- inspect short/working/long/shared memory
- filter by task/domain/confidence
- edit/delete entries
- prune previews
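The viewer's task/domain/confidence filter reduces to a predicate over memory entries. A sketch with plain dicts; the field names (`task`, `domain`, `confidence`) are assumptions:

```python
def filter_memory(entries, task=None, domain=None, min_confidence=0.0):
    """Filter memory entries by optional task, domain, and a confidence floor."""
    out = []
    for e in entries:
        if task is not None and e.get("task") != task:
            continue
        if domain is not None and e.get("domain") != domain:
            continue
        if e.get("confidence", 0.0) < min_confidence:
            continue
        out.append(e)
    return out
```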
### 5-reward-analytics
- per-step reward breakdown
- component contribution trends
- penalty heatmap
- episode comparison
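Component contribution trends can be derived by summing per-step reward deltas by component. A hedged sketch; the step shape (dict of component name to delta) is an assumption:

```python
def component_contributions(steps):
    """Sum each reward component across steps.

    `steps` is a list of dicts mapping component name -> reward delta,
    e.g. [{"progress": 0.1, "penalty": -0.02}, ...].
    """
    totals = {}
    for step in steps:
        for name, delta in step.items():
            totals[name] = totals.get(name, 0.0) + delta
    return totals
```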
### 6-cost-and-token-monitor
- per-provider usage
- per-model token counts
- cumulative cost vs budget
- forecasted burn rate
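The simplest burn-rate forecast extrapolates spend linearly from elapsed time. A sketch under that assumption; a real monitor would likely smooth the rate:

```python
def forecast_burn(spent_usd, elapsed_s, horizon_s):
    """Linear forecast: extrapolate spend so far over a future horizon."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    rate = spent_usd / elapsed_s  # USD per second so far
    return rate * horizon_s

def budget_remaining_fraction(spent_usd, budget_usd):
    """Fraction of budget left, floored at zero once overspent."""
    return max(0.0, 1.0 - spent_usd / budget_usd)
```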
## core-metrics
### agent-metrics
- task completion rate
- avg steps to completion
- recovery score
- generalization score
- exploration ratio
### tool-metrics
- tool success rate
- timeout ratio
- fallback frequency
- schema validation failures
### memory-metrics
- retrieval hit rate
- relevance score distribution
- prune rate
- memory-assisted success ratio
### search-metrics
- query success rate
- multi-hop depth distribution
- credibility score average
- duplicate result ratio
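Most of the ratios above reduce to counts over logged events. A sketch for two of them, assuming the event shape of the logging-model section (`event`, `success` fields):

```python
def tool_success_rate(events):
    """Fraction of tool_call events that succeeded; None if there were no calls."""
    calls = [e for e in events if e.get("event") == "tool_call"]
    if not calls:
        return None
    return sum(1 for e in calls if e.get("success")) / len(calls)

def retrieval_hit_rate(events):
    """Fraction of memory_retrieval events that returned at least one entry.

    The event name and `hits` field are assumptions for illustration.
    """
    retrievals = [e for e in events if e.get("event") == "memory_retrieval"]
    if not retrievals:
        return None
    return sum(1 for e in retrievals if e.get("hits", 0) > 0) / len(retrievals)
```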
## logging-model
Structured logs (JSON):
```json
{
  "timestamp": "2026-03-27T00:00:00Z",
  "episode_id": "ep_123",
  "step": 7,
  "event": "tool_call",
  "tool": "beautifulsoup.find_all",
  "latency_ms": 54,
  "success": true,
  "reward_delta": 0.08
}
```
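Records in this shape can be emitted as JSON lines by a small helper. A sketch, not the project's actual logger; extra fields ride along as keyword arguments:

```python
import json
from datetime import datetime, timezone

def log_event(episode_id, step, event, **fields):
    """Serialize one structured log record as a single JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "episode_id": episode_id,
        "step": step,
        "event": event,
        **fields,
    }
    return json.dumps(record)
```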
## tracing
Per-episode trace includes:
- observations
- actions
- rewards
- tool calls
- memory operations
- final submission and grader results
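A per-episode trace can be modeled as an append-only list of step records. A minimal sketch with assumed field names:

```python
from dataclasses import dataclass, field

@dataclass
class EpisodeTrace:
    """Append-only record of one episode's steps."""
    episode_id: str
    steps: list = field(default_factory=list)

    def record(self, observation, action, reward, tool_calls=(), memory_ops=()):
        self.steps.append({
            "observation": observation,
            "action": action,
            "reward": reward,
            "tool_calls": list(tool_calls),
            "memory_ops": list(memory_ops),
        })

    def total_reward(self):
        return sum(s["reward"] for s in self.steps)
```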
## alerts
Configurable alerts:
- budget threshold crossed
- error spike
- tool outage
- memory bloat
- anomalous low reward streak
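Each alert is a predicate over current metrics. A sketch of three of the conditions above; the metric keys and thresholds are illustrative defaults, not the system's actual configuration:

```python
def check_alerts(metrics, budget_usd, error_rate_threshold=0.2, streak_len=5):
    """Evaluate a few alert conditions over a metrics snapshot.

    `metrics` is assumed to hold: spent_usd, error_rate, recent_rewards.
    """
    alerts = []
    if metrics["spent_usd"] >= budget_usd:
        alerts.append("budget threshold crossed")
    if metrics["error_rate"] > error_rate_threshold:
        alerts.append("error spike")
    recent = metrics.get("recent_rewards", [])
    if len(recent) >= streak_len and all(r <= 0 for r in recent[-streak_len:]):
        alerts.append("anomalous low reward streak")
    return alerts
```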
## apis
- `GET /api/metrics/summary`
- `GET /api/metrics/timeseries`
- `GET /api/traces/{episode_id}`
- `GET /api/costs`
- `GET /api/memory/stats`
- `GET /api/tools/stats`
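A thin client can build request URLs for these endpoints. Only URL construction is sketched here; the transport (the actual HTTP GET) is left out so the example stays dependency-free, and the class name is an assumption:

```python
class MetricsClient:
    """Builds request URLs for the observability endpoints listed above."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def summary_url(self):
        return f"{self.base_url}/api/metrics/summary"

    def trace_url(self, episode_id):
        return f"{self.base_url}/api/traces/{episode_id}"

    def costs_url(self):
        return f"{self.base_url}/api/costs"
```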
## recommended-dashboard-layout
1. Top row: completion, cost, latency, error rate
2. Mid row: thought stream + navigation graph
3. Lower row: reward breakdown + MCP usage + memory viewer
4. Bottom row: raw trace and export controls
## export-and-audit
Exports:
- JSON trace
- CSV metrics
- reward analysis report
- model usage report
All exports include episode and configuration fingerprints for reproducibility.
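A configuration fingerprint can be derived by hashing a canonical serialization of the config, so the same settings always yield the same fingerprint. A sketch; how the real exporter fingerprints configs is an assumption:

```python
import hashlib
import json

def config_fingerprint(config):
    """Stable SHA-256 fingerprint of a configuration dict.

    Canonical JSON (sorted keys, compact separators) makes the hash
    independent of key insertion order.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```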
## related-api-reference
| item | value |
| --- | --- |
| api-reference | `api-reference.md` |
## document-metadata
| key | value |
| --- | --- |
| document | `observability.md` |
| status | active |
## document-flow
```mermaid
flowchart TD
    A[document] --> B[key-sections]
    B --> C[implementation]
    B --> D[operations]
    B --> E[validation]
```