# observability-and-dashboard
## overview
Observability provides deep insight into runtime behavior, model usage, tool execution, memory quality, and rewards.
## dashboard-sections
### 1-live-thought-stream
- chronological reasoning notes
- model/router choice trace
- action confidence timeline
- override events
### 2-navigation-map
Graph of visited pages:
- nodes = URLs
- edges = transitions
- node color = relevance/confidence
- revisit highlighting
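The navigation map's data can be accumulated from a plain sequence of page transitions. A minimal sketch (function and field names are illustrative, not part of the dashboard):

```python
from collections import defaultdict

def build_navigation_map(transitions):
    """Build a directed graph from (from_url, to_url) pairs.

    Nodes are URLs; edge weights count transitions, and per-URL visit
    counts make revisits easy to highlight (count > 1).
    """
    edges = defaultdict(int)   # (src, dst) -> transition count
    visits = defaultdict(int)  # url -> visit count
    for i, (src, dst) in enumerate(transitions):
        edges[(src, dst)] += 1
        if i == 0:
            visits[src] += 1  # the starting page counts as one visit
        visits[dst] += 1
    revisited = {url for url, n in visits.items() if n > 1}
    return dict(edges), dict(visits), revisited
```

Node color (relevance/confidence) would come from a separate per-URL score map joined onto `visits` at render time.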
### 3-mcp-usage-panel
- tool call count by server
- avg latency by tool
- error rate and retries
- top successful tool chains
### 4-memory-viewer
- inspect short/working/long/shared memory
- filter by task/domain/confidence
- edit/delete entries
- prune previews
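The viewer's task/domain/confidence filter reduces to a predicate over memory entries. A sketch with plain dicts; the field names (`task`, `domain`, `confidence`) are assumptions:

```python
def filter_memory(entries, task=None, domain=None, min_confidence=0.0):
    """Filter memory entries by optional task, domain, and a confidence floor."""
    out = []
    for e in entries:
        if task is not None and e.get("task") != task:
            continue
        if domain is not None and e.get("domain") != domain:
            continue
        if e.get("confidence", 0.0) < min_confidence:
            continue
        out.append(e)
    return out
```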
### 5-reward-analytics
- per-step reward breakdown
- component contribution trends
- penalty heatmap
- episode comparison
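Component contribution trends can be derived by summing per-step reward deltas by component. A hedged sketch; the step shape (dict of component name to delta) is an assumption:

```python
def component_contributions(steps):
    """Sum each reward component across steps.

    `steps` is a list of dicts mapping component name -> reward delta,
    e.g. [{"progress": 0.1, "penalty": -0.02}, ...].
    """
    totals = {}
    for step in steps:
        for name, delta in step.items():
            totals[name] = totals.get(name, 0.0) + delta
    return totals
```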
### 6-cost-and-token-monitor
- per-provider usage
- per-model token counts
- cumulative cost vs budget
- forecasted burn rate
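The simplest burn-rate forecast extrapolates spend linearly from elapsed time. A sketch under that assumption; a real monitor would likely smooth the rate:

```python
def forecast_burn(spent_usd, elapsed_s, horizon_s):
    """Linear forecast: extrapolate spend so far over a future horizon."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    rate = spent_usd / elapsed_s  # USD per second so far
    return rate * horizon_s

def budget_remaining_fraction(spent_usd, budget_usd):
    """Fraction of budget left, floored at zero once overspent."""
    return max(0.0, 1.0 - spent_usd / budget_usd)
```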
## core-metrics
### agent-metrics
- task completion rate
- avg steps to completion
- recovery score
- generalization score
- exploration ratio
### tool-metrics
- tool success rate
- timeout ratio
- fallback frequency
- schema validation failures
### memory-metrics
- retrieval hit rate
- relevance score distribution
- prune rate
- memory-assisted success ratio
### search-metrics
- query success rate
- multi-hop depth distribution
- credibility score average
- duplicate result ratio
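Most of the ratios above reduce to counts over logged events. A sketch for two of them, assuming the event shape of the logging-model section (`event`, `success` fields):

```python
def tool_success_rate(events):
    """Fraction of tool_call events that succeeded; None if there were no calls."""
    calls = [e for e in events if e.get("event") == "tool_call"]
    if not calls:
        return None
    return sum(1 for e in calls if e.get("success")) / len(calls)

def retrieval_hit_rate(events):
    """Fraction of memory_retrieval events that returned at least one entry.

    The event name and `hits` field are assumptions for illustration.
    """
    retrievals = [e for e in events if e.get("event") == "memory_retrieval"]
    if not retrievals:
        return None
    return sum(1 for e in retrievals if e.get("hits", 0) > 0) / len(retrievals)
```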
## logging-model
Structured logs (JSON):
```json
{
  "timestamp": "2026-03-27T00:00:00Z",
  "episode_id": "ep_123",
  "step": 7,
  "event": "tool_call",
  "tool": "beautifulsoup.find_all",
  "latency_ms": 54,
  "success": true,
  "reward_delta": 0.08
}
```
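Records in this shape can be emitted as JSON lines by a small helper. A sketch, not the project's actual logger; extra fields ride along as keyword arguments:

```python
import json
from datetime import datetime, timezone

def log_event(episode_id, step, event, **fields):
    """Serialize one structured log record as a single JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "episode_id": episode_id,
        "step": step,
        "event": event,
        **fields,
    }
    return json.dumps(record)
```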
## tracing
Per-episode trace includes:
- observations
- actions
- rewards
- tool calls
- memory operations
- final submission and grader results
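A per-episode trace can be modeled as an append-only list of step records. A minimal sketch with assumed field names:

```python
from dataclasses import dataclass, field

@dataclass
class EpisodeTrace:
    """Append-only record of one episode's steps."""
    episode_id: str
    steps: list = field(default_factory=list)

    def record(self, observation, action, reward, tool_calls=(), memory_ops=()):
        self.steps.append({
            "observation": observation,
            "action": action,
            "reward": reward,
            "tool_calls": list(tool_calls),
            "memory_ops": list(memory_ops),
        })

    def total_reward(self):
        return sum(s["reward"] for s in self.steps)
```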
## alerts
Configurable alerts:
- budget threshold crossed
- error spike
- tool outage
- memory bloat
- anomalous low reward streak
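Each alert is a predicate over current metrics. A sketch of three of the conditions above; the metric keys and thresholds are illustrative defaults, not the system's actual configuration:

```python
def check_alerts(metrics, budget_usd, error_rate_threshold=0.2, streak_len=5):
    """Evaluate a few alert conditions over a metrics snapshot.

    `metrics` is assumed to hold: spent_usd, error_rate, recent_rewards.
    """
    alerts = []
    if metrics["spent_usd"] >= budget_usd:
        alerts.append("budget threshold crossed")
    if metrics["error_rate"] > error_rate_threshold:
        alerts.append("error spike")
    recent = metrics.get("recent_rewards", [])
    if len(recent) >= streak_len and all(r <= 0 for r in recent[-streak_len:]):
        alerts.append("anomalous low reward streak")
    return alerts
```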
## apis
- `GET /api/metrics/summary`
- `GET /api/metrics/timeseries`
- `GET /api/traces/{episode_id}`
- `GET /api/costs`
- `GET /api/memory/stats`
- `GET /api/tools/stats`
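A thin client can build request URLs for these endpoints. Only URL construction is sketched here; the transport (the actual HTTP GET) is left out so the example stays dependency-free, and the class name is an assumption:

```python
class MetricsClient:
    """Builds request URLs for the observability endpoints listed above."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def summary_url(self):
        return f"{self.base_url}/api/metrics/summary"

    def trace_url(self, episode_id):
        return f"{self.base_url}/api/traces/{episode_id}"

    def costs_url(self):
        return f"{self.base_url}/api/costs"
```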
## recommended-dashboard-layout
1. Top row: completion, cost, latency, error rate
2. Mid row: thought stream + navigation graph
3. Lower row: reward breakdown + MCP usage + memory viewer
4. Bottom row: raw trace and export controls
## export-and-audit
Exports:
- JSON trace
- CSV metrics
- reward analysis report
- model usage report
All exports include episode and configuration fingerprints for reproducibility.
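A configuration fingerprint can be derived by hashing a canonical serialization of the config, so the same settings always yield the same fingerprint. A sketch; how the real exporter fingerprints configs is an assumption:

```python
import hashlib
import json

def config_fingerprint(config):
    """Stable SHA-256 fingerprint of a configuration dict.

    Canonical JSON (sorted keys, compact separators) makes the hash
    independent of key insertion order.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```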
## related-api-reference
| item | value |
| --- | --- |
| api-reference | `api-reference.md` |
## document-metadata
| key | value |
| --- | --- |
| document | `observability.md` |
| status | active |
## document-flow
```mermaid
flowchart TD
    A[document] --> B[key-sections]
    B --> C[implementation]
    B --> D[operations]
    B --> E[validation]
```