	# advanced-features

	## overview

	This document describes advanced platform capabilities that go beyond baseline extraction.

	## 1-self-improving-agent

	Post-episode learning loop:

	- classify failures by root cause
	- update selector/tool strategy priors
	- persist successful patterns with confidence
	- penalize repeated failure paths
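
The loop above can be sketched as a small prior-update routine. This is a minimal illustration, not the project's implementation: `StrategyPrior`, `LearningLoop`, `record_episode`, and the count-based confidence formula are all hypothetical names and choices introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StrategyPrior:
    """Pseudo-counts of successes and failures for one strategy (assumed structure)."""
    successes: float = 1.0
    failures: float = 1.0

    @property
    def confidence(self) -> float:
        # Share of successful pseudo-counts; starts at 0.5 with no evidence.
        return self.successes / (self.successes + self.failures)


@dataclass
class LearningLoop:
    priors: dict = field(default_factory=lambda: defaultdict(StrategyPrior))
    failure_counts: dict = field(default_factory=lambda: defaultdict(int))

    def record_episode(self, strategy: str, success: bool,
                       root_cause: Optional[str] = None) -> None:
        prior = self.priors[strategy]
        if success:
            # Persist the successful pattern by raising its confidence.
            prior.successes += 1.0
            return
        # Classify the failure by root cause and count repeats.
        key = (strategy, root_cause or "unknown")
        self.failure_counts[key] += 1
        # Assumed penalty rule: the n-th repeat of the same failure costs n.
        prior.failures += float(self.failure_counts[key])
```

With this rule, two consecutive failures of the same strategy for the same root cause add 1 and then 2 to its failure count, so repeated failure paths lose confidence faster than isolated ones.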

	## 2-strategy-library

	Built-in strategies:

	- Search-first
	- Direct extraction
	- Multi-hop reasoning
	- Verification-first
	- Table-first

	Each strategy tracks:

	- win rate
	- cost per success
	- average latency
	- domain affinity
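
One way to track these four metrics is a per-strategy record that accumulates raw totals and derives the metrics on demand. The sketch below is illustrative: `StrategyStats` and its fields are hypothetical, and domain affinity is assumed here to mean the share of a strategy's wins that came from a given domain.

```python
from dataclasses import dataclass, field


@dataclass
class StrategyStats:
    name: str
    runs: int = 0
    wins: int = 0
    total_cost: float = 0.0
    total_latency_s: float = 0.0
    domain_wins: dict = field(default_factory=dict)

    def record(self, domain: str, success: bool, cost: float, latency_s: float) -> None:
        """Accumulate one run; cost and latency count whether or not it succeeded."""
        self.runs += 1
        self.total_cost += cost
        self.total_latency_s += latency_s
        if success:
            self.wins += 1
            self.domain_wins[domain] = self.domain_wins.get(domain, 0) + 1

    @property
    def win_rate(self) -> float:
        return self.wins / self.runs if self.runs else 0.0

    @property
    def cost_per_success(self) -> float:
        # Infinite until the strategy has succeeded at least once.
        return self.total_cost / self.wins if self.wins else float("inf")

    @property
    def average_latency_s(self) -> float:
        return self.total_latency_s / self.runs if self.runs else 0.0

    def domain_affinity(self, domain: str) -> float:
        # Assumed definition: fraction of all wins that came from this domain.
        return self.domain_wins.get(domain, 0) / self.wins if self.wins else 0.0
```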

	## 3-explainable-ai-mode

	For every decision, provide:

	- selected action and confidence
	- top alternatives considered
	- evidence from memory/tools/search
	- expected reward impact
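
A decision record covering these four items might look like the following. The `Candidate` type, the `explain_decision` function, and the output keys are assumptions made for illustration; the source does not specify a schema.

```python
from dataclasses import asdict, dataclass


@dataclass
class Candidate:
    action: str
    confidence: float
    expected_reward: float


def explain_decision(candidates, evidence, top_k=2):
    """Pick the highest-confidence candidate and report the runners-up."""
    if not candidates:
        raise ValueError("at least one candidate action is required")
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    chosen, alternatives = ranked[0], ranked[1:1 + top_k]
    return {
        "selected_action": chosen.action,
        "confidence": chosen.confidence,
        "alternatives": [asdict(c) for c in alternatives],
        "evidence": list(evidence),  # e.g. notes from memory, tools, or search
        "expected_reward_impact": chosen.expected_reward,
    }
```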

	## 4-human-in-the-loop

	Intervention controls:

	- approve/reject action
	- force tool/model switch
	- enforce verification before submit
	- set hard constraints during runtime
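
These controls can be modeled as a gate that every proposed action passes through before execution. The sketch below assumes a dictionary-shaped action with `tool`, `type`, and `verified` keys and a caller-supplied `approve` callback; none of these names come from the source.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InterventionPolicy:
    require_verification: bool = False             # enforce verification before submit
    blocked_tools: set = field(default_factory=set)  # hard constraint set at runtime


def gate_action(action: dict, policy: InterventionPolicy,
                approve: Callable[[dict], bool]) -> str:
    """Return the gate outcome for one proposed action."""
    if action["tool"] in policy.blocked_tools:
        return "rejected:constraint"
    if (action["type"] == "submit" and policy.require_verification
            and not action.get("verified", False)):
        return "needs_verification"
    # The human reviewer has the final say on everything that passes the policy.
    return "approved" if approve(action) else "rejected:human"
```

Hard constraints are checked first so that a blocked tool is never shown to the reviewer for approval.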

	## 5-scenario-simulator

	Stress testing scenarios:

	- noisy HTML
	- broken DOM
	- pagination traps
	- conflicting facts
	- anti-scraping patterns

	Outputs:

	- robustness score
	- recovery score
	- strategy suitability map
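
The source does not define how these outputs are computed. As one possible reading, the sketch below treats the robustness score as the overall pass rate, the recovery score as the pass rate among runs that hit an error, and the suitability map as the strategies that passed each scenario. `ScenarioResult` and `score_runs` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    scenario: str    # e.g. "broken DOM"
    strategy: str    # e.g. "verification-first"
    succeeded: bool  # final outcome of the run
    hit_error: bool  # whether the run hit an error along the way


def score_runs(results):
    total = len(results)
    robustness = sum(r.succeeded for r in results) / total if total else 0.0
    errored = [r for r in results if r.hit_error]
    # With no errors observed there is nothing to recover from; report 1.0.
    recovery = sum(r.succeeded for r in errored) / len(errored) if errored else 1.0
    suitability = {}
    for r in results:
        if r.succeeded:
            suitability.setdefault(r.scenario, []).append(r.strategy)
    return {"robustness": robustness, "recovery": recovery, "suitability": suitability}
```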

	## 6-context-compression

	- rolling summaries
	- salience-based pruning
	- token-aware context packing
	- differential memory refresh
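
Salience-based pruning and token-aware packing can be combined in a single greedy pass, sketched below. The `pack_context` function is hypothetical, and the default whitespace token counter is a stand-in for whatever tokenizer the model actually uses.

```python
def pack_context(items, token_budget, count_tokens=lambda text: len(text.split())):
    """Keep the most salient items that fit in the budget.

    items: list of (salience, text) pairs in their original order.
    Returns the kept texts, still in original order.
    """
    # Visit items from most to least salient, remembering original positions.
    ranked = sorted(enumerate(items), key=lambda pair: pair[1][0], reverse=True)
    kept, used = [], 0
    for index, (_salience, text) in ranked:
        cost = count_tokens(text)
        if used + cost <= token_budget:
            kept.append(index)
            used += cost
    return [items[i][1] for i in sorted(kept)]
```

The pass is greedy rather than optimal: a large, moderately salient item is skipped if it does not fit, and smaller items further down the ranking may still be packed.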

	## 7-batch-parallel-runtime

	- task queue with priorities
	- parallel extraction workers
	- bounded concurrency
	- idempotent retry handling
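
A minimal sketch of these four properties using only `asyncio` from the standard library. `run_batch`, its parameters, and the tuple-shaped task format are assumptions for illustration; a fixed pool of workers provides the concurrency bound.

```python
import asyncio


async def run_batch(tasks, handler, max_concurrency=2, max_attempts=3):
    """tasks: iterable of (priority, task_id, payload); lower priority runs first."""
    queue = asyncio.PriorityQueue()
    for seq, (priority, task_id, payload) in enumerate(tasks):
        # seq breaks ties so payloads are never compared with each other.
        queue.put_nowait((priority, seq, task_id, payload))

    results, claimed = {}, set()

    async def worker():
        while True:
            try:
                _, _, task_id, payload = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            if task_id in claimed:
                continue  # idempotency: a duplicate task id is never run twice
            claimed.add(task_id)
            for attempt in range(max_attempts):
                try:
                    results[task_id] = await handler(payload)
                    break
                except Exception:
                    if attempt == max_attempts - 1:
                        results[task_id] = None  # retries exhausted

    # Bounded concurrency: exactly max_concurrency workers drain the queue.
    await asyncio.gather(*(worker() for _ in range(max_concurrency)))
    return results
```

Claiming a task id before awaiting the handler, rather than after, prevents two workers from running the same task when duplicates are queued.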

	## 8-prompt-versioning-and-evaluation

	- versioned prompt templates
	- A/B testing by task type
	- reward/cost comparison dashboards
	- rollout and rollback controls
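
A/B assignment with rollout control is often done by hashing a stable key into a bucket. The sketch below assumes two variants per task type and a `rollout_b` fraction; `assign_variant` and the bucket granularity are hypothetical choices, not taken from the source.

```python
import hashlib

BUCKETS = 10_000  # bucket granularity (assumed)


def assign_variant(task_type: str, task_id: str, rollout_b: float = 0.5) -> str:
    """Deterministically assign a task to prompt variant "A" or "B".

    rollout_b is the fraction of traffic sent to variant B;
    setting it to 0.0 acts as a rollback to variant A.
    """
    digest = hashlib.sha256(f"{task_type}:{task_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % BUCKETS
    return "B" if bucket < rollout_b * BUCKETS else "A"
```

Because the bucket depends only on the task type and id, a task keeps its variant across retries, which keeps reward and cost comparisons between variants consistent.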

	## 9-mcp-toolchain-composition

	Composable flow examples:

	- Browser MCP -> Parser MCP -> Validator MCP -> DB MCP
	- Search MCP -> Fetch MCP -> Extract MCP -> Verify MCP

	## 10-governance-and-safety

	- tool allowlist/denylist
	- PII redaction in logs
	- budget and rate guardrails
	- provenance tracking for extracted facts
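
Two of these guardrails, the tool allowlist/denylist and PII redaction in logs, are sketched below. The function names are hypothetical, the email pattern is deliberately simple and covers only one kind of PII, and the precedence rule (denylist wins; an empty allowlist permits any tool not denied) is an assumption.

```python
import re

# Simplified email pattern; real PII redaction needs broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(message: str) -> str:
    """Replace email addresses before the message is written to logs."""
    return EMAIL.sub("[EMAIL]", message)


def is_tool_allowed(tool: str, allowlist: set, denylist: set) -> bool:
    if tool in denylist:
        return False  # the denylist always wins
    # An empty allowlist is treated as "no allowlist configured".
    return not allowlist or tool in allowlist
```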

	## feature-flags

	Every advanced feature should be toggleable from Settings. Features with a high cost or latency impact should be disabled by default.
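
The default-off rule can be expressed by deriving each flag's default from its cost profile. In this sketch `FeatureFlag`, `resolve_flags`, and the `high_cost` classification of any particular feature are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureFlag:
    name: str
    high_cost: bool  # assumed classification of cost/latency impact

    @property
    def default_enabled(self) -> bool:
        # High-impact features stay off until explicitly enabled.
        return not self.high_cost


def resolve_flags(flags, overrides):
    """Apply Settings overrides on top of the cost-based defaults."""
    return {f.name: overrides.get(f.name, f.default_enabled) for f in flags}
```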

	## api-driven-feature-map

	| feature-domain | endpoint-surface |
	| --- | --- |
	| agent planning and execution | `/api/agents/run`, `/api/agents/plan`, `/api/agents/message` |
	| dynamic scraping | `/api/scrape/stream`, `/api/scrape/`, `/api/scrape/sessions` |
	| memory operations | `/api/memory/store`, `/api/memory/query`, `/api/memory/consolidate` |
	| tool and plugin usage | `/api/tools/registry`, `/api/plugins/tools`, `/api/plugins/install` |
	| model and provider controls | `/api/settings/model`, `/api/providers/models/all`, `/api/providers/costs/summary` |

	See `api-reference.md` for full endpoint signatures.

	## document-metadata

	| key | value |
	| --- | --- |
	| document | `features.md` |
	| status | active |

	## document-flow

	```mermaid
	flowchart TD
	A[document] --> B[key-sections]
	B --> C[implementation]
	B --> D[operations]
	B --> E[validation]
	```