unified-memory-system
table-of-contents
- Overview
- Memory Architecture
- Memory Layers
- Memory Operations
- Implementation Details
- Configuration
- Best Practices
overview
The Unified Memory System is the most critical upgrade for the WebScraper-OpenEnv agent. It provides persistent, contextual, and hierarchical memory across episodes, enabling the agent to learn from past experiences, maintain reasoning context, and share knowledge across multiple agents.
memory-api-contract
| operation | endpoint |
|---|---|
| store-entry | POST /api/memory/store |
| query-entries | POST /api/memory/query |
| get-entry | GET /api/memory/{entry_id} |
| update-entry | PUT /api/memory/{entry_id} |
| delete-entry | DELETE /api/memory/{entry_id} |
| layer-stats | GET /api/memory/stats/overview |
| clear-layer | DELETE /api/memory/clear/{memory_type} |
| consolidate | POST /api/memory/consolidate |
For request and response details, see api-reference.md.
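Below is a minimal client-side sketch of the contract above, using only the Python standard library. It builds `urllib.request.Request` objects from the table's methods and paths but does not send them. The helper name `build_request`, the placeholder base URL, and the payload field names are assumptions for illustration. The real request bodies are defined in api-reference.md.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Method and path template for each operation in the table above.
ENDPOINTS = {
    "store-entry": ("POST", "/api/memory/store"),
    "query-entries": ("POST", "/api/memory/query"),
    "get-entry": ("GET", "/api/memory/{entry_id}"),
    "update-entry": ("PUT", "/api/memory/{entry_id}"),
    "delete-entry": ("DELETE", "/api/memory/{entry_id}"),
    "layer-stats": ("GET", "/api/memory/stats/overview"),
    "clear-layer": ("DELETE", "/api/memory/clear/{memory_type}"),
    "consolidate": ("POST", "/api/memory/consolidate"),
}


def build_request(
    base_url: str,
    operation: str,
    payload: Optional[Dict[str, Any]] = None,
    **path_params: str,
) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for one memory API operation."""
    method, template = ENDPOINTS[operation]
    path = template.format(**path_params)  # fills {entry_id} / {memory_type}
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


# Hypothetical payload fields; see api-reference.md for the actual schema.
req = build_request(
    "http://localhost:8000",
    "store-entry",
    payload={"memory_type": "long_term", "key": "pattern:price:span.product-price"},
)
```

To call the service, pass the request to `urllib.request.urlopen(req)`.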
why-memory-matters
Without memory:
- Agents repeat the same mistakes across episodes
- No learning from successful extraction patterns
- Cannot maintain context across long scraping sessions
- Unable to share knowledge between multiple agents
- Limited by context window size
With unified memory:
- Learn successful extraction strategies
- Remember failed approaches to avoid repetition
- Maintain reasoning context across steps
- Share discoveries across agent instances
- Overcome context window limitations
memory-architecture
```
┌─────────────────────────────────────────────────────────────────┐
│                      Unified Memory System                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐     │
│  │ Short-Term     │  │ Working        │  │ Long-Term      │     │
│  │ Memory         │  │ Memory         │  │ Memory         │     │
│  │ (Episode)      │  │ (Reasoning)    │  │ (Persistent)   │     │
│  └───────┬────────┘  └───────┬────────┘  └───────┬────────┘     │
│          │                   │                   │              │
│          └───────────────────┼───────────────────┘              │
│                              │                                  │
│                    ┌─────────▼──────────┐                       │
│                    │   Memory Router    │                       │
│                    │ - Query planner    │                       │
│                    │ - Context builder  │                       │
│                    │ - Summarizer       │                       │
│                    └─────────┬──────────┘                       │
│                              │                                  │
│          ┌───────────────────┼───────────────────┐              │
│          │                   │                   │              │
│  ┌───────▼────────┐  ┌───────▼────────┐  ┌───────▼────────┐     │
│  │ Shared Memory  │  │ Vector Index   │  │ MCP Storage    │     │
│  │ (Multi-Agent)  │  │ (FAISS/Qdrant) │  │ (File/DB)      │     │
│  └────────────────┘  └────────────────┘  └────────────────┘     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
memory-layers
1-short-term-memory-per-episode
Purpose: Tracks the current scraping session state.
Lifecycle: Exists for one episode, cleared on reset().
Data Structure:
```python
from datetime import datetime
from typing import Any, Dict, List

from pydantic import BaseModel

# Action and Observation are the environment's action/observation models (defined elsewhere).


class EpisodeMemory(BaseModel):
    episode_id: str
    task_id: str
    visited_urls: List[str]                     # Navigation history
    extracted_data: Dict[str, Any]              # Field → value mappings
    actions_history: List[Action]               # All actions taken
    intermediate_notes: List[str]               # Agent's reasoning notes
    observations: List[Observation]             # All observations received
    page_summaries: Dict[str, str]              # URL → content summary
    extraction_attempts: Dict[str, List[Any]]   # Field → list of attempts
    timestamp_created: datetime
    timestamp_updated: datetime
```
Use Cases:
- Track which pages have been visited to avoid cycles
- Remember what data has been extracted
- Maintain action history for debugging
- Store intermediate reasoning
Example:
```python
# Agent navigating a multi-page catalog
episode_memory = {
    "visited_urls": [
        "/catalog/page/1",
        "/catalog/page/2",
        "/product/12345"
    ],
    "extracted_data": {
        "product_name": "Widget Pro",
        "price": "$49.99"
    },
    "intermediate_notes": [
        "Price found in span.product-price",
        "Next page link present, continuing pagination"
    ]
}
```
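The cycle-avoidance and reset behavior described above can be sketched as follows. `EpisodeMemorySketch` is a hypothetical, trimmed-down stand-in built on a dataclass rather than the pydantic `EpisodeMemory` model, and it covers only a few of its fields. Mirroring `visited_urls` in a set for constant-time lookups is an implementation choice assumed here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class EpisodeMemorySketch:
    """Hypothetical stand-in for EpisodeMemory (dataclass, not pydantic)."""

    episode_id: str
    task_id: str
    visited_urls: List[str] = field(default_factory=list)     # ordered navigation history
    extracted_data: Dict[str, Any] = field(default_factory=dict)
    intermediate_notes: List[str] = field(default_factory=list)
    _seen: Set[str] = field(default_factory=set, repr=False)  # fast membership checks

    def should_visit(self, url: str) -> bool:
        # Revisiting a URL already in the history would create a navigation cycle.
        return url not in self._seen

    def record_visit(self, url: str) -> None:
        if self.should_visit(url):
            self._seen.add(url)
            self.visited_urls.append(url)

    def reset(self, episode_id: str, task_id: str) -> None:
        # Short-term memory lives for one episode, so reset() clears all of it.
        self.episode_id, self.task_id = episode_id, task_id
        self.visited_urls.clear()
        self.extracted_data.clear()
        self.intermediate_notes.clear()
        self._seen.clear()
```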
2-working-memory-agent-thinking
Purpose: Temporary reasoning buffer for active decision-making.
Lifecycle: Cleared after each action decision, or kept for multi-step reasoning.
Data Structure:
```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel


class WorkingMemory(BaseModel):
    current_goal: str                   # Active objective
    reasoning_steps: List[str]          # Chain of thought
    considered_actions: List[Action]    # Actions being evaluated
    scratchpad: Dict[str, Any]          # Temporary calculations
    active_hypotheses: List[str]        # Predictions to test
    context_window: List[str]           # Relevant memory chunks
    attention_focus: Optional[str]      # Current DOM element/area of focus
```
Use Cases:
- Chain-of-thought reasoning before action selection
- Evaluate multiple action candidates
- Maintain focus during complex extraction
- Store temporary parsing results
Example:
```python
working_memory = {
    "current_goal": "Extract product price from listing",
    "reasoning_steps": [
        "Step 1: Search HTML for price indicators ($, €, price)",
        "Step 2: Found 3 candidates: $49.99, $39.99 (strikethrough), $5.99 (shipping)",
        "Step 3: $49.99 is in <span class='product-price'>, most likely correct",
        "Step 4: Extract using selector span.product-price"
    ],
    "considered_actions": [
        Action(action_type="EXTRACT_FIELD", selector="span.price"),
        Action(action_type="EXTRACT_FIELD", selector="span.product-price"),
        Action(action_type="SEARCH_PAGE", query="price.*\\$\\d+")
    ],
    "attention_focus": "div.product-details"
}
```
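Here is a sketch of the working-memory lifecycle. It assumes a bounded buffer whose size comes from `max_working_memory_items` in the settings shown under Configuration. `WorkingMemoryBuffer` and its `keep_for_multi_step` flag are hypothetical names, and evicting the oldest reasoning step when the buffer is full is an assumed policy.

```python
from collections import deque
from typing import Deque, List


class WorkingMemoryBuffer:
    """Hypothetical bounded reasoning buffer; the oldest steps are evicted first."""

    def __init__(self, max_items: int = 50) -> None:
        # 50 mirrors the max_working_memory_items default in MemorySettings.
        self.current_goal: str = ""
        self.reasoning_steps: Deque[str] = deque(maxlen=max_items)

    def add_step(self, step: str) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self.reasoning_steps.append(step)

    def context(self) -> List[str]:
        """Goal first, then reasoning steps oldest to newest."""
        return [f"Goal: {self.current_goal}", *self.reasoning_steps]

    def clear(self, keep_for_multi_step: bool = False) -> None:
        # Cleared after each action decision unless multi-step reasoning is under way.
        if not keep_for_multi_step:
            self.current_goal = ""
            self.reasoning_steps.clear()
```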
3-long-term-memory-persistent
Purpose: Store learned patterns, strategies, and historical data across all episodes.
Lifecycle: Persists indefinitely via MCP storage and vector database.
Data Structure:
```python
class LongTermMemory(BaseModel):
    # Vector embeddings for semantic search
    embeddings_index: VectorIndex  # FAISS, Qdrant, or Pinecone

    # Successful extraction patterns
    learned_patterns: List[ExtractionPattern]

    # Historical performance data
    past_episodes: List[EpisodeSummary]

    # Failed attempts (to avoid repetition)
    failed_patterns: List[FailedPattern]

    # Domain knowledge
    website_schemas: Dict[str, WebsiteSchema]  # domain → common patterns

    # Selector library
    selector_success_rate: Dict[str, float]  # selector → success rate
```
Extraction Pattern:
```python
from datetime import datetime
from typing import List

from pydantic import BaseModel


class ExtractionPattern(BaseModel):
    pattern_id: str
    field_name: str       # e.g., "price"
    selector: str         # e.g., "span.product-price"
    selector_type: str    # "css" | "xpath" | "label"
    success_count: int    # How many times it worked
    failure_count: int    # How many times it failed
    domains: List[str]    # Which websites it works on
    confidence: float     # 0.0 to 1.0
    examples: List[str]   # Sample extracted values
    created_at: datetime
    last_used: datetime
```
Use Cases:
- Retrieve successful selectors for similar tasks
- Avoid repeating failed extraction attempts
- Learn website-specific patterns
- Build a library of proven strategies
Example Query:
```python
# Agent needs to extract "price" from a new e-commerce page
similar_patterns = long_term_memory.search(
    query="price extraction e-commerce",
    filters={"field_name": "price", "confidence": ">0.8"},
    limit=5
)

# Returns:
[
    ExtractionPattern(
        selector="span.product-price",
        success_count=42,
        confidence=0.95,
        domains=["shop.example.com", "store.example.org"]
    ),
    ExtractionPattern(
        selector="div.price-box span[itemprop='price']",
        success_count=38,
        confidence=0.92,
        domains=["ecommerce.example.net"]
    ),
    ...
]
```
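The `filters` argument above mixes exact matches (`"field_name": "price"`) with comparison strings (`"confidence": ">0.8"`). This document does not specify the filter grammar, so the following sketch assumes two rules: a string starting with `>=`, `<=`, `>` or `<` is a numeric comparison, and anything else is an exact match. `matches` and `apply_filters` are hypothetical helpers that would run on the candidates returned by the vector search.

```python
import operator
from typing import Any, Dict, List

# Two-character operators must be tried before their one-character prefixes.
_OPS = [(">=", operator.ge), ("<=", operator.le), (">", operator.gt), ("<", operator.lt)]


def matches(record: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Return True if the record satisfies every filter (assumed grammar)."""
    for key, expected in filters.items():
        if key not in record:
            return False
        actual = record[key]
        if isinstance(expected, str):
            for symbol, op in _OPS:
                if expected.startswith(symbol):
                    if not op(float(actual), float(expected[len(symbol):])):
                        return False
                    break
            else:
                # No operator prefix: plain string equality.
                if actual != expected:
                    return False
        elif actual != expected:
            return False
    return True


def apply_filters(records: List[Dict[str, Any]], filters: Dict[str, Any],
                  limit: int) -> List[Dict[str, Any]]:
    """Keep matching records (in their ranked order) up to `limit`."""
    return [r for r in records if matches(r, filters)][:limit]
```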
4-shared-memory-multi-agent
Purpose: Enable knowledge sharing across multiple agent instances.
Lifecycle: Persistent, synchronized across all agents.
Data Structure:
```python
class SharedMemory(BaseModel):
    global_knowledge_base: Dict[str, Any]        # Shared facts and patterns
    agent_messages: List[AgentMessage]           # Inter-agent communication
    task_state: Dict[str, TaskState]             # Collaborative task status
    distributed_discoveries: List[Discovery]     # Findings from all agents
    consensus_data: Dict[str, ConsensusValue]    # Voted/validated facts
```
Use Cases:
- Multiple agents scraping different sections of a large site
- Collaborative fact verification
- Distributed catalog scraping
- Consensus-based data validation
Example:
```python
# Agent A discovers a pattern
agent_a.shared_memory.broadcast(
    AgentMessage(
        sender="agent_a",
        message_type="PATTERN_DISCOVERED",
        data={
            "pattern": "Product SKU always in span.sku-code",
            "confidence": 0.89,
            "domain": "shop.example.com"
        }
    )
)

# Agent B receives and applies the pattern
agent_b_discovers = agent_b.shared_memory.receive_messages(
    message_type="PATTERN_DISCOVERED"
)
# Agent B can now use this selector without rediscovering it
```
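Below is a single-process sketch of `broadcast` and `receive_messages`. It assumes an append-only message log with one read cursor per agent and message type, so each agent sees each message once. `InProcessSharedMemory` is hypothetical, and unlike the example above, its `receive_messages` takes an explicit `agent_id`. A multi-process deployment would need a broker (for example, the Redis pub/sub backend listed under MCP storage) instead of an in-memory list.

```python
import threading
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class AgentMessage:
    sender: str
    message_type: str
    data: Dict[str, Any]


class InProcessSharedMemory:
    """Hypothetical shared memory: append-only log plus per-reader cursors."""

    def __init__(self) -> None:
        self._log: List[AgentMessage] = []
        self._cursors: Dict[Tuple[str, str], int] = {}  # (agent_id, message_type) -> index
        self._lock = threading.Lock()

    def broadcast(self, message: AgentMessage) -> None:
        with self._lock:
            self._log.append(message)

    def receive_messages(self, agent_id: str, message_type: str) -> List[AgentMessage]:
        """Return messages of this type from other agents not yet seen by agent_id."""
        with self._lock:
            cursor_key = (agent_id, message_type)
            start = self._cursors.get(cursor_key, 0)
            new = self._log[start:]
            self._cursors[cursor_key] = len(self._log)
        return [m for m in new if m.message_type == message_type and m.sender != agent_id]
```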
memory-operations
core-actions
The memory system exposes the following actions to the agent:
1-write-memory
Store information in the appropriate memory layer.
```python
class WriteMemoryAction(Action):
    action_type: Literal["WRITE_MEMORY"]
    memory_layer: Literal["short_term", "working", "long_term", "shared"]
    key: str
    value: Any
    metadata: Optional[Dict[str, Any]] = None
    ttl: Optional[int] = None  # Time-to-live in seconds (for working memory)
```
Example:
```python
# Store a successful extraction pattern
Action(
    action_type="WRITE_MEMORY",
    memory_layer="long_term",
    key="pattern:price:span.product-price",
    value={
        "selector": "span.product-price",
        "field": "price",
        "success_count": 1,
        "domain": "shop.example.com"
    },
    metadata={"task_id": "task_medium", "episode_id": "ep_123"}
)
```
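The following sketch shows how `WRITE_MEMORY` and exact-key `READ_MEMORY` might dispatch to per-layer stores while honoring `ttl`. `LayeredMemoryStore` is a hypothetical in-memory stand-in; real layers would sit on the vector index and MCP storage. Expired entries are evicted lazily when read, and the injectable clock exists only to make expiry testable.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

LAYERS = ("short_term", "working", "long_term", "shared")


class LayeredMemoryStore:
    """Hypothetical per-layer key-value store with optional TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # layer -> key -> (value, metadata, expires_at or None)
        self._layers: Dict[str, Dict[str, Tuple[Any, Dict[str, Any], Optional[float]]]] = {
            layer: {} for layer in LAYERS
        }

    def write(self, memory_layer: str, key: str, value: Any,
              metadata: Optional[Dict[str, Any]] = None, ttl: Optional[int] = None) -> None:
        if memory_layer not in self._layers:
            raise ValueError(f"unknown memory layer: {memory_layer}")
        expires_at = self._clock() + ttl if ttl is not None else None
        self._layers[memory_layer][key] = (value, metadata or {}, expires_at)

    def read(self, memory_layer: str, key: str) -> Optional[Any]:
        entry = self._layers[memory_layer].get(key)
        if entry is None:
            return None
        value, _metadata, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._layers[memory_layer][key]  # lazily evict the expired entry
            return None
        return value
```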
2-read-memory
Retrieve information from memory.
```python
class ReadMemoryAction(Action):
    action_type: Literal["READ_MEMORY"]
    memory_layer: Literal["short_term", "working", "long_term", "shared"]
    key: Optional[str] = None        # Specific key (exact match)
    query: Optional[str] = None      # Semantic search query
    filters: Optional[Dict] = None   # Metadata filters
    limit: int = 10                  # Max results
```
Example:
```python
# Semantic search for price extraction patterns
Action(
    action_type="READ_MEMORY",
    memory_layer="long_term",
    query="how to extract price from e-commerce product page",
    filters={"field_name": "price", "confidence": ">0.7"},
    limit=5
)
```
3-search-memory
Advanced semantic search across memory layers.
```python
class SearchMemoryAction(Action):
    action_type: Literal["SEARCH_MEMORY"]
    query: str                                  # Natural language query
    memory_layers: List[str]                    # Which layers to search
    search_mode: Literal["semantic", "keyword", "hybrid"]
    time_range: Optional[TimeRange]             # Filter by recency
    min_relevance: float = 0.5                  # Minimum similarity score
```
Example:
```python
# Find all successful pagination strategies
Action(
    action_type="SEARCH_MEMORY",
    query="successful pagination next page navigation strategies",
    memory_layers=["long_term", "shared"],
    search_mode="semantic",
    min_relevance=0.7
)
```
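For `search_mode="hybrid"`, one common approach is a weighted blend of a semantic score and a keyword score, filtered by `min_relevance`. The sketch below assumes the vector index has already computed the semantic scores. It uses the fraction of query terms found in a memory's text as the keyword score. `hybrid_search`, `keyword_score`, and the `alpha` weight are assumptions, not part of the action spec.

```python
import re
from typing import Dict, List, Tuple


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text (case-insensitive)."""
    terms = set(re.findall(r"\w+", query.lower()))
    if not terms:
        return 0.0
    words = set(re.findall(r"\w+", text.lower()))
    return len(terms & words) / len(terms)


def hybrid_search(query: str, docs: Dict[str, str], semantic: Dict[str, float],
                  alpha: float = 0.5, min_relevance: float = 0.5) -> List[Tuple[str, float]]:
    """Blend semantic and keyword scores, drop weak hits, return best first."""
    results = []
    for doc_id, text in docs.items():
        score = alpha * semantic.get(doc_id, 0.0) + (1 - alpha) * keyword_score(query, text)
        if score >= min_relevance:
            results.append((doc_id, score))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Setting `alpha=1.0` reduces this to pure semantic ranking, and `alpha=0.0` reduces it to pure keyword ranking.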
4-summarize-memory
Compress and summarize memory to keep it within the context window.
```python
class SummarizeMemoryAction(Action):
    action_type: Literal["SUMMARIZE_MEMORY"]
    memory_layer: str
    summarization_strategy: Literal["importance", "recency", "relevance"]
    target_size: int          # Target summary size in tokens
    preserve_keys: List[str]  # Never summarize these
```
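Here is a minimal sketch of the `recency` strategy. It approximates token counts by whitespace-split words. Entries named in `preserve_keys` are always kept, and the newest remaining entries are then kept until `target_size` is reached. This is truncation rather than abstractive summarization; an LLM-backed summarizer could replace the dropped entries with a digest. `summarize_by_recency` and `approx_tokens` are hypothetical.

```python
from typing import Dict, List


def approx_tokens(text: str) -> int:
    # Rough proxy: one token per whitespace-separated word (a real tokenizer differs).
    return len(text.split())


def summarize_by_recency(entries: List[Dict[str, str]], target_size: int,
                         preserve_keys: List[str]) -> List[Dict[str, str]]:
    """Keep preserved entries plus the newest others that fit in target_size tokens.

    `entries` are ordered oldest to newest, each with a unique "key" and a "text".
    """
    preserved = [e for e in entries if e["key"] in preserve_keys]
    # Preserved entries are always kept, even if they alone exceed the budget.
    budget = target_size - sum(approx_tokens(e["text"]) for e in preserved)
    kept_keys = {e["key"] for e in preserved}
    others = [e for e in entries if e["key"] not in kept_keys]
    for entry in reversed(others):  # newest first
        cost = approx_tokens(entry["text"])
        if cost > budget:
            break  # stop at the first entry that does not fit
        kept_keys.add(entry["key"])
        budget -= cost
    return [e for e in entries if e["key"] in kept_keys]  # original order
```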
5-prune-memory
Remove low-value or outdated memories.
```python
class PruneMemoryAction(Action):
    action_type: Literal["PRUNE_MEMORY"]
    memory_layer: str
    pruning_strategy: Literal["lru", "low_confidence", "old_age"]
    threshold: float  # Confidence/age threshold
```
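Below is a sketch of two of the pruning strategies. For `low_confidence` it assumes `threshold` is a minimum confidence, and for `old_age` a maximum age in days; the spec above only says "Confidence/age threshold". `StoredPattern` and `prune` are hypothetical. The sketch leaves out `lru` because this document does not define how it uses `threshold`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class StoredPattern:
    confidence: float
    created_at: datetime


def prune(store: Dict[str, StoredPattern], strategy: str, threshold: float,
          now: Optional[datetime] = None) -> int:
    """Delete entries selected by the strategy and return how many were removed.

    low_confidence: remove entries with confidence below `threshold` (0.0-1.0).
    old_age:        remove entries older than `threshold` days.
    """
    now = now or datetime.now()
    if strategy == "low_confidence":
        doomed = [k for k, p in store.items() if p.confidence < threshold]
    elif strategy == "old_age":
        doomed = [k for k, p in store.items() if now - p.created_at > timedelta(days=threshold)]
    else:
        raise NotImplementedError(f"strategy {strategy!r} is not covered by this sketch")
    for key in doomed:  # delete after iterating, never while iterating
        del store[key]
    return len(doomed)
```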
implementation-details
vector-database-integration
Supported Backends:
- FAISS (default, local, no external dependencies)
- Qdrant (distributed, production-ready)
- Pinecone (managed, cloud-based)
- Weaviate (open-source, GraphQL API)
Configuration:
```python
from typing import Any, Dict, Literal

from pydantic import BaseModel


class VectorDBConfig(BaseModel):
    provider: Literal["faiss", "qdrant", "pinecone", "weaviate"]
    embedding_model: str = "text-embedding-3-small"  # OpenAI embedding model
    dimension: int = 1536  # default output size of text-embedding-3-small
    similarity_metric: Literal["cosine", "euclidean", "dot_product"] = "cosine"
    index_type: str = "IVF"  # FAISS-specific
    connection_params: Dict[str, Any]  # Provider-specific
```
Embedding Pipeline:
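```python
import numpy as np


class MemoryEmbedder:
    def __init__(self, embedding_model):
        # embedding_model is assumed to expose encode(text) -> np.ndarray, e.g. a
        # sentence-transformers model or a wrapper around an embeddings API.
        self.embedding_model = embedding_model

    def embed_pattern(self, pattern: ExtractionPattern) -> np.ndarray:
        """Convert an extraction pattern to an embedding."""
        text = "\n".join([
            f"Field: {pattern.field_name}",
            f"Selector: {pattern.selector}",
            f"Type: {pattern.selector_type}",
            f"Context: {' '.join(pattern.examples[:3])}",
        ])
        return self.embedding_model.encode(text)

    def embed_query(self, query: str) -> np.ndarray:
        """Convert a search query to an embedding."""
        return self.embedding_model.encode(query)
```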
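To illustrate the search side of the pipeline without an embedding service, the sketch below swaps in a toy hashed bag-of-words embedder (not `text-embedding-3-small`) and runs a brute-force cosine search. Because the vectors are L2-normalized, cosine similarity equals the dot product. For the same reason, an inner-product index such as FAISS `IndexFlatIP` built over normalized vectors produces a cosine ranking. `toy_embed`, `cosine_top_k`, and the 64-dimension size are assumptions for the example.

```python
import hashlib
import math
import re
from typing import Dict, List, Tuple

DIM = 64  # toy size; text-embedding-3-small produces 1536-dimensional vectors


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hashed bag-of-words vector, L2-normalized (stand-in for a real model)."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        # A stable hash (unlike built-in hash()) keeps buckets identical across runs.
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine_top_k(query: str, index: Dict[str, List[float]], k: int = 5) -> List[Tuple[str, float]]:
    """Exact brute-force search; unit-length vectors make cosine equal to the dot product."""
    q = toy_embed(query)
    scored = [(key, sum(a * b for a, b in zip(q, v))) for key, v in index.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:k]
```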
mcp-storage-integration
Storage Backends:
- File System MCP (local JSON/SQLite files)
- PostgreSQL MCP (relational storage)
- MongoDB MCP (document storage)
- Redis MCP (fast cache + pub/sub for shared memory)
Example MCP Configuration:
```json
{
  "mcpServers": {
    "memory-storage": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./memory_data"],
      "enabled": true,
      "autoDownload": false
    },
    "memory-cache": {
      "command": "redis-mcp-server",
      "args": ["--host", "localhost", "--port", "6379"],
      "enabled": true,
      "autoDownload": true
    }
  }
}
```
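Here is a sketch of how a host process might read the configuration above and decide which MCP servers to launch. `enabled_servers` is hypothetical, and treating a missing `enabled` flag as true is an assumption. The sketch ignores the `autoDownload` flag and leaves out launching each command (for example with `subprocess.Popen`).

```python
import json
from typing import Dict, List


def enabled_servers(config_text: str) -> Dict[str, List[str]]:
    """Map each enabled MCP server name to its launch command line."""
    config = json.loads(config_text)
    return {
        name: [spec["command"], *spec.get("args", [])]
        for name, spec in config.get("mcpServers", {}).items()
        if spec.get("enabled", True)  # missing flag treated as enabled (assumption)
    }
```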
memory-router
The Memory Router chooses which memory layers to query by matching keywords in the request text. It can select several layers and falls back to long-term memory when nothing matches:
```python
from typing import Any, Dict, List


class MemoryRouter:
    def route_query(self, query: str, context: Dict[str, Any]) -> List[str]:
        """Determine which memory layers to search (keyword heuristics)."""
        q = query.lower()  # match keywords case-insensitively
        layers = []

        # Recent action history → short-term
        if "last few" in q or "current episode" in q:
            layers.append("short_term")

        # Active reasoning → working
        if "consider" in q or "evaluate" in q:
            layers.append("working")

        # Historical patterns → long-term
        if "similar" in q or "previously" in q or "learned" in q:
            layers.append("long_term")

        # Other agents' discoveries → shared
        if "other agents" in q or "consensus" in q:
            layers.append("shared")

        return layers if layers else ["long_term"]  # Default
```
context-window-optimization
Problem: LLMs have limited context windows. Memory must be compressed.
Solutions:
- Hierarchical Summarization:
```python
class MemorySummarizer:
    def summarize_episode(self, episode_memory: EpisodeMemory) -> str:
        """Compress episode into key points."""
        summary = f"Episode {episode_memory.episode_id} ({episode_memory.task_id}):\n"
        summary += f"- Visited {len(episode_memory.visited_urls)} pages\n"
        summary += f"- Extracted {len(episode_memory.extracted_data)} fields\n"
        summary += f"- {len(episode_memory.actions_history)} actions taken\n"

        # Highlight key discoveries
        if episode_memory.intermediate_notes:
            summary += "\nKey findings:\n"
            for note in episode_memory.intermediate_notes[-3:]:  # Last 3 notes
                summary += f"  • {note}\n"

        return summary
```
- Importance Scoring:
```python
from datetime import datetime
from typing import Any


class MemoryImportanceScorer:
    def score(self, memory_item: Any) -> float:
        """Rate importance of memory (0.0 to 1.0)."""
        score = 0.0

        # Recency bonus: decays linearly to 0 over 30 days since creation
        age_days = (datetime.now() - memory_item.created_at).days
        score += max(0, 1.0 - age_days / 30) * 0.3

        # Success rate bonus (only for items that carry a confidence score)
        if hasattr(memory_item, "confidence"):
            score += memory_item.confidence * 0.4

        # Recent-use bonus: decays linearly to 0 over 7 days since last use
        if hasattr(memory_item, "last_used"):
            days_since_use = (datetime.now() - memory_item.last_used).days
            score += max(0, 1.0 - days_since_use / 7) * 0.3

        return min(score, 1.0)
```
- Automatic Pruning:
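```python
from typing import Any, Dict


class MemoryPruner:
    def prune_low_value(self, memory_store: Dict[str, Any], threshold: float = 0.3) -> int:
        """Remove memories below importance threshold; return how many were removed."""
        scorer = MemoryImportanceScorer()
        # Collect keys first: deleting from a dict while iterating over it raises RuntimeError
        to_remove = []
        for key, item in memory_store.items():
            if scorer.score(item) < threshold:
                to_remove.append(key)
        for key in to_remove:
            del memory_store[key]
        return len(to_remove)
```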
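As a worked example of the scoring and pruning rules above with the default threshold of 0.3, write $d_c$ for whole days since creation, $c$ for confidence, and $d_u$ for whole days since last use. The two patterns below are hypothetical.

$$
s = \min\left(1,\; 0.3\max\left(0,\, 1 - \tfrac{d_c}{30}\right) + 0.4\,c + 0.3\max\left(0,\, 1 - \tfrac{d_u}{7}\right)\right)
$$

A pattern created 15 days ago with $c = 0.9$ and last used 2 days ago:

$$
s = 0.3(0.5) + 0.4(0.9) + 0.3\left(\tfrac{5}{7}\right) = 0.15 + 0.36 + 0.214 \approx 0.724 \ge 0.3 \quad \text{(kept)}
$$

A pattern created 40 days ago with $c = 0.5$ and last used 10 days ago:

$$
s = 0 + 0.4(0.5) + 0 = 0.2 < 0.3 \quad \text{(pruned)}
$$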
configuration
settings-panel
Memory Settings Tab:
```python
from pydantic import BaseModel


class MemorySettings(BaseModel):
    # Enable/disable layers
    enable_short_term: bool = True
    enable_working: bool = True
    enable_long_term: bool = True
    enable_shared: bool = False  # Off by default (multi-agent)

    # Size limits
    max_episode_memory_mb: int = 10
    max_working_memory_items: int = 50
    max_long_term_patterns: int = 10000

    # Vector DB settings
    vector_db_provider: str = "faiss"
    embedding_model: str = "text-embedding-3-small"

    # MCP storage settings
    storage_backend: str = "filesystem"
    storage_path: str = "./memory_data"

    # Pruning settings
    auto_prune: bool = True
    prune_threshold: float = 0.3
    prune_interval_hours: int = 24

    # Context window optimization
    auto_summarize: bool = True
    max_context_tokens: int = 4000
```
UI Example:
```
┌─────────────────────────────────────────────────────────────┐
│                       Memory Settings                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  [x] Enable Short-Term Memory (Episode)                     │
│  [x] Enable Working Memory (Reasoning)                      │
│  [x] Enable Long-Term Memory (Persistent)                   │
│  [ ] Enable Shared Memory (Multi-Agent)                     │
│                                                             │
│  Memory Size Limits:                                        │
│    Short-Term: [10] MB per episode                          │
│    Working: [50] items max                                  │
│    Long-Term: [10000] patterns max                          │
│                                                             │
│  Vector Database:                                           │
│    Provider: [FAISS ▼]                                      │
│    Embedding: [text-embedding-3-small ▼]                    │
│                                                             │
│  Storage Backend:                                           │
│    Type: [Filesystem ▼]                                     │
│    Path: [./memory_data ] [Browse]                          │
│                                                             │
│  Auto-Pruning:                                              │
│    [x] Enabled                                              │
│    Threshold: [0.3] (0.0 = keep all, 1.0 = keep only best)  │
│    Interval: [24] hours                                     │
│                                                             │
│  [Save Settings] [Reset to Defaults]                        │
└─────────────────────────────────────────────────────────────┘
```
best-practices
1-memory-hygiene
Do:
- Summarize episode memory before storing in long-term
- Prune low-confidence patterns regularly
- Validate patterns before adding to long-term memory
- Tag memories with metadata (task_id, domain, confidence)
Don't:
- Store raw HTML in long-term memory (use summaries)
- Keep failed patterns without analysis
- Allow unbounded memory growth
- Store sensitive data without encryption
2-query-optimization
Do:
- Use semantic search for conceptual queries ("how to extract price")
- Use exact key lookup for known patterns
- Apply filters to narrow search space
- Limit results to top-K most relevant
Don't:
- Search all layers for every query (route intelligently)
- Ignore relevance scores (filter low scores)
- Retrieve full objects when summaries suffice
3-context-window-management
Do:
- Prioritize recent and high-confidence memories
- Summarize old episodes aggressively
- Use hierarchical memory retrieval (summary β details on demand)
- Monitor token usage and trigger summarization proactively
Don't:
- Include entire memory in every agent call
- Ignore context window limits
- Retrieve memories without relevance ranking
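Below is a sketch of triggering summarization proactively against `max_context_tokens` from the settings above. It estimates tokens with the rough heuristic of about four characters per English token instead of a real tokenizer. It also assumes a hypothetical `trigger_ratio` of 0.8, so summarization starts before the limit is hit. `should_summarize` is a hypothetical helper.

```python
from typing import Iterable


def should_summarize(chunks: Iterable[str], max_context_tokens: int = 4000,
                     trigger_ratio: float = 0.8) -> bool:
    """Return True once the estimated prompt size reaches trigger_ratio of the budget."""
    # Heuristic: roughly 4 characters per token for English text; use a tokenizer for exact counts.
    estimated_tokens = sum(len(chunk) for chunk in chunks) / 4
    return estimated_tokens >= trigger_ratio * max_context_tokens
```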
4-multi-agent-coordination
Do:
- Broadcast significant discoveries to shared memory
- Implement consensus mechanisms for conflicting data
- Use message queues for asynchronous updates
- Version shared knowledge to handle conflicts
Don't:
- Allow race conditions on shared writes
- Broadcast every minor action (it creates noise)
- Trust shared data without validation
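Below is a sketch of versioned shared knowledge with optimistic concurrency. A writer must present the version it read, and the store rejects a stale version instead of silently overwriting another agent's write. `VersionedKnowledgeBase` and `VersionConflict` are hypothetical and single-process. Across processes, the same compare-and-set check would have to run inside the storage backend.

```python
import threading
from typing import Any, Dict, Tuple


class VersionConflict(Exception):
    """Raised when a writer's expected version is stale."""


class VersionedKnowledgeBase:
    """Hypothetical shared key-value facts with compare-and-set writes."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Any]] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Tuple[int, Any]:
        with self._lock:
            return self._data.get(key, (0, None))  # version 0 = never written

    def put(self, key: str, value: Any, expected_version: int) -> int:
        """Write only if no one else has written since expected_version was read."""
        with self._lock:
            current, _ = self._data.get(key, (0, None))
            if current != expected_version:
                raise VersionConflict(f"{key}: expected v{expected_version}, found v{current}")
            self._data[key] = (current + 1, value)
            return current + 1
```

On `VersionConflict`, an agent would re-read the entry, reconcile it (or defer to a consensus step), and retry.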
performance-metrics
Track these metrics to evaluate memory system effectiveness:
```python
from pydantic import BaseModel


class MemoryMetrics(BaseModel):
    # Retrieval performance
    avg_retrieval_time_ms: float
    cache_hit_rate: float

    # Effectiveness
    pattern_reuse_rate: float            # % of times learned patterns helped
    memory_assisted_success_rate: float  # Success rate with memory vs. without

    # Efficiency
    memory_size_mb: float
    pruned_items_count: int
    summarization_ratio: float  # Compressed size / original size

    # Quality
    avg_pattern_confidence: float
    false_positive_rate: float  # Patterns that failed when reused
```
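Here is a sketch of how some of these metrics could be derived from raw counters. The formulas are one reading of the comments above, not fixed definitions:

- `pattern_reuse_rate` is successes per reuse attempt.
- `false_positive_rate` is failures per reuse attempt.
- `memory_assisted_success_rate` is the difference between the success rates with and without memory.

`MemoryCounters` and `compute_metrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MemoryCounters:
    reuse_attempts: int            # times a stored pattern was retrieved and applied
    reuse_successes: int           # ...and the extraction succeeded
    episodes_with_memory: int
    successes_with_memory: int
    episodes_without_memory: int
    successes_without_memory: int
    original_tokens: int           # memory size before summarization
    summarized_tokens: int         # ...and after


def _rate(num: int, den: int) -> float:
    return num / den if den else 0.0  # avoid division by zero on empty counters


def compute_metrics(c: MemoryCounters) -> Dict[str, float]:
    return {
        "pattern_reuse_rate": _rate(c.reuse_successes, c.reuse_attempts),
        "false_positive_rate": _rate(c.reuse_attempts - c.reuse_successes, c.reuse_attempts),
        # Assumed reading: success-rate lift from using memory.
        "memory_assisted_success_rate": _rate(c.successes_with_memory, c.episodes_with_memory)
        - _rate(c.successes_without_memory, c.episodes_without_memory),
        "summarization_ratio": _rate(c.summarized_tokens, c.original_tokens),
    }
```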
example-usage
full-episode-with-memory
```python
from datetime import datetime

# Initialize environment with memory
env = WebScraperEnv(memory_config=MemorySettings())

# Reset episode
obs = env.reset(task_id="task_medium", seed=42)

# Agent checks long-term memory for similar tasks
memory_query = Action(
    action_type="SEARCH_MEMORY",
    query=f"successful extraction patterns for {obs.task_description}",
    memory_layers=["long_term"],
    search_mode="semantic",
    min_relevance=0.7,  # SearchMemoryAction defines no `limit` field
)
# step() returns (obs, reward, done, info) as for every other action; this example
# assumes the matching patterns are returned in info["memory_results"].
obs, reward, done, info = env.step(memory_query)
similar_patterns = info.get("memory_results", [])

if similar_patterns:
    best = similar_patterns[0]

    # Agent reasons using working memory
    working_memory = {
        "current_goal": "Extract product price",
        "reasoning_steps": [
            f"Retrieved {len(similar_patterns)} similar patterns",
            f"Top pattern: {best.selector} (confidence: {best.confidence})",
            "Will try this selector first",
        ],
        "considered_actions": [...],
    }

    # Agent extracts using learned pattern
    extract_action = Action(
        action_type="EXTRACT_FIELD",
        target_field="price",
        selector=best.selector,
    )
    obs, reward, done, info = env.step(extract_action)

    # If successful, reinforce the pattern
    if reward.value > 0:
        env.step(Action(
            action_type="WRITE_MEMORY",
            memory_layer="long_term",
            key=f"pattern:price:{best.selector}",
            value={
                **best.dict(),
                "success_count": best.success_count + 1,
                "last_used": datetime.now(),
            },
        ))

# Store episode summary
if done:
    env.step(Action(
        action_type="WRITE_MEMORY",
        memory_layer="long_term",
        key=f"episode:{obs.episode_id}",
        value=env.summarize_episode(),
    ))
```
future-enhancements
- Active Learning: Agent can request human labeling for ambiguous patterns
- Federated Memory: Share memory across organizations without revealing raw data
- Memory Replay: Train on stored episodes for offline RL
- Causal Memory: Track cause-effect relationships between actions and outcomes
- Memory Debugging: Visualize which memories influenced each decision
Next: See api.md for multi-model API integration.
document-metadata
| key | value |
|---|---|
| document | memory.md |
| status | active |
document-flow
```mermaid
flowchart TD
    A[document] --> B[key-sections]
    B --> C[implementation]
    B --> D[operations]
    B --> E[validation]
```