# SPARKNET: Technical Report

## AI-Powered Multi-Agent System for Research Valorization

---

## Table of Contents

1. [Executive Summary](#1-executive-summary)
2. [Introduction](#2-introduction)
3. [System Architecture](#3-system-architecture)
4. [Theoretical Foundations](#4-theoretical-foundations)
5. [Core Components](#5-core-components)
6. [Workflow Engine](#6-workflow-engine)
7. [Implementation Details](#7-implementation-details)
8. [Use Case: Patent Wake-Up](#8-use-case-patent-wake-up)
9. [Performance Considerations](#9-performance-considerations)
10. [Conclusion](#10-conclusion)

---

## 1. Executive Summary

SPARKNET is an autonomous multi-agent AI system designed for research valorization and technology transfer. Built on modern agentic AI principles, it leverages LangGraph for workflow orchestration, LangChain for LLM integration, and ChromaDB for vector-based memory. The system transforms dormant intellectual property into commercialization opportunities through a coordinated pipeline of specialized agents.

**Key Capabilities:**

- Multi-agent orchestration with cyclic refinement
- Local LLM deployment via Ollama (privacy-preserving)
- Vector-based episodic and semantic memory
- Automated patent analysis and Technology Readiness Level (TRL) assessment
- Market opportunity identification and stakeholder matching
- Professional valorization brief generation

---

## 2. Introduction

### 2.1 Problem Statement

University technology transfer offices face significant challenges:

- **Volume**: Thousands of patents remain dormant in institutional portfolios
- **Complexity**: Manual analysis requires deep domain expertise
- **Time**: Traditional evaluation takes days to weeks per patent
- **Resources**: Limited staff cannot process the backlog efficiently

### 2.2 Solution Approach

SPARKNET addresses these challenges through an **agentic AI architecture** that:

1. Automates document analysis and information extraction
2. Applies domain expertise through specialized agents
3. Provides structured, actionable outputs
4. Learns from past experiences to improve future performance

### 2.3 Design Principles

| Principle | Implementation |
|-----------|----------------|
| **Autonomy** | Agents operate independently with defined goals |
| **Specialization** | Each agent focuses on specific tasks |
| **Collaboration** | Agents share information through structured state |
| **Iteration** | Quality-driven refinement cycles |
| **Memory** | Vector stores for contextual learning |
| **Privacy** | Local LLM deployment via Ollama |

---

## 3. System Architecture

### 3.1 High-Level Architecture

```
┌──────────────────────────────────────────────────────────────────────┐
│                           SPARKNET SYSTEM                            │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────────┐   │
│  │  Frontend   │    │   Backend   │    │        LLM Layer        │   │
│  │  Next.js    │◄──►│   FastAPI   │◄──►│   Ollama (4 Models)     │   │
│  │  Port 3000  │    │  Port 8000  │    │   - llama3.1:8b         │   │
│  └─────────────┘    └──────┬──────┘    │   - mistral:latest      │   │
│                            │           │   - qwen2.5:14b         │   │
│                            ▼           │   - gemma2:2b           │   │
│                    ┌────────────────┐  └─────────────────────────┘   │
│                    │   LangGraph    │                                │
│                    │   Workflow     │◄──► ChromaDB (Vector Store)    │
│                    │  (StateGraph)  │                                │
│                    └───────┬────────┘                                │
│                            │                                         │
│         ┌──────────────────┼──────────────────┐                      │
│         ▼                  ▼                  ▼                      │
│   ┌───────────┐     ┌─────────────┐     ┌───────────┐                │
│   │  Planner  │     │  Executor   │     │  Critic   │                │
│   │  Agent    │     │  Agents     │     │  Agent    │                │
│   └───────────┘     └─────────────┘     └───────────┘                │
│                                                                      │
│   ┌───────────┐     ┌─────────────┐     ┌───────────┐                │
│   │  Memory   │     │  VisionOCR  │     │  Tools    │                │
│   │  Agent    │     │  Agent      │     │  Registry │                │
│   └───────────┘     └─────────────┘     └───────────┘                │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
```

### 3.2 Layer Description

| Layer | Technology | Purpose |
|-------|------------|---------|
| **Presentation** | Next.js, React, TypeScript | User interface, file upload, results display |
| **API** | FastAPI, Python 3.10+ | RESTful endpoints, async processing |
| **Orchestration** | LangGraph (StateGraph) | Workflow execution, conditional routing |
| **Agent** | LangChain, Custom Agents | Task-specific processing |
| **LLM** | Ollama (Local) | Natural language understanding and generation |
| **Memory** | ChromaDB | Vector storage, semantic search |

---

## 4. Theoretical Foundations

### 4.1 Agentic AI Paradigm

SPARKNET implements the modern **agentic AI** paradigm, characterized by the following.

#### 4.1.1 Agent Definition

An agent in SPARKNET is defined as a tuple:

```
Agent = (S, A, T, R, π)
```

Where:

- **S** = State space (AgentState in LangGraph)
- **A** = Action space (tool calls, LLM invocations)
- **T** = Transition function (workflow edges)
- **R** = Reward signal (validation score)
- **π** = Policy (LLM-based decision making)

#### 4.1.2 Multi-Agent Coordination

The system employs **hierarchical coordination**:

```
           Coordinator (Workflow)
                      │
    ┌─────────────────┼─────────────────┐
    ▼                 ▼                 ▼
 Planner          Executors          Critic
(Strategic)      (Tactical)       (Evaluative)
    │                 │                 │
    └─────────────────┴─────────────────┘
                      ▼
          Shared State (AgentState)
```

### 4.2 State Machine Formalism

The LangGraph workflow is formally a **Finite State Machine with Memory**:

```
FSM-M = (Q, Σ, δ, q₀, F, M)
```

Where:

- **Q** = {PLANNER, ROUTER, EXECUTOR, CRITIC, REFINE, FINISH}
- **Σ** = Input alphabet (task descriptions, documents)
- **δ** = Transition function (conditional edges)
- **q₀** = PLANNER (initial state)
- **F** = {FINISH} (accepting states)
- **M** = AgentState (memory/context)

### 4.3 Quality-Driven Refinement

The system implements a **feedback control loop**:

```
          ┌──────────────────────────────────────────┐
          │ (back to PLAN)                           │
          ▼                                          │
Input → PLAN → EXECUTE → VALIDATE ──YES──→ OUTPUT    │
                            │                        │
                            NO (score < threshold)   │
                            ▼                        │
                          REFINE ────────────────────┘
```

**Convergence Condition:**

```
terminate iff (validation_score ≥ quality_threshold)
           OR (iterations ≥ max_iterations)
```

### 4.4 Vector Memory Architecture

The memory
system uses **dense vector embeddings** for semantic retrieval:

```
Memory Types:
├── Episodic Memory     → Past workflow executions, outcomes
├── Semantic Memory     → Domain knowledge, legal frameworks
└── Stakeholder Memory  → Partner profiles, capabilities
```

**Retrieval Function:**

```
retrieve(query, top_k) = argmax_k(cosine_similarity(embed(query), embed(documents)))
```

---

## 5. Core Components

### 5.1 BaseAgent Abstract Class

All agents inherit from `BaseAgent`, providing:

```python
class BaseAgent(ABC):
    """Core agent interface"""

    # Attributes
    name: str                   # Agent identifier
    description: str            # Agent purpose
    llm_client: OllamaClient    # LLM interface
    model: str                  # Model to use
    system_prompt: str          # Agent persona
    tools: Dict[str, BaseTool]  # Available tools
    messages: List[Message]     # Conversation history

    # Core Methods
    async def call_llm(prompt, messages, temperature) -> str
    async def execute_tool(tool_name, **kwargs) -> ToolResult
    async def process_task(task: Task) -> Task  # Abstract
    async def send_message(recipient: Agent, content: str) -> str
```

### 5.2 Specialized Agents

| Agent | Purpose | Model | Complexity |
|-------|---------|-------|------------|
| **PlannerAgent** | Task decomposition, dependency analysis | qwen2.5:14b | Complex |
| **CriticAgent** | Output validation, quality scoring | mistral:latest | Analysis |
| **MemoryAgent** | Context retrieval, episode storage | nomic-embed-text | Embeddings |
| **VisionOCRAgent** | Image/PDF text extraction | llava:7b | Vision |
| **DocumentAnalysisAgent** | Patent structure extraction | llama3.1:8b | Standard |
| **MarketAnalysisAgent** | Market opportunity identification | mistral:latest | Analysis |
| **MatchmakingAgent** | Stakeholder matching | qwen2.5:14b | Complex |
| **OutreachAgent** | Brief generation | llama3.1:8b | Standard |

### 5.3 Tool System

Tools extend agent capabilities:

```python
class BaseTool(ABC):
    name: str
    description: str
    parameters: Dict[str, ToolParameter]

    async def execute(**kwargs) -> ToolResult
    async def safe_execute(**kwargs) -> ToolResult  # With error handling
```

**Built-in Tools:**

- `file_reader`, `file_writer`, `file_search`, `directory_list`
- `python_executor`, `bash_executor`
- `gpu_monitor`, `gpu_select`
- `document_generator_tool` (PDF creation)

---

## 6. Workflow Engine

### 6.1 LangGraph StateGraph

The workflow is defined as a directed graph:

```python
class SparknetWorkflow:
    def _build_graph(self) -> StateGraph:
        workflow = StateGraph(AgentState)

        # Define nodes (processing functions)
        workflow.add_node("planner", self._planner_node)
        workflow.add_node("router", self._router_node)
        workflow.add_node("executor", self._executor_node)
        workflow.add_node("critic", self._critic_node)
        workflow.add_node("refine", self._refine_node)
        workflow.add_node("finish", self._finish_node)

        # Define edges (transitions)
        workflow.set_entry_point("planner")
        workflow.add_edge("planner", "router")
        workflow.add_edge("router", "executor")
        workflow.add_edge("executor", "critic")

        # Conditional routing based on validation
        workflow.add_conditional_edges(
            "critic",
            self._should_refine,
            {"refine": "refine", "finish": "finish"},
        )
        workflow.add_edge("refine", "planner")  # Cyclic refinement
        workflow.add_edge("finish", END)

        return workflow
```

### 6.2 AgentState Schema

The shared state passed between nodes:

```python
class AgentState(TypedDict):
    # Message History (auto-managed by LangGraph)
    messages: Annotated[Sequence[BaseMessage], add_messages]

    # Task Information
    task_id: str
    task_description: str
    scenario: ScenarioType  # PATENT_WAKEUP, AGREEMENT_SAFETY, etc.
    status: TaskStatus  # PENDING → PLANNING → EXECUTING → VALIDATING → COMPLETED

    # Workflow Execution
    current_agent: Optional[str]
    iteration_count: int
    max_iterations: int

    # Planning Outputs
    subtasks: Optional[List[Dict]]
    execution_order: Optional[List[List[str]]]

    # Execution Outputs
    agent_outputs: Dict[str, Any]
    intermediate_results: List[Dict]

    # Validation
    validation_score: Optional[float]
    validation_feedback: Optional[str]
    validation_issues: List[str]
    validation_suggestions: List[str]

    # Memory Context
    retrieved_context: List[Dict]
    document_metadata: Dict[str, Any]
    input_data: Dict[str, Any]

    # Final Output
    final_output: Optional[Any]
    success: bool
    error: Optional[str]

    # Timing
    start_time: datetime
    end_time: Optional[datetime]
    execution_time_seconds: Optional[float]
```

### 6.3 Workflow Execution Flow

```
WORKFLOW EXECUTION FLOW

1. PLANNER NODE
   ├─ Retrieve context from MemoryAgent
   ├─ Decompose task into subtasks
   ├─ Determine execution order (dependency resolution)
   └─ Output: subtasks[], execution_order[]
        │
        ▼
2. ROUTER NODE
   ├─ Identify scenario type (PATENT_WAKEUP, etc.)
   ├─ Select appropriate executor agents
   └─ Output: agents_to_use[]
        │
        ▼
3. EXECUTOR NODE
   ├─ Route to scenario-specific pipeline
   │    └─ Patent Wake-Up: Doc → Market → Match → Outreach
   ├─ Execute each specialized agent sequentially
   └─ Output: agent_outputs{}, final_output
        │
        ▼
4. CRITIC NODE
   ├─ Validate output quality (0.0-1.0 score)
   ├─ Identify issues and suggestions
   └─ Output: validation_score, validation_feedback
        │
        ▼
5. CONDITIONAL ROUTING
   ├─ IF score ≥ threshold (0.85) → FINISH
   ├─ IF iterations ≥ max → FINISH (with warning)
   └─ ELSE → REFINE → back to PLANNER
        │
        ▼
6. FINISH NODE
   ├─ Store episode in MemoryAgent (if quality ≥ 0.75)
   ├─ Calculate execution statistics
   └─ Return WorkflowOutput
```

---

## 7. Implementation Details

### 7.1 LLM Integration (Ollama)

SPARKNET uses **Ollama** for local LLM deployment:

```python
class LangChainOllamaClient:
    """LangChain-compatible Ollama client with model routing"""

    COMPLEXITY_MODELS = {
        "simple": "gemma2:2b",         # Classification, routing
        "standard": "llama3.1:8b",     # General tasks
        "analysis": "mistral:latest",  # Analysis, reasoning
        "complex": "qwen2.5:14b",      # Complex multi-step
    }

    def get_llm(self, complexity: str) -> ChatOllama:
        """Get LLM instance for specified complexity level"""
        model = self.COMPLEXITY_MODELS.get(complexity, "llama3.1:8b")
        return ChatOllama(model=model, base_url=self.base_url)

    def get_embeddings(self) -> OllamaEmbeddings:
        """Get embeddings model for vector operations"""
        return OllamaEmbeddings(model="nomic-embed-text:latest")
```

### 7.2 Memory System (ChromaDB)

Three specialized collections:

```python
class MemoryAgent:
    def _initialize_collections(self):
        # Episodic: Past workflow executions
        self.episodic_memory = Chroma(
            collection_name="episodic_memory",
            embedding_function=self.embeddings,
            persist_directory="data/vector_store/episodic",
        )

        # Semantic: Domain knowledge
        self.semantic_memory = Chroma(
            collection_name="semantic_memory",
            embedding_function=self.embeddings,
            persist_directory="data/vector_store/semantic",
        )

        # Stakeholders: Partner profiles
        self.stakeholder_profiles = Chroma(
            collection_name="stakeholder_profiles",
            embedding_function=self.embeddings,
            persist_directory="data/vector_store/stakeholders",
        )
```

### 7.3 Pydantic Data Models

Structured outputs ensure type safety:

```python
class PatentAnalysis(BaseModel):
    patent_id: str
    title: str
    abstract: str
    independent_claims: List[Claim]
    dependent_claims: List[Claim]
    ipc_classification: List[str]
    technical_domains: List[str]
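    # Editorial note: the 1-9 bounds on trl_level below follow the standard
    # Technology Readiness Level scale (TRL 1 = basic principles observed,
    # TRL 9 = actual system proven in an operational environment).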
    key_innovations: List[str]
    trl_level: int = Field(ge=1, le=9)
    trl_justification: str
    commercialization_potential: str  # High/Medium/Low
    potential_applications: List[str]
    confidence_score: float = Field(ge=0.0, le=1.0)


class MarketOpportunity(BaseModel):
    sector: str
    market_size_usd: Optional[float]
    growth_rate_percent: Optional[float]
    technology_fit: str  # Excellent/Good/Fair
    priority_score: float = Field(ge=0.0, le=1.0)


class StakeholderMatch(BaseModel):
    stakeholder_name: str
    stakeholder_type: str  # Investor/Company/University
    overall_fit_score: float
    technical_fit: float
    market_fit: float
    geographic_fit: float
    match_rationale: str
    recommended_approach: str
```

---

## 8. Use Case: Patent Wake-Up

### 8.1 Scenario Overview

The **Patent Wake-Up** workflow transforms dormant patents into commercialization opportunities:

```
Patent Document → Analysis → Market Opportunities → Partner Matching → Valorization Brief
```

### 8.2 Pipeline Execution

```python
async def _execute_patent_wakeup(self, state: AgentState) -> AgentState:
    """Four-stage Patent Wake-Up pipeline"""

    # Stage 1: Document Analysis
    doc_agent = DocumentAnalysisAgent(llm_client, memory_agent, vision_ocr_agent)
    patent_analysis = await doc_agent.analyze_patent(patent_path)
    # Output: PatentAnalysis (title, claims, TRL, innovations)

    # Stage 2: Market Analysis
    market_agent = MarketAnalysisAgent(llm_client, memory_agent)
    market_analysis = await market_agent.analyze_market(patent_analysis)
    # Output: MarketAnalysis (opportunities, sectors, strategy)

    # Stage 3: Stakeholder Matching
    matching_agent = MatchmakingAgent(llm_client, memory_agent)
    matches = await matching_agent.find_matches(patent_analysis, market_analysis)
    # Output: List[StakeholderMatch] (scored partners)

    # Stage 4: Brief Generation
    outreach_agent = OutreachAgent(llm_client, memory_agent)
    brief = await outreach_agent.create_valorization_brief(
        patent_analysis, market_analysis, matches
    )
    # Output: ValorizationBrief (markdown + PDF)

    return state
```

### 8.3 Example Output

```yaml
Patent: AI-Powered Drug Discovery Platform
─────────────────────────────────────────────

Technology Assessment:
  TRL Level: 7/9 (System Demonstration)
  Key Innovations:
    • Novel neural network for molecular interaction prediction
    • Transfer learning from existing drug databases
    • Automated screening pipeline (60% time reduction)

Market Opportunities (Top 3):
  1. Pharmaceutical R&D Automation   ($150B market, 12% CAGR)
  2. Biotechnology Platform Services ($45B market, 15% CAGR)
  3. Clinical Trial Optimization     ($8B market, 18% CAGR)

Top Partner Matches:
  1. PharmaTech Solutions Inc. (Basel)         - 92% fit score
  2. BioVentures Capital (Toronto)             - 88% fit score
  3. European Patent Office Services (Munich)  - 85% fit score

Output: outputs/valorization_brief_patent_20251204.pdf
```

---

## 9. Performance Considerations

### 9.1 Model Selection Strategy

| Task Complexity | Model | VRAM | Latency |
|-----------------|-------|------|---------|
| Simple (routing, classification) | gemma2:2b | 1.6 GB | ~1s |
| Standard (extraction, generation) | llama3.1:8b | 4.9 GB | ~3s |
| Analysis (reasoning, evaluation) | mistral:latest | 4.4 GB | ~4s |
| Complex (planning, multi-step) | qwen2.5:14b | 9.0 GB | ~8s |

### 9.2 GPU Resource Management

```python
class GPUManager:
    """Multi-GPU resource allocation"""

    def select_best_gpu(self, min_memory_gb: float = 4.0) -> int:
        """Select GPU with most available memory"""
        gpus = self.get_gpu_status()
        available = [g for g in gpus if g.free_memory_gb >= min_memory_gb]
        return max(available, key=lambda g: g.free_memory_gb).id

    @contextmanager
    def gpu_context(self, min_memory_gb: float):
        """Context manager for GPU allocation"""
        gpu_id = self.select_best_gpu(min_memory_gb)
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
        yield gpu_id
```

### 9.3 Workflow Timing

| Stage | Typical Duration | Notes |
|-------|------------------|-------|
| Planning | 5-10s | Depends on task complexity |
| Document Analysis | 15-30s | OCR adds ~10s for scanned PDFs |
| Market Analysis | 10-20s | Context retrieval included |
| Stakeholder Matching | 20-40s | Semantic search + scoring |
| Brief Generation | 15-25s | Includes PDF rendering |
| Validation | 5-10s | Per iteration |
| **Total** | **2-5 minutes** | Single patent, no refinement |

### 9.4 Scalability

- **Batch Processing**: Process multiple patents in parallel
- **ChromaDB Capacity**: Supports 10,000+ stakeholder profiles
- **Checkpointing**: Resume failed workflows from last checkpoint
- **Memory Persistence**: Vector stores persist across sessions

---

## 10. Conclusion

### 10.1 Summary

SPARKNET demonstrates a practical implementation of **agentic AI** for research valorization:

1. **Multi-Agent Architecture**: Specialized agents collaborate through shared state
2. **LangGraph Orchestration**: Cyclic workflows with quality-driven refinement
3. **Local LLM Deployment**: Privacy-preserving inference via Ollama
4. **Vector Memory**: Contextual learning from past experiences
5. **Structured Outputs**: Pydantic models ensure data integrity

### 10.2 Key Contributions

| Aspect | Innovation |
|--------|------------|
| **Architecture** | Hierarchical multi-agent system with conditional routing |
| **Workflow** | State machine with memory and iterative refinement |
| **Memory** | Tri-partite vector store (episodic, semantic, stakeholder) |
| **Privacy** | Full local deployment without cloud dependencies |
| **Output** | Professional PDF briefs with actionable recommendations |

### 10.3 Future Directions

1. **LangSmith Integration**: Observability and debugging
2. **Real Stakeholder Database**: CRM integration for live partner data
3. **Scenario Expansion**: Agreement Safety, Partner Matching workflows
4. **Multi-Language Support**: International patent processing
5. **Advanced Learning**: Reinforcement learning from user feedback

---

## Appendix A: Technology Stack

| Component | Technology | Version |
|-----------|------------|---------|
| Runtime | Python | 3.10+ |
| Orchestration | LangGraph | 0.2+ |
| LLM Framework | LangChain | 1.0+ |
| Local LLM | Ollama | Latest |
| Vector Store | ChromaDB | 1.3+ |
| API | FastAPI | 0.100+ |
| Frontend | Next.js | 16+ |
| Validation | Pydantic | 2.0+ |

## Appendix B: Model Requirements

```bash
# Required models (download via Ollama)
ollama pull llama3.1:8b        # Standard tasks (4.9 GB)
ollama pull mistral:latest     # Analysis tasks (4.4 GB)
ollama pull qwen2.5:14b        # Complex reasoning (9.0 GB)
ollama pull gemma2:2b          # Simple routing (1.6 GB)
ollama pull nomic-embed-text   # Embeddings (274 MB)
ollama pull llava:7b           # Vision/OCR (optional, 4.7 GB)
```

## Appendix C: Running SPARKNET

```bash
# 1. Start Ollama server
ollama serve

# 2. Activate environment
conda activate sparknet

# 3. Start backend
cd /home/mhamdan/SPARKNET
python -m uvicorn api.main:app --reload --port 8000

# 4. Start frontend (separate terminal)
cd frontend && npm run dev

# 5. Access application
#    Frontend: http://localhost:3000
#    API Docs: http://localhost:8000/api/docs
```

---

**Document Generated:** December 2025
**SPARKNET Version:** 1.0 (Production Ready)
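---

## Appendix D: Refinement Routing Sketch

The convergence condition of Section 4.3 and the conditional routing of Section 6.3 can be expressed as a standalone predicate. The sketch below is illustrative, not the actual `_should_refine` implementation from the codebase: the 0.85 quality threshold comes from Section 6.3, while the default of 3 for `max_iterations` is an assumption (the report stores this value in `AgentState` but does not state a default).

```python
def should_refine(
    validation_score: float,
    iteration_count: int,
    quality_threshold: float = 0.85,  # threshold given in Section 6.3
    max_iterations: int = 3,          # assumed default; held in AgentState
) -> str:
    """Routing decision applied after the Critic node."""
    if validation_score >= quality_threshold:
        return "finish"  # quality target met
    if iteration_count >= max_iterations:
        return "finish"  # iteration budget exhausted (finish with warning)
    return "refine"      # loop back through the Planner


print(should_refine(0.90, 1))  # → finish
print(should_refine(0.60, 1))  # → refine
print(should_refine(0.60, 3))  # → finish
```

The same two-branch structure maps directly onto the `{"refine": ..., "finish": ...}` dictionary passed to `add_conditional_edges` in Section 6.1.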