Commit 33be1ce · 1 parent: a08910d

Deploy 2026-02-03 08:46:33

Files changed:

- README.md (+19 -226)
- src/flow/experiments/models.py (+9 -2)
- src/flow/harness/langgraph/harness.py (+33 -9)
- src/flow/harness/miniagent/harness.py (+108 -9)
- src/flow/ui/schemas/config.py (+10 -3)
README.md
CHANGED

````diff
@@ -1,237 +1,30 @@
-> Flow is an experimental prototype and is changing rapidly.
+---
+title: Flow
+emoji: 🌊
+colorFrom: blue
+colorTo: purple
+sdk: docker
+app_port: 7860
+pinned: false
+---
+
+# Flow
 
 Flow helps you find the best configuration for your AI coding agent. Define your agent spec, provide evaluation tasks, and Flow automatically generates variants, scores them, and shows you the quality vs. cost tradeoffs.
 
 - **Simplified experimentation** — Automates the search for optimal agent configurations
 - **Transparency** — See exactly what was tested, scores, and tradeoffs on a Pareto chart
 - **User control** — Choose your tasks, evaluation criteria, and approve variants
-- **Framework agnostic** — Standardized agent spec with pluggable runtime adapters
-
-## How It Works
-
-```mermaid
-flowchart LR
-    A[Agent Spec] --> D[Optimizer]
-    B[Tasks] --> D
-    C[Evaluator] --> D
-    D --> E[Agent Variants/Candidates]
-    E --> F[Pareto Graph]
-```
-
-## Core Concepts
-
-| Component      | What It Is                                                                          |
-| -------------- | ----------------------------------------------------------------------------------- |
-| **Agent Spec** | Agent configuration (model, tools, compaction, instructions) with pluggable runtime |
-| **Task**       | A coding challenge with evaluation criteria                                         |
-| **Evaluator**  | Scores agent output (LLM-as-Judge, heuristics, or trace-based)                      |
-| **Optimizer**  | Generates variants and runs experiments (GridSearch, extensible)                    |
-
-## Quick Start
-
-### 1. Install
-
-```bash
-git clone https://github.com/victordibia/flow
-cd flow
-uv sync
-```
-
-### 2. Configure
-
-Create a `.env` file in the project root:
-
-```bash
-AZURE_OPENAI_API_KEY=your-api-key-here
-AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
-AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=gpt-4o-mini
-```
-
-**Important:** Make sure your Azure OpenAI deployment has adequate rate limits:
-
-- **Minimum:** 10,000 tokens per minute (TPM)
-- **Recommended:** 30,000+ TPM for optimization runs
-
-See [Azure Portal](https://portal.azure.com) → Your OpenAI resource → Deployments to adjust rate limits.
-
-### 3. Test Your Setup
-
-Before running optimization, verify your Azure OpenAI connection:
-
-```bash
-# Test Azure OpenAI connection
-uv run python scripts/test_azure_connection.py
-
-# Test basic agent execution
-uv run python scripts/test_basic_agent.py
-
-# Test LLM evaluator
-uv run python scripts/test_evaluator.py
-```
-
-All tests should pass with non-zero scores and token counts.
-
-### 4. Run
-
-```bash
-# Launch the web UI
-uv run flow serve
-
-# Or run optimization from the CLI (base agent + variations + tasks)
-uv run flow optimize --agent base.yaml --vary compaction,memory --tasks tasks.jsonl
-```
-
-## Agent Spec
-
-Define your agent configuration:
-
-```python
-from flow.experiments.models import Agent, CompactionConfig
-
-agent = Agent(
-    name="my-agent",
-    framework="maf",  # default; extensible to other runtimes
-    instructions="You are a coding assistant",
-    tools="standard",  # or "minimal", "full", "readonly"
-    compaction=CompactionConfig.head_tail(10, 40),  # keep first 10 + last 40 messages
-)
-```
-
-Flow tests variations like:
-
-- **Compaction strategies** — `none`, `head_tail(N, M)`, `last_n(N)`
-- **Tool configurations** — different tool sets
-- **Instructions** — prompt variations
-
-## Task Format
-
-Tasks are JSONL with evaluation criteria:
-
-```json
-{
-  "name": "fizzbuzz",
-  "prompt": "Create fizzbuzz.py and run it",
-  "criteria": [
-    { "name": "correct", "instruction": "Output shows FizzBuzz pattern" }
-  ]
-}
-```
-
-## Web UI
-
-Launch with `uv run flow serve`. Create agents, import task suites, run optimization jobs, and view results with Pareto analysis. Test agents interactively with live trace streaming.
-
-## CLI Commands
-
-```bash
-# Web UI
-flow serve                                           # Start the web UI
-
-# Optimization
-flow optimize --agent base.yaml --tasks tasks.jsonl  # Optimize base agent
-flow optimize --vary compaction,memory               # Vary specific parameters
-flow optimize --suite coding                         # Use built-in task suite
-
-# Single Task Execution
-flow run "Create hello.py"                           # Run a single task
-flow run --config best.yaml "task"                   # Run with optimized config
-
-# Testing & Diagnostics
-python scripts/test_azure_connection.py              # Test Azure OpenAI connection
-python scripts/test_basic_agent.py                   # Test basic agent execution
-python scripts/test_evaluator.py                     # Test LLM evaluator
-```
-
-## Optimizer
-
-Flow includes multiple optimization strategies for finding the best agent configuration.
-
-### Grid Search (Default)
-
-Test predefined variations of your agent:
-
-```bash
-# Vary compaction and memory settings
-flow optimize --agent examples/base_agent.yaml --vary compaction,memory --tasks examples/coding_tasks.jsonl
-
-# Or define variations in a config file
-flow optimize --config variations.yaml --agent base_agent.yaml --tasks tasks.jsonl
-```
-
-### GEPA (Active Learning)
-
-Use GEPA (Generative Evolutionary Prompt Adjustment) for automatic prompt optimization:
-
-```bash
-# Run GEPA optimization
-flow optimize \
-  --config examples/gepa_strategy.yaml \
-  --agent examples/base_agent.yaml \
-  --tasks examples/coding_tasks.jsonl \
-  --budget 10 \
-  --parallel 2
-```
-
-**GEPA Configuration:**
-
-1. **Strategy Config** (`examples/gepa_strategy.yaml`):
-
-   ```yaml
-   strategy_type: gepa
-   config:
-     reflection_lm: gpt-4o-mini # Model for GEPA's reflection
-   ```
-
-2. **Base Agent** (`examples/base_agent.yaml`):
-
-   ```yaml
-   name: coding-assistant
-   model: gpt-4o-mini # Model for agent execution
-   tools: standard
-   instructions: |
-     Your initial prompt that GEPA will optimize...
-   ```
-
-3. **Run Optimization:**
-   - `--budget`: Number of optimization iterations (default: 10)
-   - `--parallel`: Concurrent evaluations (default: 4)
-   - Tasks must include evaluation criteria for LLM scoring
-
-**Example Output:**
-
-```
-[1/10] coding-assistant_gepa_eval/fibonacci: ✓ score=0.85 tokens=1,245
-[2/10] coding-assistant_gepa_eval/palindrome: ✓ score=0.78 tokens=982
-...
-Best agent exported to: ~/.flow/optimizations/<timestamp>/agents/best_score.yaml
-```
-
-### Requirements for Optimization
-
-- **Azure OpenAI Deployment:** Create a deployment with your chosen model (e.g., `gpt-4o-mini`)
-- **Rate Limits:** Minimum 10K TPM; 30K+ recommended for smooth runs
-- **Task Criteria:** Tasks need evaluation criteria for LLM-based scoring:
-
-  ```json
-  {
-    "name": "task_name",
-    "prompt": "Task description",
-    "criteria": [
-      { "name": "correctness", "instruction": "Solution is correct", "weight": 1.0 },
-      { "name": "quality", "instruction": "Code is clean and documented", "weight": 0.7 }
-    ]
-  }
-  ```
-
-##
-
-uv run ruff check src/ # Linting
-```
-
-##
+- **Framework agnostic** — Standardized agent spec with pluggable runtime adapters
+
+## Usage
+
+1. Create or import an agent configuration
+2. Define evaluation tasks with criteria
+3. Run optimization to generate and test variants
+4. View results on the Pareto chart (quality vs. cost)
+
+## Links
+
+- **GitHub**: [victordibia/flow](https://github.com/victordibia/flow)
+- **Documentation**: See GitHub README for full documentation
````
src/flow/experiments/models.py
CHANGED

````diff
@@ -293,7 +293,9 @@ class Agent:
         description: Human-readable description
         instructions: System prompt / instructions (optional, uses framework default if None)
         instructions_preset: Preset name for instructions ("coding", "benchmark", etc.)
+        llm_config: LLM configuration with provider and model info:
+            {"provider": "azure|openai|anthropic", "model": "gpt-4o"}
+            If None, auto-detects from environment variables.
         compaction: Compaction strategy configuration
         tools: Tool configuration - can be:
             - str: Preset name ("standard", "minimal", "full", "readonly")
@@ -306,7 +308,7 @@ class Agent:
     description: str = ""
     instructions: str | None = None
     instructions_preset: str | None = None  # e.g., "coding", "benchmark", "research"
-    model: str | None = None
+    llm_config: dict[str, Any] | None = None  # {"provider": "azure", "model": "gpt-4o"}
     compaction: CompactionConfig = field(default_factory=CompactionConfig)
     tools: str | list[str] | dict[str, dict[str, Any]] = "standard"
 
@@ -487,6 +489,11 @@ class GridSearchStrategy:
                     name_parts.append(f"tools=[{len(v)}]")
                 else:
                     name_parts.append(f"tools=[{len(v)}]")
+            elif k == "llm_config" and isinstance(v, dict):
+                # Format llm_config as provider/model
+                provider = v.get("provider", "unknown")
+                model = v.get("model", "")
+                name_parts.append(f"{provider}/{model}" if model else provider)
             elif isinstance(v, bool):
                 name_parts.append(f"{k}={'on' if v else 'off'}")
             else:
````
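For reference, here is a minimal sketch of the new field in use, together with a standalone mirror of the variant-naming branch added above. The `Agent` import path follows the repository README; the `format_llm_config` helper is illustrative and not part of this commit.

```python
# Minimal sketch (not from this commit): an Agent carrying the new
# llm_config dict, plus a standalone mirror of the added naming branch.
from flow.experiments.models import Agent  # import path as shown in the README

agent = Agent(
    name="my-agent",
    instructions="You are a coding assistant",
    llm_config={"provider": "azure", "model": "gpt-4o"},  # replaces the old `model` field
)

def format_llm_config(v: dict) -> str:
    # Same logic as the new GridSearchStrategy branch: "provider/model".
    provider = v.get("provider", "unknown")
    model = v.get("model", "")
    return f"{provider}/{model}" if model else provider

print(format_llm_config(agent.llm_config))  # -> azure/gpt-4o
```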
src/flow/harness/langgraph/harness.py
CHANGED

````diff
@@ -77,8 +77,8 @@ class LangGraphHarness(BaseHarness):
         memory_path.mkdir(parents=True, exist_ok=True)
         tools = build_langgraph_tools(tools_spec, workspace, memory_path)
 
-        # Create model
-        model = cls._create_model(agent.model)
+        # Create model from llm_config
+        model = cls._create_model(agent.llm_config)
 
         # Create compaction hook if enabled
         pre_model_hook = None
@@ -100,22 +100,46 @@ class LangGraphHarness(BaseHarness):
         return cls(graph=graph, agent_name=agent.name, workspace=workspace)
 
     @staticmethod
-    def _create_model(
-        """Create a LangChain chat model from
+    def _create_model(llm_config: dict[str, Any] | None):
+        """Create a LangChain chat model from llm_config.
 
         Args:
+            llm_config: LLM config dict with provider and model keys
 
         Returns:
            A LangChain chat model instance
        """
        import os
 
+        if llm_config:
+            provider = llm_config.get("provider", "").lower()
+            model = llm_config.get("model", "gpt-4o")
+
+            if provider == "openai":
+                from langchain_openai import ChatOpenAI
+
+                return ChatOpenAI(
+                    model=model,
+                    api_key=os.environ.get("OPENAI_API_KEY"),
+                )
+
+            elif provider in ("azure", "azure_openai"):
+                from langchain_openai import AzureChatOpenAI
+
+                return AzureChatOpenAI(
+                    deployment_name=model,
+                    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
+                    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
+                    api_version=os.environ.get("AZURE_OPENAI_API_VERSION", "2024-02-15-preview"),
+                )
+
+            elif provider == "anthropic":
+                from langchain_anthropic import ChatAnthropic
+
+                return ChatAnthropic(
+                    model=model,
+                    api_key=os.environ.get("ANTHROPIC_API_KEY"),
+                )
 
         # Default: Azure OpenAI from environment
         from langchain_openai import AzureChatOpenAI
````
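A quick sketch of the provider dispatch this adds. It assumes the module is importable as `flow.harness.langgraph.harness`, that the corresponding `langchain_*` packages are installed, and that credentials for each provider are present in the environment; it is illustrative, not part of the commit.

```python
# Illustrative sketch: exercising _create_model's provider dispatch.
# Each call requires the matching env vars and packages to be present.
from flow.harness.langgraph.harness import LangGraphHarness  # assumed module path

for cfg in (
    {"provider": "openai", "model": "gpt-4o"},                         # -> ChatOpenAI
    {"provider": "azure", "model": "my-gpt4o-deployment"},             # -> AzureChatOpenAI ("model" is the deployment name)
    {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"},  # -> ChatAnthropic
    None,                                                              # -> env-based AzureChatOpenAI default
):
    chat_model = LangGraphHarness._create_model(cfg)
    print(type(chat_model).__name__)
```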
src/flow/harness/miniagent/harness.py
CHANGED

````diff
@@ -87,23 +87,24 @@ class MiniAgentHarness(BaseHarness):
         tools_spec = resolve_tools(agent.tools)
         tools = cls._build_tools(tools_spec, workspace)
 
-        # 3. Create OTEL hooks for trace collection
-        from .otel import create_otel_hooks
-        otel_hooks = create_otel_hooks(model=agent.model or "gpt-4o")
-
-        # 4. Create ChatClient from LLM config or env
+        # 3. Create ChatClient from LLM config or env
         from .client import ClientConfig
         if llm_config is not None:
-            # Use provided LLM config
+            # Use provided LLM config (from Flow's config system)
             config = cls._create_client_config_from_llm_config(llm_config)
+        elif agent.llm_config:
+            # Use agent's llm_config dict
+            config = cls._create_client_config_from_dict(agent.llm_config)
         else:
-            # Fall back to env vars
+            # Fall back to env vars auto-detection
             config = ClientConfig.from_env()
-            if agent.model:
-                config.model = agent.model
 
         chat_client = ChatClient(config)
 
+        # 4. Create OTEL hooks for trace collection
+        from .otel import create_otel_hooks
+        otel_hooks = create_otel_hooks(model=config.model)
+
         # Resolve instructions: explicit > preset > default "coding"
         if agent.instructions:
             instructions = agent.instructions
@@ -173,6 +174,104 @@
                 f"Supported: openai, azure_openai, custom"
             )
 
+    @classmethod
+    def _create_client_config_from_dict(
+        cls, llm_config: dict[str, Any]
+    ) -> "ClientConfig":
+        """Create ClientConfig from agent's llm_config dict.
+
+        Supports a simple format for YAML configuration:
+            llm_config:
+                provider: azure   # or openai, anthropic
+                model: gpt-4o     # model/deployment name
+
+        Reads credentials from environment variables based on provider.
+
+        Args:
+            llm_config: Dict with 'provider' and 'model' keys
+
+        Returns:
+            ClientConfig for the specified provider
+
+        Raises:
+            ValueError: If required fields or env vars are missing
+        """
+        import os
+        from .client import ClientConfig
+
+        provider = llm_config.get("provider", "").lower()
+        model = llm_config.get("model")
+
+        if not provider:
+            raise ValueError("llm_config requires 'provider' field")
+
+        if provider in ("azure", "azure_openai"):
+            # Azure OpenAI - requires endpoint and deployment
+            endpoint = llm_config.get("endpoint") or os.environ.get("AZURE_OPENAI_ENDPOINT")
+            api_key = llm_config.get("api_key") or os.environ.get("AZURE_OPENAI_API_KEY")
+            deployment = model or os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o")
+            api_version = llm_config.get("api_version") or os.environ.get(
+                "AZURE_OPENAI_API_VERSION", "2024-02-15-preview"
+            )
+
+            if not endpoint:
+                raise ValueError(
+                    "AZURE_OPENAI_ENDPOINT env var required for azure provider"
+                )
+            if not api_key:
+                raise ValueError(
+                    "AZURE_OPENAI_API_KEY env var required for azure provider"
+                )
+
+            return ClientConfig(
+                api_key=api_key,
+                model=deployment,
+                endpoint=endpoint,
+                api_version=api_version,
+            )
+
+        elif provider == "openai":
+            # Standard OpenAI
+            api_key = llm_config.get("api_key") or os.environ.get("OPENAI_API_KEY")
+            model_name = model or os.environ.get("OPENAI_MODEL", "gpt-4o")
+            base_url = llm_config.get("base_url")
+
+            if not api_key:
+                raise ValueError(
+                    "OPENAI_API_KEY env var required for openai provider"
+                )
+
+            return ClientConfig(
+                api_key=api_key,
+                model=model_name,
+                endpoint=base_url,
+            )
+
+        elif provider == "anthropic":
+            # Anthropic Claude - use OpenAI-compatible endpoint
+            api_key = llm_config.get("api_key") or os.environ.get("ANTHROPIC_API_KEY")
+            model_name = model or "claude-3-5-sonnet-20241022"
+            base_url = llm_config.get("base_url") or os.environ.get(
+                "ANTHROPIC_BASE_URL", "https://api.anthropic.com/v1"
+            )
+
+            if not api_key:
+                raise ValueError(
+                    "ANTHROPIC_API_KEY env var required for anthropic provider"
+                )
+
+            return ClientConfig(
+                api_key=api_key,
+                model=model_name,
+                endpoint=base_url,
+            )
+
+        else:
+            raise ValueError(
+                f"Unknown provider: {provider}. "
+                f"Supported: azure, openai, anthropic"
+            )
+
     @classmethod
     def _create_context_strategy(cls, agent: "Agent") -> ContextStrategy:
         """Map Flow's CompactionConfig to MiniAgent's ContextStrategy."""
````
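The net effect of `_create_client_config_from_dict` is that YAML configs name only a provider and model while credentials stay in environment variables. A sketch of that contract, with placeholder values and an assumed module path:

```python
# Sketch only: the env-var contract of _create_client_config_from_dict.
# Placeholder values; the module path is assumed from the file layout.
import os
from flow.harness.miniagent.harness import MiniAgentHarness

os.environ.setdefault("AZURE_OPENAI_ENDPOINT", "https://your-resource.openai.azure.com/")
os.environ.setdefault("AZURE_OPENAI_API_KEY", "your-api-key-here")

config = MiniAgentHarness._create_client_config_from_dict(
    {"provider": "azure", "model": "gpt-4o"}  # "model" becomes the Azure deployment name
)
print(config.model, config.endpoint)

# Missing env vars raise early instead of failing at request time:
os.environ.pop("ANTHROPIC_API_KEY", None)
try:
    MiniAgentHarness._create_client_config_from_dict({"provider": "anthropic"})
except ValueError as e:
    print(e)  # ANTHROPIC_API_KEY env var required for anthropic provider
```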
src/flow/ui/schemas/config.py
CHANGED

````diff
@@ -15,6 +15,13 @@ class CompactionConfigSchema(BaseModel):
     params: dict[str, Any] = {"head_size": 10, "tail_size": 40}
 
 
+class LLMConfigSchema(BaseModel):
+    """LLM configuration with provider and model."""
+
+    provider: str = "azure"  # azure, openai, anthropic
+    model: str = "gpt-4o"
+
+
 class AgentCreate(BaseModel):
     """Request schema for creating an agent.
 
@@ -33,7 +40,7 @@ class AgentCreate(BaseModel):
     description: str = ""
     framework: str = "maf"
     instructions: str | None = None
-    model: str | None = None
+    llm_config: LLMConfigSchema | None = None
     compaction: CompactionConfigSchema = CompactionConfigSchema()
     tools: str | list[str] | dict[str, dict[str, Any]] = "standard"
 
@@ -42,7 +49,7 @@ class AgentCreate(BaseModel):
         return {
             "framework": self.framework,
             "instructions": self.instructions,
-            "model": self.model,
+            "llm_config": self.llm_config.model_dump() if self.llm_config else None,
             "compaction": self.compaction.model_dump(),
             "tools": self.tools,
         }
@@ -55,7 +62,7 @@ class AgentUpdate(BaseModel):
     description: str | None = None
     framework: str | None = None
     instructions: str | None = None
-    model: str | None = None
+    llm_config: LLMConfigSchema | None = None
     compaction: CompactionConfigSchema | None = None
     tools: str | list[str] | dict[str, dict[str, Any]] | None = None
     is_public: bool | None = None
````