# arf-api

ARF API Control Plane (FastAPI)

## Live Demo

The API is deployed and accessible at:

- **Base URL**: [https://a-r-f-agentic-reliability-framework-api.hf.space](https://a-r-f-agentic-reliability-framework-api.hf.space)
- **Interactive Documentation**: [https://a-r-f-agentic-reliability-framework-api.hf.space/docs](https://a-r-f-agentic-reliability-framework-api.hf.space/docs)

## Quick Start (Local Development)

1. **Install dependencies**:

   ```bash
   pip install -r requirements.txt
   ```

   Note: `requirements.txt` installs `agentic-reliability-framework` directly from the project's Git repository.

2. **Set environment variables** (optional, in `.env`):

   ```text
   ARF_HMC_MODEL – path to HMC model JSON (default: models/hmc_model.json)
   ARF_USE_HYPERPRIORS – true/false
   API_KEY – optional (currently not enforced)
   ```

3. **Run the app locally**:

   ```bash
   uvicorn app.main:app --reload --port 8000
   ```

4. **Health check**:

   ```bash
   curl http://localhost:8000/health
   ```

## Causal Explainer Endpoint

The ARF API includes a heuristic causal explainer that evaluates the impact of proposed healing actions using deterministic rules. It provides counterfactual reasoning without requiring a fitted causal model or external ML dependencies.

The explainer estimates how system metrics such as latency would change if a different action were taken.
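To make the idea concrete, here is a minimal Python sketch of a multiplicative, rule-based counterfactual of the kind formalized in the Mathematical Model section. It is an illustration, not the code the API runs: the `EFFECT_FRACTIONS` table, the `Counterfactual` class, and the `estimate_counterfactual` function are hypothetical names. The `-0.15` factor for `restart_container` is inferred from the example response (600 to 510), and the symmetric shape of the ±10% band is an assumption.

```python
from dataclasses import dataclass

# Hypothetical effect table. Only the restart_container value is implied by
# the README's example response (600 -> 510); no_action = 0.0 is an assumption.
EFFECT_FRACTIONS = {
    "no_action": 0.0,
    "restart_container": -0.15,
}

# Fixed +/-10% uncertainty band around the estimate (symmetric form assumed).
UNCERTAINTY = 0.10


@dataclass
class Counterfactual:
    factual_outcome: float
    counterfactual_outcome: float
    effect: float
    lower: float
    upper: float


def estimate_counterfactual(factual_outcome: float, action: str) -> Counterfactual:
    """Apply a predefined multiplicative effect to the observed metric."""
    effect_frac = EFFECT_FRACTIONS[action]
    estimate = factual_outcome * (1 + effect_frac)
    return Counterfactual(
        factual_outcome=factual_outcome,
        counterfactual_outcome=estimate,
        effect=estimate - factual_outcome,
        lower=estimate * (1 - UNCERTAINTY),
        upper=estimate * (1 + UNCERTAINTY),
    )


result = estimate_counterfactual(600.0, "restart_container")
print(f"{result.counterfactual_outcome:.2f} (change = {result.effect:.2f})")
# prints "510.00 (change = -90.00)"
```

Because the rule is a fixed lookup followed by a multiplication, the same input always yields the same estimate, which is what makes this approach deterministic and free of any fitted model.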
### Mathematical Model

The counterfactual outcome is computed as:

```text
counterfactual_outcome = factual_outcome * (1 + effect_frac)
```

Where:

- `effect_frac` is a predefined impact factor based on the action type
- effects are multiplicative
- a fixed ±10% uncertainty interval is applied to the estimated outcome

### Example Request

```bash
curl -X POST "http://localhost:8000/api/v1/v1/incidents/evaluate" \
  -H "Content-Type: application/json" \
  -d '{
    "component": "checkout-service",
    "latency_p99": 600,
    "error_rate": 0.2,
    "service_mesh": "default"
  }'
```

### Example Response

```json
{
  "healing_intent": {
    "action": "restart_container",
    "component": "checkout-service",
    "parameters": {},
    "justification": "Causal: If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
    "confidence": 0.85,
    "risk_score": 0.54,
    "status": "oss_advisory_only"
  },
  "causal_explanation": {
    "factual_outcome": 600,
    "counterfactual_outcome": 510,
    "effect": -90,
    "explanation_text": "If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
    "is_model_based": false,
    "warnings": [
      "Using heuristic causal model (no fitted SCM)."
    ]
  },
  "utility_decision": {
    "best_action": "restart_container",
    "expected_utility": 0.5,
    "explanation": "Heuristic decision based on latency/error thresholds"
  }
}
```

### Important Notes

- This endpoint is advisory only (`status = oss_advisory_only`)
- No Structural Causal Model (SCM) is fitted
- No machine learning models are used
- All effects are based on predefined heuristics

Tests
-----
Run `pytest`. Tests use a temporary SQLite DB (`sqlite:///./test.db`) created by the test fixtures.

Notes
-----
- The governance endpoints use an in-process `RiskEngine` initialized at startup.
- The outcome recording endpoint is not implemented in this repository and returns HTTP 501.