title: ARF API Control Plane
sdk: docker
colorFrom: blue
colorTo: green

arf-api

ARF API Control Plane (FastAPI)

Live Demo

The API is deployed and accessible at:

Quick Start (Local Development)

  1. Install dependencies:
pip install -r requirements.txt

Note: requirements.txt installs agentic-reliability-framework directly from the project's Git repository.

  2. Set environment variables (optional, in .env):
ARF_HMC_MODEL – path to HMC model JSON (default: models/hmc_model.json)

ARF_USE_HYPERPRIORS – true/false

API_KEY – optional (currently not enforced)
  3. Run the app locally:
uvicorn app.main:app --reload --port 8000
  4. Health check:
GET http://localhost:8000/health
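
The environment variables listed in the setup steps could be read along the following lines. This is only a sketch: the Settings class and load_settings function are hypothetical names, not part of the ARF codebase, and the default of "false" for ARF_USE_HYPERPRIORS is an assumption, since no default is documented here. It reads the process environment only; loading a .env file would need a separate loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented variables."""
    hmc_model_path: str
    use_hyperpriors: bool
    api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables from an environment mapping."""
    # Default path is the one documented in the setup steps.
    hmc_model_path = env.get("ARF_HMC_MODEL", "models/hmc_model.json")
    # Assumption: an unset flag means "false"; the README gives no default.
    raw_flag = env.get("ARF_USE_HYPERPRIORS", "false")
    use_hyperpriors = raw_flag.strip().lower() == "true"
    # API_KEY is optional and, per the README, currently not enforced.
    api_key = env.get("API_KEY") or None
    return Settings(hmc_model_path, use_hyperpriors, api_key)
```

Passing the mapping explicitly, for example load_settings({"ARF_USE_HYPERPRIORS": "true"}), keeps the function easy to exercise without touching the real environment.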

Causal Explainer Endpoint

The ARF API includes a heuristic causal explainer that evaluates the impact of proposed healing actions using deterministic rules. This module provides counterfactual reasoning without requiring a fitted causal model or external ML dependencies.

The explainer estimates how system metrics such as latency would change if a different action were taken.

Mathematical Model

The counterfactual outcome is computed as:

counterfactual_outcome = factual_outcome * (1 + effect_frac)

Where:

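  • effect_frac is a predefined impact factor based on the action type; a negative value lowers the metric
  • effects are multiplicative: effect_frac scales the factual outcome rather than adding a fixed offset
  • a fixed ±10% uncertainty interval is applied to the estimated outcome

In the example response below, restart_container moves latency from 600 to 510, which corresponds to effect_frac = -0.15.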
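
The formula above can be sketched in a few lines of Python. The names EFFECT_FRAC, Counterfactual and estimate are hypothetical. The -0.15 value for restart_container is inferred from the example response in this README (600 to 510); the real table of impact factors is not shown here. The sketch also assumes the ±10% band is taken around the estimated outcome itself.

```python
from dataclasses import dataclass

# Hypothetical effect table. Only the restart_container value is inferred
# from the example response in this README (600 -> 510, i.e. -15%).
EFFECT_FRAC = {
    "no_action": 0.0,
    "restart_container": -0.15,
}

UNCERTAINTY = 0.10  # fixed ±10% band described above


@dataclass(frozen=True)
class Counterfactual:
    factual: float
    counterfactual: float
    effect: float
    lower: float
    upper: float


def estimate(factual_outcome: float, action: str) -> Counterfactual:
    """Apply counterfactual_outcome = factual_outcome * (1 + effect_frac)."""
    effect_frac = EFFECT_FRAC[action]
    outcome = factual_outcome * (1 + effect_frac)
    # Assumption: the band is ±10% of the estimated outcome.
    margin = abs(outcome) * UNCERTAINTY
    return Counterfactual(
        factual=factual_outcome,
        counterfactual=outcome,
        effect=outcome - factual_outcome,
        lower=outcome - margin,
        upper=outcome + margin,
    )
```

For a factual latency of 600 and restart_container, this gives a counterfactual of about 510 with a band of roughly 459 to 561 under the stated assumption.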

Example Request

curl -X POST "http://localhost:8000/api/v1/v1/incidents/evaluate" \
  -H "Content-Type: application/json" \
  -d '{
    "component": "checkout-service",
    "latency_p99": 600,
    "error_rate": 0.2,
    "service_mesh": "default"
  }'
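
The same request can be sent from Python using only the standard library. This is a minimal sketch: build_evaluate_request and evaluate_incident are hypothetical helper names, and the path is copied verbatim from the curl example above.

```python
import json
import urllib.request

# Path copied verbatim from the curl example above.
EVALUATE_PATH = "/api/v1/v1/incidents/evaluate"


def build_evaluate_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for an incident evaluation."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + EVALUATE_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def evaluate_incident(base_url: str, payload: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_evaluate_request(base_url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the app running locally, evaluate_incident("http://localhost:8000", {"component": "checkout-service", "latency_p99": 600, "error_rate": 0.2, "service_mesh": "default"}) would send the same payload as the curl call.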

Example Response

{
  "healing_intent": {
    "action": "restart_container",
    "component": "checkout-service",
    "parameters": {},
    "justification": "Causal: If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
    "confidence": 0.85,
    "risk_score": 0.54,
    "status": "oss_advisory_only"
  },
  "causal_explanation": {
    "factual_outcome": 600,
    "counterfactual_outcome": 510,
    "effect": -90,
    "explanation_text": "If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
    "is_model_based": false,
    "warnings": [
      "Using heuristic causal model (no fitted SCM)."
    ]
  },
  "utility_decision": {
    "best_action": "restart_container",
    "expected_utility": 0.5,
    "explanation": "Heuristic decision based on latency/error thresholds"
  }
}
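
A client might want to sanity-check a response like the one above before acting on it. The sketch below uses a hypothetical helper, check_causal_explanation, and assumes that effect always equals counterfactual_outcome minus factual_outcome, as it does in this example (510 - 600 = -90). It reads only fields that appear in the example response.

```python
import math


def check_causal_explanation(response: dict, tolerance: float = 1e-6) -> str:
    """Verify effect = counterfactual - factual and return a one-line summary."""
    causal = response["causal_explanation"]
    factual = causal["factual_outcome"]
    counterfactual = causal["counterfactual_outcome"]
    effect = causal["effect"]
    # Assumption: the effect field is the plain difference of the two outcomes.
    if not math.isclose(counterfactual - factual, effect, abs_tol=tolerance):
        raise ValueError("effect does not equal counterfactual - factual")
    intent = response["healing_intent"]
    source = "fitted model" if causal["is_model_based"] else "heuristic"
    return (
        f"{intent['action']} on {intent['component']}: "
        f"{factual:g} -> {counterfactual:g} ({effect:+g}, {source}, {intent['status']})"
    )
```

Applied to the example response, the summary reads "restart_container on checkout-service: 600 -> 510 (-90, heuristic, oss_advisory_only)".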

Important Notes

  • This endpoint is advisory only (status = oss_advisory_only)
  • No Structural Causal Model (SCM) is fitted
  • No machine learning models are used
  • All effects are based on predefined heuristics

Tests

Run pytest. Tests use a temporary SQLite DB (sqlite:///./test.db) created by the test fixtures.

Notes

  • The governance endpoints use an in-process RiskEngine initialized at startup.
  • The outcome recording endpoint is not implemented in this repository and returns HTTP 501.