title: ARF API Control Plane
sdk: docker
colorFrom: blue
colorTo: green

arf-api

ARF API Control Plane (FastAPI)

Live Demo

The API is deployed and accessible at:

Base URL: https://a-r-f-agentic-reliability-framework-api.hf.space
Interactive Documentation: https://a-r-f-agentic-reliability-framework-api.hf.space/docs

Quick Start (Local Development)

Install dependencies:

pip install -r requirements.txt

Note: requirements.txt installs agentic-reliability-framework directly from the project's Git repository.

Set environment variables (optional, in .env):

ARF_HMC_MODEL – path to HMC model JSON (default: models/hmc_model.json)

ARF_USE_HYPERPRIORS – true/false

API_KEY – optional (currently not enforced)
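The variables above could be read with a small helper like the one below. This is a minimal sketch, not the application's actual configuration code: the function name `load_settings` and the returned keys are hypothetical, and treating an unset `ARF_USE_HYPERPRIORS` as false is an assumption, since no default is documented for it. It reads from the process environment, so values kept in .env would need to be loaded into the environment first.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the three documented variables into a plain dict (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # Default path as documented above.
        "hmc_model": env.get("ARF_HMC_MODEL", "models/hmc_model.json"),
        # Assumption: no default is documented, so an unset flag is treated as false.
        "use_hyperpriors": env.get("ARF_USE_HYPERPRIORS", "false").strip().lower() == "true",
        # Optional and currently not enforced, so None is acceptable.
        "api_key": env.get("API_KEY"),
    }


settings = load_settings({"ARF_USE_HYPERPRIORS": "true"})
print(settings["hmc_model"], settings["use_hyperpriors"])
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to exercise in tests without touching the real environment.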

Run the app locally:

uvicorn app.main:app --reload --port 8000

Health check:

GET http://localhost:8000/health

Causal Explainer Endpoint

The ARF API includes a heuristic causal explainer that evaluates the impact of proposed healing actions using deterministic rules. This module provides counterfactual reasoning without requiring a fitted causal model or external ML dependencies.

The explainer estimates how system metrics such as latency would change if a different action were taken.

Mathematical Model

The counterfactual outcome is computed as:

counterfactual_outcome = factual_outcome * (1 + effect_frac)

Where:

effect_frac is a predefined impact factor based on the action type
effects are multiplicative
a fixed ±10% uncertainty interval is applied to the estimated outcome
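The formula can be worked through in a few lines of Python. This is an illustrative sketch rather than the module's real code: the names `Counterfactual` and `estimate_counterfactual` are hypothetical, the interval is assumed to be 10% on either side of the counterfactual outcome, and the `effect_frac` of -0.15 is inferred from the 600 to 510 latency change in the example response, not taken from a documented table of action effects.

```python
from typing import NamedTuple


class Counterfactual(NamedTuple):
    outcome: float  # estimated metric value under the alternative action
    effect: float   # outcome minus the factual value
    lower: float    # lower bound of the uncertainty interval
    upper: float    # upper bound of the uncertainty interval


def estimate_counterfactual(
    factual_outcome: float,
    effect_frac: float,
    uncertainty: float = 0.10,
) -> Counterfactual:
    """Apply a multiplicative effect, then a fixed relative interval (assumed form)."""
    outcome = factual_outcome * (1 + effect_frac)
    return Counterfactual(
        outcome=outcome,
        effect=outcome - factual_outcome,
        lower=outcome * (1 - uncertainty),
        upper=outcome * (1 + uncertainty),
    )


# effect_frac = -0.15 is inferred from the example response (600 -> 510).
result = estimate_counterfactual(600, -0.15)
print(round(result.outcome, 2), round(result.effect, 2))
print(round(result.lower, 2), round(result.upper, 2))
```

With these inputs the outcome is 510 and the effect is -90, which matches the `counterfactual_outcome` and `effect` fields in the example response.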

Example Request

curl -X POST "http://localhost:8000/api/v1/v1/incidents/evaluate" \
  -H "Content-Type: application/json" \
  -d '{
    "component": "checkout-service",
    "latency_p99": 600,
    "error_rate": 0.2,
    "service_mesh": "default"
  }'
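The same request can be built from Python using only the standard library. This is a sketch under the assumption that a local server is running as described in Quick Start; the helper name `build_evaluate_request` is hypothetical, and the path is copied verbatim from the curl example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"
# Path copied verbatim from the curl example above.
EVALUATE_PATH = "/api/v1/v1/incidents/evaluate"


def build_evaluate_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for the evaluate endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + EVALUATE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "component": "checkout-service",
    "latency_p99": 600,
    "error_rate": 0.2,
    "service_mesh": "default",
}
request = build_evaluate_request(payload)
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request, timeout=10) as response:
#     body = json.load(response)
```

Separating request construction from sending makes the payload and headers easy to inspect before any network call is made.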

Example Response

{
  "healing_intent": {
    "action": "restart_container",
    "component": "checkout-service",
    "parameters": {},
    "justification": "Causal: If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
    "confidence": 0.85,
    "risk_score": 0.54,
    "status": "oss_advisory_only"
  },
  "causal_explanation": {
    "factual_outcome": 600,
    "counterfactual_outcome": 510,
    "effect": -90,
    "explanation_text": "If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
    "is_model_based": false,
    "warnings": [
      "Using heuristic causal model (no fitted SCM)."
    ]
  },
  "utility_decision": {
    "best_action": "restart_container",
    "expected_utility": 0.5,
    "explanation": "Heuristic decision based on latency/error thresholds"
  }
}
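A client might sanity-check a response like the one above before acting on it. The sketch below assumes only the field names shown in the example; the helper names `effect_is_consistent` and `is_advisory_only` are hypothetical, and the sample is a trimmed copy of the example response.

```python
import json


def effect_is_consistent(response: dict, tolerance: float = 1e-6) -> bool:
    """Check that effect equals counterfactual_outcome minus factual_outcome."""
    causal = response["causal_explanation"]
    expected = causal["counterfactual_outcome"] - causal["factual_outcome"]
    return abs(causal["effect"] - expected) <= tolerance


def is_advisory_only(response: dict) -> bool:
    """True when the healing intent carries the advisory-only status."""
    return response["healing_intent"]["status"] == "oss_advisory_only"


# Trimmed copy of the example response above.
sample = json.loads("""
{
  "healing_intent": {"action": "restart_container", "status": "oss_advisory_only"},
  "causal_explanation": {
    "factual_outcome": 600,
    "counterfactual_outcome": 510,
    "effect": -90,
    "is_model_based": false
  }
}
""")

print(effect_is_consistent(sample), is_advisory_only(sample))
```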

Important Notes

This endpoint is advisory only (status = oss_advisory_only)
No Structural Causal Model (SCM) is fitted
No machine learning models are used
All effects are based on predefined heuristics

Tests

Run pytest. Tests use a temporary SQLite DB (sqlite:///./test.db) created by the test fixtures.

Notes

The governance endpoints use an in-process RiskEngine initialized at startup.
The outcome recording endpoint is not implemented in this repository and returns HTTP 501.