metadata
title: ARF API Control Plane
sdk: docker
colorFrom: blue
colorTo: green
arf-api
ARF API Control Plane (FastAPI)
Live Demo
The API is deployed and accessible at:
- Base URL: https://a-r-f-agentic-reliability-framework-api.hf.space
- Interactive Documentation: https://a-r-f-agentic-reliability-framework-api.hf.space/docs
Quick Start (Local Development)
- Install dependencies:
pip install -r requirements.txt
Note: requirements.txt installs agentic-reliability-framework directly from the project's Git repository.
- Set environment variables (optional, in .env):
ARF_HMC_MODEL – path to HMC model JSON (default: models/hmc_model.json)
ARF_USE_HYPERPRIORS – true/false
API_KEY – optional (currently not enforced)
- Run the app locally:
uvicorn app.main:app --reload --port 8000
- Health check:
GET http://localhost:8000/health
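The optional settings from the environment-variable step could be read at startup roughly as follows. This is a minimal sketch, not the application's actual configuration code: `Settings`, `parse_bool`, and `load_settings` are hypothetical names, and treating an unset ARF_USE_HYPERPRIORS as "not specified" is an assumption, since no default is documented for it.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented variables."""
    hmc_model_path: str
    use_hyperpriors: Optional[bool]  # None when the variable is unset
    api_key: Optional[str]           # read here, but not enforced by the API


def parse_bool(raw: Optional[str]) -> Optional[bool]:
    """Map 'true'/'false' (any case) to a bool; return None if unset."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"expected 'true' or 'false', got {raw!r}")
    return value == "true"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented default path."""
    return Settings(
        hmc_model_path=env.get("ARF_HMC_MODEL", "models/hmc_model.json"),
        use_hyperpriors=parse_bool(env.get("ARF_USE_HYPERPRIORS")),
        api_key=env.get("API_KEY"),
    )
```

Note that `os.environ` does not read a .env file by itself; the values must already be present in the process environment when `load_settings()` is called.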
Causal Explainer Endpoint
The ARF API includes a heuristic causal explainer that evaluates the impact of proposed healing actions using deterministic rules. This module provides counterfactual reasoning without requiring a fitted causal model or external ML dependencies.
The explainer estimates how system metrics such as latency would change if a different action were taken.
Mathematical Model
The counterfactual outcome is computed as:
counterfactual_outcome = factual_outcome * (1 + effect_frac)
Where:
- effect_frac is a predefined impact factor based on the action type
- effects are multiplicative
- a fixed ±10% uncertainty interval is applied to the estimated outcome
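The formula above can be sketched in a few lines of Python. The function and dictionary names are hypothetical. The value -0.15 for restart_container is inferred from the example response in this README (600 to 510), not taken from the ARF source, and the interval is assumed to be ±10% of the counterfactual estimate.

```python
from typing import Dict, Tuple

# Assumed effect table. Only restart_container is backed by this README's
# example (600 -> 510, i.e. -15%); no_action is taken to mean "no change".
EFFECT_FRAC: Dict[str, float] = {
    "no_action": 0.0,
    "restart_container": -0.15,
}

UNCERTAINTY = 0.10  # fixed +/-10% band, assumed relative to the estimate


def counterfactual(factual_outcome: float, action: str) -> Tuple[float, float, float]:
    """Return (estimate, lower bound, upper bound) for the given action."""
    effect_frac = EFFECT_FRAC[action]
    estimate = factual_outcome * (1 + effect_frac)
    return estimate, estimate * (1 - UNCERTAINTY), estimate * (1 + UNCERTAINTY)


estimate, low, high = counterfactual(600, "restart_container")
print(f"{estimate:.2f} [{low:.2f}, {high:.2f}]")  # 510.00 [459.00, 561.00]
```

Because the effect is multiplicative, the same action yields a larger absolute change when the factual latency is higher.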
Example Request
curl -X POST "http://localhost:8000/api/v1/v1/incidents/evaluate" -H "Content-Type: application/json" -d '{
"component": "checkout-service",
"latency_p99": 600,
"error_rate": 0.2,
"service_mesh": "default"
}'
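The same request can be issued from Python using only the standard library. This sketch mirrors the curl command above, including its URL path exactly as written. `build_request` and `evaluate_incident` are hypothetical helper names, and the call only succeeds when the app is running locally on port 8000.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"
EVALUATE_PATH = "/api/v1/v1/incidents/evaluate"  # copied from the curl example


def build_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        base_url + EVALUATE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def evaluate_incident(payload: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(build_request(payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = {
    "component": "checkout-service",
    "latency_p99": 600,
    "error_rate": 0.2,
    "service_mesh": "default",
}
request = build_request(payload)
# result = evaluate_incident(payload)  # uncomment with the server running
```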
Example Response
{
"healing_intent": {
"action": "restart_container",
"component": "checkout-service",
"parameters": {},
"justification": "Causal: If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
"confidence": 0.85,
"risk_score": 0.54,
"status": "oss_advisory_only"
},
"causal_explanation": {
"factual_outcome": 600,
"counterfactual_outcome": 510,
"effect": -90,
"explanation_text": "If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
"is_model_based": false,
"warnings": [
"Using heuristic causal model (no fitted SCM)."
]
},
"utility_decision": {
"best_action": "restart_container",
"expected_utility": 0.5,
"explanation": "Heuristic decision based on latency/error thresholds"
}
}
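A client may want to sanity-check a response before acting on it. The sketch below uses a trimmed copy of the example response and checks that the reported effect equals the difference between the two outcomes and that the intent is marked advisory. `check_response` is a hypothetical helper; the field names are those shown in the example above.

```python
import json

# Trimmed copy of the example response; only the fields checked are kept.
RESPONSE_TEXT = """
{
  "healing_intent": {"action": "restart_container", "status": "oss_advisory_only"},
  "causal_explanation": {
    "factual_outcome": 600,
    "counterfactual_outcome": 510,
    "effect": -90,
    "is_model_based": false
  }
}
"""


def check_response(body: dict, tolerance: float = 1e-6) -> float:
    """Validate the causal fields and return the implied effect fraction."""
    causal = body["causal_explanation"]
    factual = causal["factual_outcome"]
    counterfactual = causal["counterfactual_outcome"]
    if abs((counterfactual - factual) - causal["effect"]) > tolerance:
        raise ValueError("effect does not match counterfactual - factual")
    if body["healing_intent"]["status"] != "oss_advisory_only":
        raise ValueError("unexpected status; expected an advisory-only intent")
    return counterfactual / factual - 1


effect_frac = check_response(json.loads(RESPONSE_TEXT))
print(f"{effect_frac:+.2f}")  # -0.15
```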
Important Notes
- This endpoint is advisory only (status = oss_advisory_only)
- No Structural Causal Model (SCM) is fitted
- No machine learning models are used
- All effects are based on predefined heuristics
Tests
Run pytest. Tests use a temporary SQLite DB (sqlite:///./test.db) created by the test fixtures.
Notes
- The governance endpoints use an in-process RiskEngine initialized at startup.
- The outcome recording endpoint is not implemented in this repository and returns HTTP 501.