---
title: ARF API Control Plane
sdk: docker
colorFrom: blue
colorTo: green
---
# arf-api
ARF API Control Plane (FastAPI)
## Live Demo
The API is deployed and accessible at:
- **Base URL**: [https://a-r-f-agentic-reliability-framework-api.hf.space](https://a-r-f-agentic-reliability-framework-api.hf.space)
- **Interactive Documentation**: [https://a-r-f-agentic-reliability-framework-api.hf.space/docs](https://a-r-f-agentic-reliability-framework-api.hf.space/docs)
## Quick Start (Local Development)
1. **Install dependencies**:
```bash
pip install -r requirements.txt
```
Note: `requirements.txt` installs `agentic-reliability-framework` directly from the project's Git repository.
2. **Set environment variables** (optional, in `.env`):
```text
ARF_HMC_MODEL – path to HMC model JSON (default: models/hmc_model.json)
ARF_USE_HYPERPRIORS – true/false
API_KEY – optional (currently not enforced)
```
3. **Run the app locally**:
```bash
uvicorn app.main:app --reload --port 8000
```
4. **Health check**:
```bash
curl http://localhost:8000/health
```
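The optional environment variables from step 2 could be read as in the sketch below. It is an illustration only, not the application's actual settings code: the `load_settings` helper and the dictionary keys are hypothetical, and the `false` default for `ARF_USE_HYPERPRIORS` is an assumption, since only the `ARF_HMC_MODEL` default is documented.

```python
import os


def load_settings(env=None):
    """Read the optional ARF settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    # Assumption: the flag is treated as off unless it is explicitly "true".
    raw_flag = env.get("ARF_USE_HYPERPRIORS", "false")
    return {
        # Documented default path for the HMC model JSON.
        "hmc_model_path": env.get("ARF_HMC_MODEL", "models/hmc_model.json"),
        "use_hyperpriors": raw_flag.strip().lower() == "true",
        # Optional; the README states it is currently not enforced.
        "api_key": env.get("API_KEY"),
    }


settings = load_settings({"ARF_USE_HYPERPRIORS": "true"})
print(settings["hmc_model_path"], settings["use_hyperpriors"])
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests without touching the real environment.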
## Causal Explainer Endpoint
The ARF API includes a heuristic causal explainer that evaluates the impact of proposed healing actions using deterministic rules. This module provides counterfactual reasoning without requiring a fitted causal model or external ML dependencies.
The explainer estimates how system metrics such as latency would change if a different action were taken.
### Mathematical Model
The counterfactual outcome is computed as:
```text
counterfactual_outcome = factual_outcome * (1 + effect_frac)
```
Where:
- `effect_frac` is a predefined impact factor based on the action type
- effects are multiplicative
- a fixed ±10% uncertainty interval is applied to the estimated outcome
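
The formula can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual code: the `EFFECT_FRAC` table and the `counterfactual` function are hypothetical names, the `-0.15` value for `restart_container` is inferred from the example response in this README (600 to 510), and applying the ±10% interval as `estimate * (1 ± 0.10)` is one plausible reading of the rule above.

```python
# Hypothetical impact table; only restart_container is inferred from the README example.
EFFECT_FRAC = {
    "no_action": 0.0,
    "restart_container": -0.15,
}


def counterfactual(factual_outcome, action, interval=0.10):
    """Apply the multiplicative heuristic and a fixed relative interval."""
    effect_frac = EFFECT_FRAC[action]
    estimate = factual_outcome * (1 + effect_frac)
    return {
        "factual_outcome": factual_outcome,
        "counterfactual_outcome": estimate,
        "effect": estimate - factual_outcome,
        # Assumption: the interval is relative to the estimated outcome.
        "lower": estimate * (1 - interval),
        "upper": estimate * (1 + interval),
    }


result = counterfactual(600, "restart_container")
print(round(result["counterfactual_outcome"], 2), round(result["effect"], 2))
```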
### Example Request
```bash
curl -X POST "http://localhost:8000/api/v1/v1/incidents/evaluate" -H "Content-Type: application/json" -d '{
"component": "checkout-service",
"latency_p99": 600,
"error_rate": 0.2,
"service_mesh": "default"
}'
```
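The same request could be issued from Python using only the standard library. The sketch below is an assumption-laden equivalent of the curl call: `build_evaluate_request` is a hypothetical helper, and the endpoint path is copied verbatim from the curl example. The request is only constructed here; sending it requires a running server.

```python
import json
import urllib.request


def build_evaluate_request(base_url, payload):
    """Build a POST request for the evaluate endpoint without sending it."""
    # Path copied verbatim from the curl example in this README.
    url = base_url.rstrip("/") + "/api/v1/v1/incidents/evaluate"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "component": "checkout-service",
    "latency_p99": 600,
    "error_rate": 0.2,
    "service_mesh": "default",
}
request = build_evaluate_request("http://localhost:8000", payload)

# To send it against a running local server:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```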
### Example Response
```json
{
"healing_intent": {
"action": "restart_container",
"component": "checkout-service",
"parameters": {},
"justification": "Causal: If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
"confidence": 0.85,
"risk_score": 0.54,
"status": "oss_advisory_only"
},
"causal_explanation": {
"factual_outcome": 600,
"counterfactual_outcome": 510,
"effect": -90,
"explanation_text": "If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
"is_model_based": false,
"warnings": [
"Using heuristic causal model (no fitted SCM)."
]
},
"utility_decision": {
"best_action": "restart_container",
"expected_utility": 0.5,
"explanation": "Heuristic decision based on latency/error thresholds"
}
}
```
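A client might sanity-check a response like the one above before acting on it. The sketch below is illustrative only: `effect_is_consistent` and `is_advisory_only` are hypothetical helpers, and the sample is a trimmed copy of the example response rather than live output.

```python
import json

# Trimmed copy of the example response from this README.
SAMPLE = json.loads("""
{
  "healing_intent": {"action": "restart_container", "status": "oss_advisory_only"},
  "causal_explanation": {
    "factual_outcome": 600,
    "counterfactual_outcome": 510,
    "effect": -90,
    "is_model_based": false
  }
}
""")


def effect_is_consistent(response, tolerance=1e-6):
    """Check that effect equals counterfactual_outcome minus factual_outcome."""
    causal = response["causal_explanation"]
    expected = causal["counterfactual_outcome"] - causal["factual_outcome"]
    return abs(expected - causal["effect"]) <= tolerance


def is_advisory_only(response):
    """Return True when the healing intent is marked as advisory."""
    return response["healing_intent"]["status"] == "oss_advisory_only"


print(effect_is_consistent(SAMPLE), is_advisory_only(SAMPLE))
```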
### Important Notes
- This endpoint is advisory only (`status = oss_advisory_only`)
- No Structural Causal Model (SCM) is fitted
- No machine learning models are used
- All effects are based on predefined heuristics
## Tests
Run `pytest`. Tests use a temporary SQLite DB (`sqlite:///./test.db`) created by the test fixtures.
## Notes
- The governance endpoints use an in-process `RiskEngine` initialized at startup.
- The outcome recording endpoint is not implemented in this repository and returns HTTP 501.