# arf-api

Control-plane API for ARF (Agentic Reliability Framework), built with FastAPI.

## Live Demo

The API is deployed and accessible at:
- **Base URL**: [https://a-r-f-agentic-reliability-framework-api.hf.space](https://a-r-f-agentic-reliability-framework-api.hf.space)
- **Interactive Documentation**: [https://a-r-f-agentic-reliability-framework-api.hf.space/docs](https://a-r-f-agentic-reliability-framework-api.hf.space/docs)

## Quick Start (Local Development)

1. **Install dependencies**:
```bash
pip install -r requirements.txt
```

Note: `requirements.txt` installs `agentic-reliability-framework` directly from the project's Git repository.

2. **Set environment variables** (optional, in `.env`):

- `ARF_HMC_MODEL`: path to the HMC model JSON (default: `models/hmc_model.json`)
- `ARF_USE_HYPERPRIORS`: `true` or `false`
- `API_KEY`: optional (currently not enforced)

3. **Run the app locally**:

```bash
uvicorn app.main:app --reload --port 8000
```

4. **Health check**:

```bash
curl http://localhost:8000/health
```
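
The same check can be scripted, for example in a smoke test. The sketch below uses only the standard library and assumes that `/health` answers with HTTP 200 when the service is up; the response body is not documented here, so it is ignored. The `check_health` helper and its `opener` parameter are hypothetical names used for illustration.

```python
import urllib.error
import urllib.request


def check_health(base_url="http://localhost:8000",
                 opener=urllib.request.urlopen,
                 timeout=5.0):
    """Return True if GET <base_url>/health answers with HTTP 200.

    Assumption: a 200 status means "healthy"; the body is not inspected.
    `opener` is injectable so the function can be tested without a server.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

Calling `check_health()` with no arguments targets the local server started in step 3.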

## Causal Explainer Endpoint

The ARF API includes a heuristic causal explainer that evaluates the impact of proposed healing actions using deterministic rules. This module provides counterfactual reasoning without requiring a fitted causal model or external ML dependencies.

The explainer estimates how system metrics such as latency would change if a different action were taken.

### Mathematical Model

The counterfactual outcome is computed as:

```text
counterfactual_outcome = factual_outcome * (1 + effect_frac)
```

Where:

- `effect_frac` is a predefined impact factor based on the action type
- effects are multiplicative
- a fixed ±10% uncertainty interval is applied to the estimated outcome
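
The formula above can be sketched in a few lines of Python. This is an illustration, not the project's implementation: the `EFFECT_FRAC` table and the `estimate` function are hypothetical, the value `-0.15` for `restart_container` is inferred from the example response in this README (600 to 510), and the interval is assumed to be ±10% of the counterfactual outcome.

```python
from dataclasses import dataclass

# Assumption: -0.15 is inferred from this README's example (600 -> 510).
# The real table of per-action impact factors is not shown in this document.
EFFECT_FRAC = {"no_action": 0.0, "restart_container": -0.15}

UNCERTAINTY = 0.10  # fixed +/-10% band, assumed relative to the estimate


@dataclass
class Counterfactual:
    factual: float
    counterfactual: float
    effect: float  # counterfactual - factual
    lower: float   # lower bound of the uncertainty interval
    upper: float   # upper bound of the uncertainty interval


def estimate(factual_outcome, action):
    """Apply the multiplicative effect for `action` to the factual outcome."""
    effect_frac = EFFECT_FRAC[action]
    cf = factual_outcome * (1 + effect_frac)
    return Counterfactual(
        factual=factual_outcome,
        counterfactual=cf,
        effect=cf - factual_outcome,
        lower=cf * (1 - UNCERTAINTY),
        upper=cf * (1 + UNCERTAINTY),
    )
```

With a factual p99 latency of 600, `estimate(600, "restart_container")` gives a counterfactual of about 510 and an effect of about -90; under the interval assumption the band runs from roughly 459 to 561.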

### Example Request

```bash
curl -X POST "http://localhost:8000/api/v1/v1/incidents/evaluate" \
  -H "Content-Type: application/json" \
  -d '{
    "component": "checkout-service",
    "latency_p99": 600,
    "error_rate": 0.2,
    "service_mesh": "default"
  }'
```
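
An equivalent request from Python, using only the standard library, might look like the sketch below. The path and payload are copied from the curl example; `build_evaluate_request` and `evaluate_incident` are hypothetical helper names, and the server is assumed to be running locally on port 8000.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"
# Path copied verbatim from the curl example above.
EVALUATE_PATH = "/api/v1/v1/incidents/evaluate"


def build_evaluate_request(payload, base_url=BASE_URL):
    """Build (but do not send) the POST request for the evaluate endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + EVALUATE_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def evaluate_incident(payload, base_url=BASE_URL, timeout=10.0):
    """Send the request and return the decoded JSON response."""
    request = build_evaluate_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = {
    "component": "checkout-service",
    "latency_p99": 600,
    "error_rate": 0.2,
    "service_mesh": "default",
}
# result = evaluate_incident(payload)  # requires the server to be running
```

Building the request separately from sending it keeps the payload and headers easy to inspect in tests without a live server.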

### Example Response

```json
{
  "healing_intent": {
    "action": "restart_container",
    "component": "checkout-service",
    "parameters": {},
    "justification": "Causal: If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
    "confidence": 0.85,
    "risk_score": 0.54,
    "status": "oss_advisory_only"
  },
  "causal_explanation": {
    "factual_outcome": 600,
    "counterfactual_outcome": 510,
    "effect": -90,
    "explanation_text": "If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
    "is_model_based": false,
    "warnings": [
      "Using heuristic causal model (no fitted SCM)."
    ]
  },
  "utility_decision": {
    "best_action": "restart_container",
    "expected_utility": 0.5,
    "explanation": "Heuristic decision based on latency/error thresholds"
  }
}
```
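
A client may want to sanity-check a response before acting on it. The sketch below reads only the field names visible in the example response above; the `summarize` helper is hypothetical, and it assumes `effect` is always `counterfactual_outcome - factual_outcome`, as it is in the example.

```python
def summarize(response):
    """Extract the key fields of an evaluate response and check consistency."""
    intent = response["healing_intent"]
    causal = response["causal_explanation"]

    factual = causal["factual_outcome"]
    counterfactual = causal["counterfactual_outcome"]
    effect = causal["effect"]

    return {
        "action": intent["action"],
        # The OSS build only advises; it does not execute the action.
        "advisory_only": intent["status"] == "oss_advisory_only",
        # Relative change implied by the response (None if factual is 0).
        "relative_change": effect / factual if factual else None,
        # Assumed invariant: effect == counterfactual - factual.
        "consistent": abs((counterfactual - factual) - effect) < 1e-9,
        "warnings": list(causal.get("warnings", [])),
    }


# Trimmed copy of the example response, keeping only the fields used here.
sample = {
    "healing_intent": {
        "action": "restart_container",
        "status": "oss_advisory_only",
    },
    "causal_explanation": {
        "factual_outcome": 600,
        "counterfactual_outcome": 510,
        "effect": -90,
        "warnings": ["Using heuristic causal model (no fitted SCM)."],
    },
}
summary = summarize(sample)
```

For the sample, the implied relative change is about -0.15, matching the multiplicative model described earlier.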

### Important Notes

- This endpoint is advisory only (`status = oss_advisory_only`)
- No Structural Causal Model (SCM) is fitted
- No machine learning models are used
- All effects are based on predefined heuristics

## Tests

Run `pytest`. Tests use a temporary SQLite DB (`sqlite:///./test.db`) created by the test fixtures.

## Notes

- The governance endpoints use an in-process `RiskEngine` initialized at startup.
- The outcome recording endpoint is not implemented in this repository and returns HTTP 501.