Last commit: `afa4de7` by petter2025 ("Upload folder using huggingface_hub")

# arf-api

ARF (Agentic Reliability Framework) API control plane, built with FastAPI.

## Live Demo

The API is deployed and accessible at:

- Base URL: [https://a-r-f-agentic-reliability-framework-api.hf.space](https://a-r-f-agentic-reliability-framework-api.hf.space)
- Interactive documentation: [https://a-r-f-agentic-reliability-framework-api.hf.space/docs](https://a-r-f-agentic-reliability-framework-api.hf.space/docs)

## Quick Start (Local Development)

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Note: `requirements.txt` installs `agentic-reliability-framework` directly from the project's Git repository.

2. Set environment variables (optional, in `.env`):

   | Variable | Description |
   |---|---|
   | `ARF_HMC_MODEL` | Path to the HMC model JSON (default: `models/hmc_model.json`) |
   | `ARF_USE_HYPERPRIORS` | `true` or `false` |
   | `API_KEY` | Optional; currently not enforced |

3. Run the app locally:

   ```bash
   uvicorn app.main:app --reload --port 8000
   ```

4. Health check:

   ```bash
   curl http://localhost:8000/health
   ```
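The variables from step 2 can be read with a small helper. The sketch below is hypothetical: `Settings`, `load_settings`, and `parse_bool` are illustrative names, not part of the ARF codebase, and treating an unset `ARF_USE_HYPERPRIORS` as `false` is an assumption. It reads the process environment only; loading the `.env` file itself would be a separate step.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hmc_model_path: str
    use_hyperpriors: bool
    api_key: Optional[str]


def parse_bool(value: Optional[str], default: bool = False) -> bool:
    """Map "true"/"false" (case-insensitive) to a bool; anything else gives `default`."""
    if value is None:
        return default
    normalized = value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    return default


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return Settings(
        # Documented default from step 2.
        hmc_model_path=env.get("ARF_HMC_MODEL", "models/hmc_model.json"),
        # Assumption: unset or unrecognized values are treated as false.
        use_hyperpriors=parse_bool(env.get("ARF_USE_HYPERPRIORS")),
        # Read for completeness; the API does not currently enforce it.
        api_key=env.get("API_KEY") or None,
    )
```

Passing a plain dict, e.g. `load_settings({"ARF_USE_HYPERPRIORS": "true"})`, keeps the parsing easy to test without touching the real environment.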

## Causal Explainer Endpoint

The ARF API includes a heuristic causal explainer that evaluates the impact of proposed healing actions using deterministic rules. This module provides counterfactual reasoning without requiring a fitted causal model or external ML dependencies.

The explainer estimates how system metrics such as latency would change if a different action were taken.

### Mathematical Model

The counterfactual outcome is computed as:

```text
counterfactual_outcome = factual_outcome * (1 + effect_frac)
```

Where:

- `effect_frac` is a predefined impact factor based on the action type
- effects are multiplicative
- a fixed ±10% uncertainty interval is applied to the estimated outcome
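A minimal Python sketch of the formula above, assuming the ±10% interval is taken around the counterfactual outcome. The `EFFECT_FRAC` table is hypothetical: only the `restart_container` entry (-0.15) is inferred from the example response below, where a factual latency of 600 becomes 510. The real table in the ARF package may differ.

```python
from dataclasses import dataclass

# Hypothetical effect table; restart_container is inferred from the example
# response (600 -> 510 implies effect_frac = -0.15).
EFFECT_FRAC = {
    "no_action": 0.0,
    "restart_container": -0.15,
}

UNCERTAINTY = 0.10  # fixed ±10% band around the estimated outcome


@dataclass(frozen=True)
class CounterfactualEstimate:
    factual_outcome: float
    counterfactual_outcome: float
    effect: float
    lower: float
    upper: float


def estimate(factual_outcome: float, action: str) -> CounterfactualEstimate:
    effect_frac = EFFECT_FRAC[action]  # raises KeyError for unknown actions
    counterfactual = factual_outcome * (1 + effect_frac)  # multiplicative effect
    return CounterfactualEstimate(
        factual_outcome=factual_outcome,
        counterfactual_outcome=counterfactual,
        effect=counterfactual - factual_outcome,
        lower=counterfactual * (1 - UNCERTAINTY),
        upper=counterfactual * (1 + UNCERTAINTY),
    )
```

`estimate(600, "restart_container")` yields a counterfactual of 510, an effect of -90, and an interval of roughly 459 to 561. Because the effect is multiplicative, the same action applied to a factual outcome of 300 would give an effect of only -45.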

### Example Request

```bash
curl -X POST "http://localhost:8000/api/v1/v1/incidents/evaluate" \
  -H "Content-Type: application/json" \
  -d '{
    "component": "checkout-service",
    "latency_p99": 600,
    "error_rate": 0.2,
    "service_mesh": "default"
  }'
```

### Example Response

```json
{
  "healing_intent": {
    "action": "restart_container",
    "component": "checkout-service",
    "parameters": {},
    "justification": "Causal: If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
    "confidence": 0.85,
    "risk_score": 0.54,
    "status": "oss_advisory_only"
  },
  "causal_explanation": {
    "factual_outcome": 600,
    "counterfactual_outcome": 510,
    "effect": -90,
    "explanation_text": "If we apply restart_container instead of no_action, latency would change from 600.00 to 510.00 (Δ = -90.00). Based on heuristic causal model.",
    "is_model_based": false,
    "warnings": [
      "Using heuristic causal model (no fitted SCM)."
    ]
  },
  "utility_decision": {
    "best_action": "restart_container",
    "expected_utility": 0.5,
    "explanation": "Heuristic decision based on latency/error thresholds"
  }
}
```
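The numbers in `causal_explanation` can be checked against the formula in the Mathematical Model section. These hypothetical helpers invert `counterfactual_outcome = factual_outcome * (1 + effect_frac)`, recovering an `effect_frac` of about -0.15 for `restart_container`, and compute the ±10% band, assuming it is centered on the counterfactual outcome.

```python
import json
from typing import Tuple

# Abridged copy of the "causal_explanation" object from the example response.
EXAMPLE = json.loads("""
{
  "factual_outcome": 600,
  "counterfactual_outcome": 510,
  "effect": -90,
  "is_model_based": false
}
""")


def implied_effect_frac(explanation: dict) -> float:
    """Invert counterfactual = factual * (1 + effect_frac) to recover effect_frac."""
    return explanation["counterfactual_outcome"] / explanation["factual_outcome"] - 1


def uncertainty_band(explanation: dict, width: float = 0.10) -> Tuple[float, float]:
    """Assumed ±width band around the counterfactual outcome."""
    cf = explanation["counterfactual_outcome"]
    return cf * (1 - width), cf * (1 + width)


def effect_is_consistent(explanation: dict, tol: float = 1e-9) -> bool:
    """Check that effect == counterfactual_outcome - factual_outcome."""
    expected = explanation["counterfactual_outcome"] - explanation["factual_outcome"]
    return abs(explanation["effect"] - expected) <= tol
```

For the example above, the band works out to roughly 459 to 561, and the reported effect of -90 matches the difference between the two outcomes.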

### Important Notes

- This endpoint is advisory only (`status = oss_advisory_only`)
- No Structural Causal Model (SCM) is fitted
- No machine learning models are used
- All effects are based on predefined heuristics

## Tests

Run `pytest`. Tests use a temporary SQLite DB (`sqlite:///./test.db`) created by the test fixtures.

## Notes

- The governance endpoints use an in-process `RiskEngine` initialized at startup.
- The outcome recording endpoint is not implemented in this repository and returns HTTP 501.