lvvignesh2122 committed on
Commit 8c67043 · 1 parent: 6eacca0

docs: improve evaluation section in README to focus on workflow

Files changed (1)
  1. README.md +7 -5
README.md CHANGED
@@ -9,7 +9,9 @@ pinned: false
 
  # 🧠 NexusGraph AI
 
- > **High Distinction Project**: An advanced "Agentic" Retrieval-Augmented Generation system that uses Graph Theory (LangGraph), Structural Retrieval (SQL), and Self-Correction to answer complex queries.
+ > **High Distinction Project**: An advanced "Agentic" Retrieval-Augmented Generation system that uses Graph Theory (LangGraph), Structured Retrieval (SQLite), and Self-Correction to answer complex queries.
+
+ *This repository contains the codebase for **NexusGraph AI**, deployed live on Hugging Face Spaces as [Gemini-Rag-Fastapi-Pro](https://huggingface.co/spaces/lvvignesh2122/Gemini-Rag-Fastapi-Pro).*
 
  ## 🚀 The "Master's Level" Difference
 
@@ -96,10 +98,10 @@ pytest
  ---
 
  ## 📊 Evaluation (The Science)
- We use an **LLM-as-a-Judge** approach (`run_evals.py`) to measure:
- * **Faithfulness**: Is the answer hallucinated?
- * **Relevancy**: Did we answer the prompt?
- * *Current Benchmarks*: ~0.92 Faithfulness / 0.89 Relevancy.
+ We use an **LLM-as-a-Judge** approach (`run_evals.py`) to programmatically score queries based on:
+ * **Faithfulness**: Verifying if the answer is derived strictly from the context (hallucination detection).
+ * **Relevancy**: Measuring how directly the answer addresses the user query.
+ * *Audit Execution*: Running `python run_evals.py` parses the production logs (`rag_eval_logs.jsonl`) and generates average system metrics.
 
  ---
 
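The updated evaluation section says `run_evals.py` parses `rag_eval_logs.jsonl` and produces average metrics, but the commit does not show the log schema or the script itself. The following is a minimal sketch of that averaging step only, assuming each JSONL line is an object with hypothetical numeric `faithfulness` and `relevancy` fields already written by the LLM judge. The field names, the `average_metrics` function, and the skip-on-bad-line policy are all illustrative assumptions, not the project's actual code.

```python
import json
from statistics import mean
from typing import Dict, Iterable, List

# Hypothetical metric keys: the real rag_eval_logs.jsonl schema is not shown in this commit.
METRICS = ("faithfulness", "relevancy")


def average_metrics(lines: Iterable[str]) -> Dict[str, float]:
    """Average each judge score across JSONL records, skipping unusable lines."""
    scores: Dict[str, List[float]] = {m: [] for m in METRICS}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a partially written or corrupt log line
        if not isinstance(record, dict):
            continue  # only JSON objects can carry named scores
        for metric in METRICS:
            value = record.get(metric)
            # Accept ints and floats but reject booleans (bool is a subclass of int).
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                scores[metric].append(float(value))
    # Report only metrics that received at least one score.
    return {m: mean(v) for m, v in scores.items() if v}
```

Because a file object iterates line by line, the same function could be applied to the log directly, e.g. `with open("rag_eval_logs.jsonl", encoding="utf-8") as f: print(average_metrics(f))`. Skipping malformed lines rather than failing keeps one truncated log write from invalidating a whole audit run, at the cost of silently shrinking the sample.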