sagar4tech committed 4 days ago

Commit

a83190d

verified ·

1 Parent(s): 5325af2

Upload 16 files


Files changed (17)

.gitattributes +1 -0
README.md +373 -3
checkpoints/lightgcn_best.pt +3 -0
checkpoints/lightgcn_best_no_ips.pt +3 -0
checkpoints/mmoe_best.pt +3 -0
checkpoints/single_task_click.pt +3 -0
checkpoints/single_task_value.pt +3 -0
mlflow.db +3 -0
outputs/ab_simulation.csv +10 -0
outputs/calibration_curve.png +0 -0
outputs/faiss_benchmark.csv +2 -0
outputs/pareto_curve.png +0 -0
outputs/pareto_frontier.csv +2 -0
outputs/ranking_metrics.json +11 -0
outputs/results_table.csv +8 -0
outputs/results_table.md +9 -0
outputs/retrieval_metrics.json +10 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+mlflow.db filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,373 @@
----
-license: mit
----

+# GraphRecSys
+Production-style recommendation system that combines graph retrieval, causal debiasing, multi-objective ranking, calibrated probabilities, vector search, and low-latency serving.
+This project is designed as an end-to-end recommender systems portfolio piece: it starts from raw KuaiRec interaction logs, trains a debiased LightGCN retrieval model, indexes item embeddings with FAISS, ranks candidates with an MMoE multi-task model, calibrates click probabilities, and serves personalized recommendations through FastAPI with Redis-backed embedding caching.
+## Why This Project Exists
+Most recommender demos stop at model training. Real recommendation systems are pipelines: data quality, retrieval, ranking, calibration, serving latency, offline evaluation, and product trade-offs all matter at the same time.
+GraphRec-MultiOpt demonstrates those production concerns in one coherent system:
+- **Retrieval:** graph collaborative filtering with LightGCN.
+- **Debiasing:** inverse propensity weighting to reduce exposure bias.
+- **Ranking:** multi-task MMoE for click probability and expected value.
+- **Calibration:** Platt scaling and reliability diagrams for trustworthy probabilities.
+- **Serving:** FastAPI endpoint with FAISS candidate retrieval, Redis cache, scalarization, and diversity reranking.
+- **Decision support:** mock A/B simulation and Pareto frontier analysis for engagement vs. value trade-offs.
+## System Architecture
+```mermaid
+flowchart LR
+    raw["KuaiRec raw logs"] --> loader["Schema validation + labels"]
+    loader --> split["Temporal train/val/test split"]
+    split --> graph_data["PyG bipartite graph"]
+    split --> propensity["Item propensity estimates"]
+    graph_data --> lightgcn["LightGCN retrieval"]
+    propensity --> lightgcn
+    lightgcn --> embeddings["User/item embeddings"]
+    embeddings --> faiss["FAISS IVF-PQ index"]
+    embeddings --> features["Ranking feature builder"]
+    split --> features
+    features --> mmoe["MMoE ranker"]
+    mmoe --> calibration["Platt calibration"]
+    faiss --> api["FastAPI /recommend"]
+    mmoe --> api
+    calibration --> api
+    redis["Redis embedding cache"] --> api
+    api --> response["Top-10 recommendations"]
+    mmoe --> ab["Mock A/B simulation"]
+    ab --> pareto["Pareto frontier"]
+```
+## Technical Highlights
+| Area | Implementation |
+|---|---|
+| Dataset | KuaiRec dense multi-action logs |
+| Retrieval | LightGCN with 3 graph propagation layers |
+| Retrieval loss | BPR with optional inverse propensity weighting |
+| Negative sampling | Uniform sampler with API reserved for hard negatives |
+| Vector search | FAISS IVF-PQ, configurable `nprobe` |
+| Ranking model | Multi-gate Mixture-of-Experts with click and value towers |
+| Ranking targets | `label_click = watch_ratio >= 0.5`, `label_value = log1p(watch_ratio)` |
+| Calibration | Platt scaling on validation logits |
+| Diversity | Maximal Marginal Relevance reranking |
+| Serving | Async FastAPI app with latency breakdown |
+| Cache | Redis user embedding cache with TTL |
+| Evaluation | Recall@K, NDCG@K, AUC, MSE/RMSE, ECE, latency, Pareto sweep |
+| Tracking | MLflow metrics and artifacts |
+## Repository Layout
+```text
+.
+├── data/
+│   ├── download.py
+│   ├── raw/
+│   └── processed/
+├── src/
+│   ├── data/          # loading, splitting, graph construction, propensity
+│   ├── retrieval/     # LightGCN, BPR, negative sampling, retrieval eval
+│   ├── indexing/      # FAISS index build/query/benchmark
+│   ├── ranking/       # feature builder, MMoE, calibration, ranking eval
+│   ├── serving/       # FastAPI, Redis cache, schemas, scoring
+│   └── evaluation/    # A/B simulation, Pareto frontier, results report
+├── configs/
+├── tests/
+├── scripts/
+├── outputs/
+├── checkpoints/
+├── Dockerfile
+├── implementation_plan.md
+└── recsys_architecture.md
+```
+## Modeling Approach
+### 1. Data And Labels
+The data layer validates KuaiRec interaction logs, derives model targets, and creates train/validation/test splits.
+```python
+import numpy as np
+# watch_ratio: NumPy array or pandas Series of per-interaction watch ratios
+label_click = (watch_ratio >= 0.5).astype(int)
+label_value = np.log1p(watch_ratio)
+```
+The graph builder creates a PyTorch Geometric `HeteroData` bipartite graph:
+- Node types: `user`, `item`
+- Edge type: `("user", "interacts", "item")`
+- Reverse edge type for message passing
+- Edge weights from clipped watch ratio
+### 2. Debiased Retrieval
+The retrieval stage trains LightGCN using Bayesian Personalized Ranking:
+```text
+loss = -mean(IPS(item) * log sigmoid(score(user, positive) - score(user, negative)))
+```
+The IPS term upweights less frequently exposed items, reducing the tendency of the retrieval model to overfit historical exposure patterns.
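The weighted loss above can be sketched in plain Python. This is a minimal illustration, not the repository's training code: the function names are hypothetical, `propensities` is assumed to hold a per-item exposure probability in (0, 1], and the clip value of 10.0 mirrors `ips.clip_max` from the retrieval config.

```python
import math


def ips_weight(propensity: float, clip_max: float = 10.0) -> float:
    """Inverse propensity weight, clipped to limit variance (hypothetical helper)."""
    return min(1.0 / max(propensity, 1e-12), clip_max)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def ips_bpr_loss(pos_scores, neg_scores, propensities, clip_max=10.0):
    """Mean of -w_i * log sigmoid(s_pos - s_neg) over (user, pos, neg) triples."""
    terms = [
        ips_weight(p, clip_max) * log_sigmoid(sp - sn)
        for sp, sn, p in zip(pos_scores, neg_scores, propensities)
    ]
    return -sum(terms) / len(terms)
```

With a propensity of 1.0 the weight is 1 and the expression reduces to ordinary BPR; a rarely exposed positive item contributes proportionally more to the loss, up to the clip.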
+### 3. Multi-Objective Ranking
+The ranking model uses MMoE to optimize two related objectives:
+- **pClick tower:** probability that the user meaningfully engages (`watch_ratio >= 0.5`), calibrated after training with Platt scaling.
+- **E-value tower:** expected value proxy, regressed on `log1p(watch_ratio)`.
+Ranking features combine:
+- user embedding
+- item embedding
+- time/session context
+- item duration
+- category representation
+Total feature dimension: `1046`.
+### 4. Serving-Time Optimization
+The serving endpoint follows the same shape used by production recommendation stacks:
+1. Fetch user embedding from Redis or local embedding table.
+2. Retrieve top-K candidates from FAISS.
+3. Build ranking features for candidates.
+4. Score candidates with MMoE.
+5. Apply Platt calibration.
+6. Scalarize engagement and value.
+7. Apply MMR diversity reranking.
+8. Return top-10 items with latency breakdown.
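Step 7 can be illustrated with a small greedy MMR loop. This is a sketch under assumptions: `mmr_rerank` and `cosine` are hypothetical names, the inputs are plain dicts keyed by item ID, and `lambda_diversity` is interpreted here as the weight on the similarity penalty, which may differ from how the repository defines it.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_rerank(scores, embeddings, top_n=10, lambda_diversity=0.3):
    """Greedy Maximal Marginal Relevance over candidate item IDs."""
    remaining = set(scores)
    selected = []
    while remaining and len(selected) < top_n:

        def mmr_value(item):
            # Penalize similarity to the closest already-selected item.
            max_sim = max(
                (cosine(embeddings[item], embeddings[s]) for s in selected),
                default=0.0,
            )
            return (1 - lambda_diversity) * scores[item] - lambda_diversity * max_sim

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=mmr_value)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Setting `lambda_diversity=0` recovers a plain sort by score, which makes the diversity effect easy to isolate in an ablation.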
+## Quickstart
+### Install
+```bash
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+### Run The Pipeline
+```bash
+bash scripts/run_pipeline.sh
+```
+The pipeline follows the architecture sequence:
+```text
+download -> preprocess -> graph -> propensity -> LightGCN -> FAISS -> ranking -> calibration -> evaluation -> serving
+```
+For raw data without timestamps, the split script can use a deterministic fallback:
+```bash
+python -m src.data.splits --allow_no_timestamp
+```
+### Run FAISS Benchmark
+```bash
+bash scripts/run_benchmark.sh
+```
+Benchmark output is written to:
+```text
+outputs/faiss_benchmark.csv
+```
+## Serving API
+Start the service:
+```bash
+uvicorn src.serving.app:app --host 0.0.0.0 --port 8000
+```
+Health check:
+```bash
+curl http://localhost:8000/health
+```
+Recommendation request:
+```bash
+curl http://localhost:8000/recommend/0
+```
+Example response shape:
+```json
+{
+  "user_id": 0,
+  "items": [
+    {
+      "item_id": 123,
+      "p_click": 0.71,
+      "e_value": 1.42,
+      "final_score": 0.82
+    }
+  ],
+  "retrieval_latency_ms": 6.4,
+  "ranking_latency_ms": 14.8,
+  "total_latency_ms": 23.1,
+  "cache_hit": true
+}
+```
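A client might parse that response shape into typed objects. The dataclass names and `parse_response` below are hypothetical and simply mirror the fields in the example; the service's own schema definitions may differ.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class RecommendedItem:
    item_id: int
    p_click: float
    e_value: float
    final_score: float


@dataclass
class RecommendResponse:
    user_id: int
    items: List[RecommendedItem]
    retrieval_latency_ms: float
    ranking_latency_ms: float
    total_latency_ms: float
    cache_hit: bool


def parse_response(payload: str) -> RecommendResponse:
    """Parse a /recommend JSON body; raises TypeError on missing or unexpected fields."""
    raw = json.loads(payload)
    items = [RecommendedItem(**item) for item in raw.pop("items")]
    return RecommendResponse(items=items, **raw)
```

Subtracting the retrieval and ranking latencies from `total_latency_ms` gives the time spent elsewhere in the request, such as cache lookup and reranking.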
+Prometheus-compatible metrics:
+```bash
+curl http://localhost:8000/metrics
+```
+Reload model artifacts:
+```bash
+curl -X POST http://localhost:8000/reload
+```
+## Evaluation
+The project evaluates recommender quality at multiple layers.
+| Layer | Metrics |
+|---|---|
+| Retrieval | Recall@10, Recall@20, Recall@50, Recall@500, NDCG@10 |
+| Ranking | ROC-AUC, MSE, RMSE |
+| Calibration | ECE before/after Platt scaling, reliability curve |
+| Serving | p50, p95, p99 latency |
+| Product trade-off | Simulated CTR, GMV proxy, diversity, Pareto frontier |
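For reference, expected calibration error (ECE) can be computed as below. This is a generic sketch that assumes equal-width probability bins and a default of 10 bins; the repository's binning scheme is not stated in this README, and the function name is hypothetical.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean of |accuracy - confidence| over equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        confidence = sum(p for p, _ in bucket) / len(bucket)
        accuracy = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / n * abs(accuracy - confidence)
    return ece
```

A perfectly calibrated model scores 0; running the same function on probabilities before and after Platt scaling gives the "before/after" comparison listed in the table.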
+Generate the final results table:
+```bash
+python -m src.evaluation.report
+```
+Outputs:
+```text
+outputs/results_table.csv
+outputs/results_table.md
+outputs/calibration_curve.png
+outputs/pareto_curve.png
+```
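The Pareto frontier behind `pareto_curve.png` can be illustrated with a simple dominance check. This is a sketch, not the repository's implementation: `pareto_frontier` is a hypothetical name, rows are assumed to be dicts with the column names used in `outputs/ab_simulation.csv`, and treating CTR and GMV as the two maximized objectives is an assumption.

```python
def pareto_frontier(rows, objectives=("mean_ctr", "mean_gmv")):
    """Return config names not dominated on the given (maximized) objectives."""

    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and better somewhere.
        return all(a[k] >= b[k] for k in objectives) and any(
            a[k] > b[k] for k in objectives
        )

    return [
        r["config_name"]
        for r in rows
        if not any(dominates(other, r) for other in rows if other is not r)
    ]
```

Adding `mean_diversity` to `objectives` turns this into a three-way trade-off, which usually keeps more configurations on the frontier.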
+## Results
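+Metrics are generated after running the full pipeline. This table is intentionally artifact-driven so reported numbers come from reproducible runs rather than hand-edited README values. The latency rows match the FAISS benchmark in `outputs/faiss_benchmark.csv` (index size 1000, `nprobe=4`).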
+| Metric           | LightGCN + IPS   | MMoE single-task   | MMoE multi-task   |
+|:-----------------|:-----------------|:-------------------|:------------------|
+| Recall@500       | 0.0011           | -                  | -                 |
+| NDCG@10          | 0.0443           | -                  | -                 |
+| AUC (pClick)     | -                | 0.8319             | 0.8223            |
+| ECE (after cal.) | -                | -                  | 0.0677            |
+| MSE (E-value)    | -                | 0.1172             | 0.0787            |
+| p50 latency ms   | 0.04             | -                  | -                 |
+| p99 latency ms   | 0.13             | -                  | -                 |
+## Configuration
+The system is config-driven:
+- `configs/retrieval.yaml`
+- `configs/ranking.yaml`
+- `configs/serving.yaml`
+Examples:
+```yaml
+model:
+  emb_dim: 512
+  num_layers: 3
+training:
+  lr: 1.0e-3
+  batch_size: 4096
+  epochs: 100
+ips:
+  clip_max: 10.0
+```
+Serving trade-offs can be tuned without changing model code:
+```yaml
+scoring:
+  w_engagement: 0.6
+  w_revenue: 0.4
+  lambda_diversity: 0.3
+  top_n_serve: 10
+```
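One plausible reading of these weights is a linear scalarization of the two ranker outputs. The sketch below assumes exactly that; `scalarize` and `top_n` are hypothetical names, and the actual service may normalize or transform scores first, so this formula is not guaranteed to reproduce the `final_score` values the API returns.

```python
def scalarize(p_click, e_value, w_engagement=0.6, w_revenue=0.4):
    """Weighted sum of engagement and value scores (assumed linear form)."""
    return w_engagement * p_click + w_revenue * e_value


def top_n(candidates, n=10, **weights):
    """Attach a final_score to each candidate dict and keep the n best."""
    scored = [
        dict(c, final_score=scalarize(c["p_click"], c["e_value"], **weights))
        for c in candidates
    ]
    return sorted(scored, key=lambda c: c["final_score"], reverse=True)[:n]
```

Sweeping `w_engagement` from 0 to 1 with `w_revenue = 1 - w_engagement` yields the kind of configuration grid that an A/B simulation or Pareto sweep can compare.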
+## Docker
+Build:
+```bash
+docker build -t graphrec-multiopt .
+```
+Run:
+```bash
+docker run -p 8000:8000 graphrec-multiopt
+```
+For real experiments, mount model artifacts and processed data as volumes:
+```bash
+docker run \
+  -p 8000:8000 \
+  -v "$(pwd)/data:/app/data" \
+  -v "$(pwd)/checkpoints:/app/checkpoints" \
+  graphrec-multiopt
+```
+## Engineering Notes
+This repository is structured to show senior-level recommender systems judgment:
+- Separates retrieval and ranking instead of forcing one model to do both.
+- Includes causal debiasing through IPS rather than optimizing only observed engagement.
+- Treats probability calibration as a first-class serving concern.
+- Uses vector search and caching to reflect real serving constraints.
+- Adds diversity reranking to avoid purely exploitative recommendations.
+- Exposes business-level trade-offs through scalarization and Pareto analysis.
+- Keeps training, serving, and evaluation configuration outside model code.
+## Known Limitations
+- KuaiRec timestamp availability varies by source file; the splitter supports temporal mode when timestamps are present and an explicit deterministic fallback otherwise.
+- The current hard-negative sampling interface is reserved, while uniform negative sampling is implemented.
+- Full reported metrics require running the pipeline on the downloaded dataset.
+- Redis is optional for local development but recommended for serving realism.
+- FAISS IVF-PQ configuration may need scaling down for tiny smoke-test datasets.
+## Roadmap
+- [ ] Add hard negative sampling from FAISS retrieval misses.
+- [ ] Add popularity and matrix-factorization baselines.
+- [ ] Add online feature store abstraction for serving-time context.
+- [ ] Add load tests for concurrent recommendation traffic.
+- [x] Add Docker Compose for API + Redis + MLflow.
+- [x] Add CI workflow for unit tests, linting, and smoke-mode pipeline execution.
+## Resume Summary
+Built an end-to-end production-style recommendation system using PyTorch, PyTorch Geometric, FAISS, Redis, FastAPI, and MLflow. Implemented LightGCN retrieval with IPS debiasing, MMoE multi-task ranking, Platt calibration, MMR diversity reranking, vector-search serving, offline A/B simulation, and Pareto frontier analysis for engagement/value trade-off optimization.

checkpoints/lightgcn_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9b4d4e8c17e2fb9d2eb5dfcbad499cba90fd395d0d5168622e892c1b98087d01
+size 36373013

checkpoints/lightgcn_best_no_ips.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8e7699fe7084f4e5ccc70622440b7e2ffb285d04f1c0289a068f43994cf57006
+size 36373069

checkpoints/mmoe_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6b12c6df6df49efd17453b4b9ea83b2f7ce3938e5550ad01a9078d69fff42e10
+size 9834895

checkpoints/single_task_click.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a353ac0c461504c4d451e571609de6379a1fb3a60168158815bcf1d8de11a8be
+size 7050277

checkpoints/single_task_value.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e44e057a59d27df654c80f2a738b599f691763c459f9f6758521fe9185b9989a
+size 7050277

mlflow.db ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f4a30daf015d13f5ae519f17465f5e3588ae03ca07e1d5ce9b7ec668c39bbc08
+size 712704

outputs/ab_simulation.csv ADDED

	@@ -0,0 +1,10 @@

+config_name,mean_ctr,mean_gmv,mean_diversity
+engagement_only,0.7879629851992801,0.6699224461231732,0.1296688796019064
+eng_heavy,0.7860530466158114,0.6683910443037643,0.1310944389908302
+balanced_60_40,0.7852833735275404,0.6677558209184719,0.13160043393019494
+balanced_50_50,0.7851963786686844,0.667657224486518,0.1316552643316338
+balanced_40_60,0.7850962956267667,0.6675383233584722,0.1317110315947394
+rev_heavy,0.7850037807272648,0.66743280691597,0.13175905591456455
+revenue_only,0.7849211391196194,0.6673545125188091,0.13178014269456384
+diversity_boost,0.8054681297399574,0.6897920035116222,0.11168996680298231
+no_diversity,0.6588650968884171,0.522946486440979,0.14319167434535576

outputs/calibration_curve.png ADDED

outputs/faiss_benchmark.csv ADDED

	@@ -0,0 +1,2 @@


+index_size,nprobe,p50_ms,p95_ms,p99_ms,recall_at_10
+1000,4,0.039127499803726096,0.10123260008185743,0.13328411967449938,

outputs/pareto_curve.png ADDED

outputs/pareto_frontier.csv ADDED

	@@ -0,0 +1,2 @@


+config_name,mean_ctr,mean_gmv,mean_diversity,is_pareto
+diversity_boost,0.8054681297399574,0.6897920035116222,0.1116899668029823,True

outputs/ranking_metrics.json ADDED

	@@ -0,0 +1,11 @@

+{
+  "auc_click": 0.8223280482233246,
+  "auc_click_cal": 0.8223280486775686,
+  "ece_before": 0.07843196266342997,
+  "ece_after": 0.06774758427485816,
+  "mse_value": 0.07866260409355164,
+  "rmse_value": 0.28046854385750936,
+  "ablation_auc_single_click": 0.8319328938440997,
+  "ablation_mse_single_value": 0.11718457192182541,
+  "mmoe_vs_single_auc_delta": -0.009604845620775126
+}

outputs/results_table.csv ADDED

	@@ -0,0 +1,8 @@

+Metric,LightGCN + IPS,MMoE single-task,MMoE multi-task
+Recall@500,0.0011,-,-
+NDCG@10,0.0443,-,-
+AUC (pClick),-,0.8319,0.8223
+ECE (after cal.),-,-,0.0677
+MSE (E-value),-,0.1172,0.0787
+p50 latency ms,0.04,-,-
+p99 latency ms,0.13,-,-

outputs/results_table.md ADDED

	@@ -0,0 +1,9 @@

+| Metric           | LightGCN + IPS   | MMoE single-task   | MMoE multi-task   |
+|:-----------------|:-----------------|:-------------------|:------------------|
+| Recall@500       | 0.0011           | -                  | -                 |
+| NDCG@10          | 0.0443           | -                  | -                 |
+| AUC (pClick)     | -                | 0.8319             | 0.8223            |
+| ECE (after cal.) | -                | -                  | 0.0677            |
+| MSE (E-value)    | -                | 0.1172             | 0.0787            |
+| p50 latency ms   | 0.04             | -                  | -                 |
+| p99 latency ms   | 0.13             | -                  | -                 |

outputs/retrieval_metrics.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "Recall@10": 0.0009196945210176058,
+  "NDCG@10": 0.04432149863105337,
+  "Recall@20": 0.001114816652242921,
+  "NDCG@20": 0.03212390658160397,
+  "Recall@50": 0.0011345188621706159,
+  "NDCG@50": 0.017680448036093564,
+  "Recall@500": 0.0011345188621706159,
+  "NDCG@500": 0.0034158566325525165
+}