sagar4tech commited on
Commit
a83190d
·
verified ·
1 Parent(s): 5325af2

Upload 16 files

Browse files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ mlflow.db filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,373 @@
1
- ---
2
- license: mit
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # GraphRecSys
2
+
3
+ Production-style recommendation system that combines graph retrieval, causal debiasing, multi-objective ranking, calibrated probabilities, vector search, and low-latency serving.
4
+
5
+ This project is designed as an end-to-end recommender systems portfolio piece: it starts from raw KuaiRec interaction logs, trains a debiased LightGCN retrieval model, indexes item embeddings with FAISS, ranks candidates with an MMoE multi-task model, calibrates click probabilities, and serves personalized recommendations through FastAPI with Redis-backed embedding caching.
6
+
7
+ ## Why This Project Exists
8
+
9
+ Most recommender demos stop at model training. Real recommendation systems are pipelines: data quality, retrieval, ranking, calibration, serving latency, offline evaluation, and product trade-offs all matter at the same time.
10
+
11
+ GraphRec-MultiOpt demonstrates those production concerns in one coherent system:
12
+
13
+ - **Retrieval:** graph collaborative filtering with LightGCN.
14
+ - **Debiasing:** inverse propensity weighting to reduce exposure bias.
15
+ - **Ranking:** multi-task MMoE for click probability and expected value.
16
+ - **Calibration:** Platt scaling and reliability diagrams for trustworthy probabilities.
17
+ - **Serving:** FastAPI endpoint with FAISS candidate retrieval, Redis cache, scalarization, and diversity reranking.
18
+ - **Decision support:** mock A/B simulation and Pareto frontier analysis for engagement vs. value trade-offs.
19
+
20
+ ## System Architecture
21
+
22
+ ```mermaid
23
+ flowchart LR
24
+ raw["KuaiRec raw logs"] --> loader["Schema validation + labels"]
25
+ loader --> split["Temporal train/val/test split"]
26
+ split --> graph_data["PyG bipartite graph"]
27
+ split --> propensity["Item propensity estimates"]
28
+
29
+ graph_data --> lightgcn["LightGCN retrieval"]
30
+ propensity --> lightgcn
31
+ lightgcn --> embeddings["User/item embeddings"]
32
+
33
+ embeddings --> faiss["FAISS IVF-PQ index"]
34
+ embeddings --> features["Ranking feature builder"]
35
+ split --> features
36
+ features --> mmoe["MMoE ranker"]
37
+ mmoe --> calibration["Platt calibration"]
38
+
39
+ faiss --> api["FastAPI /recommend"]
40
+ mmoe --> api
41
+ calibration --> api
42
+ redis["Redis embedding cache"] --> api
43
+ api --> response["Top-10 recommendations"]
44
+
45
+ mmoe --> ab["Mock A/B simulation"]
46
+ ab --> pareto["Pareto frontier"]
47
+ ```
48
+
49
+ ## Technical Highlights
50
+
51
+ | Area | Implementation |
52
+ |---|---|
53
+ | Dataset | KuaiRec dense multi-action logs |
54
+ | Retrieval | LightGCN with 3 graph propagation layers |
55
+ | Retrieval loss | BPR with optional inverse propensity weighting |
56
+ | Negative sampling | Uniform sampler with API reserved for hard negatives |
57
+ | Vector search | FAISS IVF-PQ, configurable `nprobe` |
58
+ | Ranking model | Multi-gate Mixture-of-Experts with click and value towers |
59
+ | Ranking targets | `label_click = watch_ratio >= 0.5`, `label_value = log1p(watch_ratio)` |
60
+ | Calibration | Platt scaling on validation logits |
61
+ | Diversity | Maximal Marginal Relevance reranking |
62
+ | Serving | Async FastAPI app with latency breakdown |
63
+ | Cache | Redis user embedding cache with TTL |
64
+ | Evaluation | Recall@K, NDCG@K, AUC, MSE/RMSE, ECE, latency, Pareto sweep |
65
+ | Tracking | MLflow metrics and artifacts |
66
+
67
+ ## Repository Layout
68
+
69
+ ```text
70
+ .
71
+ ├── data/
72
+ │ ├── download.py
73
+ │ ├── raw/
74
+ │ └── processed/
75
+ ├── src/
76
+ │ ├── data/ # loading, splitting, graph construction, propensity
77
+ │ ├── retrieval/ # LightGCN, BPR, negative sampling, retrieval eval
78
+ │ ├── indexing/ # FAISS index build/query/benchmark
79
+ │ ├── ranking/ # feature builder, MMoE, calibration, ranking eval
80
+ │ ├── serving/ # FastAPI, Redis cache, schemas, scoring
81
+ │ └── evaluation/ # A/B simulation, Pareto frontier, results report
82
+ ├── configs/
83
+ ├── tests/
84
+ ├── scripts/
85
+ ├── outputs/
86
+ ├── checkpoints/
87
+ ├── Dockerfile
88
+ ├── implementation_plan.md
89
+ └── recsys_architecture.md
90
+ ```
91
+
92
+ ## Modeling Approach
93
+
94
+ ### 1. Data And Labels
95
+
96
+ The data layer validates KuaiRec interaction logs, derives model targets, and creates train/validation/test splits.
97
+
98
+ ```python
99
+ label_click = (watch_ratio >= 0.5).astype(int)
100
+ label_value = np.log1p(watch_ratio)
101
+ ```
102
+
103
+ The graph builder creates a PyTorch Geometric `HeteroData` bipartite graph:
104
+
105
+ - Node types: `user`, `item`
106
+ - Edge type: `("user", "interacts", "item")`
107
+ - Reverse edge type for message passing
108
+ - Edge weights from clipped watch ratio
109
+
110
+ ### 2. Debiased Retrieval
111
+
112
+ The retrieval stage trains LightGCN using Bayesian Personalized Ranking:
113
+
114
+ ```text
115
+ loss = -mean(IPS(item) * log sigmoid(score(user, positive) - score(user, negative)))
116
+ ```
117
+
118
+ The IPS term upweights less frequently exposed items, reducing the tendency of the retrieval model to overfit historical exposure patterns.
119
+
120
+ ### 3. Multi-Objective Ranking
121
+
122
+ The ranking model uses MMoE to optimize two related objectives:
123
+
124
+ - **pClick tower:** calibrated probability that the user meaningfully engages.
125
+ - **E-value tower:** expected value proxy based on watch ratio.
126
+
127
+ Ranking features combine:
128
+
129
+ - user embedding
130
+ - item embedding
131
+ - time/session context
132
+ - item duration
133
+ - category representation
134
+
135
+ Total feature dimension: `1046`.
136
+
137
+ ### 4. Serving-Time Optimization
138
+
139
+ The serving endpoint follows the same shape used by production recommendation stacks:
140
+
141
+ 1. Fetch user embedding from Redis or local embedding table.
142
+ 2. Retrieve top-K candidates from FAISS.
143
+ 3. Build ranking features for candidates.
144
+ 4. Score candidates with MMoE.
145
+ 5. Apply Platt calibration.
146
+ 6. Scalarize engagement and value.
147
+ 7. Apply MMR diversity reranking.
148
+ 8. Return top-10 items with latency breakdown.
149
+
150
+ ## Quickstart
151
+
152
+ ### Install
153
+
154
+ ```bash
155
+ python -m venv .venv
156
+ source .venv/bin/activate
157
+ pip install -r requirements.txt
158
+ ```
159
+
160
+ ### Run The Pipeline
161
+
162
+ ```bash
163
+ bash scripts/run_pipeline.sh
164
+ ```
165
+
166
+ The pipeline follows the architecture sequence:
167
+
168
+ ```text
169
+ download -> preprocess -> graph -> propensity -> LightGCN -> FAISS -> ranking -> calibration -> evaluation -> serving
170
+ ```
171
+
172
+ For raw data without timestamps, the split script can use a deterministic fallback:
173
+
174
+ ```bash
175
+ python -m src.data.splits --allow_no_timestamp
176
+ ```
177
+
178
+ ### Run FAISS Benchmark
179
+
180
+ ```bash
181
+ bash scripts/run_benchmark.sh
182
+ ```
183
+
184
+ Benchmark output is written to:
185
+
186
+ ```text
187
+ outputs/faiss_benchmark.csv
188
+ ```
189
+
190
+ ## Serving API
191
+
192
+ Start the service:
193
+
194
+ ```bash
195
+ uvicorn src.serving.app:app --host 0.0.0.0 --port 8000
196
+ ```
197
+
198
+ Health check:
199
+
200
+ ```bash
201
+ curl http://localhost:8000/health
202
+ ```
203
+
204
+ Recommendation request:
205
+
206
+ ```bash
207
+ curl http://localhost:8000/recommend/0
208
+ ```
209
+
210
+ Example response shape:
211
+
212
+ ```json
213
+ {
214
+ "user_id": 0,
215
+ "items": [
216
+ {
217
+ "item_id": 123,
218
+ "p_click": 0.71,
219
+ "e_value": 1.42,
220
+ "final_score": 0.82
221
+ }
222
+ ],
223
+ "retrieval_latency_ms": 6.4,
224
+ "ranking_latency_ms": 14.8,
225
+ "total_latency_ms": 23.1,
226
+ "cache_hit": true
227
+ }
228
+ ```
229
+
230
+ Prometheus-compatible metrics:
231
+
232
+ ```bash
233
+ curl http://localhost:8000/metrics
234
+ ```
235
+
236
+ Reload model artifacts:
237
+
238
+ ```bash
239
+ curl -X POST http://localhost:8000/reload
240
+ ```
241
+
242
+ ## Evaluation
243
+
244
+ The project evaluates recommender quality at multiple layers.
245
+
246
+ | Layer | Metrics |
247
+ |---|---|
248
+ | Retrieval | Recall@10, Recall@20, Recall@50, Recall@500, NDCG@10 |
249
+ | Ranking | ROC-AUC, MSE, RMSE |
250
+ | Calibration | ECE before/after Platt scaling, reliability curve |
251
+ | Serving | p50, p95, p99 latency |
252
+ | Product trade-off | Simulated CTR, GMV proxy, diversity, Pareto frontier |
253
+
254
+ Generate the final results table:
255
+
256
+ ```bash
257
+ python -m src.evaluation.report
258
+ ```
259
+
260
+ Outputs:
261
+
262
+ ```text
263
+ outputs/results_table.csv
264
+ outputs/results_table.md
265
+ outputs/calibration_curve.png
266
+ outputs/pareto_curve.png
267
+ ```
268
+
269
+ ## Results
270
+
271
+ Metrics are generated after running the full pipeline. This table is intentionally artifact-driven so reported numbers come from reproducible runs rather than hand-edited README values.
272
+
273
+ | Metric | LightGCN + IPS | MMoE single-task | MMoE multi-task |
274
+ |:-----------------|:-----------------|:-------------------|:------------------|
275
+ | Recall@500 | 0.0011 | - | - |
276
+ | NDCG@10 | 0.0443 | - | - |
277
+ | AUC (pClick) | - | 0.8319 | 0.8223 |
278
+ | ECE (after cal.) | - | - | 0.0677 |
279
+ | MSE (E-value) | - | 0.1172 | 0.0787 |
280
+ | p50 latency ms | 0.04 | - | - |
281
+ | p99 latency ms | 0.13 | - | - |
282
+
283
+ ## Configuration
284
+
285
+ The system is config-driven:
286
+
287
+ - `configs/retrieval.yaml`
288
+ - `configs/ranking.yaml`
289
+ - `configs/serving.yaml`
290
+
291
+ Examples:
292
+
293
+ ```yaml
294
+ model:
295
+ emb_dim: 512
296
+ num_layers: 3
297
+
298
+ training:
299
+ lr: 1.0e-3
300
+ batch_size: 4096
301
+ epochs: 100
302
+
303
+ ips:
304
+ clip_max: 10.0
305
+ ```
306
+
307
+ Serving trade-offs can be tuned without changing model code:
308
+
309
+ ```yaml
310
+ scoring:
311
+ w_engagement: 0.6
312
+ w_revenue: 0.4
313
+ lambda_diversity: 0.3
314
+ top_n_serve: 10
315
+ ```
316
+
317
+ ## Docker
318
+
319
+ Build:
320
+
321
+ ```bash
322
+ docker build -t graphrec-multiopt .
323
+ ```
324
+
325
+ Run:
326
+
327
+ ```bash
328
+ docker run -p 8000:8000 graphrec-multiopt
329
+ ```
330
+
331
+ For real experiments, mount model artifacts and processed data as volumes:
332
+
333
+ ```bash
334
+ docker run \
335
+ -p 8000:8000 \
336
+ -v "$(pwd)/data:/app/data" \
337
+ -v "$(pwd)/checkpoints:/app/checkpoints" \
338
+ graphrec-multiopt
339
+ ```
340
+
341
+ ## Engineering Notes
342
+
343
+ This repository is structured to show senior-level recommender systems judgment:
344
+
345
+ - Separates retrieval and ranking instead of forcing one model to do both.
346
+ - Includes causal debiasing through IPS rather than optimizing only observed engagement.
347
+ - Treats probability calibration as a first-class serving concern.
348
+ - Uses vector search and caching to reflect real serving constraints.
349
+ - Adds diversity reranking to avoid purely exploitative recommendations.
350
+ - Exposes business-level trade-offs through scalarization and Pareto analysis.
351
+ - Keeps training, serving, and evaluation configuration outside model code.
352
+
353
+ ## Known Limitations
354
+
355
+ - KuaiRec timestamp availability varies by source file; the splitter supports temporal mode when timestamps are present and an explicit deterministic fallback otherwise.
356
+ - The current hard-negative sampling interface is reserved, while uniform negative sampling is implemented.
357
+ - Full reported metrics require running the pipeline on the downloaded dataset.
358
+ - Redis is optional for local development but recommended for serving realism.
359
+ - FAISS IVF-PQ configuration may need scaling down for tiny smoke-test datasets.
360
+
361
+ ## Roadmap
362
+
363
+ - Add hard negative sampling from FAISS retrieval misses.
364
+ - Add popularity and matrix-factorization baselines.
365
+ - Add online feature store abstraction for serving-time context.
366
+ - Add load tests for concurrent recommendation traffic.
367
+ - [x] Add Docker Compose for API + Redis + MLflow.
368
+ - [x] Add CI workflow for unit tests, linting, and smoke-mode pipeline execution.
369
+
370
+ ## Resume Summary
371
+
372
+ Built an end-to-end production-style recommendation system using PyTorch, PyTorch Geometric, FAISS, Redis, FastAPI, and MLflow. Implemented LightGCN retrieval with IPS debiasing, MMoE multi-task ranking, Platt calibration, MMR diversity reranking, vector-search serving, offline A/B simulation, and Pareto frontier analysis for engagement/value trade-off optimization.
373
+
checkpoints/lightgcn_best.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9b4d4e8c17e2fb9d2eb5dfcbad499cba90fd395d0d5168622e892c1b98087d01
3
+ size 36373013
checkpoints/lightgcn_best_no_ips.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8e7699fe7084f4e5ccc70622440b7e2ffb285d04f1c0289a068f43994cf57006
3
+ size 36373069
checkpoints/mmoe_best.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6b12c6df6df49efd17453b4b9ea83b2f7ce3938e5550ad01a9078d69fff42e10
3
+ size 9834895
checkpoints/single_task_click.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a353ac0c461504c4d451e571609de6379a1fb3a60168158815bcf1d8de11a8be
3
+ size 7050277
checkpoints/single_task_value.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e44e057a59d27df654c80f2a738b599f691763c459f9f6758521fe9185b9989a
3
+ size 7050277
mlflow.db ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f4a30daf015d13f5ae519f17465f5e3588ae03ca07e1d5ce9b7ec668c39bbc08
3
+ size 712704
outputs/ab_simulation.csv ADDED
@@ -0,0 +1,10 @@
 
 
 
 
 
 
 
 
 
 
 
1
+ config_name,mean_ctr,mean_gmv,mean_diversity
2
+ engagement_only,0.7879629851992801,0.6699224461231732,0.1296688796019064
3
+ eng_heavy,0.7860530466158114,0.6683910443037643,0.1310944389908302
4
+ balanced_60_40,0.7852833735275404,0.6677558209184719,0.13160043393019494
5
+ balanced_50_50,0.7851963786686844,0.667657224486518,0.1316552643316338
6
+ balanced_40_60,0.7850962956267667,0.6675383233584722,0.1317110315947394
7
+ rev_heavy,0.7850037807272648,0.66743280691597,0.13175905591456455
8
+ revenue_only,0.7849211391196194,0.6673545125188091,0.13178014269456384
9
+ diversity_boost,0.8054681297399574,0.6897920035116222,0.11168996680298231
10
+ no_diversity,0.6588650968884171,0.522946486440979,0.14319167434535576
outputs/calibration_curve.png ADDED
outputs/faiss_benchmark.csv ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ index_size,nprobe,p50_ms,p95_ms,p99_ms,recall_at_10
2
+ 1000,4,0.039127499803726096,0.10123260008185743,0.13328411967449938,
outputs/pareto_curve.png ADDED
outputs/pareto_frontier.csv ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ config_name,mean_ctr,mean_gmv,mean_diversity,is_pareto
2
+ diversity_boost,0.8054681297399574,0.6897920035116222,0.1116899668029823,True
outputs/ranking_metrics.json ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "auc_click": 0.8223280482233246,
3
+ "auc_click_cal": 0.8223280486775686,
4
+ "ece_before": 0.07843196266342997,
5
+ "ece_after": 0.06774758427485816,
6
+ "mse_value": 0.07866260409355164,
7
+ "rmse_value": 0.28046854385750936,
8
+ "ablation_auc_single_click": 0.8319328938440997,
9
+ "ablation_mse_single_value": 0.11718457192182541,
10
+ "mmoe_vs_single_auc_delta": -0.009604845620775126
11
+ }
outputs/results_table.csv ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ Metric,LightGCN + IPS,MMoE single-task,MMoE multi-task
2
+ Recall@500,0.0011,-,-
3
+ NDCG@10,0.0443,-,-
4
+ AUC (pClick),-,0.8319,0.8223
5
+ ECE (after cal.),-,-,0.0677
6
+ MSE (E-value),-,0.1172,0.0787
7
+ p50 latency ms,0.04,-,-
8
+ p99 latency ms,0.13,-,-
outputs/results_table.md ADDED
@@ -0,0 +1,9 @@
 
 
 
 
 
 
 
 
 
 
1
+ | Metric | LightGCN + IPS | MMoE single-task | MMoE multi-task |
2
+ |:-----------------|:-----------------|:-------------------|:------------------|
3
+ | Recall@500 | 0.0011 | - | - |
4
+ | NDCG@10 | 0.0443 | - | - |
5
+ | AUC (pClick) | - | 0.8319 | 0.8223 |
6
+ | ECE (after cal.) | - | - | 0.0677 |
7
+ | MSE (E-value) | - | 0.1172 | 0.0787 |
8
+ | p50 latency ms | 0.04 | - | - |
9
+ | p99 latency ms | 0.13 | - | - |
outputs/retrieval_metrics.json ADDED
@@ -0,0 +1,10 @@
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "Recall@10": 0.0009196945210176058,
3
+ "NDCG@10": 0.04432149863105337,
4
+ "Recall@20": 0.001114816652242921,
5
+ "NDCG@20": 0.03212390658160397,
6
+ "Recall@50": 0.0011345188621706159,
7
+ "NDCG@50": 0.017680448036093564,
8
+ "Recall@500": 0.0011345188621706159,
9
+ "NDCG@500": 0.0034158566325525165
10
+ }