# ProtFunc Checklist
*Updated: 2026-04-16 | Goal: best GO-MF on insects + max cross-taxon transfer*
*Pipeline running: `artifacts/pipeline_run.log` | PID: bash 1963, python 1966*

---

## CRITICAL FINDINGS

| Finding | Impact |
|---------|--------|
| `graph_hpo_best.pth`: val_fmax **0.9540**, test_fmax **0.9533**, CAFA **0.6536** (joint insect+mammal training) | Massive jump over ablation A (0.8947/0.6338). No sign of overfitting. Mammal eval done in 2f (see the gen_ratio row below). |
| HPO only ran **2 trials** (target was 40) | Best params likely suboptimal. Re-run HPO with full budget. |
| Threshold A (current v3): ~1448 preds/protein, precision=0.002 | **Broken for inference**. Use threshold C (novelty-gated) instead. |
| All ablation gen_ratios < 0.50, measured on only n=7 mammals | Stats unreliable. Need ≥100 mammal proteins. |
| AF features (Model C) kill transfer: gen_ratio=0.02 | Never use `esm_all` features for cross-taxon models. |
| graph_hpo mammal gen_ratio=**0.233** (n=4672) | Joint insect+mammal HPO did not improve transfer over ablation A (0.4347 at n=7, unreliable). Phase 7 needed. |
| `protfunc_v3_fixed.pth` referenced in server.py but NOT in HF Space | server.py loads it first and silently falls back if it is missing. |

---

## Phase 1: Ablation ✅ DONE

| Model | Val Fmax | Test Fmax | Test CAFA | Mammal gen_ratio |
|-------|----------|-----------|-----------|-----------------|
| A: ESM only (320d) | 0.8900 | 0.8947 | 0.6338 | 0.4347 (best) |
| B: ESM+seq (331d) | - | 0.8999 | 0.6360 | 0.4167 |
| C: ESM+seq+AF (360d) | 0.8900 | 0.8902 | 0.6326 | 0.0225 ⚠️ AF hurts |

Winner: **Model A** (ESM only), the best insect+mammal balance.

---

## Phase 2: HPO (graph_hpo joint pipeline)

- ✅ **2a.** HPO script ran, but only 2/40 trials completed
- ✅ **2b.** Best params saved: `graph_hpo/hpo_results.json` (hidden=2048, n_blocks=8, feat_level=esm_seq, score=0.7756)
- ✅ **2c.** Full train on best params done → `graph_hpo_best.pth`, val_fmax=**0.9533**
- ⬜ **2d.** Re-run HPO with the full 40 trials; the current best may not be the global optimum
  ```bash
  cd "/Users/siddhantbhat/Desktop/Research Files"
  .venv/bin/python3 scripts/hpo.py \
    --mammal artifacts/generalization/mammal_full_v1.parquet \
    --n_trials 40 --epochs 20 --patience 6 --alpha 0.6 \
    --startup_trials 5 --warmup_steps 5 \
    --multivariate_tpe --group_tpe \
    --out artifacts/graph_hpo/hpo_results.json
  ```
- ✅ **2e.** Test eval on `graph_hpo_best.pth` → test_micro_fmax=**0.9533**, CAFA=**0.6536**, P=0.9553, R=0.9514, `t*`=0.94
- ✅ **2f.** Mammal gen eval on `graph_hpo_best.pth` → n=4672, micro_fmax=**0.2224**, CAFA=0.201, **gen_ratio=0.233** ⚠️ poor transfer
  ```bash
  .venv/bin/python3 scripts/eval_generalization.py \
    --checkpoint artifacts/graph_hpo/graph_hpo_best.pth \
    --thresholds artifacts/graph_hpo/graph_hpo_best_thresholds.json \
    --mlb "Important Files/mlb_public_v1.pkl" \
    --taxon_parquet artifacts/generalization/mammal_embeddings_v3.parquet \
    --taxon_name mammals_graph_hpo --obo go-basic.obo \
    --out artifacts/graph_hpo/generalization_results.json
  ```
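The figures in step 2e can be cross-checked by hand, since the F-measure is the harmonic mean of precision and recall. A minimal sketch, assuming the reported P and R were taken at the same threshold `t*` as the reported Fmax; `f_measure` is an illustrative helper, not part of the project scripts:

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in step 2e for graph_hpo_best.pth on the insect test set.
reported_p, reported_r, reported_fmax = 0.9553, 0.9514, 0.9533

recomputed = f_measure(reported_p, reported_r)
print(f"recomputed F = {recomputed:.4f}")
# Agreement to four decimals suggests P, R and Fmax share one operating point.
print("consistent:", abs(recomputed - reported_fmax) < 5e-4)
```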

---

## Phase 3: Threshold Fix ⚠️ BROKEN

Current v3 thresholds output **1448 preds/protein at 0.2% precision**, which is unusable.

- ✅ **3a.** Comparison run → `artifacts/threshold_comparison_results.json`
- ✅ **3b.** Winner: **C (novelty-gated)**, with F1=0.0733, 6.69 preds/protein, novelty subset F1=0.2757
- ✅ **3c.** Threshold comparison on `graph_hpo_best.pth` → A (per-label `t*`): P=0.143, R=0.984, F1=0.250, 20.6 preds/protein ✅ use this | B: P=0.878, F1=0.219, 0.43 preds/protein (83% zero) | C≈B
  ```bash
  # Edit the script to point to graph_hpo_best.pth before running it
  .venv/bin/python3 scripts/threshold_comparison.py
  ```
- ⬜ **3d.** Update server.py to use novelty-gated thresholds by default (it currently falls back to the broken A thresholds)
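To compare the threshold settings above on a common scale, precision times predictions per protein gives the expected number of correct predictions per protein, and a missing recall can be recovered from P and F1. A rough sketch, assuming the reported precision is micro-averaged over all predictions (the checklist does not say so); note that the v3 figures come from a different checkpoint than the `graph_hpo_best.pth` figures, and the helper names are illustrative:

```python
def correct_per_protein(precision: float, preds_per_protein: float) -> float:
    """Expected true positives per protein, assuming micro-averaged precision."""
    return precision * preds_per_protein


def recall_from_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R."""
    return f1 * precision / (2 * precision - f1)


# (precision, preds per protein) as reported in Phase 3.
settings = {
    "v3, threshold A": (0.002, 1448),
    "graph_hpo, A": (0.143, 20.6),
    "graph_hpo, B": (0.878, 0.43),
}
for name, (p, n_preds) in settings.items():
    hits = correct_per_protein(p, n_preds)
    print(f"{name}: ~{hits:.2f} correct out of {n_preds} preds/protein")

# B reports P and F1 but no recall; the implied recall is low.
print(f"graph_hpo, B implied recall: {recall_from_f1(0.878, 0.219):.3f}")
```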

---

## Phase 4: Mammal Dataset Expansion ⚠️ URGENT

With n=7 proteins, all ablation gen_ratio stats are noise.

- ⬜ **4a.** Run `build_mammal_dataset.py` for ≥100 mammal proteins with GO-MF annotations
  ```bash
  .venv/bin/python3 scripts/build_mammal_dataset.py
  # Check script args; output should go to artifacts/generalization/mammal_full_v2.parquet
  ```
- ⬜ **4b.** Re-run gen eval for A, B, graph_hpo_best with the new mammal set
- ⬜ **4c.** Update CHECKLIST gen_ratio table with reliable numbers

---

## Phase 5: Broader Taxon Coverage

- ⬜ **5a.** Get FASTAs: fungi, plants (Arabidopsis), fish (zebrafish), archaea, nematode
- ⬜ **5b.** For each: `prep_taxon.py` → `eval_generalization.py`
  ```bash
  BASE="/Users/siddhantbhat/Desktop/Research Files"
  cd "$BASE"
  TAXON="fungi"  # set to each taxon from 5a in turn
  .venv/bin/python3 scripts/prep_taxon.py \
    --fasta "Important Files/${TAXON}.fasta" \
    --taxon_name "$TAXON" \
    --mlb "Important Files/mlb_public_v1.pkl" \
    --out "artifacts/generalization/${TAXON}_embeddings.parquet"
  ```
- ⬜ **5c.** Fill generalization table below
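Step 5b repeats the same two commands for every taxon, which can be scripted. The sketch below only builds and prints the argument lists (a dry run). It reuses the flags shown in this checklist for `prep_taxon.py` and `eval_generalization.py`; the per-taxon results path and the helper names are assumptions made for illustration:

```python
import shlex

PYTHON = ".venv/bin/python3"
MLB = "Important Files/mlb_public_v1.pkl"
TAXA = ["fungi", "plants", "fish", "archaea", "nematode"]


def prep_cmd(taxon: str) -> list[str]:
    """Command for embedding one taxon's FASTA (flags as in step 5b)."""
    return [
        PYTHON, "scripts/prep_taxon.py",
        "--fasta", f"Important Files/{taxon}.fasta",
        "--taxon_name", taxon,
        "--mlb", MLB,
        "--out", f"artifacts/generalization/{taxon}_embeddings.parquet",
    ]


def eval_cmd(taxon: str) -> list[str]:
    """Command for the generalization eval (flags as in step 2f)."""
    return [
        PYTHON, "scripts/eval_generalization.py",
        "--checkpoint", "artifacts/graph_hpo/graph_hpo_best.pth",
        "--thresholds", "artifacts/graph_hpo/graph_hpo_best_thresholds.json",
        "--mlb", MLB,
        "--taxon_parquet", f"artifacts/generalization/{taxon}_embeddings.parquet",
        "--taxon_name", taxon, "--obo", "go-basic.obo",
        # Assumed output name; the checklist only shows one shared results file.
        "--out", f"artifacts/generalization/{taxon}_results.json",
    ]


for taxon in TAXA:
    for cmd in (prep_cmd(taxon), eval_cmd(taxon)):
        # Printed only; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```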

---

## Phase 6: HF Upload & Webapp

- ✅ **6a.** server.py updated locally (generalization API + v3_fixed priority), commit `bd99db9e`
- ✅ **6b.** Uploaded `graph_hpo_best.pth` → HF as `protfunc_v3_fixed.pth` + thresholds (test_fmax=0.9533 confirmed better)
  ```bash
  huggingface-cli upload Sbhat2026/protfunc-models \
    "artifacts/graph_hpo/graph_hpo_best.pth" protfunc_v3_fixed.pth
  huggingface-cli upload Sbhat2026/protfunc-models \
    "artifacts/graph_hpo/graph_hpo_best_thresholds.json" protfunc_v3_fixed_thresholds.json
  ```
- ✅ **6c.** Pushed `static/interface.html` to HF Space (commit `2aa49963`): collapsible lower-confidence UI
  ```bash
  cd /Users/siddhantbhat/insecta_webapp
  git add server.py static/interface.html
  git commit -m "fix: use novelty-gated thresholds; add generalization panel"
  git push
  ```
- ⬜ **6d.** Add `/api/generalization` endpoint to serve `generalization_results.json` for all taxa (currently only mammals)
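The silent fallback noted under Critical Findings is easier to catch if checkpoint selection logs which file it actually picked. A minimal sketch of that idea, not the actual `server.py` code: `resolve_checkpoint`, the `models` directory and the choice of fallback file are hypothetical.

```python
import logging
from pathlib import Path

logger = logging.getLogger("protfunc.server")


def resolve_checkpoint(model_dir: Path, candidates: list[str]) -> Path:
    """Return the first existing checkpoint, warning loudly on any fallback."""
    for rank, name in enumerate(candidates):
        path = model_dir / name
        if path.is_file():
            if rank > 0:
                # The preferred file was missing: make the fallback visible.
                logger.warning("Preferred checkpoint %s not found; using %s",
                               candidates[0], name)
            else:
                logger.info("Using preferred checkpoint %s", name)
            return path
    raise FileNotFoundError(f"No checkpoint found in {model_dir}: {candidates}")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    try:
        # "improved_res.pth" as the fallback is an assumption for illustration.
        chosen = resolve_checkpoint(
            Path("models"), ["protfunc_v3_fixed.pth", "improved_res.pth"]
        )
        print("using", chosen)
    except FileNotFoundError as err:
        print("startup aborted:", err)
```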

---

## Phase 7: Generalization Improvement (if gen_ratio < 0.85)

- ⬜ **7a.** Mixed-taxon fine-tuning with more mammal data (Phase 4 first)
- ⬜ **7b.** Domain adaptation: freeze ESM layers, fine-tune MLP head on target taxon
- ⬜ **7c.** Re-eval after changes

---

## Generalization Table

*gen_ratio = taxon micro_fmax / insect test micro_fmax of the same model. Target ≥ 0.85.*

| Taxon | Model | n | micro_fmax | cafa_fmax | gen_ratio | Status |
|-------|-------|---|------------|-----------|-----------|--------|
| insects | A | ~250k | 0.8947 | 0.6338 | 1.00 (ref) | ✅ |
| mammals | A | 7 ⚠️ | 0.3889 | 0.3917 | 0.4347 | ⚠️ n too small |
| mammals | B | 7 ⚠️ | 0.3750 | 0.3088 | 0.4167 | ⚠️ n too small |
| mammals | C | 7 ⚠️ | 0.0200 | 0.0056 | 0.0225 ⚠️ | AF kills transfer |
| insects | graph_hpo | ~250k | 0.9533 | 0.6536 | 1.00 (ref) | ✅ 2e done |
| mammals | graph_hpo | 4672 | 0.2224 | 0.201 | **0.233** ⚠️ | ✅ 2f done |
| fungi | - | - | - | - | - | ⬜ Phase 5 |
| plants | - | - | - | - | - | ⬜ Phase 5 |
| fish | - | - | - | - | - | ⬜ Phase 5 |
| archaea | - | - | - | - | - | ⬜ Phase 5 |
| nematode | - | - | - | - | - | ⬜ Phase 5 |
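The gen_ratio column can be reproduced from the other columns: each taxon's micro_fmax is divided by the insect test micro_fmax of the same model (for models B and C that reference comes from the Phase 1 ablation table). A small sketch using the numbers in this checklist; `MIN_N = 100` mirrors the Phase 4 goal, and the helper names are illustrative:

```python
TARGET = 0.85   # gen_ratio target stated above the table
MIN_N = 100     # minimum mammal proteins asked for in Phase 4


def gen_ratio(taxon_fmax: float, insect_fmax: float) -> float:
    """Taxon micro_fmax relative to the same model's insect test micro_fmax."""
    return taxon_fmax / insect_fmax


# (model, n, mammal micro_fmax, insect test micro_fmax)
rows = [
    ("A", 7, 0.3889, 0.8947),
    ("B", 7, 0.3750, 0.8999),
    ("C", 7, 0.0200, 0.8902),
    ("graph_hpo", 4672, 0.2224, 0.9533),
]

for model, n, mammal, insect in rows:
    ratio = gen_ratio(mammal, insect)
    if n < MIN_N:
        verdict = "n too small to trust"
    elif ratio >= TARGET:
        verdict = "meets target"
    else:
        verdict = "below target"
    print(f"{model:>9}: gen_ratio={ratio:.4f} (n={n}) -> {verdict}")
```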

---

## Priority Order (do in this order)

1. 🔄 **Phase 4a**: mammal build running (500k max, `artifacts/pipeline_run.log`)
2. 🔄 **Phase 2e**: test eval on graph_hpo_best.pth (queued after mammal build)
3. 🔄 **Phase 2f**: mammal gen eval on graph_hpo_best.pth (queued)
4. 🔄 **Phase 2d**: 40-trial HPO re-run (queued)
5. 🔄 **Phase 6b-6c**: upload + push to HF (queued)
6. ⬜ **Phase 3c-3d**: fix threshold in server.py (skipped per user; revisit after push)
7. ⬜ **Phase 5**: broader taxon coverage
8. ⬜ **Phase 7**: generalization improvement if needed

New scripts:
- `scripts/eval_checkpoint.py`: test eval of any .pth on the insect test set
- `scripts/run_full_pipeline.sh`: chains all steps 1-5 above

---

## Directory

```
Research Files/
├── artifacts/
│   ├── checkpoints/                         ← model .pth files
│   │   ├── ablation_A_ESM_only.pth          ✅ best ablation
│   │   ├── ablation_B_ESM_seq.pth           ✅
│   │   ├── ablation_C_ESM_seq_AF.pth        ✅ (AF hurts transfer)
│   │   └── improved_res.pth                 baseline
│   ├── graph_hpo/                           ← joint insect+mammal HPO pipeline
│   │   ├── graph_hpo_best.pth               ✅ val_fmax=0.9533 (BEST)
│   │   ├── graph_hpo_best_thresholds.json
│   │   ├── graph_hpo_best_log.json
│   │   ├── hpo_results.json                 ← only 2 trials run
│   │   └── methodology.json
│   ├── logs/                                ← training JSON logs
│   ├── thresholds/                          ← per-label threshold JSON files
│   ├── splits/                              ← train/val/test index splits
│   ├── generalization/                      ← taxon embeddings + eval results
│   │   ├── mammal_embeddings_v3.parquet     (7 proteins only ⚠️)
│   │   ├── mammal_full_v1.parquet           (used in HPO training)
│   │   └── generalization_results.json
│   ├── threshold_comparison_results.json    ← use C (novelty-gated)
│   └── hpo_test.json                        ← old 2-trial HPO result
├── scripts/
│   ├── train_v3_fixed.py
│   ├── hpo.py                               ← graph-aware joint HPO
│   ├── graph_hpo_sequence.py                ← runs full pipeline
│   ├── eval_generalization.py
│   ├── prep_taxon.py
│   ├── build_mammal_dataset.py
│   ├── threshold_comparison.py
│   └── archive/
├── Important Files/                         ← mlb, parquets, fastas
└── CHECKLIST.md
```