ProtFunc Checklist
Updated: 2026-04-16 | Goal: best GO-MF on insects + max cross-taxon transfer
Pipeline running: artifacts/pipeline_run.log | PID: bash 1963, python 1966
CRITICAL FINDINGS
| Finding | Impact |
|---|---|
| graph_hpo_best.pth: val_fmax 0.9540, test_fmax 0.9533, CAFA 0.6536 (joint insect+mammal training) | Large jump over ablation A (0.8947/0.6338). No sign of overfitting. Mammal eval done: gen_ratio=0.233 (n=4672). |
| HPO only ran 2 trials (target was 40) | Best params likely suboptimal. Re-run HPO with full budget. |
| Threshold A (current v3): ~1448 preds/protein, precision=0.002 | Broken for inference. Use threshold C (novelty-gated) instead. |
| All gen_ratios < 0.50, mammals n=7 only | Stats unreliable. Need ≥100 mammal proteins. |
| AF features (Model C) kill transfer: gen_ratio=0.02 | Never use esm_all features for cross-taxon models. |
| graph_hpo mammal gen_ratio=0.233 (n=4672) | Joint insect+mammal HPO didn't significantly improve transfer vs ablation A (0.4347 at n=7, unreliable). Phase 7 needed. |
| protfunc_v3_fixed.pth referenced in server.py but NOT in HF Space | server.py loads it as the priority checkpoint and will silently fall back if it is missing. |
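
The silent fallback noted in the last row can be made visible with a small resolver. This is a minimal sketch, not the actual `server.py` logic: the function name `resolve_checkpoint`, the logger name, and the candidate list are all assumptions for illustration.

```python
import logging
from pathlib import Path
from typing import Optional, Sequence

logger = logging.getLogger("protfunc.checkpoints")  # hypothetical logger name


def resolve_checkpoint(model_dir: Path, candidates: Sequence[str]) -> Optional[Path]:
    """Return the first checkpoint that exists, in priority order.

    Logs a warning whenever the top-priority file is skipped, so a missing
    protfunc_v3_fixed.pth no longer goes unnoticed.
    """
    for rank, name in enumerate(candidates):
        path = model_dir / name
        if path.is_file():
            if rank > 0:
                logger.warning(
                    "Preferred checkpoint %s is missing; falling back to %s",
                    candidates[0], name,
                )
            return path
    logger.error("No checkpoint found in %s (tried: %s)", model_dir, list(candidates))
    return None
```

A caller would pass the priority list explicitly, for example `resolve_checkpoint(Path("models"), ["protfunc_v3_fixed.pth", "improved_res.pth"])`, and refuse to start when the result is `None`.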
Phase 1: Ablation (DONE)
| Model | Val Fmax | Test Fmax | Test CAFA | Mammal gen_ratio |
|---|---|---|---|---|
| A: ESM only (320d) | 0.8900 | 0.8947 | 0.6338 | 0.4347 (best) |
| B: ESM+seq (331d) | - | 0.8999 | 0.6360 | 0.4167 |
| C: ESM+seq+AF (360d) | 0.8900 | 0.8902 | 0.6326 | 0.0225 ⚠️ AF hurts |
Winner: Model A (ESM only), the best insect+mammal balance.
Phase 2: HPO (graph_hpo joint pipeline)
- ⚠️ 2a. HPO script ran, but only 2/40 trials completed
- ✅ 2b. Best params saved: `graph_hpo/hpo_results.json` (hidden=2048, n_blocks=8, feat_level=esm_seq, score=0.7756)
- ✅ 2c. Full train on best params done → `graph_hpo_best.pth`, val_fmax=0.9540
- ⬜ 2d. Re-run HPO with the full 40 trials; the current best may not be the global optimum

      cd "/Users/siddhantbhat/Desktop/Research Files"
      .venv/bin/python3 scripts/hpo.py \
        --mammal artifacts/generalization/mammal_full_v1.parquet \
        --n_trials 40 --epochs 20 --patience 6 --alpha 0.6 \
        --startup_trials 5 --warmup_steps 5 \
        --multivariate_tpe --group_tpe \
        --out artifacts/graph_hpo/hpo_results.json

- ✅ 2e. Test eval on `graph_hpo_best.pth` → test_micro_fmax=0.9533, CAFA=0.6536, P=0.9553, R=0.9514, t*=0.94
- ✅ 2f. Mammal gen eval on `graph_hpo_best.pth` → n=4672, micro_fmax=0.2224, CAFA=0.201, gen_ratio=0.233 ⚠️ poor transfer

      .venv/bin/python3 scripts/eval_generalization.py \
        --checkpoint artifacts/graph_hpo/graph_hpo_best.pth \
        --thresholds artifacts/graph_hpo/graph_hpo_best_thresholds.json \
        --mlb "Important Files/mlb_public_v1.pkl" \
        --taxon_parquet artifacts/generalization/mammal_embeddings_v3.parquet \
        --taxon_name mammals_graph_hpo --obo go-basic.obo \
        --out artifacts/graph_hpo/generalization_results.json
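
As a sanity check on the 2e numbers, the reported Fmax should equal the harmonic mean of P and R at the best threshold t*. The sketch below assumes the usual definition of Fmax as the maximum F-measure over a threshold sweep; the function names and the `(threshold, precision, recall)` curve format are illustrative, not taken from the eval scripts.

```python
from typing import Iterable, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def fmax(curve: Iterable[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Return (best_f, best_threshold) over (threshold, precision, recall) points."""
    best_f, best_t = 0.0, 0.0
    for threshold, precision, recall in curve:
        f = f_measure(precision, recall)
        if f > best_f:
            best_f, best_t = f, threshold
    return best_f, best_t


# P and R from step 2e; the result agrees with test_micro_fmax=0.9533.
print(round(f_measure(0.9553, 0.9514), 4))
```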
Phase 3: Threshold Fix ⚠️ BROKEN
Current v3 thresholds output ~1448 preds/protein at 0.2% precision, which is unusable.
- ✅ 3a. Comparison run → `artifacts/threshold_comparison_results.json`
- ✅ 3b. Winner: C (novelty-gated), with F1=0.0733, 6.69 preds/protein, novelty subset F1=0.2757
- ✅ 3c. Threshold comparison on `graph_hpo_best.pth` → A (per-label t*): P=0.143, R=0.984, F1=0.250, 20.6 preds/protein → use this | B: P=0.878, F1=0.219, 0.43 preds/protein (83% zero) | C≈B

      # Edit the script to point to graph_hpo_best.pth first
      .venv/bin/python3 scripts/threshold_comparison.py

- ⬜ 3d. Update server.py to use novelty-gated thresholds by default (currently falls back to broken A thresholds)
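
The quantities compared above (preds/protein, precision, share of proteins with zero predictions) can be computed from a score matrix and a per-label threshold vector. This is a minimal pure-Python sketch on toy data; `apply_thresholds` and `summarize` are hypothetical helpers and do not reproduce `threshold_comparison.py`.

```python
from typing import Dict, List, Sequence, Set


def apply_thresholds(scores: Sequence[Sequence[float]],
                     thresholds: Sequence[float]) -> List[Set[int]]:
    """Keep label j for a protein when its score reaches the per-label threshold."""
    return [{j for j, s in enumerate(row) if s >= thresholds[j]} for row in scores]


def summarize(preds: Sequence[Set[int]], truth: Sequence[Set[int]]) -> Dict[str, float]:
    """Micro-averaged precision/recall/F1 plus prediction-volume statistics."""
    n_pred = sum(len(p) for p in preds)
    n_true = sum(len(t) for t in truth)
    tp = sum(len(p & t) for p, t in zip(preds, truth))
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "preds_per_protein": n_pred / len(preds),
        "zero_frac": sum(1 for p in preds if not p) / len(preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data: 3 proteins x 3 labels.
scores = [[0.9, 0.2, 0.6], [0.1, 0.3, 0.2], [0.8, 0.7, 0.1]]
truth = [{0}, {1}, {0, 1}]
loose = summarize(apply_thresholds(scores, [0.1, 0.1, 0.1]), truth)
strict = summarize(apply_thresholds(scores, [0.85, 0.85, 0.85]), truth)
print(loose)   # many predictions per protein, low precision
print(strict)  # few predictions, most proteins left empty
```

The toy run shows the same trade-off as the comparison: loose thresholds flood each protein with predictions, while strict ones leave most proteins with none.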
Phase 4: Mammal Dataset Expansion ⚠️ URGENT
n=7 proteins → all gen_ratio stats are noise.
- ⬜ 4a. Run `build_mammal_dataset.py` for ≥100 mammal proteins with GO-MF annotations

      # Check script args; output should go to artifacts/generalization/mammal_full_v2.parquet
      .venv/bin/python3 scripts/build_mammal_dataset.py

- ⬜ 4b. Re-run gen eval for A, B, graph_hpo_best with new mammal set
- ⬜ 4c. Update CHECKLIST gen_ratio table with reliable numbers
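
To see roughly why n=7 is too small, the sketch below uses the binomial standard error as a stand-in for the uncertainty of a score near 0.39. This is an assumption made for intuition only: micro Fmax is not a simple proportion, so the numbers indicate scale rather than a real confidence interval.

```python
import math


def binomial_se(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


# Compare the current mammal set (n=7), the stated minimum (n=100),
# and the graph_hpo eval set (n=4672).
for n in (7, 100, 4672):
    print(f"n={n:>5}  approx. SE={binomial_se(0.39, n):.3f}")
```

Under this rough model the uncertainty at n=7 is several times larger than at n=100, which is enough to swamp the gap between models A and B.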
Phase 5: Broader Taxon Coverage
- ⬜ 5a. Get FASTAs: fungi, plants (Arabidopsis), fish (zebrafish), archaea, nematode
- ⬜ 5b. For each: `prep_taxon.py` → `eval_generalization.py`

      BASE="/Users/siddhantbhat/Desktop/Research Files"
      cd "$BASE"
      .venv/bin/python3 scripts/prep_taxon.py \
        --fasta "Important Files/<taxon>.fasta" \
        --taxon_name <taxon> \
        --mlb "Important Files/mlb_public_v1.pkl" \
        --out artifacts/generalization/<taxon>_embeddings.parquet

- ⬜ 5c. Fill generalization table below
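
Step 5b repeats the same two commands per taxon, so a dry-run command builder may help avoid copy-paste mistakes. The flags below are the ones shown in this checklist; the taxon file names and the per-taxon results path are assumptions, and the sketch only prints the commands rather than running them.

```python
import shlex
from typing import List

TAXA = ["fungi", "plants", "fish", "archaea", "nematode"]
PYTHON = ".venv/bin/python3"
MLB = "Important Files/mlb_public_v1.pkl"


def prep_cmd(taxon: str) -> List[str]:
    """Command that embeds one taxon's FASTA (assumes <taxon>.fasta naming)."""
    return [
        PYTHON, "scripts/prep_taxon.py",
        "--fasta", f"Important Files/{taxon}.fasta",
        "--taxon_name", taxon,
        "--mlb", MLB,
        "--out", f"artifacts/generalization/{taxon}_embeddings.parquet",
    ]


def eval_cmd(taxon: str) -> List[str]:
    """Command that evaluates graph_hpo_best.pth on one taxon."""
    return [
        PYTHON, "scripts/eval_generalization.py",
        "--checkpoint", "artifacts/graph_hpo/graph_hpo_best.pth",
        "--thresholds", "artifacts/graph_hpo/graph_hpo_best_thresholds.json",
        "--mlb", MLB,
        "--taxon_parquet", f"artifacts/generalization/{taxon}_embeddings.parquet",
        "--taxon_name", taxon,
        "--obo", "go-basic.obo",
        # Hypothetical per-taxon output path, chosen so runs do not overwrite each other.
        "--out", f"artifacts/generalization/{taxon}_results.json",
    ]


if __name__ == "__main__":
    for taxon in TAXA:
        print(shlex.join(prep_cmd(taxon)))
        print(shlex.join(eval_cmd(taxon)))
```

Passing each list to `subprocess.run(cmd, check=True)` from the project root would execute the steps; keeping the arguments as a list avoids quoting problems with the space in `Important Files`.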
Phase 6: HF Upload & Webapp
- ✅ 6a. server.py updated locally (generalization API + v3_fixed priority), commit `bd99db9e`
- ✅ 6b. Uploaded `graph_hpo_best.pth` → HF as `protfunc_v3_fixed.pth` + thresholds (test_fmax=0.9533 confirmed better)

      huggingface-cli upload Sbhat2026/protfunc-models \
        "artifacts/graph_hpo/graph_hpo_best.pth" protfunc_v3_fixed.pth
      huggingface-cli upload Sbhat2026/protfunc-models \
        "artifacts/graph_hpo/graph_hpo_best_thresholds.json" protfunc_v3_fixed_thresholds.json

- ✅ 6c. Pushed `static/interface.html` to HF Space (commit 2aa49963): collapsible lower-confidence UI

      cd /Users/siddhantbhat/insecta_webapp
      git add server.py static/interface.html
      git commit -m "fix: use novelty-gated thresholds; add generalization panel"
      git push

- ⬜ 6d. Add `/api/generalization` endpoint to serve `generalization_results.json` for all taxa (currently only mammals)
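
For 6d, the endpoint needs one payload covering every evaluated taxon. The sketch below is framework-agnostic and assumes a hypothetical layout with one `<taxon>.json` file per taxon in a results directory; the current pipeline writes a single `generalization_results.json`, so the layout and the function name are illustrative only.

```python
import json
from pathlib import Path
from typing import Any, Dict


def collect_generalization(results_dir: Path) -> Dict[str, Any]:
    """Merge per-taxon result files into one {taxon: results} mapping.

    Unreadable or malformed files are skipped so that one bad taxon
    does not break the whole response.
    """
    merged: Dict[str, Any] = {}
    for path in sorted(results_dir.glob("*.json")):
        try:
            merged[path.stem] = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue
    return merged
```

The route handler in server.py would then return this mapping as JSON, whichever web framework the server uses.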
Phase 7: Generalization Improvement (if gen_ratio < 0.85)
- ⬜ 7a. Mixed-taxon fine-tuning with more mammal data (Phase 4 first)
- ⬜ 7b. Domain adaptation: freeze ESM layers, fine-tune MLP head on target taxon
- ⬜ 7c. Re-eval after changes
Generalization Table
gen_ratio = taxon micro_fmax / insect test micro_fmax of the same model. Target ≥ 0.85.
| Taxon | Model | n | micro_fmax | cafa_fmax | gen_ratio | Status |
|---|---|---|---|---|---|---|
| insects | A | ~250k | 0.8947 | 0.6338 | 1.00 (ref) | ✅ |
| mammals | A | 7 ⚠️ | 0.3889 | 0.3917 | 0.4347 | ⚠️ n too small |
| mammals | B | 7 ⚠️ | 0.3750 | 0.3088 | 0.4167 | ⚠️ n too small |
| mammals | C | 7 ⚠️ | 0.0200 | 0.0056 | 0.0225 ⚠️ | AF kills transfer |
| insects | graph_hpo | ~250k | 0.9533 | 0.6536 | 1.00 (ref) | ✅ 2e done |
| mammals | graph_hpo | 4672 | 0.2224 | 0.201 | 0.233 ⚠️ | ✅ 2f done |
| fungi | - | - | - | - | - | ⬜ Phase 5 |
| plants | - | - | - | - | - | ⬜ Phase 5 |
| fish | - | - | - | - | - | ⬜ Phase 5 |
| archaea | - | - | - | - | - | ⬜ Phase 5 |
| nematode | - | - | - | - | - | ⬜ Phase 5 |
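
The gen_ratio column can be re-derived from the table. The values are consistent with dividing each mammal micro_fmax by the insect test micro_fmax of the same model (0.8947 for A, 0.8999 for B, 0.8902 for C, 0.9533 for graph_hpo). A minimal sketch, with `gen_ratio` as an illustrative helper name:

```python
TARGET = 0.85  # transfer target stated above


def gen_ratio(taxon_fmax: float, insect_test_fmax: float) -> float:
    """Ratio of a taxon's micro Fmax to the same model's insect test micro Fmax."""
    if insect_test_fmax <= 0:
        raise ValueError("insect_test_fmax must be positive")
    return taxon_fmax / insect_test_fmax


# (mammal micro_fmax, insect test micro_fmax) per model, copied from the tables.
rows = {
    "A": (0.3889, 0.8947),
    "B": (0.3750, 0.8999),
    "C": (0.0200, 0.8902),
    "graph_hpo": (0.2224, 0.9533),
}

for model, (mammal, insect) in rows.items():
    ratio = gen_ratio(mammal, insect)
    status = "meets target" if ratio >= TARGET else "below target"
    print(f"{model:<10} gen_ratio={ratio:.4f}  {status}")
```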
Priority Order (do in this order)
- 🔄 Phase 4a: mammal build running (500k max, `artifacts/pipeline_run.log`)
- ✅ Phase 2e: test eval on graph_hpo_best.pth (done)
- ✅ Phase 2f: mammal gen eval on graph_hpo_best.pth (done)
- 🔄 Phase 2d: 40-trial HPO re-run (queued)
- ✅ Phase 6b-6c: upload + push to HF (done)
- ⬜ Phase 3c-3d: fix threshold in server.py (skipped per user; revisit after push)
- ⬜ Phase 5: broader taxon coverage
- ⬜ Phase 7: generalization improvement if needed
New scripts:
- `scripts/eval_checkpoint.py`: test eval of any .pth on the insect test set
- `scripts/run_full_pipeline.sh`: chains all steps 1-5 above
Directory
    Research Files/
    ├── artifacts/
    │   ├── checkpoints/                          ← model .pth files
    │   │   ├── ablation_A_ESM_only.pth           ✅ best ablation
    │   │   ├── ablation_B_ESM_seq.pth            ✅
    │   │   ├── ablation_C_ESM_seq_AF.pth         ✅ (AF hurts transfer)
    │   │   └── improved_res.pth                  baseline
    │   ├── graph_hpo/                            ← joint insect+mammal HPO pipeline
    │   │   ├── graph_hpo_best.pth                ✅ val_fmax=0.9540 (BEST)
    │   │   ├── graph_hpo_best_thresholds.json
    │   │   ├── graph_hpo_best_log.json
    │   │   ├── hpo_results.json                  ← only 2 trials run
    │   │   └── methodology.json
    │   ├── logs/                                 ← training JSON logs
    │   ├── thresholds/                           ← per-label threshold JSON files
    │   ├── splits/                               ← train/val/test index splits
    │   ├── generalization/                       ← taxon embeddings + eval results
    │   │   ├── mammal_embeddings_v3.parquet      (7 proteins only ⚠️)
    │   │   ├── mammal_full_v1.parquet            (used in HPO training)
    │   │   └── generalization_results.json
    │   ├── threshold_comparison_results.json     ← use C (novelty-gated)
    │   └── hpo_test.json                         ← old 2-trial HPO result
    ├── scripts/
    │   ├── train_v3_fixed.py
    │   ├── hpo.py                                ← graph-aware joint HPO
    │   ├── graph_hpo_sequence.py                 ← runs full pipeline
    │   ├── eval_generalization.py
    │   ├── prep_taxon.py
    │   ├── build_mammal_dataset.py
    │   ├── threshold_comparison.py
    │   └── archive/
    ├── Important Files/                          ← mlb, parquets, fastas
    └── CHECKLIST.md