| --- |
| license: mit |
| library_name: pytorch |
| tags: |
| - boolean-networks |
| - neuro-symbolic |
| - program-induction |
| - gene-regulatory-networks |
| - systems-biology |
| - active-learning |
| --- |
| |
| # ABLE: Active Boolean Learning Engine |
|
|
| Model weights accompanying the paper **"ABLE: Choosing Perturbation |
| Experiments to Recover Gene Logic"** (AI for Science Workshop at ICML |
| 2026). |
|
|
| ABLE is a neuro-symbolic pipeline for recovering executable Boolean |
| regulatory rules from perturbation-state transition data, with |
| support-conditional uniqueness certificates and active experiment |
| planning. This repo hosts the paper's released checkpoints. The public |
| code lives in a companion package (`able-public`); see the |
| reproducibility README there for install and reproduction commands. |
|
|
| ## Contents |
|
|
| | File | Size (bytes) | SHA-256 | Purpose | |
| |---|---:|---|---| |
| | `checkpoint_n50_ncf_best.pt` | 24,097,458 | `57c968490a2f1535582cc009fc38f659b6fe4b56f89bf72c9bcfb285640a0c8d` | Main 50-variable NCF-pointer proposer. Used for BBM (Table 2, Figs. 2/3/4/6), Ablation A (Table 9 row), and all default evaluation commands in the public README. | |
| | `checkpoint_n15_ncf_best.pt` | 23,965,466 | `26cdef1bb4bfb39fbb4c278d2f40528c1328664a80c22c97ee99a901fe4a34f0` | 15-variable NCF-pointer proposer used for Table 1 (four curated biological networks). | |
| | `checkpoint_n50_unconstrained_best.pt` | 25,312,058 | `03510ef826edce9a53cfa87049abf77cd17ea564e87ef4f06167d19e5b952f83` | Ablation B: 50-variable NCF-free decoder variant (unconstrained truth-table head), used only for Appendix Table 9 / Ablation B. See provenance note below. | |
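
The sizes and digests in the table can be checked locally before a file is loaded. The following is a minimal sketch using only the Python standard library; the helper names (`sha256_of`, `verify`, `verify_all`) are illustrative and are not part of the companion package, and the default `checkpoints` directory is assumed to match the download location you chose.

```python
import hashlib
from pathlib import Path

# Sizes (bytes) and SHA-256 digests copied from the Contents table.
EXPECTED = {
    "checkpoint_n50_ncf_best.pt": (
        24_097_458,
        "57c968490a2f1535582cc009fc38f659b6fe4b56f89bf72c9bcfb285640a0c8d",
    ),
    "checkpoint_n15_ncf_best.pt": (
        23_965_466,
        "26cdef1bb4bfb39fbb4c278d2f40528c1328664a80c22c97ee99a901fe4a34f0",
    ),
    "checkpoint_n50_unconstrained_best.pt": (
        25_312_058,
        "03510ef826edce9a53cfa87049abf77cd17ea564e87ef4f06167d19e5b952f83",
    ),
}


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so it is never read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_size, expected_sha256):
    """Return True only if both the byte size and the SHA-256 digest match."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False  # cheap size check first; skips hashing a truncated file
    return sha256_of(path) == expected_sha256


def verify_all(directory="checkpoints"):
    """Map each expected filename to True/False; a missing file counts as False."""
    results = {}
    for name, (size, digest) in EXPECTED.items():
        target = Path(directory) / name
        results[name] = target.is_file() and verify(target, size, digest)
    return results
```

Calling `verify_all("checkpoints")` returns a dictionary keyed by filename, so a failed or incomplete download shows up as a `False` entry rather than an exception.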
|
|
| All three files are PyTorch checkpoint dictionaries (not bare state |
| dicts), saved via |
| `torch.save({"model_state_dict": ..., "optimizer_state_dict": ..., |
| "config": ..., "step": ..., "best_metric": ...}, path)`. Load them with |
| `torch.load(path, map_location=..., weights_only=False)` and read the |
| model weights from the `"model_state_dict"` key. |
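
As a minimal sketch of that loading step, the snippet below wraps the `torch.load` call and summarises the documented top-level keys. The function names are hypothetical, and the model class needed to consume `"model_state_dict"` comes from the companion code and is not shown here. Because `weights_only=False` allows arbitrary pickled objects to be deserialised, only load files you trust.

```python
def load_checkpoint(path, device="cpu"):
    """Load one released checkpoint dictionary onto the given device."""
    # Imported lazily so summarize_checkpoint() can be used without PyTorch.
    import torch

    # weights_only=False unpickles arbitrary objects: use trusted files only.
    return torch.load(path, map_location=device, weights_only=False)


def summarize_checkpoint(ckpt):
    """Report the documented top-level entries without touching the tensors."""
    required = ("model_state_dict", "optimizer_state_dict", "config",
                "step", "best_metric")
    missing = [key for key in required if key not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing keys: {missing}")
    return {
        "step": ckpt["step"],
        "best_metric": ckpt["best_metric"],
        "num_tensors": len(ckpt["model_state_dict"]),
        "config_type": type(ckpt["config"]).__name__,
    }
```

A typical call would be `summarize_checkpoint(load_checkpoint("checkpoints/checkpoint_n50_ncf_best.pt"))`. The weights can then be passed to `load_state_dict` on a model built from the embedded config.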
| |
| ## Training recipe (reference) |
| |
| - Synthetic streaming dataset of k-junta Boolean networks (see |
| `NCFStreamingDataset` in the paper codebase). |
| - Transformer backbone: `d_model=256`, `n_heads=8`, 4 encoder + 2 decoder |
| layers, pointer dim 64. |
| - `num_steps=300000`, AdamW with `lr=1e-4`, `weight_decay=1e-5`. |
| - `n=50` runs: `num_obs=200`, `noise_rate=0.05`, mixture noise schedule, |
| `batch_size=16`. |
| - `n=15` run: `num_obs=60`, `batch_size=64`. |
| - Seed 42; single-GPU training. |
|
|
| Exact configs are embedded in each `.pt` under the `"config"` key, and |
| are also committed alongside the public training scripts. |
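
One way to confirm that a checkpoint's embedded config agrees with the recipe above is to diff it against the listed values. This is a sketch under a stated assumption: it treats `"config"` as a flat mapping whose keys carry the same names as the hyperparameters quoted in the recipe. If the real config is nested or is a dataclass, convert it to a flat dict first. `diff_config` is an illustrative name, not a function from the codebase.

```python
import math

# Values copied from the training recipe; key names are assumed to match
# the hyperparameter names quoted there.
RECIPE_COMMON = {
    "d_model": 256,
    "n_heads": 8,
    "num_steps": 300000,
    "lr": 1e-4,
    "weight_decay": 1e-5,
}
RECIPE_BY_N = {
    50: {"num_obs": 200, "noise_rate": 0.05, "batch_size": 16},
    15: {"num_obs": 60, "batch_size": 64},
}
MISSING = "<missing>"


def _matches(actual, expected):
    """Compare floats with a tolerance and everything else exactly."""
    if isinstance(expected, float):
        return isinstance(actual, (int, float)) and math.isclose(
            actual, expected, rel_tol=1e-9
        )
    return actual == expected


def diff_config(config, n_vars):
    """Return {key: (found, expected)} for every recipe value that disagrees."""
    expected = {**RECIPE_COMMON, **RECIPE_BY_N[n_vars]}
    mismatches = {}
    for key, want in expected.items():
        found = config.get(key, MISSING)
        if found is MISSING or not _matches(found, want):
            mismatches[key] = (found, want)
    return mismatches
```

An empty result means every recipe value listed here was found with the expected setting. Entries reported as `"<missing>"` may only indicate that the config uses a different key name.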
|
|
| ## Provenance note for `checkpoint_n50_unconstrained_best.pt` |
| |
| The original post-paper checkpoint for the Ablation B (`unconstrained`) |
| variant was unrecoverable at release time. The file in this repo is a |
| **retrain** produced from the same committed training script and |
| configuration (seed 42, same `DEFAULT_CONFIG`). It reproduces the |
| paper's expected ablation regime on the synthetic held-out eval |
| (`transition_acc` fluctuating within `[0.014, 0.022]`, `tt_bit_acc ~= 0.836`, |
| `regulator_set_f1 ~= 0.60`, `functional_agreement ~= 0.92`) but will |
| **not be byte-identical** to the artifact that originally produced the |
| paper's Appendix Table 9 / Ablation B numbers, because synthetic data |
| streaming is sensitive to dataloader-order PRNG draws. Downstream BBM |
| Lift-Cert numbers are expected to be statistically equivalent but |
| may differ within run-to-run noise. Bit-exact reproduction of the |
| paper table is therefore not possible with this file. For comparisons |
| against Ablation B, rerun the Lift-Cert pipeline against this |
| checkpoint and report the refreshed numbers. |
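
A quick sanity check on a fresh evaluation of the retrained checkpoint is to test whether its metrics fall in the regime quoted above. The sketch below assumes the evaluation yields a dictionary keyed by the four metric names in the note. The `tol` window applied to the `~=` values is an assumption made for illustration, not a figure from the paper.

```python
# Range and approximate targets quoted in the provenance note.
TRANSITION_ACC_RANGE = (0.014, 0.022)
APPROX_TARGETS = {
    "tt_bit_acc": 0.836,
    "regulator_set_f1": 0.60,
    "functional_agreement": 0.92,
}


def check_ablation_regime(metrics, tol=0.02):
    """Return {metric: bool} saying whether each value sits in the quoted regime.

    `tol` is an illustrative half-width for the "~=" targets (an assumption).
    """
    low, high = TRANSITION_ACC_RANGE
    report = {"transition_acc": low <= metrics["transition_acc"] <= high}
    for name, target in APPROX_TARGETS.items():
        report[name] = abs(metrics[name] - target) <= tol
    return report
```

If every entry in the report is `True`, the run is consistent with the expected ablation behaviour. A `False` entry is a prompt to inspect the run, not proof of a defect.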
|
|
| ## Intended use |
|
|
| - Reproducing the numbers reported in the paper (AI for Science |
| Workshop at ICML 2026). The companion CLI |
| `able-download-checkpoints` consumes this repo. |
| - Research extensions on k-junta Boolean-network recovery from |
| perturbation transitions (neuro-symbolic, active-learning, and |
| certificate-style work). |
|
|
| ## Limitations |
|
|
| - Trained on **synthetic** Boolean networks matched to the paper's |
| structural priors (max-indegree 6, mean-indegree ~2.5, NCF-majority |
| distributional prior). Out-of-distribution biological networks may |
| require retraining or domain adaptation. |
| - The Ablation B checkpoint (`*_unconstrained_*`) is only meaningful as a |
| control: it removes the NCF prior from the decoder head. It is **not** |
| the recommended proposer for downstream work. |
| - The decoder consumes quantised occupancy statistics, not raw state |
| trajectories; inference pipelines must feed data through the paired |
| preprocessing code in `able-public`. |
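
Before applying a checkpoint to a new network, it can help to compare the network's in-degree statistics with the structural priors listed above. The sketch below is a rough heuristic only: it assumes the network is described as a mapping from each target gene to its regulators, checks max and mean in-degree, and ignores the NCF-majority prior entirely. The `mean_tol` window is an assumption chosen for illustration.

```python
def indegree_stats(regulators):
    """Return (max_indegree, mean_indegree) for a {target: regulators} mapping."""
    if not regulators:
        raise ValueError("network has no nodes")
    # Duplicated regulator names are counted once per target.
    degrees = [len(set(regs)) for regs in regulators.values()]
    return max(degrees), sum(degrees) / len(degrees)


def within_training_prior(regulators, max_indegree=6, mean_indegree=2.5,
                          mean_tol=1.0):
    """Heuristic check against the documented in-degree priors.

    max_indegree and mean_indegree come from the Limitations list;
    mean_tol is an illustrative assumption, not a value from the paper.
    """
    observed_max, observed_mean = indegree_stats(regulators)
    return (observed_max <= max_indegree
            and abs(observed_mean - mean_indegree) <= mean_tol)


# Toy three-gene network: in-degrees are 2, 1 and 3.
toy = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b", "c"]}
```

Here `indegree_stats(toy)` gives a maximum of 3 and a mean of 2.0, so `within_training_prior(toy)` passes. A failing check suggests the network lies outside the training distribution on this one axis.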
|
|
| ## Download |
|
|
| The companion code package is available at https://github.com/phuayj/able. |
| Install it and run the bundled checkpoint downloader: |
|
|
| ```bash |
| git clone https://github.com/phuayj/able.git |
| cd able |
| pip install -e . |
| able-download-checkpoints --output-dir checkpoints |
| ``` |
|
|
| This places all three checkpoint files under `checkpoints/`. No |
| authentication is required for downloads. |
|
|
| ## Citation |
|
|
| See `CITATION.cff` in the paper codebase. |
|
|
| ## License |
|
|
| MIT (weights released alongside the paper code). |
|
|