---
license: mit
library_name: pytorch
tags:
- boolean-networks
- neuro-symbolic
- program-induction
- gene-regulatory-networks
- systems-biology
- active-learning
---

# ABLE: Active Boolean Learning Engine

Model weights accompanying the paper **"ABLE: Choosing Perturbation
Experiments to Recover Gene Logic"** (AI for Science Workshop at ICML
2026).

ABLE is a neuro-symbolic pipeline for recovering executable Boolean
regulatory rules from perturbation-state transition data, with
support-conditional uniqueness certificates and active experiment
planning. This repo hosts the paper's released checkpoints. The public
code lives in a companion package (`able-public`, available at
https://github.com/phuayj/able); see the reproducibility README there
for install and reproduction commands.
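To make the vocabulary concrete, here is a toy illustration of "executable Boolean rules" and a "perturbation-state transition". It is a minimal sketch under stated assumptions. The gene names and rules are hypothetical. Updates are assumed to be synchronous, and a knockout is assumed to clamp a gene to 0. ABLE's actual data encoding and update semantics are defined in the companion code.

```python
from typing import Callable, Dict, Mapping, Optional

State = Dict[str, int]
Rule = Callable[[Mapping[str, int]], int]

# Hypothetical 3-gene network. Each rule reads at most two genes,
# so every update function is a 2-junta.
RULES: Dict[str, Rule] = {
    "A": lambda s: 1 - s["C"],       # C represses A
    "B": lambda s: s["A"],           # A activates B
    "C": lambda s: s["A"] & s["B"],  # C needs both A and B
}


def step(state: Mapping[str, int], clamp: Optional[Mapping[str, int]] = None) -> State:
    """One synchronous update (assumed); genes in `clamp` are forced to a fixed value."""
    nxt = {gene: rule(state) for gene, rule in RULES.items()}
    nxt.update(clamp or {})  # e.g. {"B": 0} stands in for a knockout of B
    return nxt


# One transition per perturbation: (perturbation, state, next state).
start = {"A": 1, "B": 0, "C": 0}
print(step(start))            # {'A': 1, 'B': 1, 'C': 0}
print(step(start, {"B": 0}))  # {'A': 1, 'B': 0, 'C': 0}
```

Recovering the rules means inferring `RULES` from many such (perturbation, state, next state) observations. In this toy, the knockout of B hides the A→B edge, because B no longer follows A. Choosing which perturbations to run is therefore informative.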

## Contents

| File | Size (bytes) | SHA-256 | Purpose |
|---|---:|---|---|
| `checkpoint_n50_ncf_best.pt` | 24,097,458 | `57c968490a2f1535582cc009fc38f659b6fe4b56f89bf72c9bcfb285640a0c8d` | Main 50-variable NCF-pointer proposer. Used for BBM (Table 2, Figs. 2/3/4/6), Ablation A (Table 9 row), and all default evaluation commands in the public README. |
| `checkpoint_n15_ncf_best.pt` | 23,965,466 | `26cdef1bb4bfb39fbb4c278d2f40528c1328664a80c22c97ee99a901fe4a34f0` | 15-variable NCF-pointer proposer used for Table 1 (four curated biological networks). |
| `checkpoint_n50_unconstrained_best.pt` | 25,312,058 | `03510ef826edce9a53cfa87049abf77cd17ea564e87ef4f06167d19e5b952f83` | Ablation B: 50-variable NCF-free decoder variant (unconstrained truth-table head), used only for Appendix Table 9 / Ablation B. See provenance note below. |

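All three files are PyTorch checkpoint dictionaries (not bare state
dicts), saved via `torch.save({"model_state_dict": ...,
"optimizer_state_dict": ..., "config": ..., "step": ...,
"best_metric": ...}, path)`. Load them with
`torch.load(path, map_location=..., weights_only=False)` and read the
model weights from the `"model_state_dict"` entry. On PyTorch 2.6 and
later, where the default became `weights_only=True`, the flag must be
passed explicitly.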
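The following is a minimal sketch for inspecting a checkpoint without the model class, which this card does not name. `load_checkpoint` and `describe_checkpoint` are hypothetical helpers. Building the actual proposer from `config` and calling `load_state_dict` requires the model code in `able-public`.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any

# Top-level keys written by the torch.save(...) call described above.
EXPECTED_KEYS = {"model_state_dict", "optimizer_state_dict", "config", "step", "best_metric"}


def load_checkpoint(path: str | Path, device: str = "cpu") -> dict[str, Any]:
    """Load an ABLE checkpoint dict onto `device` (hypothetical helper)."""
    import torch  # lazy import: describe_checkpoint below works without torch

    # weights_only=False as documented; this unpickles arbitrary objects,
    # so only use it on files whose SHA-256 matches the Contents table.
    return torch.load(path, map_location=device, weights_only=False)


def describe_checkpoint(ckpt: dict[str, Any]) -> dict[str, Any]:
    """Summarise a loaded checkpoint without instantiating the model."""
    missing = EXPECTED_KEYS - set(ckpt)
    if missing:
        raise KeyError(f"checkpoint is missing keys: {sorted(missing)}")
    state = ckpt["model_state_dict"]
    return {
        "step": ckpt["step"],
        "best_metric": ckpt["best_metric"],
        "num_tensors": len(state),
        # Tensor.numel() gives the element count of each stored tensor.
        "num_elements": sum(t.numel() for t in state.values()),
        "config": ckpt["config"],
    }


# Usage (needs torch and a downloaded file):
# summary = describe_checkpoint(load_checkpoint("checkpoints/checkpoint_n50_ncf_best.pt"))
```

`map_location="cpu"` lets the files open on machines without a GPU. The `optimizer_state_dict` entry is only needed when resuming training.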

## Training recipe (reference)

- Synthetic streaming dataset of k-junta Boolean networks, i.e. networks
in which each update rule depends on at most k variables (see
`NCFStreamingDataset` in the paper codebase).
- Transformer backbone: `d_model=256`, `n_heads=8`, 4 encoder + 2 decoder
layers, pointer dimension 64.
- `num_steps=300000`, AdamW with `lr=1e-4`, `weight_decay=1e-5`.
- `n=50` runs: `num_obs=200`, `noise_rate=0.05`, mixture noise schedule,
`batch_size=16`.
- `n=15` run: `num_obs=60`, `batch_size=64`.
- Seed 42; single-GPU training.

Exact configs are embedded in each `.pt` under the `"config"` key and
are also committed alongside the public training scripts.
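The recipe can be transcribed into a reference dict and compared against a checkpoint's embedded `"config"`. This is a sketch only. It is not the repository's `DEFAULT_CONFIG`. The key names not shown in backticks above (`num_encoder_layers`, `num_decoder_layers`, `pointer_dim`, `seed`) are hypothetical. The sketch also assumes the embedded config is a flat dict.

```python
from __future__ import annotations

from typing import Any

# Shared settings from the bullet list; hypothetical key names are marked.
SHARED = {
    "d_model": 256,
    "n_heads": 8,
    "num_encoder_layers": 4,  # hypothetical key name
    "num_decoder_layers": 2,  # hypothetical key name
    "pointer_dim": 64,        # hypothetical key name
    "num_steps": 300_000,
    "lr": 1e-4,
    "weight_decay": 1e-5,
    "seed": 42,               # hypothetical key name
}

# Per-run overrides. The mixture noise schedule has no known key, and the
# n=15 noise settings are not listed, so neither is encoded here.
PER_RUN = {
    50: {"num_obs": 200, "noise_rate": 0.05, "batch_size": 16},
    15: {"num_obs": 60, "batch_size": 64},
}


def reference_recipe(n: int) -> dict[str, Any]:
    """Reference settings for the n-variable run."""
    return {"n": n, **SHARED, **PER_RUN[n]}


def diff_config(embedded: dict[str, Any], n: int) -> dict[str, tuple[Any, Any]]:
    """Keys present in both dicts whose values disagree: (reference, embedded)."""
    ref = reference_recipe(n)
    return {k: (ref[k], embedded[k]) for k in ref.keys() & embedded.keys() if ref[k] != embedded[k]}
```

For example, `diff_config(ckpt["config"], 50)` on a loaded checkpoint lists the conflicting shared keys. An empty result means that none of the shared keys conflict. It does not mean that the two configs are identical.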

## Provenance note for `checkpoint_n50_unconstrained_best.pt`

The original post-paper checkpoint for the Ablation B (`unconstrained`)
variant could not be recovered at release time. The file in this repo is
a retrain produced from the same committed training script and
configuration (seed 42, same `DEFAULT_CONFIG`). On the synthetic
held-out eval it reproduces the paper's expected ablation regime
(`transition_acc` fluctuating within `[0.014, 0.022]`,
`tt_bit_acc ~= 0.836`, `regulator_set_f1 ~= 0.60`,
`functional_agreement ~= 0.92`). It is not byte-identical to the
artifact that originally produced the paper's Appendix Table 9 /
Ablation B numbers, however, because the streamed synthetic data is
sensitive to dataloader-order PRNG draws. Downstream BBM Lift-Cert
numbers are expected to be statistically equivalent but may differ
within run-to-run noise. Bit-exact agreement with the published table
should therefore not be expected; if exact numbers are needed, rerun the
Lift-Cert pipeline against this checkpoint and report the refreshed
numbers.
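The regime quoted above can be written as an explicit per-metric check. This is a sketch only. The note gives no tolerance for the `~=` values, so the tolerance `tol` is an assumption. The metrics dict is assumed to come from the public eval script using these key names.

```python
from __future__ import annotations

# Values quoted in the provenance note (synthetic held-out eval).
TRANSITION_ACC_RANGE = (0.014, 0.022)
APPROX_TARGETS = {
    "tt_bit_acc": 0.836,
    "regulator_set_f1": 0.60,
    "functional_agreement": 0.92,
}


def in_reported_regime(metrics: dict[str, float], tol: float = 0.02) -> dict[str, bool]:
    """Per-metric pass/fail against the note; `tol` is an assumed band for '~='."""
    lo, hi = TRANSITION_ACC_RANGE
    checks = {"transition_acc": lo <= metrics["transition_acc"] <= hi}
    for name, target in APPROX_TARGETS.items():
        checks[name] = abs(metrics[name] - target) <= tol
    return checks
```

Since `transition_acc` is described as fluctuating, the check applies a closed range to it rather than a point target.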

## Intended use

- Reproducing the numbers in the ICML 2026 AI for Science Workshop
paper. The companion CLI `able-download-checkpoints` fetches the
checkpoints from this repo.
- Research extending k-junta Boolean-network recovery from perturbation
transitions (neuro-symbolic, active-learning, and certificate-style
work).

## Limitations

- Trained on synthetic Boolean networks matched to the paper's
structural priors (max in-degree 6, mean in-degree ~2.5, NCF-majority
distributional prior). Out-of-distribution biological networks may
require retraining or domain adaptation.
- The Ablation B checkpoint (`checkpoint_n50_unconstrained_best.pt`) is
meaningful only as a control: it removes the NCF prior from the decoder
head. It is not the recommended proposer for downstream work.
- The decoder consumes quantised occupancy statistics, not raw state
trajectories; inference pipelines must feed data through the paired
preprocessing code in `able-public`.
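One quick check for whether a target network falls outside the trained wiring range is to compare its in-degrees with the stated priors. This is a sketch under several assumptions. `indegree_report` is a hypothetical helper. The network is assumed to be given as a mapping from each gene to its set of regulators. The check covers in-degree only, so passing it says nothing about the NCF-majority prior or any other distributional mismatch.

```python
from __future__ import annotations

from statistics import mean

# Structural priors stated for the synthetic training data.
TRAIN_MAX_INDEGREE = 6
TRAIN_MEAN_INDEGREE = 2.5  # approximate ("~2.5"); reported for comparison only


def indegree_report(regulators: dict[str, set[str]]) -> dict[str, float | bool]:
    """Compare a network's in-degrees with the training priors.

    `regulators` maps each gene to the set of genes appearing in its rule.
    """
    degrees = [len(regs) for regs in regulators.values()]
    return {
        "max_indegree": max(degrees),
        "mean_indegree": mean(degrees),
        "exceeds_trained_max": max(degrees) > TRAIN_MAX_INDEGREE,
    }
```

A network containing a gene with seven or more regulators lies strictly outside the trained range. A mean far from ~2.5 is a softer warning sign.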

## Download

The companion code package is available at https://github.com/phuayj/able.
Install it and run the bundled checkpoint downloader:

```bash
git clone https://github.com/phuayj/able.git
cd able
pip install -e .
able-download-checkpoints --output-dir checkpoints
```

This places all three checkpoint files under `checkpoints/`. No
authentication is required for downloads.
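After downloading, the files can be checked against the SHA-256 digests in the Contents table. This is a standard-library sketch. `sha256_of` and `verify_checkpoints` are hypothetical helper names, and the `checkpoints` directory matches the `--output-dir` passed to the downloader.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Digests copied from the Contents table.
EXPECTED_SHA256 = {
    "checkpoint_n50_ncf_best.pt": "57c968490a2f1535582cc009fc38f659b6fe4b56f89bf72c9bcfb285640a0c8d",
    "checkpoint_n15_ncf_best.pt": "26cdef1bb4bfb39fbb4c278d2f40528c1328664a80c22c97ee99a901fe4a34f0",
    "checkpoint_n50_unconstrained_best.pt": "03510ef826edce9a53cfa87049abf77cd17ea564e87ef4f06167d19e5b952f83",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a ~25 MB checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoints(directory: Path, expected: dict[str, str] = EXPECTED_SHA256) -> dict[str, bool]:
    """True for each expected file that exists and whose digest matches."""
    return {
        name: (directory / name).is_file() and sha256_of(directory / name) == want
        for name, want in expected.items()
    }


if __name__ == "__main__":
    for name, ok in verify_checkpoints(Path("checkpoints")).items():
        print(f"{'OK ' if ok else 'BAD'} {name}")
```

A missing file and a corrupted file both report `BAD`. Checking the digests before loading matters because `torch.load(..., weights_only=False)` unpickles arbitrary objects.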

## Citation

See `CITATION.cff` in the paper codebase.

## License

MIT (weights released alongside the paper code).