MYTH-Lab/FLUID

	---
	library_name: transformers
	pipeline_tag: text-generation
	base_model: openPangu/openPangu-Embedded-7B
	tags:
	- diffusion
	- parallel-generation
	---

	# FLUID-7B

	FLUID (Flexible Unidirectional Inference Diffusion) is a framework for efficiently adapting pre-trained autoregressive (AR) backbones into parallel diffusion models. By enforcing Strictly Causal Alignment and introducing Elastic Horizons, FLUID reaches performance competitive with strong AR and diffusion baselines while using significantly less training data than standard diffusion models.

	- Paper: [From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons](https://huggingface.co/papers/2605.27387)
	- GitHub Repository: [Oli-lab-nun/FLUID](https://github.com/Oli-lab-nun/FLUID)

	## Key Features

	* Strictly Causal Alignment: Unlike bidirectional diffusion, FLUID uses a lower-triangular attention mask to maintain the inductive biases of AR priors. This enables seamless initialization from GPT-style checkpoints like openPangu-Embedded-7B.
	* Elastic Horizon Modeling: An entropy-driven mechanism that dynamically modulates denoising strides based on local information density. It "sprints" through predictable text and "downshifts" for complex reasoning.
	* Training Efficiency: Uses only 2.7B tokens of adaptation data, yet outperforms LLaDA-8B (trained on 2.0T tokens) on every reported benchmark and LLaMA-3-8B (15T tokens) on GSM8K, MATH500, and HumanEval.
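
The first two features can be illustrated with a small, dependency-free sketch. It is not the FLUID implementation: the function names, the entropy threshold, the stride cap, and the "stop at the first uncertain position" rule are all assumptions chosen for illustration. The sketch only shows what a lower-triangular mask looks like and how an entropy signal could translate into a variable denoising stride.

```python
import math


def causal_mask(seq_len):
    """Lower-triangular mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def entropy(probs):
    """Shannon entropy (in nats) of a single predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def elastic_stride(step_probs, max_stride=4, threshold=0.5):
    """Hypothetical stride rule (an assumption, not the paper's method).

    Walk over the next positions in order and count how many have entropy
    below `threshold`, stopping at the first uncertain one or at
    `max_stride`. At least one position is always committed.
    """
    stride = 0
    for probs in step_probs[:max_stride]:
        if entropy(probs) >= threshold:
            break
        stride += 1
    return max(1, stride)


# Toy distributions over a 4-token vocabulary.
confident = [0.97, 0.01, 0.01, 0.01]   # low entropy: predictable text
uncertain = [0.25, 0.25, 0.25, 0.25]   # high entropy: hard decision

mask = causal_mask(3)
sprint = elastic_stride([confident, confident, confident, uncertain])
downshift = elastic_stride([uncertain, confident, confident])
capped = elastic_stride([confident] * 6, max_stride=4)
```

Here `sprint` is 3 (three predictable positions are committed in one step), `downshift` is 1 (the first position is uncertain, so the model advances a single token), and `capped` is 4 because of `max_stride`.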

	## Performance

	FLUID-7B outperforms the diffusion baseline LLaDA-8B on all four benchmarks and is competitive with AR baselines, posting the highest GSM8K score in the table:

	| Model | Type | Training Tokens | MMLU | GSM8K | MATH500 | HumanEval |
	| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
	| LLaMA-3-8B | AR | 15T | 68.4 | 78.3 | 27.4 | 59.8 |
	| Qwen-2.5-7B | AR | 18T | 76.6 | 91.6 | 84.8 | 79.2 |
	| LLaDA-8B | Diff | 2.0T | 65.5 | 36.2 | 34.2 | 47.6 |
	| FLUID-7B (Ours) | Diff | 2.7B | 67.8 | 91.9 | 61.8 | 60.4 |
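
To make the data-efficiency comparison concrete, the arithmetic behind the table can be reproduced as below. All numbers are copied from the table; the helper names `token_ratio` and `score_deltas` are hypothetical. Note that the FLUID-7B figure counts adaptation tokens only, so the ratios do not include the pretraining budget of the openPangu base model.

```python
# Token counts and scores copied from the table above.
TRAINING_TOKENS = {
    "LLaMA-3-8B": 15e12,
    "Qwen-2.5-7B": 18e12,
    "LLaDA-8B": 2.0e12,
    "FLUID-7B": 2.7e9,  # adaptation tokens only
}

SCORES = {
    "LLaMA-3-8B": {"MMLU": 68.4, "GSM8K": 78.3, "MATH500": 27.4, "HumanEval": 59.8},
    "Qwen-2.5-7B": {"MMLU": 76.6, "GSM8K": 91.6, "MATH500": 84.8, "HumanEval": 79.2},
    "LLaDA-8B": {"MMLU": 65.5, "GSM8K": 36.2, "MATH500": 34.2, "HumanEval": 47.6},
    "FLUID-7B": {"MMLU": 67.8, "GSM8K": 91.9, "MATH500": 61.8, "HumanEval": 60.4},
}


def token_ratio(baseline, model="FLUID-7B"):
    """How many times more tokens the baseline used than `model`."""
    return TRAINING_TOKENS[baseline] / TRAINING_TOKENS[model]


def score_deltas(baseline, model="FLUID-7B"):
    """Per-benchmark score difference (model minus baseline), in points."""
    return {
        bench: round(SCORES[model][bench] - SCORES[baseline][bench], 1)
        for bench in SCORES[model]
    }


for name in ("LLaMA-3-8B", "Qwen-2.5-7B", "LLaDA-8B"):
    print(f"{name}: {token_ratio(name):,.0f}x the tokens, deltas {score_deltas(name)}")
```

Against LLaDA-8B, for example, this gives a token ratio of about 741x and positive deltas on every benchmark (+2.3 MMLU, +55.7 GSM8K, +27.6 MATH500, +12.8 HumanEval), while against Qwen-2.5-7B only the GSM8K delta is positive (+0.3).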

	## Acknowledgements

	FLUID-7B is adapted from the openPangu-Embedded-7B base model. We gratefully acknowledge the developers of openPangu for releasing their model and related resources to the community.

	## Citation

	```bibtex
	@inproceedings{fluid2026,
	  title={From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons},
	  author={Anonymous},
	  booktitle={Submission to ACL 2026},
	  year={2026}
	}
	```