---
pipeline_tag: any-to-any
---

# Scaling Up Parameter Generation: A Recurrent Diffusion Approach

[Paper](https://arxiv.org/pdf/2501.11587) | [Project Page](https://NUS-HPC-AI-Lab.github.io/Recurrent-Parameter-Generation/) | [Github](https://github.com/NUS-HPC-AI-Lab/Recurrent-Parameter-Generation) | [Twitter](https://x.com/VictorKaiWang1/status/1881380005118419435)

## Abstract

Parameter generation has long struggled to match the scale of today’s large vision and language models, curbing its broader utility. In this paper, we introduce **R**ecurrent Diffusion for Large-Scale **P**arameter **G**eneration (**RPG**), a novel framework that generates full neural network parameters—up to **hundreds of millions**—on a **single GPU**. Our approach first partitions a network’s parameters into non-overlapping ‘tokens’, each corresponding to a distinct portion of the model. A recurrent mechanism then learns the inter-token relationships, producing ‘prototypes’ which serve as conditions for a diffusion process that ultimately synthesizes the full parameters. Across a spectrum of architectures and tasks—including ResNets, ConvNeXts and ViTs on ImageNet-1K and COCO, and even LoRA-based LLMs—RPG achieves performance on par with fully trained networks while avoiding excessive memory overhead. Notably, it generalizes beyond its training set to generate valid parameters for previously unseen tasks, highlighting its flexibility in dynamic and open-ended scenarios. By overcoming the longstanding memory and scalability barriers, RPG serves as a critical advance in ‘AI generating AI’, potentially enabling efficient weight generation at scales previously deemed infeasible.
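The three-stage pipeline the abstract describes (partition parameters into tokens, run a recurrent pass to produce per-token prototypes, then condition generation on those prototypes) can be sketched in miniature. Everything below is an illustrative assumption, not the released implementation: the sizes are toy, the recurrent mechanism is a plain RNN cell, and a single noise-plus-projection step stands in for the actual diffusion sampler. See the GitHub repository above for the real code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes (RPG itself scales to hundreds of millions of parameters).
n_params, token_size, hidden = 1024, 64, 32

# 1. Partition a flat parameter vector into non-overlapping 'tokens'.
theta = rng.standard_normal(n_params)
tokens = theta.reshape(n_params // token_size, token_size)   # shape (16, 64)

# 2. A recurrent pass over the token sequence yields one 'prototype' per token,
#    so each prototype can depend on the tokens that came before it.
W_in = rng.standard_normal((token_size, hidden)) * 0.1
W_h = rng.standard_normal((hidden, hidden)) * 0.1
h = np.zeros(hidden)
prototypes = []
for t in tokens:
    h = np.tanh(t @ W_in + h @ W_h)   # simple RNN cell as a stand-in
    prototypes.append(h)
prototypes = np.stack(prototypes)     # shape (16, 32)

# 3. Each prototype conditions the synthesis of its token; a real diffusion
#    sampler would iteratively denoise, which we collapse to one step here.
W_out = rng.standard_normal((hidden, token_size)) * 0.1
noise = rng.standard_normal(tokens.shape)
generated = noise + prototypes @ W_out   # shape (16, 64), same as `tokens`
```

Because only one token's worth of parameters is materialized per recurrent step, memory grows with the token size rather than the full parameter count, which is the property that lets generation of very large networks fit on a single GPU.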