MTDoven committed
Commit 71e7732 · verified · 1 Parent(s): f67c246

Update README.md

Files changed (1): README.md (+15 −10)
README.md CHANGED
@@ -9,20 +9,25 @@ pipeline_tag: any-to-any
 ---
 
 
-# Recurrent Parameter Generation
+# Scaling Up Parameter Generation: A Recurrent Diffusion Approach
 [Paper](https://arxiv.org/pdf/2501.11587) | [Project Page](https://NUS-HPC-AI-Lab.github.io/Recurrent-Parameter-Generation/) | [Github](https://github.com/NUS-HPC-AI-Lab/Recurrent-Parameter-Generation) | [Twitter](https://x.com/VictorKaiWang1/status/1881380005118419435)
 
 
 ## Abstract
-Parameter generation has long struggled to scale, significantly limiting its applications.
-In this study, we introduce **R**ecurrent diffusion for large-scale **P**arameter **G**eneration, or **RPG**,
-which models large-scale parameter generation through a recurrent diffusion process.
-We divide the trained parameters into non-overlapping parts and propose a recurrent model to learn their relationships.
-The outputs of this recurrent model, serving as conditions, are then input into a diffusion model to generate neural network parameters.
-Utilizing only a single GPU, our method can generate parameters for popular vision and language models, such as ConvNeXt-L and LoRA parameters for LLaMA-7B.
-Across various architectures and tasks, the generated parameters consistently achieve comparable performance to those of trained networks.
-Additionally, our approach demonstrates potential in generating models capable of handling unseen tasks,
-indicating that recurrent diffusion greatly enhances the practicality of parameter generation.
+Parameter generation has long struggled to match the scale of today’s large vision and language
+models, curbing its broader utility. In this paper, we introduce **R**ecurrent Diffusion for Large-Scale
+**P**arameter **G**eneration (**RPG**), a novel framework that generates full neural network parameters—up
+to **hundreds of millions**—on a **single GPU**. Our approach first partitions a network’s parameters
+into non-overlapping ‘tokens’, each corresponding to a distinct portion of the model. A recurrent
+mechanism then learns the inter-token relationships, producing ‘prototypes’ which serve as conditions
+for a diffusion process that ultimately synthesizes the full parameters. Across a spectrum of
+architectures and tasks—including ResNets, ConvNeXts and ViTs on ImageNet-1K and COCO,
+and even LoRA-based LLMs—RPG achieves performance on par with fully trained networks while
+avoiding excessive memory overhead. Notably, it generalizes beyond its training set to generate
+valid parameters for previously unseen tasks, highlighting its flexibility in dynamic and open-ended
+scenarios. By overcoming the longstanding memory and scalability barriers,
+RPG serves as a critical advance in ‘AI generating AI’, potentially
+enabling efficient weight generation at scales previously deemed infeasible.
 
 
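The committed abstract describes a three-stage pipeline: partition trained parameters into non-overlapping tokens, scan them with a recurrent model to produce per-token prototypes, and use those prototypes as conditions for a diffusion model that synthesizes the parameters. The pure-Python sketch below illustrates only that data flow; every name, size, and the stand-in recurrence and single "denoising" step are illustrative assumptions, not the paper's implementation.

```python
import math
import random

random.seed(0)

TOKEN = 8   # parameters per token (assumed toy size)
HIDDEN = 4  # prototype dimension (assumed toy size)

def tokenize(params):
    """Split a flat parameter list into non-overlapping tokens."""
    n = len(params) // TOKEN
    return [params[i * TOKEN:(i + 1) * TOKEN] for i in range(n)]

def recurrent_prototypes(tokens):
    """Stand-in recurrent model: scan tokens in order and emit one
    hidden state ('prototype') per token as a condition."""
    h = [0.0] * HIDDEN
    protos = []
    for tok in tokens:
        s = sum(tok) / len(tok)               # toy token summary
        h = [math.tanh(0.5 * v + s) for v in h]  # toy recurrence
        protos.append(list(h))
    return protos

def conditional_denoise(proto):
    """Stand-in 'diffusion' step: combine noise with a prototype
    condition to produce one generated token. A real diffusion model
    would iterate many learned denoising steps."""
    noise = [random.gauss(0.0, 1.0) for _ in range(TOKEN)]
    bias = sum(proto) / len(proto)
    return [x + bias for x in noise]

# Pretend these are the parameters of a trained network.
params = [random.gauss(0.0, 1.0) for _ in range(4 * TOKEN)]

tokens = tokenize(params)
protos = recurrent_prototypes(tokens)
generated = [conditional_denoise(p) for p in protos]

assert len(generated) == len(tokens)            # one token generated per input token
assert all(len(tok) == TOKEN for tok in generated)
```

The point of the sketch is the shape of the computation: the recurrent scan keeps memory per step small (one token at a time), which is what lets RPG-style generation stay within a single GPU's memory even for very large parameter vectors.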