michaelpopo committed on
Commit 1ee07b0 · verified · 1 Parent(s): 2647158

Initial clean PartRAG release

README.md ADDED

---
license: mit
library_name: diffusers
pipeline_tag: image-to-3d
base_model: wgsxm/PartCrafter
tags:
- partrag
- partcrafter
- diffusers
- image-to-3d
- 3d-generation
- part-level-3d-generation
- retrieval-augmented-generation
- part-retrieval
- rectified-flow
- arxiv:2602.17033
---

# PartRAG: Retrieval-Augmented Part-Level 3D Generation and Editing

This repository hosts trained PartRAG weights for the paper:

> **PartRAG: Retrieval-Augmented Part-Level 3D Generation and Editing**
> Peize Li, Zeyu Zhang, Hao Tang
> arXiv:2602.17033

PartRAG is a retrieval-augmented framework for single-image part-level 3D generation and editing. It builds on the open-source [PartCrafter](https://github.com/wgsxm/PartCrafter) implementation and extends it with the PartRAG retrieval and editing pipeline from the official code repository.

## Links

- Paper: https://arxiv.org/abs/2602.17033
- Project page: https://aigeeksgroup.github.io/PartRAG/
- Code: https://github.com/AIGeeksGroup/PartRAG
- Base project: https://github.com/wgsxm/PartCrafter

## Repository Contents

This Hugging Face repository contains model weights and Diffusers metadata. The runnable code (training scripts, inference scripts, the retrieval database builder, the editing pipeline, and dataset preprocessing tools) is maintained in the official GitHub repository:

```text
https://github.com/AIGeeksGroup/PartRAG
```

The metadata in this repository is aligned with the PartRAG codebase:

- pipeline: `src.pipelines.pipeline_partrag.PartragPipeline`
- transformer: `src.models.transformers.partrag_transformer.PartragDiTModel`
- scheduler: `src.schedulers.scheduling_rectified_flow.RectifiedFlowScheduler`
- VAE: `src.models.autoencoders.autoencoder_kl_triposg.TripoSGVAEModel`

## Model Description

PartRAG generates structured 3D objects from a single RGB image by producing multiple object parts. The framework augments part-level generation with retrieval and contrastive learning:

- part-level image-to-3D generation using a diffusion transformer;
- retrieval-augmented generation over part-level exemplars;
- contrastive objectives for stronger part and object representations;
- masked part-level editing that preserves non-target parts and part transforms.

## Installation

Clone the official code repository:

```bash
git clone https://github.com/AIGeeksGroup/PartRAG.git
cd PartRAG
```

Install dependencies following the repository setup:

```bash
bash settings/setup.sh
```

If you prefer to install dependencies manually:

```bash
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.5.1+cu124.html
pip install -r settings/requirements.txt
sudo apt-get install libegl1 libegl1-mesa libgl1-mesa-dev -y
```

## Download Weights

Download this checkpoint into the path expected by the PartRAG scripts:

```bash
huggingface-cli download michaelpopo/PartRAG \
  --local-dir pretrained_weights/PartRAG
```

## Inference

Run inference with the PartRAG checkpoint script from the GitHub repository:

```bash
python scripts/inference_partrag_with_checkpoint.py \
  --image_path <input_image> \
  --num_parts 4 \
  --pretrained_model_path pretrained_weights/PartRAG \
  --checkpoint_path pretrained_weights/PartRAG \
  --output_dir results \
  --render
```

The script exports individual part meshes as `part_XX.glb` and a merged object mesh as `object.glb`.

## Retrieval Database

The retrieval database is not included in this weights repository. Build it with the official PartRAG script:

```bash
python scripts/build_partrag_retrieval_database.py \
  --config configs/partrag_stage1.yaml \
  --output_dir retrieval_database_high_quality \
  --subset_size 1236 \
  --build_faiss
```

Then enable retrieval during checkpoint inference:

```bash
python scripts/inference_partrag_with_checkpoint.py \
  --image_path <input_image> \
  --num_parts 4 \
  --pretrained_model_path pretrained_weights/PartRAG \
  --checkpoint_path pretrained_weights/PartRAG \
  --use_retrieval \
  --database_path retrieval_database_high_quality \
  --num_retrieved_images 3 \
  --output_dir results \
  --render
```
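Conceptually, the `--use_retrieval` path embeds the input image and looks up the closest part-level exemplars in the FAISS database, then conditions generation on the top `--num_retrieved_images` hits. As a minimal illustration of that nearest-neighbor lookup (a numpy stand-in for a FAISS inner-product index; `retrieve_top_k` and the toy data are hypothetical, not the repository's API):

```python
import numpy as np

def retrieve_top_k(query: np.ndarray, database: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k database embeddings most similar to the query
    (cosine similarity), mimicking a normalized inner-product index."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q                       # cosine similarity per database entry
    return np.argsort(scores)[::-1][:k]   # best-first indices

# Toy example: 5 random 8-d "image embeddings"; querying with entry 2
# retrieves entry 2 itself first (self-similarity is 1.0).
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 8))
idx = retrieve_top_k(db[2], db, k=3)
print(idx[0])
```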

## Editing

Part-level masked editing is provided by the official code repository:

```bash
python scripts/edit_partrag.py \
  --checkpoint_path pretrained_weights/PartRAG \
  --pretrained_path pretrained_weights/PartRAG \
  --input_image <input_image> \
  --num_parts 4 \
  --target_parts 1,3 \
  --edit_text "replace legs" \
  --retrieval_db retrieval_database_high_quality \
  --render
```

## Training Configuration

The checkpoint metadata is provided in `params.yaml`. The paper-protocol training configs in the code repository are:

- `configs/partrag_stage1.yaml`
- `configs/partrag_stage2.yaml`

## Citation

If you use this model, please cite PartRAG:

```bibtex
@article{li2026partrag,
  title={PartRAG: Retrieval-Augmented Part-Level 3D Generation and Editing},
  author={Li, Peize and Zhang, Zeyu and Tang, Hao},
  journal={arXiv preprint arXiv:2602.17033},
  year={2026}
}
```

## Attribution

PartRAG builds on the open-source PartCrafter implementation. Upstream-derived components keep the same general module layout and are extended in the official PartRAG codebase with retrieval- and editing-specific logic.
configs/partrag_stage1.yaml ADDED

model:
  pretrained_model_name_or_path: '/root/autodl-tmp/PartRAG/pretrained_weights/PartRAG'
  vae:
    num_tokens: 1024
  transformer:
    enable_local_cross_attn: true
    global_attn_block_ids: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
    global_attn_block_id_range: null

dataset:
  config:
    - '/root/autodl-tmp/dataset/Objaverse/processed/high_quality_object_part_configs_FIXED.json'
  training_ratio: 0.9
  min_num_parts: 2
  max_num_parts: 8
  max_iou_mean: 0.5
  max_iou_max: 0.5
  shuffle_parts: true
  object_ratio: 0.5
  rotating_ratio: 0.3
  ratating_degree: 15

optimizer:
  name: "adamw"
  lr: 3e-5
  betas: [0.9, 0.999]
  weight_decay: 0.01
  eps: 1.e-8

lr_scheduler:
  name: "cosine_warmup"
  num_warmup_steps: 300

retrieval:
  database_path: /root/autodl-tmp/retrieval_database_high_quality
  enabled: true
  top_k: 3
  use_image: true
  use_mesh: true

train:
  batch_size_per_gpu: 48
  epochs: 100
  grad_checkpoint: true
  weighting_scheme: "logit_normal"
  logit_mean: 0.0
  logit_std: 1.0
  mode_scale: 1.29
  cfg_dropout_prob: 0.1
  training_objective: "-v"
  log_freq: 10
  early_eval_freq: 500
  early_eval: 1000
  eval_freq: 2000
  save_freq: 1000
  eval_freq_epoch: 5
  save_freq_epoch: 1
  ema_kwargs:
    decay: 0.9999
    use_ema_warmup: true
    inv_gamma: 1.
    power: 0.75

  use_part_dataset: true
  enable_contrastive: false
  contrastive_object_weight: 0.0
  contrastive_part_weight: 0.0
  contrastive_temperature: 0.07

  freeze_pretrained_backbone: true
  freeze_modules:
    - "pos_embed"
    - "time_embed"
    - "part_embedding"
    - "proj_in"
    - "blocks.*.attn1"
    - "blocks.*.ff"
    - "blocks.*.norm1"
  trainable_modules:
    - "blocks.*.attn2*"
    - "blocks.*.norm2"
    - "proj_out"

  use_differential_lr: true
  frozen_modules_lr: 0.0
  pretrained_modules_lr: 1e-6
  new_modules_lr: 3e-5
  projection_modules_lr: 1e-5

val:
  batch_size_per_gpu: 1
  nrow: 4
  min_num_parts: 2
  max_num_parts: 8
  num_inference_steps: 50
  max_num_expanded_coords: 1e8
  use_flash_decoder: false
  rendering:
    radius: 4.0
    num_views: 36
    fps: 18
  metric:
    cd_num_samples: 204800
    cd_metric: "l2"
    f1_score_threshold: 0.1
    default_cd: 1e6
    default_f1: 0.0
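Stage 1 freezes the pretrained backbone's self-attention (`attn1`), feed-forward, and `norm1` modules and trains only the cross-attention path (`attn2`, `norm2`) plus `proj_out`. A hedged sketch of how such wildcard lists could be resolved against parameter names (fnmatch-based; the helper and sample names are illustrative, not the repository's code):

```python
from fnmatch import fnmatch

FREEZE = ["pos_embed", "time_embed", "part_embedding", "proj_in",
          "blocks.*.attn1", "blocks.*.ff", "blocks.*.norm1"]
TRAIN = ["blocks.*.attn2*", "blocks.*.norm2", "proj_out"]

def matches(name: str, patterns) -> bool:
    # A pattern matches the module itself or any parameter under it.
    return any(fnmatch(name, p) or fnmatch(name, p + ".*") for p in patterns)

def is_trainable(name: str) -> bool:
    # Trainable patterns win; anything matching only a freeze pattern stays frozen.
    if matches(name, TRAIN):
        return True
    return not matches(name, FREEZE)

for n in ["blocks.0.attn1.to_q.weight", "blocks.0.attn2.to_q.weight",
          "blocks.3.norm2.weight", "proj_in.weight", "proj_out.bias"]:
    print(n, is_trainable(n))
```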
configs/partrag_stage2.yaml ADDED

model:
  pretrained_model_name_or_path: '/root/autodl-tmp/PartRAG/pretrained_weights/PartRAG'
  vae:
    num_tokens: 1024
  transformer:
    enable_local_cross_attn: true
    global_attn_block_ids: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
    global_attn_block_id_range: null

dataset:
  config:
    - '/root/autodl-tmp/dataset/Objaverse/processed/high_quality_object_part_configs_FIXED.json'
  training_ratio: 0.9
  min_num_parts: 2
  max_num_parts: 8
  max_iou_mean: 0.5
  max_iou_max: 0.5
  shuffle_parts: true
  object_ratio: 0.5
  rotating_ratio: 0.3
  ratating_degree: 15

optimizer:
  name: "adamw"
  lr: 3e-5
  betas: [0.9, 0.999]
  weight_decay: 0.01
  eps: 1.e-8

lr_scheduler:
  name: "cosine_warmup"
  num_warmup_steps: 300

retrieval:
  database_path: /root/autodl-tmp/retrieval_database_high_quality
  enabled: true
  top_k: 3
  use_image: true
  use_mesh: true

train:
  batch_size_per_gpu: 48
  epochs: 350
  grad_checkpoint: true
  weighting_scheme: "logit_normal"
  logit_mean: 0.0
  logit_std: 1.0
  mode_scale: 1.29
  cfg_dropout_prob: 0.1
  training_objective: "-v"
  log_freq: 10
  early_eval_freq: 500
  early_eval: 1000
  eval_freq: 1000
  save_freq: 1000
  eval_freq_epoch: 5
  save_freq_epoch: 1
  ema_kwargs:
    decay: 0.9999
    use_ema_warmup: true
    inv_gamma: 1.
    power: 0.75

  use_part_dataset: true
  enable_contrastive: true
  contrastive_object_weight: 0.03
  contrastive_part_weight: 0.03
  contrastive_temperature: 0.07

  # Bidirectional momentum queue (paper setting)
  use_momentum_queue: true
  momentum_coefficient: 0.999
  momentum_queue_size: 65536

  freeze_pretrained_backbone: true
  freeze_modules:
    - "pos_embed"
    - "time_embed"
    - "part_embedding"
    - "proj_in"
    - "blocks.0.attn1*"
    - "blocks.0.ff*"
    - "blocks.0.norm1"
    - "blocks.1.attn1*"
    - "blocks.1.ff*"
    - "blocks.1.norm1"
  trainable_modules:
    - "blocks.*.attn2*"
    - "blocks.*.norm2"
    - "blocks.[2-9].*"
    - "blocks.1[0-9].*"
    - "blocks.20.*"
    - "proj_out"

  use_differential_lr: true
  frozen_modules_lr: 0.0
  pretrained_modules_lr: 1e-6
  new_modules_lr: 1e-5
  projection_modules_lr: 1e-5

val:
  batch_size_per_gpu: 1
  nrow: 4
  min_num_parts: 2
  max_num_parts: 8
  num_inference_steps: 50
  max_num_expanded_coords: 1e8
  use_flash_decoder: false
  rendering:
    radius: 4.0
    num_views: 36
    fps: 18
  metric:
    cd_num_samples: 204800
    cd_metric: "l2"
    f1_score_threshold: 0.1
    default_cd: 1e6
    default_f1: 0.0
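Stage 2 switches on the contrastive objectives (`enable_contrastive: true`, weights 0.03, temperature 0.07). A common form of such an objective is the InfoNCE loss, where the i-th query embedding should score highest against its own key. A hedged numpy sketch (illustrative, not the repository's implementation):

```python
import numpy as np

def info_nce(q: np.ndarray, k: np.ndarray, temperature: float = 0.07) -> float:
    """InfoNCE over (N, D) L2-normalized embeddings: the i-th query's
    positive is the i-th key; all other keys act as negatives."""
    logits = (q @ k.T) / temperature             # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))    # cross-entropy, targets = arange(N)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
x /= np.linalg.norm(x, axis=1, keepdims=True)
matched = info_nce(x, x)          # positives aligned -> low loss
mismatched = info_nce(x, x[::-1]) # positives shuffled -> higher loss
print(matched < mismatched)  # True
```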
feature_extractor_dinov2/preprocessor_config.json ADDED

{
  "crop_size": {
    "height": 224,
    "width": 224
  },
  "do_center_crop": true,
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.485,
    0.456,
    0.406
  ],
  "image_processor_type": "BitImageProcessor",
  "image_std": [
    0.229,
    0.224,
    0.225
  ],
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "shortest_edge": 256
  }
}
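This config resizes the shortest edge to 256, center-crops to 224x224, rescales by 1/255 (the `rescale_factor` above), and normalizes with the ImageNet mean/std. The arithmetic of the last two steps, as a minimal numpy sketch (not the `BitImageProcessor` API):

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def normalize(pixels: np.ndarray) -> np.ndarray:
    """Rescale uint8 RGB pixels (H, W, 3) by 1/255, then normalize
    per channel with the ImageNet mean and std from the config."""
    x = pixels.astype(np.float64) * (1.0 / 255.0)  # rescale_factor = 0.00392...
    return (x - MEAN) / STD

# A pure-white pixel maps to (1 - mean) / std per channel.
white = np.full((1, 1, 3), 255, dtype=np.uint8)
print(normalize(white)[0, 0])
```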
image_encoder_dinov2/config.json ADDED

{
  "apply_layernorm": true,
  "architectures": [
    "Dinov2Model"
  ],
  "attention_probs_dropout_prob": 0.0,
  "drop_path_rate": 0.0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 1024,
  "image_size": 518,
  "initializer_range": 0.02,
  "layer_norm_eps": 1e-06,
  "layerscale_value": 1.0,
  "mlp_ratio": 4,
  "model_type": "dinov2",
  "num_attention_heads": 16,
  "num_channels": 3,
  "num_hidden_layers": 24,
  "out_features": [
    "stage24"
  ],
  "out_indices": [
    24
  ],
  "patch_size": 14,
  "qkv_bias": true,
  "reshape_hidden_states": true,
  "stage_names": [
    "stem",
    "stage1",
    "stage2",
    "stage3",
    "stage4",
    "stage5",
    "stage6",
    "stage7",
    "stage8",
    "stage9",
    "stage10",
    "stage11",
    "stage12",
    "stage13",
    "stage14",
    "stage15",
    "stage16",
    "stage17",
    "stage18",
    "stage19",
    "stage20",
    "stage21",
    "stage22",
    "stage23",
    "stage24"
  ],
  "torch_dtype": "float16",
  "transformers_version": "4.53.0",
  "use_mask_token": true,
  "use_swiglu_ffn": false
}
image_encoder_dinov2/model.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:aa0b83921a3339259fb1ef684ce63c9d09b5a45f9998e0789a0ad4cba318b07b
size 608785376
model_index.json ADDED

{
  "_class_name": "PartragPipeline",
  "_diffusers_version": "0.35.1",
  "feature_extractor_dinov2": [
    "transformers",
    "BitImageProcessor"
  ],
  "image_encoder_dinov2": [
    "transformers",
    "Dinov2Model"
  ],
  "scheduler": [
    "src.schedulers.scheduling_rectified_flow",
    "RectifiedFlowScheduler"
  ],
  "transformer": [
    "src.models.transformers.partrag_transformer",
    "PartragDiTModel"
  ],
  "vae": [
    "src.models.autoencoders.autoencoder_kl_triposg",
    "TripoSGVAEModel"
  ]
}
params.yaml ADDED

model:
  pretrained_model_name_or_path: '/root/autodl-tmp/PartRAG/pretrained_weights/PartRAG'
  vae:
    num_tokens: 1024
  transformer:
    enable_local_cross_attn: true
    global_attn_block_ids: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
    global_attn_block_id_range: null

dataset:
  config:
    - '/root/autodl-tmp/dataset/Objaverse/processed/high_quality_object_part_configs_FIXED.json'
  training_ratio: 0.9
  min_num_parts: 2
  max_num_parts: 8
  max_iou_mean: 0.5
  max_iou_max: 0.5
  shuffle_parts: true
  object_ratio: 0.5
  rotating_ratio: 0.3
  ratating_degree: 15

optimizer:
  name: "adamw"
  lr: 3e-5
  betas: [0.9, 0.999]
  weight_decay: 0.01
  eps: 1.e-8

lr_scheduler:
  name: "cosine_warmup"
  num_warmup_steps: 300

retrieval:
  database_path: /root/autodl-tmp/retrieval_database_high_quality
  enabled: true
  top_k: 3
  use_image: true
  use_mesh: true

train:
  batch_size_per_gpu: 48
  epochs: 350
  grad_checkpoint: true
  weighting_scheme: "logit_normal"
  logit_mean: 0.0
  logit_std: 1.0
  mode_scale: 1.29
  cfg_dropout_prob: 0.1
  training_objective: "-v"
  log_freq: 10
  early_eval_freq: 500
  early_eval: 1000
  eval_freq: 1000
  save_freq: 1000
  eval_freq_epoch: 5
  save_freq_epoch: 1
  ema_kwargs:
    decay: 0.9999
    use_ema_warmup: true
    inv_gamma: 1.
    power: 0.75

  use_part_dataset: true
  enable_contrastive: true
  contrastive_object_weight: 0.03
  contrastive_part_weight: 0.03
  contrastive_temperature: 0.07

  # Bidirectional momentum queue (paper setting)
  use_momentum_queue: true
  momentum_coefficient: 0.999
  momentum_queue_size: 65536

  freeze_pretrained_backbone: true
  freeze_modules:
    - "pos_embed"
    - "time_embed"
    - "part_embedding"
    - "proj_in"
    - "blocks.0.attn1*"
    - "blocks.0.ff*"
    - "blocks.0.norm1"
    - "blocks.1.attn1*"
    - "blocks.1.ff*"
    - "blocks.1.norm1"
  trainable_modules:
    - "blocks.*.attn2*"
    - "blocks.*.norm2"
    - "blocks.[2-9].*"
    - "blocks.1[0-9].*"
    - "blocks.20.*"
    - "proj_out"

  use_differential_lr: true
  frozen_modules_lr: 0.0
  pretrained_modules_lr: 1e-6
  new_modules_lr: 1e-5
  projection_modules_lr: 1e-5

val:
  batch_size_per_gpu: 1
  nrow: 4
  min_num_parts: 2
  max_num_parts: 8
  num_inference_steps: 50
  max_num_expanded_coords: 1e8
  use_flash_decoder: false
  rendering:
    radius: 4.0
    num_views: 36
    fps: 18
  metric:
    cd_num_samples: 204800
    cd_metric: "l2"
    f1_score_threshold: 0.1
    default_cd: 1e6
    default_f1: 0.0
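The momentum-queue settings above (`momentum_coefficient: 0.999`, `momentum_queue_size: 65536`) follow the MoCo recipe: a key encoder trails the query encoder by exponential moving average, and past key embeddings are kept in a FIFO queue as extra contrastive negatives. A hedged sketch of those two mechanics (the class and names are hypothetical, not the repository's code):

```python
import numpy as np

class MomentumQueue:
    """EMA update for a key encoder plus a FIFO queue of key embeddings."""
    def __init__(self, size: int = 65536, dim: int = 16, m: float = 0.999):
        self.m = m
        self.queue = np.zeros((size, dim))
        self.ptr = 0

    def update_encoder(self, key_w: np.ndarray, query_w: np.ndarray) -> np.ndarray:
        # Key weights trail query weights: k <- m * k + (1 - m) * q
        return self.m * key_w + (1.0 - self.m) * query_w

    def enqueue(self, keys: np.ndarray) -> None:
        # Overwrite the oldest slots with the newest key embeddings.
        n = len(keys)
        idx = (self.ptr + np.arange(n)) % len(self.queue)
        self.queue[idx] = keys
        self.ptr = (self.ptr + n) % len(self.queue)

q = MomentumQueue(size=8, dim=2, m=0.999)
print(q.update_encoder(np.array([0.0]), np.array([1.0])))  # -> [0.001]
q.enqueue(np.ones((3, 2)))
print(q.ptr)  # 3
```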
scheduler/scheduler_config.json ADDED

{
  "_class_name": "RectifiedFlowScheduler",
  "_diffusers_version": "0.34.0",
  "num_train_timesteps": 1000,
  "shift": 1,
  "use_dynamic_shifting": false
}
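With `shift: 1` and dynamic shifting off, this is a plain rectified-flow schedule: the path from noise to data is the straight line x_t = (1 - t) x0 + t x1, whose velocity is constant, and sampling Euler-integrates the learned velocity field from t = 1 back to t = 0. A minimal sketch of that loop (not the `RectifiedFlowScheduler` API):

```python
import numpy as np

def euler_sample(x1: np.ndarray, velocity_fn, num_steps: int = 50) -> np.ndarray:
    """Euler-integrate dx/dt = v(x, t) from t=1 (noise) down to t=0 (data)."""
    x = x1.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        x = x - dt * velocity_fn(x, t)
    return x

# With the exact straight-line velocity v = x1 - x0, Euler recovers x0,
# because the rectified-flow path is linear in t.
rng = np.random.default_rng(0)
x0 = rng.normal(size=4)   # "data"
x1 = rng.normal(size=4)   # "noise"
out = euler_sample(x1, lambda x, t: x1 - x0)
print(np.allclose(out, x0))  # True
```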
transformer/config.json ADDED

{
  "_class_name": "PartragDiTModel",
  "_diffusers_version": "0.35.1",
  "_name_or_path": "michaelpopo/PartRAG",
  "cross_attention_dim": 1024,
  "decay": 0.9999,
  "enable_global_cross_attn": true,
  "enable_local_cross_attn": true,
  "enable_part_embedding": true,
  "global_attn_block_id_range": null,
  "global_attn_block_ids": [
    0,
    2,
    4,
    6,
    8,
    10,
    12,
    14,
    16,
    18,
    20
  ],
  "in_channels": 64,
  "inv_gamma": 1.0,
  "max_num_parts": 32,
  "min_decay": 0.0,
  "num_attention_heads": 16,
  "num_layers": 21,
  "optimization_step": 120,
  "power": 0.75,
  "update_after_step": 0,
  "use_ema_warmup": true,
  "width": 2048
}
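Two observations about this config, sketched below: the listed `global_attn_block_ids` are exactly the even indices of the 21 transformer blocks (global attention on every other block), and with `width` 2048 split over 16 heads each attention head is 128-dimensional.

```python
# Reproduce the global-attention block pattern and per-head width from the config.
num_layers, width, num_heads = 21, 2048, 16
global_blocks = [i for i in range(num_layers) if i % 2 == 0]
print(global_blocks)       # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(width // num_heads)  # 128
```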
transformer/diffusion_pytorch_model.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:6ee7b36ec09fbe82ac1259bd36e8a469ac6a2ba275e98fc475bdce4c99cb730d
size 5758542376
vae/config.json ADDED

{
  "_class_name": "TripoSGVAEModel",
  "_diffusers_version": "0.34.0",
  "_name_or_path": "pretrained_weights/TripoSG",
  "embed_frequency": 8,
  "embed_include_pi": false,
  "embedding_type": "frequency",
  "in_channels": 3,
  "latent_channels": 64,
  "num_attention_heads": 8,
  "num_layers_decoder": 16,
  "num_layers_encoder": 8,
  "width_decoder": 1024,
  "width_encoder": 512
}
vae/diffusion_pytorch_model.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:b43b006e5692223877427cdb568c2c1477f52a2d226db4d5eb354b4886c167a4
size 485361378