michaelpopo committed on
Commit 1ee07b0 · verified · 1 Parent(s): 2647158

Initial clean PartRAG release

README.md ADDED

---
license: mit
library_name: diffusers
pipeline_tag: image-to-3d
base_model: wgsxm/PartCrafter
tags:
- partrag
- partcrafter
- diffusers
- image-to-3d
- 3d-generation
- part-level-3d-generation
- retrieval-augmented-generation
- part-retrieval
- rectified-flow
- arxiv:2602.17033
---

# PartRAG: Retrieval-Augmented Part-Level 3D Generation and Editing

This repository hosts trained PartRAG weights for the paper:

> **PartRAG: Retrieval-Augmented Part-Level 3D Generation and Editing**
> Peize Li, Zeyu Zhang, Hao Tang
> arXiv:2602.17033

PartRAG is a retrieval-augmented framework for single-image part-level 3D generation and editing. It builds on the open-source [PartCrafter](https://github.com/wgsxm/PartCrafter) implementation and extends it with the PartRAG retrieval and editing pipeline from the official code repository.

## Links

- Paper: https://arxiv.org/abs/2602.17033
- Project page: https://aigeeksgroup.github.io/PartRAG/
- Code: https://github.com/AIGeeksGroup/PartRAG
- Base project: https://github.com/wgsxm/PartCrafter

## Repository Contents

This Hugging Face repository contains model weights and Diffusers metadata. The runnable code (training scripts, inference scripts, the retrieval database builder, the editing pipeline, and dataset preprocessing tools) is maintained in the official GitHub repository:

```text
https://github.com/AIGeeksGroup/PartRAG
```

The metadata in this repository is aligned with the PartRAG codebase:

- pipeline: `src.pipelines.pipeline_partrag.PartragPipeline`
- transformer: `src.models.transformers.partrag_transformer.PartragDiTModel`
- scheduler: `src.schedulers.scheduling_rectified_flow.RectifiedFlowScheduler`
- VAE: `src.models.autoencoders.autoencoder_kl_triposg.TripoSGVAEModel`

## Model Description

PartRAG generates structured 3D objects from a single RGB image by producing multiple object parts. The framework augments part-level generation with retrieval and contrastive learning:

- part-level image-to-3D generation using a diffusion transformer;
- retrieval-augmented generation over part-level exemplars;
- contrastive objectives for stronger part and object representations;
- masked part-level editing that preserves non-target parts and part transforms.

## Installation

Clone the official code repository:

```bash
git clone https://github.com/AIGeeksGroup/PartRAG.git
cd PartRAG
```

Install dependencies following the repository setup:

```bash
bash settings/setup.sh
```

If you prefer to install dependencies manually:

```bash
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.5.1+cu124.html
pip install -r settings/requirements.txt
sudo apt-get install libegl1 libegl1-mesa libgl1-mesa-dev -y
```

## Download Weights

Download this checkpoint into the path expected by the PartRAG scripts:

```bash
huggingface-cli download michaelpopo/PartRAG \
  --local-dir pretrained_weights/PartRAG
```

## Inference

Run inference with the PartRAG checkpoint script from the GitHub repository:

```bash
python scripts/inference_partrag_with_checkpoint.py \
  --image_path <input_image> \
  --num_parts 4 \
  --pretrained_model_path pretrained_weights/PartRAG \
  --checkpoint_path pretrained_weights/PartRAG \
  --output_dir results \
  --render
```

The script exports individual part meshes as `part_XX.glb` and a merged object mesh as `object.glb`.

## Retrieval Database

The retrieval database is not included in this weights repository. Build it with the official PartRAG script:

```bash
python scripts/build_partrag_retrieval_database.py \
  --config configs/partrag_stage1.yaml \
  --output_dir retrieval_database_high_quality \
  --subset_size 1236 \
  --build_faiss
```

Then enable retrieval during checkpoint inference:

```bash
python scripts/inference_partrag_with_checkpoint.py \
  --image_path <input_image> \
  --num_parts 4 \
  --pretrained_model_path pretrained_weights/PartRAG \
  --checkpoint_path pretrained_weights/PartRAG \
  --use_retrieval \
  --database_path retrieval_database_high_quality \
  --num_retrieved_images 3 \
  --output_dir results \
  --render
```
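Conceptually, the `--use_retrieval` path embeds the input image and looks up the closest part-level exemplars in the FAISS database, then conditions generation on the top `--num_retrieved_images` hits. As a minimal illustration of that nearest-neighbor lookup (a numpy stand-in for a FAISS inner-product index; `retrieve_top_k` and the toy data are hypothetical, not the repository's API):

```python
import numpy as np

def retrieve_top_k(query: np.ndarray, database: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k database embeddings most similar to the query
    (cosine similarity), mimicking a normalized inner-product index."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q                       # cosine similarity per database entry
    return np.argsort(scores)[::-1][:k]   # best-first indices

# Toy example: 5 random 8-d "image embeddings"; querying with entry 2
# retrieves entry 2 itself first (self-similarity is 1.0).
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 8))
idx = retrieve_top_k(db[2], db, k=3)
print(idx[0])
```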

## Editing

Part-level masked editing is provided by the official code repository:

```bash
python scripts/edit_partrag.py \
  --checkpoint_path pretrained_weights/PartRAG \
  --pretrained_path pretrained_weights/PartRAG \
  --input_image <input_image> \
  --num_parts 4 \
  --target_parts 1,3 \
  --edit_text "replace legs" \
  --retrieval_db retrieval_database_high_quality \
  --render
```

## Training Configuration

The checkpoint metadata is provided in `params.yaml`. The paper-protocol training configs in the code repository are:

- `configs/partrag_stage1.yaml`
- `configs/partrag_stage2.yaml`

## Citation

If you use this model, please cite PartRAG:

```bibtex
@article{li2026partrag,
  title={PartRAG: Retrieval-Augmented Part-Level 3D Generation and Editing},
  author={Li, Peize and Zhang, Zeyu and Tang, Hao},
  journal={arXiv preprint arXiv:2602.17033},
  year={2026}
}
```

## Attribution

PartRAG builds on the open-source PartCrafter implementation. Upstream-derived components keep the same general module layout and are extended in the official PartRAG codebase with retrieval- and editing-specific logic.
configs/partrag_stage1.yaml ADDED

model:
  pretrained_model_name_or_path: '/root/autodl-tmp/PartRAG/pretrained_weights/PartRAG'
  vae:
    num_tokens: 1024
  transformer:
    enable_local_cross_attn: true
    global_attn_block_ids: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
    global_attn_block_id_range: null

dataset:
  config:
    - '/root/autodl-tmp/dataset/Objaverse/processed/high_quality_object_part_configs_FIXED.json'
  training_ratio: 0.9
  min_num_parts: 2
  max_num_parts: 8
  max_iou_mean: 0.5
  max_iou_max: 0.5
  shuffle_parts: true
  object_ratio: 0.5
  rotating_ratio: 0.3
  ratating_degree: 15

optimizer:
  name: "adamw"
  lr: 3e-5
  betas: [0.9, 0.999]
  weight_decay: 0.01
  eps: 1.e-8

lr_scheduler:
  name: "cosine_warmup"
  num_warmup_steps: 300

retrieval:
  database_path: /root/autodl-tmp/retrieval_database_high_quality
  enabled: true
  top_k: 3
  use_image: true
  use_mesh: true

train:
  batch_size_per_gpu: 48
  epochs: 100
  grad_checkpoint: true
  weighting_scheme: "logit_normal"
  logit_mean: 0.0
  logit_std: 1.0
  mode_scale: 1.29
  cfg_dropout_prob: 0.1
  training_objective: "-v"
  log_freq: 10
  early_eval_freq: 500
  early_eval: 1000
  eval_freq: 2000
  save_freq: 1000
  eval_freq_epoch: 5
  save_freq_epoch: 1
  ema_kwargs:
    decay: 0.9999
    use_ema_warmup: true
    inv_gamma: 1.
    power: 0.75

  use_part_dataset: true
  enable_contrastive: false
  contrastive_object_weight: 0.0
  contrastive_part_weight: 0.0
  contrastive_temperature: 0.07

  freeze_pretrained_backbone: true
  freeze_modules:
    - "pos_embed"
    - "time_embed"
    - "part_embedding"
    - "proj_in"
    - "blocks.*.attn1"
    - "blocks.*.ff"
    - "blocks.*.norm1"
  trainable_modules:
    - "blocks.*.attn2*"
    - "blocks.*.norm2"
    - "proj_out"

  use_differential_lr: true
  frozen_modules_lr: 0.0
  pretrained_modules_lr: 1e-6
  new_modules_lr: 3e-5
  projection_modules_lr: 1e-5

val:
  batch_size_per_gpu: 1
  nrow: 4
  min_num_parts: 2
  max_num_parts: 8
  num_inference_steps: 50
  max_num_expanded_coords: 1e8
  use_flash_decoder: false
  rendering:
    radius: 4.0
    num_views: 36
    fps: 18
  metric:
    cd_num_samples: 204800
    cd_metric: "l2"
    f1_score_threshold: 0.1
    default_cd: 1e6
    default_f1: 0.0
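Stage 1 freezes the pretrained backbone's self-attention (`attn1`), feed-forward, and `norm1` modules and trains only the cross-attention path (`attn2`, `norm2`) plus `proj_out`. A hedged sketch of how such wildcard lists could be resolved against parameter names (fnmatch-based; the helper and sample names are illustrative, not the repository's code):

```python
from fnmatch import fnmatch

FREEZE = ["pos_embed", "time_embed", "part_embedding", "proj_in",
          "blocks.*.attn1", "blocks.*.ff", "blocks.*.norm1"]
TRAIN = ["blocks.*.attn2*", "blocks.*.norm2", "proj_out"]

def matches(name: str, patterns) -> bool:
    # A pattern matches the module itself or any parameter under it.
    return any(fnmatch(name, p) or fnmatch(name, p + ".*") for p in patterns)

def is_trainable(name: str) -> bool:
    # Trainable patterns win; anything matching only a freeze pattern stays frozen.
    if matches(name, TRAIN):
        return True
    return not matches(name, FREEZE)

for n in ["blocks.0.attn1.to_q.weight", "blocks.0.attn2.to_q.weight",
          "blocks.3.norm2.weight", "proj_in.weight", "proj_out.bias"]:
    print(n, is_trainable(n))
```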
configs/partrag_stage2.yaml ADDED

model:
  pretrained_model_name_or_path: '/root/autodl-tmp/PartRAG/pretrained_weights/PartRAG'
  vae:
    num_tokens: 1024
  transformer:
    enable_local_cross_attn: true
    global_attn_block_ids: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
    global_attn_block_id_range: null

dataset:
  config:
    - '/root/autodl-tmp/dataset/Objaverse/processed/high_quality_object_part_configs_FIXED.json'
  training_ratio: 0.9
  min_num_parts: 2
  max_num_parts: 8
  max_iou_mean: 0.5
  max_iou_max: 0.5
  shuffle_parts: true
  object_ratio: 0.5
  rotating_ratio: 0.3
  ratating_degree: 15

optimizer:
  name: "adamw"
  lr: 3e-5
  betas: [0.9, 0.999]
  weight_decay: 0.01
  eps: 1.e-8

lr_scheduler:
  name: "cosine_warmup"
  num_warmup_steps: 300

retrieval:
  database_path: /root/autodl-tmp/retrieval_database_high_quality
  enabled: true
  top_k: 3
  use_image: true
  use_mesh: true

train:
  batch_size_per_gpu: 48
  epochs: 350
  grad_checkpoint: true
  weighting_scheme: "logit_normal"
  logit_mean: 0.0
  logit_std: 1.0
  mode_scale: 1.29
  cfg_dropout_prob: 0.1
  training_objective: "-v"
  log_freq: 10
  early_eval_freq: 500
  early_eval: 1000
  eval_freq: 1000
  save_freq: 1000
  eval_freq_epoch: 5
  save_freq_epoch: 1
  ema_kwargs:
    decay: 0.9999
    use_ema_warmup: true
    inv_gamma: 1.
    power: 0.75

  use_part_dataset: true
  enable_contrastive: true
  contrastive_object_weight: 0.03
  contrastive_part_weight: 0.03
  contrastive_temperature: 0.07

  # Bidirectional momentum queue (paper setting)
  use_momentum_queue: true
  momentum_coefficient: 0.999
  momentum_queue_size: 65536

  freeze_pretrained_backbone: true
  freeze_modules:
    - "pos_embed"
    - "time_embed"
    - "part_embedding"
    - "proj_in"
    - "blocks.0.attn1*"
    - "blocks.0.ff*"
    - "blocks.0.norm1"
    - "blocks.1.attn1*"
    - "blocks.1.ff*"
    - "blocks.1.norm1"
  trainable_modules:
    - "blocks.*.attn2*"
    - "blocks.*.norm2"
    - "blocks.[2-9].*"
    - "blocks.1[0-9].*"
    - "blocks.20.*"
    - "proj_out"

  use_differential_lr: true
  frozen_modules_lr: 0.0
  pretrained_modules_lr: 1e-6
  new_modules_lr: 1e-5
  projection_modules_lr: 1e-5

val:
  batch_size_per_gpu: 1
  nrow: 4
  min_num_parts: 2
  max_num_parts: 8
  num_inference_steps: 50
  max_num_expanded_coords: 1e8
  use_flash_decoder: false
  rendering:
    radius: 4.0
    num_views: 36
    fps: 18
  metric:
    cd_num_samples: 204800
    cd_metric: "l2"
    f1_score_threshold: 0.1
    default_cd: 1e6
    default_f1: 0.0
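Stage 2 switches on the contrastive objectives (`enable_contrastive: true`, weights 0.03, temperature 0.07). A common form of such an objective is the InfoNCE loss, where the i-th query embedding should score highest against its own key. A hedged numpy sketch (illustrative, not the repository's implementation):

```python
import numpy as np

def info_nce(q: np.ndarray, k: np.ndarray, temperature: float = 0.07) -> float:
    """InfoNCE over (N, D) L2-normalized embeddings: the i-th query's
    positive is the i-th key; all other keys act as negatives."""
    logits = (q @ k.T) / temperature             # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))    # cross-entropy, targets = arange(N)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
x /= np.linalg.norm(x, axis=1, keepdims=True)
matched = info_nce(x, x)          # positives aligned -> low loss
mismatched = info_nce(x, x[::-1]) # positives shuffled -> higher loss
print(matched < mismatched)  # True
```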
feature_extractor_dinov2/preprocessor_config.json ADDED

{
  "crop_size": {
    "height": 224,
    "width": 224
  },
  "do_center_crop": true,
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.485,
    0.456,
    0.406
  ],
  "image_processor_type": "BitImageProcessor",
  "image_std": [
    0.229,
    0.224,
    0.225
  ],
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "shortest_edge": 256
  }
}
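This config resizes the shortest edge to 256, center-crops to 224x224, rescales by 1/255 (the `rescale_factor` above), and normalizes with the ImageNet mean/std. The arithmetic of the last two steps, as a minimal numpy sketch (not the `BitImageProcessor` API):

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def normalize(pixels: np.ndarray) -> np.ndarray:
    """Rescale uint8 RGB pixels (H, W, 3) by 1/255, then normalize
    per channel with the ImageNet mean and std from the config."""
    x = pixels.astype(np.float64) * (1.0 / 255.0)  # rescale_factor = 0.00392...
    return (x - MEAN) / STD

# A pure-white pixel maps to (1 - mean) / std per channel.
white = np.full((1, 1, 3), 255, dtype=np.uint8)
print(normalize(white)[0, 0])
```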
image_encoder_dinov2/config.json ADDED

{
  "apply_layernorm": true,
  "architectures": [
    "Dinov2Model"
  ],
  "attention_probs_dropout_prob": 0.0,
  "drop_path_rate": 0.0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 1024,
  "image_size": 518,
  "initializer_range": 0.02,
  "layer_norm_eps": 1e-06,
  "layerscale_value": 1.0,
  "mlp_ratio": 4,
  "model_type": "dinov2",
  "num_attention_heads": 16,
  "num_channels": 3,
  "num_hidden_layers": 24,
  "out_features": [
    "stage24"
  ],
  "out_indices": [
    24
  ],
  "patch_size": 14,
  "qkv_bias": true,
  "reshape_hidden_states": true,
  "stage_names": [
    "stem",
    "stage1",
    "stage2",
    "stage3",
    "stage4",
    "stage5",
    "stage6",
    "stage7",
    "stage8",
    "stage9",
    "stage10",
    "stage11",
    "stage12",
    "stage13",
    "stage14",
    "stage15",
    "stage16",
    "stage17",
    "stage18",
    "stage19",
    "stage20",
    "stage21",
    "stage22",
    "stage23",
    "stage24"
  ],
  "torch_dtype": "float16",
  "transformers_version": "4.53.0",
  "use_mask_token": true,
  "use_swiglu_ffn": false
}
image_encoder_dinov2/model.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:aa0b83921a3339259fb1ef684ce63c9d09b5a45f9998e0789a0ad4cba318b07b
size 608785376
model_index.json ADDED

{
  "_class_name": "PartragPipeline",
  "_diffusers_version": "0.35.1",
  "feature_extractor_dinov2": [
    "transformers",
    "BitImageProcessor"
  ],
  "image_encoder_dinov2": [
    "transformers",
    "Dinov2Model"
  ],
  "scheduler": [
    "src.schedulers.scheduling_rectified_flow",
    "RectifiedFlowScheduler"
  ],
  "transformer": [
    "src.models.transformers.partrag_transformer",
    "PartragDiTModel"
  ],
  "vae": [
    "src.models.autoencoders.autoencoder_kl_triposg",
    "TripoSGVAEModel"
  ]
}
params.yaml ADDED

model:
  pretrained_model_name_or_path: '/root/autodl-tmp/PartRAG/pretrained_weights/PartRAG'
  vae:
    num_tokens: 1024
  transformer:
    enable_local_cross_attn: true
    global_attn_block_ids: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
    global_attn_block_id_range: null

dataset:
  config:
    - '/root/autodl-tmp/dataset/Objaverse/processed/high_quality_object_part_configs_FIXED.json'
  training_ratio: 0.9
  min_num_parts: 2
  max_num_parts: 8
  max_iou_mean: 0.5
  max_iou_max: 0.5
  shuffle_parts: true
  object_ratio: 0.5
  rotating_ratio: 0.3
  ratating_degree: 15

optimizer:
  name: "adamw"
  lr: 3e-5
  betas: [0.9, 0.999]
  weight_decay: 0.01
  eps: 1.e-8

lr_scheduler:
  name: "cosine_warmup"
  num_warmup_steps: 300

retrieval:
  database_path: /root/autodl-tmp/retrieval_database_high_quality
  enabled: true
  top_k: 3
  use_image: true
  use_mesh: true

train:
  batch_size_per_gpu: 48
  epochs: 350
  grad_checkpoint: true
  weighting_scheme: "logit_normal"
  logit_mean: 0.0
  logit_std: 1.0
  mode_scale: 1.29
  cfg_dropout_prob: 0.1
  training_objective: "-v"
  log_freq: 10
  early_eval_freq: 500
  early_eval: 1000
  eval_freq: 1000
  save_freq: 1000
  eval_freq_epoch: 5
  save_freq_epoch: 1
  ema_kwargs:
    decay: 0.9999
    use_ema_warmup: true
    inv_gamma: 1.
    power: 0.75

  use_part_dataset: true
  enable_contrastive: true
  contrastive_object_weight: 0.03
  contrastive_part_weight: 0.03
  contrastive_temperature: 0.07

  # Bidirectional momentum queue (paper setting)
  use_momentum_queue: true
  momentum_coefficient: 0.999
  momentum_queue_size: 65536

  freeze_pretrained_backbone: true
  freeze_modules:
    - "pos_embed"
    - "time_embed"
    - "part_embedding"
    - "proj_in"
    - "blocks.0.attn1*"
    - "blocks.0.ff*"
    - "blocks.0.norm1"
    - "blocks.1.attn1*"
    - "blocks.1.ff*"
    - "blocks.1.norm1"
  trainable_modules:
    - "blocks.*.attn2*"
    - "blocks.*.norm2"
    - "blocks.[2-9].*"
    - "blocks.1[0-9].*"
    - "blocks.20.*"
    - "proj_out"

  use_differential_lr: true
  frozen_modules_lr: 0.0
  pretrained_modules_lr: 1e-6
  new_modules_lr: 1e-5
  projection_modules_lr: 1e-5

val:
  batch_size_per_gpu: 1
  nrow: 4
  min_num_parts: 2
  max_num_parts: 8
  num_inference_steps: 50
  max_num_expanded_coords: 1e8
  use_flash_decoder: false
  rendering:
    radius: 4.0
    num_views: 36
    fps: 18
  metric:
    cd_num_samples: 204800
    cd_metric: "l2"
    f1_score_threshold: 0.1
    default_cd: 1e6
    default_f1: 0.0
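The momentum-queue settings above (`momentum_coefficient: 0.999`, `momentum_queue_size: 65536`) follow the MoCo recipe: a key encoder trails the query encoder by exponential moving average, and past key embeddings are kept in a FIFO queue as extra contrastive negatives. A hedged sketch of those two mechanics (the class and names are hypothetical, not the repository's code):

```python
import numpy as np

class MomentumQueue:
    """EMA update for a key encoder plus a FIFO queue of key embeddings."""
    def __init__(self, size: int = 65536, dim: int = 16, m: float = 0.999):
        self.m = m
        self.queue = np.zeros((size, dim))
        self.ptr = 0

    def update_encoder(self, key_w: np.ndarray, query_w: np.ndarray) -> np.ndarray:
        # Key weights trail query weights: k <- m * k + (1 - m) * q
        return self.m * key_w + (1.0 - self.m) * query_w

    def enqueue(self, keys: np.ndarray) -> None:
        # Overwrite the oldest slots with the newest key embeddings.
        n = len(keys)
        idx = (self.ptr + np.arange(n)) % len(self.queue)
        self.queue[idx] = keys
        self.ptr = (self.ptr + n) % len(self.queue)

q = MomentumQueue(size=8, dim=2, m=0.999)
print(q.update_encoder(np.array([0.0]), np.array([1.0])))  # -> [0.001]
q.enqueue(np.ones((3, 2)))
print(q.ptr)  # 3
```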
scheduler/scheduler_config.json ADDED

{
  "_class_name": "RectifiedFlowScheduler",
  "_diffusers_version": "0.34.0",
  "num_train_timesteps": 1000,
  "shift": 1,
  "use_dynamic_shifting": false
}
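With `shift: 1` and dynamic shifting off, this is a plain rectified-flow schedule: the path from noise to data is the straight line x_t = (1 - t) x0 + t x1, whose velocity is constant, and sampling Euler-integrates the learned velocity field from t = 1 back to t = 0. A minimal sketch of that loop (not the `RectifiedFlowScheduler` API):

```python
import numpy as np

def euler_sample(x1: np.ndarray, velocity_fn, num_steps: int = 50) -> np.ndarray:
    """Euler-integrate dx/dt = v(x, t) from t=1 (noise) down to t=0 (data)."""
    x = x1.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        x = x - dt * velocity_fn(x, t)
    return x

# With the exact straight-line velocity v = x1 - x0, Euler recovers x0,
# because the rectified-flow path is linear in t.
rng = np.random.default_rng(0)
x0 = rng.normal(size=4)   # "data"
x1 = rng.normal(size=4)   # "noise"
out = euler_sample(x1, lambda x, t: x1 - x0)
print(np.allclose(out, x0))  # True
```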
transformer/config.json ADDED

{
  "_class_name": "PartragDiTModel",
  "_diffusers_version": "0.35.1",
  "_name_or_path": "michaelpopo/PartRAG",
  "cross_attention_dim": 1024,
  "decay": 0.9999,
  "enable_global_cross_attn": true,
  "enable_local_cross_attn": true,
  "enable_part_embedding": true,
  "global_attn_block_id_range": null,
  "global_attn_block_ids": [
    0,
    2,
    4,
    6,
    8,
    10,
    12,
    14,
    16,
    18,
    20
  ],
  "in_channels": 64,
  "inv_gamma": 1.0,
  "max_num_parts": 32,
  "min_decay": 0.0,
  "num_attention_heads": 16,
  "num_layers": 21,
  "optimization_step": 120,
  "power": 0.75,
  "update_after_step": 0,
  "use_ema_warmup": true,
  "width": 2048
}
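Two observations about this config, sketched below: the listed `global_attn_block_ids` are exactly the even indices of the 21 transformer blocks (global attention on every other block), and with `width` 2048 split over 16 heads each attention head is 128-dimensional.

```python
# Reproduce the global-attention block pattern and per-head width from the config.
num_layers, width, num_heads = 21, 2048, 16
global_blocks = [i for i in range(num_layers) if i % 2 == 0]
print(global_blocks)       # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(width // num_heads)  # 128
```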
transformer/diffusion_pytorch_model.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:6ee7b36ec09fbe82ac1259bd36e8a469ac6a2ba275e98fc475bdce4c99cb730d
size 5758542376
vae/config.json ADDED

{
  "_class_name": "TripoSGVAEModel",
  "_diffusers_version": "0.34.0",
  "_name_or_path": "pretrained_weights/TripoSG",
  "embed_frequency": 8,
  "embed_include_pi": false,
  "embedding_type": "frequency",
  "in_channels": 3,
  "latent_channels": 64,
  "num_attention_heads": 8,
  "num_layers_decoder": 16,
  "num_layers_encoder": 8,
  "width_decoder": 1024,
  "width_encoder": 512
}
vae/diffusion_pytorch_model.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:b43b006e5692223877427cdb568c2c1477f52a2d226db4d5eb354b4886c167a4
size 485361378