# CogView4 ## Training For LoRA training, specify `--training_type lora`. For full finetuning, specify `--training_type full-finetune`. Examples available: - [Raider White Tarot cards style](../../examples/training/sft/cogview4/raider_white_tarot/) - [Omni Edit Control LoRA](../../examples/training/control/cogview4/omni_edit/) - [Canny Control LoRA](../../examples/training/control/cogview4/canny/) To run an example, run the following from the root directory of the repository (assuming you have installed the requirements and are using Linux/WSL): ```bash chmod +x ./examples/training/sft/cogview4/raider_white_tarot/train.sh ./examples/training/sft/cogview4/raider_white_tarot/train.sh ``` On Windows, you will have to modify the script to a compatible format to run it. [TODO(aryan): improve instructions for Windows] ## Supported checkpoints The following checkpoints were tested with `finetrainers` and are known to be working: - [THUDM/CogView4-6B](https://huggingface.co/THUDM/CogView4-6B) ## Inference Assuming your LoRA is saved and pushed to the HF Hub, and named `my-awesome-name/my-awesome-lora`, we can now use the finetuned model for inference: ```diff import torch from diffusers import CogView4Pipeline from diffusers.utils import export_to_video pipe = CogView4Pipeline.from_pretrained( "THUDM/CogView4-6B", torch_dtype=torch.bfloat16 ).to("cuda") + pipe.load_lora_weights("my-awesome-name/my-awesome-lora", adapter_name="cogview4-lora") + pipe.set_adapters(["cogview4-lora"], [0.9]) video = pipe("").frames[0] export_to_video(video, "output.mp4") ``` To use trained Control LoRAs, the following can be used for inference (ideally, you should raise a support request in Diffusers):
Control Lora inference ```python import torch from diffusers import CogView4Pipeline from diffusers.utils import load_image from finetrainers.models.utils import _expand_linear_with_zeroed_weights from finetrainers.patches import load_lora_weights from finetrainers.patches.dependencies.diffusers.control import control_channel_concat dtype = torch.bfloat16 device = torch.device("cuda") generator = torch.Generator().manual_seed(0) pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=dtype) in_channels = pipe.transformer.config.in_channels patch_channels = pipe.transformer.patch_embed.proj.in_features pipe.transformer.patch_embed.proj = _expand_linear_with_zeroed_weights(pipe.transformer.patch_embed.proj, new_in_features=2 * patch_channels) load_lora_weights(pipe, "/raid/aryan/cogview4-control-lora", "cogview4-lora") pipe.to(device) prompt = "Make the image look like it's from an ancient Egyptian mural." control_image = load_image("examples/training/control/cogview4/omni_edit/validation_dataset/0.png") height, width = 1024, 1024 with torch.no_grad(): latents = pipe.prepare_latents(1, in_channels, height, width, dtype, device, generator) control_image = pipe.image_processor.preprocess(control_image, height=height, width=width) control_image = control_image.to(device=device, dtype=dtype) control_latents = pipe.vae.encode(control_image).latent_dist.sample(generator=generator) control_latents = (control_latents - pipe.vae.config.shift_factor) * pipe.vae.config.scaling_factor with control_channel_concat(pipe.transformer, ["hidden_states"], [control_latents], dims=[1]): image = pipe(prompt, latents=latents, num_inference_steps=30, generator=generator).images[0] image.save("output.png") ```
You can refer to the following guides to know more about the model pipeline and performing LoRA inference in `diffusers`: - [CogView4 in Diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4) - [Load LoRAs for inference](https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference) - [Merge LoRAs](https://huggingface.co/docs/diffusers/main/en/using-diffusers/merge_loras)