---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-video
tags:
- wan
- video-generation
- image-to-video
- diffusers
base_model: alibaba-pai/Wan2.1-Fun-V1.1-1.3B-InP
---

# Wan2.1-Fun-V1.1-1.3B-InP (Diffusers)

This is a Diffusers-format conversion of [alibaba-pai/Wan2.1-Fun-V1.1-1.3B-InP](https://huggingface.co/alibaba-pai/Wan2.1-Fun-V1.1-1.3B-InP) (Wan-Fun Inpaint V1.1, 1.3B) from the original VideoX-Fun checkpoint layout.

## Model Details

- **Architecture**: `WanTransformer3DModel` with `in_channels=36` (16 noise + 4 mask + 16 image latent channels)
- **Parameters**: 1.3B
- **Pipeline**: `WanImageToVideoPipeline` (standard Diffusers, no patching required)
- **Resolution**: 480x832 (480p) recommended
- **Frames**: 49 frames at 16 fps (~3 seconds)

This model has the same I2V architecture as the official [Wan2.1-I2V-14B-480P](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P-Diffusers) (`in_channels=36`), but at 1.3B scale.
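The 36 input channels are simply the concatenation of the three latent streams listed above. A minimal sketch with dummy tensors (the shapes here assume Wan 2.1's 8x spatial / 4x temporal VAE compression; the real pipeline builds these latents from the VAE and the first-frame mask):

```python
import torch

# Dummy latent shapes: (batch, channels, latent_frames, height/8, width/8).
# 49 video frames -> 13 latent frames under 4x temporal compression;
# 480x832 pixels -> 60x104 latent resolution under 8x spatial compression.
b, f, h, w = 1, 13, 60, 104

noise_latents = torch.randn(b, 16, f, h, w)  # 16 noise channels
mask_latents = torch.zeros(b, 4, f, h, w)    # 4 mask channels (first frame kept)
image_latents = torch.randn(b, 16, f, h, w)  # 16 VAE-encoded image channels

# Concatenating along the channel dimension yields the transformer input.
transformer_input = torch.cat([noise_latents, mask_latents, image_latents], dim=1)
print(transformer_input.shape)  # torch.Size([1, 36, 13, 60, 104])
```

The channel count (16 + 4 + 16 = 36) is what makes this checkpoint a drop-in match for the standard pipeline.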

## Usage

```python
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video
from PIL import Image

pipe = WanImageToVideoPipeline.from_pretrained(
    "engineerA314/Wan2.1-Fun-V1.1-1.3B-InP-Diffusers",
    torch_dtype=torch.bfloat16,
)
pipe.enable_sequential_cpu_offload()  # low VRAM usage, at the cost of speed

image = Image.open("first_frame.png").convert("RGB")

output = pipe(
    image=image,
    prompt="A person is talking naturally",
    negative_prompt="static, blurred, low quality",
    height=480,
    width=832,
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=5.0,
)

export_to_video(output.frames[0], "output.mp4", fps=16)
```
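For reproducible outputs, the standard Diffusers pattern is to pass a seeded `torch.Generator` via the pipeline's `generator=` argument. A minimal demonstration of the seeding behaviour itself (without loading the model):

```python
import torch

# The same seed yields the same noise sample, and hence the same video.
gen = torch.Generator(device="cpu").manual_seed(42)
a = torch.randn(4, generator=gen)

gen.manual_seed(42)  # reset to the same seed
b = torch.randn(4, generator=gen)

print(torch.equal(a, b))  # True
```

In the usage example above this would be `pipe(..., generator=torch.Generator("cpu").manual_seed(42))`.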

## Conversion Details

Converted from the VideoX-Fun checkpoint using a 1:1 weight key mapping (983 keys). No architectural modifications were needed; the standard `WanImageToVideoPipeline` handles `in_channels=36` natively.
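A 1:1 key mapping of this kind amounts to renaming state-dict keys while leaving every tensor untouched. A minimal sketch (the prefix pairs below are hypothetical illustrations, not the actual 983-key table used for this conversion):

```python
# Hypothetical rename rules for illustration only; the real conversion
# maps all 983 VideoX-Fun keys onto their Diffusers counterparts.
RENAME_RULES = [
    ("blocks.", "transformer_blocks."),
    ("patch_embedding.", "patch_embed."),
]

def convert_state_dict(src):
    """Rename every key 1:1; tensor values pass through unchanged."""
    dst = {}
    for key, tensor in src.items():
        new_key = key
        for old, new in RENAME_RULES:
            if new_key.startswith(old):
                new_key = new + new_key[len(old):]
                break
        dst[new_key] = tensor
    return dst

converted = convert_state_dict({"blocks.0.attn.q.weight": "w"})
print(list(converted))  # ['transformer_blocks.0.attn.q.weight']
```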

### Components

| Component | Source |
|-----------|--------|
| Transformer | Converted from `alibaba-pai/Wan2.1-Fun-V1.1-1.3B-InP` |
| VAE | `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` |
| Text Encoder | `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` (UMT5-XXL) |
| Image Encoder | `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` (CLIP ViT-H-14) |
| Scheduler | `UniPCMultistepScheduler` (`flow_shift=3.0`) |

### Comparison with the TI2V variant

| | This model (InP) | [TI2V](https://huggingface.co/engineerA314/Wan2.1-Fun-V1.1-1.3B-TI2V-Diffusers) |
|---|---|---|
| `in_channels` | 36 (noise + mask + image) | 32 (noise + image) |
| Pipeline patches | None needed | `prepare_latents` override required |
| Origin | Wan-Fun Inpaint | Wan-Fun Camera Control (adapter removed) |
78
+
79
+ ## Acknowledgements
80
+
81
+ - [Alibaba PAI / VideoX-Fun](https://github.com/alibaba-pai/VideoX-Fun) for the original Wan-Fun models
82
+ - [Wan-Video](https://github.com/Wan-Video/Wan2.1) for the Wan 2.1 architecture