TrizZZZZ committed · Commit c37d533 · verified · 1 Parent(s): 061c68e

Delete readme.md

Files changed (1)
  1. readme.md +0 -179
readme.md DELETED
@@ -1,179 +0,0 @@
- ---
- license: apache-2.0
- pipeline_tag: image-text-to-video
- ---
-
- <div align="center">
-
- <h1 align="center">Bernini-Diffusers</h1>
-
- <h4 align="center">Latent Semantic Planning for Video Diffusion</h4>
-
- **Chenchen Liu<sup>\*</sup>, Junyi Chen<sup>\*</sup>, Lei Li<sup>\*</sup>, Lu Chi<sup>\*,§</sup>, Mingzhen Sun<sup>\*</sup>, Zhuoying Li<sup>\*</sup>, Yi Fu, Ruoyu Guo, Yiheng Wu, Ge Bai, Zehuan Yuan<sup>✉</sup>**
-
- <sup>\*</sup> Equal contribution&nbsp;&nbsp;<sup>✉</sup> Corresponding author&nbsp;&nbsp;<sup>§</sup> Project lead
-
- [![arXiv](https://img.shields.io/badge/arXiv-2605.22344-b31b1b.svg)](https://arxiv.org/abs/2605.22344)
- [![Project Page](https://img.shields.io/badge/Project-Page-blue.svg)](https://bernini-ai.github.io/)
- [![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Models-yellow)](https://huggingface.co/collections/ByteDance/bernini)
-
- </div>
-
- ## 🎉 News
-
- - **[2026-06-10]** We open-sourced the inference code and model weights of the full Bernini (**Bernini**).
- - **[2026-05-22]** We released our paper [Bernini: Latent Semantic Planning for Video Diffusion](https://arxiv.org/abs/2605.22344).
-
- ## ✨ Highlights
-
- Bernini is a unified framework for video generation and editing that combines an MLLM-based semantic planner with a DiT-based renderer.
-
- Compared with the renderer-only Bernini-R release, **Bernini-Diffusers** packages the full semantic-planning pipeline: a Qwen2.5-VL planner, Bernini planning weights, and Wan2.2 diffusion components in one self-contained directory. This makes it the recommended release when you need stronger instruction following, multi-step semantic planning, and better handling of complex video editing requests.
-
- ## 🧾 Model card
-
- | Field | Description |
- |-------|-------------|
- | Model type | Full video generation/editing pipeline with an MLLM-based semantic planner and a DiT-based renderer. |
- | Checkpoint | [`ByteDance/Bernini-Diffusers`](https://huggingface.co/ByteDance/Bernini-Diffusers) |
- | Code | [`ByteDance/Bernini`](https://github.com/bytedance/Bernini) |
- | Recommended use | Complex generation/editing requests that benefit from explicit latent semantic planning and stronger instruction following. |
- | Model behavior | Better at decomposing complex instructions and planning semantic changes before rendering, at the cost of a heavier checkpoint layout than Bernini-R. |
-
- ### Benchmark snapshot
-
- | Model | EditVerse | OpenVE | OpenS2V | VBench | Bernini-v2v (OS) | Bernini-vr2v (OS) |
- |---|---|---|---|---|---|---|
- | [Bernini 7+14B](https://huggingface.co/ByteDance/Bernini-Diffusers) | 8.02 | 4.03 | 62.30 | 84.37 | 3.49 | 3.48 |
-
- On video editing, Bernini reaches the first tier among leading closed-source commercial models in our internal arena evaluation based on blind human pairwise comparisons.
-
- ## 📦 Package layout
-
- This release is a **self-contained diffusers-format directory**. Pass the downloaded `Bernini-Diffusers` directory directly to `--config`.
-
- ```text
- Bernini-Diffusers/
-   bernini/
-   mllm/
-   scheduler/
-   t5_text_encoder/
-   t5_tokenizer/
-   vae/
-   config.json
-   transformer_config.json
-   transformer_2_config.json
- ```
-
- At runtime:
-
- - `bernini/` provides the Bernini planning checkpoint.
- - `mllm/` provides the Qwen2.5-VL planner assets.
- - `transformer_config.json` and `transformer_2_config.json` define the Wan2.2 diffusion decoder components used by the full pipeline.
- - `t5_text_encoder/`, `t5_tokenizer/`, `vae/`, and `scheduler/` provide the base diffusion modules required for inference.
-
- ## 📥 Download
-
- ```bash
- pip install -U "huggingface_hub"
- hf download ByteDance/Bernini-Diffusers \
-   --local-dir pretrained_models/Bernini-Diffusers
- ```
-
- ## 🚀 Usage
-
- The official inference code is available in the [Bernini repository](https://github.com/bytedance/Bernini).
-
- ### Installation
-
- ```bash
- git clone https://github.com/bytedance/Bernini.git bernini && cd bernini
- pip install -r requirements.txt
- ```
-
- Recommended environment:
-
- - **Python** 3.11.2
- - **PyTorch** 2.5.1+cu124
- - **CUDA toolkit** 12.4
- - **GPU** Hopper GPUs (H100/H800/H200) are recommended for best performance
-
- For multi-GPU sequence parallel inference, install VeOmni:
-
- ```bash
- pip install --no-deps git+https://github.com/ByteDance-Seed/VeOmni.git@v0.1.10
- ```
-
- ### Load the model
-
- Pass the downloaded directory directly as `--config`:
-
- ```bash
- python infer_single_gpu.py --config pretrained_models/Bernini-Diffusers \
-   --case assets/testcases/i2i/i2i.json --num_frames 1
- ```
-
- ### Prompt enhancer (highly recommended)
-
- `--use_pe` enhances the prompt through an OpenAI-compatible endpoint and is recommended for best generation quality.
-
- ```bash
- export BERNINI_PE_API_KEY=... # or OPENAI_API_KEY
- export BERNINI_PE_BASE_URL=... # or OPENAI_BASE_URL
- export BERNINI_PE_MODEL=... # vision-capable chat model
- ```
-
- ### Gradio demo
-
- ```bash
- # Single GPU
- python gradio_demo.py --config pretrained_models/Bernini-Diffusers --port 7860
-
- # 8 GPUs, 8-way Ulysses sequence parallel
- torchrun --nproc-per-node 8 gradio_demo.py --ulysses 8 \
-   --config pretrained_models/Bernini-Diffusers \
-   --port 7860 --share
- ```
-
- ### Run scripts
-
- The [`scripts/bernini/`](https://github.com/bytedance/Bernini/tree/master/scripts/bernini) directory in the Bernini repo provides ready-to-run task launchers for the full pipeline:
-
- - `run_t2i.sh`
- - `run_i2i.sh`
- - `run_t2v.sh`
- - `run_v2v.sh`
- - `run_rv2v.sh`
- - `run_r2v.sh`
- - `run_gradio.sh`
-
- You can override the model directory with:
-
- ```bash
- export BERNINI_CONFIG=/path/to/Bernini-Diffusers
- ```
-
- ## 📑 Citation
-
- If you use Bernini in your research, please cite:
-
- ```bibtex
- @article{bernini,
-   title = {Bernini: Latent Semantic Planning for Video Diffusion},
-   author = {Chenchen Liu and Junyi Chen and Lei Li and Lu Chi and Mingzhen Sun and Zhuoying Li and Yi Fu and Ruoyu Guo and Yiheng Wu and Ge Bai and Zehuan Yuan},
-   journal = {arXiv preprint arXiv:2605.22344},
-   year = {2026}
- }
- ```
-
- ## 🙏 Acknowledgements
-
- Bernini builds on several outstanding open-source projects:
-
- - [Wan2.2-T2V-A14B](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B)
- - [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
- - [VeOmni](https://github.com/ByteDance-Seed/VeOmni)
-
- ## 📄 License
-
- Apache License 2.0.