xuan3986 committed on
Commit 8165d35 · verified · 1 Parent(s): 1b57165

Update README.md

Files changed (1)
  1. README.md +19 -11
README.md CHANGED
@@ -23,6 +23,7 @@ tags:
 
  <div align="center">
  <h4><a href="#Open-Source">Open Source</a>
  |<a href="#Dataset-Pipeline">Dataset Pipeline</a>
  |<a href="#Dubbing-Model">Dubbing Model</a>
  |<a href="#Recent-Updates">Recent Updates</a>
@@ -31,9 +32,7 @@ tags:
  </h4>
  </div>
 
- **Fun-CineForge** contains an end-to-end dataset pipeline for producing large-scale dubbing datasets and an MLLM-based dubbing model designed for diverse cinematic scenes.
- Using this pipeline, we constructed the first large-scale Chinese television dubbing dataset CineDub-CN, which includes rich annotations and diverse scenes.
- In monologue, narration, dialogue, and multi-speaker scenes, our dubbing model consistently outperforms state-of-the-art methods in terms of audio quality, lip-sync, timbre transition, and instruction following.
 
  <a name="Open-Source"></a>
  ## Open Source 🎬
@@ -41,18 +40,17 @@ You can access [https://funcineforge.github.io/](https://funcineforge.github.io/
 
  GitHub link: [https://github.com/FunAudioLLM/FunCineForge/](https://github.com/FunAudioLLM/FunCineForge/)
 
- Modelscope link: [https://www.modelscope.cn/models/FunAudioLLM/Fun-CineForge/](https://www.modelscope.cn/models/FunAudioLLM/Fun-CineForge/)
 
  CineDub Samples:
  [huggingface](https://huggingface.co/datasets/FunAudioLLM/CineDub-Example/)
  [modelscope](https://www.modelscope.cn/datasets/FunAudioLLM/CineDub-Example)
 
- <a name="Dataset-Pipeline"></a>
- ## Dataset Pipeline 🔨
 
- ### Environmental Installation
 
- Fun-CineForge dataset pipeline toolkit only relies on a Python environment to run.
  ```shell
  # Conda
  git clone git@github.com:FunAudioLLM/FunCineForge.git
@@ -62,6 +60,9 @@ sudo apt-get install ffmpeg
  python setup.py
  ```
 
  ### Data collection
  If you want to produce your own data,
  we recommend that you refer to the following requirements to collect the corresponding movies or television series.
@@ -101,16 +102,20 @@ cd speaker_diarization
  bash run.sh --stage 1 --stop_stage 4 --hf_access_token hf_xxx --root datasets/clean/zh --gpus "0 1 2 3"
  ```
 
  - [5] Multimodal CoT Correction. Based on general-purpose MLLMs, the system uses audio, ASR text, and RTTM files as input. It leverages Chain-of-Thought (CoT) reasoning to extract clues and corrects the results of the specialized models. It also annotates character age, gender, and vocal timbre. Experimental results show that this strategy reduces the CER from 4.53% to 0.94% and the speaker diarization error rate from 8.38% to 1.20%, achieving quality comparable to or even better than manual transcription. Adding the --resume enables breakpoint COT inference to prevent wasted resources from repeated COT inferences. Now supports both Chinese and English.
  ```shell
  python cot.py --root_dir datasets/clean/zh --lang zh --provider google --model gemini-3-pro-preview --api_key xxx --resume
  python cot.py --root_dir datasets/clean/en --lang en --provider google --model gemini-3-pro-preview --api_key xxx --resume
- python build_datasets.py --root_zh datasets/clean/zh --root_en datasets/clean/en --out_dir datasets/clean --save
  ```
 
- - (Reference) Extract speech tokens based on the CosyVoice3 tokenizer for llm training.
  ```shell
- python speech_tokenizer.py --root datasets/clean/zh
  ```
 
  <a name="Dubbing-Model"></a>
@@ -149,9 +154,12 @@ If you use our dataset or code, please cite the following paper:
 
  <a name="Comminicate"></a>
  ## Comminicate 🍟
  We welcome you to participate in discussions on Fun-CineForge [GitHub Issues](https://github.com/FunAudioLLM/FunCineForge/issues) or contact us for collaborative development.
  For any questions, you can contact the [developer](mailto:jxliu@mail.ustc.edu.cn).
 
  ### Disclaimer
 
  This repository contains research artifacts:
 
 
  <div align="center">
  <h4><a href="#Open-Source">Open Source</a>
+ |<a href="#Environment">Environment</a>
  |<a href="#Dataset-Pipeline">Dataset Pipeline</a>
  |<a href="#Dubbing-Model">Dubbing Model</a>
  |<a href="#Recent-Updates">Recent Updates</a>
 
  </h4>
  </div>
 
+ **Fun-CineForge** contains an end-to-end dataset pipeline for producing large-scale dubbing datasets and an MLLM-based dubbing model designed for diverse cinematic scenes. Using this pipeline, we constructed the first large-scale Chinese television dubbing dataset CineDub-CN, which includes rich annotations and diverse scenes. In monologue, narration, dialogue, and multi-speaker scenes, our dubbing model consistently outperforms state-of-the-art methods in terms of audio quality, lip-sync, timbre transition, and instruction following.
 
  <a name="Open-Source"></a>
  ## Open Source 🎬
 
 
  GitHub link: [https://github.com/FunAudioLLM/FunCineForge/](https://github.com/FunAudioLLM/FunCineForge/)
 
+ HuggingFace link: [https://huggingface.co/FunAudioLLM/Fun-CineForge/](https://huggingface.co/FunAudioLLM/Fun-CineForge/)
 
  CineDub Samples:
  [huggingface](https://huggingface.co/datasets/FunAudioLLM/CineDub-Example/)
  [modelscope](https://www.modelscope.cn/datasets/FunAudioLLM/CineDub-Example)
 
+ <a name="Environment"></a>
+ ## Environmental Installation
 
+ Fun-CineForge relies on Conda and Python environments. Execute **setup.py** to automatically install the entire project environment and open-source model.
 
  ```shell
  # Conda
  git clone git@github.com:FunAudioLLM/FunCineForge.git
 
  python setup.py
  ```
 
+ <a name="Dataset-Pipeline"></a>
+ ## Dataset Pipeline 🔨
+
  ### Data collection
  If you want to produce your own data,
  we recommend that you refer to the following requirements to collect the corresponding movies or television series.
 
  bash run.sh --stage 1 --stop_stage 4 --hf_access_token hf_xxx --root datasets/clean/zh --gpus "0 1 2 3"
  ```
 
+ - (Reference) Extract speech tokens based on the CosyVoice3 tokenizer for llm training.
+ ```shell
+ python speech_tokenizer.py --root datasets/clean/zh
+ ```
+
  - [5] Multimodal CoT Correction. Based on general-purpose MLLMs, the system uses audio, ASR text, and RTTM files as input. It leverages Chain-of-Thought (CoT) reasoning to extract clues and corrects the results of the specialized models. It also annotates character age, gender, and vocal timbre. Experimental results show that this strategy reduces the CER from 4.53% to 0.94% and the speaker diarization error rate from 8.38% to 1.20%, achieving quality comparable to or even better than manual transcription. Adding the --resume enables breakpoint COT inference to prevent wasted resources from repeated COT inferences. Now supports both Chinese and English.
  ```shell
  python cot.py --root_dir datasets/clean/zh --lang zh --provider google --model gemini-3-pro-preview --api_key xxx --resume
  python cot.py --root_dir datasets/clean/en --lang en --provider google --model gemini-3-pro-preview --api_key xxx --resume
  ```
 
+ - The construction of the dataset retrieval file will read all production data, perform bidirectional verification of script content and speaker separation results.
  ```shell
+ python build_datasets.py --root_zh datasets/clean/zh --root_en datasets/clean/en --out_dir datasets/clean --save
  ```
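
The CER figures quoted in step [5] can be read against the usual definition of character error rate: character-level edit distance divided by the reference length. The sketch below is a minimal illustration under that assumption. It is not taken from the Fun-CineForge code, and the names `edit_distance`, `cer`, and `relative_reduction` are hypothetical helpers introduced only for this example.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error rate that was removed."""
    return (before - after) / before
```

Applied to the figures reported above, `relative_reduction(4.53, 0.94)` gives roughly 0.79 and `relative_reduction(8.38, 1.20)` roughly 0.86, that is, relative error reductions of about 79% and 86%.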
 
  <a name="Dubbing-Model"></a>
 
 
  <a name="Comminicate"></a>
  ## Comminicate 🍟
+ The Fun-CineForge open-source project is developed and maintained by the Tongyi Lab Speech Team and a student from NERCSLIP, University of Science and Technology of China.
  We welcome you to participate in discussions on Fun-CineForge [GitHub Issues](https://github.com/FunAudioLLM/FunCineForge/issues) or contact us for collaborative development.
  For any questions, you can contact the [developer](mailto:jxliu@mail.ustc.edu.cn).
 
+ ⭐ Hope you will support Fun-CineForge. Thank you.
+
  ### Disclaimer
 
  This repository contains research artifacts: