---
license: mit
language:
- en
- zh
base_model:
- Wan-AI/Wan2.2-S2V-14B
pipeline_tag: any-to-any
---

# RealVideo

RealVideo is a WebSocket-based video calling system that accepts text input. It uses the **GLM-4.5-AirX** and **GLM-TTS** models to generate spoken audio responses, and autoregressive diffusion to generate the matching video frames. The system is fully functional, with a modular design and a clean code structure.

For more details, see the [blog post](https://z.ai/blog/realvideo).

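The pipeline described above (text in, reply text, synthesized speech, then video generated step by step) can be sketched as follows. This is a minimal, hypothetical sketch: `generate_reply`, `synthesize_speech`, `generate_video_chunks`, `VideoChunk`, the 16 kHz sample rate, and the chunk size are illustrative assumptions, not RealVideo APIs, and the models are replaced by dummy functions. What it shows is the data flow: each video chunk is produced from the reference image, a window of the generated audio, and the chunks generated before it, which is what makes the generation autoregressive.

```python
from dataclasses import dataclass
from typing import Iterator, List

# Hypothetical stand-ins for the real models. None of these names come from
# the RealVideo codebase; they only illustrate how data flows between stages.

SAMPLE_RATE = 16_000  # assumed audio sample rate (Hz)


def generate_reply(user_text: str) -> str:
    """Stand-in for the GLM-4.5-AirX call: returns the reply text."""
    return f"You said: {user_text}"


def synthesize_speech(text: str) -> List[float]:
    """Stand-in for GLM-TTS: returns silent audio, 10 ms per character."""
    return [0.0] * (len(text) * SAMPLE_RATE // 100)


@dataclass
class VideoChunk:
    index: int             # position of this chunk in the output stream
    reference_image: str   # path to the input portrait (assumed)
    audio: List[float]     # audio window this chunk is conditioned on
    context: int           # number of earlier chunks the model could attend to


def generate_video_chunks(
    audio: List[float], reference_image: str, samples_per_chunk: int
) -> Iterator[VideoChunk]:
    """Autoregressive loop: each chunk sees its audio window and all prior chunks."""
    history: List[VideoChunk] = []
    for start in range(0, len(audio), samples_per_chunk):
        window = audio[start:start + samples_per_chunk]
        # A real model would run diffusion denoising here, conditioned on the
        # reference image, this audio window, and the frames in `history`.
        chunk = VideoChunk(len(history), reference_image, window, len(history))
        history.append(chunk)
        yield chunk  # stream each chunk as soon as it is ready


def handle_text_message(
    user_text: str, reference_image: str, samples_per_chunk: int = 800
) -> Iterator[VideoChunk]:
    """Text -> reply -> speech -> streamed video chunks."""
    reply = generate_reply(user_text)
    audio = synthesize_speech(reply)
    yield from generate_video_chunks(audio, reference_image, samples_per_chunk)
```

For example, `list(handle_text_message("Hello", "avatar.png"))` yields three chunks of 800 samples each (50 ms at the assumed 16 kHz rate), with `context` values 0, 1, and 2. Yielding chunks from a generator, rather than waiting for the whole clip, is one way to keep latency low in a real-time call.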
## Features

- **Text Input**: Accepts text messages as user input.
- **AI Voice Response**: Integrates the GLM-4.5-AirX and GLM-TTS models to generate spoken responses.
- **Lip Sync**: Generates real-time, lip-synced conversational video from any input image and audio.
- **Real-time Communication**: Uses WebSocket for real-time, bidirectional communication.

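If audio and video responses are delivered over the same bidirectional WebSocket connection, the client needs a way to tell the streams apart and put each back in order. The sketch below assumes a simple JSON envelope with hypothetical `type`, `seq`, and `data` fields; this is an illustration, not RealVideo's actual wire format, which may differ.

```python
import base64
import json
from typing import Dict, List, Tuple

# Hypothetical message envelope: {"type": ..., "seq": ..., "data": ...}.
# RealVideo's real wire format may differ.


def make_client_message(text: str) -> str:
    """Client -> server: a user text message as a JSON text frame."""
    return json.dumps({"type": "text", "data": text})


def make_server_message(kind: str, seq: int, payload: bytes) -> str:
    """Server -> client: an audio or video chunk, base64-encoded to fit in JSON."""
    return json.dumps(
        {"type": kind, "seq": seq, "data": base64.b64encode(payload).decode("ascii")}
    )


def route_server_messages(raw_messages: List[str]) -> Dict[str, List[bytes]]:
    """Split interleaved server messages into per-type streams ordered by seq."""
    streams: Dict[str, List[Tuple[int, bytes]]] = {"audio": [], "video": []}
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg["type"] in streams:  # ignore message types we do not handle
            streams[msg["type"]].append((msg["seq"], base64.b64decode(msg["data"])))
    return {
        kind: [payload for _, payload in sorted(items, key=lambda item: item[0])]
        for kind, items in streams.items()
    }
```

For example, routing `[make_server_message("video", 1, b"frame-1"), make_server_message("audio", 0, b"pcm-0"), make_server_message("video", 0, b"frame-0")]` returns `{"audio": [b"pcm-0"], "video": [b"frame-0", b"frame-1"]}`. Base64 inside JSON keeps everything in text frames; a real implementation could instead send media as binary WebSocket frames and avoid the roughly 33% base64 size overhead.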
## Quick Start

See our [GitHub repository](https://github.com/zai-org/RealVideo) for setup and usage instructions.

## Technical Highlights

- **Model Integration**: Supports fast, convenient voice cloning, generating audio output from text input.
- **Modular Design**: Clear code structure that is easy to maintain and extend.
- **Real-time Performance**: Optimized audio processing and real-time video generation algorithms.

## Acknowledgements

This project builds on the following open-source projects:

- [Self Forcing](https://github.com/guandeh17/Self-Forcing)