Update README.md

1447f2d verified 3 months ago

	---
	license: mit
	language:
	- en
	- zh
	base_model:
	- Wan-AI/Wan2.2-S2V-14B
	pipeline_tag: any-to-any
	---

	# RealVideo

	RealVideo is a WebSocket-based video calling system that supports text input. It uses the GLM-4.5-AirX and
	GLM-TTS models to generate audio responses, and autoregressive diffusion to generate the corresponding video frames. The
	system is fully functional, with a modular design and a clean code structure.
	See the [blog post](https://z.ai/blog/realvideo) for more details.
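
The flow described above (text in, spoken reply out, video frames generated chunk by chunk) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `chat`, `tts`, and `step` are hypothetical stand-ins for the GLM-4.5-AirX call, the GLM-TTS call, and one autoregressive video generation step, and the audio and frame types are placeholders.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical stand-ins, not RealVideo APIs.
ChatFn = Callable[[str], str]                     # user text -> reply text
TtsFn = Callable[[str], List[float]]              # reply text -> audio samples
StepFn = Callable[[List[float], List[str]], str]  # (audio chunk, past frames) -> next frame


def split_audio(samples: List[float], chunk_len: int) -> List[List[float]]:
    """Split audio into consecutive chunks; the last chunk may be shorter."""
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def generate_frames(samples: List[float], step: StepFn, chunk_len: int) -> Iterator[str]:
    """Autoregressive loop: each frame is conditioned on the current audio
    chunk and on the frames generated so far."""
    history: List[str] = []
    for chunk in split_audio(samples, chunk_len):
        frame = step(chunk, list(history))  # pass a copy so step cannot mutate history
        history.append(frame)
        yield frame


def respond(user_text: str, chat: ChatFn, tts: TtsFn, step: StepFn,
            chunk_len: int = 4) -> Tuple[str, List[str]]:
    """One conversational turn: text -> reply text -> audio -> video frames."""
    reply = chat(user_text)
    samples = tts(reply)
    return reply, list(generate_frames(samples, step, chunk_len))
```

Because `generate_frames` is a generator, a server built this way could start sending frames before the whole reply has been rendered, which is the property that matters for a real-time call.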

	## Features

	- Text Input: Supports text message input.
	- AI Voice Response: Integrates GLM-4.5-AirX and GLM-TTS models to generate voice responses.
	- Lip Sync: Generates real-time conversational video based on any input image and audio.
	- Real-time Communication: WebSocket-based real-time bidirectional communication.
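
As an illustration of the text-input path over a WebSocket, a client might wrap each message in a small JSON envelope before sending it. The schema below (`type` and `text` fields) is an assumption made for this sketch; the actual message format is defined in the GitHub repository and may differ.

```python
import json
from typing import Any, Dict


def encode_text_message(text: str) -> str:
    """Serialize a user text message into a JSON string (hypothetical schema)."""
    if not text.strip():
        raise ValueError("text message must not be empty")
    # ensure_ascii=False keeps Chinese text readable on the wire.
    return json.dumps({"type": "text", "text": text}, ensure_ascii=False)


def decode_message(raw: str) -> Dict[str, Any]:
    """Parse an incoming JSON message and check that it has a 'type' field."""
    msg = json.loads(raw)
    if not isinstance(msg, dict) or "type" not in msg:
        raise ValueError("message must be a JSON object with a 'type' field")
    return msg
```

The resulting string can be passed to the send method of whichever WebSocket client library is in use. Dispatching on the `type` field would let the same connection carry text, audio, and video events.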

	## Quick Start

	See our [GitHub repository](https://github.com/zai-org/RealVideo) to get started.

	## Technical Highlights

	- Model Integration: Supports quick and convenient voice cloning, generating audio output from text input.
	- Modular Design: Clear code structure, easy to maintain and extend.
	- Real-time Performance: Optimized audio processing and real-time video generation algorithms.

	## Acknowledgements

	This project builds on the following open-source project:

	- [Self-Forcing](https://github.com/guandeh17/Self-Forcing)