Foley-Omni

GitHub Code | arXiv | Demo

Overview

This repository packages the public inference checkpoint set for Foley-Omni. The release focuses on Video-to-Soundtrack (V2ST) generation, in which the model jointly generates synchronized speech, sound effects, and music from a video and an optional text prompt.

Model Size

5.5B parameters
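For rough capacity planning, the memory needed just to hold the weights is the parameter count times the bytes per parameter. The card does not state the storage dtype or whether the 5.5B figure includes the text encoder and MMAudio components, so this sketch simply assumes 5.5B parameters in a single dtype and compares fp32 (4 bytes) with bf16 (2 bytes). Activations, caches, and framework overhead are not included.

```python
# Weights-only memory estimate. Assumptions: 5.5e9 parameters (the figure above),
# one uniform dtype, and no activations, caches, or framework overhead.
NUM_PARAMS = 5.5e9
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed just to hold the weights, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
```

Under these assumptions the script prints about 20.5 GiB for fp32 and 10.2 GiB for bf16; actual peak usage during inference will be higher.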

Repository Contents

ckpts/
├── Foley-Omni/
│   └── v2st.pth
├── Wan2.2-TI2V-5B/
│   ├── models_t5_umt5-xxl-enc-bf16.pth
│   └── google/
│       └── umt5-xxl/
│           ├── special_tokens_map.json
│           ├── spiece.model
│           ├── tokenizer.json
│           └── tokenizer_config.json
└── mmaudio/
    └── ext_weights/
        ├── v1-16.pth
        ├── best_netG.pt
        └── synchformer_state_dict.pth

What each part is used for:

  • ckpts/Foley-Omni/v2st.pth: released inference-only Foley-Omni weights
  • ckpts/Wan2.2-TI2V-5B/*: text encoder and tokenizer for text conditioning
  • ckpts/mmaudio/ext_weights/v1-16.pth: audio VAE for the 16 kHz inference path
  • ckpts/mmaudio/ext_weights/best_netG.pt: vocoder for waveform decoding
  • ckpts/mmaudio/ext_weights/synchformer_state_dict.pth: Synchformer weights for online Sync feature extraction
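Before running inference it can help to confirm that a downloaded ckpts/ directory matches the layout above. The following is a minimal sketch; `missing_checkpoints` and `EXPECTED_FILES` are hypothetical names, not part of the Foley-Omni code, and the helper only checks that each listed file exists without validating its contents.

```python
from __future__ import annotations

from pathlib import Path

# Relative paths copied from the repository layout above.
EXPECTED_FILES = [
    "Foley-Omni/v2st.pth",
    "Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth",
    "Wan2.2-TI2V-5B/google/umt5-xxl/special_tokens_map.json",
    "Wan2.2-TI2V-5B/google/umt5-xxl/spiece.model",
    "Wan2.2-TI2V-5B/google/umt5-xxl/tokenizer.json",
    "Wan2.2-TI2V-5B/google/umt5-xxl/tokenizer_config.json",
    "mmaudio/ext_weights/v1-16.pth",
    "mmaudio/ext_weights/best_netG.pt",
    "mmaudio/ext_weights/synchformer_state_dict.pth",
]


def missing_checkpoints(ckpt_root: str | Path = "ckpts") -> list[str]:
    """Return the expected relative paths that are not regular files under ckpt_root."""
    root = Path(ckpt_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints("ckpts")
    if missing:
        print("Missing checkpoint files:")
        for rel in missing:
            print("  -", rel)
    else:
        print("All expected checkpoint files are present.")
```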

Online Feature Extraction

This release supports both:

  • direct V2ST inference with pre-extracted visual features, passed via clip_feature_path and sync_feature_path
  • V2ST inference without pre-extracted features, in which the visual features are extracted online

Notes:

  • synchformer_state_dict.pth is included in this repository because it is required for online Sync feature extraction.
  • The CLIP image encoder is loaded by open_clip from apple/DFN5B-CLIP-ViT-H-14-384 on first use. The current code path does not use a separate local CLIP checkpoint file.
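The choice between the two modes can be sketched as a small dispatch step. This is a hypothetical helper, not the actual Foley-Omni entry point: `resolve_feature_mode` is an illustrative name, and rejecting a call that supplies only one of the two feature paths is an assumption of this sketch. It reflects the notes above by requiring the local Synchformer weights for online mode, while leaving the CLIP encoder to open_clip's first-use download.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

# Location taken from the repository layout; adjust if ckpts/ lives elsewhere.
SYNCHFORMER_CKPT = Path("ckpts/mmaudio/ext_weights/synchformer_state_dict.pth")


def resolve_feature_mode(
    clip_feature_path: Optional[str],
    sync_feature_path: Optional[str],
    synchformer_ckpt: Path = SYNCHFORMER_CKPT,
) -> str:
    """Hypothetical helper: decide how the visual features will be obtained."""
    has_clip = clip_feature_path is not None
    has_sync = sync_feature_path is not None

    if has_clip and has_sync:
        # Both features were extracted ahead of time; no visual encoder is loaded.
        return "pre-extracted"

    if has_clip or has_sync:
        # Assumption of this sketch: a mix of one pre-extracted and one
        # online-extracted feature is rejected rather than handled silently.
        raise ValueError("pass both clip_feature_path and sync_feature_path, or neither")

    # Online mode: Sync features need the local Synchformer weights. The CLIP
    # encoder is fetched by open_clip on first use, so it is not checked here.
    if not Path(synchformer_ckpt).is_file():
        raise FileNotFoundError(f"online Sync extraction needs {synchformer_ckpt}")
    return "online"
```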

Source Attribution

This repository redistributes a small subset of files from the following upstream releases for convenience:

  • Wan2.2-TI2V-5B: text encoder and tokenizer files
  • MMAudio: audio VAE, vocoder, and Synchformer files

Please refer to the original upstream repositories for their licenses, usage terms, and project details.


Paper for CocoBro/Foley-Omni