File size: 5,906 Bytes
a9536c4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 | # AGENTS.md
This file provides guidance to Codex (Codex.ai/code) when working with code in this repository.
## Project Overview
RVC v2 voice conversion & AI cover system. Users upload a song, the pipeline separates vocals from accompaniment (Mel-Band Roformer), converts the vocal timbre via RVC v2 (HuBERT + RMVPE + FAISS), then mixes the result back with the accompaniment.
**Platform Support**: Windows / Linux / WSL2 / Google Colab
**Key Features**:
- AI song covers with automatic vocal separation and mixing
- 117 downloadable character models
- 4 mixing presets (universal, vocal-focused, accompaniment-focused, live)
- Karaoke mode (lead/backing vocal separation)
- 4 VC preprocessing modes (auto, direct, uvr_deecho, legacy)
- Dual VC pipeline (current implementation vs official RVC)
- Multi-backend GPU support (CUDA, ROCm, XPU, DirectML, MPS)
## Commands
```powershell
# Activate venv (Windows)
.\venv310\Scripts\Activate.ps1
# Activate venv (Linux/WSL2)
source venv310/bin/activate
# Install dependencies
python install.py # full install + launch
python install.py --check # check only
python install.py --cpu # CPU variant
# Run
python run.py # default: http://127.0.0.1:7860
python run.py --skip-check # skip env/model validation
python run.py --host 0.0.0.0 --port 8080 --share
# Download base models (HuBERT, RMVPE)
python tools/download_models.py
# Download character models
python -c "from tools.character_models import download_character_model; download_character_model('rin')"
# Quick CUDA check
python -c "import torch; print(torch.cuda.is_available())"
# Colab
# Open AI_RVC_Colab.ipynb in Google Colab, set runtime to GPU (T4), run cells sequentially
```
## Architecture
**Entry:** `run.py` → env check → model check → `ui/app.py:launch()`
**Pipeline flow** (`infer/cover_pipeline.py:CoverPipeline.process`):
1. Vocal separation (`infer/separator.py`) — Roformer (default), Demucs, or UVR5
2. RVC voice conversion (`infer/pipeline.py`) — HuBERT features → RMVPE F0 → RVC v2 inference with FAISS retrieval
3. Mixing (`lib/mixer.py`) — volume adjust + reverb via pedalboard
**Character model system** (`tools/character_models.py`):
- 117 downloadable character models from HuggingFace (`trioskosmos/rvc_models`)
- Stored in `assets/weights/characters/`
- Version notes (epochs, sample rate) extracted from .pth metadata and cached in `_version_notes.json`
- Display name assembly: `_get_display_name()` appends `(500 epochs·40k)` style training info
**UI** (`ui/app.py`):
- Gradio 3.50.2, single-file ~2000 lines
- i18n via `i18n/zh_CN.json`, accessed through `t(key, section)` helper
- Three main tabs: song cover (full pipeline), model management, settings
- Cover tab features:
- Character model download/management with series filtering and keyword search
- 4 mixing presets (universal, vocal-focused, accompaniment-focused, live)
- Karaoke separation (lead/backing vocals)
- 4 VC preprocessing modes (auto, direct, uvr_deecho, legacy)
- Source constraint control (auto/off/on)
- Dual VC pipeline mode (current/official)
- Singing repair (official mode only)
- Real-time VC route status display
- Model management tab:
- Base model download (HuBERT, RMVPE)
- Mature DeEcho model download
- Model list table with refresh
- Settings tab:
- Device info display
- Backend selection (CUDA/ROCm/XPU/DirectML/MPS/CPU)
- Config save
**Config:** `configs/config.json` — device, F0 method, index rate, cover separator settings, path mappings
## Key Conventions
- Python 3.10, UTF-8, 4-space indent
- `snake_case` functions/variables, `PascalCase` classes, `UPPER_SNAKE_CASE` constants
- User-facing text is bilingual Chinese/English
- Commit messages: short imperative subjects, Chinese/English mixed (e.g. `infer: fix CUDA OOM`)
- No automated test suite; verify changes by running one voice conversion + one cover through the UI
- `_official_rvc/` is vendored upstream reference — don't modify unless syncing
## Important Paths
- `configs/config.json` — all runtime settings
- `infer/cover_pipeline.py` — orchestrates the full cover workflow
- `infer/pipeline.py` — RVC v2 inference core
- `infer/separator.py` — Roformer/Demucs vocal separation wrappers
- `tools/character_models.py` — character model registry (117 entries) + download logic
- `tools/download_models.py` — base model (HuBERT/RMVPE) + mature DeEcho downloader
- `lib/mixer.py` — audio mixing with volume/reverb
- `ui/app.py` — entire Gradio UI (~2000 lines)
- `mcp/server.py` + `mcp/tools.py` — MCP server integration for Codex
- `AI_RVC_Colab.ipynb` — Google Colab notebook with full feature parity
- `install.py` — cross-platform installation script (Windows/Linux)
## Things to Watch
- `fairseq` is pinned to `0.12.2` — HuBERT loading breaks on other versions
- `audio-separator` must be installed with `[gpu]` extra for CUDA support
- Roformer model auto-downloads on first use to `assets/separator_models/`
- Gradio is pinned to `3.50.2`; the UI code uses v3 API patterns (not v4)
- Model weights (.pt, .pth) and audio files are gitignored — never commit them
- Path handling uses `pathlib.Path` for cross-platform compatibility (Windows/Linux)
- Virtual environment activation differs by platform: `Scripts/Activate.ps1` (Windows) vs `bin/activate` (Linux)
- `install.py` has hardcoded Windows Python paths in `PYTHON310_CANDIDATES` but falls back to `py -3.10` launcher
- Platform detection uses `os.name == "nt"` for Windows-specific logic (venv paths, etc.)
- All core functionality is platform-agnostic; audio libraries work better on Linux
- Colab notebook (`AI_RVC_Colab.ipynb`) provides full feature parity with Web UI
|