| | --- |
| | title: Audio Visual Transcription |
| | app_file: app.py |
| | sdk: gradio |
| | sdk_version: 5.1.0 |
| | license: apache-2.0 |
| | emoji: π |
| | colorFrom: blue |
| | colorTo: purple |
| | short_description: Get your synchronized subtitled video in minutes with AI. |
| | --- |
| | # AudioVisualTranscription |
| |
|
| | [](https://huggingface.co/spaces/nelikCode/AudioVisualTranscription) |
| |
|
| | Get your synchronized subtitled video in minutes with AI! |
| |
|
| |  |
| |
|
| | ## π Overview |
| |
|
| | **AVT** is a tool that allows you to precisely subtitle your audio or video |
| | content in minutes, using the power of AI. |
| |
|
| | Whether you need subtitles for accessibility, language learning, or just to make |
| | your content more engaging, this app has got you covered. Simply upload your audio |
| | or video file, select the language, and let the magic happen. |
| |
|
| | ## β¨ Features |
| |
|
| | - **Easy-to-use Interface**: Powered by [Gradio](https://gradio.app) for an |
| | intuitive user experience. |
| | - **Multi-Language Support**: Supports transcription in multiple languages: |
| | English, Spanish, French, German, Italian, Dutch, Russian, Norwegian, Chinese, |
| | and more. |
| | - **Video Playback**: View your subtitled video directly in the web app. |
| | - **Download Subtitles**: Save generated subtitle files for use with your preferred |
| | video player. |
| |
|
| | ## π Quickstart |
| |
|
| | The easiest way to use **AVT** is through this |
| | [Hugging Face Space](https://huggingface.co/spaces/nelikCode/AudioVisualTranscription). |
| |
|
| | To use it locally, follow the steps below. |
| |
|
| | ### Installation |
| |
|
| | Follow these steps to set up the application on your local machine. |
| |
|
| | 1. **Clone the repository**: |
| |
|
| | ```bash |
| | git clone https://github.com/killian31/AudioVisualTranscription |
| | cd AudioVisualTranscription |
| | ``` |
| | |
| | 2. **Create a Python environment** using pyenv: |
| |
|
| | ```bash |
| | pyenv virtualenv 3.11.9 avt |
| | pyenv activate avt |
| | ``` |
| | |
| | 3. **Install Poetry**: |
| |
|
| | ```bash |
| | pip install poetry |
| | ``` |
| | |
| | 4. **Install dependencies**: |
| |
|
| | ```bash |
| | poetry install |
| | ``` |
| | |
| | 5. **Install system-level dependencies**: |
| | - **MacOS**: Run the following script to install FFmpeg and ImageMagick. |
| |
|
| | ```bash |
| | bash ./install_macos.sh |
| | ``` |
| | |
| | - **Debian/Ubuntu**: Run the following commands to install FFmpeg and ImageMagick. |
| |
|
| | ```bash |
| | chmod +x install_linux.sh |
| | ./install_linux.sh |
| | ``` |
| | |
| | ### Running the App |
| |
|
| | To launch the Gradio app: |
| |
|
| | ```bash |
| | python app.py |
| | ``` |
| |
|
| | After launching, navigate to the provided local URL to interact with the |
| | application in your browser. |
| |
|
| | ## π How It Works |
| |
|
| | 1. **Upload Your Content**: Use the provided options to upload an audio file |
| | **or** a video file. Select the file type accordingly in the dropdown menu |
| | (Video, Audio). |
| | 2. **Select Your Preferences**: Choose the language of transcription and any |
| | delay settings you prefer. |
| | 3. **Generate Subtitles**: Click on the βGenerate Subtitled Videoβ button to |
| | process your input. |
| | 4. **Download or View**: View the subtitled video directly on the web interface |
| | or download the SRT subtitle file for later use. You need to generate the |
| | subtitles before being able to ckick on the download button. |
| |
|
| | ## π Requirements |
| |
|
| | The app relies on the following system-level dependencies: |
| |
|
| | - **[FFmpeg](https://ffmpeg.org/)**: Required for handling video and audio. |
| | - **[ImageMagick](https://imagemagick.org/)**: Required for video processing. |
| |
|
| | Please ensure these are installed using the provided scripts before running the app. |
| |
|
| | ## π Technologies Used |
| |
|
| | - **Gradio**: Provides the web interface for easy interaction. |
| | - **Whisper by OpenAI**: Performs speech recognition. |
| |
|
| | ## π€ Contributing |
| |
|
| | Contributions are welcome! If you'd like to improve the app or add new features, |
| | feel free to fork the repository and open a pull request. Please format your code |
| | with `black`. |
| |
|
| | ## π License |
| |
|
| | This project is open source and available under the [Apache 2.0 License](LICENSE). |
| |
|
| | ## βοΈ Contact |
| |
|
| | If you have any questions, feel free to |
| | [open an issue](https://github.com/killian31/AudioVisualTranscription/issues/new). |