If you're looking for a fast and accurate way to transcribe a YouTube video, combining yt-dlp with OpenAI's Whisper is a powerful solution. This guide walks you through the entire process — from downloading the video to generating high-quality transcripts on your local machine.
✨ Why This Setup?
- yt-dlp: A reliable command-line tool to download videos from YouTube and other platforms.
- Whisper: An open-source speech-to-text system from OpenAI, known for its accuracy and multilingual support.
- Local setup: Keeps your audio on your own machine, avoids upload limits and API costs, and lets you pick the model size that suits your hardware.
📅 Step-by-Step Guide
Step 1: Prerequisites
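Make sure a recent Python 3 is installed. Python 3.7 is too old for current releases of Whisper (which needs 3.8 or newer) and yt-dlp; Python 3.10 or 3.11 is a safe choice for both. You can check by running: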
python3 --version
If it's not installed, download it from python.org.
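If you'd rather make this check from inside a script, a minimal sketch follows. The helper name python_is_new_enough is made up for illustration, and the default floor of (3, 8) is an assumption based on the minimum Whisper's packaging declares; raise it if any tool you install needs something newer.

```python
import sys


def python_is_new_enough(minimum=(3, 8), version=None):
    """Return True if `version` (default: the running interpreter) is >= `minimum`.

    `minimum` is an assumed floor, not an official requirement of this guide.
    """
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version[:2]) >= tuple(minimum)


status = "OK" if python_is_new_enough() else "too old"
print("Python", sys.version.split()[0], status)
```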
Step 2: Install yt-dlp, FFmpeg, and Whisper
Install yt-dlp
pip3 install -U yt-dlp
Install FFmpeg
FFmpeg is required by both tools: Whisper uses it to decode audio, and yt-dlp uses it to merge separate video and audio streams:
macOS (with Homebrew):
brew install ffmpeg
Linux (Debian/Ubuntu):
sudo apt install ffmpeg
Windows: Download from https://ffmpeg.org/download.html and add it to your system PATH.
Install Whisper
pip3 install -U openai-whisper
Step 3: Add Python Scripts to Your PATH
If you see a warning like:
The scripts pip, pip3 and pip3.9 are installed in '/Users/yourname/Library/Python/3.9/bin' which is not on PATH.
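Add this directory to your shell PATH. The path below matches the macOS example in the warning above; use whatever directory your own warning reports (on Linux it is usually $HOME/.local/bin).
zsh (the default shell on recent macOS):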
echo 'export PATH="$HOME/Library/Python/3.9/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
Bash users (on Linux, ~/.bashrc is usually the file to edit instead of ~/.bash_profile):
echo 'export PATH="$HOME/Library/Python/3.9/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile
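If you no longer have the warning text handy, Python can tell you where per-user scripts are installed. This is a rough sketch for macOS and Linux only (Windows uses a different layout); user_scripts_dir and on_path are illustrative names, not part of pip or any library.

```python
import os
import site


def user_scripts_dir():
    """Directory where per-user pip installs place command-line scripts.

    Assumes the macOS/Linux layout: <user base>/bin.
    """
    return os.path.join(site.getuserbase(), "bin")


def on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return os.path.normpath(directory) in entries


scripts = user_scripts_dir()
print(scripts, "is on PATH" if on_path(scripts) else "is NOT on PATH")
```

If the script reports that the directory is not on PATH, that directory is the one to add in the export line.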
Confirm installation:
which yt-dlp
which whisper
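The same confirmation can be scripted with the standard library's shutil.which, which also makes it easy to include FFmpeg in the check. A small sketch; find_missing is a hypothetical helper name.

```python
import shutil

# The three command-line tools this guide relies on.
REQUIRED_TOOLS = ("yt-dlp", "ffmpeg", "whisper")


def find_missing(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = find_missing()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All tools found.")
```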
Step 4: Download the YouTube Video
Use yt-dlp to download the video in MP4 format:
yt-dlp -f 'bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]' "https://www.youtube.com/watch?v=VIDEO_ID"
Replace VIDEO_ID with the video's actual ID, or paste the video's full URL inside the quotes instead.
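By default yt-dlp names the file after the video's title and ID. If you want a predictable name such as video.mp4 for the next step, one option is yt-dlp's -o output template. The sketch below wraps the same download command in Python; build_download_command, download, and the "video" file stem are illustrative choices, and it assumes yt-dlp is already on your PATH.

```python
import subprocess

# Same format selector as the command above: MP4 video plus M4A audio,
# falling back to a single pre-merged MP4 file.
MP4_FORMAT = "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]"


def build_download_command(url, output_stem="video"):
    """Build the yt-dlp argument list; %(ext)s is filled in by yt-dlp."""
    return ["yt-dlp", "-f", MP4_FORMAT, "-o", f"{output_stem}.%(ext)s", url]


def download(url, output_stem="video"):
    """Run yt-dlp and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_download_command(url, output_stem), check=True)
```

Calling download("https://www.youtube.com/watch?v=VIDEO_ID") with a real video ID should then leave a file named video.mp4 in the current directory. Passing the arguments as a list avoids shell quoting problems with the format selector.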
Step 5: Transcribe the Video with Whisper
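Once the download finishes, run Whisper on the file. Unless you set an output name, yt-dlp names the file after the video's title and ID, so substitute the real filename for video.mp4 below: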
whisper video.mp4 --model medium --language English
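By default, Whisper writes every output format it supports into the current directory:
- video.txt: Plain transcript
- video.srt: SubRip subtitles with timestamps
- video.vtt: WebVTT subtitles for web video players
- video.tsv: Tab-separated segments with start and end times
- video.json: Full result, including per-segment details
To keep only one of these, pass --output_format (for example, --output_format srt).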
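The .srt file is plain text: a cue number, a line of the form "start --> end" with HH:MM:SS,mmm timestamps, then the cue text, with a blank line between cues. If you want those timings in a script, a small parser is enough. This is a sketch that assumes well-formed input in that layout; parse_srt and to_seconds are hypothetical helper names.

```python
import re

# Matches timestamps such as 00:01:05,250
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp):
    """Convert an HH:MM:SS,mmm timestamp to seconds as a float."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not a timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip anything that is not a complete cue
        start, end = lines[1].split("-->")
        cue_text = " ".join(line.strip() for line in lines[2:])
        cues.append((to_seconds(start), to_seconds(end), cue_text))
    return cues
```

To use it, read the subtitle file as UTF-8 text and pass the contents to parse_srt.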
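If the audio is in another language and you want an English translation rather than a transcript in the original language (Whisper's translate task only translates into English):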
whisper video.mp4 --task translate --model medium
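Whisper can also be called from Python instead of the command line. The sketch below uses whisper.load_model and model.transcribe from the openai-whisper package; whisper_options and run_whisper are hypothetical wrappers, and the import sits inside the function so the option helper can be used without loading the library.

```python
VALID_TASKS = ("transcribe", "translate")


def whisper_options(task="transcribe", language=None):
    """Build keyword arguments for model.transcribe.

    "translate" produces English text; leaving `language` as None lets
    Whisper detect the spoken language itself.
    """
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    options = {"task": task}
    if language is not None:
        options["language"] = language
    return options


def run_whisper(path, model_name="medium", task="transcribe", language=None):
    """Transcribe or translate `path` and return the text as one string."""
    import whisper  # installed by: pip3 install -U openai-whisper

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(path, **whisper_options(task, language))
    return result["text"].strip()
```

For example, run_whisper("video.mp4", task="translate") is intended to mirror the translate command above, returning the text rather than writing subtitle files.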
🚀 You're Done!
You now have a high-quality transcription generated locally and privately using powerful open-source tools.