How to Download a YouTube Video and Transcribe it with Whisper AI

Artificial Intelligence (AI) Posted on May 26, 2025
Combining yt-dlp with OpenAI's Whisper lets you transcribe a YouTube video entirely on your own machine. This guide walks through the whole process, from downloading the video to generating transcript and subtitle files locally.

✨ Why This Setup?


  • yt-dlp: A reliable command-line tool to download videos from YouTube and other platforms.
  • Whisper: An open-source speech-to-text system by OpenAI known for its accuracy and multilingual support.
  • Local setup: The audio never leaves your machine, and you can pick a Whisper model size that balances speed against accuracy.

📅 Step-by-Step Guide


Step 1: Prerequisites

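Make sure a recent Python 3 is installed. Python 3.7 has reached end of life and current yt-dlp releases no longer support it; 3.10 or 3.11 is a safe choice for both tools. You can check your version by running: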

python3 --version

If it's not installed, download it from python.org.

Step 2: Install yt-dlp, FFmpeg, and Whisper

Install yt-dlp

pip3 install -U yt-dlp

Install FFmpeg
FFmpeg is required for processing audio/video:

  • macOS (with Homebrew):

brew install ffmpeg

  • Linux (Debian/Ubuntu):

sudo apt install ffmpeg


Install Whisper

pip3 install -U openai-whisper

Step 3: Add Python Scripts to Your PATH

If you see a warning like:

The scripts pip, pip3 and pip3.9 are installed in '/Users/yourname/Library/Python/3.9/bin' which is not on PATH.

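Add the directory named in the warning to your shell PATH. The commands below use the macOS path from the example warning; on Linux, user-installed scripts usually land in ~/.local/bin, so substitute that path.

zsh users: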

echo 'export PATH="$HOME/Library/Python/3.9/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

Bash users:

echo 'export PATH="$HOME/Library/Python/3.9/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile
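
If you are unsure which directory pip used, Python can report it. The sketch below assumes a macOS or Linux user-level install, where scripts land in a bin folder under Python's per-user base directory. The helper names are illustrative and not part of pip.

```python
import os
import site


def user_scripts_dir() -> str:
    """Return the bin directory under Python's per-user base.

    Assumption: macOS/Linux layout, where user-installed scripts live in
    <user base>/bin (for example ~/.local/bin on most Linux systems).
    """
    return os.path.join(site.getuserbase(), "bin")


def is_on_path(directory: str) -> bool:
    """Check whether a directory appears in the PATH environment variable."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    wanted = os.path.normpath(directory)
    return any(os.path.normpath(entry) == wanted for entry in entries if entry)


if __name__ == "__main__":
    scripts = user_scripts_dir()
    print(f"User scripts directory: {scripts}")
    print("Already on PATH" if is_on_path(scripts) else "Not on PATH yet")
```

If the script reports that the directory is not on PATH, that directory is the one to put in the export line.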

Confirm installation:

which yt-dlp
which whisper
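
The same check can be done from Python, which also covers FFmpeg in one pass. This is a minimal sketch using the standard library's shutil.which; the function name and the tool list are assumptions made for illustration.

```python
import shutil

# Command-line tools this guide relies on.
REQUIRED_TOOLS = ("yt-dlp", "ffmpeg", "whisper")


def find_tools(names=REQUIRED_TOOLS) -> dict:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Any tool reported as NOT FOUND either is not installed or lives in a directory that is missing from PATH.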

Step 4: Download the YouTube Video

Use yt-dlp to download the video in MP4 format:

yt-dlp -f 'bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]' "https://www.youtube.com/watch?v=VIDEO_ID"

Replace VIDEO_ID with the video's ID, or replace the whole quoted URL with the video's full URL.
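
If you want to script the download, the command above can be built and run from Python. The sketch below reuses the same format selector and adds yt-dlp's -o output template so the file gets a predictable name instead of the default title-and-ID filename. The function names and the "video" basename are assumptions for illustration.

```python
import subprocess

# Format selector copied from the command above.
MP4_FORMAT = "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]"


def build_download_command(url: str, basename: str = "video") -> list:
    """Build the yt-dlp argument list for downloading one video as MP4.

    The -o template makes yt-dlp write "<basename>.<ext>" rather than
    its default filename based on the video title and ID.
    """
    return ["yt-dlp", "-f", MP4_FORMAT, "-o", f"{basename}.%(ext)s", url]


def download(url: str, basename: str = "video") -> None:
    """Run yt-dlp; raises CalledProcessError if the download fails."""
    subprocess.run(build_download_command(url, basename), check=True)
```

Calling download("https://www.youtube.com/watch?v=VIDEO_ID") should leave video.mp4 in the current directory. Passing the arguments as a list avoids shell quoting problems with the brackets in the format selector.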

Step 5: Transcribe the Video with Whisper

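Once the download finishes, run Whisper on the file. By default yt-dlp names the file after the video's title and ID, so the examples below assume you have renamed it to video.mp4 (or substitute the actual filename):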

whisper video.mp4 --model medium --language English

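By default Whisper writes every output format to the current directory:
  • video.txt: Plain transcript
  • video.srt: Subtitle format
  • video.vtt: Web video text format (WebVTT)
  • video.tsv: Tab-separated segments with start and end times
  • video.json: Full result, including per-segment details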
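
To make the subtitle format concrete, here is a small sketch of how timed segments map to SRT text. It illustrates the SRT layout and is not Whisper's own writer. The segment dictionaries use start, end (in seconds), and text keys, mirroring the shape of the segments Whisper's Python API returns; the function names and sample segments are made up for the example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)


# Hypothetical segments, just to show the output shape.
example = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 6.0, "text": " Let's get started."},
]
print(segments_to_srt(example))
```

Each cue is a sequence number, a time range, and the spoken text, which is the structure video players expect when loading an .srt file.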

If the audio is in another language and you want an English translation instead of a same-language transcript:

whisper video.mp4 --task translate --model medium
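
Both tasks are also available from Python, which is handy when transcription is one step in a larger script. This is a minimal sketch, assuming the openai-whisper package from Step 2 is installed. whisper.load_model and model.transcribe are the package's documented entry points; transcribe_file is a hypothetical wrapper written for this example.

```python
VALID_TASKS = ("transcribe", "translate")


def transcribe_file(path, model_name="medium", task="transcribe", language=None):
    """Return the text Whisper produces for an audio or video file.

    task="translate" asks Whisper for English output; language=None lets
    Whisper detect the spoken language itself.
    """
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")

    # Imported lazily because loading whisper also pulls in PyTorch.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path, task=task, language=language)
    return result["text"]
```

For example, transcribe_file("video.mp4", language="en") mirrors the transcription command, and transcribe_file("video.mp4", task="translate") mirrors the translation command. Unlike the command-line tool, this wrapper only returns the text and does not write any files.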

🚀 You're Done!

You now have a high-quality transcription generated locally and privately using powerful open-source tools.
