npx remotion transcribe
Transcribe an audio or video file to captions using Whisper.cpp. The command handles installation, model downloading, audio conversion, and transcription automatically.
npx remotion transcribe <input-file> [output-file]
The output file defaults to <input-basename>.json if not specified.
The output is a JSON file containing a Caption[] array from @remotion/captions.
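As a rough illustration of consuming the output file, the sketch below parses a JSON string into captions and joins them into a plain transcript. The `Caption` interface is declared locally so the example is self-contained; its fields are an assumption about the shape of the `Caption` type from @remotion/captions. The helpers `parseCaptions` and `toTranscript` and the sample data are hypothetical.

```typescript
// Assumed shape of one caption, mirroring the Caption type from @remotion/captions.
interface Caption {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
}

// Parse the contents of the output file and check the fields this sketch relies on.
function parseCaptions(json: string): Caption[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error('Expected a JSON array of captions');
  }
  return data.map((item: unknown, index: number): Caption => {
    if (typeof item !== 'object' || item === null) {
      throw new Error(`Caption at index ${index} is not an object`);
    }
    const c = item as Partial<Caption>;
    if (
      typeof c.text !== 'string' ||
      typeof c.startMs !== 'number' ||
      typeof c.endMs !== 'number'
    ) {
      throw new Error(`Caption at index ${index} lacks text, startMs or endMs`);
    }
    return {
      text: c.text,
      startMs: c.startMs,
      endMs: c.endMs,
      timestampMs: c.timestampMs ?? null,
      confidence: c.confidence ?? null,
    };
  });
}

// Join caption texts into one transcript.
// Assumption: each caption carries its own leading whitespace, so no separator is added.
function toTranscript(captions: Caption[]): string {
  return captions.map((c) => c.text).join('').trim();
}

// Hypothetical sample standing in for the contents of video.json.
const sampleJson = JSON.stringify([
  {text: 'Hello', startMs: 0, endMs: 400, timestampMs: 200, confidence: 0.9},
  {text: ' world', startMs: 400, endMs: 900, timestampMs: null, confidence: null},
]);
console.log(toTranscript(parseCaptions(sampleJson)));
```

In a real project, the JSON string would come from reading the output file, and the type would be imported from @remotion/captions rather than redeclared.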
Phases
The command runs through four phases in order, each showing its own progress:
- Install whisper.cpp - Clones and builds whisper.cpp if not already present
- Download model - Downloads the specified Whisper model if not already present
- Convert audio - Converts the input file to a 16kHz WAV file using FFmpeg
- Transcribe - Runs Whisper.cpp on the audio and outputs captions
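To make the audio conversion phase more concrete, the sketch below builds an FFmpeg argument list that resamples an input file to a 16 kHz WAV file. The arguments the command passes to FFmpeg internally are not documented here, so these are an assumption chosen to illustrate the step, and `buildWavConversionArgs` is a hypothetical helper.

```typescript
// Build FFmpeg arguments that convert any audio or video input to a 16 kHz WAV file.
// Assumption: illustrative only; the command's internal FFmpeg call may differ.
function buildWavConversionArgs(inputFile: string, wavFile: string): string[] {
  return [
    '-y', // overwrite the output file if it already exists
    '-i',
    inputFile, // source audio or video
    '-ar',
    '16000', // resample the audio to 16,000 Hz
    wavFile, // output path; the .wav extension selects the WAV format
  ];
}

const wavArgs = buildWavConversionArgs('video.mp4', 'video.wav');
console.log(['ffmpeg', ...wavArgs].join(' '));
// prints "ffmpeg -y -i video.mp4 -ar 16000 video.wav"
```

Returning an argument array rather than a single string keeps file names with spaces intact when the list is handed to a process-spawning API.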
Example
npx remotion transcribe video.mp4
This will create video.json with the transcription.
npx remotion transcribe video.mp4 captions.json --model=large-v3
This will transcribe using the large-v3 model and save to captions.json.
Flags
--model
The Whisper model to use. Default: medium.
Available models: tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v1, large-v2, large-v3, large-v3-turbo.
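When the model name comes from user input or configuration, it can help to validate it before invoking the command. The following is a minimal sketch that copies the list above into a TypeScript union type; `WHISPER_MODELS`, `isWhisperModel` and `isEnglishOnly` are hypothetical names, and the English-only check relies on the Whisper naming convention that `.en` models are English-only variants.

```typescript
// Model names as listed in this document.
const WHISPER_MODELS = [
  'tiny',
  'tiny.en',
  'base',
  'base.en',
  'small',
  'small.en',
  'medium',
  'medium.en',
  'large-v1',
  'large-v2',
  'large-v3',
  'large-v3-turbo',
] as const;

type WhisperModel = (typeof WHISPER_MODELS)[number];

// Type guard: true if the string is one of the listed model names.
function isWhisperModel(name: string): name is WhisperModel {
  return (WHISPER_MODELS as readonly string[]).indexOf(name) !== -1;
}

// Models whose name ends in ".en" are English-only variants.
function isEnglishOnly(model: WhisperModel): boolean {
  return /\.en$/.test(model);
}

const requestedModel: string = 'small.en';
if (isWhisperModel(requestedModel)) {
  console.log(requestedModel, 'english-only:', isEnglishOnly(requestedModel));
} else {
  console.log('Unknown model:', requestedModel);
}
```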
--whisper-version
The version of whisper.cpp to install. Default: 1.5.5.
--whisper-path
The path where whisper.cpp should be installed. Default: ./whisper.cpp.
--model-folder
Custom folder for storing Whisper models. Defaults to the models folder inside the whisper.cpp directory.
--language
Specify the language of the audio. If not set, Whisper will auto-detect the language.
--translate-to-english
Translate the transcription to English. Default: false.
--token-level-timestamps
Enable token-level timestamps using DTW. Default: true.
--flash-attention
Enable flash attention for faster transcription. Default: false.
--log
Set the log level. Use --log=verbose to see all compilation and transcription output line-by-line instead of progress bars.
--quiet / -q
Suppress all output.
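For scripts that invoke the command programmatically, the flags can be assembled from an options object. This is a sketch under stated assumptions: `buildTranscribeCommand` and `TranscribeOptions` are hypothetical, only flags documented above are used, and the `--flag=value` form follows the `--model=large-v3` and `--log=verbose` examples. The accepted values for `--language` are not specified in this document, so the sketch passes the value through unchanged.

```typescript
// Hypothetical options object covering a subset of the documented flags.
interface TranscribeOptions {
  inputFile: string;
  outputFile?: string;
  model?: string;
  language?: string;
  logLevel?: string;
}

// Assemble the argument list for "npx remotion transcribe".
function buildTranscribeCommand(options: TranscribeOptions): string[] {
  const command = ['npx', 'remotion', 'transcribe', options.inputFile];
  if (options.outputFile !== undefined) {
    command.push(options.outputFile); // optional positional output path
  }
  if (options.model !== undefined) {
    command.push(`--model=${options.model}`);
  }
  if (options.language !== undefined) {
    command.push(`--language=${options.language}`); // value format is an assumption
  }
  if (options.logLevel !== undefined) {
    command.push(`--log=${options.logLevel}`);
  }
  return command;
}

const transcribeCommand = buildTranscribeCommand({
  inputFile: 'video.mp4',
  outputFile: 'captions.json',
  model: 'large-v3',
  logLevel: 'verbose',
});
console.log(transcribeCommand.join(' '));
// prints "npx remotion transcribe video.mp4 captions.json --model=large-v3 --log=verbose"
```

Omitted options are left out of the command entirely, so the defaults described above still apply.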