npx remotion transcribe

Transcribe an audio or video file to captions using Whisper.cpp. The command handles installation, model downloading, audio conversion, and transcription automatically.

npx remotion transcribe <input-file> [output-file]

The output file defaults to <input-basename>.json if not specified.
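As a rough illustration of that default, the naming rule can be sketched as a small function. `defaultOutputName` is a hypothetical helper, not part of the CLI, and it assumes that "basename" means the file name without its directory and without its last extension. Which directory the file is written to is not covered here.

```typescript
// Hypothetical helper: derives "<input-basename>.json" from an input path.
// Assumption: the basename drops the directory and the final extension.
function defaultOutputName(inputFile: string): string {
  // Keep only the last path segment (handles both / and \ separators).
  const fileName = inputFile.split(/[\\/]/).pop() ?? inputFile;
  const dot = fileName.lastIndexOf('.');
  // A leading dot (as in ".hidden") is not treated as an extension.
  const base = dot > 0 ? fileName.slice(0, dot) : fileName;
  return `${base}.json`;
}
```

Under these assumptions, `defaultOutputName('video.mp4')` yields `video.json`, which matches the example further down.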

The output is a JSON file containing an array of Caption objects, the type exported by @remotion/captions.
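A minimal sketch of consuming that file might look like the following. The local `Caption` type mirrors the shape documented for @remotion/captions and is declared here only to keep the example self-contained; in a real project you would likely import the type instead. `parseCaptions` is a hypothetical helper, not part of any Remotion package.

```typescript
// Local mirror of the Caption shape from @remotion/captions (assumption:
// these five fields match the package's current type definition).
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical helper: parses the JSON text and checks the fields it relies on.
function parseCaptions(jsonText: string): Caption[] {
  const data: unknown = JSON.parse(jsonText);
  if (!Array.isArray(data)) {
    throw new Error('Expected a JSON array of captions');
  }
  return data.map((item, index) => {
    const c = item as Partial<Caption> | null;
    if (
      c === null ||
      typeof c !== 'object' ||
      typeof c.text !== 'string' ||
      typeof c.startMs !== 'number' ||
      typeof c.endMs !== 'number'
    ) {
      throw new Error(`Entry ${index} is not a valid caption`);
    }
    return {
      text: c.text,
      startMs: c.startMs,
      endMs: c.endMs,
      // Missing optional values are normalized to null.
      timestampMs: c.timestampMs ?? null,
      confidence: c.confidence ?? null,
    };
  });
}
```

The text passed in would typically come from reading the generated file, for example with `fs.readFileSync('video.json', 'utf8')` in Node.js.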

Phases

The command runs through 4 phases, each showing progress:

1. Install whisper.cpp - Clones and builds whisper.cpp if not already present
2. Download model - Downloads the specified Whisper model if not already present
3. Convert audio - Converts the input file to a 16kHz WAV file using FFmpeg
4. Transcribe - Runs whisper.cpp on the audio and outputs captions
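To make the conversion phase more concrete, here is a sketch of FFmpeg arguments that produce a 16kHz WAV file. The exact arguments the command passes are not stated above, so treat this as one plausible invocation: the mono downmix and 16-bit PCM codec are assumptions based on the input format whisper.cpp's own documentation recommends, and `buildFfmpegArgs` is a hypothetical helper.

```typescript
// Hypothetical helper: builds FFmpeg arguments for a 16 kHz WAV conversion.
// Only the 16 kHz sample rate comes from the phase description; the rest
// is an assumption about what whisper.cpp expects as input.
function buildFfmpegArgs(inputFile: string, wavFile: string): string[] {
  return [
    '-y', // overwrite the output file if it already exists
    '-i', inputFile, // source audio or video file
    '-ar', '16000', // resample to 16 kHz
    '-ac', '1', // downmix to mono (assumption)
    '-c:a', 'pcm_s16le', // 16-bit little-endian PCM (assumption)
    wavFile,
  ];
}
```

Running `ffmpeg` with these arguments by hand can be a useful check when the conversion phase fails on a particular input.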

Example

npx remotion transcribe video.mp4

This creates video.json containing the transcription.

npx remotion transcribe video.mp4 captions.json --model=large-v3

This transcribes with the large-v3 model and saves the captions to captions.json.

Flags

`--model`

The Whisper model to use. Default: medium.

Available models: tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v1, large-v2, large-v3, large-v3-turbo.
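When a script wraps this command, it can help to validate the model name before starting a long run. The sketch below copies the list above into a constant and derives a type from it. `WHISPER_MODELS`, `WhisperModel`, and `isWhisperModel` are hypothetical names for illustration and are not exports of any Remotion package.

```typescript
// Model names copied from the list above.
const WHISPER_MODELS = [
  'tiny', 'tiny.en',
  'base', 'base.en',
  'small', 'small.en',
  'medium', 'medium.en',
  'large-v1', 'large-v2', 'large-v3', 'large-v3-turbo',
] as const;

// Union of all valid model names, derived from the constant.
type WhisperModel = (typeof WHISPER_MODELS)[number];

// Type guard: narrows an arbitrary string to a known model name.
function isWhisperModel(value: string): value is WhisperModel {
  return (WHISPER_MODELS as readonly string[]).indexOf(value) !== -1;
}
```

Note that the models with an `.en` suffix are the English-only Whisper variants, so they are a poor fit for audio in other languages.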

`--whisper-version`

The version of whisper.cpp to install. Default: 1.5.5.

`--whisper-path`

The path where whisper.cpp should be installed. Default: ./whisper.cpp.

`--model-folder`

Custom folder for storing Whisper models. Defaults to the models folder inside the whisper.cpp directory.

`--language`

Specify the language of the audio. If not set, Whisper will auto-detect the language.

`--translate-to-english`

Translate the transcription to English. Default: false.

`--token-level-timestamps`

Enable token-level timestamps using DTW. Default: true.

`--flash-attention`

Enable flash attention for faster transcription. Default: false.

`--log`

Set the log level. Use --log=verbose to see all compilation and transcription output line-by-line instead of progress bars.

`--quiet` / `-q`

Suppress all output.
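For scripted use, the value-taking flags can be assembled from an options object. This is a sketch under assumptions: `TranscribeOptions` and `buildTranscribeArgs` are hypothetical, and the `--flag=value` form is taken from the `--model=large-v3` and `--log=verbose` examples and assumed to work for the other value-taking flags as well. Boolean flags are left out because their on/off syntax is not shown here.

```typescript
// Hypothetical options object; keys mirror the value-taking flags above.
type TranscribeOptions = {
  model?: string;
  whisperVersion?: string;
  whisperPath?: string;
  modelFolder?: string;
  language?: string;
  log?: string;
};

// Hypothetical helper: builds the argument list to pass to `npx`.
function buildTranscribeArgs(
  inputFile: string,
  outputFile: string | null,
  options: TranscribeOptions = {},
): string[] {
  const args: string[] = ['remotion', 'transcribe', inputFile];
  if (outputFile !== null) {
    args.push(outputFile);
  }
  const flags: Array<[string, string | undefined]> = [
    ['--model', options.model],
    ['--whisper-version', options.whisperVersion],
    ['--whisper-path', options.whisperPath],
    ['--model-folder', options.modelFolder],
    ['--language', options.language],
    ['--log', options.log],
  ];
  flags.forEach(([name, value]) => {
    // Unset options are skipped so the CLI defaults apply.
    if (value !== undefined) {
      args.push(`${name}=${value}`);
    }
  });
  return args;
}
```

The resulting array could then be handed to Node's `child_process.spawn('npx', args)`; only options that are explicitly set end up on the command line.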
