Question 1

What is OpenAI Whisper and can I self-host it?

Accepted Answer

OpenAI Whisper is an open-source automatic speech recognition (ASR) model trained on 680,000 hours of multilingual audio. Yes, you can fully self-host it using faster-whisper (a CTranslate2-optimized port) or whisper.cpp (a C/C++ implementation designed to run well on CPU). No cloud API key is needed, and no internet connection is required once the model weights have been downloaded.

Question 2

What is the difference between faster-whisper and whisper.cpp?

Accepted Answer

faster-whisper is a Python reimplementation built on CTranslate2 that runs up to 4x faster than the original OpenAI implementation while using less memory. whisper.cpp is a plain C/C++ port optimized for CPU inference on x86, ARM, and Apple Silicon, which makes it a good fit for servers without a GPU. Both run the same model weights, so transcription quality is comparable; small differences can appear when quantized models or different decoding settings are used.
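As a rough illustration of the faster-whisper side, the sketch below loads a model and collects timestamped segments. The helper names `pick_runtime` and `transcribe_file` are hypothetical; the `WhisperModel` constructor and `transcribe` call follow the faster-whisper README, and the package has to be installed separately (`pip install faster-whisper`).

```python
def pick_runtime(has_gpu: bool) -> dict:
    """Hypothetical helper: choose device settings for faster-whisper."""
    if has_gpu:
        # float16 on CUDA is the usual GPU configuration
        return {"device": "cuda", "compute_type": "float16"}
    # int8 quantization keeps CPU inference fast and memory use low
    return {"device": "cpu", "compute_type": "int8"}


def transcribe_file(path: str, model_size: str = "small", has_gpu: bool = False) -> list:
    """Return (start_seconds, end_seconds, text) tuples for one audio file."""
    # Imported lazily so pick_runtime works even without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, **pick_runtime(has_gpu))
    segments, info = model.transcribe(path, beam_size=5)
    # `segments` is a generator; iterating it runs the actual decoding.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Calling `transcribe_file("meeting.mp3")` (a placeholder file name) downloads the `small` model on first use and then runs entirely on the local machine.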

Question 3

Which languages does Whisper support?

Accepted Answer

Whisper supports 99 languages including Arabic, English, French, Spanish, German, Chinese, Japanese, Russian, and many more. It can also perform language detection automatically, and translate from any supported language directly into English in a single pass.
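Language detection and translation can be sketched with faster-whisper as follows. The helper names are hypothetical; `language`, `task`, `info.language`, and `info.language_probability` are part of the faster-whisper API, and the CPU settings are only one possible choice.

```python
from typing import Optional


def transcribe_options(language: Optional[str] = None, to_english: bool = False) -> dict:
    """Hypothetical helper: keyword arguments for WhisperModel.transcribe."""
    return {
        # None asks Whisper to detect the language from the start of the audio
        "language": language,
        # "translate" outputs English text; "transcribe" keeps the source language
        "task": "translate" if to_english else "transcribe",
    }


def detect_and_translate(path: str, model_size: str = "small"):
    """Return (detected_language, probability, english_text)."""
    from faster_whisper import WhisperModel  # lazy import; installed separately

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(path, **transcribe_options(to_english=True))
    text = " ".join(seg.text.strip() for seg in segments)
    return info.language, info.language_probability, text
```

Passing an explicit code such as `language="fr"` skips detection, which can help on short clips where the detected language is unreliable.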

Question 4

Do I need a GPU to run self-hosted Whisper?

Accepted Answer

No. whisper.cpp runs efficiently on CPU-only servers and, with the smaller models, is fast enough for real-time or near-real-time transcription on modern multi-core machines. faster-whisper also supports CPU mode. A GPU speeds up transcription dramatically: as a rough guide, the large-v3 model transcribes a one-hour audio file in about 2 minutes on an RTX 3080, versus 10–20 minutes on CPU. Actual times depend on the hardware, model size, and quantization.
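To make those timings easier to compare, they can be turned into a speed factor relative to real time (audio duration divided by processing time). The function name is hypothetical and the inputs are just the figures quoted above, not new measurements.

```python
def speed_factor(audio_minutes: float, processing_minutes: float) -> float:
    """How many minutes of audio are processed per minute of wall-clock time."""
    if processing_minutes <= 0:
        raise ValueError("processing_minutes must be positive")
    return audio_minutes / processing_minutes


gpu = speed_factor(60, 2)        # 30.0: thirty times faster than real time
cpu_best = speed_factor(60, 10)  # 6.0
cpu_worst = speed_factor(60, 20) # 3.0
```

Anything above 1.0 keeps up with live audio, so both the GPU and CPU figures leave headroom for batch jobs; the GPU mainly matters when the backlog is large.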

Question 5

What Whisper model sizes are available and which should I choose?

Accepted Answer

Whisper comes in five sizes: tiny (39M params, fastest), base (74M), small (244M), medium (769M), and large-v3 (1.5B, most accurate). For most self-hosted use cases, small or medium offers the best accuracy-to-speed trade-off. Use large-v3 only if accuracy is critical and you have a GPU with at least 8 GB VRAM.
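The guidance above can be written down as a small helper, together with a back-of-the-envelope estimate of how much memory the weights alone occupy. Both function names are hypothetical, the choice between small and medium based on GPU presence is an assumption made for illustration, and the estimate covers weights only: actual VRAM use is higher because of activations and decoding buffers.

```python
# Parameter counts in millions, as listed above.
MODEL_PARAMS_MILLIONS = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large-v3": 1500,
}


def weights_size_gb(model: str, bytes_per_param: int = 2) -> float:
    """Approximate size of the weights; 2 bytes per parameter means float16."""
    return MODEL_PARAMS_MILLIONS[model] * 1e6 * bytes_per_param / 1e9


def recommend_model(vram_gb: float, accuracy_critical: bool) -> str:
    """Illustrative rule of thumb based on the advice above."""
    if accuracy_critical and vram_gb >= 8:
        return "large-v3"
    # Assumption: prefer medium when any GPU is present, small on CPU-only hosts.
    return "medium" if vram_gb > 0 else "small"
```

For example, `weights_size_gb("large-v3")` gives about 3 GB in float16, which is consistent with the 8 GB VRAM recommendation once runtime overhead is added.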

Question 6

How do I integrate self-hosted Whisper with n8n or other automation tools?

Accepted Answer

The faster-whisper Docker image exposes an OpenAI-compatible REST API on port 9000. From n8n, call it with the HTTP Request node: send a multipart/form-data POST to http://your-server:9000/v1/audio/transcriptions, the same endpoint format the OpenAI API uses. Any tool that supports OpenAI's speech API and lets you change the base URL should therefore work with little or no extra configuration.
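Outside n8n, the same request can be sketched with nothing but the Python standard library. The URL is the placeholder from the answer above; the form fields `file`, `model`, and `response_format` follow the OpenAI transcription API, and the model name `whisper-1` is an assumption, since a self-hosted server may expect a different value or ignore it. The function names are hypothetical.

```python
import os
import urllib.request
import uuid

API_URL = "http://your-server:9000/v1/audio/transcriptions"  # placeholder host


def build_multipart(fields: dict, filename: str, file_bytes: bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    )
    parts.append(header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_via_api(path: str, model: str = "whisper-1",
                       response_format: str = "json") -> str:
    """POST an audio file to the endpoint and return the raw response text."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(
            {"model": model, "response_format": response_format},
            os.path.basename(path),
            fh.read(),
        )
    request = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

In n8n the HTTP Request node builds the multipart body for you; the sketch only shows what ends up on the wire.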

Question 7

Can I use self-hosted Whisper to generate subtitles (SRT/VTT)?

Accepted Answer

Yes. Whisper natively outputs timestamped transcriptions. When you call the API with response_format=srt or response_format=vtt, it returns subtitle text ready for video players and editors. Tools like SubEdit, Aegisub, and FFmpeg subtitle workflows can use the resulting files directly.
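For reference, this is roughly what an SRT file looks like and how timestamped segments map onto it. The sketch is a minimal formatter, assuming segments are available as `(start_seconds, end_seconds, text)` tuples; the function names are hypothetical and this is not the server's own implementation.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render (start, end, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

WebVTT is nearly identical: it starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.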

Question 8

How accurate is self-hosted Whisper compared to cloud speech APIs?

Accepted Answer

Whisper large-v3 is competitive with commercial cloud APIs (Google Speech-to-Text, AWS Transcribe, Azure Speech) and matches or exceeds them on many public benchmarks, with a word error rate (WER) typically below 5% on clean audio in major languages. Accuracy drops in noisy environments or with heavy accents, but for standard recordings such as meetings, podcasts, and interviews it is production-ready.
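If you want to check the accuracy on your own recordings, WER is the number of word substitutions, deletions, and insertions divided by the number of words in the reference transcript. The sketch below computes it with a standard word-level edit distance; the function name is hypothetical, and the normalization (lowercasing and whitespace splitting only) is a simplifying assumption, since published benchmarks usually also strip punctuation and normalize numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)
```

A WER of 5% corresponds to one wrong word in every twenty, so a few minutes of hand-corrected transcript is enough for a first estimate.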

🎙️ Setup Whisper — Self-Hosted Speech-to-Text

📦 Resources & Setup Scripts

Quick Install:

Tutorial Steps

1 Download the Script

2 Make it Executable

3 Run the Installer

4 Use the API

Ports Used

Overview

Why Use It

When You Need It

Who Should Use It

Real Use Cases

Main Features

How to Use After Installation

Security Best Practices

Ports and Firewall Notes

Backup and Maintenance

Common Mistakes

Troubleshooting

Alternatives

When Not to Use It

PrismaTechWork Professional Help

Frequently Asked Questions

Which Whisper model size should I use?

Does Whisper require a GPU?

What audio formats does Whisper support?

Can Whisper detect the language automatically?

Does Whisper produce timestamps?

Can I use Whisper to generate subtitles for videos?

Is the self-hosted Whisper API compatible with OpenAI's API?

How do I transcribe a file using the API?