Local LLM for audio transcription on a Mac Mini M2 16GB host, including Whisper vs. more general-purpose LLM options

Answer

For local audio transcription on a Mac Mini M2 with 16GB, Whisper-based tools (whisper.cpp, WhisperKit) are the best fit: they are purpose-built for ASR, accelerated through Core ML and the Neural Engine, and memory-efficient. General-purpose LLMs (Llama 3B via MLX, Ollama models) can clean up a transcript after the fact but are not competitive as the primary transcription engine. The recommended stack is whisper.cpp with the large-v3-turbo model (~954MB, smaller once quantized to Q5_0) for transcription, optionally paired with a small LLM (1–3B) via MLX or Ollama for cleanup.
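The recommended stack can be sketched as a short script. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes `ffmpeg` and whisper.cpp's `whisper-cli` binary are on the PATH, that the model file lives at `models/ggml-large-v3-turbo-q5_0.bin`, and that an Ollama server is listening on its default port 11434 with a small model pulled under the tag `llama3.2:3b`. The paths, model tag, prompt wording, and helper names are all illustrative.

```python
import json
import subprocess
import urllib.request
from pathlib import Path

# Assumed locations and names; adjust to your own setup.
MODEL_PATH = "models/ggml-large-v3-turbo-q5_0.bin"
OLLAMA_URL = "http://localhost:11434/api/generate"
CLEANUP_MODEL = "llama3.2:3b"  # assumed tag; any small local model should do


def ffmpeg_command(src: str, wav: str) -> list[str]:
    """Build the ffmpeg call that converts the input to 16 kHz mono 16-bit WAV."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1",
            "-c:a", "pcm_s16le", wav]


def whisper_command(wav: str, out_base: str, language: str = "auto") -> list[str]:
    """Build the whisper-cli call; -otxt with -of writes <out_base>.txt."""
    return ["whisper-cli", "-m", MODEL_PATH, "-f", wav,
            "-l", language, "-otxt", "-of", out_base]


def cleanup_payload(transcript: str) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate."""
    prompt = ("Clean up this transcript. Remove filler words and format "
              "numbers, emails and currency. Do not add or change content.\n\n"
              + transcript)
    return {"model": CLEANUP_MODEL, "prompt": prompt, "stream": False}


def transcribe(src: str, workdir: str = ".") -> str:
    """Convert the audio, run whisper-cli, and return the raw transcript."""
    base = str(Path(workdir) / Path(src).stem)
    wav = base + ".16k.wav"  # separate name so the input is never overwritten
    subprocess.run(ffmpeg_command(src, wav), check=True)
    subprocess.run(whisper_command(wav, base), check=True)
    return Path(base + ".txt").read_text(encoding="utf-8").strip()


def cleanup(transcript: str) -> str:
    """Send the transcript to the local LLM and return the cleaned text."""
    body = json.dumps(cleanup_payload(transcript)).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

Under these assumptions, `cleanup(transcribe("meeting.m4a"))` would run the whole chain, while calling `transcribe` alone returns the raw Whisper output. Keeping the cleanup step optional matches the split of roles described above: Whisper does the ASR, the LLM only polishes the text.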

Key Findings

whisper.cpp (github.com/ggml-org/whisper.cpp): Free and open-source, with Core ML / Neural Engine acceleration on Apple Silicon. Recommended model: large-v3-turbo at ~954MB (fast and accurate, 100+ languages) or large-v3 at ~3GB (maximum accuracy). Q5_0 quantization shrinks a model by roughly 65% relative to fp16 with minimal accuracy loss. Transcription runs at near-real-time speed on M2 hardware.
Ready-made Mac apps (Whisper- or Parakeet-based):
(1) Local Whisper (github.com/y-dai20/local-whisper): free, open-source; captures mic and system audio for meetings.
(2) Speak2 (github.com/zachswift615/speak2): free, open-source push-to-talk dictation using WhisperKit or Parakeet v3 (~600MB, 25 languages); supports Ollama for LLM cleanup.
(3) Transcribe Master (App Store, free, by Dawei Bi): polished GUI app, Whisper-powered; supports Mandarin, Cantonese, Japanese, and English.
(4) getonit.ai Dictate: free app using Parakeet 0.6B + Llama 3B via MLX; <500ms latency without LLM cleanup, ~800ms with it.
The general-purpose LLM role is supplementary, not primary: small LLMs (Llama 1B or 3B via MLX or Ollama) handle post-transcription cleanup, such as removing filler words and formatting numbers, emails, and currency, not core ASR. On a 16GB M2, a 3B model via MLX runs comfortably alongside a Whisper model, whereas larger models (7B–13B) would compete for unified memory and slow the pipeline. Parakeet v3 (NVIDIA/FluidAudio), at ~600MB, is a competitive Whisper alternative for multilingual use.
Memory fit on 16GB M2: large-v3-turbo Q5_0 (~500MB active) plus Llama 3B via MLX (~2GB) leaves ample headroom. Full large-v3 at fp16 (budget ~6GB in memory for the ~3GB model file) is feasible but leaves less room. Avoid running a 7B+ LLM alongside a large Whisper model on 16GB.
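The size and memory figures in these findings can be sanity-checked with a little arithmetic. The sketch below assumes ggml's Q5_0 layout (blocks of 32 weights, each block holding an fp16 scale plus 5 bits per weight, 22 bytes in total) and, purely for illustration, about 4GB reserved for macOS and other apps. The model footprints are the approximate ones quoted in this note, and the ~4.5GB figure for a 7B model is a hypothetical placeholder.

```python
# Q5_0 block (assumed ggml layout): 32 weights stored as a 2-byte fp16 scale,
# 4 bytes of high bits, and 16 bytes of low nibbles = 22 bytes per block.
Q5_0_BLOCK_BYTES = 2 + 4 + 16
WEIGHTS_PER_BLOCK = 32
FP16_BITS = 16

SYSTEM_RESERVE_GB = 4.0  # illustrative assumption, not a measured value


def q5_0_bits_per_weight() -> float:
    """Average storage cost of one weight under Q5_0."""
    return Q5_0_BLOCK_BYTES * 8 / WEIGHTS_PER_BLOCK


def size_reduction_vs_fp16() -> float:
    """Fraction of the fp16 size saved by quantizing every weight to Q5_0."""
    return 1 - q5_0_bits_per_weight() / FP16_BITS


def headroom_gb(total_gb: float, reserved_gb: float, models_gb: dict) -> float:
    """Unified memory left after the system reserve and all loaded models."""
    return total_gb - reserved_gb - sum(models_gb.values())


# Approximate footprints taken from the findings in this note.
recommended = {"whisper large-v3-turbo Q5_0": 0.5, "Llama 3B (MLX)": 2.0}
# The 7B figure is a hypothetical placeholder for a 4-bit 7B model.
heavy = {"whisper large-v3 fp16": 6.0, "7B LLM (assumed)": 4.5}

print(f"Q5_0: {q5_0_bits_per_weight()} bits/weight, "
      f"{size_reduction_vs_fp16():.1%} smaller than fp16")
for name, models in (("recommended", recommended), ("heavy", heavy)):
    left = headroom_gb(16, SYSTEM_RESERVE_GB, models)
    print(f"{name}: {left} GB headroom on a 16GB machine")
```

The 5.5 bits per weight works out to a reduction of about 65.6%, in line with the ~65% quoted for Q5_0; real model files may deviate slightly if some tensors are kept at higher precision. Under the assumed 4GB reserve, the recommended pairing leaves 9.5GB free, while the heavy combination leaves only 1.5GB, which illustrates the advice against pairing a 7B+ LLM with a large Whisper model.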

Open Questions

How does Parakeet v3 (FluidAudio) accuracy compare to Whisper large-v3-turbo on real-world audio with accents or background noise on M2 hardware specifically?
For batch/offline file transcription workflows (vs. real-time dictation), are there productivity gains from using a pipeline tool like whisper.cpp CLI + a local summarization LLM via Ollama versus an integrated app like Transcribe Master?
Does running Ollama with a 7B model (e.g., Mistral 7B Q4) for higher-quality post-processing alongside Whisper large-v3-turbo cause memory pressure or swapping issues on 16GB M2 in practice?
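The memory-pressure question is one that can be checked empirically. One possible approach, sketched here under assumptions, is to snapshot the counters printed by the macOS `vm_stat` command before and after a transcription run and compare the swap-out count. The parser assumes the usual `Name:   12345.` line format and a `Swapouts` counter; the helper names are illustrative.

```python
import re
import subprocess


def parse_vm_stat(text: str) -> dict:
    """Parse `vm_stat` output into {counter name: int}, plus the page size."""
    stats = {}
    header = re.search(r"page size of (\d+) bytes", text)
    if header:
        stats["page size"] = int(header.group(1))
    for line in text.splitlines():
        # Counter lines look like: 'Pages free:        12345.'
        m = re.match(r"^(.+?):\s+(\d+)\.?\s*$", line)
        if m:
            stats[m.group(1).strip().strip('"')] = int(m.group(2))
    return stats


def snapshot() -> dict:
    """Run vm_stat (macOS only) and return the parsed counters."""
    out = subprocess.run(["vm_stat"], capture_output=True, text=True, check=True)
    return parse_vm_stat(out.stdout)


def swapped_out_mb(before: dict, after: dict) -> float:
    """MiB written to swap between two snapshots."""
    pages = after["Swapouts"] - before["Swapouts"]
    return pages * after["page size"] / 2**20
```

A test run would take `before = snapshot()`, run Whisper and the 7B model together on a representative file, take `after = snapshot()`, and inspect `swapped_out_mb(before, after)`. A value that stays near zero would suggest the combination fits; a steadily growing value would point to swapping.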

Entities

whisper-cpp openai-whisper local-whisper whisperkit speak2 transcribe-master dawei-bi parakeet mlx-framework llama-3b ollama apple google-speech-to-text amazon-transcribe deepgram gladia getonit-ai

Concepts

local-offline-transcription apple-silicon-acceleration whisper-model-variants model-quantization llm-post-processing system-audio-capture cloud-vs-local-asr-trade-offs

20260502-0656-local-llm-for-audio-transcription-on-mac-mini-m2-1
