LLM Post-Processing
A small general-purpose LLM (e.g., Llama 3B) runs after transcription to remove filler words, correct formatting, and improve readability. This adds latency (~300-800 ms) but improves output quality.
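A minimal sketch of this cleanup step, assuming the model is served locally through Ollama's `/api/generate` endpoint. The prompt wording, the `llama3.2:3b` model tag, the default URL, and the function names `build_prompt`, `ollama_generate`, and `postprocess` are illustrative assumptions, not taken from any particular app. The model call is passed in as a plain function so a different backend can be swapped in, and the raw transcript is returned unchanged if the model fails or returns nothing.

```python
import json
import time
import urllib.request
from typing import Callable, Tuple

# Hypothetical prompt wording; tune it for your own model and transcripts.
CLEANUP_PROMPT = (
    "Clean up this speech transcript. Remove filler words (um, uh, you know), "
    "fix punctuation and capitalization, and keep the meaning unchanged. "
    "Return only the cleaned text.\n\nTranscript:\n{transcript}"
)


def build_prompt(transcript: str) -> str:
    """Wrap the raw transcript in the cleanup instruction."""
    return CLEANUP_PROMPT.format(transcript=transcript.strip())


def ollama_generate(
    prompt: str,
    model: str = "llama3.2:3b",  # assumed model tag
    url: str = "http://localhost:11434/api/generate",  # assumed local Ollama server
) -> str:
    """Send one non-streaming generation request to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())["response"]


def postprocess(transcript: str, generate: Callable[[str], str]) -> Tuple[str, float]:
    """Return (cleaned_text, elapsed_ms); fall back to the raw transcript on failure."""
    start = time.perf_counter()
    try:
        cleaned = generate(build_prompt(transcript)).strip()
    except Exception:
        # Model offline, timeout, malformed response: keep the original text.
        cleaned = ""
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return (cleaned or transcript), elapsed_ms
```

With an Ollama server running, `text, ms = postprocess(raw_transcript, ollama_generate)` returns the cleaned text along with the measured latency, which makes it easy to check whether the extra delay is acceptable for a given model and machine.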
Related
local-offline-transcription whisper-model-variants llama-3b mlx-framework speak2 ollama