Model Quantization
Reducing the numeric precision of model weights (e.g., to the Q5_0 5-bit integer format) to shrink the memory footprint by ~65% relative to FP16 with minimal accuracy loss, so that larger Whisper models run comfortably within 16GB of unified memory.
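The ~65% figure can be checked with a little arithmetic. The sketch below assumes the ggml Q5_0 block layout used by whisper.cpp (32 weights per block, stored as one FP16 scale, 32 packed high bits, and 32 four-bit low nibbles) and an FP16 baseline. The constant and function names are illustrative only and do not come from any library.

```python
# Rough size arithmetic for Q5_0, assuming ggml's block layout.
# All names here are hypothetical helpers for illustration.

BLOCK_WEIGHTS = 32        # weights stored per quantization block
SCALE_BYTES = 2           # one FP16 scale factor per block
HIGH_BITS_BYTES = 4       # the 5th bit of each of the 32 weights, packed
LOW_NIBBLE_BYTES = 16     # the low 4 bits of each weight, two per byte
FP16_BITS = 16            # baseline precision being compared against


def q5_0_bits_per_weight() -> float:
    """Effective storage cost per weight, including the per-block scale."""
    block_bytes = SCALE_BYTES + HIGH_BITS_BYTES + LOW_NIBBLE_BYTES
    return block_bytes * 8 / BLOCK_WEIGHTS


def reduction_vs_fp16(bits_per_weight: float) -> float:
    """Fraction of memory saved relative to FP16 storage."""
    return 1.0 - bits_per_weight / FP16_BITS


bpw = q5_0_bits_per_weight()
saving = reduction_vs_fp16(bpw)
print(f"Q5_0: {bpw} bits/weight, saving {saving:.3f} of FP16 size")
```

Because of the per-block scale, the effective cost is slightly above 5 bits per weight. Real model files can deviate a little from this estimate, since some tensors may be kept at higher precision.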
Related
whisper-model-variants local-offline-transcription whisper-cpp