Metal GPU Acceleration
Apple’s Metal framework lets llama.cpp run inference on the integrated GPU cores of Apple Silicon chips, which significantly increases LLM inference throughput (tokens per second) on Mac hardware.
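As a rough illustration, GPU offload in llama.cpp is usually requested with the `-ngl` (`--n-gpu-layers`) flag, and throughput is simply tokens generated divided by elapsed time. The sketch below only assembles a command line and computes that ratio; it does not run a model. The binary name `llama-cli`, the model path, and the helper names `build_llama_command` and `tokens_per_second` are assumptions made for illustration, and the sketch assumes a llama.cpp build with the Metal backend enabled.

```python
import shlex


def build_llama_command(model_path, prompt, n_gpu_layers=99,
                        n_predict=128, binary="llama-cli"):
    """Assemble an argv list for a llama.cpp run with GPU offload.

    Hypothetical helper: `binary` and `model_path` are placeholders.
    A large `n_gpu_layers` value asks llama.cpp to offload every layer
    it can; 0 keeps all layers on the CPU.
    """
    return [
        binary,
        "-m", model_path,           # path to a GGUF model file
        "-ngl", str(n_gpu_layers),  # number of layers to offload to the GPU
        "-n", str(n_predict),       # number of tokens to generate
        "-p", prompt,               # prompt text
    ]


def tokens_per_second(n_tokens, elapsed_seconds):
    """Throughput metric used to compare CPU-only and GPU-offloaded runs."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return n_tokens / elapsed_seconds


if __name__ == "__main__":
    # Print the command rather than executing it, so the sketch runs anywhere.
    gpu_cmd = build_llama_command("models/model.gguf", "Hello")
    cpu_cmd = build_llama_command("models/model.gguf", "Hello", n_gpu_layers=0)
    print(shlex.join(gpu_cmd))
    print(shlex.join(cpu_cmd))
```

Running both printed commands on the same machine and comparing the tokens-per-second figures that llama.cpp reports is one simple way to see how much the Metal backend helps for a given model.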
Related
local-llm-inference unified-memory-architecture llama-cpp mac-mini-m2 apple