llama.cpp
An open-source C/C++ inference engine for large language models (LLMs). It loads quantised models in the GGUF format and includes a backend for Apple’s Metal GPU API, enabling hardware-accelerated local inference on Apple Silicon Macs.
Details
- Services: LLM inference, GGUF model support, Metal acceleration
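
As an illustration of the GGUF model support listed above, the sketch below checks whether a byte string starts with a GGUF header and reads its fixed-size fields. It assumes a little-endian file at GGUF version 2 or later, where the two counts are 64-bit. The function name `read_gguf_header` and the sample values are hypothetical and are not part of llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"   # first four bytes of every GGUF file
HEADER_SIZE = 24       # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (metadata count)


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header from the first 24 bytes of `data`.

    Assumption: little-endian layout, GGUF version 2 or later
    (version 1 stored the two counts as 32-bit integers).
    """
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes, got {len(data)}")
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic is {data[:4]!r}")
    # "<IQQ": little-endian uint32 version, uint64 tensor count, uint64 metadata key/value count
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


if __name__ == "__main__":
    # Synthetic header for demonstration only; the values are made up.
    sample = struct.pack("<4sIQQ", GGUF_MAGIC, 3, 291, 24)
    print(read_gguf_header(sample))
```

To inspect a real model, the first 24 bytes of the file can be passed in instead, for example `read_gguf_header(open(path, "rb").read(24))`, where `path` points to a local `.gguf` file.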
Related
local-llm-inference gguf-4-bit-quantisation metal-gpu-acceleration openwebui ollama mac-mini-m2