Harris Notes

llama.cpp

Apr 14, 2026

  • entity
  • tool
  • general

An open-source C++ inference engine for large language models (LLMs). It runs GGUF quantised models and supports Apple’s Metal backend, which enables GPU-accelerated local inference on Apple Silicon Macs.

Details

  • Services: LLM inference, GGUF model support, Metal acceleration
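
Because llama.cpp loads models in the GGUF format, a quick sanity check before pointing it at a download is to read the file header. The sketch below is not part of llama.cpp; it uses only Python's standard library and assumes the GGUF version 2 or later header layout (4-byte magic "GGUF", little-endian uint32 version, uint64 tensor count, uint64 metadata key/value count). The names GGUFHeader and read_gguf_header are hypothetical, chosen for illustration.

```python
import struct
from typing import BinaryIO, NamedTuple

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


class GGUFHeader(NamedTuple):
    """Fixed-size fields at the start of a GGUF file (hypothetical helper type)."""
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Read and validate the fixed GGUF header from a binary stream.

    Assumes GGUF version 2 or later, where both counts are 64-bit.
    """
    magic = stream.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={magic!r}")

    raw_version = stream.read(4)
    if len(raw_version) != 4:
        raise ValueError("truncated GGUF header")
    (version,) = struct.unpack("<I", raw_version)
    if version < 2:
        # Version 1 used 32-bit counts; this sketch does not handle it.
        raise ValueError(f"unsupported GGUF version: {version}")

    raw_counts = stream.read(16)
    if len(raw_counts) != 16:
        raise ValueError("truncated GGUF header")
    tensor_count, kv_count = struct.unpack("<QQ", raw_counts)
    return GGUFHeader(version, tensor_count, kv_count)
```

To use it, open a model file in binary mode, for example `with open(path, "rb") as f: print(read_gguf_header(f))`, where `path` is whatever GGUF file you have locally. A ValueError here usually means the file is not GGUF or the download was cut short.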

Related

  • local-llm-inference
  • gguf-4-bit-quantisation
  • metal-gpu-acceleration
  • openwebui
  • ollama
  • mac-mini-m2


Backlinks

  • GGUF 4-bit Quantisation
  • Local LLM Inference
  • Metal GPU Acceleration
  • Gemma-2-7B-Instruct
  • Llama 2 7B-Chat
  • Mac Mini M2
  • Meta
  • Mistral-7B-Instruct
  • Ollama
  • OpenWebUI
  • Phi-3-mini-instruct
  • 20260414-1234-successes-and-risks-running-llm--locally-on-mac-mi
  • index
