Harris Notes


Apr 14, 2026 · 1 min read

concept
general
process

Local LLM Inference

Running large language models entirely on local hardware without sending data to external APIs, prioritising data privacy and eliminating per-token costs.

Containerised AI Stack Deployment
GGUF 4-bit Quantisation
Metal GPU Acceleration
RAM Capacity Constraints
Retrieval-Augmented Generation
Unified Memory Architecture
Bill WANG
DeepSeek-Coder 6.7B
Gemma-2-7B-Instruct
Hacker News
LangChain
Llama 2 7B-Chat
llama.cpp
LlamaIndex
Mac Mini M2
Mac Mini M4 Pro
Mac Studio
Meta
Mistral-7B-Instruct
Ollama
OpenClaw
OpenWebUI
Phi-3-mini-instruct
Simon Willison
20260414-1234-successes-and-risks-running-llm--locally-on-mac-mi

