Harris Notes

GGUF 4-bit Quantisation

Apr 14, 2026

  • concept
  • general
  • technique

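A model compression approach that combines GGUF, the model file format used by llama.cpp, with 4-bit quantisation, which stores each weight in roughly 4 to 5 bits instead of 16. It reduces LLM memory footprint dramatically, allowing 7B-parameter models to fit within approximately 6GB of RAM, which makes them viable on 16GB Apple Silicon devices.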
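The memory figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the helper `weight_memory_gb` is hypothetical, "7B" is taken as exactly 7 billion parameters, and 4.5 bits per weight is an assumed effective rate (4-bit values plus a per-block scale); actual GGUF files vary by quantisation variant.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: "7B" treated as exactly 7 billion parameters.
N_PARAMS = 7e9

# 16-bit baseline versus an assumed effective 4.5 bits per weight.
fp16_gb = weight_memory_gb(N_PARAMS, 16)
q4_gb = weight_memory_gb(N_PARAMS, 4.5)

print(f"FP16 weights:  {fp16_gb:.1f} GB")   # 14.0 GB
print(f"4-bit weights: {q4_gb:.1f} GB")     # 3.9 GB
```

Under these assumptions the weights alone take about 3.9 GB, so the approximately 6GB figure would leave around 2 GB for the KV cache and runtime buffers. That split is an inference from the arithmetic, not something the note states.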

Related

  • local-llm-inference
  • ram-capacity-constraints
  • llama-cpp
  • phi-3-mini-instruct
  • gemma-2-7b-instruct
  • mistral-7b-instruct

