GGUF 4-bit Quantisation
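A model compression approach that stores LLM weights at roughly 4 bits each in GGUF, the model file format used by llama.cpp, cutting the weight footprint to around a quarter of its 16-bit size. This allows a 7B parameter model to fit within approximately 6GB RAM, making it viable on 16GB Apple Silicon devices.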
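The arithmetic behind the 6GB figure can be sketched as follows. This is a rough estimate, not a measurement: the 4.5 bits per weight assumes a simple block layout (32 four-bit weights sharing one 16-bit scale), and `OVERHEAD_GB` is a hypothetical allowance for the KV cache and runtime buffers, which in practice varies with context length and settings.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9  # a 7B parameter model

# Assumed block layout: 32 weights at 4 bits each plus one 16-bit scale.
BITS_PER_WEIGHT_Q4 = (32 * 4 + 16) / 32  # 4.5 bits per weight

# Hypothetical allowance for KV cache and runtime buffers (an assumption).
OVERHEAD_GB = 2.0

fp16_gb = weight_size_gb(N_PARAMS, 16.0)
q4_gb = weight_size_gb(N_PARAMS, BITS_PER_WEIGHT_Q4)
total_gb = q4_gb + OVERHEAD_GB

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {q4_gb:.1f} GB")
print(f"4-bit weights plus assumed overhead: {total_gb:.1f} GB")
```

Under these assumptions the 16-bit weights alone would take most of a 16GB machine, while the 4-bit weights plus overhead land close to the 6GB quoted above.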
Related
local-llm-inference ram-capacity-constraints llama-cpp phi-3-mini-instruct gemma-2-7b-instruct mistral-7b-instruct