Retrieval-Augmented Generation
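A technique that pairs a local LLM with a vector database of embedded documents so that answers are grounded in a private document store. At query time, the passages most similar to the query are retrieved and passed to the LLM as context. It requires an embedding model and a vector database alongside the main language model.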
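The retrieve-then-prompt flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, `InMemoryVectorStore` is a hypothetical placeholder for a vector database, `build_prompt` is an illustrative helper, and the sample documents are invented. The final call to the local LLM is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical placeholder for a vector database."""

    def __init__(self):
        self.items = []  # list of (vector, original text) pairs

    def add(self, text):
        self.items.append((embed(text), text))

    def search(self, query, k=2):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(query, passages):
    """Assemble the grounded prompt that would be sent to the local LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Invented sample documents standing in for a private document store.
store = InMemoryVectorStore()
for doc in [
    "The office VPN requires a hardware token issued by IT.",
    "Expense reports are due on the fifth working day of each month.",
    "The cafeteria serves lunch between noon and two.",
]:
    store.add(doc)

query = "When are expense reports due?"
passages = store.search(query, k=1)
prompt = build_prompt(query, passages)
print(prompt)
```

In a real setup, `embed` would call an actual embedding model, the store would be a vector database such as the ones listed under Related, and `prompt` would be passed to the local LLM to generate the answer.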
Related
local-llm-inference gguf-4-bit-quantisation langchain llamaindex qdrant chroma openwebui