Local LLM setup: how to use RAG and an embedding model to stop wasting context