Advanced RAG Techniques with Arcee Trinity Mini (100% Local)

19.4K views
Jan 9, 2026
25:36

In this video, we build a fully local RAG chatbot that runs entirely on a MacBook: no cloud APIs, no usage costs, complete privacy.

⭐️⭐️⭐️ More content on Substack at https://julsimon.substack.com ⭐️⭐️⭐️

We use Arcee's Trinity Mini, a 26-billion-parameter mixture-of-experts model trained for real-world enterprise tasks, including RAG, function calling, and tool use. Running in Q8 quantization through llama.cpp with Metal acceleration, it's surprisingly capable on Apple Silicon.

This builds on a previous video where we used Arcee Conductor for cloud-based inference. The stack is the same (LangChain for orchestration, ChromaDB for vector storage, Gradio for the UI), but now the model runs locally.

We also explore advanced retrieval techniques:
- MMR (Maximal Marginal Relevance) for diverse results
- Hybrid search combining vector similarity and BM25 keyword matching
- Query rewriting to clean up messy questions before retrieval
- Cross-encoder re-ranking for precision after recall

Everything runs on a Mac, with no internet required.

Resources
- https://www.arcee.ai/blog/the-trinity-manifesto
- https://huggingface.co/arcee-ai/Trinity-Mini-GGUF
- https://github.com/juliensimon/local-rag-chatbot/

#ArceeAI #TrinityMini #RAG #LocalLLM #llamacpp #ChromaDB #LangChain #HybridSearch #Reranking #AppleSilicon #EnterpriseAI #AITutorial #GenerativeAI #python
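To make the MMR item in the list above concrete, here is a minimal, dependency-free sketch of Maximal Marginal Relevance over toy 2-D vectors. It is not taken from the local-rag-chatbot repository; the function names `cosine` and `mmr` and the example vectors are hypothetical. In LangChain, vector stores expose the same idea through `as_retriever(search_type="mmr")` with a `lambda_mult` trade-off parameter, which this sketch mirrors by name only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Pick k document indices by Maximal Marginal Relevance.

    Each step selects the candidate maximizing
        lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already selected)
    so lambda_mult=1.0 is plain similarity search and lower values favor diversity.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    candidates = list(range(len(doc_vecs)))

    def marginal_score(i):
        # Redundancy = highest similarity to anything already picked (0.0 on the first step)
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=marginal_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy 2-D "embeddings": doc 1 is a near-duplicate of doc 0, doc 2 covers something else
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.5, -0.5]]
print(mmr(query, docs, k=2, lambda_mult=1.0))  # → [0, 1]  (pure relevance)
print(mmr(query, docs, k=2, lambda_mult=0.5))  # → [0, 2]  (near-duplicate skipped)
```

With `lambda_mult=1.0` the near-duplicate wins the second slot; at `0.5` its redundancy penalty lets the more distinct document through, which is the "diverse results" effect the video refers to.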
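For the hybrid-search item, one common way to combine keyword and vector retrieval is to score documents with BM25, rank them separately by vector similarity, then merge the two rankings with reciprocal rank fusion (RRF). The sketch below assumes that approach; the video may weight or fuse results differently (LangChain, for instance, provides `BM25Retriever` and `EnsembleRetriever` for this). Tokenization is naive whitespace splitting, the BM25 IDF uses the non-negative `+1` variant, and the vector scores are made-up numbers standing in for a ChromaDB query.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every doc for the query.

    Uses naive lowercase/whitespace tokenization and the non-negative
    idf = ln((N - df + 0.5) / (df + 0.5) + 1) variant.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many docs contain each term at least once
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def ranking(scores):
    """Doc indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse rankings: each doc earns 1 / (k + position) per list, positions starting at 1."""
    fused = Counter()
    for rank_list in rankings:
        for position, doc_id in enumerate(rank_list, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [doc_id for doc_id, _ in fused.most_common()]


docs = [
    "reset the router password",
    "how to change wifi password",
    "router firmware update guide",
]
keyword_rank = ranking(bm25_scores("router password", docs))
# Made-up cosine similarities standing in for a vector-store query
vector_rank = ranking([0.70, 0.85, 0.40])
print(keyword_rank, vector_rank, reciprocal_rank_fusion([keyword_rank, vector_rank]))
# → [0, 2, 1] [1, 0, 2] [0, 1, 2]
```

RRF only looks at positions, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales; the constant `k=60` is a commonly used default, not a value from the video.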
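Query rewriting, as listed above, means asking the LLM to turn a messy user question into a clean search query before retrieval. The sketch below assumes a generic `generate(prompt) -> str` callable so it runs offline; in the real app that callable would wrap the local Trinity Mini model, but the prompt text, the function names and the `fake_llm` stub are all hypothetical.

```python
from typing import Callable, List

# Hypothetical prompt; the wording used in the video/repository may differ
REWRITE_PROMPT = (
    "Rewrite the user's question as a short, self-contained search query. "
    "Fix typos, drop filler words, and keep product names and technical terms.\n"
    "Question: {question}\n"
    "Search query:"
)


def rewrite_query(question: str, generate: Callable[[str], str]) -> str:
    """Ask an LLM to clean up a messy question before it is embedded and searched."""
    rewritten = generate(REWRITE_PROMPT.format(question=question)).strip()
    # Fall back to the original text if the model returns an empty string
    return rewritten or question


def retrieve_with_rewrite(
    question: str,
    generate: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Run retrieval on the rewritten query instead of the raw user input."""
    return retrieve(rewrite_query(question, generate))


# Stub LLM so the sketch runs offline; swap in a call to the local model
def fake_llm(prompt: str) -> str:
    return "  Trinity Mini Q8 quantization memory usage \n"


print(rewrite_query("umm so how much ram does trinty mini need in q8??", fake_llm))
# → Trinity Mini Q8 quantization memory usage
```

Only the retrieval step sees the rewritten query; the original question can still be passed to the final answer prompt so the model responds to what the user actually asked.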
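Cross-encoder re-ranking, the last technique listed, takes the handful of candidates the first-stage retriever returned (recall) and re-scores each (query, document) pair jointly to put the best ones first (precision). The sketch below takes the scorer as a parameter; in a real pipeline it could be, for example, the `predict` method of a sentence-transformers `CrossEncoder` such as `cross-encoder/ms-marco-MiniLM-L-6-v2`, though the model the video uses is not stated on this page. The `overlap_scorer` here is a hypothetical word-overlap stand-in so the example runs offline, and is not a cross-encoder.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    docs: Sequence[str],
    score_pairs: Callable[[List[Pair]], Sequence[float]],
    top_n: int = 3,
) -> List[str]:
    """Re-order retrieved docs by a score computed on each (query, doc) pair.

    Unlike the bi-encoder used for vector search, a cross-encoder reads query and
    document together: slower, but usually more precise, so it is typically run
    only on the small candidate set returned by the first-stage retriever.
    """
    pairs = [(query, doc) for doc in docs]
    scores = list(score_pairs(pairs))
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order[:top_n]]


# Offline stand-in for CrossEncoder.predict: counts shared words (NOT a real cross-encoder)
def overlap_scorer(pairs: List[Pair]) -> List[float]:
    return [float(len(set(q.lower().split()) & set(d.lower().split()))) for q, d in pairs]


candidates = [
    "gradio builds the chat ui",
    "chromadb persists vectors to disk",
    "chromadb stores vectors locally",
]
print(rerank("where does chromadb store vectors locally", candidates, overlap_scorer, top_n=2))
# → ['chromadb stores vectors locally', 'chromadb persists vectors to disk']
```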
