Run LLMs locally with llama.cpp – Beginner-Friendly Guide
In this guide, I show you exactly how to get started with llama.cpp for fast, free, fully local inference using Qwen-3.6-35b. You'll learn how to install llama.cpp, download a quantized GGUF model, run a local server, and hook it up to tools like Cline.
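Quick-start if you want to follow along (the install method, model repo, and port below are just examples, swap in whichever quantized GGUF you prefer from Hugging Face):

# Install llama.cpp (Homebrew on macOS/Linux, or build from source per the repo README)
brew install llama.cpp

# Pull a quantized GGUF straight from Hugging Face and serve it locally
# (replace the -hf repo with the model you actually want to run)
llama-server -hf Qwen/Qwen2.5-7B-Instruct-GGUF --port 8080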
Timestamps:
00:00 - Introduction
02:49 - API vs Local Comparison
04:27 - What is Localmaxxing?
07:24 - Installing llama.cpp
10:06 - Running the Server + Using in Cline
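Once llama-server is up, you can sanity-check its OpenAI-compatible endpoint before pointing Cline at it (host and port here assume the quick-start command above):

# Send a test chat request to the local server
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages":[{"role":"user","content":"Say hello in five words."}]}'

In Cline, choose the OpenAI-compatible provider and set the base URL to http://localhost:8080/v1 (a placeholder API key is fine, llama-server doesn't require one by default).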
Follow me for more local AI tips:
https://x.com/edgedistiller
Download models: https://huggingface.co
llama.cpp repo: https://github.com/ggerganov/llama.cpp
Localmaxxing benchmarks: https://localmaxing.com
#llamacpp #LocalLLM #RunLLMLocally #Qwen #AI #GGUF #LocalAI #OllamaAlternative