Back to Browse

Run LLM Models Locally with Microsoft Foundry Local - A Full Tutorial

2.0K views
Dec 5, 2025
28:20

Unlock the power of running Large Language Models locally on your own machine using Microsoft Foundry Local. In this step-by-step tutorial, I walk you through installing Foundry Local, running an open-source model, exposing it as a local REST service, and integrating it directly into Python including a real document-summarization example using LangChain. Whether you're looking to keep data private, eliminate inference costs, or run AI tools offline, Foundry Local makes it incredibly easy. What you’ll learn: What Foundry Local is and why you'd use it Installing the Foundry Local CLI Running a model locally (Qwen 2.5 / 0.5B size) Understanding GPU/CPU/Metal/WebGPU variant selection Exposing your model as a local RESTful endpoint Calling the local model from Python (OpenAI SDK) Using LangChain to summarize a full 100-page PDF locally Practical tips for real company use cases (compliance, cost-savings, offline inference) 🔗 Code Repo: https://github.com/kirkmcpherson/foundry-local-demo Chapters: 00:00 – Intro: Why Run LLMs Locally? 00:20 – What Is Foundry Local? 00:56 – Installing Foundry Local 01:48 – Why Local LLMs Matter (Privacy, Offline, Cost) 02:51 – Running Your First Local Model 04:10 – GPU/CPU Variant Detection Explained 05:38 – Exploring Cache, Models & Services 07:36 – Exposing a Local REST Endpoint 09:03 – Using Foundry Local from Python 15:12 – Summarizing a 100-Page PDF with LangChain #FoundryLocal #LocalLLM #MicrosoftAI #RunLLMLocally #Qwen #LangChain #AIOnDevice #PrivateAI #EdgeAI #LocalInference #AIEngineering #PythonAI #AIDevelopment #WebGPU #Ollama #MacBookMSeries #OpenSourceAI #LLMTutorial

Download

0 formats

No download links available.

Run LLM Models Locally with Microsoft Foundry Local - A Full Tutorial | NatokHD