
week 04 - Hands-On Large Language Models deck

104 views
Feb 6, 2026
59:56

In this lecture (MIS 769 – Big Data Analytics for Business, Week 04), we get hands-on with Large Language Models (LLMs) and the core NLP concepts that make them work. We start with a practical overview of AI vs ML vs Deep Learning vs Generative AI, and why LLMs (ChatGPT, Llama, Claude, etc.) are powerful pattern learners for language tasks: generation, summarization, and classification.

Then we break down the language-model pipeline:
• Key NLP terms: corpus, vocabulary, tokenization, embeddings
• Classic text representations: bag-of-words vs dense vector embeddings
• word2vec and the intuition that “you know a word by the company it keeps”
• Why contextual representations matter (and how this bridges into modern transformer models)
• Transformers: self-attention, parallelization, and why “Attention Is All You Need” changed everything
• Model families: BERT (encoder-only) vs GPT (decoder-only) and how training differs (masked LM vs autoregressive generation)
• Training workflow: pretraining → fine-tuning
• Tokenization in practice: BPE / SentencePiece, and why tokenizer choices affect speed, cost, and downstream behavior

We also cover practical and responsible usage topics:
• Ways to connect to LLMs (web UI, APIs, command line, mobile)
• Open vs closed source tradeoffs (flexibility, cost, security, customization)
• Tools like Ollama (local open models) and OpenRouter (routing across open/closed models)
• Risks: bias, privacy, transparency, misuse, and IP considerations

🧠 Topics covered: LLM fundamentals, embeddings, attention/transformers, tokenization, open-source tooling, and responsible AI.
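
As a rough illustration of the bag-of-words representation mentioned in the description, the sketch below counts words over a tiny made-up corpus and compares documents with cosine similarity. It is not taken from the lecture: the corpus, the whitespace tokenizer, and the function names (`tokenize`, `bag_of_words`, `cosine`) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(text, vocabulary):
    """Return a count vector with one entry per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical three-document corpus, invented for this example.
corpus = [
    "the dog bit the man",
    "the man bit the dog",
    "stocks fell sharply today",
]

# The vocabulary is the sorted set of distinct tokens in the corpus.
vocabulary = sorted({word for doc in corpus for word in tokenize(doc)})
vectors = [bag_of_words(doc, vocabulary) for doc in corpus]

for doc, vec in zip(corpus, vectors):
    print(f"{doc!r:32} -> {vec}")

print("similarity(doc 0, doc 1):", cosine(vectors[0], vectors[1]))
print("similarity(doc 0, doc 2):", cosine(vectors[0], vectors[2]))
```

The first two documents contain the same words in a different order, so they receive identical count vectors even though they mean different things. The third shares no words with them and scores zero. Word order and context are exactly what bag-of-words discards, which is one motivation for the dense and contextual representations listed above.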

