18:14 · An image is worth NxN words Diffusion Transformers (ViT, DiT, MMDiT) · Julia Turc · 23.5K views · 2 months ago
24:02 · Llama 4 Explained Architecture, Long Context, and Native Multimodality · Julia Turc · 17.1K views · 1 year ago
23:16 · DeepSeek's GRPO (Group Relative Policy Optimization) Reinforcement Learning for LLMs · Julia Turc · 45.0K views · 1 year ago
22:03 · Proximal Policy Optimization (PPO) for LLMs Explained Intuitively · Julia Turc · 54.5K views · 1 year ago