This example steps through token generation in a GPT-2 model. It uses multinomial sampling (rather than argmax) with top-K filtering, showing the top 40 tokens and their probabilities. It displays the model's inputs and outputs to give a better understanding of how generation works.
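The sampling scheme described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind this example: the function names (`top_k_probs`, `sample_top_k`, `greedy`) are hypothetical, the logits are made-up toy values rather than real GPT-2 outputs, and the default `k=40` is only assumed to mirror the 40 tokens mentioned above.

```python
import math
import random


def top_k_probs(logits, k=40):
    """Return {token_id: probability} over the k highest-logit tokens."""
    k = min(k, len(logits))
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the kept tokens; subtract the max for stability.
    m = logits[top[0]]
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}


def sample_top_k(logits, k=40, rng=random):
    """Multinomial sampling: draw one token id in proportion to its probability."""
    probs = top_k_probs(logits, k)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


def greedy(logits):
    """Argmax decoding: always pick the single highest-logit token."""
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    # Toy logits over a 5-token vocabulary (illustrative values only).
    toy_logits = [0.1, 2.0, -1.0, 3.0, 0.5]
    for token_id, p in top_k_probs(toy_logits, k=3).items():
        print(f"token {token_id}: p={p:.3f}")
    print("greedy pick:", greedy(toy_logits))
    print("sampled pick:", sample_top_k(toy_logits, k=3))
```

Argmax returns the same token every time for a given set of logits, while multinomial sampling can return any of the kept tokens, with likelier tokens chosen more often. Top-K filtering simply prevents very unlikely tokens from ever being drawn.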