This video shows how self-attention works inside the Transformer encoder layer. Self-attention is the step immediately after the embedding and positional encoding layers.
0:00 Positional Encoding
0:46 Key, Query and Value Calculation
8:29 Scaling
8:47 Attention Weights
14:52 Self Attention Output + Self Attention Input
PyTorch (torch) version used: 1.10.0
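
For reference, the steps listed in the chapters above can be sketched in a few lines of PyTorch. This is a minimal single-head sketch, not the code from the video: the tensor sizes, the variable names (x, w_q, w_k, w_v), the bias-free projections, and the random stand-in input are all assumptions chosen for illustration.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch_size, seq_len, d_model = 2, 5, 8

# Stand-in for the output of the embedding + positional encoding step.
x = torch.randn(batch_size, seq_len, d_model)

# Key, query and value calculation: three learned linear projections
# (single head and no bias are assumptions of this sketch).
w_q = nn.Linear(d_model, d_model, bias=False)
w_k = nn.Linear(d_model, d_model, bias=False)
w_v = nn.Linear(d_model, d_model, bias=False)
q, k, v = w_q(x), w_k(x), w_v(x)

# Scaling: divide the query-key dot products by sqrt of the key dimension.
scores = q @ k.transpose(-2, -1) / math.sqrt(d_model)  # (batch, seq, seq)

# Attention weights: softmax over the key axis, so each row sums to 1.
weights = torch.softmax(scores, dim=-1)

# Self-attention output: weighted sum of the value vectors.
attn_output = weights @ v  # (batch, seq, d_model)

# Self-attention output + self-attention input (residual connection).
out = attn_output + x

print(weights.shape, out.shape)
```

With multiple heads, the same computation would run per head on a slice of size d_model / num_heads, and the scaling factor would use that per-head size instead of d_model.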