This video shows how the fully connected (feed-forward) layer of a Transformer Encoder Layer works. This is the layer that comes immediately after the Encoder's first Normalization Layer.
0:00 Recap
0:46 Initial Random weights and biases
1:35 Fully Connected Layer 1 and ReLU
3:40 Fully Connected Layer 2
5:12 Fully Connected Layers: Output + Input (residual connection)
PyTorch version: 1.10.0
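The steps listed in the timestamps (random initial weights and biases, Fully Connected Layer 1 with ReLU, Fully Connected Layer 2, then adding the input back) can be sketched in plain Python. This is a minimal illustration, not the code from the video. The sizes `D_MODEL = 4` and `DIM_FF = 8`, the helper names, and the random input tokens are hypothetical choices. The initialization range, uniform in ±1/sqrt(in_features), matches the default range used by `torch.nn.Linear`.

```python
import random

# Hypothetical sizes for illustration (not taken from the video):
# each token vector has D_MODEL features, the hidden layer has DIM_FF.
D_MODEL = 4
DIM_FF = 8


def init_linear(in_features, out_features, rng):
    """Random weights and biases, uniform in [-1/sqrt(in), 1/sqrt(in)]."""
    bound = 1.0 / in_features ** 0.5
    weight = [[rng.uniform(-bound, bound) for _ in range(in_features)]
              for _ in range(out_features)]
    bias = [rng.uniform(-bound, bound) for _ in range(out_features)]
    return weight, bias


def linear(x, weight, bias):
    """y = W x + b for one token vector x (one output per weight row)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(v):
    """Replace negative values with 0."""
    return [max(0.0, vi) for vi in v]


def feed_forward_with_residual(tokens, fc1, fc2):
    """FC1 -> ReLU -> FC2, then add the layer's input back (residual)."""
    out = []
    for x in tokens:  # the same weights are applied to every token position
        hidden = relu(linear(x, *fc1))  # FC1: D_MODEL -> DIM_FF, then ReLU
        ff = linear(hidden, *fc2)       # FC2: DIM_FF -> D_MODEL
        out.append([a + b for a, b in zip(ff, x)])  # output + input
    return out


rng = random.Random(0)
fc1 = init_linear(D_MODEL, DIM_FF, rng)
fc2 = init_linear(DIM_FF, D_MODEL, rng)

# Hypothetical input: 3 token vectors standing in for the output of the
# Encoder's first Normalization Layer.
x = [[rng.gauss(0.0, 1.0) for _ in range(D_MODEL)] for _ in range(3)]
y = feed_forward_with_residual(x, fc1, fc2)
print(len(y), len(y[0]))  # 3 4
```

Fully Connected Layer 1 expands each token to `DIM_FF` features and Fully Connected Layer 2 projects it back to `D_MODEL`. Because of this, the output can be added element-wise to the input. In PyTorch 1.10 this roughly corresponds to the `linear1`, activation and `linear2` steps inside `torch.nn.TransformerEncoderLayer`. Dropout is left out of this sketch.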