Transformer Model: Masked Self-Attention - Implementation
In this tutorial, we'll discuss how to update our self-attention module to support encoder and decoder attention masking. We then test the module using a dummy input and a subsequent mask.
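For readers who want a quick picture before opening the repository, here is a minimal sketch of the idea, not the code from the video. It assumes single-head scaled dot-product attention in PyTorch, and the names subsequent_mask and masked_attention are hypothetical helpers chosen for illustration. The mask is True where attention is allowed, and disallowed scores are set to -inf before the softmax so they receive zero weight.

```python
import math

import torch


def subsequent_mask(size: int) -> torch.Tensor:
    """Hypothetical helper: lower-triangular boolean mask.

    Entry (i, j) is True when position i may attend to position j (j <= i).
    """
    return torch.tril(torch.ones(size, size, dtype=torch.bool))


def masked_attention(query, key, value, mask=None):
    """Hypothetical helper: scaled dot-product attention with an optional mask.

    Positions where `mask` is False are hidden from the softmax.
    """
    d_k = query.size(-1)
    # Raw attention scores: (batch, seq_len, seq_len)
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # -inf scores become exactly zero after the softmax
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ value, weights


torch.manual_seed(0)
x = torch.randn(1, 4, 8)   # dummy input: (batch, seq_len, d_model)
mask = subsequent_mask(4)  # (seq_len, seq_len), broadcasts over the batch
output, weights = masked_attention(x, x, x, mask)
print(weights[0])          # upper triangle above the diagonal is zero
```

In this sketch every row of the subsequent mask keeps at least its diagonal entry, so the softmax never sees a row made up entirely of -inf. A padding mask for the encoder could be passed the same way, as long as it broadcasts against the score tensor.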
In the next couple of videos, we'll learn two more basic but important components of the Transformer model: Dropout and Layer Normalization. Stay tuned!
The code used in this tutorial is available on GitHub:
Masked Self Attention - https://github.com/makeesyai/makeesy-deep-learning/blob/main/self_attention/multiheaded_attention_scaled.py
Test Masked Self Attention - https://github.com/makeesyai/makeesy-deep-learning/blob/main/self_attention/attn_mask_test.py
#tutorial #pytorch #selfattention #masking