L11.4.3-4: Transformer Architecture: Implementing and adding positional encoding
This video is the second part of my discussion of the Transformer architecture and neural attention. In this video I discuss a custom subclass layer implementation of the basic Transformer encoder architecture. The Transformer architecture uses a Keras MultiHeadAttention layer to implement neural attention. It includes several Dense layers that project outputs into multiple independent subspaces. Layer normalization and residual connections are also part of the basic architecture, and they help ensure gradients do not vanish during training. I walk through this custom layer and then look at the corresponding current implementation of the TransformerEncoder in Keras Hub. I also look at the final piece of a typical transformer model: adding word position awareness using a PositionalEncoder, which we also implement by hand as a custom subclass in this video.

Resources:

Textbook: Chollet (2022). "Deep Learning with Python (2ed)". Manning. https://www.amazon.com/dp/1617296864/?bestFormat=true&k=deep%20learning%20with%20python&ref_=nb_sb_ss_w_scx-ent-pd-bk-d_de_k0_1_15

CSci 560 Class Repository: https://github.com/csci560-nndl/nndl
Contains video slides and iPython notebooks for this course.

00:00 Introduction
01:16 Transformer encoder architecture elements
03:35 Implement a TransformerEncoder as a custom Keras subclass
13:45 Positional encoding to re-inject order information
18:18 Implement a PositionalEmbedding as a custom Keras subclass
22:31 When to use sequence models over bag-of-word models
24:11 Summary
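
For readers who want a feel for the positional step before watching, here is a minimal NumPy sketch of the idea, not the Keras code from the video. It assumes a learned-style scheme in which each position index has its own vector that is added to the token vector. The function name `positional_embedding`, the table sizes, and the random tables are all hypothetical stand-ins; in a real Keras layer the two tables would be trained embedding weights.

```python
import numpy as np


def positional_embedding(token_ids, token_table, position_table):
    """Return token vectors with a per-position vector added to each one.

    token_table:    (vocab_size, embed_dim) array, one row per token id.
    position_table: (max_length, embed_dim) array, one row per position.
    Both tables are illustrative stand-ins for trained embedding weights.
    """
    token_ids = np.asarray(token_ids)
    seq_len = token_ids.shape[-1]
    if seq_len > position_table.shape[0]:
        raise ValueError("sequence is longer than the position table")
    token_vectors = token_table[token_ids]                  # (..., seq_len, embed_dim)
    position_vectors = position_table[np.arange(seq_len)]   # (seq_len, embed_dim)
    # Broadcasting adds the same position vectors to every sequence in a batch.
    return token_vectors + position_vectors


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
vocab_size, max_length, embed_dim = 10, 6, 4
token_table = rng.normal(size=(vocab_size, embed_dim))
position_table = rng.normal(size=(max_length, embed_dim))

# Token id 3 appears at positions 0 and 2 of this toy sequence.
encoded = positional_embedding([3, 7, 3], token_table, position_table)
```

In this sketch the two occurrences of token 3 end up with different vectors, and the difference comes only from their positions. That is the order information that attention alone would not see.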