It turns out that you can drop many of the weights in a neural network without a drastic loss in accuracy. In this video we highlight an experiment we ran on our transformer blocks that demonstrates this.
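To make the idea concrete, here is a minimal sketch of magnitude-based pruning, one common way to drop weights: zero out the smallest-magnitude entries of a weight matrix. The function name `prune_by_magnitude` and the use of NumPy are illustrative assumptions, not Rasa's actual sparse-layer implementation (which is covered in the blog post linked below).

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude entries of a weight matrix.

    `sparsity` is the fraction of weights to drop, e.g. 0.8 drops 80%.
    This is a generic illustration, not Rasa's implementation.
    """
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(128, 128))
pruned = prune_by_magnitude(w, 0.8)
print(1.0 - np.count_nonzero(pruned) / pruned.size)  # roughly 0.8
```

After pruning like this, the remaining weights can be fine-tuned to recover most of the lost accuracy, which is the effect the experiment in the video explores.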
If you'd like to read the detailed blog post on this topic, you can do so here:
https://blog.rasa.com/why-rasa-uses-sparse-layers-in-transformers/