Self-Attention
Single-Headed
![Diagram of single-head self-attention](/_astro/single_head.BF_3heu4_ZpyLO2.webp)
- The diagram ignores the softmax operation and the scaling by $\sqrt{d_{model}}$, because neither operation changes the dimensions of the matrices involved.
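To make the shape bookkeeping concrete, here is a minimal pure-Python sketch of single-head scaled dot-product attention. It assumes an input $X$ of shape $(n, d_{model})$ and square projection matrices $W_Q$, $W_K$, $W_V$ of shape $(d_{model}, d_{model})$, so the key dimension equals $d_{model}$. The helper names (`matmul`, `softmax_rows`, `single_head_attention`, etc.) are hypothetical and exist only for illustration. The function records the shape of every intermediate matrix, which shows that the scaling and softmax steps leave the $(n, n)$ score matrix unchanged.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    assert len(a[0]) == len(b), "inner dimensions must match"
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax_rows(m):
    """Row-wise softmax; the output has exactly the same shape as the input."""
    out = []
    for row in m:
        mx = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def shape(m):
    return (len(m), len(m[0]))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def single_head_attention(x, w_q, w_k, w_v):
    """Return the attention output and the shape of each intermediate matrix."""
    d_k = len(w_k[0])  # equals d_model here, since the projections are square
    q = matmul(x, w_q)                    # (n, d_model)
    k = matmul(x, w_k)                    # (n, d_model)
    v = matmul(x, w_v)                    # (n, d_model)
    scores = matmul(q, transpose(k))      # (n, n)
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]  # still (n, n)
    weights = softmax_rows(scaled)        # still (n, n)
    out = matmul(weights, v)              # (n, d_model)
    shapes = {"Q": shape(q), "K": shape(k), "V": shape(v),
              "QK^T": shape(scores), "scaled": shape(scaled),
              "softmax": shape(weights), "output": shape(out)}
    return out, shapes

rng = random.Random(0)
n, d_model = 3, 8
x = rand_matrix(n, d_model, rng)
w_q, w_k, w_v = (rand_matrix(d_model, d_model, rng) for _ in range(3))
out, shapes = single_head_attention(x, w_q, w_k, w_v)
# Q, K, V and output print (3, 8); QK^T, scaled and softmax print (3, 3)
for name, s in shapes.items():
    print(name, s)
```

Because the scaling is an element-wise division and the softmax normalizes each row in place, dropping both from the diagram loses no information about the matrix dimensions.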
Multi-Headed
![Diagram of multi-head self-attention](/_astro/multi_head.BbDD1pcl_1o52uE.webp)
- The diagram ignores the softmax operation and the scaling by $\sqrt{d_k}$, where $d_k = d_{model}/h$ is the per-head dimension for $h$ heads, because neither operation changes the dimensions of the matrices involved.
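The following pure-Python sketch traces the shapes through multi-head attention. It assumes the standard formulation from "Attention Is All You Need": $h$ heads, each with its own projections of shape $(d_{model}, d_k)$ where $d_k = d_{model}/h$, head outputs concatenated along the feature axis, and an output projection $W_O$ of shape $(h \cdot d_k, d_{model})$. The diagram above may label or arrange these matrices differently, and all function names here are hypothetical helpers written for illustration.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    assert len(a[0]) == len(b), "inner dimensions must match"
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def shape(m):
    return (len(m), len(m[0]))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def attention_head(x, w_q, w_k, w_v):
    """One head: project to d_k, attend, and return an (n, d_k) matrix."""
    d_k = len(w_k[0])
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)   # each (n, d_k)
    scores = matmul(q, transpose(k))                            # (n, n)
    weights = softmax_rows([[s / math.sqrt(d_k) for s in row] for row in scores])
    return matmul(weights, v)                                   # (n, d_k)

def multi_head_attention(x, heads, w_o):
    """Run every head, concatenate their outputs, and apply the output projection."""
    head_outputs = [attention_head(x, *w) for w in heads]      # h matrices of (n, d_k)
    # Row i of the concatenation joins row i of every head: (n, h * d_k).
    concat = [sum((ho[i] for ho in head_outputs), []) for i in range(len(x))]
    out = matmul(concat, w_o)                                   # (n, d_model)
    return out, [shape(ho) for ho in head_outputs], shape(concat)

rng = random.Random(0)
n, d_model, h = 3, 8, 2
d_k = d_model // h
x = rand_matrix(n, d_model, rng)
heads = [tuple(rand_matrix(d_model, d_k, rng) for _ in range(3)) for _ in range(h)]
w_o = rand_matrix(h * d_k, d_model, rng)
out, head_shapes, concat_shape = multi_head_attention(x, heads, w_o)
print(head_shapes)   # [(3, 4), (3, 4)]
print(concat_shape)  # (3, 8)
print(shape(out))    # (3, 8)
```

Since $h \cdot d_k = d_{model}$, the concatenated heads already have width $d_{model}$, and $W_O$ maps them back to an output with the same shape as the single-head case.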