# Attentions

Implementations of attention mechanisms for neural networks.

## Todos

- Multi-Headed Attention (https://arxiv.org/abs/1706.03762)
- Flash Attention (https://arxiv.org/abs/2205.14135)

Minimal sketches of both mechanisms follow the references below.

## References

- https://nn.labml.ai/transformers/mha.html
- https://nlp.seas.harvard.edu/2018/04/03/attention.html#attention
- https://lena-voita.github.io/nlp_course/seq2seq_and_attention.html#attention_intro
- Attention Is All You Need (paper)
- The Illustrated Transformer
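## Multi-Headed Attention sketch

A minimal sketch for the Multi-Headed Attention todo, assuming PyTorch; the class name `MultiHeadAttention` and its constructor arguments are illustrative choices, not a fixed API. Queries, keys, and values are projected, split into heads, run through scaled dot-product attention per head, and merged back through an output projection, following Attention Is All You Need.

```python
import math

import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    """Multi-head scaled dot-product attention (Vaswani et al., 2017)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        batch, seq_len, _ = query.shape

        # Project, then split into heads: (batch, n_heads, seq, d_head).
        q = self.w_q(query).view(batch, -1, self.n_heads, self.d_head).transpose(1, 2)
        k = self.w_k(key).view(batch, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_v(value).view(batch, -1, self.n_heads, self.d_head).transpose(1, 2)

        # Scaled dot-product attention: softmax(q k^T / sqrt(d_head)) v.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            # mask broadcasts against (batch, n_heads, seq_q, seq_k); 0 = blocked.
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = scores.softmax(dim=-1)
        out = attn @ v

        # Merge heads back to (batch, seq, d_model) and project out.
        out = out.transpose(1, 2).contiguous().view(batch, seq_len, -1)
        return self.w_o(out)
```

For self-attention all three inputs are the same tensor, e.g. `MultiHeadAttention(512, 8)(x, x, x)` with `x` of shape `(batch, seq, 512)`.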
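## Flash Attention sketch

Flash Attention computes the same output as standard attention, but tiles the computation and keeps a running (online) softmax per query block, so the full seq × seq score matrix is never materialized. Below is a numerically equivalent sketch in plain PyTorch, assuming the same `(batch, heads, seq, d_head)` layout as above; the function name and `block_size` argument are illustrative, and the real algorithm fuses these loops into a single GPU kernel, which is where its memory savings come from.

```python
import math

import torch


def flash_attention_sketch(q, k, v, block_size=64):
    """Tiled attention with an online softmax, after Dao et al. (2022).

    Returns softmax(q @ k^T / sqrt(d_head)) @ v, computed block by block.
    All tensors are (batch, heads, seq, d_head).
    """
    seq_len, d_head = q.shape[-2], q.shape[-1]
    scale = 1.0 / math.sqrt(d_head)
    out = torch.zeros_like(q)

    for i in range(0, seq_len, block_size):
        q_i = q[..., i:i + block_size, :] * scale
        # Running max, normalizer, and un-normalized output for this query block.
        m = torch.full(q_i.shape[:-1], float("-inf"), dtype=q.dtype, device=q.device)
        l = torch.zeros_like(m)
        acc = torch.zeros_like(q_i)

        for j in range(0, seq_len, block_size):
            k_j = k[..., j:j + block_size, :]
            v_j = v[..., j:j + block_size, :]
            s = q_i @ k_j.transpose(-2, -1)  # (.., bq, bk) block of scores

            # Online softmax update: rescale the old state to the new running max.
            m_new = torch.maximum(m, s.amax(dim=-1))
            p = torch.exp(s - m_new.unsqueeze(-1))
            correction = torch.exp(m - m_new)
            l = l * correction + p.sum(dim=-1)
            acc = acc * correction.unsqueeze(-1) + p @ v_j
            m = m_new

        out[..., i:i + block_size, :] = acc / l.unsqueeze(-1)
    return out
```

A quick sanity check: for random `q`, `k`, `v`, this matches `torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(d_head), dim=-1) @ v` up to floating-point tolerance.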