Fast Transformer Decoding: One Write-Head is All You Need