Transformer-Evolution-Paper
  • README
  • Mathematical Notation
  • Act
  • Arch
  • FFN
  • Head
    • Multi-Head Attention Collaborate Instead of Concatenate
    • Fast Transformer Decoding: One Write-Head is All You Need
  • Memory
  • MHA
  • Normalize_And_Residual
  • Pe
  • Pretrain
  • Softmax
  • Others
  • LongConv
  • Rnn
  • CrossAttention
  • Inference
  • Peft
  • LLM

Head

• Multi-Head Attention Collaborate Instead of Concatenate
• Fast Transformer Decoding: One Write-Head is All You Need
