Transformer-Evolution-Paper

Arch

• Supplementary Material Implementation and Experiments for GAU-based Model
• MetaFormer is Actually What You Need for Vision
• Deeper vs Wider A Revisit of Transformer Configuration
• Perceiver General Perception with Iterative Attention
• General-purpose, long-context autoregressive modeling with Perceiver AR
• Hierarchical Transformers Are More Efficient Language Models
• Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
• Generalization through Memorization: Nearest Neighbor Language Models