Transformer-Evolution-Paper
  • README
  • Mathematical Notation (数学符号)
  • Act
  • Arch
    • Supplementary Material: Implementation and Experiments for GAU-based Model
    • MetaFormer Is Actually What You Need for Vision
    • Deeper vs Wider: A Revisit of Transformer Configuration
    • Perceiver: General Perception with Iterative Attention
    • General-purpose, long-context autoregressive modeling with Perceiver AR
    • Hierarchical Transformers Are More Efficient Language Models
    • Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
    • Generalization through Memorization: Nearest Neighbor Language Models
  • FFN
  • Head
  • Memory
  • MHA
  • Normalize_And_Residual
  • Pe
  • Pretrain
  • Softmax
  • Others
  • LongConv
  • Rnn
  • CrossAttention
  • Inference
  • Peft
  • LLM

Arch

  • Supplementary Material: Implementation and Experiments for GAU-based Model
  • MetaFormer Is Actually What You Need for Vision
  • Deeper vs Wider: A Revisit of Transformer Configuration
  • Perceiver: General Perception with Iterative Attention
  • General-purpose, long-context autoregressive modeling with Perceiver AR
  • Hierarchical Transformers Are More Efficient Language Models
  • Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
  • Generalization through Memorization: Nearest Neighbor Language Models
