Transformer-Evolution-Paper

FFN

• Large Memory Layers with Product Keys
• Transformer Feed-Forward Layers Are Key-Value Memories
• GLU Variants Improve Transformer
• Simple Recurrence Improves Masked Language Models
• Pay Attention to MLPs
• S2-MLP: Spatial-Shift MLP Architecture for Vision
• S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision
• HyperMixer: An MLP-based Green AI Alternative to Transformers
• DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling & DeLighT: Deep and Light-weight Transformer
• When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism
• Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?
