FFN
Large Memory Layers with Product Keys
Transformer Feed-Forward Layers Are Key-Value Memories
GLU Variants Improve Transformer
Simple Recurrence Improves Masked Language Models
Pay Attention to MLPs
S2-MLP: Spatial-Shift MLP Architecture for Vision
S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision
HyperMixer: An MLP-based Green AI Alternative to Transformers
DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling & DeLighT: Deep and Light-weight Transformer
When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism
Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?