When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism
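The titular idea (from the ShiftViT paper) is to replace the self-attention layer in a ViT block with a zero-parameter, zero-FLOP partial channel shift: a small fraction of channels is moved one pixel left, right, up, and down, and the rest are left untouched. A minimal NumPy sketch under those assumptions (the function name is illustrative; the paper shifts roughly 1/12 of the channels per direction, controlled here by `div`):

```python
import numpy as np

def shift_feature(x, div=12):
    """Partial spatial shift over a (C, H, W) feature map.

    Four groups of C // div channels are each shifted one pixel in
    one direction (zero padding at the borders); remaining channels
    pass through unchanged. `div` is an illustrative default.
    """
    out = np.zeros_like(x)
    g = x.shape[0] // div
    out[0 * g:1 * g, :, :-1] = x[0 * g:1 * g, :, 1:]   # shift left
    out[1 * g:2 * g, :, 1:]  = x[1 * g:2 * g, :, :-1]  # shift right
    out[2 * g:3 * g, :-1, :] = x[2 * g:3 * g, 1:, :]   # shift up
    out[3 * g:4 * g, 1:, :]  = x[3 * g:4 * g, :-1, :]  # shift down
    out[4 * g:] = x[4 * g:]                            # identity for the rest
    return out
```

The operation has no learnable parameters, so all modeling capacity sits in the unchanged MLP and normalization layers of the block; mixing across spatial positions comes solely from the misaligned channels.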