Others
Synthesizer: Rethinking Self-Attention in Transformer Models
Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel
Combiner: Full Attention Transformer with Sparse Computation Cost
Ripple Attention for Visual Perception with Sub-quadratic Complexity
Sinkformers: Transformers with Doubly Stochastic Attention
SOFT: Softmax-free Transformer with Linear Complexity
Value-aware Approximate Attention
EL-Attention: Memory Efficient Lossless Attention for Generation
Flowformer: Linearizing Transformers with Conservation Flows
ETSformer: Exponential Smoothing Transformers for Time-series Forecasting
IGLOO: Slicing the Features Space to Represent Sequences
Swin Transformer V2: Scaling Up Capacity and Resolution
Skip-Attention: Improving Vision Transformers by Paying Less Attention