Transformer-Evolution-Paper

SparseOrLowRank

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Scatterbrain: Unifying Sparse and Low-rank Attention Approximation
Sparse Factorization of Large Square Matrices
Blockwise Self-Attention for Long Document Understanding
H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences
ChunkFormer: Learning Long Time Series with Multi-stage Chunked Transformer
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
Fast Transformers with Clustered Attention
Long-Short Transformer: Efficient Transformers for Language and Vision
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Luna: Linear Unified Nested Attention
Memory-efficient Transformers via Top-k Attention
Separable Self-attention for Mobile Vision Transformers
Simple Local Attentions Remain Competitive for Long-Context Tasks
You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling
