SparseOrLowRank
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Scatterbrain: Unifying Sparse and Low-rank Attention Approximation
Sparse Factorization of Large Square Matrices
Blockwise Self-Attention for Long Document Understanding
H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences
ChunkFormer: Learning Long Time Series with Multi-stage Chunked Transformer
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
Fast Transformers with Clustered Attention
Long-Short Transformer: Efficient Transformers for Language and Vision
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Luna: Linear Unified Nested Attention
Memory-efficient Transformers via Top-k Attention
Separable Self-attention for Mobile Vision Transformers
Simple Local Attentions Remain Competitive for Long-Context Tasks
You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling
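Several of the papers above (e.g. Explicit Sparse Transformer and Memory-efficient Transformers via Top-k Attention) share the idea of sparsifying attention by keeping only the largest scores per query. The snippet below is a minimal NumPy sketch of that top-k masking idea, not the reference implementation of any listed paper; the function name, arguments, and the dense score matrix it builds are all illustrative assumptions, and the papers themselves avoid materialising the full score matrix.

```python
# Illustrative top-k sparse attention sketch (not any paper's reference code).
# For clarity it computes the dense score matrix first and then masks it; the
# listed papers aim to avoid exactly that quadratic cost.
import numpy as np

def topk_sparse_attention(q, k, v, top_k=8):
    """q, k, v: (seq_len, d) arrays; returns a (seq_len, d) attention output."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                       # (seq_len, seq_len) scores
    # Keep only the top_k scores in each row; mask the rest with -inf.
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # Softmax over the surviving (sparse) entries only.
    masked -= masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((16, 32))
    key = rng.standard_normal((16, 32))
    v = rng.standard_normal((16, 32))
    print(topk_sparse_attention(q, key, v, top_k=4).shape)  # (16, 32)
```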