Transformer-Evolution-Paper

MHA / SparseOrLowRank

• Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
• Scatterbrain: Unifying Sparse and Low-rank Attention Approximation
• Sparse Factorization of Large Square Matrices
• Blockwise Self-Attention for Long Document Understanding
• H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences
• ChunkFormer: Learning Long Time Series with Multi-stage Chunked Transformer
• Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
• Fast Transformers with Clustered Attention
• Long-Short Transformer: Efficient Transformers for Language and Vision
• LongT5: Efficient Text-To-Text Transformer for Long Sequences
• Luna: Linear Unified Nested Attention
• Memory-efficient Transformers via Top-k Attention
• Separable Self-attention for Mobile Vision Transformers
• Simple Local Attentions Remain Competitive for Long-Context Tasks
• You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling
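
Several of the titles above share one core idea: keep softmax attention, but let each query attend only to a small, explicitly selected subset of keys (or approximate the score matrix with a low-rank surrogate). Below is a minimal sketch of the top-k selection variant, assuming PyTorch; the function name, tensor shapes, and choice of k are illustrative and not taken from any single paper in the list.

```python
# Minimal top-k ("explicit sparse") attention sketch.
# Illustrative only: names, shapes, and k are assumptions, not from a specific paper.
import torch
import torch.nn.functional as F


def topk_sparse_attention(q, k, v, topk=8):
    """Scaled dot-product attention where each query keeps only its
    top-k highest-scoring keys; all other scores are masked out before
    the softmax. q, k, v: (batch, heads, seq_len, head_dim)."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # (B, H, L, L)
    topk = min(topk, scores.size(-1))
    kth = scores.topk(topk, dim=-1).values[..., -1:]   # k-th largest score per query
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v               # (B, H, L, head_dim)


if __name__ == "__main__":
    q = k = v = torch.randn(2, 4, 128, 64)
    out = topk_sparse_attention(q, k, v, topk=16)
    print(out.shape)  # torch.Size([2, 4, 128, 64])
```

Note that this sketch still materializes the full L×L score matrix, so it only illustrates the selection step; the papers listed here differ mainly in how they avoid that full computation (blocking, clustering, sampling, or low-rank projections).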
