Transformer-Evolution-Paper
  • README
  • 数学符号 (Mathematical Notation)
  • Act
  • Arch
  • FFN
  • Head
  • Memory
  • MHA
    • FFT
    • LocalGlobal
    • MatrixMethod
    • RightProduct
    • SparseOrLowRank
    • Others
      • Synthesizer: Rethinking Self-Attention in Transformer Models
      • Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel
      • Combiner: Full Attention Transformer with Sparse Computation Cost
      • Ripple Attention for Visual Perception with Sub-quadratic Complexity
      • Sinkformers: Transformers with Doubly Stochastic Attention
      • SOFT: Softmax-free Transformer with Linear Complexity
      • Value-aware Approximate Attention
      • EL-Attention: Memory Efficient Lossless Attention for Generation
      • Flowformer: Linearizing Transformers with Conservation Flows
      • ETSformer: Exponential Smoothing Transformers for Time-series Forecasting
      • IGLOO: Slicing the Features Space to Represent Sequences
      • Swin Transformer V2: Scaling Up Capacity and Resolution
      • Skip-Attention: Improving Vision Transformers by Paying Less Attention
  • Normalize_And_Residual
  • Pe
  • Pretrain
  • Softmax
  • Others
  • LongConv
  • Rnn
  • CrossAttention
  • Inference
  • Peft
  • LLM

MHA

Others

