# MHA

- [FFT](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/fft.md)
- [Fourier Neural Operator for Parametric Partial Differential Equations](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/fft/001.md)
- [Global Filter Networks for Image Classification](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/fft/002.md)
- [Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/fft/003.md)
- [FNet: Mixing Tokens with Fourier Transforms](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/fft/004.md)
- [LocalGlobal](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/localglobal.md)
- [CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/localglobal/001.md)
- [Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/localglobal/002.md)
- [Neighborhood Attention Transformer](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/localglobal/003.md)
- [FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/localglobal/004.md)
- [Adaptive Attention Span in Transformers](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/localglobal/005.md)
- [CoLT5: Faster Long-Range Transformers with Conditional Computation](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/localglobal/006.md)
- [MatrixMethod](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/matrixmethod.md)
- [Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/matrixmethod/001.md)
- [Is Attention Better Than Matrix Decomposition?](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/matrixmethod/002.md)
- [RightProduct](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/rightproduct.md)
- [Kronecker Attention Networks](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/rightproduct/001.md)
- [An Attention Free Transformer](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/rightproduct/002.md)
- [Transformer with Fourier Integral Attentions](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/rightproduct/003.md)
- [Linear Complexity Randomized Self-attention Mechanism](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/rightproduct/004.md)
- [UFO-ViT: High Performance Linear Vision Transformer without Softmax](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/rightproduct/005.md)
- [XCiT: Cross-Covariance Image Transformers](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/rightproduct/006.md)
- [SimpleTRON: Simple Transformer with O(N) Complexity](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/rightproduct/007.md)
- [A Dot Product Attention Free Transformer](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/rightproduct/008.md)
- [On Learning the Transformer Kernel](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/rightproduct/009.md)
- [Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/rightproduct/010.md)
- [SparseOrLowRank](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank.md)
- [Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/001.md)
- [Scatterbrain: Unifying Sparse and Low-rank Attention Approximation](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/002.md)
- [Sparse Factorization of Large Square Matrices](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/003.md)
- [Blockwise Self-Attention for Long Document Understanding](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/004.md)
- [H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/005.md)
- [ChunkFormer: Learning Long Time Series with Multi-stage Chunked Transformer](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/006.md)
- [Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/007.md)
- [Fast Transformers with Clustered Attention](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/008.md)
- [Long-Short Transformer: Efficient Transformers for Language and Vision](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/009.md)
- [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/010.md)
- [Luna: Linear Unified Nested Attention](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/011.md)
- [Memory-efficient Transformers via Top-k Attention](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/012.md)
- [Separable Self-attention for Mobile Vision Transformers](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/013.md)
- [Simple Local Attentions Remain Competitive for Long-Context Tasks](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/014.md)
- [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/sparseorlowrank/015.md)
- [Others](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others.md)
- [Synthesizer: Rethinking Self-Attention in Transformer Models](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others/001.md)
- [Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others/002.md)
- [Combiner: Full Attention Transformer with Sparse Computation Cost](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others/003.md)
- [Ripple Attention for Visual Perception with Sub-quadratic Complexity](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others/004.md)
- [Sinkformers: Transformers with Doubly Stochastic Attention](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others/005.md)
- [SOFT: Softmax-free Transformer with Linear Complexity](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others/006.md)
- [Value-aware Approximate Attention](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others/007.md)
- [EL-Attention: Memory Efficient Lossless Attention for Generation](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others/008.md)
- [Flowformer: Linearizing Transformers with Conservation Flows](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others/009.md)
- [ETSformer: Exponential Smoothing Transformers for Time-series Forecasting](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others/010.md)
- [IGLOO: Slicing the Features Space to Represent Sequences](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others/011.md)
- [Swin Transformer V2: Scaling Up Capacity and Resolution](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others/012.md)
- [Skip-Attention: Improving Vision Transformers by Paying Less Attention](https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/others/013.md)

