Transformer-Evolution-Paper


FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention

Paper:

https://arxiv.org/abs/2108.02347

Overall idea and computation

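FMMformer approximates Softmax Attention with Local Attention + Low-rank Attention: a local (near-field) term covers nearby tokens and a low-rank (far-field) term covers the rest, where the Low-rank Attention is simply the commonly used Linear Attention.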
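The decomposition can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `elu(x) + 1` feature map, the `bandwidth` parameter, and the fixed blending weights `w_near` and `w_far` are all assumptions made for illustration, and the near-field part builds a dense n×n matrix for clarity rather than an efficient banded one.

```python
import numpy as np

def banded_softmax_attention(q, k, v, bandwidth):
    """Near-field term: each query attends only to keys within `bandwidth` positions."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(n)
    in_band = np.abs(idx[:, None] - idx[None, :]) <= bandwidth
    scores = np.where(in_band, scores, -np.inf)
    # Subtract the row maximum for numerical stability (the diagonal is always in band).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def elu_plus_one(x):
    """Positive feature map elu(x) + 1 (an assumed choice, common in linear attention)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v):
    """Far-field term: low-rank attention computed without forming the n x n matrix."""
    phi_q, phi_k = elu_plus_one(q), elu_plus_one(k)
    kv = phi_k.T @ v                 # (d, d_v) summary of keys and values
    normalizer = phi_q @ phi_k.sum(axis=0)  # (n,) row sums of the implicit attention matrix
    return (phi_q @ kv) / normalizer[:, None]

def near_far_attention(q, k, v, bandwidth=2, w_near=0.5, w_far=0.5):
    """Blend the near-field and far-field terms (weights here are fixed, hypothetical values)."""
    near = banded_softmax_attention(q, k, v, bandwidth)
    far = linear_attention(q, k, v)
    return w_near * near + w_far * far

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 4))
k = rng.normal(size=(6, 4))
v = rng.normal(size=(6, 3))
out = near_far_attention(q, k, v)
print(out.shape)
```

The linear term costs O(n) in sequence length because it multiplies `phi_k.T @ v` first, and a true banded implementation of the near-field term would also be O(n) for a fixed bandwidth.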

Code

https://github.com/minhtannguyen/fmmformer-code-submission

Brief comments

A simple, conventional idea; there are already quite a few similar papers.

