Transformer-Evolution-Paper

CtrlK

Global Filter Networks for Image Classification

论文地址：

https://arxiv.org/abs/2107.00645

参考资料：

https://zhuanlan.zhihu.com/p/418500459

整体思路以及计算方式

对于2维输入 $\mathbf X\in \mathbb R^{n\times d}$ ：

\mathbf O = \mathcal F^{-1}(\mathcal F(\mathbf X)\odot \mathbf W) \in \mathbb R^{n\times d}

其中：

\mathbf W\in \mathbb R^{n\times d}

其中 $\mathcal F, \mathcal F^{-1}$ 分别为FFT和逆FFT，高维情形为在多个维度做FFT。

时间复杂度

$O(nd\log n+n d)$ 。

训练以及loss

不变。

代码

https://github.com/raoyongming/GFNet

实验以及适用场景

论文测试了Encoder情形，效果还可以。

细节

$\mathbf W$ 和序列长度有关；该方法依然不适配于Decoder情形。

简评

很自然的思路。

PreviousFourier Neural Operator for Parametric Partial Differential Equations NextAdaptive Fourier Neural Operators: Efficient Token Mixers for Transformers

Last updated 2 years ago