# Transformer with Fourier Integral Attentions

Paper:

* <https://arxiv.org/abs/2206.00206>

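## Overall idea and computation

The paper improves attention by treating it as a nonparametric regression problem. The overall idea has two steps:

Nonparametric regression: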

* $$v_{j}=f\left(k_{j}\right)+\varepsilon_{j}$$
* $$\mathbb{E}[v \mid k]=\int_{\mathbb{R}^{D}} v \cdot p(v \mid k)\, dv=\int \frac{v \cdot p(v, k)}{p(k)}\, dv$$
* Estimate the probability densities with kernel density estimation ($$\varphi_{\sigma}$$ is the Gaussian kernel):

  $$
  \hat{p}_{\sigma}(v, k)=\frac{1}{N} \sum_{j=1}^{N} \varphi_{\sigma}\left(v-v_{j}\right) \varphi_{\sigma}\left(k-k_{j}\right), \quad \hat{p}_{\sigma}(k)=\frac{1}{N} \sum_{j=1}^{N} \varphi_{\sigma}\left(k-k_{j}\right)
  $$
* Substituting these estimates gives: $$\widehat{f}_{\sigma}(k)=\mathbb{E}[v \mid k]= \frac{\sum_{j=1}^{N} v_{j} \varphi_{\sigma}\left(k-k_{j}\right)}{\sum_{j=1}^{N} \varphi_{\sigma}\left(k-k_{j}\right)}$$
* Replacing $$k$$ with the query $$q_i$$ gives:

  $$
  \begin{aligned} \widehat{f}_{\sigma}\left(q_{i}\right) &=\frac{\sum_{j=1}^{N} v_{j} \exp \left(-\left\|q_{i}-k_{j}\right\|^{2} / 2 \sigma^{2}\right)}{\sum_{j'=1}^{N} \exp \left(-\left\|q_{i}-k_{j'}\right\|^{2} / 2 \sigma^{2}\right)} \\ &=\frac{\sum_{j=1}^{N} v_{j} \exp \left[-\left(\left\|q_{i}\right\|^{2}+\left\|k_{j}\right\|^{2}\right) / 2 \sigma^{2}\right] \exp \left(q_{i} k_{j}^{\top} / \sigma^{2}\right)}{\sum_{j'=1}^{N} \exp \left[-\left(\left\|q_{i}\right\|^{2}+\left\|k_{j'}\right\|^{2}\right) / 2 \sigma^{2}\right] \exp \left(q_{i} k_{j'}^{\top} / \sigma^{2}\right)} \end{aligned}
  $$

  If we assume $$\|q_i\| = \|k_j\|$$ for all $$i, j$$ (it is enough that all keys share the same norm), the norm-dependent factors cancel between numerator and denominator and the expression reduces to softmax attention with logits $$q_i k_j^{\top} / \sigma^2$$. On this basis the authors describe the method as a generalization of attention;
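The reduction to softmax attention can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the function names `nadaraya_watson` and `softmax_attention`, the toy shapes, and the choice $$\sigma^2 = \sqrt{D}$$ (picked here so the logits match the usual scaled dot-product form) are all assumptions made for illustration.

```python
import numpy as np

def nadaraya_watson(q, k, v, sigma2):
    """Gaussian-kernel regression estimate of E[v | k = q_i] for every query row."""
    # Squared distances ||q_i - k_j||^2, shape (N_q, N_k)
    sq_dist = ((q[:, None, :] - k[None, :, :]) ** 2).sum(-1)
    logits = -sq_dist / (2.0 * sigma2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability only
    w = np.exp(logits)
    return (w / w.sum(axis=1, keepdims=True)) @ v

def softmax_attention(q, k, v, sigma2):
    """Softmax attention with logits q_i k_j^T / sigma^2."""
    logits = q @ k.T / sigma2
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    return (w / w.sum(axis=1, keepdims=True)) @ v

rng = np.random.default_rng(0)
N, D = 6, 4
q = rng.normal(size=(N, D))
k = rng.normal(size=(N, D))
k /= np.linalg.norm(k, axis=1, keepdims=True)  # assumption: all keys have equal norm
v = rng.normal(size=(N, D))
sigma2 = np.sqrt(D)  # assumption: matches the usual 1/sqrt(D) scaling

out_nw = nadaraya_watson(q, k, v, sigma2)
out_attn = softmax_attention(q, k, v, sigma2)
print(np.allclose(out_nw, out_attn))
```

With equal key norms, the factor $$\exp[-(\|q_i\|^2 + \|k_j\|^2)/2\sigma^2]$$ is constant along each row, so it drops out after normalization and the two outputs agree. Without the key normalization line they generally differ.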

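Computation:

* The authors solve the nonparametric regression problem with the Fourier integral theorem; the idea is to use it to compute the kernel term $$\varphi_{\sigma}\left(k-k_{j}\right)$$;
* The resulting formula, stated directly:

  $$
  \hat{h}_{i}:=f_{N, R}\left(q_{i}\right)=\frac{\sum_{i'=1}^{N} v_{i'} \prod_{j=1}^{D} \phi\left(\frac{\sin \left(R\left(q_{i j}-k_{i' j}\right)\right)}{R\left(q_{i j}-k_{i' j}\right)}\right)}{\sum_{i'=1}^{N} \prod_{j=1}^{D} \phi\left(\frac{\sin \left(R\left(q_{i j}-k_{i' j}\right)\right)}{R\left(q_{i j}-k_{i' j}\right)}\right)}
  $$
* Here $$i'$$ runs over the $$N$$ key/value positions and $$j$$ over the $$D$$ feature dimensions; $$\phi$$ is a function and $$R$$ a parameter, both described in the paper.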
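The formula above can be sketched in a few lines of NumPy. This is a hedged illustration under stated assumptions, not the reference implementation from the paper's repository: the function name `fourier_integral_attention`, the default `R=1.0`, the small `eps` guard in the denominator, and in particular the choice `phi(x) = x**4` (used here only because it keeps the weights nonnegative) are all hypothetical.

```python
import numpy as np

def fourier_integral_attention(q, k, v, R=1.0, phi=lambda x: x ** 4, eps=1e-9):
    """Sketch of sinc-based attention; phi, R and eps are illustrative assumptions."""
    # Per-dimension differences q_{ij} - k_{i'j}, shape (N_q, N_k, D)
    diff = q[:, None, :] - k[None, :, :]
    # np.sinc(x) = sin(pi x) / (pi x), so sin(R z) / (R z) = np.sinc(R z / pi);
    # this also returns 1 at z = 0 instead of dividing by zero
    s = np.sinc(R * diff / np.pi)
    # Product over the D feature dimensions gives one weight per (query, key) pair
    w = np.prod(phi(s), axis=-1)  # shape (N_q, N_k)
    # Normalize by the row sums, as in the denominator of the formula
    return (w @ v) / (w.sum(axis=1, keepdims=True) + eps)

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 4))
k = rng.normal(size=(6, 4))
v = rng.normal(size=(6, 3))
out = fourier_integral_attention(q, k, v, R=1.0)
print(out.shape)
```

Unlike dot-product attention, the weight depends on every coordinate difference separately, so the sketch materializes an $$N \times N \times D$$ tensor before reducing over the feature dimension.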

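## Time complexity

Still $$O(n^2 d)$$, so the theoretical complexity is not improved. Judging from the form of the computation, it is likely to be slower in practice.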
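To make the "likely slower" guess concrete, here is a rough back-of-the-envelope count for a naive, single-head implementation. The helper name `elementwise_work` and the example sizes are hypothetical, and the counts ignore constants, fused kernels, and hardware effects.

```python
def elementwise_work(n, d):
    """Rough scalar-operation counts for one attention head (naive implementation)."""
    return {
        # q @ k.T: one multiply-add per (query, key, dimension), done as a matmul
        "dot_product_multiply_adds": n * n * d,
        # softmax: one exp per (query, key) pair
        "softmax_exp_calls": n * n,
        # Fourier integral form: one sin per (query, key, dimension)
        "fourier_sin_calls": n * n * d,
    }

counts = elementwise_work(n=1024, d=64)
print(counts)
```

Both variants are $$O(n^2 d)$$, but the dot-product form spends that work in a matrix multiplication, while the Fourier integral form spends it on $$n^2 d$$ elementwise `sin` evaluations, a factor of $$d$$ more transcendental calls than the $$n^2$$ `exp` calls of softmax.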

## Training and loss

Unchanged.

## Code

* <https://github.com/minhtannguyen/FourierFormer_NeurIPS>

## Experiments and applicable scenarios

Applicable to both encoders and decoders; the results show some improvement.

## Details

None yet.

## Brief comments

A nice idea that feels fresh.

