# Skyformer Remodel Self-Attention with Gaussian Kernel and Nyström Method

论文地址：

* <https://arxiv.org/abs/2111.00035>

## 整体思路以及计算方式

首先引入高斯核，将相似度矩阵表示为对称矩阵的子矩阵，然后利用Nyström方法计算对称矩阵，最后求出子矩阵。

符号说明：

* $$\mathbf Q\in \mathbb R^{n\times d},\mathbf K\in \mathbb R^{n\times d},\mathbf V \in \mathbb R^{n\times d}$$
* $$\mathbf S\in \mathbb R^{2n\times d\_s}$$

整体思路如下：

相似度的计算方式修改：

$$
\mathbf S\_{ij} =\exp \left(-\frac{\left|\mathbf{q}*{i}-\mathbf{k}*{j}\right|^{2}}{2 \sqrt{p}}\right)
$$

记：

$$
\phi(\mathbf p,\mathbf q) = \exp\left(-\frac{|\mathbf p-\mathbf q |^2}{2\sqrt p}\right)
$$

那么：

$$
\mathbf S = \phi(\mathbf Q, \mathbf K) \in \mathbb R^{n\times m}
$$

注意$$\mathbf S$$不好处理，但是$$\mathbf S$$为如下矩阵的子矩阵：

$$
\overline {\mathbf B} =\phi \left(\left(\begin{array}{l} \mathbf {Q} \ \mathbf {K} \end{array}\right),\left(\begin{array}{l} \mathbf {Q} \ \mathbf {K} \end{array}\right)\right)
$$

注意$$\bar B$$为正定矩阵，所以可以用Nyström Method进行近似计算：

$$
\tilde{\overline{\mathbf{B}}}=\overline{\mathbf{B}} \mathbf{S}\left(\mathbf{S}^{\top} \overline{\mathbf{B}} \mathbf{S}\right)^{\dagger} \mathbf{S}^{\top} \overline{\mathbf{B}}
$$

最后利用下式近似计算$$\mathbf S$$：

$$
\tilde{\mathbf{B}}:=(\mathbf{I}\_n, \mathbf{0}) \tilde{\overline{\mathbf{B}}}(\mathbf{0}, \mathbf{I}\_n)^{\top}
$$

## 时间复杂度

首先分析矩阵的形状：

* $$(\mathbf{I}\_n, \mathbf{0}) \overline{\mathbf{B}} \mathbf{S} \in \mathbb R^{n\times d\_s}$$
* $$\left(\mathbf{S}^{\top} \overline{\mathbf{B}} \mathbf{S}\right)^{\dagger}\in \mathbb R^{d\_s\times d\_s}$$
* $$\mathbf{S}^{\top} \overline{\mathbf{B}} (\mathbf{0}, \mathbf{I}\_n)^{\top} \in \mathbb R^{d\_s\times n}$$

总时间复杂度为$$O(nd\_s d)$$。

## 训练以及loss

不变。

## 代码

* <https://github.com/pkuzengqi/Skyformer>

## 实验以及适用场景

实验结果比较少，主要是测试了近似误差和LRA，总体来说结果一般；测试的都是Encoder情形，Decoder情形可能较难实现。

## 细节

暂无。

## 简评

暂无。


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://doraemonzzz.gitbook.io/transformer_evolution_paper/mha/matrixmethod/001.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
