# Going Beyond Linear Transformers with Recurrent Fast Weight Programmers

Paper:

* <https://arxiv.org/abs/2106.06295>

## Overall Idea and Computation

First, recall the computation of Linear Attention:

$$
\begin{aligned}
\mathbf{k}_{t}, \mathbf{v}_{t}, \mathbf{q}_{t} &= \mathbf{W}_{k} \mathbf{x}_{t}, \mathbf{W}_{v} \mathbf{x}_{t}, \mathbf{W}_{q} \mathbf{x}_{t} \\
\mathbf{W}_{t} &= \mathbf{W}_{t-1} + \mathbf{v}_{t} \otimes \mathbf{k}_{t} \\
\mathbf{y}_{t} &= \mathbf{W}_{t} \mathbf{q}_{t}
\end{aligned}
$$

where $$\otimes$$ denotes the vector outer product.
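A minimal sketch of one recurrent step (plain PyTorch; the paper additionally applies a kernel feature map and normalization to the keys and queries, which I omit here, and the function and argument names are my own):

```python
import torch

def linear_attention_step(W, x, W_k, W_v, W_q):
    """One recurrent step of linear attention.

    W:             fast weight matrix, shape (d_v, d_k); running sum of v_t ⊗ k_t
    x:             current input, shape (d_in,)
    W_k, W_v, W_q: slow (trained) projection matrices
    """
    k, v, q = W_k @ x, W_v @ x, W_q @ x
    W = W + torch.outer(v, k)   # accumulate the association v_t ⊗ k_t
    y = W @ q                   # retrieve with the current query
    return W, y
```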

The authors rewrite the second equation into a delta-rule update:

$$
\mathbf{W}_{t} = \mathbf{W}_{t-1} + \beta_{t}\left(\mathbf{v}_{t} - \overline{\mathbf{v}}_{t}\right) \otimes \mathbf{k}_{t}
$$
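Here $$\overline{\mathbf{v}}_{t}=\mathbf{W}_{t-1} \mathbf{k}_{t}$$ is the value the fast weight matrix currently associates with key $$\mathbf{k}_{t}$$, and $$\beta_{t} \in (0,1)$$ is a learned write strength, so the update overwrites old associations (the delta rule) rather than only accumulating new ones. A sketch of this update, under the same simplifications as above:

```python
def delta_update(W, k, v, beta):
    """Delta-rule fast weight update.

    v_bar is the value currently stored under key k; the update moves it
    toward the new value v with step size beta in (0, 1).
    """
    v_bar = W @ k                             # currently retrieved value
    W = W + beta * torch.outer(v - v_bar, k)  # overwrite toward v
    return W
```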

and rewrite the first equation to feed the previous output back recurrently:

$$
\begin{aligned}
\mathbf{k}_{t} &= \mathbf{W}_{k} \mathbf{x}_{t} + \mathbf{R}_{k} \tanh\left(\mathbf{y}_{t-1}\right) \\
\mathbf{v}_{t} &= \mathbf{W}_{v} \mathbf{x}_{t} + \mathbf{R}_{v} \tanh\left(\mathbf{y}_{t-1}\right) \\
\mathbf{q}_{t} &= \mathbf{W}_{q} \mathbf{x}_{t} + \mathbf{R}_{q} \tanh\left(\mathbf{y}_{t-1}\right) \\
\beta_{t} &= \sigma\left(\mathbf{W}_{\beta} \mathbf{x}_{t} + \mathbf{R}_{\beta} \tanh\left(\mathbf{y}_{t-1}\right)\right)
\end{aligned}
$$
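Putting both changes together, one step of the resulting recurrent model might look like the sketch below (class and parameter names are mine, not from the official repo; the feature map on keys and queries is again omitted):

```python
import torch
import torch.nn as nn

class RecurrentDeltaStep(nn.Module):
    """One step of a recurrent fast weight programmer (simplified sketch)."""

    def __init__(self, d_in, d_k, d_v):
        super().__init__()
        self.W_k = nn.Linear(d_in, d_k, bias=False)
        self.W_v = nn.Linear(d_in, d_v, bias=False)
        self.W_q = nn.Linear(d_in, d_k, bias=False)
        self.W_b = nn.Linear(d_in, 1, bias=False)
        # recurrent matrices acting on tanh(y_{t-1})
        self.R_k = nn.Linear(d_v, d_k, bias=False)
        self.R_v = nn.Linear(d_v, d_v, bias=False)
        self.R_q = nn.Linear(d_v, d_k, bias=False)
        self.R_b = nn.Linear(d_v, 1, bias=False)

    def forward(self, x, y_prev, W):
        h = torch.tanh(y_prev)                      # feedback from y_{t-1}
        k = self.W_k(x) + self.R_k(h)
        v = self.W_v(x) + self.R_v(h)
        q = self.W_q(x) + self.R_q(h)
        beta = torch.sigmoid(self.W_b(x) + self.R_b(h))
        v_bar = W @ k                               # currently stored value
        W = W + beta * torch.outer(v - v_bar, k)    # delta-rule write
        y = W @ q
        return y, W
```

It is precisely this feedback of $$\mathbf{y}_{t-1}$$ through the $$\mathbf{R}$$ matrices that makes the model a genuine RNN, which is also the source of the slowdown discussed next.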

## Time Complexity

$$O(nd^2)$$ in sequence length $$n$$ and model dimension $$d$$. However, because each step depends on $$\mathbf{y}_{t-1}$$, the computation is genuinely recurrent and cannot be parallelized over the time dimension, so it is much slower in practice.

## Training and Loss

Unchanged.

## Code

* <https://github.com/IDSIA/recurrent-fwp>

## Experiments and Applicable Scenarios

The paper evaluates a variety of settings — synthetic algorithmic tasks (code execution and sequential ListOps), WikiText-103 language modeling, and reinforcement learning on Atari games — with good overall performance.

## Details

None for now.

## Brief Review

Turning attention back into an RNN feels like a step backwards to me; I am not optimistic about this work.

