Transformer-Evolution-Paper

⌘Ctrlk

Parallelizing Legendre Memory Unit Training

论文地址：

https://arxiv.org/abs/2102.11417

整体思路以及计算方式

利用卷积的方式并行化计算LMU，整体思路如下。

首先回顾计算公式：

\begin{aligned} &\mathbf{u}_{t}=f_{1}\left(\mathbf{U}_{x} \mathbf{x}_{t}+\mathbf{b}_{u}\right) \\ &\mathbf{m}_{t}=\overline{\mathbf{A}} \mathbf{m}_{t-1}+\overline{\mathbf{B}} \mathbf{u}_{t} \\ &\mathbf{o}_{t}=f_{2}\left(\mathbf{W}_{m} \mathbf{m}_{t}+\mathbf{W}_{x} \mathbf{x}_{t}+\mathbf{b}_{o}\right) \end{aligned}

对 $\mathbf m_t$ 进行展开：

\mathbf{m}_{t}=\sum_{j=1}^{t} \overline{\mathbf{A}}^{t-j} \overline{\mathbf{B}} u_{j}

记：

\begin{aligned} \mathbf{H}&=\left[\begin{array}{lll} \overline{\mathbf{A}}^{0} \overline{\mathbf{B}} & \overline{\mathbf{A}} \overline{\mathbf{B}} & \ldots \end{array}\right] \in \mathbb{R}^{d \times n} \\ \mathbf{U}&=\left[\begin{array}{ccccc} u_{1} & u_{2} & u_{3} & \ldots & u_{n} \\ & u_{1} & u_{2} & \ldots & u_{n-1} \\ & & u_{1} & \ldots & u_{n-2} \\ & & & \ddots & \vdots \\ & & & & u_{1} \end{array}\right] \in \mathbb{R}^{n \times n} \end{aligned}

那么：

\mathbf{m}_{1: n}=\mathbf{H U}

利用傅里叶变换，最后的计算方式为：

\mathbf{m}_{1: n}=\mathcal{F}^{-1}\left\{\mathcal{F}\{\mathbf{H}\} \cdot \mathcal{F}\left\{\mathbf{U}_{: n}\right\}\right\}

时间复杂度

$O(nd e\log n )$ ，其中 $e$ 为embedding的维度。

代码

实验以及适用场景

略过。

细节

略过。

简评

依然和S4很像。

PreviousLegendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks NextSimplified State Space Layers for Sequence Modeling

Last updated 2 years ago