Transformer-Evolution-Paper

Ctrlk

Transformers without Tears: Improving the Normalization of Self-Attention

论文地址：

https://arxiv.org/abs/1910.05895

整体思路以及计算方式

对layernorm的改进：

\operatorname{ScaleNorm}(\mathbf{x} ; g)=g \frac{\mathbf{x}}{\|\mathbf{x}\|}

时间复杂度

不考虑。

训练以及loss

不变。

代码

https://github.com/tnq177/transformers_without_tears

实验以及适用场景

适用于所有场景，作者测试了机器翻译，获得了一定的提升。

细节

暂无。

简评

值得实现。

PreviousOn Layer Normalizations and Residual Connections in Transformers NextQuery-Key Normalization for Transformers

Last updated 3 years ago