Transformer-Evolution-Paper

Ctrlk

DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling & DeLighT: Deep and Light-weight Transformer

论文地址：

参考资料：

整体思路以及计算方式

用更深的网络代替FFN，具体思路为：

分多个阶段进行如下操作：
- 对feature维度进行分组计算（类似feature维度的多头attention）；
- 合并；

两篇论文都是利用该思路，最后得到的结果是一个很深很窄的子网络来替换全连接层。

时间复杂度

不考虑。

训练以及loss

不变。

代码

https://github.com/sacmehta/delight

实验以及适用场景

机器翻译为主，最后的结果是，可以用参数量少得多的网络达到相同的性能。

细节

细节较多，具体的需要参考原文。

简评

很好的一个思路，可以考虑复现。

PreviousHyperMixer An MLP-based Green AI Alternative to Transformers NextWhen Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism

Last updated 2 years ago