Do Transformer Modifications Transfer Across Implementations and Applications?
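Paper link:
Brief review
This is an empirical paper that tests many Transformer variants. Its conclusions are a useful reference when reproducing such modifications later.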