Do Transformer Modifications Transfer Across Implementations and Applications?