Transformer-Evolution-Paper
  • README
  • 数学符号 (Mathematical Notation)
  • Act
  • Arch
  • FFN
  • Head
  • Memory
  • MHA
    • FFT
    • LocalGlobal
    • MatrixMethod
    • RightProduct
      • Kronecker Attention Networks
      • An Attention Free Transformer
      • Transformer with Fourier Integral Attentions
      • Linear Complexity Randomized Self-attention Mechanism
      • UFO-ViT: High Performance Linear Vision Transformer without Softmax
      • XCiT: Cross-Covariance Image Transformers
      • SimpleTRON: Simple Transformer with O(N) Complexity
      • A Dot Product Attention Free Transformer
      • On Learning the Transformer Kernel
      • Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
    • SparseOrLowRank
    • Others
  • Normalize_And_Residual
  • Pe
  • Pretrain
  • Softmax
  • Others
  • LongConv
  • Rnn
  • CrossAttention
  • Inference
  • Peft
  • LLM
MHA

RightProduct


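The grouping name refers to the trick the papers in this subsection share in some form: evaluating attention as a right-to-left product, so the n × n attention matrix is never materialized and the cost drops from O(n²·d) to roughly O(n·d²). The sketch below only illustrates that idea; the elu(x)+1 feature map, the shapes, and all function and variable names are assumptions for illustration, not the method of any single paper listed above.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: forms the full n x n weight matrix, O(n^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def right_product_attention(Q, K, V):
    # "Right product" / linearized attention: phi(Q) @ (phi(K).T @ V), O(n * d^2).
    # phi(x) = elu(x) + 1 is one commonly used positive feature map; assumed here for illustration.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                               # (d, d_v): aggregate keys with values first
    z = Qf @ Kf.sum(axis=0, keepdims=True).T    # (n, 1): per-query normalizer
    return (Qf @ kv) / z

rng = np.random.default_rng(0)
n, d = 6, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)        # (6, 4)
print(right_product_attention(Q, K, V).shape)  # (6, 4)
```

The two functions return the same shapes but not the same values; the papers collected above differ mainly in which feature map, kernel, or normalization stands in for the softmax and in how close the result stays to standard attention.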