Arch
Supplementary Material: Implementation and Experiments for GAU-based Model
MetaFormer is Actually What You Need for Vision
Deeper vs Wider: A Revisit of Transformer Configuration
Perceiver: General Perception with Iterative Attention
General-purpose, long-context autoregressive modeling with Perceiver AR
Hierarchical Transformers Are More Efficient Language Models
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Generalization through Memorization: Nearest Neighbor Language Models