Arch
Supplementary Material: Implementation and Experiments for GAU-based Model
MetaFormer is Actually What You Need for Vision
Deeper vs Wider: A Revisit of Transformer Configuration
Perceiver: General Perception with Iterative Attention
General-purpose, long-context autoregressive modeling with Perceiver AR
Hierarchical Transformers Are More Efficient Language Models
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Generalization through Memorization: Nearest Neighbor Language Models