Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
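Since this page carries only the paper's title, a brief sketch of the core idea may help: Branchformer processes each encoder layer through two parallel branches, multi-head self-attention for global context and a convolutional-gating MLP (cgMLP) for local context, then merges the branch outputs (e.g., by concatenation followed by a linear projection). The code below is a minimal, assumption-laden illustration of that structure; names like `BranchformerBlock` and `ConvGatingMLP`, the hyperparameters, and the simplified gating details are illustrative choices, not the authors' reference implementation.

```python
# Minimal sketch of a Branchformer-style encoder block: two parallel branches
# (self-attention for global context, a conv-gated MLP for local context)
# merged by concatenation + linear projection. Illustrative only.
import torch
import torch.nn as nn


class ConvGatingMLP(nn.Module):
    """Local-context branch: an MLP whose hidden units are gated through a
    depthwise convolution over time (cgMLP-style; details simplified)."""

    def __init__(self, d_model: int, d_hidden: int, kernel_size: int = 31):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.up = nn.Linear(d_model, d_hidden)
        # Gate one half of the hidden units with a depthwise conv over time.
        self.gate_norm = nn.LayerNorm(d_hidden // 2)
        self.depthwise = nn.Conv1d(
            d_hidden // 2, d_hidden // 2, kernel_size,
            padding=kernel_size // 2, groups=d_hidden // 2,
        )
        self.down = nn.Linear(d_hidden // 2, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (B, T, D)
        h = torch.nn.functional.gelu(self.up(self.norm(x)))
        a, b = h.chunk(2, dim=-1)
        b = self.gate_norm(b)
        b = self.depthwise(b.transpose(1, 2)).transpose(1, 2)
        return self.down(a * b)  # element-wise gating, then project back


class BranchformerBlock(nn.Module):
    """Parallel attention (global) and cgMLP (local) branches, merged by
    concatenation followed by a linear projection, with a residual add."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, d_hidden: int = 1024):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp_branch = ConvGatingMLP(d_model, d_hidden)
        self.merge = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (B, T, D)
        g = self.attn_norm(x)
        global_ctx, _ = self.attn(g, g, g, need_weights=False)
        local_ctx = self.mlp_branch(x)
        return x + self.merge(torch.cat([global_ctx, local_ctx], dim=-1))


if __name__ == "__main__":
    block = BranchformerBlock()
    feats = torch.randn(2, 100, 256)  # (batch, frames, feature dim)
    print(block(feats).shape)  # torch.Size([2, 100, 256])
```

The two-branch layout is what lets the model capture local and global dependencies separately; the paper also explores merging by a learned weighted average, which makes the branch usage interpretable per layer.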