Positional Encoding
- A Simple and Effective Positional Encoding for Transformers
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention
- DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks
- Encoding Word Order in Complex Embeddings
- Improve Transformer Models with Better Relative Position Embeddings
- KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
- PermuteFormer: Efficient Relative Position Encoding for Long Sequences
- Rethinking Positional Encoding in Language Pre-training
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
- Translational Equivariance in Kernelizable Attention
- Transformer Language Models without Positional Encodings Still Learn Positional Information
- Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
- Randomized Positional Encodings Boost Length Generalization of Transformers
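For context, most of the papers above propose alternatives to (or extensions of) the fixed sinusoidal absolute positional encoding of the original Transformer. Below is a minimal NumPy sketch of that baseline; the function name and the example shapes are illustrative and not taken from any of the listed papers.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Absolute sinusoidal positional encoding ("Attention Is All You Need").

    Returns an array of shape (seq_len, d_model) where even dimensions use
    sine and odd dimensions use cosine at geometrically spaced frequencies.
    """
    positions = np.arange(seq_len)[:, None]      # (seq_len, 1)
    dims = np.arange(d_model)[None, :]           # (1, d_model)
    # Frequency for dimension pair i is 1 / 10000^(2i / d_model).
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates             # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])        # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])        # odd dimensions: cosine
    return pe

# Usage sketch: add the encoding to token embeddings before the first
# attention layer (hypothetical embeddings, seq_len=128, d_model=512).
embeddings = np.random.randn(128, 512)
embeddings = embeddings + sinusoidal_positional_encoding(128, 512)
```

Relative-position methods in the list (e.g. Transformer-XL, KERPLE, PermuteFormer) instead inject position information into the attention scores as a function of the offset between query and key positions, rather than adding a fixed vector to each token embedding.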