Transformer-Evolution-Paper

Pe

  • A Simple and Effective Positional Encoding for Transformers
  • DeBERTa: Decoding-enhanced BERT with Disentangled Attention
  • DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks
  • Encoding word order in complex embeddings
  • Improve Transformer Models with Better Relative Position Embeddings
  • KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
  • PermuteFormer: Efficient Relative Position Encoding for Long Sequences
  • Rethinking Positional Encoding in Language Pre-training
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
  • Translational Equivariance in Kernelizable Attention
  • Transformer Language Models without Positional Encodings Still Learn Positional Information
  • Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
  • Randomized Positional Encodings Boost Length Generalization of Transformers
