# Positional Encoding

- [A Simple and Effective Positional Encoding for Transformers](/transformer_evolution_paper/pe/001.md)
- [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](/transformer_evolution_paper/pe/002.md)
- [DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks](/transformer_evolution_paper/pe/003.md)
- [Encoding word order in complex embeddings](/transformer_evolution_paper/pe/004.md)
- [Improve Transformer Models with Better Relative Position Embeddings](/transformer_evolution_paper/pe/005.md)
- [KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation](/transformer_evolution_paper/pe/006.md)
- [PermuteFormer: Efficient Relative Position Encoding for Long Sequences](/transformer_evolution_paper/pe/007.md)
- [Rethinking Positional Encoding in Language Pre-training](/transformer_evolution_paper/pe/008.md)
- [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](/transformer_evolution_paper/pe/009.md)
- [Translational Equivariance in Kernelizable Attention](/transformer_evolution_paper/pe/010.md)
- [Transformer Language Models without Positional Encodings Still Learn Positional Information](/transformer_evolution_paper/pe/011.md)
- [Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding](/transformer_evolution_paper/pe/012.md)
- [Randomized Positional Encodings Boost Length Generalization of Transformers](/transformer_evolution_paper/pe/013.md)
