Transformer-Evolution-Paper

Pe

  • A Simple and Effective Positional Encoding for Transformers
  • DeBERTa: Decoding-enhanced BERT with Disentangled Attention
  • DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks
  • Encoding word order in complex embeddings
  • Improve Transformer Models with Better Relative Position Embeddings
  • KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
  • PermuteFormer: Efficient Relative Position Encoding for Long Sequences
  • Rethinking Positional Encoding in Language Pre-training
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
  • Translational Equivariance in Kernelizable Attention
  • Transformer Language Models without Positional Encodings Still Learn Positional Information
  • Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
  • Randomized Positional Encodings Boost Length Generalization of Transformers
