Transformer-Evolution-Paper

Pe (Positional Encoding)

• A Simple and Effective Positional Encoding for Transformers
• DeBERTa: Decoding-enhanced BERT with Disentangled Attention
• DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks
• Encoding Word Order in Complex Embeddings
• Improve Transformer Models with Better Relative Position Embeddings
• KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
• PermuteFormer: Efficient Relative Position Encoding for Long Sequences
• Rethinking Positional Encoding in Language Pre-training
• Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
• Translational Equivariance in Kernelizable Attention
• Transformer Language Models without Positional Encodings Still Learn Positional Information
• Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
• Randomized Positional Encodings Boost Length Generalization of Transformers
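
As background for the papers listed above, here is a minimal sketch of the fixed sinusoidal positional encoding from the original Transformer ("Attention Is All You Need"), the baseline that most of these works modify or replace with relative, kernelized, or randomized schemes. The function name, shapes, and NumPy implementation are illustrative assumptions, not taken from any of the papers above.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal positional encoding (Vaswani et al., 2017).

    Returns an array of shape (seq_len, d_model): even dimensions use sine,
    odd dimensions use cosine, with wavelengths forming a geometric
    progression from 2*pi up to 10000 * 2*pi.
    """
    positions = np.arange(seq_len)[:, None]                # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]               # (1, d_model // 2)
    inv_freq = 1.0 / np.power(10000.0, dims / d_model)     # inverse frequencies
    angles = positions * inv_freq                          # (seq_len, d_model // 2)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Usage: the encoding is simply added to the token embeddings.
if __name__ == "__main__":
    seq_len, d_model = 16, 64
    token_embeddings = np.random.randn(seq_len, d_model)   # stand-in embeddings
    x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
    print(x.shape)  # (16, 64)
```

Because the encoding is a deterministic function of absolute position, it needs no learned parameters; the relative and extrapolation-oriented methods surveyed in this section (e.g. Transformer-XL, KERPLE, randomized positional encodings) instead inject position information into the attention computation itself.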
