Transformer-Evolution-Paper

Pretrain

• XLNet: Generalized Autoregressive Pretraining for Language Understanding
• Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling
• Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
• ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
• Cramming: Training a Language Model on a Single GPU in One Day
