Transformer-Evolution-Paper

Rnn

• When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
• Linear Transformers Are Secretly Fast Weight Programmers
• Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
• Parallelizing Linear Recurrent Neural Nets Over Sequence Length
• Quasi-recurrent neural networks
