Transformer-Evolution-Paper

LongConv

• Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
• Parallelizing Legendre Memory Unit Training
• Simplified State Space Layers for Sequence Modeling
• Pretraining Without Attention
• What Makes Convolutional Models Great on Long Sequence Modeling?
• Hungry Hungry Hippos: Towards Language Modeling with State Space Models
• Hyena Hierarchy: Towards Larger Convolutional Language Models
• RWKV
• Simple Hardware-Efficient Long Convolutions for Sequence Modeling
• Time-aware Large Kernel Convolutions
• Resurrecting Recurrent Neural Networks for Long Sequences
• CKConv: Continuous Kernel Convolution for Sequential Data
• FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes
• Towards a General Purpose CNN for Long Range Dependencies in ND
