LongConv
Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
Parallelizing Legendre Memory Unit Training
Simplified State Space Layers for Sequence Modeling
Pretraining Without Attention
What Makes Convolutional Models Great on Long Sequence Modeling?
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Hyena Hierarchy: Towards Larger Convolutional Language Models
RWKV
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Time-aware Large Kernel Convolutions
Resurrecting Recurrent Neural Networks for Long Sequences
CKConv: Continuous Kernel Convolution For Sequential Data
FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes
Towards a General Purpose CNN for Long Range Dependencies in ND