RNN
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
Linear Transformers Are Secretly Fast Weight Programmers
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Parallelizing Linear Recurrent Neural Nets Over Sequence Length
Quasi-Recurrent Neural Networks