Transformer-Evolution-Paper
RNN
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
Linear Transformers Are Secretly Fast Weight Programmers
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Parallelizing Linear Recurrent Neural Nets Over Sequence Length
Quasi-Recurrent Neural Networks