Transformer-Evolution-Paper

MatrixMethod

  • Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method
  • Is Attention Better Than Matrix Decomposition?
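
Both papers treat the n×n attention matrix as an object to be approximated or replaced with classical matrix methods. As a rough illustration of the Nyström-style idea named in the Skyformer title, the sketch below approximates softmax attention through landmark sub-matrices. The segment-mean landmark construction and all names here are illustrative assumptions, not the exact formulation of either paper (Skyformer additionally reworks the kernel into a Gaussian one).

```python
import torch

def nystrom_attention(q, k, v, num_landmarks=32):
    """Minimal Nyström-style approximation of softmax attention (sketch).

    q, k, v: (batch, seq_len, dim). Assumes seq_len is divisible by
    num_landmarks; landmarks are segment means of the sequence.
    """
    b, n, d = q.shape
    scale = d ** -0.5

    # Segment-mean landmarks (an illustrative choice, not the papers' exact one).
    q_l = q.reshape(b, num_landmarks, n // num_landmarks, d).mean(dim=2)
    k_l = k.reshape(b, num_landmarks, n // num_landmarks, d).mean(dim=2)

    # Three small softmax kernels replace the full n x n attention matrix.
    kernel_1 = torch.softmax(q @ k_l.transpose(-1, -2) * scale, dim=-1)    # (b, n, m)
    kernel_2 = torch.softmax(q_l @ k_l.transpose(-1, -2) * scale, dim=-1)  # (b, m, m)
    kernel_3 = torch.softmax(q_l @ k.transpose(-1, -2) * scale, dim=-1)    # (b, m, n)

    # Nyström reconstruction: kernel_1 · pinv(kernel_2) · (kernel_3 · V).
    return kernel_1 @ torch.linalg.pinv(kernel_2) @ (kernel_3 @ v)
```

The point of the sketch is the cost profile: every matrix formed is at most n×m or m×m with m ≪ n, so memory and compute scale roughly linearly in sequence length instead of quadratically.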