RWKV
Training
Time mixing
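The time-shift line in this section can be unpacked with a small runnable sketch. It assumes `self.time_shift` is `nn.ZeroPad2d((0, 0, 1, 0))`, which prepends one zero time step to a `(B, T, C)` tensor. That definition is not given on this page, so treat it as an assumption. The function name `half_channel_time_shift` is hypothetical and used only for illustration.

```python
import torch
import torch.nn as nn


def half_channel_time_shift(x: torch.Tensor) -> torch.Tensor:
    """Replace the first half of the channels with the previous token's values."""
    B, T, C = x.shape
    # Assumption: the time shift pads one zero step at the start of the time axis.
    time_shift = nn.ZeroPad2d((0, 0, 1, 0))
    shifted = time_shift(x)  # shape (B, T + 1, C)
    # First C//2 channels come from step t-1, the remaining channels from step t.
    return torch.cat([shifted[:, :T, :C // 2], x[:, :T, C // 2:]], dim=2)


# One sequence, three time steps, four channels.
x = torch.arange(1, 13, dtype=torch.float32).reshape(1, 3, 4)
y = half_channel_time_shift(x)
print(y[0])
```

In the output, channels 0 and 1 of each step hold the previous step's values (zeros at the first step), while channels 2 and 3 are unchanged. Under this assumption, each position sees part of its predecessor before any mixing is applied.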
x = torch.cat([self.time_shift(x)[:,:T,:C//2], x[:,:T,C//2:]], dim=2)
Feature mixing
V1
V2
V3
V4
Inference