General-purpose, long-context autoregressive modeling with Perceiver AR
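Paper:
Overall idea and computation
Essentially the same as Perceiver, with the model extended to handle unidirectional (causal) data. The only difference is that the input is split into: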
(with mask)
(with mask)
Everything else is the same as in Perceiver.
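The masked computation described above can be sketched in NumPy. This is a minimal single-head illustration, not the authors' implementation. It assumes the scheme from the Perceiver AR paper in which the N latents are aligned with the last N of the M input positions, so that latent i may attend only to inputs up to its own position. The function names (`causal_cross_attention_mask`, `masked_attention`, `perceiver_ar_step`) and the projection matrices `wq`, `wk`, `wv` are hypothetical, and the latent self-attention omits its own projections for brevity.

```python
import numpy as np

def causal_cross_attention_mask(m, n):
    """Boolean mask of shape (n, m); True where latent i may attend to input j.

    Assumption: latent i is aligned with input position m - n + i,
    so it may see input positions 0 .. m - n + i (inclusive).
    """
    latent_pos = np.arange(m - n, m)[:, None]  # shape (n, 1)
    input_pos = np.arange(m)[None, :]          # shape (1, m)
    return input_pos <= latent_pos

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)               # block masked pairs
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def perceiver_ar_step(x, n, wq, wk, wv):
    """One masked cross-attention followed by one masked latent self-attention.

    x: (m, d) input embeddings; n: number of latents (n <= m).
    """
    m = x.shape[0]
    q = x[m - n:] @ wq   # queries come from the last n input positions
    k = x @ wk           # keys and values come from all m positions
    v = x @ wv
    latents = masked_attention(q, k, v, causal_cross_attention_mask(m, n))
    # Latent self-attention with a standard lower-triangular causal mask.
    # Simplification: no separate projections in this layer.
    self_mask = np.tril(np.ones((n, n), dtype=bool))
    return masked_attention(latents, latents, latents, self_mask)
```

With both masks in place, perturbing the final input position changes only the final latent output, which is the property that makes the model usable for autoregressive prediction.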
Code
Brief comments
Could this method be generalized into a pre-training scheme?