Pretrain
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling
Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Cramming: Training a Language Model on a Single GPU in One Day