Fine-tuning a LM pre-trained on 1BLM performed well on a range of NLP tasks. So why don't we construct a gigantic text corpus from novels, textbooks, webpages, and conversations, so that a LM trained on this dataset can generate an output (e.g., multiple sentences) conditioned on input sentences? Note that we don't do fine-tuning here; we train the model on one giant dataset (not Seq2Seq, just a plain LM), and that's it. 1BLM has no inter-sentence dependency, whereas here each sample in a minibatch is a randomly sampled run of consecutive sentences. The model wouldn't just model the language; it would also give an appropriate output according to the input task. The samples are noisy, and samples in a question-and-answer format may make up only a small fraction of the entire dataset. However, I believe the sheer size of the dataset would let the model generalize well enough to overcome these issues. Any feedback?
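To make the data-loading idea concrete, here is a minimal sketch of how such minibatches might be drawn: each sample is a run of consecutive sentences (preserving inter-sentence dependency), rather than a single shuffled sentence as in 1BLM. The helper name `make_lm_batches` and its parameters are hypothetical, not from any particular library.

```python
import random

def make_lm_batches(sentences, sents_per_sample, batch_size, seed=0):
    """Draw a minibatch where each sample is a run of consecutive
    sentences from the corpus, so the LM can learn inter-sentence
    dependencies (unlike 1BLM, which shuffles sentences independently).

    Hypothetical sketch: a real pipeline would tokenize, pack to a
    fixed token budget, and stream from disk instead of a list.
    """
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        # pick a random starting sentence, then take the next few
        # sentences in their original order
        start = rng.randrange(len(sentences) - sents_per_sample + 1)
        batch.append(" ".join(sentences[start:start + sents_per_sample]))
    return batch

# usage: corpus of ordered sentences -> batch of multi-sentence samples
corpus = [f"Sentence {i}." for i in range(100)]
batch = make_lm_batches(corpus, sents_per_sample=3, batch_size=4)
```

Each returned sample is a contiguous span of the corpus, which is the key difference from 1BLM-style per-sentence shuffling.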
