Abstract:
Large language models (LLMs) demonstrate a remarkable ability to comprehend, reason, and generate text following natural language instructions. However, the development of LLMs has been primarily focused on high-resource languages, such as English, thereby limiting their applicability and research in other languages. Consequently, we present PolyLM, a multilingual LLM trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B. To enhance its multilingual capabilities, we 1) integrate bilingual data into the training data, and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage of pre-training to 60% in the final stage. Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning. To assess the model's performance, we collect several existing multilingual tasks, including multilingual understanding, question answering, generation, and translation. Extensive experiments show that PolyLM surpasses other open-source models such as LLaMA and BLOOM on multilingual tasks while maintaining comparable performance in English.
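The curriculum learning strategy described above can be made concrete with a small sketch. The linear schedule and all function names below are illustrative assumptions, not PolyLM's released training code; the sketch only shows how the probability of drawing a non-English sample might be annealed from 30% to 60% over pre-training.

```python
import random

def non_english_ratio(step: int, total_steps: int,
                      start: float = 0.30, end: float = 0.60) -> float:
    """Anneal the non-English sampling probability from `start` to `end`.
    A linear schedule is assumed here purely for illustration."""
    progress = min(step / total_steps, 1.0)
    return start + (end - start) * progress

def sample_batch(english_pool, non_english_pool,
                 step, total_steps, batch_size=4):
    """Draw a mixed batch according to the current curriculum ratio."""
    ratio = non_english_ratio(step, total_steps)
    return [random.choice(non_english_pool if random.random() < ratio
                          else english_pool)
            for _ in range(batch_size)]

# Early in training roughly 30% of samples are non-English;
# by the final stage the share rises to roughly 60%.
en, multi = ["en doc"] * 10, ["multilingual doc"] * 10
print(sample_batch(en, multi, step=0, total_steps=100_000))
print(sample_batch(en, multi, step=100_000, total_steps=100_000))
```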
Instruction learning aims to unify various natural language processing tasks by framing them as question-answering exercises that operate over a given context. This approach enhances the value of LLMs by leveraging their existing knowledge. With the success of language models, there has been growing interest in exploring their potential to comprehend and execute instructions. Several advanced studies (Ouyang et al., 2022; Wei et al., 2022; Peng et al., 2023; Ye et al., 2023; Zhou et al., 2023) have demonstrated a remarkable ability to generalize to new zero-shot tasks. However, they rely heavily on human-generated instruction data, which is time-consuming and labour-intensive to produce and is frequently constrained in terms of quantity, diversity, and creativity. Wang et al. (2022) construct a self-instruct framework for improving the instruction-following capabilities of LLMs; a simplified sketch of this loop is given below. Similarly, Xu et al. (2023) propose an evol-instruct framework that automatically rewrites simple human-written instructions, step by step, into more complex ones to further improve instruction-following LLMs. Our models, along with the instruction data and multilingual benchmark, are publicly available.
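To make the self-instruct idea concrete, the sketch below outlines the bootstrapping loop of Wang et al. (2022) in simplified form. The `generate_instruction` stub stands in for an actual LLM call, and the `difflib`-based near-duplicate filter is a simplification of the ROUGE-based filtering in the original work; all names and thresholds here are illustrative assumptions.

```python
import random
from difflib import SequenceMatcher

def generate_instruction(demos: list[str]) -> str:
    """Stub for an LLM call: prompt a model with `demos` as in-context
    examples and parse one newly written instruction from its output.
    Replace this with a real model call before running the loop."""
    raise NotImplementedError("plug in an LLM call here")

def too_similar(candidate: str, pool: list[str],
                threshold: float = 0.7) -> bool:
    """Reject candidates that are near-duplicates of existing instructions
    (a difflib stand-in for the ROUGE-L filter of Wang et al., 2022)."""
    return any(SequenceMatcher(None, candidate, seen).ratio() > threshold
               for seen in pool)

def self_instruct(seed_instructions: list[str], target_size: int,
                  demos_per_prompt: int = 4) -> list[str]:
    """Grow a small seed pool into a large instruction set by repeatedly
    sampling demonstrations, generating a new instruction, and keeping
    only sufficiently novel candidates."""
    pool = list(seed_instructions)
    while len(pool) < target_size:
        demos = random.sample(pool, min(demos_per_prompt, len(pool)))
        candidate = generate_instruction(demos)
        if candidate and not too_similar(candidate, pool):
            pool.append(candidate)
    return pool
```

Evol-instruct (Xu et al., 2023) follows the same loop structure but, instead of writing instructions from scratch, asks the model to rewrite an existing instruction into a deeper or more constrained variant at each step.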