The availability of large, high-quality datasets has been one of the main drivers of recent progress in question answering (QA). Such annotated datasets, however, are difficult and costly to collect, and rarely exist in languages other than English, rendering QA technology inaccessible to underrepresented languages. An alternative to building large monolingual training datasets is to leverage pre-trained language models (PLMs) in a few-shot learning setting. Our approach, QAmeleon, uses a PLM to automatically generate multilingual data on which QA models are trained, thus avoiding costly annotation. Prompt tuning the PLM for data synthesis with only five examples per language delivers accuracy superior to translation-based baselines, bridges nearly 60% of the gap between an English-only baseline and a fully supervised upper bound trained on almost 50,000 hand-labelled examples, and consistently leads to substantial improvements over fine-tuning a QA model directly on labelled examples in low-resource settings. Experiments on the TyDiQA-GoldP and MLQA benchmarks show that few-shot prompt tuning for data synthesis scales across languages and is a viable alternative to large-scale annotation.

We claim in this study that a few-shot strategy combined with synthetic data generation and existing high-quality English resources can alleviate some of the challenges mentioned above. Beyond question answering, multilingual techniques have successfully exploited a small number of annotations in a variety of tasks such as natural language inference, paraphrase identification, and semantic parsing. Existing research has also demonstrated that prompting pre-trained large language models (PLMs) can lead to high performance on a variety of tasks, including question answering and open-ended natural language generation. Prompting studies in multilingual contexts have likewise revealed strong few-shot performance on classification, natural language inference, commonsense reasoning, machine translation, and retrieval. With as few as five examples in a new target language, we synthesise these findings into QAMELEON, an approach for bootstrapping multilingual QA systems. We use gold annotations to prompt-tune a PLM to generate multilingual QA data, which is subsequently used to fine-tune the QA model. We find that QAMELEON outperforms zero-shot approaches and competitive translation-based baselines, and in some cases outperforms the fully supervised upper bound.
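To make the data-synthesis step concrete, the sketch below shows how five gold (passage, question, answer) triples per language might be formatted into a few-shot prompt whose completion yields a new synthetic QA pair. This is a minimal illustration of the idea, not QAmeleon's actual template: the `QAExample` type, the function name, and the prompt layout are all hypothetical, and the PLM call itself (to the prompt-tuned model) is deliberately left abstract.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class QAExample:
    """One gold-annotated training example in the target language (hypothetical type)."""
    passage: str
    question: str
    answer: str

def build_fewshot_prompt(gold: List[QAExample], new_passage: str) -> str:
    """Format up to five gold examples into a few-shot prompt that asks the
    (prompt-tuned) PLM to produce a question for an unlabelled passage.
    Only the formatting is shown here; the PLM generation step is omitted."""
    parts = []
    for ex in gold[:5]:  # QAmeleon uses only five gold examples per language
        parts.append(
            f"Passage: {ex.passage}\nQuestion: {ex.question}\nAnswer: {ex.answer}\n"
        )
    # The unlabelled passage goes last; the model's completion (a question,
    # then an answer) becomes one synthetic training example for the QA model.
    parts.append(f"Passage: {new_passage}\nQuestion:")
    return "\n".join(parts)
```

In the full pipeline, prompts like this would be sent to the prompt-tuned PLM for each unlabelled passage in the target language, and the resulting synthetic QA pairs would then be used to fine-tune the downstream QA model.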
