Driving domain QA datasets

The driving domain QA (Question Answering) datasets were constructed from driving domain blog posts published on the web. They consist of a Predicate-Argument Structure QA dataset (PAS-QA dataset) and a Reading Comprehension QA dataset (RC-QA dataset). In the PAS-QA dataset, each question asks for an omitted argument of a predicate. It contains 12,468 questions for the ga case (nominative), 3,151 questions for the wo case (accusative), and 1,069 questions for the ni case (dative). In the RC-QA dataset, the task is to extract the answer to a question from the text. It contains 20,007 questions.

The PAS-QA nominative dataset and the RC-QA dataset use the same data format as SQuAD1.0. The PAS-QA accusative and dative datasets use the same format as SQuAD2.0. We constructed the PAS-QA and RC-QA datasets with crowdsourcing because it enabled us to create large-scale datasets in a short time. Please refer to the references for the SQuAD data formats and for details on how these datasets were constructed.
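Since the datasets follow the SQuAD formats mentioned above, they can probably be read with a few lines of standard-library Python. The following is a minimal sketch, not an official loader: it assumes the usual SQuAD JSON layout (`data` → `paragraphs` → `qas`, with `is_impossible` marking unanswerable questions in SQuAD2.0). The names `QAExample`, `iter_squad_examples`, and `load_squad_file` are hypothetical, and `SAMPLE` is an invented English record for illustration, not taken from the datasets.

```python
import json
from typing import Iterator, NamedTuple, Optional


class QAExample(NamedTuple):
    """One flattened question (hypothetical helper type)."""
    qid: str
    question: str
    context: str
    answer_text: Optional[str]   # None when the question has no answer
    answer_start: Optional[int]  # character offset into context, or None


def iter_squad_examples(dataset: dict) -> Iterator[QAExample]:
    """Yield one QAExample per question from SQuAD1.0/2.0-style JSON."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                # SQuAD2.0 flags unanswerable questions with is_impossible;
                # SQuAD1.0 records have no such key, so default to False.
                if qa.get("is_impossible", False) or not answers:
                    yield QAExample(qa["id"], qa["question"], context, None, None)
                    continue
                first = answers[0]  # keep only the first annotated answer
                yield QAExample(qa["id"], qa["question"], context,
                                first["text"], first["answer_start"])


def load_squad_file(path: str) -> dict:
    """Read a SQuAD-style JSON file; the path is supplied by the caller."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Invented sample in SQuAD2.0 layout (NOT real data from the datasets).
SAMPLE = {
    "version": "v2.0",
    "data": [{
        "title": "sample",
        "paragraphs": [{
            "context": "I picked up my friend at the station. Then drove home.",
            "qas": [
                {"id": "q1", "question": "Who was picked up at the station?",
                 "is_impossible": False,
                 "answers": [{"text": "my friend", "answer_start": 12}]},
                {"id": "q2", "question": "What was bought at the station?",
                 "is_impossible": True, "answers": []},
            ],
        }],
    }],
}

if __name__ == "__main__":
    for ex in iter_squad_examples(SAMPLE):
        print(ex.qid, ex.question, ex.answer_text)
```

To read a downloaded file instead of the in-memory sample, pass the result of `load_squad_file(path)` to `iter_squad_examples`. Questions without an answer come back with `answer_text` set to `None`, so the same loop can handle both the SQuAD1.0-style and SQuAD2.0-style files.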

Download

Update history

References

Contact

If you have any questions about these datasets or run into problems with them, please send an email to nl-resource at nlp.ist.i.kyoto-u.ac.jp. To request that source information be added or that a document be deleted from the datasets, please write to the same address.

