Driving Domain QA Datasets

The Driving Domain QA (Question Answering) Datasets were constructed from driving-domain blog posts published on the web. They consist of a Predicate-Argument Structure QA (PAS-QA) dataset and a Reading Comprehension QA (RC-QA) dataset. In the PAS-QA dataset, each question asks for an omitted argument of a predicate. It contains 12,468 problems for the ga case (nominative), 3,151 problems for the wo case (accusative), and 1,069 problems for the ni case (dative). The RC-QA dataset consists of 20,007 problems. Each problem consists of a document, a question, and an answer that is a span in the document. We constructed the PAS-QA and RC-QA datasets through crowdsourcing because it made it possible to create large-scale datasets in a short time. The data format of these QA datasets is the same as that of SQuAD 2.0. In the PAS-QA nominative dataset and the RC-QA dataset, every problem has an answer in the document. In the PAS-QA accusative and dative datasets, however, some problems cannot be answered because the document contains no answer. Please refer to the references for how these datasets were constructed and how the problems with no answers were made. Examples of the Driving Domain QA Datasets are shown below.
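Since the data format is stated to be the same as SQuAD 2.0, a file from the archive could probably be read along the following lines. This is only a minimal sketch: it assumes the standard SQuAD 2.0 JSON layout (data, paragraphs, context, qas, answers, answer_start, is_impossible). The helper names (iter_problems, count_answerable, span_matches, load_problems) and the toy record are invented for illustration and are not taken from the datasets. The file names inside the archive are not specified here, so the path passed to load_problems is left to the reader.

```python
import json
from typing import Iterator

def iter_problems(squad: dict) -> Iterator[dict]:
    """Flatten a SQuAD 2.0-style dict into one record per problem."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                # SQuAD 2.0 marks unanswerable problems with is_impossible;
                # an empty answer list is treated the same way here.
                unanswerable = qa.get("is_impossible", False) or not answers
                yield {
                    "id": qa.get("id"),
                    "context": context,
                    "question": qa["question"],
                    "answers": [] if unanswerable else answers,
                    "answerable": not unanswerable,
                }

def span_matches(context: str, answer: dict) -> bool:
    """Check that answer_start (a character offset) points at the answer text."""
    start = answer["answer_start"]
    return context[start:start + len(answer["text"])] == answer["text"]

def count_answerable(squad: dict) -> tuple:
    """Return (number of answerable problems, number of unanswerable problems)."""
    flags = [p["answerable"] for p in iter_problems(squad)]
    return sum(flags), len(flags) - sum(flags)

def load_problems(path: str) -> list:
    """Read one JSON file (path is the reader's choice) and flatten it."""
    with open(path, encoding="utf-8") as f:
        return list(iter_problems(json.load(f)))

# Toy record in SQuAD 2.0 layout; invented for illustration, not dataset content.
SAMPLE = {
    "version": "v2.0",
    "data": [{
        "title": "toy",
        "paragraphs": [{
            "context": "I parked the car and went into the store.",
            "qas": [
                {"id": "q1", "question": "Who parked the car?",
                 "answers": [{"text": "I", "answer_start": 0}],
                 "is_impossible": False},
                {"id": "q2", "question": "What was bought?",
                 "answers": [], "is_impossible": True},
            ],
        }],
    }],
}

if __name__ == "__main__":
    print(count_answerable(SAMPLE))
```

Counting answerable and unanswerable problems per file is a quick way to check the statement above that only the accusative and dative PAS-QA datasets contain problems with no answer.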


Driving Domain QA Datasets Version 1.0 (tar.gz compression; 4,292,566 bytes): Download Form
To download the datasets, you need to enter your name and email address and agree to the download conditions.

Update history



If you have any questions about or problems with these datasets, please send an email to nl-resource at nlp.ist.i.kyoto-u.ac.jp. Requests to add source information or to delete a document from the datasets should be sent to the same address.