The driving domain QA (Question Answering) datasets were constructed from driving domain blog posts published on the web. They consist of a Predicate-Argument Structure QA dataset (PAS-QA dataset) and a Reading Comprehension QA dataset (RC-QA dataset). In the PAS-QA dataset, each question asks for an omitted argument of a predicate. We made 12,468 questions for the ga case (nominative), 3,151 questions for the wo case (accusative), and 1,069 questions for the ni case (dative). In the RC-QA dataset, the task is to extract the answer to a question from a text; we made 20,007 questions. The data format is the same as SQuAD1.0 for the PAS-QA nominative dataset and the RC-QA dataset, and the same as SQuAD2.0 for the PAS-QA accusative and dative datasets. We constructed the PAS-QA and RC-QA datasets with crowdsourcing because it enabled us to create large-scale datasets in a short time. Please refer to the references for the SQuAD data formats and for how these datasets were constructed.
dataset | use | file name |
RC-QA | train | DDQA-1.0_RC-QA_train.json |
RC-QA | dev | DDQA-1.0_RC-QA_dev.json |
RC-QA | test | DDQA-1.0_RC-QA_test.json |
PAS-QA (nominative) | train | DDQA-1.0_PAS-QA-NOM_train.json |
PAS-QA (nominative) | dev | DDQA-1.0_PAS-QA-NOM_dev.json |
PAS-QA (nominative) | test | DDQA-1.0_PAS-QA-NOM_test.json |
PAS-QA (accusative) | train | DDQA-1.0_PAS-QA-ACC_train.json |
PAS-QA (accusative) | dev | DDQA-1.0_PAS-QA-ACC_dev.json |
PAS-QA (accusative) | test | DDQA-1.0_PAS-QA-ACC_test.json |
PAS-QA (dative) | train | DDQA-1.0_PAS-QA-DAT_train.json |
PAS-QA (dative) | dev | DDQA-1.0_PAS-QA-DAT_dev.json |
PAS-QA (dative) | test | DDQA-1.0_PAS-QA-DAT_test.json |
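Since the files are described as following the SQuAD data formats, they can probably be read with a few lines of Python. The following is a minimal sketch, not an official loader: it assumes the standard SQuAD JSON layout (`data` > `paragraphs` > `qas`, with `is_impossible` present only in SQuAD2.0-style files), and the names `QAExample`, `iter_squad_examples`, and `load_examples` are hypothetical helpers introduced here for illustration.

```python
import json
from typing import Iterator, List, NamedTuple


class QAExample(NamedTuple):
    """One question with its context (hypothetical container, not part of the dataset)."""
    qid: str
    question: str
    context: str
    answers: List[str]      # answer strings; empty when the question has no answer
    is_impossible: bool     # SQuAD2.0 flag; treated as False when the key is absent


def iter_squad_examples(dataset: dict) -> Iterator[QAExample]:
    """Yield every question in a parsed SQuAD-format dictionary.

    Assumes the standard SQuAD key names; adjust if the files differ.
    """
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                yield QAExample(
                    qid=str(qa["id"]),
                    question=qa["question"],
                    context=context,
                    answers=[a["text"] for a in qa.get("answers", [])],
                    is_impossible=bool(qa.get("is_impossible", False)),
                )


def load_examples(path: str) -> List[QAExample]:
    """Read one dataset file (UTF-8 JSON) and return its questions as a list."""
    with open(path, encoding="utf-8") as f:
        return list(iter_squad_examples(json.load(f)))
```

For example, `len(load_examples("DDQA-1.0_RC-QA_train.json"))` would give the number of questions in the RC-QA training split, provided the file is in the current directory and matches the assumed layout.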
If you have any questions about or problems with these datasets, please send an email to nl-resource at nlp.ist.i.kyoto-u.ac.jp. If you would like to request that source information be added or that a document be deleted from the datasets, please write to the same address.