Driving domain QA datasets - LANGUAGE MEDIA PROCESSING LAB

Driving Domain QA Datasets

The Driving Domain QA (Question Answering) Datasets were constructed from driving-domain blog posts published on the web. They consist of a Predicate-Argument Structure QA (PAS-QA) dataset and a Reading Comprehension QA (RC-QA) dataset.

In the PAS-QA dataset, each question asks for an omitted argument of a predicate. It contains 12,468 problems for the ga case (nominative), 3,151 problems for the wo case (accusative), and 1,069 problems for the ni case (dative). The RC-QA dataset consists of 20,007 problems. Each problem consists of a document, a question, and an answer that is a span in the document. We constructed both datasets with crowdsourcing because it enabled us to create large-scale datasets in a short time.

The data format of these QA datasets is the same as that of SQuAD 2.0. In the PAS-QA nominative dataset and the RC-QA dataset, every problem has an answer in the document. In the PAS-QA accusative and dative datasets, however, some problems are unanswerable because the document contains no answer. Please refer to the references for how these datasets were constructed and how the unanswerable problems were created. Examples from the Driving Domain QA Datasets are shown below.
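Because the files share SQuAD 2.0's format, they can presumably be read with ordinary JSON tooling. The sketch below assumes the standard SQuAD 2.0 layout (`data` → `paragraphs` → `qas`, with `answers` and an `is_impossible` flag for problems whose document contains no answer); the file names inside the archive are not stated here, and the helpers `load_squad_v2` and `summarize` are hypothetical. It tallies answerable and unanswerable problems and checks that each gold answer really is a span of its document.

```python
import json
from collections import Counter


def load_squad_v2(path):
    """Read one SQuAD 2.0-style JSON file (UTF-8, since the text is Japanese)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(dataset):
    """Count answerable/unanswerable problems and check that answers are spans.

    Assumes the standard SQuAD 2.0 layout:
      {"data": [{"paragraphs": [{"context": str, "qas": [qa, ...]}]}]}
    where each qa has "question", "answers" (a list of
    {"text": str, "answer_start": int}) and "is_impossible".
    """
    counts = Counter()
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # SQuAD 2.0 flags problems that have no answer in the document.
                if qa.get("is_impossible", False):
                    counts["unanswerable"] += 1
                    continue
                counts["answerable"] += 1
                # Each gold answer should be an exact character span of the document.
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    if context[start:start + len(answer["text"])] != answer["text"]:
                        counts["span_mismatch"] += 1
    return counts
```

For example, `summarize(load_squad_v2("some_file.json"))` (file name hypothetical) returns a `Counter`; per the description above, the `unanswerable` count should be zero for the PAS-QA nominative and RC-QA files and may be non-zero only for the accusative and dative files.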

PAS-QA dataset
- Document :
  - 私は右車線に移動した。 (I moved to the right lane.)
  - (Φが) バックミラーを見た。 ((Φ-NOM) saw the rearview mirror.)
- Question :
  - “見た”の主語は何か？ (What is the subject of "saw"?)
- Answer :
  - 私 (I)

RC-QA dataset
- Document :
  - 私の車の前をバイクにまたがった警察官が走っていた。 (A police officer astride a motorcycle was riding in front of my car.)
- Question :
  - 警察官は何に乗っていた？ (What was the police officer riding?)
- Answer :
  - バイク (motorcycle)
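The two examples can be written as SQuAD 2.0-style entries by locating the answer span in the document and recording its character offset as `answer_start`. This is a minimal sketch, not the authors' tooling: the helper `make_qa` and the question IDs are hypothetical, it takes the first occurrence of the answer string (a real annotation may point to a later occurrence), and it assumes the PAS-QA document text does not contain the "(Φが)" marker, which only indicates the omitted argument in the example.

```python
def make_qa(qid, context, question, answer_text=None):
    """Build one SQuAD 2.0-style paragraph entry holding a single question.

    answer_start is the character offset of the FIRST occurrence of the
    answer in the context (an assumption); pass answer_text=None to build
    an unanswerable problem.
    """
    if answer_text is None:
        answers, impossible = [], True
    else:
        start = context.find(answer_text)
        if start < 0:
            raise ValueError(f"answer {answer_text!r} is not a span of the context")
        answers = [{"text": answer_text, "answer_start": start}]
        impossible = False
    return {
        "context": context,
        "qas": [{"id": qid, "question": question,
                 "answers": answers, "is_impossible": impossible}],
    }


# The PAS-QA and RC-QA examples above (IDs are made up for illustration).
pas_example = make_qa("pas-example", "私は右車線に移動した。バックミラーを見た。",
                      "“見た”の主語は何か？", "私")
rc_example = make_qa("rc-example", "私の車の前をバイクにまたがった警察官が走っていた。",
                     "警察官は何に乗っていた？", "バイク")
```

Here `rc_example["qas"][0]["answers"]` is `[{"text": "バイク", "answer_start": 6}]`, and the PAS-QA answer "私" starts at offset 0. Offsets are counted in Python string characters, which matches how SQuAD-style readers usually slice the context.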

Download

Driving Domain QA Datasets Version 1.0 (tar.gz archive, 4,292,566 bytes): Download Form
To download the datasets, you need to enter your name and email address and agree to the download conditions.

Update history

Version 1.0 - Released on 10/31/2019

References

Norio Takahashi, Tomohide Shibata, Daisuke Kawahara, and Sadao Kurohashi.
Predicate-argument structure analysis based on a machine comprehension model in a specific domain.
In Proceedings of the 25th Annual Meeting of the Association for Natural Language Processing (in Japanese), 2019.
  https://www.anlp.jp/proceedings/annual_meeting/2019/pdf_dir/B1-4.pdf
  This paper describes how these datasets were constructed.
Norio Takahashi, Tomohide Shibata, Daisuke Kawahara, and Sadao Kurohashi.
Machine Comprehension Improves Domain-Specific Japanese Predicate-Argument Structure Analysis.
In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Workshop on Machine Reading for Question Answering (MRQA), 2019.
  https://www.aclweb.org/anthology/D19-5814.pdf
  This paper describes how these datasets were constructed and how the unanswerable problems were created.

Contact

If you have any questions or problems regarding these datasets, please send an email to nl-resource at nlp.ist.i.kyoto-u.ac.jp. Requests to add source information or to remove a document from the datasets should also be sent to this address.