Backup of Driving domain QA datasets (No. 1) - LANGUAGE MEDIA PROCESSING LAB

List of Backups
- 1 (2019-10-28 (Mon) 06:39:48)
- 2 (2019-10-29 (Tue) 08:34:03)
- 3 (2019-10-29 (Tue) 13:13:03)
- 4 (2019-10-31 (Thu) 14:00:21)

Driving domain QA datasets

The driving domain QA (Question Answering) datasets were constructed from driving domain blog posts published on the web. They consist of a Predicate-Argument Structure QA dataset (PAS-QA dataset) and a Reading Comprehension QA dataset (RC-QA dataset). In the PAS-QA dataset, each question asks for an argument of a predicate that is omitted in the text. It contains 12,468 questions for the ga case (nominative), 3,151 questions for the wo case (accusative), and 1,069 questions for the ni case (dative). The RC-QA dataset, in which the task is to extract the answer to a question from a text, contains 20,007 questions. The PAS-QA nominative dataset and the RC-QA dataset use the same data format as SQuAD1.0, while the PAS-QA accusative and dative datasets use the same format as SQuAD2.0. We built the PAS-QA and RC-QA datasets with crowdsourcing because it makes it possible to create large-scale datasets in a short time. Please refer to the references for the SQuAD data formats and for how these datasets were constructed.
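Because the files follow the SQuAD JSON layout, they can be read with the Python standard library alone. The sketch below assumes the standard SQuAD structure (`data` → `paragraphs` → `context` and `qas` → `answers` with `text` and a character offset `answer_start`, plus an optional `is_impossible` flag in SQuAD2.0-style files); whether the distributed files carry extra fields is not stated on this page. The function names and the toy record are hypothetical illustrations, not part of the distribution. Besides counting questions, the sketch checks that each answer text actually appears at its stated offset.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical toy record in the standard SQuAD layout (not taken from the datasets).
SAMPLE: Dict[str, Any] = {
    "version": "toy",
    "data": [
        {
            "title": "toy",
            "paragraphs": [
                {
                    "context": "I drove to the station.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "Where did I drive?",
                            "answers": [{"text": "the station", "answer_start": 11}],
                        },
                        {
                            # SQuAD2.0-style unanswerable question: empty answer list.
                            "id": "q2",
                            "question": "Who was in the car?",
                            "answers": [],
                            "is_impossible": True,
                        },
                    ],
                }
            ],
        }
    ],
}


def load_squad(path: str) -> Dict[str, Any]:
    """Read a SQuAD-format JSON file (UTF-8 encoding is assumed)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_questions(
    squad: Dict[str, Any]
) -> Iterator[Tuple[str, str, List[Dict[str, Any]], bool]]:
    """Yield (context, question, answers, is_impossible) for every QA pair."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # SQuAD1.0-style files have no "is_impossible" key, so default to False.
                yield context, qa["question"], qa["answers"], qa.get("is_impossible", False)


def summarize(squad: Dict[str, Any]) -> Dict[str, int]:
    """Count questions and unanswerable questions, and verify answer offsets."""
    counts = {"questions": 0, "unanswerable": 0, "span_mismatches": 0}
    for context, _question, answers, impossible in iter_questions(squad):
        counts["questions"] += 1
        if impossible:
            counts["unanswerable"] += 1
        for answer in answers:
            start = answer["answer_start"]
            # answer_start is a character offset into the context string.
            if context[start:start + len(answer["text"])] != answer["text"]:
                counts["span_mismatches"] += 1
    return counts


if __name__ == "__main__":
    print(summarize(SAMPLE))
    # → {'questions': 2, 'unanswerable': 1, 'span_mismatches': 0}
```

Assuming a file such as `DDQA-1.0_PAS-QA-ACC_dev.json` sits in the working directory, `summarize(load_squad("DDQA-1.0_PAS-QA-ACC_dev.json"))` would report how many accusative dev questions are marked unanswerable. The sketch ignores any `plausible_answers` field that SQuAD2.0-style records may carry.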

Download

Driving domain QA datasets Version 1.0 (tar.gz compression; 4,257,966 bytes): Download Form
cf.) To download the datasets, you need to enter your name and email address and agree to the download conditions.

The file names for each QA dataset are as follows:

Dataset	Split	File name
RC-QA	train	DDQA-1.0_RC-QA_train.json
RC-QA	dev	DDQA-1.0_RC-QA_dev.json
RC-QA	test	DDQA-1.0_RC-QA_test.json
PAS-QA (nominative)	train	DDQA-1.0_PAS-QA-NOM_train.json
PAS-QA (nominative)	dev	DDQA-1.0_PAS-QA-NOM_dev.json
PAS-QA (nominative)	test	DDQA-1.0_PAS-QA-NOM_test.json
PAS-QA (accusative)	train	DDQA-1.0_PAS-QA-ACC_train.json
PAS-QA (accusative)	dev	DDQA-1.0_PAS-QA-ACC_dev.json
PAS-QA (accusative)	test	DDQA-1.0_PAS-QA-ACC_test.json
PAS-QA (dative)	train	DDQA-1.0_PAS-QA-DAT_train.json
PAS-QA (dative)	dev	DDQA-1.0_PAS-QA-DAT_dev.json
PAS-QA (dative)	test	DDQA-1.0_PAS-QA-DAT_test.json
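The file names follow a regular pattern, `DDQA-<version>_<dataset>_<split>.json`, inferred here from the table above. The following minimal sketch builds the twelve expected paths and reports any that are missing, assuming the extracted files sit directly under one directory; the actual directory layout inside the archive is not stated on this page, and the directory name `ddqa` is hypothetical. The SQuAD version recorded for each dataset follows the dataset description at the top of this page.

```python
from pathlib import Path
from typing import Dict, List

# Dataset codes as they appear in the file names, mapped to the SQuAD
# format version each dataset follows (per the dataset description).
SQUAD_FORMAT: Dict[str, str] = {
    "RC-QA": "1.0",
    "PAS-QA-NOM": "1.0",
    "PAS-QA-ACC": "2.0",
    "PAS-QA-DAT": "2.0",
}
SPLITS = ("train", "dev", "test")


def ddqa_filename(dataset: str, split: str, version: str = "1.0") -> str:
    """Build a name following the DDQA-<version>_<dataset>_<split>.json pattern."""
    if dataset not in SQUAD_FORMAT:
        raise ValueError(f"unknown dataset code: {dataset!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return f"DDQA-{version}_{dataset}_{split}.json"


def expected_paths(root: Path) -> List[Path]:
    """All twelve expected file paths, assuming they sit directly under root."""
    return [root / ddqa_filename(d, s) for d in SQUAD_FORMAT for s in SPLITS]


def missing_files(root: Path) -> List[Path]:
    """Expected files that do not exist (as regular files) under root."""
    return [p for p in expected_paths(root) if not p.is_file()]


if __name__ == "__main__":
    # "ddqa" is a hypothetical extraction directory.
    for path in missing_files(Path("ddqa")):
        print("missing:", path)
```

Keeping the SQuAD version next to each dataset code lets a loader decide up front whether to expect `is_impossible` flags (the accusative and dative datasets) or not.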

Update history

Version 1.0 - Released on 10/28/2019

References

Norio Takahashi, Tomohide Shibata, Daisuke Kawahara and Sadao Kurohashi.
Predicate-argument structure analysis based on a machine comprehension model in a specific domain,
In Proceedings of the 25th Annual Meeting of the Association for Natural Language Processing (in Japanese), 2019.
　https://www.anlp.jp/proceedings/annual_meeting/2019/pdf_dir/B1-4.pdf
　　cf.) Describes how to construct these datasets.
Norio Takahashi, Tomohide Shibata, Daisuke Kawahara and Sadao Kurohashi.
Machine Comprehension Improves Domain-Specific Japanese Predicate-Argument Structure Analysis,
In Proceedings of the Workshop on Machine Reading for Question Answering (MRQA) at the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.
　https://mrqa.github.io/assets/papers/42_Paper.pdf
　　cf.) Describes how to construct these datasets and the reason for choosing the data formats.
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev and Percy Liang.
SQuAD: 100,000+ Questions for Machine Comprehension of Text,
In Proceedings of EMNLP 2016, pages 2383–2392.
　https://www.aclweb.org/anthology/D16-1264.pdf
　　cf.) Describes SQuAD1.0.
Pranav Rajpurkar, Robin Jia and Percy Liang.
Know what you don’t know: Unanswerable questions for SQuAD,
In Proceedings of ACL 2018, pages 784–789.
　https://www.aclweb.org/anthology/P18-2124.pdf
　　cf.) Describes SQuAD2.0.

Contact

If you have any questions about or problems with these datasets, please send an email to nl-resource at nlp.ist.i.kyoto-u.ac.jp. To request that source information be added, or that a document be removed from the datasets, please write to the same address.