Kyoto University Text Corpus Version 4.0

This text corpus is manually annotated with several kinds of linguistic information. It consists of approximately 40,000 sentences from the Mainichi newspaper in 1995, annotated with morphological and syntactic information. Of these, 5,000 sentences are also annotated with predicate-argument structures, including zero anaphora and coreference.

Download

Note that the package does not include the original sentences; it contains only the annotation information. To recover the complete annotated corpus, it is necessary to obtain the Mainichi 1995 CD-ROM. Information about this CD-ROM is available at https://www.nichigai.co.jp/sales/corpus.html

Reference