Kyoto University Web Document Leads Corpus

This is a Japanese text corpus consisting of the leading three sentences of web documents, with various linguistic annotations. It comprises approximately 2,500 documents (7,500 sentences) annotated with morphology, named entities, dependencies, predicate-argument structures including zero anaphora, and coreferences. These annotations were produced by manually correcting the automatic analyses of the morphological analyzer JUMAN and of KNP, the dependency, case structure, and anaphora analyzer. The corpus also includes 10,000 documents (30,000 sentences) with discourse relations between clauses, manually assigned via crowdsourcing.

Download

The Kyoto University Web Document Leads Corpus will be released soon.
