Downloading Kyoto University Web Document Leads Corpus

Conditions for downloading the corpus:
This corpus consists of linguistically annotated Web documents that were publicly available on the Web at some point. The corpus is released to contribute to research in natural language processing. Since the collected documents are fragmentary, i.e., only the leading three sentences of each Web document, we have not obtained permission from the copyright owners of the Web documents and do not provide source information such as URLs. If copyright owners of Web documents request the addition of source information or the deletion of their documents, we will update the corpus. In that case, we will contact you at the email address that you provided for the download and ask you to delete the old version and replace it with the updated one.

To download the corpus, please enter your name and email address below and click the button "Agree the above conditions and download."