What's New

We will hold an online briefing session (23 May, 2020)

We will hold a Zoom-based online briefing session (open lab) for those who are considering taking the entrance examination to be held in August 2020. Please submit the registration form to join!

We will present the following papers at ACL2020SRW

  • Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi and Eiichiro Sumita:
    Pre-training via Leveraging Assisting Languages for Neural Machine Translation
  • Yu Tanaka, Yugo Murawaki, Daisuke Kawahara and Sadao Kurohashi:
    Building a Japanese Typo Dataset from Wikipedia's Revision History

Associate Professor Daisuke Kawahara has moved to the School of Fundamental Science and Engineering, Waseda University, as a professor.

We will present the following papers at LREC2020 (2020/5)

  • Yudai Kishimoto, Yugo Murawaki and Sadao Kurohashi:
    Adapting BERT to Implicit Discourse Relation Classification with a Focus on Discourse Connectives
  • Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada and Sadao Kurohashi:
    Development of a Japanese Personality Dictionary based on Psychological Methods
  • Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada and Sadao Kurohashi:
    Acquiring Social Knowledge about Personality and Driving-related Behavior
  • Shuntaro Yada, Ayami Joh, Ribeka Tanaka, Fei Cheng, Eiji Aramaki and Sadao Kurohashi:
    Towards a Versatile Medical-Annotation Guideline Feasible Without Heavy Medical Knowledge: Starting From Critical Lung Diseases
  • Haiyue Song, Raj Dabre, Atsushi Fujita and Sadao Kurohashi:
    Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation
  • Zhuoyuan Mao, Fabien Cromieres, Raj Dabre, Haiyue Song and Sadao Kurohashi:
    JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation
  • Takashi Kodama, Ryuichiro Higashinaka, Koh Mitsuda, Ryo Masumura, Yushi Aono, Ryuta Nakamura, Noritake Adachi and Hidetoshi Kawabata:
    Generating Responses that Reflect Meta Information in User-Generated Question Answer Pairs

Research Overview

Language is the most reliable medium of human intellectual activity. Our objective is to establish the technology and academic discipline for handling and understanding language with computers, in a manner as close as possible to that of humans. This work covers syntactic analysis, semantic analysis, context analysis, text comprehension, text generation and dictionary systems, with the aim of developing application systems for machine translation and information retrieval.

Search Engine Infrastructure based on Deep Natural Language Processing

[Figure: TSUBAKI]

The essential purpose of information retrieval is not just to retrieve a relevant document but to acquire the information or knowledge it contains. We have been developing a next-generation information retrieval infrastructure based on the following deep natural language processing techniques: precise processing based on predicate-argument structures rather than words, identification of the variety of linguistic expressions, and a bird's-eye view of search results via clustering and interaction.

Machine Translation

[Figure: EBMT]

To bring automatic translation by computers to the level of human translation, we have been studying a next-generation machine translation methodology based on text understanding and a large collection of translation examples. We have already achieved practical translation in the domain of travel conversation, and have constructed a translation-aid system that can be used by patent translation experts.

Fundamental Studies on Text Understanding

To make computers understand language, it is essential to give them world knowledge. This was a very hard problem ten years ago, but it has become possible to acquire knowledge from a massive amount of text thanks to dramatic progress in computing power and networks. We have successfully acquired linguistic patterns of predicate-argument structures from automatic parses of 7 billion Japanese sentences crawled from the Web, using grid computing machines. By utilizing such knowledge, we study text understanding, i.e., recognizing the relationships between words and phrases in text.

Access