What's New

We will present the following papers at ACL 2024 (2024/8).

  • Sirou Chen, Sakiko Yahata, Shuichiro Shimizu, Zhengdong Yang, Yihang Li, Chenhui Chu, Sadao Kurohashi:
    MELD-ST: An Emotion-aware Speech Translation Dataset (Findings)
  • Yahan Yu, Duzhen Zhang, Xiuyi Chen, Chenhui Chu:
    Flexible Weight Tuning and Weight Fusion Strategies for Continual Named Entity Recognition (Findings)
  • Duzhen Zhang, Yahan Yu, Chenxing Li, Jiahua Dong, Dan Su, Chenhui Chu, Dong Yu:
    MM-LLMs: Recent Advances in MultiModal Large Language Models (Findings)
  • Zhen Wan, Yating Zhang, Yexiang Wang, Fei Cheng, Sadao Kurohashi:
    Reformulating Domain Adaptation of Large Language Models as Adapt-Retrieve-Revise: A Case Study on Chinese Legal Domain (Findings)

We will hold a briefing session (2024/5/11).

The following paper received the FY2023 best paper award of the Journal of Natural Language Processing (2024/3).

  • Kazumasa Omura, Daisuke Kawahara, and Sadao Kurohashi:
    Building a Commonsense Inference Dataset based on Basic Events and its Application

We will present the following papers at LREC-COLING 2024 (2024/5).

  • Taishi Chika, Taro Okahisa, Takashi Kodama, Yin Jou Huang, Yugo Murawaki and Sadao Kurohashi:
    Domain Transferable Semantic Frames for Expert Interview Dialogues
  • Yikun Sun, Zhen Wan, Nobuhiro Ueda, Sakiko Yahata, Fei Cheng, Chenhui Chu and Sadao Kurohashi:
    Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese
  • Norizo Sakaguchi, Yugo Murawaki, Chenhui Chu and Sadao Kurohashi:
    Identifying Source Language Expressions for Pre-editing in Machine Translation
  • Nobuhiro Ueda, Hideko Habe, Yoko Matsui, Akishige Yuguchi, Seiya Kawano, Yasutomo Kawanishi, Sadao Kurohashi and Koichiro Yoshino:
    J-CRe3: A Japanese Conversation Dataset for Real-world Reference Resolution
  • Hao Wang, Tang Li, Chenhui Chu, Rui Wang and Pinpin Zhu:
    Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents
  • Rikito Takahashi, Hirokazu Kiyomaru, Chenhui Chu and Sadao Kurohashi:
    Abstractive Multi-Video Captioning: Benchmark Dataset Construction and Extensive Evaluation
  • Yugo Murawaki:
    Principal Component Analysis as a Sanity Check for Bayesian Phylolinguistic Reconstruction
  • Kazumasa Omura, Fei Cheng and Sadao Kurohashi:
    An Empirical Study of Synthetic Data Generation for Implicit Discourse Relation Recognition
  • Xiaotian Lu, Jiyi Li, Zhen Wan, Xiaofeng Lin, Koh Takeuchi and Hisashi Kashima:
    Evaluating Saliency Explanations in NLP by Crowdsourcing
  • Francois Meyer, Haiyue Song, Abhisek Chakrabarty, Jan Buys, Raj Dabre and Hideki Tanaka:
    NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages

Research Overview

Language is the most reliable medium of human intellectual activities. Our objective is to establish the technology and academic discipline for handling and understanding language with computers, in a manner as close as possible to that of humans. This work covers syntactic analysis, semantic analysis, context analysis, text comprehension, text generation, and dictionary systems, which we use to develop various application systems for machine translation and information retrieval.

Search Engine Infrastructure based on Deep Natural Language Processing

The essential purpose of information retrieval is not merely to retrieve a relevant document but to acquire the information or knowledge it contains. We have been developing a next-generation information retrieval infrastructure based on the following deep natural language processing techniques: precise processing based on predicate-argument structures rather than words, identifying variations in linguistic expressions, and providing a bird's-eye view of search results via clustering and interaction.

Machine Translation

To bring automatic translation by computers up to the level of human translation, we have been studying next-generation machine translation methodology based on text understanding and a large collection of translation examples. We have already achieved practical translation in the domain of travel conversation and have constructed a translation-aid system that can be used by patent translation experts.

Fundamental Studies on Text Understanding

To make computers understand language, it is essential to give them world knowledge. This was a very hard problem ten years ago, but dramatic progress in computing power and networks has made it possible to acquire knowledge from massive amounts of text. We have successfully acquired linguistic patterns of predicate-argument structures from automatic parses of 7 billion Japanese sentences crawled from the Web, using grid computing machines. Using this knowledge, we study text understanding, i.e., recognizing the relationships between words and phrases in text.

Policy Regarding Acceptance of Students from Outside

Master's course

PhD course

Access