What's New

We will present the following papers at EMNLP 2025 (2025/11).

  • Zhengdong Yang, Zhen Wan, Sheng Li, Chao-Han Huck Yang, Chenhui Chu:
    CoVoGER: A Multilingual Multitask Benchmark for Speech-to-text Generative Error Correction with Large Language Models
  • Yoshiki Takenami, Yin Jou Huang, Yugo Murawaki, Chenhui Chu:
    How Does Cognitive Bias Affect Large Language Models? A Case Study on the Anchoring Effect in Price Negotiation Simulations (Findings)
  • Yang Liu, Chenhui Chu:
    Do LLMs Align Human Values Regarding Social Biases? Judging and Explaining Social Biases with LLMs (Findings)
  • Jivnesh Sandhan, Fei Cheng, Tushar Sandhan, Yugo Murawaki:
    From Disney-World to Reality: A Context-Dependent Testbed for Personality Assessment of Large Language Models (Findings)
  • Ruiyi Yan, Yugo Murawaki:
    Addressing Tokenization Inconsistency in Steganography and Watermarking Based on Large Language Models
  • Yin Jou Huang, Rafik Hadfi:
    Beyond Self-Reports: Multi-Observer Agents for Personality Assessment in Large Language Models (Findings)
  • Sakiko Yahata, Zhen Wan, Fei Cheng, Sadao Kurohashi, Hisahiko Sato, Ryozo Nagai:
    Causal Tree Extraction from Medical Case Reports: A Novel Task for Experts-like Text Comprehension
  • Kazuma Kobayashi, Zhen Wan, Fei Cheng, Yuma Tsuta, Xin Zhao, Junfeng Jiang, Jiahao Huang, Zhiyi Huang, Yusuke Oda, Rio Yokota, Yuki Arase, Daisuke Kawahara, Akiko Aizawa, Sadao Kurohashi:
    Leveraging High-Resource English Corpora for Cross-lingual Domain Adaptation in Low-Resource Japanese Medicine via Continued Pre-training (Findings)

We will present the following papers at Interspeech 2025 (2025/8).

  • Zaid Sheikh, Shuichiro Shimizu, Siddhant Arora, Jiatong Shi, Samuele Cornell, Xinjian Li, Shinji Watanabe:
    Scalable Spontaneous Speech Dataset (SSSD): Crowdsourcing Data Collection to Promote Dialogue Research
  • Brian Yan, Injy Hamed, Shuichiro Shimizu, Vasista Lodagala, William Chen, Olga Iakovenko, Bashar Talafha, Amir Hussein, Alexander Polok, Kalvin Chang, Dominik Klement, Sara Althubaiti, Puyuan Peng, Matthew Wiesner, Thamar Solorio, Ahmed Ali, Sanjeev Khudanpur, Shinji Watanabe, Chih-Chen Chen, Zhen Wu, Karim Benharrak, Anuj Diwan, Samuele Cornell, Eunjung Yeo, Kwanghee Choi, Carlos Carvalho, Karen Rosero:
    CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset

We will present the following papers at ACL 2025 (2025/7-8).

  • Zhen Wan, Chao-Han Huck Yang, Yahan Yu, Jinchuan Tian, Sheng Li, Ke Hu, Zhehuai Chen, Shinji Watanabe, Fei Cheng, Chenhui Chu, Sadao Kurohashi:
    SIQ: Exterminating Speech Intelligence Quotient Cross Cognitive Levels in Voice Understanding Large Language Models
  • Zhengdong Yang, Sheng Li, Chenhui Chu:
    Generative Error Correction for Emotion-aware Speech-to-text Translation (Findings)
  • Zhengdong Yang, Shuichiro Shimizu, Yahan Yu, Chenhui Chu:
    When Large Language Models Meet Speech: A Survey on Integration Approaches (Findings)
  • Chengzhi Zhong, Qianying Liu, Fei Cheng, Junfeng Jiang, Zhen Wan, Chenhui Chu, Yugo Murawaki, Sadao Kurohashi:
    What Language Do Non-English-Centric Large Language Models Think in? (Findings)
  • Yahan Yu, Duzhen Zhang, Yong Ren, Xuanle Zhao, Xiuyi Chen, Chenhui Chu:
    Progressive LoRA for Multimodal Continual Instruction Tuning (Findings)
  • Qianying Liu, Katrina Qiyao Wang, Fei Cheng, Sadao Kurohashi:
    7 Points to Tsinghua but 10 Points to 清华? Assessing Large Language Models in Agentic Multilingual National Bias (Findings)
  • Yen-Ting Lin, Zhehuai Chen, Piotr Zelasko, Zhen Wan, Xuesong Yang, Zih-Ching Chen, Krishna C Puvvada, Ke Hu, Szu-Wei Fu, Jun Wei Chiu, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang, Chao-Han Huck Yang:
    NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model (Industry)

The following work received the Best Presentation Award at the 13th meeting of the AAMT Young Researchers' Translation Research Group.

  • Chengzhi Zhong, Fei Cheng, Qianying Liu, Junfeng Jiang, Zhen Wan, Chenhui Chu, Yugo Murawaki, Sadao Kurohashi:
    What language do Japanese-specialized large language models think in?

We will hold a briefing session (2025/5/10).

We will present the following papers at NAACL 2025 (2025/4-5).

  • Siddhant Arora, Yifan Peng, Jiatong Shi, Jinchuan Tian, William Chen, Shikhar Bharadwaj, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Shuichiro Shimizu, Vaibhav Srivastav, Shinji Watanabe:
    ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems (System Demonstrations)
  • Shiho Matta, Yin Jou Huang, Fei Cheng, Hirokazu Kiyomaru, Yugo Murawaki:
    Optimizing Cost-Efficiency with LLM-Generated Training Data for Conversational Semantic Frame Analysis (Joint SIGHUM Workshop, LaTeCH-CLfL2025)

Research Overview

Language is the most reliable medium of human intellectual activities. Our objective is to establish the technology, and the academic discipline behind it, for handling and understanding language with computers in a manner as close as possible to that of humans. Our research topics include syntactic analysis, semantic analysis, context analysis, text comprehension, text generation, and dictionary systems, which we use to develop application systems such as machine translation and information retrieval.

Search Engine Infrastructure based on Deep Natural Language Processing

[Figure: TSUBAKI]

The essential purpose of information retrieval is not merely to retrieve relevant documents but to acquire the information or knowledge they contain. We have been developing a next-generation information retrieval infrastructure based on the following deep natural language processing techniques: precise matching based not on words but on predicate-argument structures, normalization of the wide variety of linguistic expressions, and a bird's-eye view of search results via clustering and interaction.
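
To make the predicate-argument idea concrete, here is a minimal sketch in Python of indexing by predicate-argument structures instead of words, so that different phrasings of the same fact match the same query. The extract_pas function and the demo sentences are hypothetical stand-ins, not the actual TSUBAKI implementation.

    from collections import defaultdict

    def extract_pas(sentence):
        """Hypothetical stand-in for a deep parser: returns
        (predicate, argument) pairs for a sentence."""
        # A real system would run full syntactic/semantic analysis;
        # here we fake it with a fixed lookup for the demo sentences.
        demo = {
            "Mercury was discovered by chance.": [("discover", "Mercury")],
            "Astronomers discovered Mercury.": [("discover", "Mercury")],
            "Mercury is a planet.": [("be", "Mercury")],
        }
        return demo.get(sentence, [])

    # Index documents by predicate-argument pairs instead of words,
    # so that active and passive phrasings of the same fact collide.
    docs = [
        "Mercury was discovered by chance.",
        "Astronomers discovered Mercury.",
        "Mercury is a planet.",
    ]
    index = defaultdict(set)
    for doc_id, sent in enumerate(docs):
        for pas in extract_pas(sent):
            index[pas].add(doc_id)

    print(index[("discover", "Mercury")])  # {0, 1}: both phrasings match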

Machine Translation

[Figure: EBMT]

To bring automatic translation by computers to the level of human translation, we have been studying next-generation machine translation methodology based on text understanding and large collections of translation examples. We have already achieved practical translation quality in the domain of travel conversation, and have built a translation-aid system used by patent translation experts.
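
The example-based idea can be sketched as follows: store source-target pairs, retrieve the stored example most similar to the input, and reuse its translation. The Jaccard word-overlap similarity and the toy travel-domain examples below are assumptions for illustration; a real system matches on deeper structure and adapts the retrieved example.

    def similarity(a, b):
        """Word-overlap (Jaccard) similarity -- a crude stand-in for
        the structural matching a real example-based system would use."""
        sa, sb = set(a.split()), set(b.split())
        return len(sa & sb) / len(sa | sb)

    # Toy translation memory (hypothetical travel-domain examples).
    examples = [
        ("where is the station", "eki wa doko desu ka"),
        ("where is the hotel", "hoteru wa doko desu ka"),
        ("how much is this", "kore wa ikura desu ka"),
    ]

    def translate(source):
        # Retrieve the translation example closest to the input.
        best_src, best_tgt = max(examples, key=lambda ex: similarity(source, ex[0]))
        # A real system would adapt the example by substituting
        # sub-phrases; here we simply reuse the retrieved translation.
        return best_tgt

    print(translate("where is the bus station"))  # eki wa doko desu ka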

Fundamental Studies on Text Understanding

To make computers understand language, it is essential to give them world knowledge. A decade ago this was a very hard problem, but the rapid progress of computing power and networks has made it possible to acquire such knowledge from massive amounts of text. Using grid computing, we have acquired linguistic patterns of predicate-argument structures from automatic parses of 7 billion Japanese sentences crawled from the Web. Utilizing this knowledge, we study text understanding, i.e., recognizing the relationships between words and phrases in text.
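
As a toy illustration of this kind of knowledge acquisition, the sketch below aggregates (predicate, case, argument) triples, such as a dependency parser would emit, into case frames recording which arguments typically fill which slots. The tiny corpus and triple format are hypothetical; the actual work operates on full parses of billions of sentences.

    from collections import Counter

    # Hypothetical pre-parsed corpus: (predicate, case, argument)
    # triples such as a dependency parser would emit per sentence.
    parsed_corpus = [
        ("drink", "object", "coffee"),
        ("drink", "object", "tea"),
        ("drink", "object", "coffee"),
        ("read", "object", "book"),
        ("read", "object", "book"),
        ("read", "object", "newspaper"),
    ]

    # Aggregate counts into case frames: which arguments typically
    # fill which case slot of which predicate.
    counts = Counter(parsed_corpus)
    frames = {}
    for (pred, case, arg), n in counts.items():
        frames.setdefault((pred, case), []).append((arg, n))

    for slot, args in frames.items():
        args.sort(key=lambda x: -x[1])
        print(slot, args)
    # ('drink', 'object') [('coffee', 2), ('tea', 1)]
    # ('read', 'object') [('book', 2), ('newspaper', 1)]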

Policy Regarding Acceptance of Students from Outside

Prof. Kurohashi does not supervise new students. New students will be supervised by Assoc. Prof. Murawaki or Assoc. Prof. Chu.

Master's course

PhD course

Access