
Mo Shen

Ph.D. Student at Kyoto University
S208, Eng. Bldg. No.3, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
E-mail: msmoshen at gmail.com

Last update: 08/18/2014

Summary

I am currently a Ph.D. candidate at the Language & Knowledge Engineering Lab at the Graduate School of Informatics, Kyoto University. My advisor is Prof. Sadao Kurohashi. My research interests include syntactic parsing, morphological analysis for Asian languages, and cognitive modeling of language.

Education

Kyoto University; Doctor of Philosophy (Ph.D.), Computational Linguistics, 2012 – 2015 (Expected)

Kyoto University; Master of Science (M.S.), Computational Linguistics, 2010 – 2012

Hong Kong Baptist University; Bachelor of Science (B.S.), Mathematics, 2006 – 2010

Publications (International Journals and Conferences, Peer Reviewed)

Mo Shen, Daisuke Kawahara and Sadao Kurohashi. 2014. Dependency Parse Reranking with Rich Subtree Features. IEEE Transactions on Audio, Speech, and Language Processing, 22(7): 1208-1218.

Mo Shen, Hongxiao Liu, Daisuke Kawahara, and Sadao Kurohashi. 2014. Chinese Morphological Analysis with Character-level POS Tagging. In proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Short Paper, pages 253–258, Baltimore, USA.

Mo Shen, Daisuke Kawahara, and Sadao Kurohashi. 2013. Chinese Word Segmentation by Mining Maximized Substrings. In proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013), pages 171-179, Nagoya, Japan.

Mo Shen, Daisuke Kawahara and Sadao Kurohashi. 2012. A Reranking Approach for Dependency Parsing with Variable-sized Subtree Features. In proceedings of the 26th Pacific Asia Conference on Language, Information and Computation (PACLIC 26), pages 308-317, Bali, Indonesia.

Publications (Domestic Conference)

Mo Shen, Daisuke Kawahara and Sadao Kurohashi. 2014. Chinese Unknown Word Extraction by Mining Maximized Substrings. In proceedings of the 20th Annual Meeting of the Association for Natural Language Processing (NLP2014), pp.384-387, Sapporo, Japan.

Mo Shen, Daisuke Kawahara and Sadao Kurohashi. 2013. Dependency Parse Reranking Based-on Subtree Extraction. In proceedings of the 19th Annual Meeting of the Association for Natural Language Processing (NLP2013), pp.58-61, Nagoya, Japan.

Presentations

Towards Fully Lexicalized Dependency Parsing for Korean. At the 13th International Conference on Parsing Technologies (IWPT2013), Nara, Japan. 2013/11.

A Reranking Approach for Dependency Parsing with Variable-sized Subtree Features. At Microsoft Research Forum 2012 at Kyoto University, Kyoto, Japan. 2012/12.

Dependency Subtree Reranking with Rich Subtree-based Features. At Kyoto University 35th IST Seminar, Kyoto, Japan. 2012/07.

Professional Activity

Reviewer, IEEE Transactions on Audio, Speech, and Language Processing, 2014.

Software

SKP

A high-performance multilingual dependency parser written in C++, developed as a core component of the Kyoto Example-Based Dependency-to-Dependency Translation Framework (KyotoEBMT).

KyotoMorph

A joint Chinese word segmentation and part-of-speech tagging system written in C++. It features a semi-supervised segmentation technique that mines large-scale text for word boundary information, and an unknown word extractor that performs efficient Chinese word extraction and automatic lexicon compilation from web text.

CUWE

A Chinese unknown word extractor that can efficiently scan a million sentences and select reliable word candidates in a couple of minutes. The output can be compiled directly into a machine-readable dictionary for use by other language processing systems.

Language Resources

Kyoto-U Chinese Web Corpus

A Chinese corpus automatically built and maintained using web texts, which currently contains over 2 billion sentences labeled with word segmentation, part-of-speech tagging, chunking, and dependency parsing information.

Chinese Treebank with CharPOS

An augmented version of Penn Chinese Treebank 5.0 (CTB5) with full character-level part-of-speech annotation.

Awards

2009: Honorable Mention in the 2009 Mathematical Contest in Modeling (MCM2009)

2010: Japanese Government (MEXT) Scholarship

2012: MEXT Honors Scholarship for Privately Financed International Students

2014: Murata Scholarship

Programming Skills

Code on a daily basis: C++, Python, Perl

Familiar with: Java, C

Code as a hobby: Matlab, Prolog

Language Proficiency

Chinese:

Native

English:

TOEIC	Score: 990/990 (2014/07)
TOEFL	Score: 100/120 (2009/08)

Japanese:

JLPT N1	Score: 180/180 (2013/12)