A Chinese Treebank in Scientific Domain (SCTB) - LANGUAGE MEDIA PROCESSING LAB

SCTB: A Chinese Treebank in Scientific Domain

Update History

2021/9/8 Released the second version (V2)
2016/10/20 Released the first version (V1)

Description

SCTB is a phrase-structure Chinese treebank. The raw sentences were selected from the LCAS (National Science Library, Chinese Academy of Sciences) Chinese scientific paper corpus provided by the Japan Science and Technology Agency (JST). Our annotation process follows that of the Penn Chinese Treebank (CTB), with the exception of the segmentation standard: we apply a Chinese word segmentation standard based on "short and consistent units" (Shen et al., 2016). The first release (V1) contains 5,133 sentences (138,781 words), and the second release (V2) contains 12,175 sentences (328,562 words). This work is supported by the JST MT project "Project on Practical Implementation of Japanese to Chinese-Chinese to Japanese Machine Translation."

Sample

( (IP (NP (NT 近年)) (NP (NP (NN 稻瘟) (SFN 病)) (NP (NN 发生))) (VP (VV 呈) (NP (CP (IP (VP (VV 加重)))) 
(NP (NN 趋势)))) (PU 。)) )

( (IP (VP (VV 评价) (NP (NP (QP (CD 两)) (NP (NN 组))) (NP (NP (NN 疗效)) (CC 及) (NP (NN 安全) (SFN 性))))) 
(PU 。 )) )

( (IP (NP (NN 结果)) (VP (VV 表明) (PU ：) (IP (NP (NP (NN 芝麻)) (NP (NP (DP (DT 全)) (NP (VV 生育) (SFN 期)))
(NP (NN 长短)))) (PU ，) (VP (PP (P 与) (NP (NN 温度))) (VP (VV 呈) (NP (PFA 正) (NN 相关)))))) (PU ；)) )
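
The bracketed format shown in the samples can be read with a short recursive parser. The following is a minimal sketch, not an official SCTB tool: the names parse_tree and tagged_words are hypothetical, and it assumes every tree is a balanced parenthesized expression wrapped in an outer unlabeled pair of brackets, as in the samples. It extracts the (word, tag) pairs from the first sample sentence.

```python
import re

# A token is an opening bracket, a closing bracket, or a run of other
# non-space characters (a label such as NP or a word such as 趋势).
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    The outer wrapper "( ... )" gets an empty label.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word under a tag
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the end of the tree")
    return tree


def tagged_words(tree):
    """Return the (word, tag) pairs of a parsed tree in sentence order."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if not isinstance(child, str):
            pairs.extend(tagged_words(child))
    return pairs


# First sample sentence from this page, including its line break.
SAMPLE = """( (IP (NP (NT 近年)) (NP (NP (NN 稻瘟) (SFN 病)) (NP (NN 发生))) (VP (VV 呈) (NP (CP (IP (VP (VV 加重))))
(NP (NN 趋势)))) (PU 。)) )"""

for word, tag in tagged_words(parse_tree(SAMPLE)):
    print(word, tag)
```

Because segmentation follows short units, a suffix such as 病 appears as its own token with the tag SFN, so the pairs returned here are finer-grained than CTB-style words.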

License

Copyright (C) Kurohashi-Kawahara Lab. and JST. You can use all the data under the terms of the Creative Commons Attribution 3.0 Unported license.

Download

Please fill out this form to download the data.

Reference

Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara and Sadao Kurohashi.
SCTB: A Chinese Treebank in Scientific Domain,
Proceedings of the 12th Workshop on Asian Language Resources (ALR12 2016), Osaka, Japan, (2016.12).

Mo Shen, Wingmui Li, HyunJeong Choe, Chenhui Chu, Daisuke Kawahara and Sadao Kurohashi.
Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language,
Proceedings of the 26th International Conference on Computational Linguistics (COLING2016), Osaka, Japan, (2016.12).

Contact and Bug Report

MAIL: nl-resource at nlp.ist.i.kyoto-u.ac.jp