A Chinese word segmentation and POS tagging system for readability research

Chang, T. H.; Sung, Y. T.; Lee, Y. T.

dc.contributor	Department of Educational Psychology and Counseling, National Taiwan Normal University	en_US
dc.contributor.author	Chang, T. H.	en_US
dc.contributor.author	Sung, Y. T.	en_US
dc.contributor.author	Lee, Y. T.	en_US
dc.date.accessioned	2014-12-02T06:38:54Z
dc.date.available	2014-12-02T06:38:54Z
dc.date.issued	2012-11-15	zh_TW
dc.description.abstract	In recent years, readability research has relied on natural language processing techniques to analyze documents. However, Chinese sentences consist of characters with no blanks between words. Therefore, mistakes in word segmentation and/or part-of-speech tagging of Chinese sentences result in many errors in the follow-up analysis. The CRF model has recently been the most popular and successful method for Chinese word segmentation. However, due to problems such as reiterative locution, unknown words, and incomplete sentences, many readings for children cannot be processed accurately by the CRF model. This study aims to develop a Chinese word segmentation and POS tagging system called WeCan. The system is composed of a bigram model, the SPLR algorithm, unknown word extraction, and rule bases. WeCan has been applied to the preprocessing procedure of CRIE. In preliminary experiments, it also worked well on elementary school textbooks in Taiwan.	en_US
dc.identifier	ntnulib_tp_A0201_02_060	zh_TW
dc.identifier.uri	http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/40798
dc.language	en_US	zh_TW
dc.relation	42nd Annual Meeting of the Society for Computers in Psychology (SCiP 2012), Minnesota, U.S.A.	en_US
dc.title	A Chinese word segmentation and POS tagging system for readability research	en_US

Collections

Faculty Publications
