From Phoneme to Morpheme: Another Verification Using a Corpus

Tanaka-Ishii, Kumiko; Jin, Zhihui

doi:10.1007/11940098_25


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Included in the following conference series:

International Conference on Computer Processing of Oriental Languages


Abstract

We empirically test Harris's hypothesis that morpheme and word boundaries can be detected from changes in the complexity of phoneme sequences. We reformulate his hypothesis from an information-theoretic viewpoint and use a corpus to test whether it holds. We find that the hypothesis holds for morphemes, with an F-score of about 80%, in both English and Chinese. For word boundaries, however, English and Chinese give contrary results, which reflects a difference in the nature of the two languages.
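The abstract does not spell out the information-theoretic reformulation. One formulation used in the authors' related work on branching entropy (see the Jin and Tanaka-Ishii and Tanaka-Ishii entries in the reference list) measures the entropy of the next symbol given the preceding sequence, H(X | prefix) = -Σ P(x | prefix) log2 P(x | prefix), and places a boundary where that entropy rises. The Python sketch below assumes exactly this formulation and is not the paper's implementation: it conditions on the whole word-initial prefix, uses letters as stand-ins for phonemes, and introduces a hypothetical end-of-sequence marker `#`. The function names `successor_counts`, `branching_entropy`, and `segment` are illustrative.

```python
import math
from collections import Counter, defaultdict

END = "#"  # hypothetical end-of-sequence marker (an assumption of this sketch)


def successor_counts(corpus):
    """For every prefix of every sequence, count which symbol follows it."""
    counts = defaultdict(Counter)
    for seq in corpus:
        symbols = list(seq) + [END]
        for i in range(len(seq) + 1):
            counts[tuple(seq[:i])][symbols[i]] += 1
    return counts


def branching_entropy(counts, prefix):
    """H(X | prefix) in bits; 0.0 for a prefix never seen in the corpus."""
    successors = counts.get(tuple(prefix))
    if not successors:
        return 0.0
    total = sum(successors.values())
    return -sum((c / total) * math.log2(c / total) for c in successors.values())


def segment(counts, seq):
    """Split seq after position i whenever entropy rises from prefix i-1 to prefix i."""
    pieces, start = [], 0
    prev = branching_entropy(counts, seq[:0])
    for i in range(1, len(seq)):
        h = branching_entropy(counts, seq[:i])
        if h > prev:  # increase in uncertainty: assumed boundary
            pieces.append(seq[start:i])
            start = i
        prev = h
    pieces.append(seq[start:])
    return pieces


corpus = ["play", "plays", "played", "playing", "player"]  # toy data
counts = successor_counts(corpus)
print(segment(counts, "played"))  # ['play', 'ed']
```

On this toy corpus the entropy stays at zero through "pla" (only one continuation is ever observed) and jumps to about 1.92 bits after "play", so "played" and "playing" are split as "play" + "ed" and "play" + "ing". Sequences may also be lists of phoneme symbols (for example, entries from a pronouncing dictionary), since the code only slices and converts them to tuples.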


References

Harris, Z.S.: From phoneme to morpheme. Language, 190–222 (1955)
Imai, K.: Dictionary of Chomsky. Taishukan (1986) (in Japanese)
Martinet, A.: Éléments de linguistique générale. Colin (1960)
Jin, Z., Tanaka-Ishii, K.: Unsupervised segmentation of Chinese text by use of branching entropy. In: COLING/ACL (2006)
Huang, H., Powers, D.: Chinese word segmentation based on contextual entropy. In: Pacific Asia Conference on Language, Information and Computation (2003)
Frantzi, T., Ananiadou, S.: Extracting nested collocations. In: 16th COLING, pp. 41–46 (1996)
Tanaka-Ishii, K., Nakagawa, H.: A multilingual usage consultation tool based on Internet searching: more than a search engine, less than QA. In: WWW Conference, pp. 363–371 (2005)
Tanaka-Ishii, K.: Entropy as an indicator of context boundaries: an experiment using a web search engine. In: Dale, R., Wong, K.-F., Su, J., Kwong, O.Y. (eds.) IJCNLP 2005. LNCS, vol. 3651, pp. 93–105. Springer, Heidelberg (2005)
Carnegie Mellon University: CMU Pronouncing Dictionary, version 0.6 (2006), http://www.speech.cs.cmu.edu/cgi-bin/cmudict (visited 2006)
SIL: PC-KIMMO version 2, a morphological parser (1995), http://www.sil.org/pckimmo/
ICL: People's Daily corpus, Peking University (1999), http://www.icl.pku.edu.cn/icl_res/
NJStar Software Corp.: NJStar, Chinese word processing software (2006), http://www.njstar.com


Author information

Authors and Affiliations

Graduate School of Information Science and Technology, University of Tokyo,
Kumiko Tanaka-Ishii & Zhihui Jin


Editor information

Editors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 630-0192, Takayama, Ikoma, Nara, Japan
Yuji Matsumoto
Dept. of ECE, University of Illinois at Urbana-Champaign, IL 61801, Urbana, USA
Richard W. Sproat
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
State Key Lab of Intelligent Tech. & Sys., Tsinghua University,
Min Zhang


About this paper

Cite this paper

Tanaka-Ishii, K., Jin, Z. (2006). From Phoneme to Morpheme: Another Verification Using a Corpus. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_25


DOI: https://doi.org/10.1007/11940098_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)
