Period Disambiguation with Maxent Model

Kit, Chunyu; Liu, Xiaoyue

doi:10.1007/11562214_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

This paper presents our recent work on period disambiguation, the core problem in sentence boundary identification, using the maximum entropy (Maxent) model. We conduct a number of experiments on the PTB-II WSJ corpus to investigate how the context window, the feature space, and lexical information such as abbreviations and sentence-initial words affect learning performance. Such lexical information can be acquired automatically from a training corpus by a learner. Our experimental results show that extending the feature space to integrate these two kinds of lexical information eliminates 93.52% of the errors remaining in the baseline Maxent model, achieving an F-score of 99.8227%.
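
To make the ingredients named in the abstract concrete, the following is a minimal Python sketch of how context-window features and the two kinds of lexical information might be encoded for one candidate period, together with the usual F-score computation. It is an illustration under stated assumptions, not the authors' implementation: the function names `period_features` and `f_score`, the feature naming scheme, the `<PAD>` symbol, and the toy word lists are all hypothetical, and the paper's actual feature templates are not given in the abstract.

```python
from typing import AbstractSet, Dict, List


def period_features(
    tokens: List[str],
    i: int,
    window: int = 2,
    abbreviations: AbstractSet[str] = frozenset(),
    sentence_initial: AbstractSet[str] = frozenset(),
) -> Dict[str, bool]:
    """Binary features for the period-bearing token at position i.

    Hypothetical encoding: one feature per word in a symmetric context
    window, plus lexical features backed by two word lists that a learner
    could collect from a training corpus.
    """
    feats: Dict[str, bool] = {}
    # Context-window features: words at offsets -window .. +window.
    for offset in range(-window, window + 1):
        j = i + offset
        word = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
        feats[f"w[{offset}]={word}"] = True
    # Lexical features: known abbreviation, known sentence-initial word.
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "<PAD>"
    feats["is_abbrev"] = tokens[i].lower() in abbreviations
    feats["next_is_sentence_initial"] = nxt.lower() in sentence_initial
    feats["next_is_capitalized"] = nxt[:1].isupper()
    return feats


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy usage with made-up word lists (assumptions, not data from the paper).
tokens = ["Mr.", "Smith", "arrived", ".", "The", "meeting", "began", "."]
abbrevs = {"mr."}
initials = {"the"}
feats_abbrev = period_features(tokens, 0, 2, abbrevs, initials)
feats_final = period_features(tokens, 3, 2, abbrevs, initials)
```

Feature dictionaries of this kind could then be passed to any maximum entropy trainer; for a two-class decision such as boundary versus non-boundary, a Maxent model has the same form as logistic regression.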

Author information

Authors and Affiliations

Department of Chinese, Translation and Linguistics, City University of Hong Kong, 83 Tat Chee Ave., Kowloon, Hong Kong
Chunyu Kit & Xiaoyue Liu

Editor information

Editors and Affiliations

Center for Language Technology, Macquarie University, 2019, Sydney, NSW, Australia
Robert Dale
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
Institute for Infocomm Research, 21, Heng Mui Keng Terrace, 119613, Singapore
Jian Su
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong

About this paper

Cite this paper

Kit, C., Liu, X. (2005). Period Disambiguation with Maxent Model. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_20

DOI: https://doi.org/10.1007/11562214_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)