Abstract
In this paper we describe an unsupervised learning algorithm for automatically training a rule-based part-of-speech tagger without using a manually tagged corpus. We compare this algorithm to the Baum-Welch algorithm, which is used for unsupervised training of stochastic taggers. We then show a method for combining unsupervised and supervised rule-based training algorithms to create a highly accurate tagger using only a small amount of manually tagged text.
Copyright information
© 1999 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Brill, E., Pop, M. (1999). Unsupervised Learning of Disambiguation Rules for Part-of-Speech Tagging. In: Armstrong, S., Church, K., Isabelle, P., Manzi, S., Tzoukermann, E., Yarowsky, D. (eds) Natural Language Processing Using Very Large Corpora. Text, Speech and Language Technology, vol 11. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2390-9_3
DOI: https://doi.org/10.1007/978-94-017-2390-9_3
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5349-7
Online ISBN: 978-94-017-2390-9
eBook Packages: Springer Book Archive