CRF Models for Tamil Part of Speech Tagging and Chunking

Pandian, S. Lakshmana; Geetha, T. V.

doi:10.1007/978-3-642-00831-3_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5459)

Included in the following conference series:

International Conference on Computer Processing of Oriental Languages

Abstract

Conditional random fields (CRFs) are a framework for building probabilistic models to segment and label sequence data. CRFs offer several advantages over hidden Markov models (HMMs) and stochastic grammars for such tasks, including the ability to relax the strong independence assumptions made in those models. CRFs also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. In this paper we present language models for part-of-speech (POS) tagging and chunking of Tamil, built using CRFs. The language models are designed around morphological information. The CRF-based POS tagger achieves an accuracy of about 89.18% for Tamil, and the chunker achieves an accuracy of 84.25% for the same language.
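
As a rough illustration of how a linear-chain model can combine morphological cues with tag-to-tag transitions, the sketch below decodes the best tag sequence with the Viterbi algorithm. It is not the authors' implementation: the suffix features, the two-tag set (`PRON`, `VERB`), the romanized example words, and all weights are assumptions chosen only for demonstration, and the code covers decoding only, not the globally normalized training that defines a CRF.

```python
def suffix_features(word, max_len=3):
    """Hypothetical morphological features: the last 1..max_len characters."""
    return [f"suf{n}={word[-n:]}" for n in range(1, max_len + 1) if len(word) >= n]


def viterbi(words, tags, emit_w, trans_w):
    """Return the highest-scoring tag sequence for a non-empty word list.

    emit_w maps (feature, tag) -> weight; trans_w maps (prev_tag, tag) -> weight.
    Missing entries count as 0.0.
    """
    def emit(word, tag):
        return sum(emit_w.get((f, tag), 0.0) for f in suffix_features(word))

    # best[i][t] = (score, path) of the best sequence ending in tag t at position i
    best = [{t: (emit(words[0], t), [t]) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, path = max(
                ((best[-1][p][0] + trans_w.get((p, t), 0.0), best[-1][p][1]) for p in tags),
                key=lambda item: item[0],
            )
            column[t] = (score + emit(word, t), path + [t])
        best.append(column)
    return max(best[-1].values(), key=lambda item: item[0])[1]


# Toy weights (assumed, not learned from any corpus).
emit_w = {("suf3=aan", "VERB"): 2.0, ("suf2=an", "PRON"): 1.0}
trans_w = {("PRON", "VERB"): 0.5}

predicted = viterbi(["avan", "vanthaan"], ["PRON", "VERB"], emit_w, trans_w)
print(predicted)
```

In a trained CRF the weights would be estimated by maximizing the conditional likelihood of whole tag sequences, which is what lets the model avoid the per-state normalization behind the bias mentioned for MEMMs.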

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Anna University, Patel Road, Chennai, 25, India
S. Lakshmana Pandian & T. V. Geetha

Editor information

Editors and Affiliations

Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Wenjie Li
Division of Information and Communication Sciences, Macquarie University, NSW 2109, Sydney, Australia
Diego Mollá-Aliod

About this paper

Cite this paper

Pandian, S.L., Geetha, T.V. (2009). CRF Models for Tamil Part of Speech Tagging and Chunking. In: Li, W., Mollá-Aliod, D. (eds) Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. ICCPOL 2009. Lecture Notes in Computer Science, vol 5459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00831-3_2

DOI: https://doi.org/10.1007/978-3-642-00831-3_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00830-6
Online ISBN: 978-3-642-00831-3
eBook Packages: Computer Science, Computer Science (R0)
