Optimal Stem Identification in Presence of Suffix List

Vasudevan, N.; Bhattacharyya, Pushpak

doi:10.1007/978-3-642-28604-9_8

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7181)

Abstract

Stemming is considered crucial in many NLP and IR applications. In the absence of any linguistic information, stemming is a challenging task. By comparison, stemming words when the suffixes of a language are available as linguistic information is an easier problem. In this work, we treat stemming as the process of obtaining a lexicon of minimum size from an unannotated corpus by using a suffix set. We prove that the exact lexicon reduction problem is NP-hard and give a polynomial-time approximation. We also propose a probabilistic model for stemming that minimizes the stem distributional entropy. The performance of these models is analyzed using an unannotated corpus and a suffix set of Malayalam, a morphologically rich language of India belonging to the Dravidian family.
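To make the lexicon-reduction framing concrete, the following is a minimal sketch, not the authors' algorithm. It assumes each word may keep itself as a stem or drop any listed suffix, and it uses a standard greedy set-cover heuristic as a stand-in for a polynomial-time approximation. The function names, the English toy words, and the suffix list are all hypothetical. The last function only computes the entropy of the resulting stem distribution; it does not implement the probabilistic model.

```python
from collections import Counter, defaultdict
from math import log2


def candidate_stems(word, suffixes):
    """Return the word itself plus every non-empty remainder after stripping a listed suffix."""
    stems = {word}
    for suffix in suffixes:
        if suffix and word.endswith(suffix) and len(word) > len(suffix):
            stems.add(word[: -len(suffix)])
    return stems


def greedy_stem_lexicon(words, suffixes):
    """Greedy set-cover heuristic (an assumption, not the paper's method):
    repeatedly pick the stem that explains the most still-unassigned words."""
    covers = defaultdict(set)
    for word in set(words):
        for stem in candidate_stems(word, suffixes):
            covers[stem].add(word)

    unassigned = set(words)
    assignment = {}
    while unassigned:
        # Ties are broken by longer stem, then alphabetically, to keep the result deterministic.
        best = max(covers, key=lambda s: (len(covers[s] & unassigned), len(s), s))
        for word in covers[best] & unassigned:
            assignment[word] = best
        unassigned -= covers[best]
    return assignment


def stem_entropy(assignment, words):
    """Shannon entropy (in bits) of the stem distribution over corpus tokens."""
    stem_counts = Counter(assignment[w] for w in words)
    total = sum(stem_counts.values())
    return -sum((c / total) * log2(c / total) for c in stem_counts.values())


if __name__ == "__main__":
    corpus = ["walk", "walks", "walked", "walking", "talk", "talks"]
    stems = greedy_stem_lexicon(corpus, ["s", "ed", "ing"])
    print(sorted(set(stems.values())))
    print(stem_entropy(stems, corpus))
```

On this toy corpus the six word forms collapse to a two-entry lexicon, `talk` and `walk`. The greedy choice is only a heuristic, and on other inputs it may return a lexicon larger than the true minimum.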

Author information

Authors and Affiliations

Computer Science and Engineering Department, IIT Bombay, Mumbai, India
N. Vasudevan & Pushpak Bhattacharyya

Editor information

Editors and Affiliations

Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Vasudevan, N., Bhattacharyya, P. (2012). Optimal Stem Identification in Presence of Suffix List. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28604-9_8

DOI: https://doi.org/10.1007/978-3-642-28604-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28603-2
Online ISBN: 978-3-642-28604-9
eBook Packages: Computer Science, Computer Science (R0)
