Hidden Markov Models with Affix Based Observation in the Field of Syntactic Analysis

Pietras, Marcin

doi:10.1007/978-3-319-48429-7_2

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 534))

Included in the following conference series:

International Multi-Conference on Advanced Computer Systems

Abstract

This paper introduces Hidden Markov Models with N-gram observations based on the bound morphemes (affixes) of words, applied to natural language text processing with a focus on syntactic classification. In general, the presented curtailment of consecutive grams to their affixes decreases the accuracy of the observation but reveals statistically significant dependencies. Hence, a considerably smaller training data set is required. The impact of affix observation on knowledge generalization, and the improved word mapping associated with it, is also described. The focal point of this paper is the evaluation of HMMs in the field of syntactic analysis for English and Polish, based on the Penn and Składnica treebanks. In total, 10 HMMs differing in the structure of their observations are compared. The experimental results show the advantages of particular configurations.
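The idea of cutting consecutive grams down to their affixes can be illustrated with a minimal sketch. The abstract does not say how affixes are extracted, so the code below assumes a fixed-length suffix (the last `k` characters) as a crude stand-in for a bound morpheme; the function names `affix` and `ngram_observations` and the parameters `k` and `n` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def affix(word: str, k: int = 3) -> str:
    """Assumed affix extraction: keep only the last k characters of the word."""
    w = word.lower()
    return w[-k:] if len(w) > k else w


def ngram_observations(tokens: List[str], n: int = 2, k: int = 3) -> List[Tuple[str, ...]]:
    """Build n-gram observations from consecutive tokens, each reduced to its affix."""
    cut = [affix(t, k) for t in tokens]
    return [tuple(cut[i:i + n]) for i in range(len(cut) - n + 1)]


tokens = "walking talking walked talked".split()
obs = ngram_observations(tokens, n=2, k=3)

# Four distinct word forms collapse into two distinct affix symbols,
# so the observation alphabet an HMM has to model becomes smaller.
word_types = len(set(tokens))
affix_types = len({affix(t) for t in tokens})
```

In this toy input the observation is less precise (`walking` and `talking` become indistinguishable), but each affix symbol is seen more often, which is the trade-off between observation accuracy and training data size that the abstract describes.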

Author information

Authors and Affiliations

Computer Science and Information Technology, West Pomeranian University of Technology, Żołnierska 49, Szczecin, Poland
Marcin Pietras

Corresponding author

Correspondence to Marcin Pietras.

Editor information

Editors and Affiliations

Graduate School of Science and Engineering, Ehime University, Ehime, Japan
Shin-ya Kobayashi
West Pomeranian University of Technology in Szczecin, Szczecin, Poland
Andrzej Piegat
West Pomeranian University of Technology in Szczecin, Szczecin, Poland
Jerzy Pejaś
West Pomeranian University of Technology in Szczecin, Szczecin, Poland
Imed El Fray
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Janusz Kacprzyk

About this paper

Cite this paper

Pietras, M. (2017). Hidden Markov Models with Affix Based Observation in the Field of Syntactic Analysis. In: Kobayashi, Sy., Piegat, A., Pejaś, J., El Fray, I., Kacprzyk, J. (eds) Hard and Soft Computing for Artificial Intelligence, Multimedia and Security. ACS 2016. Advances in Intelligent Systems and Computing, vol 534. Springer, Cham. https://doi.org/10.1007/978-3-319-48429-7_2

DOI: https://doi.org/10.1007/978-3-319-48429-7_2
Published: 20 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48428-0
Online ISBN: 978-3-319-48429-7
eBook Packages: Engineering, Engineering (R0)
