Pair Hidden Markov Model for Named Entity Matching

Nabende, Peter; Tiedemann, Jörg; Nerbonne, John

doi:10.1007/978-90-481-3658-2_87


Abstract

This paper introduces a pair Hidden Markov Model (pair-HMM) for evaluating the similarity between bilingual named entities. The pair-HMM is adapted from Mackay and Kondrak [1], who used it for cognate identification; it was later adapted by Wieling et al. [5] for Dutch dialect comparison. Unlike most named-entity similarity measurement systems, we do not first convert the input to a phonetic representation. Instead, we work with the original orthographic representation of the input data and extend the pair-HMM with a representation for diacritics and accents to accommodate pronunciation variations in the input data. We first applied the pair-HMM to measuring the similarity between named entities from languages that use the same writing system (French and English, both written in the Roman alphabet) and from languages that use different writing systems (English and Russian). The results are encouraging, and we propose extending the pair-HMM to more application-oriented named-entity recognition and generation tasks.
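The abstract does not spell out the model's equations. As a rough illustration of how a pair-HMM scores a pair of names, the Python sketch below implements the forward algorithm for the standard three-state pair-HMM (match state M, gap states X and Y) described by Durbin et al. [11], together with the log-odds score against that source's random model. It is not the authors' implementation: the class `PairHMM`, the helper `base_letter`, the parameter values (`delta`, `eps`, `tau`, `eta`) and the emission weights are all hypothetical. In practice the parameters would be estimated from training name pairs (e.g. with the Baum-Welch expectation-maximization algorithm [9, 11]) rather than set by hand. The emission weights give partial credit to letters that differ only in a diacritic, which is one possible way to represent accents; the paper's own diacritic representation may differ. Because these toy emissions rely on shared letters, they only make sense for same-script pairs such as French-English; a cross-script pair such as English-Russian would need emission probabilities learned from data.

```python
import math
import unicodedata
from itertools import product


def base_letter(ch: str) -> str:
    """Hypothetical helper: drop diacritics, e.g. 'é' -> 'e' (NFD splits off the accent)."""
    return unicodedata.normalize("NFD", ch)[0]


class PairHMM:
    """Toy three-state pair-HMM (match M, gap X, gap Y) after Durbin et al. [11].

    All parameter values are illustrative assumptions, not trained estimates.
    """

    def __init__(self, alphabet: str, delta: float = 0.1, eps: float = 0.3,
                 tau: float = 0.05, eta: float = 0.1,
                 w_same: float = 20.0, w_accent: float = 10.0, w_other: float = 1.0):
        # Transitions out of M (1 - 2*delta - tau) and X/Y (1 - eps - tau) must be positive.
        assert 1 - 2 * delta - tau > 0 and 1 - eps - tau > 0
        self.symbols = sorted(set(alphabet))
        self.delta, self.eps, self.tau, self.eta = delta, eps, tau, eta
        self.w = (w_same, w_accent, w_other)
        # Normalise so that the match emissions p(a, b) sum to 1 over the alphabet.
        self.z = sum(self._weight(a, b) for a, b in product(self.symbols, repeat=2))
        self.q = 1.0 / len(self.symbols)  # uniform gap emission q(a)

    def _weight(self, a: str, b: str) -> float:
        # Identical letters > same base letter with a different accent > unrelated letters.
        if a == b:
            return self.w[0]
        if base_letter(a) == base_letter(b):
            return self.w[1]
        return self.w[2]

    def p(self, a: str, b: str) -> float:
        """Match-state emission probability p(a, b)."""
        return self._weight(a, b) / self.z

    def _prepare(self, s: str) -> str:
        # Lower-case and compose accents (NFC) so 'é' is a single symbol.
        s = unicodedata.normalize("NFC", s.lower())
        unknown = set(s) - set(self.symbols)
        if unknown:
            raise ValueError(f"symbols not in alphabet: {sorted(unknown)}")
        return s

    def forward(self, x: str, y: str) -> float:
        """P(x, y | pair-HMM), summed over all alignments (forward algorithm).

        Plain probabilities are fine for short names; long strings would need
        log-space arithmetic or scaling to avoid underflow.
        """
        x, y = self._prepare(x), self._prepare(y)
        d, e, t = self.delta, self.eps, self.tau
        n, m = len(x), len(y)
        fM = [[0.0] * (m + 1) for _ in range(n + 1)]
        fX = [[0.0] * (m + 1) for _ in range(n + 1)]
        fY = [[0.0] * (m + 1) for _ in range(n + 1)]
        fM[0][0] = 1.0  # the begin state is treated like M
        for i in range(n + 1):
            for j in range(m + 1):
                if i > 0 and j > 0:  # M emits the aligned pair (x_i, y_j)
                    fM[i][j] = self.p(x[i - 1], y[j - 1]) * (
                        (1 - 2 * d - t) * fM[i - 1][j - 1]
                        + (1 - e - t) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
                if i > 0:  # X emits x_i against a gap
                    fX[i][j] = self.q * (d * fM[i - 1][j] + e * fX[i - 1][j])
                if j > 0:  # Y emits y_j against a gap
                    fY[i][j] = self.q * (d * fM[i][j - 1] + e * fY[i][j - 1])
        return t * (fM[n][m] + fX[n][m] + fY[n][m])

    def log_odds(self, x: str, y: str) -> float:
        """log P(x, y | pair-HMM) - log P(x, y | random model)."""
        n, m = len(self._prepare(x)), len(self._prepare(y))
        # Random model: eta^2 * (1 - eta)^(n+m) * prod q(x_i) * prod q(y_j).
        log_random = (2 * math.log(self.eta) + (n + m) * math.log(1 - self.eta)
                      + (n + m) * math.log(self.q))
        return math.log(self.forward(x, y)) - log_random
```

Under these toy settings, with an alphabet of the 26 Latin letters plus common French accented letters, an accent variant such as ("andré", "andre") receives a clearly positive log-odds score, while an unrelated pair such as ("andré", "zhukov") scores much lower. Scoring against a random model, rather than using the raw forward probability, helps offset the fact that forward probabilities shrink as the strings get longer.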


References

[1] W. Mackay and G. Kondrak, “Computing Word Similarity and Identifying Cognates with Pair Hidden Markov Models,” Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL), pp. 40-47, Ann Arbor, Michigan, 2005.
[2] W. Lam, S-K. Chan and R. Huang, “Named Entity Translation Matching and Learning: With Application for Mining Unseen Translations,” ACM Transactions on Information Systems, vol. 25, issue 1, article 2, 2007.
[3] N. Chinchor, “MUC-7 Named Entity Task Definition,” Proceedings of the 7th Message Understanding Conference (MUC-7), 1998.
[4] C-J. Lee, J.S. Chang and J-S.R. Jang, “Alignment of Bilingual Named Entities in Parallel Corpora Using Statistical Models and Multiple Knowledge Sources,” ACM Transactions on Asian Language Information Processing, vol. 5, issue 2, pp. 121-145, 2006.
[5] M. Wieling, T. Leinonen and J. Nerbonne, “Inducing Sound Segment Differences using Pair Hidden Markov Models,” in J. Nerbonne, M. Ellison and G. Kondrak (eds.), Computing and Historical Phonology: 9th Meeting of the ACL Special Interest Group for Computational Morphology and Phonology, pp. 48-56, Prague, 2007.
[6] C-C. Hsu, C-H. Chen, T-T. Shih and C-K. Chen, “Measuring Similarity between Transliterations against Noise and Data,” ACM Transactions on Asian Language Information Processing, vol. 6, issue 2, article 5, 2007.
[7] C.M. Grinstead and J.L. Snell, Introduction to Probability, 2nd Edition, AMS, 1997.
[8] B. Pouliquen, R. Steinberger, C. Ignat, I. Temnikova, A. Widiger, W. Zaghouani and J. Žižka, “Multilingual Person Name Recognition and Transliteration,” Revue CORELA: Cognition, Représentation, Langage, 2005.
[9] L.R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, vol. 77, issue 2, pp. 257-286, 1989.
[10] W. Mackay, Word Similarity using Pair Hidden Markov Models, Master's thesis, University of Alberta, 2004.
[11] R. Durbin, S.R. Eddy, A. Krogh and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.
[12] A. Arribas-Gil, E. Gassiat and C. Matias, “Parameter Estimation in Pair-hidden Markov Models,” Scandinavian Journal of Statistics, vol. 33, issue 4, pp. 651-671, 2006.
[13] E.M. Voorhees and D.M. Tice, “The TREC-8 Question Answering Track Report,” Proceedings of the Eighth Text REtrieval Conference (TREC-8), 2000.
[14] D. Jurafsky and J.H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd Edition, Pearson Prentice Hall, 2009.
[15] C-J. Lee, J.S. Chang and J-S.R. Jang, “A Statistical Approach to Chinese-to-English Back Transliteration,” Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation (PACLIC 17), 2003.
[16] G. Kondrak and T. Sherif, “Evaluation of Several Phonetic Similarity Algorithms on the Task of Cognate Identification,” Proceedings of the Workshop on Linguistic Distances, pp. 43-50, Association for Computational Linguistics, Sydney, 2006.
[17] D. Durand and R. Hoberman, HMM Lecture Notes, Carnegie Mellon University, School of Computer Science. Retrieved from http://www.cs.cmu.edu/~durand/03711/Lectures/hmm3-05.pdf on 14th Oct. 2008.
[18] S. Bergsma and G. Kondrak, “Alignment-Based Discriminative String Similarity,” Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 656-663, Czech Republic, June 2007.


Author information

Authors and Affiliations

Department of Computational Linguistics, Center for Language and Cognition Groningen, University of Groningen, Groningen, Netherlands
Peter Nabende, Jörg Tiedemann & John Nerbonne


Corresponding author

Correspondence to Peter Nabende.

Editor information

Editors and Affiliations

School of Engineering, University of Bridgeport, University Avenue 221, Bridgeport, 06604, U.S.A.
Tarek Sobh


About this paper

Cite this paper

Nabende, P., Tiedemann, J., Nerbonne, J. (2010). Pair Hidden Markov Model for Named Entity Matching. In: Sobh, T. (eds) Innovations and Advances in Computer Sciences and Engineering. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3658-2_87


DOI: https://doi.org/10.1007/978-90-481-3658-2_87
Published: 28 December 2009
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-3657-5
Online ISBN: 978-90-481-3658-2
eBook Packages: Engineering, Engineering (R0)
