Abstract
This paper introduces a pair-Hidden Markov Model (pair-HMM) for the task of evaluating the similarity between bilingual named entities. The pair-HMM is adapted from Mackay and Kondrak [1], who used it for cognate identification; it was later adapted by Wieling et al. [5] for Dutch dialect comparison. When using the pair-HMM to evaluate named entities, we skip the phonetic representation step that most named-entity similarity measurement systems rely on. We instead work with the original orthographic representation of the input data and introduce into the pair-HMM a representation for diacritics or accents to accommodate pronunciation variations in the input data. We first adapted the pair-HMM to measure the similarity between named entities from languages that use the same writing system (French and English, both written in the Roman alphabet) and from languages that use different writing systems (English and Russian). The results are encouraging, and we propose to extend the pair-HMM to more application-oriented named-entity recognition and generation tasks.
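To make the idea of scoring a name pair with a pair-HMM concrete, the following is a minimal sketch of the forward algorithm for a generic three-state pair-HMM (a substitution state M and two gap states X and Y), in the textbook formulation of Durbin et al. It is not the model described in this paper: the transition values `delta`, `epsilon` and `tau`, the uniform emission tables, the 26-letter alphabet and all function names are illustrative assumptions, and no handling of diacritics is included.

```python
import string

# Assumption: a toy 26-letter alphabet; the paper's alphabets (with
# diacritics, or Cyrillic) would need their own emission tables.
ALPHABET = string.ascii_lowercase
K = len(ALPHABET)


def pair_emission(a, b, match_mass=0.9):
    """Toy emission probability for state M emitting the pair (a, b).

    Identical symbols share `match_mass`; all other pairs share the rest.
    The values sum to 1 over all K * K symbol pairs.
    """
    if a == b:
        return match_mass / K
    return (1.0 - match_mass) / (K * (K - 1))


def gap_emission(a):
    """Toy uniform emission probability for the gap states X and Y."""
    return 1.0 / K


def forward_probability(x, y, delta=0.1, epsilon=0.2, tau=0.05):
    """Sum the probabilities of all alignments of x and y (forward algorithm).

    delta: M -> X and M -> Y (gap open), epsilon: X -> X and Y -> Y
    (gap extend), tau: any state -> end. All three are made-up values.
    """
    n, m = len(x), len(y)
    f_m = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_x = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_y = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_m[0][0] = 1.0  # the begin state is treated like M

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                # M consumes one symbol from each string.
                f_m[i][j] = pair_emission(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta - tau) * f_m[i - 1][j - 1]
                    + (1 - epsilon - tau)
                    * (f_x[i - 1][j - 1] + f_y[i - 1][j - 1])
                )
            if i > 0:
                # X consumes one symbol from x only.
                f_x[i][j] = gap_emission(x[i - 1]) * (
                    delta * f_m[i - 1][j] + epsilon * f_x[i - 1][j]
                )
            if j > 0:
                # Y consumes one symbol from y only.
                f_y[i][j] = gap_emission(y[j - 1]) * (
                    delta * f_m[i][j - 1] + epsilon * f_y[i][j - 1]
                )

    return tau * (f_m[n][m] + f_x[n][m] + f_y[n][m])


if __name__ == "__main__":
    for pair in [("peter", "peter"), ("peter", "pierre"), ("peter", "qxzwv")]:
        print(pair, forward_probability(*pair))
```

Raw forward probabilities shrink as the strings get longer, so in practice such scores are usually normalised or compared against a random model before being used as a similarity measure; that step is omitted here.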
References
W. Mackay and G. Kondrak, “Computing Word Similarity and Identifying Cognates with Pair Hidden Markov Models,” Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL), pp. 40-47, Ann Arbor, Michigan, 2005.
W. Lam, S-K. Chan and R. Huang, “Named Entity Translation Matching and Learning: With Application for Mining Unseen Translations,” ACM Transactions on Information Systems, vol. 25, issue 1, article 2, 2007.
N. Chinchor, “MUC-7 Named Entity Task Definition,” Proceedings of the 7th Message Understanding Conference (MUC-7), 2007.
C-J. Lee, J.S. Chang and J-S.R. Juang, “Alignment of Bilingual Named Entities in Parallel Corpora Using Statistical Models and Multiple Knowledge Sources,” ACM Transactions on Asian Language Information Processing, vol. 5, issue 2, 2006, pp. 121-145.
M. Wieling, T. Leinonen and J. Nerbonne, “Inducing Sound Segment Differences using Pair Hidden Markov Models,” in J. Nerbonne, M. Ellison and G. Kondrak (eds.), Computing and Historical Phonology: 9th Meeting of ACL Special Interest Group for Computational Morphology and Phonology Workshop, Prague, pp. 48-56, 2007.
C-C. Hsu, C-H. Chen, T-T. Shih and C-K. Chen, “Measuring Similarity between Transliterations against Noise and Data,” ACM Transactions on Asian Language Information Processing, vol. 6, issue 2, article 5, 2005.
C.M. Grinstead and J.L. Snell, Introduction to Probability, 2nd Edition, AMS, 1997.
B. Pouliquen, R. Steinberger, C. Ignat, I. Temnikova, A. Widiger, W. Zaghouani and J. Žižka, “Multilingual Person Name Recognition and Transliteration,” Revue CORELA, Cognition, Représentation, Langage, 2005.
L.R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, vol. 77, issue 2, pp. 257-286, 1989.
W. Mackay, Word Similarity using Pair Hidden Markov Models, Masters Thesis, University of Alberta, 2004.
R. Durbin, S.R. Eddy, A. Krogh and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Protein and Nucleic Acids. Cambridge University Press, 1998.
A. Arribas-Gil, E. Gassiat and C. Matias, “Parameter Estimation in Pair-hidden Markov Models,” Scandinavian Journal of Statistics, vol. 33, issue 4, pp. 651-671, 2006.
E.M. Voorhees and D.M. Tice. The TREC-8 Question Answering Track Report. In English Text Retrieval Conference (TREC-8), 2000.
D. Jurafsky and H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd Edition, Pearson Edn Inc., Prentice Hall, 2009.
C-J. Lee, J.S. Chang and J-S.R. Juang. A Statistical Approach to Chinese-to-English Back Transliteration. In Proceedings of the 17th Pacific Asia Conference, 2003.
G. Kondrak and T. Sherif. Evaluation of Several Phonetic Algorithms on the Task of Cognate Identification. In Proceedings of the Workshop on Linguistic Distances, pages 43-50, Association for Computational Linguistics, Sydney, 2006.
D. Durand and R. Hoberman. HMM Lecture Notes, Carnegie Mellon School of Computer Science. Retrieved from http://www.cs.cmu.edu/~durand/03711/Lectures/hmm3-05.pdf on 14th Oct. 2008.
S. Bergsma and G. Kondrak, “Alignment-Based Discriminative String Similarity,” in Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 656-663, Czech Republic, June 2007.
© 2010 Springer Science+Business Media B.V.
About this paper
Cite this paper
Nabende, P., Tiedemann, J., Nerbonne, J. (2010). Pair Hidden Markov Model for Named Entity Matching. In: Sobh, T. (eds) Innovations and Advances in Computer Sciences and Engineering. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3658-2_87
DOI: https://doi.org/10.1007/978-90-481-3658-2_87
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-3657-5
Online ISBN: 978-90-481-3658-2
eBook Packages: Engineering, Engineering (R0)