Title Matching for Finding Identical Metadata Records in Different Languages

  • Yuting Song
  • Biligsaikhan Batjargal
  • Akira Maeda
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1057)

Abstract

This paper proposes a title matching method for finding identical metadata records across multiple databases in different languages. To overcome the language barrier, we represent the words in titles with bilingual word embeddings, which allow word similarity to be measured across languages. The proposed method can be used to link or integrate databases in different languages. We evaluate its effectiveness on Japanese and English ukiyo-e print databases and compare its performance with that of a method that relies on machine translation.
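As a rough illustration of matching titles through a shared embedding space, the following sketch compares a Japanese title with English candidate titles. It is not the authors' implementation: the abstract does not say how word vectors are combined into a title representation, so averaging them and ranking by cosine similarity is an assumption made here. The toy vectors, the record identifiers, and the function names `title_vector`, `cosine`, and `best_match` are all hypothetical.

```python
import math

# Hypothetical toy vectors standing in for bilingual word embeddings:
# Japanese and English words are assumed to share one vector space.
EMBEDDINGS = {
    "富士": [0.90, 0.10, 0.00],
    "山": [0.80, 0.20, 0.10],
    "桜": [0.00, 0.90, 0.10],
    "fuji": [0.88, 0.12, 0.02],
    "mountain": [0.79, 0.22, 0.08],
    "cherry": [0.02, 0.88, 0.12],
    "blossoms": [0.05, 0.90, 0.10],
}


def title_vector(tokens):
    """Average the vectors of known tokens (an assumed aggregation)."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None  # no token of this title is in the vocabulary
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query_tokens, candidates):
    """Return (record_id, score) of the most similar candidate title."""
    query = title_vector(query_tokens)
    best_id, best_score = None, 0.0
    if query is None:
        return best_id, best_score
    for record_id, tokens in candidates.items():
        vector = title_vector(tokens)
        if vector is None:
            continue
        score = cosine(query, vector)
        if best_id is None or score > best_score:
            best_id, best_score = record_id, score
    return best_id, best_score


# Hypothetical English records keyed by made-up identifiers.
english_titles = {
    "en-001": ["fuji", "mountain"],
    "en-002": ["cherry", "blossoms"],
}
match_id, match_score = best_match(["富士", "山"], english_titles)
print(match_id, round(match_score, 3))
```

In practice the vectors would come from bilingual embeddings trained or aligned on real corpora, and the titles would need language-specific tokenization first; a similarity threshold could then decide whether the best candidate counts as an identical record.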

Keywords

Title matching; Digital cultural collections; Cross-language record linkage

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Research Organization of Science and Technology, Ritsumeikan University, Kusatsu, Japan
  2. Kinugasa Research Organization, Ritsumeikan University, Kyoto, Japan
  3. College of Information Science and Engineering, Ritsumeikan University, Kusatsu, Japan
