Heuristic-Based Configuration Learning for Linked Data Instance Matching

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9544)

Abstract

Instance matching in linked data has become increasingly important with the rapid growth of linked data. The goal of instance matching is to detect co-referent instances, that is, instances that refer to the same real-world object. To detect such instances, instance matching systems use a configuration, which specifies the matching properties, similarity measures, and other settings of the matching process. The configuration must vary across repositories to adapt to the particular characteristics of the input, so automating configuration creation is very important. In this paper, we propose \(cLink\), a supervised instance matching system for linked data. \(cLink\) is enhanced by a heuristic algorithm that learns the optimal configuration on the basis of the input repositories. We show that \(cLink\) achieves effective performance even when given only a small amount of training data. Compared to previous configuration learning algorithms, our algorithm significantly improves the results. Compared to recent supervised systems, \(cLink\) is also consistently better on all tested datasets.
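
To make the notion of a configuration concrete, the sketch below treats it as a list of (source property, target property, similarity measure) triples plus a decision threshold, and selects triples greedily by F1 on a few labeled pairs. The abstract does not describe the actual \(cLink\) heuristic, so this is only an illustration under assumptions: the names `Spec`, `MEASURES`, and `learn_configuration`, the two similarity measures, the averaging of scores, the fixed threshold, and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Instance = Dict[str, str]  # property name -> literal value (toy model)


def exact(a: str, b: str) -> float:
    """1.0 when the two values are equal ignoring case and outer spaces."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


# Hypothetical pool of similarity measures a configuration may pick from.
MEASURES = {"exact": exact, "jaccard": token_jaccard}


@dataclass(frozen=True)
class Spec:
    """One configuration entry: which properties to compare, and how."""
    source_prop: str
    target_prop: str
    measure_name: str


def score(specs: List[Spec], src: Instance, tgt: Instance) -> float:
    """Average similarity over all entries (an assumed aggregation)."""
    if not specs:
        return 0.0
    values = [
        MEASURES[s.measure_name](src.get(s.source_prop, ""),
                                 tgt.get(s.target_prop, ""))
        for s in specs
    ]
    return sum(values) / len(values)


def f1(specs: List[Spec], threshold: float,
       labeled: List[Tuple[Instance, Instance, bool]]) -> float:
    """F1 of thresholded scores against the labeled pairs."""
    tp = fp = fn = 0
    for src, tgt, is_match in labeled:
        predicted = score(specs, src, tgt) >= threshold
        if predicted and is_match:
            tp += 1
        elif predicted:
            fp += 1
        elif is_match:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def learn_configuration(candidates: List[Spec],
                        labeled: List[Tuple[Instance, Instance, bool]],
                        threshold: float = 0.5) -> Tuple[List[Spec], float]:
    """Greedy forward selection: keep adding the entry that raises F1 most."""
    chosen: List[Spec] = []
    best = 0.0
    improved = True
    while improved:
        improved = False
        best_spec = None
        for spec in candidates:
            if spec in chosen:
                continue
            trial = f1(chosen + [spec], threshold, labeled)
            if trial > best:
                best, best_spec, improved = trial, spec, True
        if best_spec is not None:
            chosen.append(best_spec)
    return chosen, best


# Toy labeled pairs (invented for illustration only).
labeled = [
    ({"name": "Tokyo Tower", "year": "1958"},
     {"label": "tokyo tower", "built": "1958"}, True),
    ({"name": "Kyoto Station", "year": "1997"},
     {"label": "Kyoto Station Building", "built": "1997"}, True),
    ({"name": "Osaka Castle", "year": "1583"},
     {"label": "Osaka Aquarium", "built": "1990"}, False),
    ({"name": "Tokyo Dome", "year": "1988"},
     {"label": "Seikan Tunnel", "built": "1988"}, False),
]
candidates = [
    Spec("name", "label", "exact"),
    Spec("name", "label", "jaccard"),
    Spec("year", "built", "exact"),
]
chosen, best_f1 = learn_configuration(candidates, labeled)
```

On this toy data the search keeps the token-overlap comparison of `name` and `label`, since it alone separates the labeled matches from the non-matches. A real system would also tune the threshold and the aggregation, which this sketch holds fixed.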

Keywords

Instance matching, Schema-independent, Supervised, Linked data

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. The Graduate University for Advanced Studies, Hayama, Japan
  2. National Institute of Informatics, Tokyo, Japan
