Wombat – A Generalization Approach for Automatic Link Discovery

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10249)


A significant portion of the evolution of Linked Data datasets lies in updating the links to other datasets. An important challenge when updating these links automatically under the open-world assumption is that usually only positive examples of the links exist. We address this challenge by presenting and evaluating Wombat, a novel approach for the discovery of links between knowledge bases that relies exclusively on positive examples. Wombat is based on generalisation: it uses an upward refinement operator to traverse the space of link specifications. We study the theoretical characteristics of Wombat and evaluate it on eight different benchmark datasets. Our evaluation suggests that Wombat outperforms state-of-the-art supervised approaches while relying on less information. Moreover, our evaluation suggests that Wombat’s pruning algorithm allows it to scale well even on large datasets.
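
The idea of generalising from positive examples alone can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that an atomic link specification is a (property, threshold) pair over a token-level Jaccard similarity, that a refinement step generalises the current specification by taking the union with one more atomic specification, and that candidates are scored by F-measure against the positive examples only. The function names, the toy records and the greedy search are all hypothetical, and the pruning mentioned in the abstract is not modelled.

```python
from itertools import product


def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def f_measure(links, positives):
    """F1 of a link set, measured against the positive examples only."""
    tp = len(links & positives)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(links), tp / len(positives)
    return 2 * precision * recall / (precision + recall)


def atomic_links(source, target, prop, threshold):
    """Links produced by one atomic specification: sim(prop) >= threshold."""
    return frozenset(
        (s, t)
        for (s, sv), (t, tv) in product(source.items(), target.items())
        if jaccard(sv[prop], tv[prop]) >= threshold
    )


def generalise(source, target, positives, props, thresholds):
    """Greedy upward search (an assumption of this sketch): start from the
    empty link set and keep adding the atomic specification whose union
    with the current link set improves F1 the most."""
    atoms = {
        (p, th): atomic_links(source, target, p, th)
        for p in props
        for th in thresholds
    }
    current, spec, best = frozenset(), [], 0.0
    while True:
        candidates = [
            (f_measure(current | links, positives), key)
            for key, links in atoms.items()
            if key not in spec
        ]
        if not candidates:
            break
        score, key = max(candidates)
        if score <= best:  # no strict improvement: stop generalising
            break
        current, best = current | atoms[key], score
        spec.append(key)
    return spec, current, best


# Hypothetical toy records; each positive example is a (source, target) pair.
source = {
    "s1": {"name": "Leipzig University", "city": "Leipzig"},
    "s2": {"name": "Paderborn University", "city": "Paderborn"},
}
target = {
    "t1": {"name": "University Leipzig", "city": "Lipsia"},
    "t2": {"name": "UPB", "city": "Paderborn"},
}
positives = {("s1", "t1"), ("s2", "t2")}

spec, links, score = generalise(
    source, target, positives, props=["name", "city"], thresholds=[0.5, 1.0]
)
```

In this toy setting neither property covers both positive examples on its own, so the search has to combine a name-based and a city-based atomic specification. Each step only enlarges the set of links, which is the sense in which the search moves upward in this sketch.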



This work has been supported by H2020 projects SLIPO (GA no. 731581) and HOBBIT (GA no. 688227) as well as the DFG project LinkingLOD (project no. NG 105/3-2) and the BMWI Project GEISER (project no. 01MD16014).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. R&D Department II, Computing Center, University of Leipzig, Leipzig, Germany
  2. Data Science Group, University of Paderborn, Paderborn, Germany
  3. Computer Science Institute, University of Bonn, Bonn, Germany
  4. Fraunhofer IAIS, Schloss Birlinghoven, Sankt Augustin, Germany
