
Dynamic Planning for Link Discovery

  • Kleanthi Georgala
  • Daniel Obraczka
  • Axel-Cyrille Ngonga Ngomo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10843)

Abstract

With the growing number and size of RDF datasets comes an increasing need for scalable solutions for linking resources. Most Link Discovery frameworks rely on complex link specifications for this purpose. We address the scalability of executing link specifications by presenting Condor, the first dynamic planning approach for Link Discovery. In contrast to the state of the art, Condor can re-evaluate and reshape execution plans for link specifications during their execution. As a result, it achieves significantly better runtimes than existing planning solutions while retaining an F-measure of 100%. We quantify this improvement by evaluating our approach on 7 datasets and 700 link specifications. Our results suggest that Condor is up to 2 orders of magnitude faster than the state of the art and requires less than 0.1% of the total runtime of a given specification to generate the corresponding plan.
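
The core idea of re-planning during execution can be illustrated with a toy executor for a conjunction (AND) of two atomic link specifications. This is a minimal sketch, not Condor's algorithm: the classes `Atomic` and `And`, the measures `jaccard` and `length_ratio`, the `est_cost` estimates, and the re-planning rule are all hypothetical simplifications, and the number of similarity computations stands in for runtime. The executor first runs the child with the lower estimated cost, then uses the observed size of that intermediate result to decide whether to filter it with the second measure or to run the second measure in full and intersect.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

Pair = Tuple[str, str]
Data = Dict[str, str]  # resource id -> property value


def jaccard(a: str, b: str) -> float:
    # Token-based Jaccard similarity (hypothetical example measure).
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 0.0


def length_ratio(a: str, b: str) -> float:
    # Cheap hypothetical measure: shorter length divided by longer length.
    return min(len(a), len(b)) / max(len(a), len(b)) if a and b else 0.0


@dataclass
class Atomic:
    """Atomic spec: keep pairs whose similarity reaches the threshold."""
    sim: Callable[[str, str], float]
    threshold: float
    est_cost: float  # hypothetical a-priori cost estimate, in comparisons


@dataclass
class And:
    """Conjunction of two atomic specs (intersection of their mappings)."""
    left: Atomic
    right: Atomic


class Executor:
    def __init__(self, source: Data, target: Data):
        self.source, self.target = source, target
        self.comparisons = 0  # similarity computations, a proxy for runtime

    def run(self, spec: Atomic) -> Set[Pair]:
        # Naive execution: compare every source value with every target value.
        result = set()
        for s, sv in self.source.items():
            for t, tv in self.target.items():
                self.comparisons += 1
                if spec.sim(sv, tv) >= spec.threshold:
                    result.add((s, t))
        return result

    def filter(self, mapping: Set[Pair], spec: Atomic) -> Set[Pair]:
        # Evaluate spec only on pairs that survived the previous step.
        kept = set()
        for s, t in mapping:
            self.comparisons += 1
            if spec.sim(self.source[s], self.target[t]) >= spec.threshold:
                kept.add((s, t))
        return kept

    def execute_static(self, spec: And) -> Set[Pair]:
        # Fixed baseline plan that never re-plans: run both children, intersect.
        return self.run(spec.left) & self.run(spec.right)

    def execute_dynamic(self, spec: And) -> Set[Pair]:
        # Initial plan from estimates: start with the cheaper child.
        first, second = sorted((spec.left, spec.right), key=lambda a: a.est_cost)
        mapping = self.run(first)
        # Re-plan with the observed intermediate size: filtering costs one
        # comparison per surviving pair, running `second` costs its estimate.
        if len(mapping) <= second.est_cost:
            return self.filter(mapping, second)
        return mapping & self.run(second)


if __name__ == "__main__":
    source = {"s1": "leipzig university", "s2": "paderborn university", "s3": "berlin zoo"}
    target = {"t1": "university leipzig", "t2": "paderborn university", "t3": "zoo berlin"}
    spec = And(Atomic(jaccard, 0.5, est_cost=9.0), Atomic(length_ratio, 0.95, est_cost=9.0))
    dynamic, static = Executor(source, target), Executor(source, target)
    print(sorted(dynamic.execute_dynamic(spec)), dynamic.comparisons)
    print(sorted(static.execute_static(spec)), static.comparisons)
```

On this toy input both plans return the same three links, so re-planning changes only the cost, not the result. The dynamic plan needs 12 similarity computations against 18 for the fixed plan, because the observed intermediate result (3 pairs) turns out to be far smaller than the estimated cost of running the second measure over all 9 pairs.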

Acknowledgments

This work has been supported by the H2020 projects SLIPO (GA no. 731581) and HOBBIT (GA no. 688227), the DFG project LinkingLOD (project no. NG 105/3-2), and the BMWi projects SAKE (project no. 01MD15006E) and GEISER (project no. 01MD16014).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. AKSW Research Group, University of Leipzig, Leipzig, Germany
  2. Data Science Group, Paderborn University, Paderborn, Germany