Extended Semantic Web Conference

ESWC 2012: The Semantic Web: Research and Applications, pp 149–163

EAGLE: Efficient Active Learning of Link Specifications Using Genetic Programming

  • Axel-Cyrille Ngonga Ngomo
  • Klaus Lyko

Part of the Lecture Notes in Computer Science book series (LNISA, volume 7295)

Abstract

With the growth of the Linked Data Web, time-efficient approaches for computing links between data sources have become indispensable. Most Link Discovery frameworks implement approaches that require two main computational steps. First, a link specification must be stated explicitly by the user. Then, this specification must be executed. While several approaches for the time-efficient execution of link specifications have been developed over the last few years, the discovery of accurate link specifications remains a tedious problem. In this paper, we present EAGLE, an active learning approach based on genetic programming. EAGLE generates highly accurate link specifications while reducing the annotation burden for the user. We evaluate EAGLE against batch learning on three different data sets and show that our algorithm can detect specifications with an F-measure above 90% while requiring only a small number of questions.
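
To make the two steps and the evaluation metric concrete, the following is a minimal illustrative sketch in Python. It is not EAGLE and does not use the API of any Link Discovery framework. It assumes that a link specification can be reduced to a single similarity measure plus a threshold, and that the F-measure is the usual harmonic mean of precision and recall. All function names, the trigram similarity, and the data layout are hypothetical choices made for this example.

```python
from itertools import product


def trigrams(text):
    """Return the set of character trigrams of a lower-cased, padded string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity of the two trigram sets, a value in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def execute_spec(source, target, threshold):
    """Execute a toy link specification (hypothetical form: one measure, one threshold).

    source and target map resource ids to labels. The result is the set of
    (source_id, target_id) pairs whose label similarity reaches the threshold.
    This naive version compares every pair; it makes no attempt at the
    time-efficient execution the abstract refers to.
    """
    return {
        (s_id, t_id)
        for (s_id, s_label), (t_id, t_label) in product(source.items(), target.items())
        if trigram_similarity(s_label, t_label) >= threshold
    }


def f_measure(predicted, reference):
    """Harmonic mean of precision and recall of predicted links against reference links."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)
```

In this simplified picture, learning a link specification means searching for the measure and threshold that maximize `f_measure` against links confirmed by the user, and active learning means choosing which candidate links to ask the user about.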

Keywords

  • Active Learning
  • Genetic Program
  • Record Linkage
  • Entity Resolution
  • Fuzzy Decision Tree

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Author information

Authors and Affiliations

  1. Department of Computer Science, University of Leipzig, Johannisgasse 26, 04103, Leipzig, Germany

    Axel-Cyrille Ngonga Ngomo & Klaus Lyko

Editor information

Editors and Affiliations

  1. Institute AIFB, Karlsruhe Institute of Technology, Englerstrasse 11, 76131, Karlsruhe, Germany

    Elena Simperl

  2. CITEC, University of Bielefeld, Morgenbreede 39, 33615, Bielefeld, Germany

    Philipp Cimiano

  3. Siemens AG Österreich, Siemensstrasse 90, 1210, Vienna, Austria

    Axel Polleres

  4. Technical University of Madrid, C/ Severo Ochoa, 13, 28660, Boadilla del Monte, Madrid, Spain

    Oscar Corcho

  5. STLab, ISTC-CNR, Via Nomentana 56, 00161, Rome, Italy

    Valentina Presutti

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ngonga Ngomo, AC., Lyko, K. (2012). EAGLE: Efficient Active Learning of Link Specifications Using Genetic Programming. In: Simperl, E., Cimiano, P., Polleres, A., Corcho, O., Presutti, V. (eds) The Semantic Web: Research and Applications. ESWC 2012. Lecture Notes in Computer Science, vol 7295. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30284-8_17

  • DOI: https://doi.org/10.1007/978-3-642-30284-8_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30283-1

  • Online ISBN: 978-3-642-30284-8

  • eBook Packages: Computer Science (R0)
