Extended Semantic Web Conference

ESWC 2012: The Semantic Web: Research and Applications, pp 119–133

Unsupervised Learning of Link Discovery Configuration

  • Andriy Nikolov,
  • Mathieu d’Aquin &
  • Enrico Motta
  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 7295)

Abstract

Discovering links between overlapping datasets on the Web is generally realised through the use of fuzzy similarity measures. Configuring such measures is often a non-trivial task that depends on the domain, the ontological schemas, and the formatting conventions of the data. Existing solutions rely either on the user’s knowledge of the data and the domain or on machine learning to discover these parameters from training data. In this paper, we present a novel approach to data linking which relies on the unsupervised discovery of the required similarity parameters. Instead of using labeled data, the method takes into account several desired properties which the distribution of output similarity values should satisfy. The method incorporates these properties into a fitness criterion used in a genetic algorithm to establish similarity parameters that maximise the quality of the resulting linkset according to the considered properties. We show in experiments using benchmarks as well as real-world datasets that such an unsupervised method can reach the same levels of performance as manually engineered methods. We also show how the different parameters of the genetic algorithm and the fitness criterion affect the results for different datasets.
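
The general idea of searching for similarity parameters with a genetic algorithm and a label-free fitness criterion can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy records, the two similarity measures, the function names, and the specific fitness are all assumptions made for illustration: the fitness here rewards linksets in which each source record has at most one link (`pseudo_precision`) and in which many source records are linked (`pseudo_recall`), which is only one example of a property that output links might be expected to satisfy.

```python
import random
from difflib import SequenceMatcher

# Toy records: hypothetical data for illustration, not taken from the paper.
SOURCE = ["Andriy Nikolov", "Mathieu d'Aquin", "Enrico Motta", "Elena Simperl"]
TARGET = ["Nikolov, Andriy", "M. d'Aquin", "Motta, Enrico", "Philipp Cimiano"]

def token_jaccard(a, b):
    """Jaccard overlap of lower-cased word sets."""
    ta = set(a.lower().replace(",", " ").split())
    tb = set(b.lower().replace(",", " ").split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def char_ratio(a, b):
    """Character-level similarity from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def links(candidate):
    """All (source, target) pairs whose weighted similarity passes the threshold."""
    weight, threshold = candidate
    return [(s, t) for s in SOURCE for t in TARGET
            if weight * token_jaccard(s, t)
            + (1 - weight) * char_ratio(s, t) >= threshold]

def fitness(candidate):
    """Label-free score (an assumed criterion): harmonic mean of two proxies."""
    found = links(candidate)
    if not found:
        return 0.0
    linked_sources = {s for s, _ in found}
    pseudo_precision = len(linked_sources) / len(found)  # 1.0 if no source has two links
    pseudo_recall = len(linked_sources) / len(SOURCE)    # share of sources that are linked
    return 2 * pseudo_precision * pseudo_recall / (pseudo_precision + pseudo_recall)

def mutate(candidate, rng, step=0.1):
    """Perturb both genes and clamp them to [0, 1]."""
    return tuple(min(1.0, max(0.0, g + rng.uniform(-step, step))) for g in candidate)

def crossover(a, b, rng):
    """Take each gene from one of the two parents at random."""
    return tuple(rng.choice(pair) for pair in zip(a, b))

def evolve(generations=20, size=12, seed=0):
    """Evolve (weight, threshold) candidates; the better half survives each round."""
    rng = random.Random(seed)
    population = [(rng.random(), rng.random()) for _ in range(size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: size // 2]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best), links(best))
```

Because no labels are used, a high score only means that the linkset has the assumed properties. On a toy example like this one, a candidate can score well while still containing a wrong link, so the choice of properties carries most of the weight in such a design.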

Keywords

  • Genetic Algorithm
  • Decision Rule
  • Candidate Solution
  • Unsupervised Learning
  • Ontology Matching

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Author information

Authors and Affiliations

  1. Knowledge Media Institute, The Open University, Milton Keynes, UK

    Andriy Nikolov, Mathieu d’Aquin & Enrico Motta

Editor information

Editors and Affiliations

  1. Institute AIFB, Karlsruhe Institute of Technology, Englerstrasse 11, 76131, Karlsruhe, Germany

    Elena Simperl

  2. CITEC, University of Bielefeld, Morgenbreede 39, 33615, Bielefeld, Germany

    Philipp Cimiano

  3. Siemens AG Österreich, Siemensstrasse 90, 1210, Vienna, Austria

    Axel Polleres

  4. Technical University of Madrid, C/ Severo Ochoa, 13, 28660, Boadilla del Monte, Madrid, Spain

    Oscar Corcho

  5. STLab, ISTC-CNR, Via Nomentana 56, 00161, Rome, Italy

    Valentina Presutti

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nikolov, A., d’Aquin, M., Motta, E. (2012). Unsupervised Learning of Link Discovery Configuration. In: Simperl, E., Cimiano, P., Polleres, A., Corcho, O., Presutti, V. (eds) The Semantic Web: Research and Applications. ESWC 2012. Lecture Notes in Computer Science, vol 7295. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30284-8_15

  • DOI: https://doi.org/10.1007/978-3-642-30284-8_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30283-1

  • Online ISBN: 978-3-642-30284-8

  • eBook Packages: Computer Science (R0)
