Evolutionary Intelligence

Volume 12, Issue 4, pp 609–632

Unsupervised genetic programming based linkage rule (UGPLR) Miner for entity linking in semantic web

  • Amit Singh
  • Aditi Sharan
Research Paper


In the past decade, the Semantic Web data community has focused on publishing and interlinking data. Data publication is now a widespread activity, but more effort needs to be devoted to interlinking data sources. Organizations have been publishing data under different data curation and publication policies, which has resulted in a proliferation of data sources. This proliferation has brought several challenges for interlinking, because different data sources use different properties and descriptions to describe the same entity. The entity linking problem is at the core of data interlinking: it identifies and links instances or records that refer to the same real-world entity. State-of-the-art entity linking approaches are based on supervised learning. Supervised approaches rely on labeled data to learn a good model and suffer in its absence. The cost of labeling is high, and manual labeling is infeasible for datasets with billions of records. In this work, the authors propose a simple heuristic-based approach to generate the labeled data. The proposed approach uses the automatically generated labeled data to train an underlying genetic programming based linkage rule learning model. It is scalable to large datasets and achieves performance comparable to supervised approaches while eliminating the need for manually labeled data. The approach works in an unsupervised (fully automatic) way while keeping the advantages of supervised approaches, such as high accuracy and low complexity. Experimental analysis shows that the proposed approach is more effective than many state-of-the-art approaches.
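The abstract does not specify the labeling heuristic, so the following is only a minimal sketch of the general idea, under stated assumptions: record pairs with very high token overlap are treated as pseudo-positive links, pairs with very low overlap as pseudo-negative links, and a candidate linkage rule is scored against these pseudo-labels with an F-measure that a genetic programming learner could use as fitness. The Jaccard measure, the thresholds `hi` and `lo`, the function names, and the toy records are all hypothetical and are not taken from the paper.

```python
from itertools import product


def tokens(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def pseudo_label(source, target, hi=0.8, lo=0.2):
    """Label record pairs without human input (assumed heuristic).

    Pairs with similarity >= hi become positives, pairs with
    similarity <= lo become negatives, and the ambiguous middle
    band is left unlabeled.
    """
    positives, negatives = [], []
    for s, t in product(source, target):
        sim = jaccard(s, t)
        if sim >= hi:
            positives.append((s, t))
        elif sim <= lo:
            negatives.append((s, t))
    return positives, negatives


def f_measure(rule, positives, negatives):
    """F1 of a candidate linkage rule measured against the pseudo-labels."""
    tp = sum(1 for s, t in positives if rule(s, t))
    fp = sum(1 for s, t in negatives if rule(s, t))
    fn = len(positives) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy records (hypothetical) standing in for two data sources.
source = ["Jawaharlal Nehru University", "MIT Press"]
target = ["jawaharlal nehru university", "Springer Berlin"]

positives, negatives = pseudo_label(source, target)


def candidate_rule(s, t):
    """A hand-written stand-in for one rule that a GP search might evolve."""
    return jaccard(s, t) >= 0.5


score = f_measure(candidate_rule, positives, negatives)
```

In a full system the hand-written `candidate_rule` would be replaced by rules evolved through genetic programming, and the all-pairs loop would be restricted by a blocking step so that the approach remains practical on large datasets.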


Semantic web, Linked data, Entity linking, Linked open data, Genetic programming, Blocking




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India
