Abstract
In the past decade, the Semantic Web data community has focused on publishing and interlinking data. Data publication is now a widespread activity, but more effort needs to be devoted to interlinking data sources. Organizations have been publishing data under different data curation and publication policies, which has resulted in a proliferation of data sources. This proliferation has brought several challenges for interlinking data sources, because different sources use different properties and descriptions to describe the same entity. The entity linking problem is at the core of data interlinking: it consists of identifying and linking instances or records that refer to the same real-world entity. State-of-the-art entity linking approaches are based on supervised learning. Supervised approaches rely on labeled data to learn a good model and suffer when labeled data is absent. The cost of labeling is high, and manual labeling is infeasible for datasets with billions of records. In this work, the authors propose a simple heuristic-based approach to generate labeled data. The proposed approach uses the automatically generated labeled data to train an underlying Genetic Programming based linkage rule-learning model. The proposed approach scales to large datasets and achieves performance comparable to other supervised approaches while eliminating the need for manually labeled data. It works in an unsupervised (fully automatic) way while retaining the advantages of supervised approaches, such as high accuracy and lower complexity. Experimental analysis shows that the proposed approach is more effective than many state-of-the-art approaches.
Notes
Online available at http://lod-cloud.net.
About this article
Cite this article
Singh, A., Sharan, A. Unsupervised genetic programming based linkage rule (UGPLR) Miner for entity linking in semantic web. Evol. Intel. 12, 609–632 (2019). https://doi.org/10.1007/s12065-019-00263-0
DOI: https://doi.org/10.1007/s12065-019-00263-0