Abstract
With the rapid growth of Resource Description Framework (RDF) data on the Semantic Web, processing large semantic graphs has become increasingly challenging. Constructing a summary graph from raw RDF data helps expose the relations between semantic types and reduces the computational cost of graph processing. In this paper, we address the problem of summarizing RDF graphs and propose an approach that builds summary graph structures automatically from RDF graph data based on instance similarities. To scale the approach, we use a locality-sensitive hashing technique to identify instance pairs that are candidates for membership in the same type class. Moreover, we introduce a measure that helps discover optimum class dissimilarity thresholds, along with an effective method for discovering the type classes automatically. In future work, we plan to investigate further options for improving the scalability of the proposed method.
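To illustrate the candidate-selection step, the following is a minimal sketch of MinHash with banding, a standard locality-sensitive hashing scheme for Jaccard similarity. It is not the authors' implementation. Describing each instance by the set of predicates it uses is an assumption made here for illustration, and the names `minhash_signature`, `candidate_pairs`, the `ex:` identifiers, and the parameters `num_hashes` and `bands` are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token; the seed selects the hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(features, num_hashes=32):
    """MinHash signature of a non-empty feature set (one minimum per hash function)."""
    return tuple(min(_hash(seed, f) for f in features) for seed in range(num_hashes))


def candidate_pairs(instances, num_hashes=32, bands=16):
    """Return instance pairs whose signatures agree on at least one band.

    `instances` maps an instance identifier to its feature set. Using the
    set of predicates as features is an assumption of this sketch.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, features in instances.items():
        signature = minhash_signature(features, num_hashes)
        for band in range(bands):
            key = (band, signature[band * rows:(band + 1) * rows])
            buckets[key].add(name)

    pairs = set()
    for members in buckets.values():
        # Only instances sharing a bucket are compared later in detail.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical toy data: two person-like instances and one city-like instance.
instances = {
    "ex:alice": {"foaf:name", "foaf:knows", "foaf:mbox"},
    "ex:bob": {"foaf:name", "foaf:knows", "foaf:mbox"},
    "ex:berlin": {"geo:lat", "geo:long", "ex:population"},
}
pairs = candidate_pairs(instances)
```

Only the pairs returned by `candidate_pairs` would then be passed to a more expensive similarity computation, which avoids comparing every instance with every other. Increasing `bands` for a fixed `num_hashes` makes the filter more permissive, at the cost of more candidate pairs.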
Acknowledgements
The authors would like to thank Prof. Austin Melton for his invaluable help and guidance during the study, and Dr. Ruoming Jin and Dr. Viktor Lee for sharing the RoleSim similarity measure.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Ayvaz, S., Aydar, M. Dynamic Discovery of Type Classes and Relations in Semantic Web Data. J Data Semant 8, 57–75 (2019). https://doi.org/10.1007/s13740-019-00102-6
DOI: https://doi.org/10.1007/s13740-019-00102-6