Dynamic Discovery of Type Classes and Relations in Semantic Web Data

  • Original Article
Journal on Data Semantics

Abstract

As the volume of Resource Description Framework (RDF) data on the Semantic Web grows rapidly, processing large semantic graphs has become increasingly challenging. Constructing a summary graph from the raw RDF data can help reveal semantic type relations and reduce the computational cost of graph processing. In this paper, we address the problem of summarizing RDF graphs and propose an approach that automatically builds summary graph structures from RDF graph data based on instance similarities. To scale the approach, we use a locality-sensitive hashing technique to identify instance pairs that are candidates for belonging to the same type class. Moreover, we introduce a measure that helps determine optimum class dissimilarity thresholds, together with an effective method for discovering the type classes automatically. In future work, we plan to investigate further options for improving the scalability of the proposed method.
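
The abstract does not give implementation details, so the following is only a minimal sketch of how locality-sensitive hashing can shortlist candidate instance pairs. It assumes each instance is described by a set of string features (here, hypothetical predicate names) and uses MinHash signatures with banding. The function names, the `bands` and `rows` parameters, and the sample data are illustrative assumptions, not the authors' implementation.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features, num_hashes):
    """MinHash signature: the minimum hash of the set under each seed."""
    return tuple(min(_hash(seed, f) for f in features) for seed in range(num_hashes))


def candidate_pairs(instances, bands=8, rows=4):
    """Return instance pairs whose signatures agree on at least one band."""
    buckets = defaultdict(set)
    for name, features in instances.items():
        if not features:
            continue  # an empty feature set has no MinHash signature
        sig = minhash_signature(features, bands * rows)
        for b in range(bands):
            band = sig[b * rows:(b + 1) * rows]
            buckets[(b, band)].add(name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity, used to verify the shortlisted pairs."""
    return len(a & b) / len(a | b)


# Hypothetical instances, each described by the predicates it uses.
instances = {
    "ex:alice": {"name", "worksFor", "teaches"},
    "ex:bob": {"name", "worksFor", "teaches"},
    "ex:carol": {"name", "worksFor", "advises"},
    "ex:paper1": {"title", "author", "year"},
}
pairs = candidate_pairs(instances)
```

Only the pairs returned by `candidate_pairs` need an exact similarity check (for example with `jaccard`), which avoids comparing every instance with every other one. Instances with identical feature sets always share every band, while less similar instances collide with lower probability depending on `bands` and `rows`.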

Fig. 1
Fig. 2 (adapted from [29])
Fig. 3 (adapted from [29])
Fig. 4
Fig. 5
Fig. 6

Notes

  1. http://bit.ly/rdfsummarizer.

  2. https://github.com/RDF-Molecules/sim_service.

References

  1. Adida B, Birbeck M, McCarron S, Pemberton S (2008) RDFa in XHTML: syntax and processing. W3C Recommendation

  2. Alzogbi A, Lausen G (2013) Similar structures inside rdf-graphs. LDOW 996

  3. Antonellis I, Molina HG, Chang CC (2008) Simrank++: query rewriting through link analysis of the click graph. Proc VLDB Endow 1(1):408–421

  4. Atre M, Chaoji V, Zaki MJ, Hendler JA (2010) Matrix bit loaded: a scalable lightweight join query processor for rdf data. In: Proceedings of the 19th international conference on World wide web, ACM, pp 41–50

  5. Auer S, Bizer C, Kobilarov G, Lehmann J, Cyganiak R, Ives Z (2007) Dbpedia: a nucleus for a web of open data. Springer, Berlin

  6. Aydar M, Ayvaz S (2018) An improved method of locality-sensitive hashing for scalable instance matching. Knowl Inf Syst pp 1–20

  7. Aydar M, Ayvaz S, Melton AC (2015) Automatic weight generation and class predicate stability in rdf summary graphs. In: Workshop on intelligent exploration of semantic data (IESD2015), co-located with ISWC2015, vol 1472

  8. Ayvaz S, Aydar M, Melton A (2015) Building summary graphs of rdf data in semantic web. In: 2015 IEEE 39th annual computer software and applications conference (COMPSAC), vol 2, pp 686–691. https://doi.org/10.1109/COMPSAC.2015.107

  9. Bizer C, Heath T, Berners-Lee T (2009) Linked data-the story so far. Int J Seman Web Inf Syst 5(3):1–22

  10. Brickley D, Guha RV (2014) RDF schema 1.1. W3C Recommendation. http://www.w3.org/TR/2014/REC-rdf-schema-20140225/

  11. Broder AZ (1997) On the resemblance and containment of documents. In: Proceedings of the compression and complexity of sequences 1997, IEEE, pp 21–29

  12. Campinas S, Perry TE, Ceccarelli D, Delbru R, Tummarello G (2012) Introducing rdf graph summary with application to assisted sparql formulation. In: 2012 23rd international workshop on database and expert systems applications, IEEE, pp 261–266

  13. Castano S, Ferrara A, Montanelli S, Lorusso D (2008) Instance matching for ontology population. In: SEBD, pp 121–132

  14. Chakrabarti D, Faloutsos C (2006) Graph mining: laws, generators, and algorithms. ACM Comput Surv (CSUR) 38(1):2

  15. Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 219–228

  16. Chu E, Beckmann J, Naughton J (2007) The case for a wide-table approach to manage sparse relational data sets. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data, ACM, pp 821–832

  17. Consens MP, Fionda V, Khatchadourian S, Pirro G (2015) S+ epps: construct and explore bisimulation summaries, plus optimize navigational queries; all on existing sparql systems. Proc VLDB Endow 8(12):2028–2031

  18. Cyganiak R, Wood D, Lanthaler M (2014) RDF 1.1 concepts and abstract syntax. W3C Recommendation. http://www.w3.org/TR/rdf11-concepts/section-IRIs

  19. Pierce D, Booth C, Ogbuji D, Deaton CC, Blackstone E, Lenat D (2012) Semanticdb: a semantic web infrastructure for clinical research and quality reporting. Curr Bioinform 7(3):267–277

  20. Duan S, Kementsietsidis A, Srinivas K, Udrea O (2011) Apples and oranges: a comparison of rdf benchmarks and real rdf datasets. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, ACM, pp 145–156

  21. Fan W, Li J, Wang X, Wu Y (2012) Query preserving graph compression. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data, ACM, pp 157–168

  22. Gaertler M (2005) Clustering. In: Brandes U, Erlebach T (eds) Network analysis. Lecture Notes in computer science, chap. 8, Springer, Berlin, pp 178–215

  23. Goasdoué F, Manolescu I (2015) Query-oriented summarization of rdf graphs. Proc VLDB Endow 8(12). https://doi.org/10.14778/2824032.2824124

  24. Guo Y, Pan Z, Heflin J (2005) Lubm: a benchmark for owl knowledge base systems. Web Semant Sci Serv Agents World Wide Web 3(2):158–182

  25. He X, Kao MY, Lu HI (2000) A fast general methodology for information-theoretically optimal encodings of graphs. SIAM J Comput 30(3):838–846

  26. Herrmann K, Voigt H, Lehner W (2014) Cinderella—adaptive online partitioning of irregularly structured data. In: 2014 IEEE 30th international conference on data engineering workshops (ICDEW), IEEE, pp 284–291

  27. Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Inc., Upper Saddle River

  28. Jeh G, Widom J (2002) SimRank: a measure of structural-context similarity. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 538–543

  29. Jin R, Lee VE, Hong H (2011) Axiomatic ranking of network role similarity. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 922–930

  30. Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392

  31. Khare R, Çelik T (2006) Microformats: a pragmatic path to the semantic web. In: Proceedings of the 15th international conference on world wide web, ACM, pp 865–866

  32. Khatchadourian S, Consens MP (2010) Explod: summary-based exploration of interlinking and rdf usage in the linked open data cloud. In: Extended semantic web conference, Springer, Berlin, pp 272–287

  33. Levinson N (1946) The wiener (root mean square) error criterion in filter design and prediction. J Math Phys 25(1):261–278

  34. Lin Z, Lyu MR, King I (2006) Pagesim: a novel link-based measure of web page similarity. In: Proceedings of the 15th international conference on world wide web, ACM, pp 1019–1020

  35. Lin Z, Lyu MR, King I (2009) Matchsim: a novel neighbor-based similarity measure with maximum neighborhood matching. In: Proceedings of the 18th ACM conference on information and knowledge management, ACM, pp 1613–1616

  36. Luhn HP (1957) A statistical approach to mechanized encoding and searching of literary information. IBM J Res Dev 1(4):309–317

  37. Möller K, Heath T, Handschuh S, Domingue J (2007) Recipes for semantic web dog food—the ESWC and ISWC metadata projects. In: The semantic web, Springer, Berlin, pp 802–815

  38. Newman ME (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256

  39. Newman ME, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113

  40. Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web. Stanford InfoLab

  41. Paige R, Tarjan RE (1987) Three partition refinement algorithms. SIAM J Comput 16(6):973–989

  42. Palma G, Vidal ME, Raschid L (2014) Drug-target interaction prediction using semantic similarity and edge partitioning. In: International semantic web conference, Springer, Berlin, pp 131–146

  43. Parundekar R, Knoblock CA, Ambite JL (2012) Discovering concept coverings in ontologies of linked data sources. In: International semantic web conference, Springer, Berlin, pp 427–443

  44. Pham MD, Passing L, Erling O, Boncz P (2015) Deriving an emergent relational schema from rdf data. In: Proceedings of the 24th international conference on world wide web, international world wide web conferences steering committee, pp 864–874

  45. Picalausa F, Luo Y, Fletcher GH, Hidders J, Vansummeren S (2012) A structural approach to indexing triples. In: Extended semantic web conference, Springer, Berlin, pp 406–421

  46. Raghavan S, Garcia-Molina H (2003) Representing web graphs. In: Proceedings of the 19th international conference on data engineering, 2003, IEEE, pp 405–416

  47. Rajaraman A, Ullman JD (2011) Mining of massive datasets. Cambridge University Press, Cambridge

  48. Seddiqui MH, Nath RPD, Aono M (2015) An efficient metric of automatic weight generation for properties in instance matching technique. Int J Web Semant Technol 6(1):1

  49. Small H (1973) Co-citation in the scientific literature: a new measure of the relationship between two documents. J Am Soc Inf Sci 24(4):265–269

  50. Sparck Jones K (1972) A statistical interpretation of term specificity and its application in retrieval. J Doc 28(1):11–21

  51. Sun Y, Han J, Yan X, Yu PS, Wu T (2011) Pathsim: meta path-based top-k similarity search in heterogeneous information networks. VLDB–11

  52. Tian Y, Hankins RA, Patel JM (2008) Efficient aggregation for graph summarization. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, ACM, pp 567–580

  53. Tran T, Ladwig G (2010) Structure index for rdf data. In: Workshop on semantic data management

  54. Tran T, Wang H, Rudolph S, Cimiano P (2009) Top-k exploration of query candidates for efficient keyword search on graph-shaped (rdf) data. In: ICDE’09. IEEE 25th international conference on data engineering, 2009, IEEE, pp 101–104

  55. Traverso I, Vidal ME, Kämpgen B, Sure-Vetter Y (2016) Gades: a graph-based semantic similarity measure. In: Proceedings of the 12th international conference on semantic systems, ACM, pp 101–104

  56. Traverso-Ribón I, Palma G, Flores A, Vidal ME (2016) Considering semantics on the discovery of relations in knowledge graphs. In: European knowledge acquisition workshop, Springer, Berlin, pp 666–680

  57. Xu X, Yuruk N, Feng Z, Schweiger TA (2007) Scan: a structural clustering algorithm for networks. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 824–833

  58. Zhang N, Tian Y, Patel JM (2010) Discovery-driven graph summarization. In: 2010 IEEE 26th international conference on data engineering (ICDE 2010), IEEE, pp 880–891

  59. Zou L, Mo J, Chen L, Özsu MT, Zhao D (2011) gstore: answering sparql queries via subgraph matching. Proc VLDB Endow 4(8):482–493

Acknowledgements

The authors would like to thank Prof. Austin Melton for his invaluable help and guidance during the study, and Dr. Ruoming Jin and Dr. Viktor Lee for sharing the RoleSim similarity measure.

Author information

Corresponding author

Correspondence to Serkan Ayvaz.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ayvaz, S., Aydar, M. Dynamic Discovery of Type Classes and Relations in Semantic Web Data. J Data Semant 8, 57–75 (2019). https://doi.org/10.1007/s13740-019-00102-6

  • DOI: https://doi.org/10.1007/s13740-019-00102-6
