Automatic Creation and Analysis of a Linked Data Cloud Diagram

  • Conference paper
Web Information Systems Engineering – WISE 2016 (WISE 2016)

Abstract

Datasets published on the Web and following the Linked Open Data (LOD) practices have the potential to enrich other LOD datasets in multiple domains. However, the lack of descriptive information, combined with the large number of available LOD datasets, inhibits their interlinking and consumption. To facilitate these tasks, this paper proposes an automated clustering process for LOD datasets that provides an up-to-date description of the LOD cloud. The process combines metadata inspection and extraction strategies, community detection methods and dataset profiling techniques. The clustering process is evaluated using the LOD diagram as ground truth. The results show that the proposed process can replicate the LOD diagram and identify new LOD dataset clusters. Finally, experiments conducted by LOD experts indicate that the clustering process generates dataset clusters that tend to be more descriptive than those manually defined in the LOD diagram.
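The abstract does not say which community detection method or evaluation measure the process uses, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes a hypothetical set of dataset-to-dataset links, splits the resulting graph by repeatedly removing the edge with the highest betweenness (a Girvan-Newman-style stand-in), and scores the clusters against hypothetical ground-truth domain labels using purity (also a stand-in measure). All dataset names, labels, and function names are invented for the example.

```python
from collections import Counter, defaultdict, deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = defaultdict(float)
    for source in adj:
        order, dist = [], {source: 0}
        paths = dict.fromkeys(adj, 0.0)  # shortest-path counts from source
        paths[source] = 1.0
        preds = {v: [] for v in adj}
        queue = deque([source])
        while queue:  # breadth-first search from source
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        credit = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # push path credit back toward source
            for v in preds[w]:
                share = paths[v] / paths[w] * (1.0 + credit[w])
                score[frozenset((v, w))] += share
                credit[v] += share
    return score


def components(adj):
    """Connected components, ordered by their smallest node name."""
    seen, parts = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def detect_communities(links, k):
    """Remove highest-betweenness edges until the graph has k components."""
    adj = defaultdict(set)
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    parts = components(adj)
    while len(parts) < k:
        score = edge_betweenness(adj)
        if not score:  # no edges left to remove
            break
        u, v = max(score, key=lambda e: (score[e], sorted(e)))
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
    return parts


def purity(clusters, truth):
    """Fraction of datasets that carry their cluster's majority label."""
    total = sum(len(c) for c in clusters)
    hits = sum(Counter(truth[d] for d in c).most_common(1)[0][1] for c in clusters)
    return hits / total


# Hypothetical link graph: two densely linked groups joined by one link.
links = [("ds_a", "ds_b"), ("ds_b", "ds_c"), ("ds_a", "ds_c"),
         ("ds_d", "ds_e"), ("ds_e", "ds_f"), ("ds_d", "ds_f"),
         ("ds_c", "ds_d")]
# Hypothetical ground-truth domains, standing in for LOD diagram categories.
truth = {"ds_a": "media", "ds_b": "media", "ds_c": "media",
         "ds_d": "geography", "ds_e": "geography", "ds_f": "media"}

clusters = detect_communities(links, k=2)
quality = purity(clusters, truth)
print(clusters, round(quality, 3))
```

In this toy graph the single link between the two groups carries every cross-group shortest path, so it is removed first and the two triangles become separate clusters. Purity rewards clusters whose members share a ground-truth label; it does not penalise splitting one category into many small clusters, which is one reason a real evaluation would likely need additional measures.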

Acknowledgments

This work was partly funded by CNPq under grants 153908/2015-7, 557128/2009-9, 444976/2014-0, 303332/2013-1, 442338/2014-7 and 248743/2013-9 and by FAPERJ under grants E-26-170028/2008 and E-26/201.337/2014. The authors would also like to thank the Microsoft Azure Research Program for the cloud resources awarded to the project entitled “Assessing Recommendation Approaches for Dataset Interlinking”.

Author information

Corresponding author

Correspondence to Alexander Arturo Mera Caraballo.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Caraballo, A.A.M., Nunes, B.P., Lopes, G.R., Leme, L.A.P.P., Casanova, M.A. (2016). Automatic Creation and Analysis of a Linked Data Cloud Diagram. In: Cellary, W., Mokbel, M., Wang, J., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2016. WISE 2016. Lecture Notes in Computer Science, vol 10041. Springer, Cham. https://doi.org/10.1007/978-3-319-48740-3_31

  • DOI: https://doi.org/10.1007/978-3-319-48740-3_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48739-7

  • Online ISBN: 978-3-319-48740-3

  • eBook Packages: Computer Science, Computer Science (R0)
