Link Analysis of Life Science Linked Data

  • Wei Hu
  • Honglei Qiu
  • Michel Dumontier
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9367)

Abstract

Semantic Web technologies offer a promising mechanism for the representation and integration of thousands of biomedical databases. Many of these databases provide cross-references to other data sources, but these cross-references are generally incomplete and error-prone. In this paper, we conduct an empirical link analysis of the life science Linked Data obtained from the Bio2RDF project. We characterize three link graphs, for datasets, entities and terms, using degree distribution, connectivity, and clustering metrics, and we measure the correlation among them. Furthermore, we analyze the symmetry and transitivity of entity links to build a benchmark and preliminarily evaluate several entity matching methods. Our findings indicate that the life science data network can help identify hidden links, can be used to validate links, and may offer a mechanism to integrate a wider set of resources for biomedical knowledge discovery.
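
To make the graph measures named in the abstract concrete, the following is a minimal sketch of how a degree distribution and the symmetry and transitivity of entity links might be computed over a set of directed cross-references. It is not the authors' implementation: the function names, the ratio definitions, and the toy identifiers (`drugbank:DB1`, `kegg:D1`, and so on) are assumptions made purely for illustration.

```python
from collections import Counter


def degree_distribution(links):
    """Map each total degree (in + out) to the number of nodes having it."""
    degree = Counter()
    for source, target in links:
        degree[source] += 1
        degree[target] += 1
    return Counter(degree.values())


def symmetry_ratio(links):
    """Fraction of links (a, b) whose reverse link (b, a) is also present."""
    link_set = set(links)
    if not link_set:
        return 0.0
    return sum((b, a) in link_set for a, b in link_set) / len(link_set)


def transitivity_ratio(links):
    """Fraction of two-step paths a -> b -> c (a, b, c distinct)
    that are closed by a direct link a -> c."""
    link_set = set(links)
    successors = {}
    for a, b in link_set:
        successors.setdefault(a, set()).add(b)
    paths = closed = 0
    for a, b in link_set:
        if a == b:
            continue  # ignore self-links
        for c in successors.get(b, ()):
            if c == a or c == b:
                continue
            paths += 1
            if (a, c) in link_set:
                closed += 1
    return closed / paths if paths else 0.0


# Hypothetical cross-references between entities in four datasets.
links = [
    ("drugbank:DB1", "kegg:D1"),
    ("kegg:D1", "drugbank:DB1"),
    ("kegg:D1", "pubchem:C1"),
    ("drugbank:DB1", "pubchem:C1"),
    ("pubchem:C1", "chebi:X1"),
]

print(degree_distribution(links))
print(symmetry_ratio(links))
print(transitivity_ratio(links))
```

Under these assumed definitions, a link that lacks its reverse, or a two-step path that lacks a closing direct link, is a candidate hidden link of the kind the abstract mentions; whether such a candidate is actually correct still has to be validated.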

Keywords

Link analysis, Bio2RDF, Life sciences, Linked data

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  2. Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, USA
