Bisociative Knowledge Discovery, pp. 364-378

Part of the Lecture Notes in Computer Science book series (LNCS, volume 7250)

Biomine: A Network-Structured Resource of Biological Entities for Link Prediction

  • Lauri Eronen
  • Petteri Hintsanen
  • Hannu Toivonen

Abstract

Biomine is a biological graph database constructed from public databases. Its entities (vertices) include biological concepts such as genes, proteins, tissues, processes and phenotypes, as well as scientific articles. Relations (edges) between these entities correspond to real-world phenomena such as “a gene codes for a protein” or “an article refers to a phenotype”. Biomine also provides tools for querying the graph for connections and visualizing them interactively.
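As a minimal sketch of the kind of structure described above, the following Python snippet stores typed entities and labeled relations in an adjacency list and searches for a shortest connecting path with breadth-first search. All entity names, relation labels, and function names here are hypothetical and chosen for illustration; they are not taken from the Biomine database or its tools, and plain BFS is only a stand-in for whatever connection search Biomine actually uses.

```python
from collections import defaultdict, deque

# Hypothetical toy data: names, types and relation labels are illustrative
# only and do not come from the Biomine database.
EDGES = [
    ("Gene:G1", "codes_for", "Protein:P1"),
    ("Protein:P1", "participates_in", "Process:X"),
    ("Article:A1", "refers_to", "Gene:G1"),
    ("Article:A1", "refers_to", "Phenotype:Ph1"),
]


def build_graph(edges):
    """Build an undirected adjacency list of (relation, neighbor) pairs."""
    adj = defaultdict(list)
    for source, relation, target in edges:
        adj[source].append((relation, target))
        adj[target].append((relation, source))
    return adj


def connecting_path(adj, source, target):
    """Return a shortest path between two entities as a list, or None."""
    if source == target:
        return [source]
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for _relation, neighbor in adj[node]:
            if neighbor in parent:
                continue  # already visited
            parent[neighbor] = node
            if neighbor == target:
                # Walk parent pointers back to the source.
                path = [neighbor]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


graph = build_graph(EDGES)
print(connecting_path(graph, "Protein:P1", "Phenotype:Ph1"))
```

In this toy graph the protein and the phenotype are not directly related, but they are connected through a gene and an article that mentions both, which is the kind of indirect connection a query for connecting subgraphs is meant to surface.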

We describe the Biomine graph database. We also discuss link discovery in such biological graphs and review possible link prediction measures. Biomine currently contains over 1 million entities and over 8 million relations between them, with a focus on human genetics. It is available online and can be queried for connecting subgraphs between biological entities.
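To make the notion of a link prediction measure concrete, here is a small sketch of three standard neighborhood-based scores from the link prediction literature: common neighbors, the Jaccard coefficient, and the Adamic/Adar score. This is an illustration under assumptions, not a statement of which measures the chapter reviews in detail or which ones Biomine uses; the toy graph and all function names are hypothetical.

```python
import math

# Hypothetical undirected toy graph as a symmetric neighbor map.
NEIGHBORS = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}


def common_neighbors(nbrs, u, v):
    """Number of vertices adjacent to both u and v."""
    return len(nbrs[u] & nbrs[v])


def jaccard(nbrs, u, v):
    """Shared neighbors divided by all neighbors of u or v."""
    union = nbrs[u] | nbrs[v]
    return len(nbrs[u] & nbrs[v]) / len(union) if union else 0.0


def adamic_adar(nbrs, u, v):
    """Sum of 1 / log(degree) over shared neighbors.

    A shared neighbor has degree at least 2, so the logarithm is positive.
    """
    return sum(1.0 / math.log(len(nbrs[z])) for z in nbrs[u] & nbrs[v])


# Score the candidate (currently absent) link between "a" and "b".
print(common_neighbors(NEIGHBORS, "a", "b"))
print(jaccard(NEIGHBORS, "a", "b"))
print(adamic_adar(NEIGHBORS, "a", "b"))
```

Each function scores a pair of vertices that is not yet linked; a higher score suggests a more plausible link. The Adamic/Adar variant gives less weight to shared neighbors with many connections, which can matter in graphs where a few hub vertices touch a large share of the entities.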

Copyright information

© The Author(s) 2012

Authors and Affiliations

  • Lauri Eronen (1)
  • Petteri Hintsanen (1)
  • Hannu Toivonen (1)

  1. Department of Computer Science and HIIT, University of Helsinki, Finland