Link Discovery in Graphs Derived from Biological Databases

(Research Paper)
  • Petteri Sevon
  • Lauri Eronen
  • Petteri Hintsanen
  • Kimmo Kulovesi
  • Hannu Toivonen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4075)


Public biological databases contain vast amounts of rich data that can also be used to create and evaluate new biological hypotheses. We propose a method for link discovery in biological databases, i.e., for the prediction and evaluation of implicit or previously unknown connections between biological entities and concepts. In our framework, information extracted from available databases is represented as a graph, where vertices correspond to entities and concepts, and edges represent known, annotated relationships between vertices. A link, i.e., an implicit and possibly unknown relation between two entities, is manifested as a path or a subgraph connecting the corresponding vertices. We propose measures for link goodness that are based on three factors: edge reliability, relevance, and rarity. We handle these factors with a proper probabilistic interpretation. We give practical methods for finding and evaluating links in large graphs and report experimental results with Alzheimer genes and protein interactions.
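The idea of scoring a link as a path through a graph with probabilistic edge weights can be illustrated with a small sketch. The abstract does not give the actual goodness measures, so everything below is an assumption made for illustration: each edge carries three factors in (0, 1] (reliability, relevance, rarity), the factors are treated as independent and multiplied into one edge probability, and the goodness of a path is the product of its edge probabilities. Under those assumptions the best path can be found with Dijkstra's algorithm on the costs -log(p). The function name `best_path` and the vertex names are hypothetical.

```python
import heapq
import math


def best_path(edges, source, target):
    """Return (probability, path) for the most probable source-target path.

    edges: iterable of (u, v, reliability, relevance, rarity) tuples, with
    every factor in (0, 1]. ASSUMPTION (not from the paper): the factors
    are independent, so an edge's probability is their product, and a
    path's probability is the product of its edge probabilities.
    """
    # Build an undirected adjacency list with additive costs -log(p),
    # so that minimising total cost maximises the probability product.
    graph = {}
    for u, v, reliability, relevance, rarity in edges:
        cost = -math.log(reliability * relevance * rarity)
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))

    dist = {source: 0.0}   # best known cost to each vertex
    prev = {}              # predecessor on the best known path
    done = set()           # vertices whose cost is final
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return 0.0, []     # no connecting path

    # Walk the predecessor chain back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-dist[target]), path


# Hypothetical toy graph: a weak direct edge and a stronger two-step route.
example_edges = [
    ("geneA", "proteinX", 0.9, 0.8, 1.0),
    ("proteinX", "diseaseD", 0.9, 0.5, 1.0),
    ("geneA", "diseaseD", 0.3, 1.0, 1.0),
]
probability, path = best_path(example_edges, "geneA", "diseaseD")
print(probability, path)
```

In the toy graph the two-step route through `proteinX` scores higher than the direct edge, which shows how an indirect connection can rank above a weak annotated one. A single best path is only one possible measure; scoring a whole connecting subgraph would require a different computation and is not covered by this sketch.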


Keywords: Alzheimer Disease, Random Graph, Link Prediction, Good Path, Edge Type





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Petteri Sevon (1)
  • Lauri Eronen (1)
  • Petteri Hintsanen (1)
  • Kimmo Kulovesi (1)
  • Hannu Toivonen (1)
  1. HIIT Basic Research Unit, Department of Computer Science, University of Helsinki, Finland
