Link Discovery in Graphs Derived from Biological Databases
Public biological databases contain vast amounts of rich data that can also be used to create and evaluate new biological hypotheses. We propose a method for link discovery in biological databases, i.e., for prediction and evaluation of implicit or previously unknown connections between biological entities and concepts. In our framework, information extracted from available databases is represented as a graph, where vertices correspond to entities and concepts, and edges represent known, annotated relationships between vertices. A link, i.e., an implicit and possibly unknown relation between two entities, is manifested as a path or a subgraph connecting the corresponding vertices. We propose measures for link goodness that are based on three factors: edge reliability, relevance, and rarity. We give these factors a proper probabilistic interpretation. We give practical methods for finding and evaluating links in large graphs and report experimental results with Alzheimer genes and protein interactions.
Keywords: Alzheimer Disease, Random Graph, Link Prediction, Good Path, Edge Type
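To make the idea of a link as a path with a probabilistic goodness concrete, the following is a minimal sketch and not the paper's implementation. It assumes each edge already carries a single probability in (0, 1] that summarizes its reliability, relevance, and rarity (how the three factors are combined is not stated in the abstract), and it assumes a path's goodness is the product of its edge probabilities. The function name `best_path` and the vertex names in the toy graph are hypothetical.

```python
import heapq
import math


def best_path(graph, source, target):
    """Return (probability, path) for the most probable path from source to target.

    graph maps a vertex to a list of (neighbor, probability) pairs, where each
    probability is in (0, 1]. The edge probabilities are an assumed input.
    """
    # Maximizing a product of probabilities is the same as minimizing the
    # sum of -log(p), so Dijkstra's algorithm applies to the transformed weights.
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, p in graph.get(u, []):
            nd = d - math.log(p)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return 0.0, []  # no connecting path found
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-dist[target]), path


# Hypothetical toy graph: two candidate links between geneA and geneB.
toy_graph = {
    "geneA": [("proteinX", 0.9), ("pathwayP", 0.5)],
    "proteinX": [("geneB", 0.8)],
    "pathwayP": [("geneB", 0.9)],
}
prob, path = best_path(toy_graph, "geneA", "geneB")
print(path, round(prob, 2))
```

In this toy graph the route through `proteinX` (0.9 × 0.8) outscores the route through `pathwayP` (0.5 × 0.9), so it is returned as the best single-path link. A subgraph-based measure, which the abstract also mentions, would need to account for several paths at once and is not covered by this sketch.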