Bicliques in Graphs with Correlated Edges: From Artificial to Biological Networks
Networks representing complex biological interactions are often intricate, and their thorough quantitative analysis relies on algorithmic tools. In bipartite (two-layer) graphs, identifying subgraphs of potential biological meaning relies on finding bicliques between two sets of associated nodes, or variables, such as diseases and genetic variants. Researchers have developed multiple approaches for finding bicliques, and it is important to understand the features of these approaches and their applicability to real-life problems. We introduce a novel algorithm specifically designed for finding maximal bicliques in large datasets. In this study, we apply this algorithm to a variety of networks, including artificially generated networks as well as biological networks based on phenotype-genotype and phenotype-pathway interactions. We analyze performance with respect to network features including density, node degree distribution, and correlation between nodes, and find that density is the major contributor to computational complexity. We also examine sample bicliques and postulate that they could be useful in elucidating the genetic and biological underpinnings of shared disease etiologies and in guiding hypothesis generation. Moving forward, we propose additional features, such as weighted edges between nodes, that could enhance our study of biological networks.
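To make the central object concrete, the following is a minimal sketch of what a maximal biclique is and how edge density is computed on a bipartite graph. It is not the algorithm introduced in this paper: it is a brute-force enumeration that closes every subset of one node set, so it is exponential and only suitable for toy inputs. The function names, the adjacency format, and the node labels (`D1`, `v1`, and so on) are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def maximal_bicliques(adj):
    """Enumerate maximal bicliques of a small bipartite graph (brute force).

    adj maps each left node (e.g. a disease) to the set of right nodes
    (e.g. genetic variants) it is linked to. Illustration only: the loop
    visits every subset of left nodes, so runtime is exponential.
    """
    left = sorted(adj)
    found = set()
    for size in range(1, len(left) + 1):
        for subset in combinations(left, size):
            # Right nodes shared by every left node in the subset.
            shared = set.intersection(*(adj[u] for u in subset))
            if not shared:
                continue
            # Closure: every left node linked to all shared right nodes.
            closed = frozenset(u for u in left if shared <= adj[u])
            found.add((closed, frozenset(shared)))
    return found


def density(adj, n_right):
    """Fraction of possible left-right edges that are present."""
    edges = sum(len(neighbors) for neighbors in adj.values())
    return edges / (len(adj) * n_right)


# Hypothetical toy network: three diseases, three variants.
toy = {
    "D1": {"v1", "v2"},
    "D2": {"v1", "v2", "v3"},
    "D3": {"v3"},
}
bicliques = maximal_bicliques(toy)
toy_density = density(toy, n_right=3)
```

On this toy input the sketch finds three maximal bicliques: {D1, D2} with {v1, v2}, {D2} with {v1, v2, v3}, and {D2, D3} with {v3}. Six of the nine possible edges are present, giving a density of 2/3. Because each pair is closed on both sides, no node can be added to either side without breaking full connectivity, which is what "maximal" means here.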
Keywords: Bipartite Graph, Degree Distribution, Biological Network, Maximal Clique, Bipartite Network
This work was supported by National Institutes of Health grants LM009012, LM010098, and EY022300.