ISaaC: Identifying Structural Relations in Biological Data with Copula-Based Kernel Dependency Measures

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10813)

Abstract

This paper develops a novel statistical framework for inferring dependence between the distributions of variables in omics data. We propose building a dependence network with copula-based kernel dependency measures in order to reconstruct the underlying association network between these distributions. We apply ISaaC to reverse-engineer gene regulatory networks and show that it is competitive with several state-of-the-art gene regulatory network inference methods on the DREAM3 and DREAM4 Challenge datasets. An open-source implementation of ISaaC is available at https://bitbucket.org/HossamAlmeer/isaac/.
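The abstract describes the pipeline only at a high level. The sketch below shows one way to realize the idea, under stated assumptions, and is not the ISaaC implementation. Each gene's expression profile is mapped to its empirical copula (ranks scaled to (0, 1]). A pair of genes is scored by the maximum mean discrepancy (MMD) between their joint copula sample and the uniform distribution on the unit square, which is what the copula of two independent variables looks like. Gene pairs are then ranked by that score to form a weighted, undirected dependence network. The Gaussian bandwidth `s = 0.25`, the closed-form uniform expectations, and the names `copula_transform`, `copula_mmd` and `dependence_network` are hypothetical choices made for illustration.

```python
import math
import random
from itertools import combinations


def copula_transform(xs):
    """Empirical copula transform: replace each value by rank / n (ties broken by position)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / n
    return u


def _k1(a, b, s):
    # 1-D Gaussian kernel; the product over both coordinates is the isotropic 2-D Gaussian kernel.
    return math.exp(-((a - b) ** 2) / (2 * s * s))


def _mean_k1_vs_uniform(x, s):
    # E_{y ~ U[0,1]} k1(x, y), evaluated in closed form via erf.
    c = s * math.sqrt(2)
    return s * math.sqrt(math.pi / 2) * (math.erf((1 - x) / c) + math.erf(x / c))


def _mean_k1_uniform_pair(s):
    # E_{y, y' ~ U[0,1]} k1(y, y'), evaluated in closed form.
    return (s * math.sqrt(2 * math.pi) * math.erf(1 / (s * math.sqrt(2)))
            + 2 * s * s * (math.exp(-1 / (2 * s * s)) - 1))


def copula_mmd(x, y, s=0.25):
    """MMD between the empirical copula sample of (x, y) and the uniform law on [0,1]^2."""
    u, v = copula_transform(x), copula_transform(y)
    n = len(u)
    # Sample-vs-sample term: mean kernel value over all pairs of copula points.
    t1 = sum(_k1(u[i], u[j], s) * _k1(v[i], v[j], s)
             for i in range(n) for j in range(n)) / (n * n)
    # Sample-vs-uniform cross term; the product kernel factorizes per coordinate.
    t2 = sum(_mean_k1_vs_uniform(u[i], s) * _mean_k1_vs_uniform(v[i], s)
             for i in range(n)) / n
    # Uniform-vs-uniform term.
    t3 = _mean_k1_uniform_pair(s) ** 2
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(t1 - 2 * t2 + t3, 0.0))


def dependence_network(expr, s=0.25):
    """Score every gene pair in expr (name -> profile) and sort edges by decreasing score."""
    edges = [(g1, g2, copula_mmd(expr[g1], expr[g2], s))
             for g1, g2 in combinations(sorted(expr), 2)]
    return sorted(edges, key=lambda e: e[2], reverse=True)


if __name__ == "__main__":
    # Toy data (hypothetical): geneB is a noisy nonlinear function of geneA, geneC is independent.
    rng = random.Random(0)
    n = 200
    a = [rng.gauss(0, 1) for _ in range(n)]
    expr = {
        "geneA": a,
        "geneB": [math.exp(t) + 0.1 * rng.gauss(0, 1) for t in a],
        "geneC": [rng.gauss(0, 1) for _ in range(n)],
    }
    for g1, g2, score in dependence_network(expr):
        print(f"{g1} - {g2}: {score:.3f}")
```

Because the score depends on the data only through ranks, it is unchanged by any strictly increasing transform of a gene's expression values (for example, log scaling). This invariance is the main reason to work on the copula scale. The double loop is quadratic in the number of samples, and the number of gene pairs is quadratic in the number of genes, so genome-scale data would need a faster approximation.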

Keywords

DREAM Challenge, Maximum Mean Discrepancy (MMD), Copula Transformation, Reproducing Kernel Hilbert Space (RKHS), Hilbert-Schmidt Independence Criterion

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
  2. Department of Mathematics, Al Faisal University, Riyadh, Kingdom of Saudi Arabia
