Abstract
Biotechnologies and data integration tools have made ever more data available in the biological domain. With the renaissance of machine learning and artificial intelligence, data-driven biological knowledge discovery holds great promise. However, such discovery is not straightforward because of the complexity of the domain knowledge hidden in the data. At every level, be it atoms, molecules, cells or organisms, there are rich interdependencies among biological components. Machine learning approaches in this domain therefore usually involve analyzing interdependency structures encoded in graphs and related formalisms. In this report, we review our work on developing new machine learning methods for these applications, with improved performance in comparison with state-of-the-art methods. We show how the networks among biological components can be used to predict properties.
Acknowledgments
This work is partially supported by Japan MEXT Kakenhi 18K11434 and the Vingroup Innovation Foundation (VINIF), project code VINIF.2019.DA18. We thank the anonymous reviewers for their constructive comments.
Additional information
Canh Hao Nguyen is currently a senior lecturer at the Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan. His research interests are in machine learning, with applications to biomedical domains.
About this article
Cite this article
Nguyen, C.H. Structured Learning in Biological Domain. J. Syst. Sci. Syst. Eng. 29, 440–453 (2020). https://doi.org/10.1007/s11518-020-5461-5
DOI: https://doi.org/10.1007/s11518-020-5461-5