Structured Learning in Biological Domain

Nguyen, Canh Hao

doi:10.1007/s11518-020-5461-5

Canh Hao Nguyen¹


Abstract

The biological domain has been blessed with ever more data from biotechnologies and from data integration tools. In the current renaissance of machine learning and artificial intelligence, data-driven biological knowledge discovery holds great promise. However, realizing that promise is not straightforward because of the complexity of the domain knowledge hidden in the data. At every level, be it atoms, molecules, cells, or organisms, there are rich interdependencies among biological components. Machine learning approaches in this domain usually involve analyzing interdependency structures encoded in graphs and related formalisms. In this report, we review our work on developing new machine learning methods for these applications, with improved performance in comparison with state-of-the-art methods. We show how the networks among biological components can be used to predict properties.
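As a rough illustration of what predicting properties from a network can look like, the sketch below runs a generic label propagation on a toy graph. It is not the method developed in the article. The node names, edges, and known labels are hypothetical, and the neighbor-averaging update is only one simple choice among many.

```python
# Hypothetical toy network: nodes stand for biological components
# (e.g. proteins) and edges for known interactions between them.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

# Assumed known labels: 1.0 = has the property, 0.0 = does not.
known = {"A": 1.0, "E": 0.0}


def propagate_labels(edges, known, n_iter=200):
    """Score unlabeled nodes by repeatedly averaging their neighbors' scores.

    Labeled nodes are clamped to their known values at every step.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Unlabeled nodes start from a neutral score of 0.5.
    scores = {node: known.get(node, 0.5) for node in neighbors}

    for _ in range(n_iter):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in known:
                updated[node] = known[node]
            else:
                updated[node] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return scores


scores = propagate_labels(edges, known)
for node in sorted(scores):
    print(node, round(scores[node], 3))
```

On this chain, the scores of the unlabeled nodes fall off with distance from the labeled node "A", which is the basic intuition behind using network structure to infer the properties of unannotated components.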

Acknowledgments

This work is partially supported by Japan MEXT Kakenhi 18K11434 and by the Vingroup Innovation Foundation (VINIF), project code VINIF.2019.DA18. We thank the anonymous reviewers for their constructive comments.

Author information

Authors and Affiliations

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
Canh Hao Nguyen

Corresponding author

Correspondence to Canh Hao Nguyen.

Additional information

Canh Hao Nguyen is currently a senior lecturer at the Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan. His research interests are in machine learning, with applications to biomedical domains.

About this article

Cite this article

Nguyen, C.H. Structured Learning in Biological Domain. J. Syst. Sci. Syst. Eng. 29, 440–453 (2020). https://doi.org/10.1007/s11518-020-5461-5

Published: 16 July 2020
Issue Date: August 2020
DOI: https://doi.org/10.1007/s11518-020-5461-5
