
Structured Learning in Biological Domain

Journal of Systems Science and Systems Engineering

Abstract

The biological domain has been blessed with ever more data from biotechnologies and data integration tools. In the current renaissance of machine learning and artificial intelligence, data-driven biological knowledge discovery holds great promise. However, it is not straightforward, because of the complexity of the domain knowledge hidden in the data. At every level, be it atoms, molecules, cells, or organisms, there are rich interdependencies among biological components. Machine learning approaches in this domain usually involve analyzing interdependency structures encoded in graphs and related formalisms. In this report, we review our work on developing new machine learning methods for these applications, with improved performance in comparison with state-of-the-art methods. We show how the networks among biological components can be used to predict properties.



Acknowledgments

This work is partially supported by Japan MEXT Kakenhi 18K11434 and the Vingroup Innovation Foundation (VINIF), project code VINIF.2019.DA18. We greatly appreciate the anonymous reviewers' constructive comments.

Author information

Corresponding author

Correspondence to Canh Hao Nguyen.

Additional information

Canh Hao Nguyen is currently a senior lecturer at the Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan. His research interest is machine learning, with applications to biomedical domains.

About this article

Cite this article

Nguyen, C.H. Structured Learning in Biological Domain. J. Syst. Sci. Syst. Eng. 29, 440–453 (2020). https://doi.org/10.1007/s11518-020-5461-5

  • DOI: https://doi.org/10.1007/s11518-020-5461-5
