Setup of a scientific computing environment for computational biology: Simulation of a genome-scale metabolic model of Escherichia coli as an example

  • Protocol
  • Journal of Microbiology

Abstract

Computational analysis of biological data is becoming increasingly important, especially in this era of big data. It allows biological insights to be derived efficiently from given data, sometimes including counterintuitive ones that challenge existing knowledge. Experimental researchers without prior exposure to computer programming have often regarded such analysis as a task reserved for computational biologists. However, thanks to the increasing availability of user-friendly computational resources, experimental researchers can now easily access a scientific computing environment and the packages necessary for data analysis. In this regard, we describe how to access Jupyter Notebook, the most popular Python coding environment, to conduct computational biology; Python is currently a mainstream programming language for biology and biotechnology. In particular, Anaconda and Google Colaboratory are introduced as two representative options for easily launching Jupyter Notebook. Finally, the Python package COBRApy is demonstrated as an example to simulate 1) the specific growth rate of Escherichia coli, together with the compounds consumed or generated, in a minimal medium with glucose as the sole carbon source, and 2) the theoretical production yield of succinic acid, an industrially important chemical, using E. coli. This protocol should serve as a guide to further, more extended computational analyses of biological data for experimental researchers without a computational background.
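The two simulations named in the abstract can be sketched in a Jupyter Notebook cell roughly as follows. This is a minimal sketch, not the protocol's own code: it assumes COBRApy is installed (for example with `pip install cobra`), it uses the small E. coli core "textbook" model bundled with COBRApy rather than whichever genome-scale model the protocol uses, and it assumes the BiGG-style reaction identifiers `EX_glc__D_e` (glucose exchange) and `EX_succ_e` (succinate exchange). The function names `molar_yield`, `simulate_growth`, `max_succinate_yield`, and `run_example` are hypothetical helpers introduced only for illustration.

```python
def molar_yield(product_flux, substrate_exchange_flux):
    """Return mol product per mol substrate.

    Exchange fluxes follow the usual COBRA sign convention:
    negative means uptake, positive means secretion.
    """
    uptake = -substrate_exchange_flux
    if uptake <= 0:
        raise ValueError("substrate is not being taken up")
    return product_flux / uptake


def simulate_growth(model):
    """Maximize the model's default (biomass) objective.

    Returns the predicted specific growth rate and the nonzero
    exchange fluxes (compounds consumed or generated).
    """
    solution = model.optimize()
    exchange_ids = [rxn.id for rxn in model.exchanges]
    exchange_fluxes = solution.fluxes[exchange_ids]
    active = exchange_fluxes[exchange_fluxes.abs() > 1e-9]
    return solution.objective_value, active


def max_succinate_yield(model, product_id="EX_succ_e",
                        substrate_id="EX_glc__D_e"):
    """Maximize succinate secretion and report the molar yield on glucose.

    The reaction identifiers are assumptions that hold for BiGG-style models.
    """
    with model:  # changes made inside this block are reverted on exit
        model.objective = product_id
        solution = model.optimize()
        return molar_yield(solution.fluxes[product_id],
                           solution.fluxes[substrate_id])


def run_example():
    """Run both simulations on the bundled E. coli core model (an assumption)."""
    from cobra.io import load_model  # imported here so the helpers load without cobra

    model = load_model("textbook")
    growth_rate, exchanges = simulate_growth(model)
    print("Specific growth rate (1/h):", growth_rate)
    print(exchanges)
    print("Succinate yield (mol/mol glucose):", max_succinate_yield(model))
```

Calling `run_example()` in a notebook cell runs both simulations. Swapping in a genome-scale model should only require changing the argument to `load_model`, provided the same exchange reaction identifiers exist in that model.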

Acknowledgments

We thank Mohammad Rifqi Ghiffary and Komal for their kind review of the manuscript. This work was supported by the Bio-Synergy Research Project (NRF-2018M3A9C4076475) of the Ministry of Science and ICT through the National Research Foundation.

Author information

Corresponding author

Correspondence to Hyun Uk Kim.

About this article

Cite this article

Jeon, J., Kim, H.U. Setup of a scientific computing environment for computational biology: Simulation of a genome-scale metabolic model of Escherichia coli as an example. J Microbiol. 58, 227–234 (2020). https://doi.org/10.1007/s12275-020-9516-6

  • DOI: https://doi.org/10.1007/s12275-020-9516-6
