Abstract
This article is intended as a guide for new graduate students entering the field of computational science. With students from increasingly diverse backgrounds joining this popular field, the aim of this short guide is to help them navigate the various computational techniques they are likely to encounter during their studies. We cover a broad spectrum of techniques, including Bash scripting, scientific programming, and machine learning, among others. The paper is structured into nine sections, each introducing a different computational method. To enhance readability, we have adopted a casual and instructive tone throughout and included relevant code snippets. Owing to its introductory nature, this article is not intended to be exhaustive; instead, we direct readers to a list of references to expand their knowledge of the techniques discussed. Finally, this article serves as an extension to our student-led seminar series, with additional resources and videos available at https://computationaltoolkit.github.io/.
Data availability
Not applicable.
Code availability
Not applicable.
Notes
Conda is a widely used package and environment manager that allows users to install specific software packages and their dependencies. It makes software environments easier to replicate by creating isolated, self-contained environments, which prevents conflicts between distinct projects.
References
S.J. Clark, M.D. Segall, C.J. Pickard, P.J. Hasnip, M.J. Probert, K. Refson, M.C. Payne, First principles methods using CASTEP. Z. Kristall. 220, 567–570 (2005)
P. Rüßmann, P. Mavropoulos, R. Zeller, J. Bouaziz, M. Dos Santos Dias, S. Blügel, D.S.G. Bauer, P.F. Baumeister, M. Bornemann, S. Brinker, P.H. Dederichs, B.H. Drittler, N. Essing, G. Géranton, N.H. Long, S. Lounis, E. Mendive Tapia, E. Rabel, F. Dos Santos, B. Schweflinghaus, D. Antognini Silva, A.R. Thiess, B. Zimmermann, The JuKKR code (2022). https://doi.org/10.5281/zenodo.7284738
C.D. Woodgate, D. Hedlund, L.H. Lewis, J.B. Staunton, Interplay between magnetism and short-range order in medium- and high-entropy alloys: CrCoNi, CrFeCoNi, and CrMnFeCoNi. Phys. Rev. Mater. 7, 053801 (2023). https://doi.org/10.1103/PhysRevMaterials.7.053801
R. Chadwick, Linux Tutorial for Beginners - Learn Linux and the Bash Command Line. [Online; accessed 20. Oct. 2023] (2023). https://ryanstutorials.net/linuxtutorial
R. Chadwick, Bash Scripting Tutorial - Ryans Tutorials. [Online; accessed 20. Oct. 2023] (2023). https://ryanstutorials.net/bash-scripting-tutorial
K. Dowd, C.R. Severance, High performance computing, 1st edn. (O’Reilly & Associates, Cambridge, 1998)
W.P. Huhn, B. Lange, V.W.-Z. Yu, M. Yoon, V. Blum, GPU acceleration of all-electron electronic structure theory using localized numeric atom-centered basis functions. Comput. Phys. Commun. 254, 107314 (2020). https://doi.org/10.1016/j.cpc.2020.107314
F. Spiga, I. Girotto, phiGEMM: a CPU-GPU library for porting quantum ESPRESSO on hybrid systems. In: 2012 20th Euromicro international conference on parallel, distributed and network-based processing, pp. 368–375 (2012). https://doi.org/10.1109/PDP.2012.72. ISSN: 2377-5750
L. Vogt, R. Olivares-Amaya, S. Kermes, Y. Shao, C. Amador-Bedolla, A. Aspuru-Guzik, Accelerating resolution-of-the-identity second-order Møller-Plesset quantum chemistry calculations with graphical processing units. J. Phys. Chem. A 112(10), 2049–2057 (2008). https://doi.org/10.1021/jp0776762. (Accessed 2023-06-29)
K. Wilkinson, C.-K. Skylaris, Porting ONETEP to graphical processing unit-based coprocessors. 1. FFT box operations. J. Comp. Chem. 34(28), 2446–2459 (2013). https://doi.org/10.1002/jcc.23410
J. Yan, L. Li, C. O’Grady, Graphics processing unit acceleration of the random phase approximation in the projector augmented wave method. Comput. Phys. Commun. 184(12), 2728–2733 (2013). https://doi.org/10.1016/j.cpc.2013.07.014. (Accessed 2023-06-29)
L. Genovese, M. Ospici, T. Deutsch, J.-F. Méhaut, A. Neelov, S. Goedecker, Density functional theory calculation on many-cores hybrid central processing unit-graphic processing unit architectures. J. Chem. Phys. 131(3), 034103 (2009). https://doi.org/10.1063/1.3166140. (Accessed 2023-06-29)
C. Bishop, Pattern recognition and machine learning. J. Electron. Imaging 16(4), 140–155 (2006). https://doi.org/10.1117/1.2819119
M.A. Lones, How to avoid machine learning pitfalls: a guide for academic researchers. arXiv:2108.02497 (2021). https://doi.org/10.48550/arXiv.2108.02497
M. Belyaev, E. Burnaev, Y. Kapushev, Exact inference for Gaussian process regression in case of big data with the Cartesian product structure. arXiv:1403.6573 (2014). https://doi.org/10.48550/arXiv.1403.6573
H. Liu, Y.-S. Ong, X. Shen, J. Cai, When Gaussian process meets big data: a review of scalable GPs. arXiv:1807.01065 (2018). https://doi.org/10.48550/arXiv.1807.01065
L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, G. Varoquaux, API design for machine learning software: experiences from the scikit-learn project. In: ECML PKDD workshop: languages for data mining and machine learning, pp. 108–122 (2013)
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, É. Duchesnay, Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org (2015). https://www.tensorflow.org/
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An Imperative Style, High-Performance Deep Learning Library. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
J.D. Hunter, Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9(3), 90–95 (2007)
C.R. Harris, K. Jarrod Millman, S.J. Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N.J. Smith, R. Kern, M. Picus, S. Hoyer, M.H. Kerkwijk, M. Brett, A. Haldane, J. Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, T.E. Oliphant, Array programming with NumPy. Nature 585, 357–362 (2020)
P. Virtanen, R. Gommers, T.E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S.J. Walt, M. Brett, J. Wilson, K. Jarrod Millman, N. Mayorov, A.R.J. Nelson, E. Jones, R. Kern, R. Larson, C.J. Carey, İ Polat, Y. Feng, E.W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E.A. Quintero, C.R. Harris, A.M. Archibald, A.H. Ribeiro, F. Pedregosa, P. Mulbregt, SciPy 1.0 Contributors: SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020)
M.L. Waskom, seaborn: statistical data visualization. J. Open Source Softw. 6(60), 3021 (2021). https://doi.org/10.21105/joss.03021
The pandas development team, pandas-dev/pandas: pandas. Zenodo (2020). https://doi.org/10.5281/zenodo.3509134
T. Kluyver, B. Ragan-Kelley, F. Pérez, B. Granger, M. Bussonnier, J. Frederic, K. Kelley, J. Hamrick, J. Grout, S. Corlay, P. Ivanov, D. Avila, S. Abdalla, C. Willing, Jupyter notebooks – a publishing format for reproducible computational workflows. In: Loizides, F., Schmidt, B. (eds.) Positioning and power in academic publishing: players, Agents and Agendas, pp. 87–90 (2016). IOS Press
LIGO: tutorials. [Online; accessed 13. Nov. 2023] (2023). https://gwosc.org/tutorials
J. Bradbury, R. Frostig, P. Hawkins, M.J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, Q. Zhang, JAX: composable transformations of Python+NumPy programs (2018). http://github.com/google/jax
D. Maclaurin, D. Duvenaud, R.P. Adams, Autograd: effortless gradients in numpy. In: ICML 2015 AutoML Workshop, vol. 238, p. 5 (2015)
C. Pilgrim, piecewise-regression (aka segmented regression) in Python. J. Open Source Softw. 6(68), 3859 (2021). https://doi.org/10.21105/joss.03859
R.D. Peng, Reproducible research in computational science. Science 334(6060), 1226–1227 (2011). https://doi.org/10.1126/science.1213847
C. Pilgrim, P. Kent, K. Hosseini, E. Chalstrey, Ten simple rules for working with other people’s code. PLoS Comput. Biol. 19(4), 1011031 (2023). https://doi.org/10.1371/journal.pcbi.1011031
Sphinx: Sphinx. [Online; accessed 6. Nov. 2023] (2023). https://www.sphinx-doc.org/en/master
Doxygen: Doxygen: Doxygen. [Online; accessed 19. Oct. 2023] (2023). https://www.doxygen.nl/index.html
GitLab: The DevSecOps Platform. [Online; accessed 19. Oct. 2023] (2023). https://about.gitlab.com
GitHub: Build software better, together. [Online; accessed 19. Oct. 2023] (2023). https://github.com
Programming with Mosh, Git tutorial for beginners: learn Git in 1 hour. Youtube. [Online; accessed 12. Nov. 2023] (2020). https://www.youtube.com/watch?v=8JJ101D3knE
Figshare: figshare - credit for all your research. [Online; accessed 12. Nov. 2023] (2023). https://figshare.com
Zenodo: Zenodo. [Online; accessed 12. Nov. 2023] (2023). https://zenodo.org
C. Schafer, Python tutorial: unit testing your code with the unittest Module. Youtube. [Online; accessed 12. Nov. 2023] (2017). https://www.youtube.com/watch?v=6tNS--WetLI
GeeksforGeeks: principles of software design. GeeksforGeeks. [Online; accessed 12. Nov. 2023] (2022). https://www.geeksforgeeks.org/principles-of-software-design
K. Chris, SOLID Design principles in software development. FreeCodeCamp (2023)
B.W. Boehm, Seven basic principles of software engineering. J. Syst. Softw. 3(1), 3–24 (1983). https://doi.org/10.1016/0164-1212(83)90003-1
Archiveddocs: chapter 16: quality attributes. [Online; accessed 12. Nov. 2023] (2023). https://learn.microsoft.com/en-us/previous-versions/msp-n-p/ee658094(v=pandp.10)?redirectedfrom=MSDN
MolSSI: MolSSI’s Best Practices – MolSSI. [Online; accessed 12. Nov. 2023] (2023). https://molssi.org/molssis-best-practices
D. Abbasi, Cutting-edge free tools to unlock the power of computational chemistry - Silico Studio. Silico Studio (2023)
MolSSI: MolSSI’s Best Practices – MolSSI. [Online; accessed 19. Oct. 2023] (2023). https://molssi.org/molssis-best-practices
A. Athalye, The missing semester of your CS education. [Online; accessed 19. Oct. 2023] (2023). https://missing.csail.mit.edu
MIT Deep Learning: MIT deep learning 6.S191. [Online; accessed 19. Oct. 2023] (2023). http://introtodeeplearning.com
TMP Chem, Computational Chemistry 0.1 - Introduction. Youtube. [Online; accessed 19. Oct. 2023] (2017). https://www.youtube.com/watch?v=YF-amZgE2h4&list=PLm8ZSArAXicIWTHEWgHG5mDr8YbrdcN1K
Virtual Simulation Lab. Youtube. [Online; accessed 19. Oct. 2023] (2023). https://www.youtube.com/@VirtualSimulationLab/videos
StatQuest with Josh Starmer. Youtube. [Online; accessed 19. Oct. 2023] (2023). https://www.youtube.com/@statquest
The Computational Toolkit. Youtube. [Online; accessed 19. Oct. 2023] (2023). https://www.youtube.com/@thecomputationaltoolkit2890/videos
J. Cumby, M. Degiacomi, V. Erastova, J. Güven, C. Hobday, A. Mey, H. Pollak, R. Szabla, Course materials for an introduction to data-driven chemistry. J. Open Source Educ. 6(63), 192 (2023). https://doi.org/10.21105/jose.00192
T. French, R for data analysis: an open-source resource for teaching and learning analytics with R. J. Open Source Educ. 6(63), 202 (2023). https://doi.org/10.21105/jose.00202
J. Storopoli, R. Huijzer, L. Alonso, Julia Data Science (2021). https://juliadatascience.io
A.D. White, Deep learning for molecules and materials. Living J. Comput. Mol. Sci. 3(1), 1499 (2021). https://doi.org/10.33011/livecoms.3.1.1499
G.C. Solomon, J.Z. Zhang, T. Cuk, An open letter to aspiring authors. ACS Phys. Chem. Au 2(2), 68–69 (2022). https://doi.org/10.1021/acsphyschemau.2c00011
Y.-F. Shi, Z.-X. Yang, S. Ma, P.-L. Kang, C. Shang, P. Hu, Z.-P. Liu, Machine learning for chemistry: basics and applications. Engineering (2023). https://doi.org/10.1016/j.eng.2023.04.013
F.A. Rodrigues, Machine learning in physics: a short guide. Europhys. Lett. 144(2), 22001 (2023). https://doi.org/10.1209/0295-5075/ad0575
Acknowledgements
I.I., D.M., J.M.T., Z.F., C.M., and C.D.W. acknowledge funding from the Engineering and Physical Sciences Research Council (EPSRC) Centre for Doctoral Training in Modelling of Heterogeneous Systems [EP/S022848/1]. S.C. acknowledges funding from the EPSRC Centre for Doctoral Training in Diamond Science and Technology [EP/L015315/1] and the Research Development Fund of the University of Warwick. C.P. acknowledges funding from the EPSRC Mathematics for Real-World Systems Centre for Doctoral Training [EP/S022244/1]. In addition, we acknowledge the valuable early efforts and input of several colleagues in the computational toolkit seminar series on which this paper is based: Peter Lewin-Jones, Kyle Fogarty, Lakshmi Shenoy, Matthew Harrison, and Charlotte Rogerson. Finally, we thank Professors James Kermode and Julie Staunton (University of Warwick) for their time in reading our drafts and for their valuable advice and comments on our manuscript.
Funding
This study was funded by the Engineering and Physical Sciences Research Council (EPSRC) [EP/S022848/1].
Author information
Contributions
All authors have contributed equally to this manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ismail, I., Chaudhuri, S., Morgan, D. et al. Eat, sleep, code, repeat: tips for early-career researchers in computational science. Eur. Phys. J. Plus 138, 1094 (2023). https://doi.org/10.1140/epjp/s13360-023-04732-5
DOI: https://doi.org/10.1140/epjp/s13360-023-04732-5