
Eat, sleep, code, repeat: tips for early-career researchers in computational science

  • Tutorial
The European Physical Journal Plus

Abstract

This article is intended as a guide for new graduate students entering the field of computational science. With the increasing influx of students with diverse backgrounds joining the ever-popular field, the aim of this short guide is to help students navigate through the various computational techniques that they are likely to encounter during their studies. Here, we cover a broad spectrum of techniques, including Bash scripting, scientific programming, and machine learning, among other fields. This paper is structured into nine sections, each introducing a different computational method. To enhance readability, we have adopted a casual and instructive tone throughout and included relevant code snippets. Please note that due to the introductory nature of this article, it is not intended to be exhaustive; instead, we direct readers to a list of references to expand their knowledge of the techniques discussed within the paper. Finally, readers should note this article serves as an extension to our student-led seminar series, with additional resources and videos available at https://computationaltoolkit.github.io/ for reference.


Data availability

Not applicable.

Code availability

Not applicable.

Notes

  1. Conda is a widely used package and environment manager that lets users install specific versions of software packages together with their dependencies. By creating isolated, self-contained environments, it makes software environments easier to replicate and prevents conflicts between distinct projects.
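
The isolation idea in this note can be illustrated with a minimal sketch. It does not use Conda itself: as a stand-in, it assumes Python's standard-library venv module, which creates lightweight Python-only environments (Conda additionally manages non-Python packages). The helper names create_isolated_env and env_prefix, and the project directory names, are hypothetical and chosen only for illustration.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def create_isolated_env(env_dir: Path) -> Path:
    """Create a bare virtual environment and return the path to its interpreter.

    Assumption: stdlib venv stands in for Conda here. with_pip=False keeps
    creation fast and offline; symlinks mirrors the `python -m venv` default.
    """
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def env_prefix(python_exe: Path) -> str:
    """Ask an environment's own interpreter which prefix it installs into."""
    result = subprocess.run(
        [str(python_exe), "-c", "import sys; print(sys.prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Two hypothetical projects, each with its own self-contained environment.
        for name in ("project_a", "project_b"):
            exe = create_isolated_env(Path(tmp) / name)
            print(name, "installs packages under", env_prefix(exe))
```

Each interpreter reports a different prefix, so a package installed for one project is invisible to the other. This is the same property that lets Conda environments avoid version conflicts between projects.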

References

  1. S.J. Clark, M.D. Segall, C.J. Pickard, P.J. Hasnip, M.J. Probert, K. Refson, M.C. Payne, First principles methods using CASTEP. Z. Kristall. 220, 567–570 (2005)


  2. P. Rüßmann, P. Mavropoulos, R. Zeller, J. Bouaziz, M. Dos Santos Dias, S. Blügel, D.S.G. Bauer, P.F. Baumeister, M. Bornemann, S. Brinker, P.H. Dederichs, B.H. Drittler, N. Essing, G. Géranton, N.H. Long, S. Lounis, E. Mendive Tapia, E. Rabel, F. Dos Santos, B. Schweflinghaus, D. Antognini Silva, A.R. Thiess, B. Zimmermann, The JuKKR code (2022). https://doi.org/10.5281/zenodo.7284738

  3. C.D. Woodgate, D. Hedlund, L.H. Lewis, J.B. Staunton, Interplay between magnetism and short-range order in medium- and high-entropy alloys: CrCoNi, CrFeCoNi, and CrMnFeCoNi. Phys. Rev. Mater. 7, 053801 (2023). https://doi.org/10.1103/PhysRevMaterials.7.053801

  4. R. Chadwick, Linux Tutorial for Beginners - Learn Linux and the Bash Command Line. [Online; accessed 20. Oct. 2023] (2023). https://ryanstutorials.net/linuxtutorial

  5. R. Chadwick, Bash Scripting Tutorial - Ryans Tutorials. [Online; accessed 20. Oct. 2023] (2023). https://ryanstutorials.net/bash-scripting-tutorial

  6. K. Dowd, C.R. Severance, High performance computing, 1st edn. (O’Reilly & Associates, Cambridge, 1998)


  7. W.P. Huhn, B. Lange, V.W.-Z. Yu, M. Yoon, V. Blum, GPU acceleration of all-electron electronic structure theory using localized numeric atom-centered basis functions. Comput. Phys. Commun. 254, 107314 (2020). https://doi.org/10.1016/j.cpc.2020.107314


  8. F. Spiga, I. Girotto, phiGEMM: a CPU-GPU library for porting quantum ESPRESSO on hybrid systems. In: 2012 20th Euromicro international conference on parallel, distributed and network-based processing, pp. 368–375 (2012). https://doi.org/10.1109/PDP.2012.72. ISSN: 2377-5750

  9. L. Vogt, R. Olivares-Amaya, S. Kermes, Y. Shao, C. Amador-Bedolla, A. Aspuru-Guzik, Accelerating resolution-of-the-identity second-order Møller-Plesset quantum chemistry calculations with graphical processing units. J. Phys. Chem. A 112(10), 2049–2057 (2008). https://doi.org/10.1021/jp0776762. (Accessed 2023-06-29)


  10. K. Wilkinson, C.-K. Skylaris, Porting ONETEP to graphical processing unit-based coprocessors. 1. FFT box operations. J. Comp. Chem. 34(28), 2446–2459 (2013). https://doi.org/10.1002/jcc.23410


  11. J. Yan, L. Li, C. O’Grady, Graphics processing unit acceleration of the random phase approximation in the projector augmented wave method. Comput. Phys. Commun. 184(12), 2728–2733 (2013). https://doi.org/10.1016/j.cpc.2013.07.014. (Accessed 2023-06-29)


  12. L. Genovese, M. Ospici, T. Deutsch, J.-F. Méhaut, A. Neelov, S. Goedecker, Density functional theory calculation on many-cores hybrid central processing unit-graphic processing unit architectures. J. Chem. Phys. 131(3), 034103 (2009). https://doi.org/10.1063/1.3166140. (Accessed 2023-06-29)


  13. C. Bishop, Pattern recognition and machine learning. J. Electron. Imaging 16(4), 140–155 (2006). https://doi.org/10.1117/1.2819119


  14. M.A. Lones, How to avoid machine learning pitfalls: a guide for academic researchers. arXiv (2021) https://doi.org/10.48550/arXiv.2108.02497, arXiv:2108.02497

  15. M. Belyaev, E. Burnaev, Y. Kapushev, Exact inference for gaussian process regression in case of big data with the cartesian product structure. arXiv (2014) https://doi.org/10.48550/arXiv.1403.6573, arXiv:1403.6573

  16. H. Liu, Y.-S. Ong, X. Shen, J. Cai, When gaussian process meets big data: a review of scalable GPs. arXiv (2018) https://doi.org/10.48550/arXiv.1807.01065, arXiv:1807.01065

  17. L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, G. Varoquaux, API design for machine learning software: experiences from the scikit-learn project. In: ECML PKDD workshop: languages for data mining and machine learning, pp. 108–122 (2013)

  18. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, É. Duchesnay, Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011)


  19. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org (2015). https://www.tensorflow.org/

  20. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An Imperative Style, High-Performance Deep Learning Library. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf

  21. J.D. Hunter, Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9(3), 90–95 (2007)

  22. C.R. Harris, K. Jarrod Millman, S.J. Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N.J. Smith, R. Kern, M. Picus, S. Hoyer, M.H. Kerkwijk, M. Brett, A. Haldane, J. Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, T.E. Oliphant, Array programming with NumPy. Nature 585, 357–362 (2020)

  23. P. Virtanen, R. Gommers, T.E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S.J. Walt, M. Brett, J. Wilson, K. Jarod Millman, N. Mayorov, A.R.J. Nelson, E. Jones, R. Kern, R. Larson, C.J. Carey, İ Polat, Y. Feng, E.W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Heriksen, E.A. Quintero, C.R. Harris, A.M. Archibald, A.H. Ribeiro, F. Pedregosa, P. Mulbregt, SciPy 1.0 Contributors: SciPy 1.0: fundamental algorithms for scientific computing in python. Nat. Methods 17, 261–272 (2020)


  24. M.L. Waskom, seaborn: statistical data visualization. J. Open Source Softw. 6(60), 3021 (2021). https://doi.org/10.21105/joss.03021

  25. T. Team, pandas-dev/pandas: pandas. Zenodo (2020). https://doi.org/10.5281/zenodo.3509134

  26. T. Kluyver, B. Ragan-Kelley, F. Pérez, B. Granger, M. Bussonnier, J. Frederic, K. Kelley, J. Hamrick, J. Grout, S. Corlay, P. Ivanov, D. Avila, S. Abdalla, C. Willing, Jupyter notebooks – a publishing format for reproducible computational workflows. In: Loizides, F., Schmidt, B. (eds.) Positioning and power in academic publishing: players, Agents and Agendas, pp. 87–90 (2016). IOS Press

  27. Ligo: tutorials. [Online; accessed 13. Nov. 2023] (2023). https://gwosc.org/tutorials

  28. J. Bradbury, R. Frostig, P. Hawkins, M.J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, Q. Zhang, JAX: composable transformations of Python+NumPy programs (2018). http://github.com/google/jax

  29. D. Maclaurin, D. Duvenaud, R.P. Adams, Autograd: effortless gradients in numpy. In: ICML 2015 AutoML Workshop, vol. 238, p. 5 (2015)

  30. C. Pilgrim, piecewise-regression (aka segmented regression) in Python. J. Open Source Softw. 6(68), 3859 (2021). https://doi.org/10.21105/joss.0385

  31. R.D. Peng, Reproducible research in computational science. Science 334(6060), 1226–1227 (2011). https://doi.org/10.1126/science.1213847


  32. C. Pilgrim, P. Kent, K. Hosseini, E. Chalstrey, Ten simple rules for working with other people’s code. PLoS Comput. Biol. 19(4), 1011031 (2023). https://doi.org/10.1371/journal.pcbi.1011031


  33. Sphinx: Sphinx. [Online; accessed 6. Nov. 2023] (2023). https://www.sphinx-doc.org/en/master

  34. Doxygen: Doxygen. [Online; accessed 19. Oct. 2023] (2023). https://www.doxygen.nl/index.html

  35. Gitlab: The DevSecOps Platform. [Online; accessed 19. Oct. 2023] (2023). https://about.gitlab.com

  36. Github: Build software better, together. [Online; accessed 19. Oct. 2023] (2023). https://github.com

  37. P.w. Mosh, Git tutorial for beginners: learn Git in 1 hour. Youtube. [Online; accessed 12. Nov. 2023] (2020). https://www.youtube.com/watch?v=8JJ101D3knE

  38. Figshare: figshare - credit for all your research. [Online; accessed 12. Nov. 2023] (2023). https://figshare.com

  39. Zenodo: Zenodo. [Online; accessed 12. Nov. 2023] (2023). https://zenodo.org

  40. C. Schafer, Python tutorial: unit testing your code with the unittest Module. Youtube. [Online; accessed 12. Nov. 2023] (2017). https://www.youtube.com/watch?v=6tNS--WetLI

  41. GeeksforGeeks: principles of software design. GeeksforGeeks. [Online; accessed 12. Nov. 2023] (2022). https://www.geeksforgeeks.org/principles-of-software-design

  42. K. Chris, SOLID Design principles in software development. FreeCodeCamp (2023)

  43. B.W. Boehm, Seven basic principles of software engineering. J. Syst. Softw. 3(1), 3–24 (1983). https://doi.org/10.1016/0164-1212(83)90003-1


  44. Archiveddocs: chapter 16: quality attributes. [Online; accessed 12. Nov. 2023] (2023). https://learn.microsoft.com/en-us/previous-versions/msp-n-p/ee658094(v=pandp.10)?redirectedfrom=MSDN

  45. Molssi: MolSSI’s Best Practices – MolSSI. [Online; accessed 12. Nov. 2023] (2023). https://molssi.org/molssis-best-practices

  46. D. Abbasi, Cutting-edge free tools to unlock the power of computational chemistry - Silico Studio. Silico Studio (2023)

  47. Molssi: MolSSI’s Best Practices – MolSSI. [Online; accessed 19. Oct. 2023] (2023). https://molssi.org/molssis-best-practices

  48. A. Athalye, The missing semester of your CS education. [Online; accessed 19. Oct. 2023] (2023). https://missing.csail.mit.edu

  49. M.D. Learning, MIT deep learning 6.S191. [Online; accessed 19. Oct. 2023] (2023). http://introtodeeplearning.com

  50. T. Chem, Computational Chemistry 0.1 - Introduction. Youtube. [Online; accessed 19. Oct. 2023] (2017). https://www.youtube.com/watch?v=YF-amZgE2h4&list=PLm8ZSArAXicIWTHEWgHG5mDr8YbrdcN1K

  51. Virtual Simulation Lab. Youtube. [Online; accessed 19. Oct. 2023] (2023). https://www.youtube.com/@VirtualSimulationLab/videos

  52. StatQuest with Josh Starmer. Youtube. [Online; accessed 19. Oct. 2023] (2023). https://www.youtube.com/@statquest

  53. The Computational Toolkit. Youtube. [Online; accessed 19. Oct. 2023] (2023). https://www.youtube.com/@thecomputationaltoolkit2890/videos

  54. J. Cumby, M. Degiacomi, V. Erastova, J. Güven, C. Hobday, A. Mey, H. Pollak, R. Szabla, Course materials for an introduction to data-driven chemistry. J. Open Source Educ. 6(63), 192 (2023) https://doi.org/10.21105/jose.00192

  55. T. French, R for data analysis: an open-source resource for teaching and learning analytics with R. J. Open Source Educ. 6(63), 202 (2023). https://doi.org/10.21105/jose.00202

  56. J. Storopoli, R. Huijzer, L. Alonso, Julia Data Science, (2021). https://juliadatascience.io

  57. A.D. White, Deep learning for molecules and materials. Living J. Comput. Mol. Sci. 3(1), 1499 (2021). https://doi.org/10.33011/livecoms.3.1.1499

  58. G.C. Solomon, J.Z. Zhang, T. Cuk, An open letter to aspiring authors. ACS Phys. Chem. Au 2(2), 68–69 (2022). https://doi.org/10.1021/acsphyschemau.2c00011


  59. Y.-F. Shi, Z.-X. Yang, S. Ma, P.-L. Kang, C. Shang, P. Hu, Z.-P. Liu, Machine learning for chemistry: basics and applications. Engineering (2023). https://doi.org/10.1016/j.eng.2023.04.013


  60. F.A. Rodrigues, Machine learning in physics: a short guide. Europhys. Lett. 144(2), 22001 (2023). https://doi.org/10.1209/0295-5075/ad0575



Acknowledgements

I.I., D.M., J.M.T., Z.F., C.M., and C.D.W. acknowledge funding from the Engineering and Physical Sciences Research Council (EPSRC) Centre for Doctoral Training in Modelling of Heterogeneous Systems [EP/S022848/1]. S.C. acknowledges funding from the EPSRC Centre for Doctoral Training in Diamond Science and Technology [EP/L015315/1] and the Research Development Fund of the University of Warwick. C.P. acknowledges funding from the EPSRC Mathematics for Real-World Systems Centre for Doctoral Training [EP/S022244/1]. In addition, we would like to acknowledge the valuable early efforts and input of several other colleagues in the computational toolkit seminar series, on which this paper is based: Peter Lewin-Jones, Kyle Fogarty, Lakshmi Shenoy, Matthew Harrison, and Charlotte Rogerson. Finally, we would like to thank Professors James Kermode and Julie Staunton (University of Warwick) for their time in reading our drafts and for their valuable advice and comments on our manuscript.

Funding

This study was funded by the Engineering and Physical Sciences Research Council (EPSRC) [EP/S022848/1].

Author information

Contributions

All authors have contributed equally to this manuscript.

Corresponding author

Correspondence to Idil Ismail.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ismail, I., Chaudhuri, S., Morgan, D. et al. Eat, sleep, code, repeat: tips for early-career researchers in computational science. Eur. Phys. J. Plus 138, 1094 (2023). https://doi.org/10.1140/epjp/s13360-023-04732-5


  • DOI: https://doi.org/10.1140/epjp/s13360-023-04732-5
