Big Data-Driven Materials Science and Its FAIR Data Infrastructure

Abstract

This chapter addresses the fourth paradigm of materials research: big data-driven materials science. Its concepts and state of the art are described, and its challenges and opportunities are discussed. To advance the field, open data and all-embracing data sharing, an efficient data infrastructure, and the rich ecosystem of computer codes used in the community are of critical importance. To shape this fourth paradigm and contribute to the development or discovery of improved and novel materials, data must be what is now called FAIR: Findable, Accessible, Interoperable, and Repurposable/Reusable. This sets the stage for advances in artificial-intelligence methods that operate on large data sets to find trends and patterns that cannot be obtained from individual calculations, and not even directly from high-throughput studies. Recent progress is reviewed and demonstrated, and the chapter concludes with a forward-looking perspective that addresses important challenges not yet solved.

Notes

  1. In technical terms, “workflow” refers to the sequence and full description of operations for creating the input file and performing the actual calculations. Important workflow frameworks that allow one to automatically steer, analyze, and/or manage electronic-structure theory calculations are ASE (Atomic Simulation Environment) (Larsen et al. 2017), FireWorks (Jain et al. 2015), AFLOW (Calderon et al. 2015), and AiiDA (Pizzi et al. 2016).
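
As a rough illustration of this notion of a workflow, the following minimal sketch records each operation, from creating the input to performing the calculation, together with its output. It does not use ASE, FireWorks, AFLOW, or AiiDA; the function names (`make_input`, `run_calculation`, `run_workflow`) and the toy energy expression are hypothetical placeholders chosen only for this example.

```python
import json


def make_input(formula, lattice_constant):
    """Step 1: build the input description (a dict stands in for an input file)."""
    return {"formula": formula, "lattice_constant": lattice_constant}


def run_calculation(inp):
    """Step 2: placeholder for the actual calculation.

    Toy stand-in, NOT a physical model: no electronic-structure code is called.
    """
    return {"toy_energy": -1.0 * inp["lattice_constant"]}


def run_workflow(formula, lattice_constant):
    """Run the steps in order and keep a full record of what was done."""
    record = []
    inp = make_input(formula, lattice_constant)
    record.append({"step": "make_input", "output": inp})
    out = run_calculation(inp)
    record.append({"step": "run_calculation", "output": out})
    return record


# Illustrative call; the values are example inputs, not results from the chapter.
workflow_record = run_workflow("Si", 5.43)
print(json.dumps(workflow_record, indent=2))
```

Keeping the ordered record alongside the result is what would make such a calculation reproducible and shareable; the frameworks named above provide this in far more complete form.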

References

  • AFLOW, Automatic FLOW for materials discovery, http://aflowlib.org/

  • Agrawal A, Choudhary A (2016) Perspective: materials informatics and big data: realization of the “fourth paradigm” of science in materials science. APL Mater 4:053208

  • Alder BJ, Wainwright TE (1958) Molecular dynamics by electronic computers. In: Prigogine I (ed) International symposium on transport processes in statistical mechanics. Wiley, New York, pp 97–131

  • Alder BJ, Wainwright TE (1962) Phase transition in elastic disks. Phys Rev 127:359–361

  • Alder BJ, Wainwright TE (1970) Decay of velocity autocorrelation function. Phys Rev A 1:18–21

  • Atzmueller M (2015) Subgroup discovery. WIREs Data Min Knowl Discov 5:35

  • Bartók AP, Payne MC, Kondor R, Csányi G (2010) Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons. Phys Rev Lett 104:136403

  • Bartók AP, Kondor R, Csányi G (2013) On representing chemical environments. Phys Rev B 87:184115

  • Blaha P, Schwarz K, Sorantin P, Trickey SB (1990) Full-potential, linearized augmented plane wave programs for crystalline systems. Comp Phys Commun 59:399

  • Blank TB, Brown SD, Calhoun AW, Doren DJ (1995) Neural network models of potential energy surfaces. J Chem Phys 103:4129

  • Blum V, Gehrke R, Hanke F, Havu P, Havu V, Ren X, Reuter K, Scheffler M (2009) Ab initio molecular simulations with numeric atom-centered orbitals. Comput Phys Commun 180:2175–2196

  • Boley M (2017) Private communication. In the figure, the Gaussian radial basis function (rbf) kernel was used plus a 0.1 noise component: k(a,b) = rbf(a,b | scale=0.2) + 0.1 delta(a,b)

  • Calderon CE, Plata JJ, Toher C, Oses C, Levy O, Fornari M, Natan A, Mehl MJ, Hart G, Nardelli MB, Curtarolo S (2015) The AFLOW standard for high-throughput materials science calculations. Comput Mater Sci 108:233

  • Candès EJ, Wakin MB (2008) An introduction to compressive sampling. IEEE Signal Proc Mag 25:21

  • Candès EJ, Romberg J, Tao T (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory 52:489

  • Carbogno C, Thygesen KS, Bieniek B, Draxl C, Ghiringhelli LM, Gulans A, Hofmann OT, Jacobsen KW, Lubeck S, Mortensen JJ, Strange M, Wruss E, Scheffler M (2020) Numerical quality control for DFT-based materials databases. Preprint to be published

  • Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321

  • Curtarolo S, Setyawan W, Hart GLW, Jahnatek M, Chepulskii RV, Taylor RH, Wang S, Xue J, Yang K, Levy O, Mehl MJ, Stokes HT, Demchenko DO, Morgan D (2012) AFLOW: an automatic framework for high-throughput materials discovery. Comput Mater Sci 58:218

  • Donoho DL (2006) Compressed sensing. IEEE Trans Inf Theory 52:1289

  • Draxl C, Scheffler M (2018) NOMAD: the FAIR concept for big-data-driven materials science. MRS Bull 43:676

  • Draxl C, Scheffler M (2019) The NOMAD laboratory: from data sharing to artificial intelligence. J Phys Mater 2:036001

  • Draxl C, Illas F, Scheffler M (2017) Open data settled in materials theory. Nature 548:523

  • Duivesteijn W, Feelders AJ, Knobbe A (2016) Exceptional model mining: supervised descriptive local pattern mining with complex target concepts. Data Min Knowl Discov 30:47

  • Enkovaara J, Rostgaard C, Mortensen JJ, Chen J, Dułak M, Ferrighi L, Gavnholt J, Glinsvad C, Haikola V, Hansen HA, Kristoffersen HH, Kuisma M, Larsen AH, Lehtovaara L, Ljungberg M, Lopez-Acevedo O, Moses PG, Ojanen J, Olsen T, Petzold V, Romero NA, Stausholm-Møller J, Strange M, Tritsaris GA, Vanin M, Walter M, Hammer B, Häkkinen H, Madsen GKH, Nieminen RM, Nørskov JK, Puska M, Rantala TT, Schiøtz J, Thygesen KS, Jacobsen KW (2010) Electronic structure calculations with GPAW: a real-space implementation of the projector augmented-wave method. J Phys Condens Matter 22:253202

  • Faber F, Lindmaa A, von Lilienfeld OA, Armiento R (2015) Crystal structure representations for machine learning models of formation energies. Int J Quantum Chem 115:1094

  • Friedman JH, Fisher NI (1999) Bump hunting in high-dimensional data. Stat Comput 9:123

  • Garrity KF, Bennett JW, Rabe KM, Vanderbilt D (2014) Pseudopotentials for high-throughput DFT calculations. Comput Mater Sci 81:446–452

  • Ghiringhelli LM, Vybiral J, Levchenko SV, Draxl C, Scheffler M (2015) Big data of material science: critical role of the descriptor. Phys Rev Lett 114:105503

  • Ghiringhelli LM, Carbogno C, Levchenko S, Mohamed F, Huhs G, Lüder M, Oliveira M, Scheffler M (2016) Towards a common format for computational materials science data. Psi-k Scientific Highlight of the Month No. 131. http://psi-k.net/download/highlights/Highlight_131.pdf

  • Ghiringhelli LM, Carbogno C, Levchenko S, Mohamed F, Huhs G, Lüder M, Oliveira M, Scheffler M (2017a) Towards efficient data exchange and sharing for big-data driven materials science: metadata and data formats. npj Comput Mater 3:46

  • Ghiringhelli LM, Vybiral J, Ahmetcik E, Ouyang R, Levchenko SV, Draxl C, Scheffler M (2017b) Learning physical descriptors for material science by compressed sensing. New J Phys 19:023017

  • Gibson WF (1999) “The Science in Science Fiction” on Talk of the Nation (30 Nov 1999, Timecode 11:55). Available via NPR. https://www.npr.org/2018/10/22/1067220/the-science-in-science-fiction or https://www.npr.org/programs/talk-of-the-nation/1999/11/30/12966633/

  • Goldsmith BR, Boley M, Vreeken J, Scheffler M, Ghiringhelli LM (2017) Uncovering structure-property relationships of materials by subgroup discovery. New J Phys 19:013031

  • Gray J (2007) The concept of a fourth paradigm was probably first discussed by J. Gray at a workshop on January 11, 2007, before he went missing in the Pacific on January 28, 2007. See: Hey T, Tansley S, Tolle K (eds) (2009) The fourth paradigm: data-intensive scientific discovery. Microsoft Research, Redmond, Washington. ISBN 978-0-9825442-0-4

  • Gulans A, Kontur S, Meisenbichler C, Nabok D, Pavone P, Rigamonti S, Sagmeister S, Werner U, Draxl C (2014) Exciting: a full-potential all-electron package implementing density-functional theory and many-body perturbation theory. J Phys Condens Matter 26:363202

  • Hansen K, Montavon G, Biegler F, Fazli S, Rupp M, Scheffler M, von Lilienfeld OA, Tkatchenko A, Müller K-R (2013) Assessment and validation of machine learning methods for predicting molecular atomization energies. J Chem Theory Comput 9:3404

  • Hansen K, Biegler F, Ramakrishnan R, Pronobis W, von Lilienfeld OA, Müller K-R, Tkatchenko A (2015) Machine learning predictions of molecular properties: accurate many-body potentials and nonlocality in chemical space. J Phys Chem Lett 6:2326

  • Hedin L (1965) New method for calculating the one-particle Green's function with application to the electron-gas problem. Phys Rev 139:A796

  • Herrera F, Carmona CJ, González P, del Jesus MJ (2011) An overview on subgroup discovery: foundations and applications. Knowl Inf Syst 29:495

  • Hinton GE (2006) Reducing the dimensionality of data with neural networks. Science 313:504–507

  • Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18:1527

  • Hirn M, Poilvert N, Mallat S (2015) Quantum energy regression using scattering transforms. https://arxiv.org/abs/1502.02077

  • Hohenberg P, Kohn W (1964) Inhomogeneous electron gas. Phys Rev 136:B864

  • Huo H, Rupp M (2017) Unified representation for machine learning of molecules and crystals. https://arxiv.org/abs/1704.06439

  • Jain A, Ong SP, Hautier G, Chen W, Richards WD, Dacek S, Cholia S, Gunter D, Skinner D, Ceder G, Persson KA (2013) The materials project: a materials genome approach to accelerating materials innovation. APL Mater 1:011002

  • Jain A, Ong SP, Chen W, Medasani B, Qu X, Kocher M, Brafman M, Petretto G, Rignanese GM, Hautier G, Gunter D, Persson KA (2015) FireWorks: a dynamic workflow system designed for high-throughput applications. Concurr Comput: Pract Exper 27:5037–5059

  • Kaggle/Nomad2018 (2018) Predicting transparent conductors – predict the key properties of novel transparent semiconductors https://www.kaggle.com/c/nomad2018-predict-transparent-conductors

  • Klösgen W (1996) Explora: a multipattern and multistrategy discovery assistant. In: Advanced techniques in knowledge discovery and data mining. American Association for Artificial Intelligence, Menlo Park, p 249

  • Kohn W, Sham LJ (1965) Self-consistent equations including exchange and correlation effects. Phys Rev 140:A1133–A1138

  • Kresse G, Furthmüller J (1996) Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys Rev B 54:11169

  • Larsen AH, Mortensen JJ, Blomqvist J, Castelli IE, Christensen R, Dułak M, Friis J, Groves MN, Hammer B, Hargus C, Hermes ED, Jennings PC, Jensen PB, Kermode J, Kitchin JR, Kolsbjerg EL, Kubal J, Kaasbjerg K, Lysgaard S, Maronsson JB, Maxson T, Olsen T, Pastewka L, Peterson A, Rostgaard C, Schiøtz J, Schütt O, Strange M, Thygesen KS, Vegge T, Vilhelmsen L, Walter M, Zeng Z, Jacobsen KW (2017) The atomic simulation environment – a Python library for working with atoms. J Phys Condens Mat 29:273002

  • Lazer D, Kennedy R, King G, Vespignani A (2014) The parable of Google flu: traps in big data analysis. Science 343:1203

  • Lejaeghere K, Bihlmayer G, Björkman T, Blaha P, Blügel S, Blum V, Caliste D, Castelli IE, Clark SJ, Dal Corso A, de Gironcoli S, Deutsch T, Dewhurst JK, Di Marco I, Draxl C, Dułak M, Eriksson O, Flores-Livas JA, Garrity KF, Genovese L, Giannozzi P, Giantomassi M, Goedecker S, Gonze X, Grånäs O, Gross EKU, Gulans A, Gygi F, Hamann DR, Hasnip PJ, Holzwarth NAW, Iuşan D, Jochym DB, Jollet F, Jones D, Kresse G, Koepernik K, Küçükbenli E, Kvashnin YO, Locht ILM, Lubeck S, Marsman M, Marzari N, Nitzsche U, Nordström L, Ozaki T, Paulatto L, Pickard CJ, Poelmans W, Probert MIJ, Refson K, Richter M, Rignanese G-M, Saha S, Scheffler M, Schlipf M, Schwarz K, Sharma S, Tavazza F, Thunström P, Tkatchenko A, Torrent M, Vanderbilt D, van Setten MJ, Van Speybroeck V, Wills JM, Yates JR, Zhang G-X, Cottenier S (2016) Reproducibility in density functional theory calculations of solids. Science 351:aad3000

  • Lorenz S, Groß A, Scheffler M (2004) Representing high-dimensional potential-energy surfaces for reactions at surfaces by neural networks. Chem Phys Lett 395:210

  • Lorenz S, Scheffler M, Groß A (2006) Descriptions of surface chemical reactions using a neural network representation of the potential-energy surface. Phys Rev B 73:115431

  • Materials Project. https://materialsproject.org

  • Mazheika A, Wang Y, Ghiringhelli LM, Illas F, Levchenko SV, Scheffler M (2019) Ab initio data analytics study of carbon-dioxide activation on semiconductor oxide surfaces. http://arxiv.org/abs/1912.06515

  • Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21:1087

  • Moruzzi VL, Janak JF, Williams AR (1978) Calculated electronic properties of metals. Pergamon, New York

  • Nature editorial (2017) Not-so-open data. Nature 546:327. Empty rhetoric over data sharing slows science https://www.nature.com/news/empty-rhetoric-over-data-sharing-slows-science-1.22133

  • Nelson LJ, Hart GLW, Zhou F, Ozolins V (2013) Compressive sensing as a paradigm for building physics models. Phys Rev B 87:035125

  • NOMAD (2014) The concept of the NOMAD Repository and Archive (NOMAD) was developed in 2014 (see e.g. the discussion in Ghiringhelli et al. 2016), independently and parallel to the “FAIR Guiding Principles” (Wilkinson et al. 2016). Interestingly, the essence is practically identical. However, the accessibility of data in NOMAD goes further than meant in the FAIR Guiding Principles, as for searching and even downloading data from NOMAD, users don’t even need to register

  • NOMAD, The NOMAD (Novel Materials Discovery) Center of Excellence (CoE) was launched in November 2015. https://nomad-coe.eu, https://youtu.be/yawM2ThVlGw

  • OQMD, Open quantum materials database. http://oqmd.org/

  • Ouyang R, Curtarolo S, Ahmetcik E, Scheffler M, Ghiringhelli LM (2018) SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates. Phys Rev Mat 2:083802

  • Pearl J (2009) Causality: models, reasoning and inference, 2nd edn. Cambridge University Press, New York

  • Pizzi G, Cepellotti A, Sabatini R, Marzari N, Kozinsky B (2016) AiiDA: automated interactive infrastructure and database for computational science. Comput Mater Sci 111:218–230

  • Pyykkö P (2012) The physics behind chemistry and the periodic table. Chem Rev 112:371–384

  • Rahman A (1964) Correlations in the motion of atoms in liquid argon. Phys Rev 136:A405–A411

  • Reuter K, Stampfl C, Scheffler M (2005) Ab initio atomistic thermodynamics and statistical mechanics of surface properties and functions. In: Yip S (ed) Handbook of materials modeling. Springer, Dordrecht, pp 149–194

  • Rupp M, Tkatchenko A, Müller K-R, von Lilienfeld OA (2012) Fast and accurate modeling of molecular atomization energies with machine learning. Phys Rev Lett 108:058301

  • Saal J, Kirklin S, Aykol M, Meredig B, Wolverton C (2013) Materials design and discovery with high-throughput density functional theory: the open quantum materials database (OQMD). JOM 65:1501

  • Scerri ER (2008) The periodic table: its story and its significance. Oxford University Press, New York. ISBN 978-0-19-530573-9

  • Schütt KT, Glawe H, Brockherde F, Sanna A, Müller K-R, Gross EKU (2014) How to represent crystal structures for machine learning: towards fast prediction of electronic properties. Phys Rev B 89:205118

  • Seko A, Hayashi H, Nakayama K, Takahashi A, Tanaka I (2017) Representation of compounds for machine-learning prediction of physical properties. Phys Rev B 95:144110

  • Siebes A (1995) Data surveying foundations of an inductive query language. KDD-95 proceedings. AAAI Press, Montreal, p 269

  • Singh AK, Montoya JH, Gregoire JM, Persson KA (2019) Robust and synthesizable photocatalysts for CO2 reduction: a data-driven materials discovery. Nat Commun 10:443

  • Slater JC (1937) Wave functions in a periodic potential. Phys Rev 51:846

  • Slater JC (1953) An augmented plane wave method for the periodic potential problem. Phys Rev 92:603

  • Slater JC (1965) Quantum theory of molecules and solids, Symmetry and energy bands in crystals, vol 2. McGraw-Hill, New York

  • Slater JC (1967) Quantum theory of molecules and solids, insulators, semiconductors and metals, vol 3. McGraw-Hill, New York

  • Slater JC, Johnson KH (1972) Self-consistent-field Xα cluster method for polyatomic molecules and solids. Phys Rev B 5:844

  • Sutton C, Ghiringhelli LM, Yamamoto T, Lysogorskiy Y, Blumenthal L, Hammerschmidt T, Golebiowski J, Liu X, Ziletti A, Scheffler M (2019) Crowd-sourcing materials-science challenges with the NOMAD 2018 Kaggle competition. npj Comput Mater 5:1–11. https://doi.org/10.1038/s41524-019-0239-3

  • Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B 58:267

  • van Setten MJ, Caruso F, Sharifzadeh S, Ren X, Scheffler M, Liu F, Lischner J, Lin L, Deslippe JR, Louie SG, Yang C, Weigend F, Neaton JB, Evers F, Rinke P (2015) GW100: benchmarking G0W0 for molecular systems. J Chem Theory Comput 11:5665

  • Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJG, Groth P, Goble C, Grethe JS, Heringa J, ‘t Hoen PAC, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone S-A, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B (2016) The FAIR guiding principles for scientific data management and stewardship. Sci Data 3:160018

  • Wimmer E, Krakauer H, Weinert M, Freeman AJ (1981) Full-potential self-consistent linearized-augmented-plane-wave method for calculating the electronic structure of molecules and surfaces: O2 molecule. Phys Rev B 24:864

  • Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Komorowski J, Zytkow J (eds) Principles of data mining and knowledge discovery: first European symposium, PKDD’97, Trondheim, Norway, 24–27 June 1997. Springer, Berlin, p 78

  • Xie T, Grossman JC (2018) Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys Rev Lett 120:145301

  • Yin MT, Cohen ML (1982) Theory of static structural properties, crystal stability, and phase transformations: application to Si and Ge. Phys Rev B 26:5668

  • Zhang IY, Logsdail AJ, Ren X, Levchenko SV, Ghiringhelli L, Scheffler M (2019) Test set for materials science and engineering with user-friendly graphic tools for error analysis: systematic benchmark of the numerical and intrinsic errors in state-of-the-art electronic-structure approximations. New J Phys 21:013025

  • Zhang Y, Ling C (2018) A strategy to apply machine learning to small datasets in materials science. npj Comput Mater 4:25

  • Ziletti A, Kumar D, Scheffler M, Ghiringhelli LM (2018) Insightful classification of crystal structures using deep learning. Nat Commun 9:2775

Acknowledgments

We gratefully acknowledge helpful discussions with Luca Ghiringhelli, Mario Boley, and Sergey Levchenko and their critical reading of the manuscript. This work received funding from the European Union’s Horizon 2020 Research and Innovation Programme, Grant Agreement No. 676580, the NOMAD Laboratory CoE and No. 740233, ERC: TEC1P. We thank P. Wittenburg for clarification of the FAIR concept. The work profited from programs and discussions at the Institute for Pure and Applied Mathematics (IPAM) at UCLA, supported by the NSF, and from BIGmax, the Max Planck Society’s Research Network on Big-Data-Driven Materials Science.

Corresponding author

Correspondence to Claudia Draxl.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this entry

Draxl, C., Scheffler, M. (2020). Big Data-Driven Materials Science and Its FAIR Data Infrastructure. In: Andreoni, W., Yip, S. (eds) Handbook of Materials Modeling. Springer, Cham. https://doi.org/10.1007/978-3-319-44677-6_104
