Introduction

In their seminal work, Hey et al.1 identify four distinct paradigms that represent the historical evolution of science and technology: (1) empirical science, (2) theoretical-model science, (3) computational science (simulations), and (4) the emerging paradigm of data-driven science. A recent perspective from Agrawal and Choudhary2 points out that the materials science community has followed this same sequence of paradigms, which corresponds to advances over time in the ability to observe, interpret, and represent material behavior. In the first two paradigms, the representation of physical or mechanical behavior is typically low dimensional and is limited to behaviors governed by relatively simple physics. Prime examples include elasticity and yield in structural metals, where the governing physics and the corresponding salient features of the material structure have been identified with a high level of confidence. More complicated physics typically requires numerical approaches to predict the resulting phenomena with reasonable confidence, and this need spurred the evolution of the third paradigm of computational science. Even with the advances of this third paradigm, the space of governing physics that must be explored systematically (including all model forms and parameters needed to connect the material structure to the phenomena of interest) can be very large. Furthermore, the experimental data available for model validation typically comprise relatively few distinct test cases, although each test case can be rich in data dimensionality. Thus, the fourth paradigm of data-driven science is simultaneously motivated and enabled by the relatively recent and unprecedented access to multidimensional and multimodal data derived from sophisticated experiments, simulations, and combinations thereof.
New ways to mine, leverage, and manage these data are now required, and addressing this requirement is foundational to the Materials Genome Initiative.3

As with other areas of materials science and engineering, the study of fatigue has closely followed the four paradigms outlined above. The first paradigm was based strictly on empirical approaches and included, for example, the Basquin4 and Coffin-Manson5,6 relations, which describe the stress-life (S-N) and strain-life (ε-N) responses, respectively, as well as the Paris law,7,8 which relates crack-growth rate to the cyclic stress-intensity factor range. Such approaches are limited in their predictive power and provide little insight into the mechanisms governing cyclic deformation, crack nucleation, and crack propagation. The second paradigm was marked by the derivation of theoretical fatigue models, including, for example, closed-form energy-based models relating crack initiation to persistent slip bands.9 Such theoretical models provide more mechanistic insight but remain limited in their range of applicability. The third paradigm has been characterized by the development of computational models, such as dislocation dynamics and crystal plasticity simulations, to replicate or predict fatigue behavior. This paradigm has provided much higher-fidelity representations and improved understanding of fatigue phenomena. However, fully predicting the evolution of fatigue failure while accounting for microstructure-dependent scatter in the earliest stages of fatigue-crack evolution (including incubation, nucleation, and microstructurally small crack propagation) remains a challenging and unresolved task. The task is further complicated by the amount and type of data typically produced by current computational simulations and advanced experimental measurements, which can be unwieldy to manage, process, and analyze.
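For reference, the three empirical relations named above are commonly written as shown below. The notation is a standard textbook convention rather than one taken from the cited works: Δσ/2 is the stress amplitude, Δε_p/2 the plastic strain amplitude, 2N_f the number of load reversals to failure, σ'_f and b the fatigue strength coefficient and exponent, ε'_f and c the fatigue ductility coefficient and exponent, a the crack length, N the number of cycles, ΔK the stress-intensity factor range, and C and m empirically fitted constants.

```latex
\begin{aligned}
\text{Basquin:}\quad        & \frac{\Delta\sigma}{2} = \sigma_f'\,(2N_f)^{b} \\
\text{Coffin--Manson:}\quad & \frac{\Delta\varepsilon_p}{2} = \varepsilon_f'\,(2N_f)^{c} \\
\text{Paris:}\quad          & \frac{da}{dN} = C\,(\Delta K)^{m}
\end{aligned}
```

Each is a power law, so its parameters are usually fitted by linear regression in log-log coordinates.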

This JOM special topic focuses specifically on the emerging fourth paradigm as it pertains to the study and prediction of fatigue behavior. With this introductory article and the papers that follow, we hope to provide a platform that highlights both the challenges and the opportunities associated with analyzing large and/or complex data sets, and how researchers in the fatigue community are using data-driven approaches to address these challenges. In the following subsections, we describe the emerging paradigm of data-driven science in more detail and present cases where it has facilitated materials discovery, enhanced or accelerated materials characterization, and led to new mechanistic understanding. We then describe opportunities for leveraging data-driven approaches in the fatigue community and close by offering recommendations for moving forward.

Paradigm of Data-Driven Materials Science

In general, data-driven approaches aim to identify objectively (relying largely on the available data) the embedded correlations among the selected inputs and outputs needed to study or model a given phenomenon. Data-driven approaches are particularly advantageous when a full, physics-based understanding of the relevant phenomena is lacking, as well as when gathering new experimental data is particularly slow and costly.10 The relatively recent fourth paradigm of data-driven materials science has been, or is being, realized in many areas of materials science. Efforts in this paradigm focus mainly on extracting high-value information from all available materials data (generated by either experiments or computations) and expressing it as high-value linkages among material processes, structures, and properties. Such process-structure-property linkages are especially well suited to practical materials-design explorations. The central impediment to this effort is the lack of a rigorous mathematical framework for quantifying the material structure, whose salient features span multiple length scales (from the atomistic to the macroscale). The very large number of parameters needed to capture all details of the hierarchical material structure makes the structure representation inherently high dimensional. From a practical viewpoint, however, it is essential to identify high-value, low-dimensional representations of the material structure that can be employed reliably to drive materials innovation toward enhanced properties. One type of data-driven method that has already made inroads across many subfields of materials research is machine learning (ML).11,12 Below, we briefly discuss well-established use cases in which ML has proven fruitful in materials science.
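As a minimal illustration of reducing a high-dimensional structure representation to a few components, the sketch below uses principal component analysis (PCA), one common choice that the text does not itself prescribe. It assumes that each microstructure has already been summarized as a fixed-length feature vector (for example, spatial statistics of the phase or grain arrangement); the function names and the example features are hypothetical. The leading principal direction is found by power iteration in plain Python so that the example has no dependencies; in practice the features would usually be standardized first and a numerical library would be used.

```python
import math


def center(rows):
    """Subtract the column means from each feature vector."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered feature vectors."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_component(rows, n_iter=200):
    """Return (direction, explained_variance_ratio, scores) for the first PC.

    The direction is the dominant eigenvector of the covariance matrix, found
    by power iteration; scores are each sample's 1D coordinate along it.
    """
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    v = [1.0] * d  # starting guess, assumed not orthogonal to the leading eigenvector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along the direction.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, eigenvalue / total_variance, scores


# Hypothetical 3-feature descriptors for four microstructures.
features = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 5.9, 0.6], [4.0, 8.0, 0.5]]
direction, ratio, scores = leading_component(features)
print(direction, ratio, scores)
```

The explained-variance ratio indicates how much of the original variability a single coordinate retains; further components would be obtained by deflating the covariance matrix and repeating the iteration.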

Materials Discovery

ML-based models can aid materials discovery by predicting materials properties of interest. The models may be trained directly on experimental data13,14 or act as very fast surrogates for more expensive physics-based simulations such as density functional theory (DFT) calculations.15 ML has successfully guided the experimental discovery of novel Heusler alloys,16 Ni-based superalloys,17 and shape-memory alloys,18 among many other examples.

Data-driven experimental planning, sometimes called sequential learning or active learning, has emerged as a key application for ML in materials science. In such an approach, we cast materials discovery as a fundamentally iterative process and address the question of optimally planning a series of experiments to deliver high-performing materials. ML-driven experiments tend to identify promising materials much more efficiently than a naive search.19 Several implementations of ML-optimized experimental design have emerged, including COMBO,20 FUELS,19 and work from Xue et al.18
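The iterative loop described above can be sketched as follows. This is a deliberately simple stand-in, not COMBO, FUELS, or the method of Xue et al.: the surrogate is a one-nearest-neighbour predictor, the uncertainty proxy is the distance to the nearest measured candidate, and the next experiment is the unmeasured candidate that maximizes an optimistic, upper-confidence-bound style score. The candidate grid, the `measure` callback standing in for an experiment, and the exploration weight `kappa` are all hypothetical.

```python
def nearest_measured(x, measured):
    """Closest already-measured candidate to x; ties go to the smaller candidate."""
    return min(measured, key=lambda m: (abs(x - m), m))


def acquisition(x, measured, kappa):
    """Optimistic score: 1-NN prediction plus kappa times the distance to that neighbour."""
    nn = nearest_measured(x, measured)
    return measured[nn] + kappa * abs(x - nn)


def sequential_learning(candidates, measure, initial, n_iterations, kappa):
    """Run a fixed budget of 'experiments' and return {candidate: measured value}."""
    measured = {x: measure(x) for x in initial}
    for _ in range(n_iterations):
        pool = [x for x in sorted(candidates) if x not in measured]
        if not pool:
            break
        # max() keeps the first maximum, so ties go to the smallest candidate.
        x_next = max(pool, key=lambda x: acquisition(x, measured, kappa))
        measured[x_next] = measure(x_next)  # stands in for running an experiment
    return measured


# Toy problem: 21 hypothetical compositions indexed 0..20, property peaked at 14.
results = sequential_learning(
    candidates=range(21),
    measure=lambda x: -(x - 14) ** 2,
    initial=[0, 10, 20],
    n_iterations=6,
    kappa=10,
)
best = max(results, key=results.get)
```

On this toy problem the loop reaches the peak at index 14 after six additional measurements (nine in total), whereas an exhaustive scan would measure all 21 candidates. The weight `kappa` sets the balance between exploiting good predictions and exploring candidates far from any measurement.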

Enhanced or Accelerated Characterization

Given the abundance of spectroscopic and image-based materials characterization techniques, researchers in materials informatics have found it fruitful to apply state-of-the-art computer vision techniques21 to materials problems. DeCost and Holm22 used convolutional neural networks (CNNs) to predict properties of materials samples from their microstructures. Ziatdinov et al.23 used deep learning to identify defects automatically in atomic-scale microscopy data. Xu and LeBeau24 employed CNNs to dramatically accelerate the analysis of electron diffraction patterns. Across these applications, the primary benefits of applying ML to characterization data include (1) extracting more scientific insight from each experiment and/or (2) processing experimental data much more efficiently, with less time-intensive human input.

New Mechanistic Understanding

The materials informatics community has begun to recognize the scientific limitations of black-box ML models and is increasingly exploring questions of model interpretability. Ideally, ML would help scientists generate improved hypotheses and identify the underlying mechanisms of materials phenomena. Recent work has addressed the physical interpretability of ML in the context of micrograph classification,25 graph representations of crystal structure,26 and molecular structures of promising organic solar materials.27 Learning physics with the assistance of data-driven methods is a rapidly emerging area of research that is likely to increase interest in these methods substantially, because it can reduce the dissonance between the traditional scientific method, which seeks greater understanding, and the perceived black-box nature of ML.

A Role for Data-Driven Science in the Fatigue Community

The fatigue community has already made progress within this fourth paradigm,28,29,30,31 and many opportunities exist to continue to support or leverage data-driven approaches in the pursuit of better understanding and prediction of fatigue behavior. For example, image-processing algorithms are needed to mine and parameterize, automatically and robustly, the full-field data derived from state-of-the-art experiments or simulations. Opportunities also exist to develop new methods of integrating rich experimental data sets with high-fidelity modeling to enable correlative studies or prediction of fatigue behavior. Similarly, data-driven algorithms are needed to link length and time scales, to correlate multimodal data sets, to identify patterns systematically, and ultimately to predict fatigue behavior. Such algorithms might be applications of ML, which could either enable discovery of fatigue mechanisms or serve as surrogate models that represent the physical mechanisms of fatigue. In the context of fatigue models, data-driven approaches can play an important role in addressing several problems. Specific examples include (1) rank-ordering the combinations of materials parameters (related to chemistry, microstructure, and processing history) that correlate most strongly with fatigue life and (2) rank-ordering alternate specifications of the governing physics (including model forms and associated model parameters) according to how well the available experimental observations support them. Problems of this type are best addressed by data-driven approaches because these approaches can account rigorously for the uncertainty associated with the data sets (which can include both experimental and modeling data).
In other words, these methods recognize that deterministic answers to the problems identified above cannot be obtained within the constraints of the limited available data, but they can provide answers within a probabilistic framework that is far more useful in practice. The potential value of employing data-driven approaches to address challenging problems in fatigue is therefore clear. However, much work remains to learn how best to employ existing toolsets, with any modifications needed, so that they can be applied efficiently.
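The second kind of problem, ranking alternate model forms by how well the data support them, can be illustrated with a minimal probabilistic sketch. Assuming Gaussian scatter in log life and equal prior probabilities for the models, it fits two candidate S-N model forms by least squares (a Basquin-type power law, linear in log stress, and a semi-log form, linear in stress) and converts their Bayesian information criterion (BIC) values into approximate posterior model probabilities. The BIC approximation, the two model forms, and the synthetic data are illustrative assumptions, not a method taken from the papers in this issue.

```python
import math


def fit_line(x, y):
    """Ordinary least squares y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def bic(rss, n, k):
    """BIC for a Gaussian-error regression, using the maximum-likelihood variance."""
    sigma2 = max(rss / n, 1e-300)  # guard against a perfect fit
    log_likelihood = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return k * math.log(n) - 2 * log_likelihood


def model_probabilities(stress, log_life, transforms):
    """Approximate p(model | data) proportional to exp(-BIC / 2), assuming equal priors."""
    n = len(stress)
    scores = {}
    for name, transform in transforms.items():
        _, _, rss = fit_line([transform(s) for s in stress], log_life)
        scores[name] = bic(rss, n, k=3)  # intercept, slope, noise variance
    best = min(scores.values())
    weights = {name: math.exp(-0.5 * (score - best)) for name, score in scores.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Synthetic stress amplitudes (MPa) and log10 lives from a power law plus small scatter.
stress = [200, 250, 300, 350, 400, 450, 500]
scatter = [0.02, -0.01, 0.015, -0.02, 0.01, -0.015, 0.005]
log_life = [30.0 - 10.0 * math.log10(s) + e for s, e in zip(stress, scatter)]

probabilities = model_probabilities(
    stress,
    log_life,
    {"power law (Basquin-type)": math.log10, "semi-log": lambda s: s},
)
```

With these synthetic data the power-law form receives essentially all of the posterior probability, because a straight line in stress cannot follow the curvature of log life versus stress. Noisier or sparser data would generally yield less lopsided probabilities, and reporting them, rather than a single "best" model, is the point of the probabilistic framing.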

The following papers presented in this issue of JOM provide cutting-edge examples of how researchers are currently pursuing data-driven approaches in fatigue applications. To download any of the papers, follow the URL (https://link.springer.com/journal/11837/70/7) to the table of contents for the July 2018 issue (vol. 70, no. 7).

  • “A data-analytics approach for discovering unique microstructural configurations susceptible to fatigue” by S. K. Jha, R. A. Brockman, R. M. Hoffman, V. Sinha, A. L. Pilchak, W. J. Porter, D. J. Buchanan, J. M. Larsen, and R. John.

  • “Data-driven mechanistic modeling of microstructural influence on the high cycle fatigue life of nickel titanium” by O. L. Kafka, C. Yu, M. Shakoor, Z. Liu, G. J. Wagner, and W. K. Liu.

  • “Data-driven correlation analysis between observed 3D fatigue-crack path and computed fields from high-fidelity, crystal-plasticity, finite-element simulations” by K. D. Pierson, J. D. Hochhalter, and A. D. Spear.

  • “Data-science analysis of the macro-scale features governing the corrosion to crack transition in AA7050-T7451” by N. E. C. Co, D. E. Brown, and J. T. Burns.

  • “Visualization and quantitative analysis of crack-tip plastic zone in pure nickel” by R. D. Kelton, J. Fathi, E. Meletis, and H. Huang.

  • “Fatigue damage assessment leveraging nondestructive evaluation data” by K. Mazur, B. J. Wisner, and A. Kontsos.

Recommendations to Foster Data-Driven Fatigue Modeling

In closing, we offer three concrete recommendations to facilitate the adoption and accountability of data-driven methodologies in the fatigue community. First, the community should create several publicly accessible benchmark data sets that can be used to test and directly compare models derived from data-driven approaches. One example of such a published materials data set is the UltraHigh Carbon Steel Micrograph DataBase (UHCSDB).32 Second, the community should organize blind-prediction competitions to assess data-driven fatigue models, analogous to the Sandia Fracture Challenge33,34 or the long-running blind tests of organic crystal structure prediction.35 Finally, the community should adopt data platforms such as Citrination,36 Materials Commons,37 and Materials Data Facility,38 because these systems greatly facilitate data and model sharing, reproducibility of results, and reuse of code.