1 Introduction

Fracture of structural metals has been a pervasive engineering concern, dating back to the origins of metallurgy itself. There are numerous examples where structural metal failure has altered the course of human history, including the catastrophic failure of Liberty ships in World War II and the failure of tin coat buttons, which some believe halted the advance of Napoleon’s army into Russia in 1812. Modern engineering design against structural fracture is historically attributed to contributions by C. E. Inglis in the 1910s (Inglis 1913), A. A. Griffith in the 1920s (Griffith 1921) and G. R. Irwin in the 1950s (Irwin 1958). Today, most engineering classes on failure of structural materials focus on concepts around linear elastic fracture mechanics (LEFM) (Williams 1957) and elastoplastic fracture mechanics (EPFM) (Rice and Rosengren 1968; Rice 1967). However, courses and textbooks in fracture may foster the misconception that all fracture scenarios are predictable and can be prevented using LEFM and EPFM tools. This is not the case. There are many realistic engineering circumstances where the fracture community’s collective knowledge base can only provide “ball-park” estimates for the critical conditions that cause fracture. The purpose of the Sandia Fracture Challenge was to assess the fracture community’s current capabilities for predicting failure of a ductile structural metal. In this assessment, 13 computational teams representing academic, industry, and research labs reported blind predictions for a tearing scenario. While round-robin style computational assessments of ductile fracture have been performed previously, e.g. Bernauer and Brocks (2002), some important features of the present study were that (1) the test geometry was heretofore unknown and significantly distinct from most existing test geometries, (2) the modeling teams all reported predictions that were blind to each other’s predictions and to the experimental outcome, (3) the teams were not given any instructions about what modeling approach was to be used, (4) the details provided regarding the test geometry and material property data were commensurate with information that may be available in a typical ‘real-world’ engineering scenario, and (5) the teams were given the opportunity to bound their predictions, but were not instructed as to how to do so.

While many of the basic concepts in fracture are now over 50 years old, there has been a continued effort in the development of innovative methods to predict fracture behavior, especially in the numerical methodologies for predicting fracture in complex geometries, loading, and boundary conditions. Meshless computational methods, automated adaptive remeshing algorithms, microstructurally-informed multiscale models, and enriched/extended finite elements are just a few of the recent advances that have been applied to resolve longstanding issues in the computational prediction of fracture. Despite these advances, the evaluation of the true predictive ability of computational methods is lacking. In the early development of a modeling approach, developers usually test the method against certain standards and known cases. However, to evaluate a method’s true predictive ability it is necessary to probe the method beyond the investigator’s knowledge into problems whose outcome is unknown a priori. The approach taken in this work was to invent a never-seen-before scenario and collect blind predictions made without foreknowledge of the experimentally observed outcome. The scenario was the prediction of the crack initiation and propagation of a ductile structural stainless steel (15-5 PH) under quasi-static room temperature test conditions in a test specimen that possessed modest geometric simplicity, but challenging fracture conditions. The specimen geometry chosen for this study had never been studied before, either experimentally or computationally, but possessed some important similarities to a previous scenario involving many non-uniformly arranged interacting holes (Al-Ostaz and Jasiuk 1997; Li et al. 2000). 
The geometry was mechanically challenging because (1) it contained multiple holes that could potentially deflect the crack and influence the crack-tip stress state, (2) it did not contain a pre-existing sharp crack, (3) it was of a thickness somewhere between plane stress dominance and plane strain dominance, and (4) there was a competition between a tensile-dominated and shear-dominated failure mode. There was also limited standard experimental data provided on which to calibrate material model parameters. Tensile test data and sharp crack Mode-I fracture data were provided, as well as details of the material and even some limited microstructural information. Engineering drawings for all specimens were provided along with nominal tolerances. The experimental and computational results were presented at a special symposium at the ASME 2012 International Mechanical Engineering Congress and Exposition (IMECE) in Houston, TX on November 9-15, 2012. Another meeting was held in Albuquerque, NM on June 18-19, 2013 in order to coordinate the writing of this manuscript.

The outline of this article is given as follows. Section 2 is a review of the 2012 Sandia Fracture Challenge along with a detailed description of the problem. Test setup and results from three testing labs are given in Sect. 3. A brief summary of numerical methods provided by each of the thirteen (13) teams is given in Sect. 4 followed by a comparison of their predictions with the test data in Sect. 5. Finally, in Sect. 6, discussion and assessment of discrepancy between predictions and experiments are provided followed by a summary of the existing technology gap and future research and development efforts needed to enhance the fidelity of our modeling methodologies in ductile fracture. The “Appendix” contains short descriptions of the methods and blind prediction results of each team that participated in the Challenge. Some of the teams have presented a more complete description of their modeling efforts in articles that are included in this special volume of the International Journal of Fracture.

2 The Challenge

2.1 Concept for a challenge scenario

In recent years, Sandia National Laboratories has conducted a series of double-blind assessments of computational predictions in the area of ductile failure of structural alloys (Boyce et al. 2011). Based on these past efforts, it was clear that the double-blind evaluation methodology should be governed by some common constraints. First, this ‘toy problem’ or ‘puzzle’ should have no obvious or closed-form solution. It should be sufficiently distinct from other standard or known test geometries so that the outcome of the exercise is unknown to the participants. The scenario should be readily confirmed through experiments. This implies that the sample geometry is readily manufactured with easily measured geometric features. The manufacturing process should avoid unintentional complications such as significant residual stresses or non-negligible surface damage. The quantities of interest, such as forces and displacements, should be readily measurable with common instrumentation so that the tests can be repeated in numerous labs in a cost effective manner. The experiment should involve simple, uniaxial loading conditions that are readily tested with common lab-scale load frames and common grips. The sample and loading conditions should avoid unwanted modes of deformation such as buckling. Finally, it may be desirable for the challenge scenario to result in a single unambiguous repeatable experimental outcome, or as is the case for the present work, the scenario could be near a juncture of two competing outcomes. Since the challenge scenario involves a novel test geometry, the repeatability of the behavior may not be apparent until after significant experimental effort. In the present work and similar, prior efforts at Sandia, the experiments were not performed until after the computational challenge had been issued. This approach ensured that all participants (including the experimentalists) were not biased by any prior knowledge of the outcome.

2.2 The 2012 Sandia Fracture Challenge scenario

The fracture challenge was advertised to potentially interested parties through a mechanics weblog site, imechanica.org, and through an e-mail solicitation to many known researchers in the fracture community. The challenge was issued via these same electronic formats on May 15, 2012, with final predictions due on September 15, 2012, four months after issuance. The initial packet of information contained material processing and test data on mechanical properties, the test specimen geometry, the loading conditions, and instructions on how to report the predictions. The degree of detail provided was intended to be commensurate with the level of detail that is typically available in real engineering scenarios in industry. These details regarding the material, test geometry/loading conditions, and quantities of interest are described in the following three subsections.

2.2.1 Material

The alloy of interest was 15-5 PH, a precipitation hardened martensitic stainless steel. This alloy was chosen because it provided a useful representation of a moderately ductile structural alloy that would likely be unfamiliar to the participants. All test specimens were extracted from a single plate purchased from AK Steel (West Chester, Ohio) with a nominal thickness of 3.18 mm. The actual measured thickness was 3.124 mm. The original material certification was provided to the participants, and included the following chemical analysis (in wt%): C 0.04, Mn 0.48, P 0.019, S 0.0005, Si 0.40, Cr 15.21, Ni 4.19, Mo 0.12, Cu 3.39, Nb 0.32, Ta 0.001.

The plate was heat treated at Sandia National Labs with the intention of producing the H1100 heat treatment condition. Detailed furnace thermocouple records were provided to the participants, showing that the plate was heat treated at \(593\,^{\circ }\hbox {C}\) (\(1{,}100\,^{\circ }\hbox {F}\)) for 4 h followed by an inert gas flow cooling rate similar to that of a typical air cool. A detailed machining diagram was provided showing the location and orientation of the challenge test specimens as well as the tensile, compact tension C(T), and metallurgical witness coupons, as shown in Fig. 1.

Fig. 1 Layout of challenge specimens as well as tensile coupons, C(T) specimens, and metallurgical witness coupons

The participants were also given detailed metallographic analysis of the microstructure of the martensitic stainless steel, provided by Drs. Yuxiong Mao and Mark Horstemeyer of Mississippi State University. These images show the equiaxed grain shape and 5–20 \(\upmu \hbox {m}\) grain size, occasional inclusions and longitudinal segregation/banding. Examples are shown in Fig. 2.

Fig. 2 Examples of (upper) the polished surface showing inclusions indicated by circles, and (lower) an etched surface showing grain structure and banding. Both examples are taken along the longitudinal-short orientation to emphasize the features associated with rolling. Microstructural analysis was provided on all three planes, courtesy of Drs. Yuxiong Mao and Mark Horstemeyer of Mississippi State University

Four tensile coupons were tested, two oriented along the rolling direction and two oriented along the transverse-to-rolling plate direction. All tests were conducted according to ASTM E8 using the nominal geometry shown in Fig. 3. Strain was measured using an extensometer with a 25.4 mm gage length. Engineering stress–strain curves were provided as shown in Fig. 4, as well as the raw force-displacement data for each tensile test. The observed strength values were \(\sim \)8 % higher than is typically reported for the H1100 condition, and were more consistent with an H1075 condition. This discrepancy was noted to the participants. Images of the fracture surface morphology shown in Fig. 5 were also provided to the participants.
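As an illustration of how engineering stress–strain data of this kind are commonly prepared for material model calibration, the following minimal sketch converts engineering values to true (logarithmic) strain and true stress using the standard relations \(\varepsilon _\mathrm{true} = \ln (1+\varepsilon _\mathrm{eng})\) and \(\sigma _\mathrm{true} = \sigma _\mathrm{eng}(1+\varepsilon _\mathrm{eng})\). These relations assume uniform, volume-conserving deformation and therefore apply only up to the onset of necking, taken here as the maximum engineering stress. The function names and the sample values are hypothetical; the sketch does not represent the procedure of any particular team, and the numbers are not the measured data.

```python
import math


def engineering_to_true(eng_strain, eng_stress):
    """Convert one engineering (strain, stress) point to true values.

    Assumes uniform, volume-conserving deformation, so the result is only
    meaningful up to the onset of necking. Strain is a fraction, not percent.
    """
    if eng_strain <= -1.0:
        raise ValueError("engineering strain must be greater than -1")
    true_strain = math.log1p(eng_strain)           # ln(1 + e)
    true_stress = eng_stress * (1.0 + eng_strain)  # s * (1 + e)
    return true_strain, true_stress


def uniform_portion(eng_strains, eng_stresses):
    """Return the points up to and including the maximum engineering stress."""
    peak_index = max(range(len(eng_stresses)), key=eng_stresses.__getitem__)
    return eng_strains[: peak_index + 1], eng_stresses[: peak_index + 1]


# Hypothetical curve (strain as a fraction, stress in MPa); NOT the measured data.
strains = [0.00, 0.01, 0.03, 0.05, 0.08]
stresses = [0.0, 1000.0, 1080.0, 1100.0, 1050.0]

for e, s in zip(*uniform_portion(strains, stresses)):
    print(engineering_to_true(e, s))
```

Beyond the peak engineering stress the deformation localizes in the neck, so post-necking hardening would have to be obtained by other means, for example inverse fitting of a simulated tensile test to the measured force-displacement data.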

Fig. 3 Tensile bar geometry used to provide stress–strain data for model calibration. Dimensions are in millimeters. Actual plate thickness was 3.124 mm

Fig. 4 Engineering stress–strain curves for four tensile coupons. Longitudinal 1 and Longitudinal 2 refer to those oriented along the rolling direction and Transverse 1 and Transverse 2 refer to those oriented along the transverse-to-rolling direction

Fig. 5 Images of (a) the fracture morphology and (b) the geometry of necking for the Longitudinal 1 tensile sample

Three fracture toughness tests were performed on C(T) specimens (Fig. 6) extracted from the same plate of material used for the challenge tests and the tensile bars; due to insufficient plate thickness, these measurements were not performed under plane strain conditions. Force measurements were made with a load cell and load line displacement measurements were made with a crack opening displacement (COD) gauge inserted on the knife-edge features in the mouth of the C(T) specimens. The load cell capacity was 22.2 kN and the COD gauge had a range of 5.08 mm. The as-machined normalized notch length, taken as the ratio of notch length, a, to specimen width, W, was \(a/W = 0.5\). The specimens were fatigue precracked at a load ratio of \(R=P_\mathrm{min} / P_\mathrm{max}=0.1\) to a typical precrack length of \(a/W \approx 0.6\), with actual measured fatigue precrack lengths reported for each specimen. The observed force versus COD measurements are shown in Fig. 7. This type of data, while not valid for the determination of plane strain toughness, could be used to calibrate model parameters for tearing. The decision on if or how to use all of the material property data was left to the individual participants.

Fig. 6 Specimen geometry for C(T) specimens. Dimensions are in millimeters. Actual plate thickness was 3.124 mm

Fig. 7 Force versus COD for C(T) tests

2.2.2 Fracture challenge geometry and loading condition

The Fracture Challenge specimen geometry is shown in Fig. 8 with detailed dimensions shown in Fig. 9. The specimen features a blunt notch, labeled ‘A’, with a diameter of 2.54 mm and three holes, labeled ‘B’, ‘C’, and ‘D’. Holes ‘B’ and ‘C’ are of equal diameter (1.78 mm), while hole ‘D’ has a larger diameter (3.05 mm). The holes are located approximately one plate thickness away from the tip of the blunt notch, with the goal of generating three separate potential localization paths.

Fig. 8 Fracture challenge specimen geometry: (a) photograph displaying critical features and (b) isometric view

Fig. 9 Dimensions of fracture challenge specimen geometry in millimeters. The engineering drawings included a machining tolerance of \(\pm \)0.05 mm on all dimensions. Actual plate thickness was 3.124 mm

Two pin holes were machined well away from the notch tip for insertion of loading pins. These pin holes provided for standard clevis grip loading in either a screw or hydraulic uniaxial load frame. The participants were instructed that the sample would be loaded at a loading rate of 0.0127 mm/s. No other details regarding the boundary conditions were provided. It is important to note that the primary test lab, Sandia’s Structural Mechanics Laboratory, was provided this same level of detail regarding how the tests should be performed. No additional constraints were placed on the test lab’s decision of how to apply boundary conditions. Any undeclared aspects of loading that were salient to the outcome were considered as sources of potential uncertainty. This limited definition of the boundary conditions bears similarity to real-world engineering problems, where the detailed boundary conditions are rarely well defined.

2.2.3 Quantities of interest

A set of quantitative questions was posed to the participants to facilitate comparison of the analyses with the experimental results. These questions were meant to evaluate the robustness of the analysis technique in predicting specimen fracture behavior. All challenge participants were issued the following three questions:

  • (Q1) What is the force and COD displacement at which a crack first initiates?

  • (Q2) The starter notch, A, holes B–D, and the backside edge, E are labeled in the drawing. What is the path of crack propagation? i.e. a crack that initiated on the backside and propagated to hole D and then to notch A would be labeled “E–D–A”.

  • (Q3) If the crack does propagate to either holes B, C, or D, at what force and COD displacement does the crack re-initiate out of the first hole?

The crack opening displacement measurement was defined for the participants in the following way: “A Crack opening displacement (COD) gage will be used to monitor load-line displacement at the point of the ‘knife-edge’ features, akin to fracture toughness testing. Only \(\Delta \) COD will be measured (the test will begin with COD\(\,{=}\,\)0 mm)”. Also, the condition of crack initiation was defined for the participants: “For the purposes of this challenge, crack initiation will be defined as a crack \(\ge 100\,\upmu \hbox {m}\) on the sidewall surface of the sample, so as to be witnessed by in-situ microscope”.

All participants were also asked to report their entire predicted force-COD displacement curve. Ultimately, the comparison of this force-displacement curve between experiments and the model predictions was the most instructive quantity of interest.

3 Experimental method and results

A series of experiments were performed to observe the natural failure process for the challenge. Ideally, the experiments would provide an unambiguous, repeatable observation of failure. However, materials are rarely homogeneous, machined geometries always have dimensional variability, boundary conditions rarely mimic our idealized conceptions, and the intrinsic fracture process can be stochastic/chaotic. For these reasons, there is a need to repeat the experimental observation several times. It is also beneficial to repeat the experiments in multiple independent test labs to show the variation of results from one experimental setup to another. In the present work, Sandia’s Structural Mechanics Laboratory was chosen as the primary test lab to perform ten detailed repetitions of nominally identical tests. Two other labs performed a smaller set of experiments, intended to confirm the primary results, or reveal lab-to-lab variation: Sandia’s Materials Mechanics Laboratory and the laboratory of Prof. Ravi-Chandar at the University of Texas at Austin. All three labs utilized specimens machined in one batch from the same plate of material. The remainder of the experimental section contains details from the experiments for each of these three labs, with an emphasis on the core set of ten observations from the Sandia Structural Mechanics Laboratory.

3.1 Observations from the Sandia Structural Mechanics Lab

3.1.1 Test setup and methodology

Fabrication of all specimens occurred from the same lot of material by the same machine shop. In anticipation of the potential influence of small variations in the specimen dimensions on the failure, many measurements were taken of the specimens tested in both the Structural Mechanics Laboratory (specimens D1, D2, and S1–S8) and the UT-Austin laboratory (specimens S9–S11) prior to testing. Figure 10 identifies the locations of each measurement.

Fig. 10 Measurement locations (length measurements shown in orange and thickness measurements shown in blue)

The blue circles represent thickness measurements taken using a 0–6.35 mm QuantuMike micrometer with a resolution of 1.27 \(\upmu \)m and an accuracy of \(\pm \)1.27 \(\upmu \)m. The measurement surfaces of the micrometer were circular thus spanning a larger measurement area compared to a point measurement. Ten thickness measurements were taken of each specimen. The orange lines represent length measurements taken with an optical Wild M3Z stereomicroscope with a 0.254-\(\upmu \)m resolution and an accuracy of \(\pm \)0.508 \(\upmu \)m. Twenty vertical and thirteen horizontal length measurements were taken. A zoomed-in view of the features B, C, and D in Fig. 11 illustrates the diametric measures of these holes.

Fig. 11 Diameter measurements of features B, C, and D

The measured lengths and thicknesses for specimens D1, D2, and S1–S11 are included as Supplementary Information for this article. Dimensional measurements revealed that some of the features were not manufactured within the specified tolerance of \(\pm \)50.8 \(\upmu \)m. Specifically, the ratio of the vertical distance from hole D to the notch to the horizontal distance from hole C to the notch was below tolerance for all specimens except D1, S9, and S10. The potential failure paths appear to be affected by the relative ligament lengths represented by this ratio.
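One way to make the notion of a ratio being “below tolerance” concrete is to propagate the per-dimension tolerance to the ratio of two distances. The minimal sketch below assumes that each distance may independently deviate by the full \(\pm \)50.8 \(\upmu \)m, so that the admissible ratio band runs from (numerator − tol)/(denominator + tol) to (numerator + tol)/(denominator − tol). This is only one possible interpretation and is not stated in the source; the function names and the nominal and measured distances are hypothetical placeholders, not values from the drawings or the Supplementary Information.

```python
def ratio_tolerance_band(nominal_num_mm, nominal_den_mm, tol_mm):
    """Smallest and largest ratio reachable when both distances are within tolerance."""
    lower = (nominal_num_mm - tol_mm) / (nominal_den_mm + tol_mm)
    upper = (nominal_num_mm + tol_mm) / (nominal_den_mm - tol_mm)
    return lower, upper


def classify_ratio(measured_num_mm, measured_den_mm,
                   nominal_num_mm, nominal_den_mm, tol_mm):
    """Label a measured ratio as 'below', 'within', or 'above' the tolerance band."""
    lower, upper = ratio_tolerance_band(nominal_num_mm, nominal_den_mm, tol_mm)
    ratio = measured_num_mm / measured_den_mm
    if ratio < lower:
        return "below"
    if ratio > upper:
        return "above"
    return "within"


TOL_MM = 0.0508  # the +/-50.8 um tolerance quoted in the text

# Hypothetical nominal and measured distances (mm), for illustration only.
print(ratio_tolerance_band(2.0, 2.0, TOL_MM))
print(classify_ratio(1.90, 2.05, 2.0, 2.0, TOL_MM))
```

Under this interpretation a specimen can fall outside the ratio band even when the shortfall in one ligament is modest, provided the other ligament deviates in the opposite direction.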

All tests were performed at ambient temperature on an MTS servo-hydraulic 97.9-kN (22-kip) load frame at a displacement rate of 12.7 \(\upmu \)m/s, controlled by the MTS FlexTest Controller. The test setup consisted of a simple, well-defined uniaxial load imparted on the test specimens. The test was set up to meet the challenge of measuring force and COD at which the crack first initiated, to determine the crack path, and measure the force and COD if a crack reinitiated out of a hole. Crack initiation was defined as a crack 100 \(\upmu \)m in length on the sidewall surface of the specimen, visible by an in-situ microscope. For the test series, the Epsilon Tech Corp. COD gage (Jackson, WY) was situated on the knife-edges of the specimen and began with a reading of 0 mm. A photograph of the actual experimental setup is shown in Fig. 12.

Fig. 12 Experimental test setup in the Structural Mechanics laboratory

Two load cells were connected to the upper, stationary crosshead. One was a 97.9-kN (22-kip) load cell and the second, referred to as an auxiliary load cell, had a rated capacity of 8.9 kN (2 kip). The actuator was located on the lower portion of the frame and moved in a downward direction to apply the required tensile load. The test specimen was attached to two clevis fixtures with round pin holes for metal pins. In turn, the clevises were threaded into the load cell and actuator using threaded adapters. These clevis fixtures were securely mounted to the load train without rotational degrees of freedom. Only the specimens were allowed to rotate through the pin joints. Three displacement measurements were recorded. The first was an internal LVDT monitoring the actuator stroke. The second displacement measurement came from an external \(\pm \)5.08-mm “grip” LVDT positioned between the clevis-pin fixtures, allowing displacement measurements closer to the test article. This LVDT from Macro Sensors (Pennsauken, NJ) was used for control at a rate of 12.7 \(\upmu \)m/s. The grip LVDT was calibrated at time of use with a Boeckeler Digital Micrometer, having \(\pm \)0.508-\(\upmu \)m resolution and repeatability within \(\pm \)0.508 \(\upmu \)m. The third displacement measurement came from the Epsilon COD gage, which was calibrated at the time of use with a Starrett Micrometer. The COD gage measured the displacement change in the notch opening, having \(\pm \)0.508-\(\upmu \)m resolution and repeatability within \(\pm \)8.6 \(\upmu \)m.

Two cameras were used to capture visible cracks on the specimen surface, each with a different field of view. A 5-megapixel Point Grey Research (PGR) Grasshopper camera with a Navitar Zoom 6000 lens was used to view one side of the specimen. This zoom lens had a lens resolution of 102 line pairs per mm (lp/mm), and the pixels/\(\upmu \)m ratio ranged from 0.207 to 0.511. Images for this camera were acquired at an approximate rate of 1 Hz. The second camera was a Canon EOS Rebel T3i Digital Single Lens Reflex (DSLR) with a macro lens focused on the opposite side of the test specimen. This DSLR camera had a lens resolution of 36 lp/mm, and the pixels/\(\upmu \)m ratio ranged from 0.113 to 0.124. Images for this camera were acquired at an approximate rate of 0.25 Hz. These two cameras were situated perpendicular to the surfaces of the specimens; thus, they could not observe any crack initiation on the through-thickness faces of the features. The cameras were both triggered by the MTS FlexTest Controller, and the MTS FlexTest DAQ system simultaneously collected the time, force, grip LVDT displacement, and COD data corresponding to each image.
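To give a sense of what these image scales imply for the 100-\(\upmu \)m crack initiation criterion, the short calculation below multiplies the quoted pixels/\(\upmu \)m ratios by the crack length. It is a simple arithmetic sketch: the function name is hypothetical, and the calculation considers pixel sampling only, ignoring the optical resolution of the lenses, contrast, and lighting.

```python
def pixels_spanned(length_um, pixels_per_um):
    """Number of pixels covering a feature of the given length."""
    return length_um * pixels_per_um


CRACK_LENGTH_UM = 100.0  # crack-initiation threshold defined in the challenge

# (minimum, maximum) pixels/um ratios quoted in the text for each camera.
CAMERA_SCALES = {
    "PGR Grasshopper": (0.207, 0.511),
    "Canon DSLR": (0.113, 0.124),
}

for name, (low, high) in CAMERA_SCALES.items():
    print(f"{name}: {pixels_spanned(CRACK_LENGTH_UM, low):.1f} to "
          f"{pixels_spanned(CRACK_LENGTH_UM, high):.1f} pixels")
```

A 100-\(\upmu \)m crack therefore spans roughly 21 to 51 pixels in the PGR images but only about 11 to 12 pixels in the DSLR images, so a crack at the threshold length occupies a small part of either field of view.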

To situate all parts within the load train, the specimen was exercised in tension within the elastic region between 89 N and 445 N. Although not shown in Fig. 12, dial indicators were positioned in the test setup to measure the lateral displacement of the upper and lower clevises. The dial indicators measured less than 25 \(\upmu \)m of lateral displacement at maximum load.

Ten specimens were tested, each with one of three specific orientations in the grips. The purpose of the different specimen orientations was to assess whether the experimental setup, rather than the specimen geometry and material properties alone, led to a preferential loading path. From the perspective of the PGR zoom camera with the lower MTS actuator moving down, the three orientations were (1) the notch on the right with hole D above (Specimens D1, D2, S1, S2, S3, and S7), (2) the notch on the right with hole B above (Specimens S4, S5, and S6), and (3) the notch on the left with hole D above (Specimen S8).

After testing, the force and displacement data were correlated with the image sequences from the two cameras. While the cameras were supposed to be triggered at periodic intervals (every 1 s for PGR, every 4 s for DSLR), post-test analysis revealed that \(\sim \)2 % of the images had not been captured for each camera, presumably due to ineffective triggering. Embedded image timestamps and file timestamps were used to determine the times of the missing images for all DSLR image sequences and for the PGR camera sequences for specimens D2 and S1–S8. The only image sequence without embedded timestamps or useful file timestamps was for the PGR camera for specimen D1; here, visual cues such as camera motion, lighting changes, and large displacements from elastic recovery due to the load drops associated with crack formation were used to correlate the DSLR and PGR camera images in the vicinity of crack events only. This post-test data-image alignment allowed for the comparison of the load versus COD profile and the visual observations of the surface cracks.
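The idea of locating missing images from timestamps can be sketched as follows. The example scans a nominally periodic sequence of image times and reports each interval that spans more than one trigger period, together with the implied number of missing frames. It is an illustrative sketch only; the actual post-test procedure is not described in further detail, and the function name and sample timestamps are hypothetical.

```python
def find_missing_frames(timestamps_s, period_s):
    """Locate gaps in a nominally periodic sequence of image timestamps.

    Returns a list of (time_before_gap, time_after_gap, frames_missing).
    """
    gaps = []
    for earlier, later in zip(timestamps_s, timestamps_s[1:]):
        # Number of trigger periods elapsed between two consecutive images.
        intervals = round((later - earlier) / period_s)
        if intervals > 1:
            gaps.append((earlier, later, intervals - 1))
    return gaps


# Hypothetical DSLR timestamps (s) with a nominal 4 s trigger period.
dslr_times = [0.0, 4.0, 8.1, 16.0, 20.0]
print(find_missing_frames(dslr_times, 4.0))
```

Once the gaps are known, each captured image can be paired with the DAQ record closest to its timestamp rather than with the record implied by its position in the sequence.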

3.1.2 Test results and observations

Load versus COD profiles

Nine out of the ten specimens tested in the Structural Mechanics Laboratory exhibited a crack path of A–D–C–E, while one specimen, labeled D1, exhibited a different crack path of A–C–E. Figure 13 shows the load versus COD measurements together with post-test images of the ten specimens. The load versus COD curve for D1 has a different profile than the curves for the other nine specimens; specimen D1 had the highest peak load and the most delayed first load drop. The nine specimens with the A–D–C–E crack path had similar peak load values and showed small variations in the load at each crack-related load drop, but significant variation in the COD measurement at the load drops. Specimen D1 broke from A–C directly as opposed to A–D–C for the other specimens, but the overall load drop from A–C, regardless of crack path, is approximately the same for all ten specimens, from around 8.0 to 5.3 kN. The cracks from A–D and D–C occurred in quick succession, with more overall total COD for a similar reduction in load as compared to the A–C crack in specimen D1. All ten specimens had a similar load plateau after the crack propagated from either D–C or A–C. The crack from C–E resulted in similar load versus COD profiles below 5.3 kN for all ten specimens. There was no apparent correlation between crack path and specimen orientation or between load versus COD profile and specimen orientation.

Fig. 13 (a) Load versus COD measurements for the ten specimens tested in the Structural Mechanics laboratory, with (b) associated post-test images

Table 1 includes the peak force of each specimen, as well as the force and COD measurements for the load drops in the load versus COD curves associated with each crack. These load drops corresponded to audible cracking sounds and were defined as a slope in the load versus COD profile of a magnitude greater than 17.5 N/\(\upmu \)m for cracks A–C, A–D and D–C and of a magnitude greater than 4.5 N/\(\upmu \)m for the C–E crack, but did not necessarily correspond to the appearance of a crack on the surface of the specimens. The peak load of the A–C–E crack path specimen was the largest of all the specimens at 8,746 N. The average peak load for the A–D–C–E crack-path specimens was 8,500 N, ranging from 8,427 to 8,627 N. The first crack from A–C for specimen D1 occurred at a load of 8,066 N and a COD of 3.543 mm; the first crack from A–D occurred at an average load of 8,290 N, ranging from 8,127 to 8,416 N, and an average COD measurement of 2.424 mm, ranging from 1.976 to 2.779 mm. The second crack from D–C for nine specimens occurred at an average load of 6,812 N, with a range of 5,589 to 7,359 N, and an average COD measurement of 2.691 mm, with a range of 2.080 to 3.173 mm. The crack between hole C and edge E in specimen D1 occurred at a COD measurement of 5.217 mm, which is close to the average COD measurement for the other nine specimens of 5.330 mm (ranging from 4.853 to 5.768 mm), and at a slightly higher load of 5,128 N as compared to the other nine specimens, which averaged 5,013 N (ranging from 4,962 to 5,091 N). The range of the COD measurement for each crack in the A–D–C–E specimens was large; the range of load for each crack was small for A–D and C–E, but large for D–C.
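The slope-based definition of a load drop can be illustrated with a minimal sketch. The example computes the finite-difference slope between consecutive samples of the load versus COD record and flags samples where the load falls more steeply than a given threshold in N/\(\upmu \)m. Several points are assumptions rather than details from the source: the slope is taken between adjacent samples, “magnitude greater than” is interpreted as a negative slope steeper than the threshold, and intervals in which the COD does not increase are skipped. The function name and the data are hypothetical.

```python
def find_load_drops(cod_mm, force_n, threshold_n_per_um):
    """Flag samples where the load falls more steeply than the threshold.

    Returns a list of (cod_mm, force_n) pairs taken at the start of each
    flagged interval.
    """
    drops = []
    for i in range(1, len(cod_mm)):
        d_cod_um = (cod_mm[i] - cod_mm[i - 1]) * 1000.0  # mm -> um
        d_force_n = force_n[i] - force_n[i - 1]
        if d_cod_um <= 0.0:
            continue  # assumption: slope is undefined here, so skip the interval
        slope = d_force_n / d_cod_um  # N/um
        if slope < -threshold_n_per_um:
            drops.append((cod_mm[i - 1], force_n[i - 1]))
    return drops


# Hypothetical record, NOT measured data: a sharp drop between samples 1 and 2.
cod = [2.40, 2.41, 2.42, 2.43]
force = [8300.0, 8290.0, 7900.0, 7890.0]

print(find_load_drops(cod, force, 17.5))  # threshold quoted for A-C, A-D and D-C
```

The same function could be applied with the 4.5 N/\(\upmu \)m threshold for the C–E crack. In practice the result depends on the sampling interval and on any smoothing applied to the record, neither of which is specified here.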

Table 1 Summary of the peak load, load associated with the load drops, and COD measurements associated with the load drops of the ten specimens tested in the structural mechanics laboratory

Visual observations of the crack paths on the specimen surfaces

One part of the Challenge was the prediction of the load and COD measurements at crack initiation of the first and second cracks, defined as a 100-\(\upmu \)m crack on the surface of the specimen. The intention behind this definition was to provide an unambiguous criterion for crack initiation, not necessarily related to a load drop or unspecified crack length; however, this implicitly assumed that the cracks would initiate and grow as 2D cracks, through the thickness. Unexpectedly, in the experiments, subsurface cracks initiated at the load drop in the load-COD profile, accompanied by an audible cracking noise, but nearly all of these cracks did not appear on the surface of the specimens until the specimen had opened by an additional COD of \(\sim \)0.2–0.35 mm. The cracks usually appeared on the surface between features, not at the feature edges. The camera setup did not allow for imaging of the through-thickness edges, but only the front and back surfaces. Due to the image resolution of the cameras and the often shear-dominated crack paths, the cracks on the surface were not deterministically discernible, often appearing as dark regions on length scales close to 100 \(\upmu \)m and then as a clear crack on larger length scales. Videos of the front and back surface images and corresponding load versus COD profile for the crack path of specimens D1 and S4 are available as Supplementary Information for this article.

Tables 2, 3 and 4 list the force and COD measurements associated with the range of images where cracks greater than 100 \(\upmu \)m on the surface of the specimen were clearly not present to where cracks were plainly visible, including the length of the cracks when they were plainly visible. The tables are separated by the first crack (A–C or A–D), the D–C cracks of nine of the specimens, and the C–E cracks, also listing the load drop data and time, showing that the cracks usually appear on the surface after the load drop. It is important to note that the crack could appear on either surface and did not necessarily appear on both surfaces at the same time, highlighting the three-dimensional and stochastic nature of the crack propagation through the specimen. For D1, the A–C crack on the surface was apparent in the Canon DSLR image immediately following the load drop, though not in the image of the PGR camera after the load drop; hence the large range in Table 2 over which the A–C crack could have appeared on the surface spans the DSLR images around the load drop. For all other nine specimens, the first crack A–D appeared on the surface much later than the load drop. For the D–C and C–E cracks of the A–D–C–E crack path specimens, the appearances of the cracks were after the load drops, and at a smaller force and larger COD measurements than the load drops. For specimen D1, the load drop was within the range of images where the C–E crack may have appeared on the surface. The appearance of the C–E crack in D1 was within the range of force and COD of the C–E crack of the other specimens.

Table 2 Summary of the load and COD measurements associated with the range of images for visual observation of the first surface crack (A–C or A–D) that was greater than \(100\,\upmu \hbox {m}\) for the ten specimens tested in the structural mechanics laboratory
Table 3 Summary of the load and COD measurements associated with the range of images for visual observation of the D–C surface crack that was greater than \(100\,\upmu \hbox {m}\)
Table 4 Summary of the load and COD measurements associated with the range of images for visual observation of the C–E surface crack that was greater than 100 \(\upmu \)m

Figure 14 shows two sequences of images from the DSLR camera and PGR camera for the A–C crack on the back and front of specimen D1, respectively. In this specimen, the interior crack nucleation event at the load drop (8,066 N, 3.542 mm COD, 294.4 s) led to an immediate surface crack on the back, whereas a crack on the front surface was not clear until several seconds later. For both front and back camera views, the crack did not first emerge at the edge of either notch A or hole C, but rather on the surface in between these two features, and then propagated outward towards both features.

Fig. 14

Specimen D1 images for crack A–C of back and front surfaces: a larger field of view of the back surface at t \(=\) 291.0 s, before the load drop, with smaller field of view indicated by dashed white box; b inset image of the back surface, before the load drop with no surface crack; c inset image of the back surface, when an 825-\(\upmu \)m crack is clearly visible; d inset image of the back surface when the back surface crack has fully bridged A–C; e same size inset image of the front surface immediately before the load drop; f inset image of the front surface after the load drop without a front surface crack; g inset image of the front surface when a 735-\(\upmu \)m crack is just discernible; and h inset image of the front surface, when the crack fully bridged A–C

In specimen D1, the second cracking event (path C–E) first appeared on the surface sometime between t = 432.6 and 435.7 s, while the load drop (5,128 N, 5.217 mm COD) occurred before this time range at t\(\,{=}\,\)423.5 s. Hence, the surface crack appeared after the load drop, again indicating that a subsurface crack had initiated and only later propagated to the surface. Similar to the first crack in specimen D1, the crack between C and E appeared on the surface ahead of hole C, not starting at the edges. The crack first propagated towards C on the surface before propagating back towards E. The C–E crack behavior for specimen D1 is typical of all C–E cracks, though the precise timing of the appearance of the crack on the surface relative to the load drop varied, as listed in Table 4.

Figure 15 shows a sequence of images from the PGR camera for the A–D crack in specimen S4, which had a load drop at 8,305 N, 2.497 mm COD, and t\(\,{=}\,\)204.6 s. The visual appearance of the surface crack was more than 27 s and \(\sim \)0.35 mm COD after the main load drop for the subsurface crack. The crack propagated from the area between A and D outwards towards the edges of A and D, along a jagged path. The complete bridging of A–D did not occur until after the second load drop, which was associated with the next subsurface crack between holes D and C and occurred at t\(\,{=}\,\)267.0 s.

Fig. 15

Specimen S4 images for crack A–D: a larger field of view with smaller field of view indicated by dashed white box; b inset image just before load drop; c inset image after load drop without any visible surface crack; d inset image just before the surface crack appears in the dark region in between A and D; e inset image when a 560-\(\upmu \)m crack appeared; and f inset image when the crack fully bridged notch A and hole D

The second surface crack between D and C was also not visually observed until 30 s after the second load drop (5,496 N, 3.292 mm COD, 267.0 s). Also, the crack between D and C appeared on the surface between the holes, not starting at the edges. Crack bridging was evident during propagation between D and C. For the crack from hole C towards the edge of specimen S4 at E, the crack appeared on the back surface at t\(\,{=}\,\)456.4 s after the third load drop (t\(\,{=}\,\)441.0 s), ahead of the edge of C, and then propagated towards C on the surface before propagating back towards E. The crack behavior for specimen S4 was typical of the A–D–C–E crack-path specimens.
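The lag between a load drop and the first visible surface crack can be tabulated directly from the times quoted above. The following is a minimal sketch using only the C–E values stated in the text for specimens D1 and S4; the variable and function names are illustrative, and the full data set is in Table 4.

```python
# Load-drop and first-surface-crack times (s) for the C-E crack, as quoted in the text.
# Each entry: (specimen, t_load_drop, earliest surface time, latest surface time).
observations = [
    ("D1", 423.5, 432.6, 435.7),  # surface crack bracketed by two images
    ("S4", 441.0, 456.4, 456.4),  # a single time is reported
]

def surface_delay(t_drop, t_first, t_last):
    """Return the (min, max) delay in seconds between load drop and surface crack."""
    return (round(t_first - t_drop, 1), round(t_last - t_drop, 1))

delays = {name: surface_delay(t_drop, t_first, t_last)
          for name, t_drop, t_first, t_last in observations}

for name, (shortest, longest) in delays.items():
    print(f"{name}: surface crack {shortest} to {longest} s after the load drop")
```

For these two specimens the surface crack trails the load drop by roughly 9 to 15 s.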

Fracture surfaces of the two crack paths

The fracture surfaces of the two observed crack paths are highly three-dimensional: rather than a uniform flat fracture through the thickness, they show a combination of flat fracture, V-shear fracture, and slant fracture. Figure 16 contains a 3D reconstruction of a set of top-down white-light digital microscope images of the D1 fracture surfaces and surface height profiles of the A–C crack and of the first portion of the C–E crack from a laser scanning microscope. The A–C crack has a flat fracture surface in the middle of that crack; this flat fracture is slightly sloped between notch A and hole C in the overall crack propagation direction. The A–C crack also has shear lips with approximately 40–55\(^{{\circ }}\) slopes in the y–z plane near the surfaces imaged during the tests and in the x–z plane at the edge of notch A and at the edge of hole C. The C–E crack has a triangular flat-fracture region just ahead of hole C, and what appears to be a V-shear fracture on either side of the flat fracture; the two sides of the V-shear fracture are angled at approximately \(45^{{\circ }}\) in the y–z plane relative to the flat fracture. The V-shear fracture steepens slightly to \(55^{{\circ }}\) as the crack grows, and then transitions to a slant fracture further from hole C with an angle of approximately 40–45\(^{{\circ }}\) in the y–z plane. This behavior in the C–E crack is similar in all of the specimens, except specimen S6, which did not have a transition between the V-shear fracture and the slant fracture, but only the flat to V-shear fracture transition. Figure 17 has an angled view of the crack path in specimen S4 and a direct view of the A–D crack in specimen S4. The A–D and D–C cracks are shear-dominated, but they are not uniform through the thickness. These cracks slant towards the front and back surfaces and are jagged through the thickness at the edges of holes D and C. These fracture surfaces are rather different from the A–C crack in specimen D1, which has prominent shear lips far into the thickness, surrounding the flat fracture.

Fig. 16

Fracture surface of specimen D1 with A–C–E crack path from left to right: crack path is three dimensional through the thickness; (top) complete fracture surface of both halves of the specimen, where hole B is located in the upper half and hole D is located in the lower half of this image (image taken by a Keyence VHX-1000 digital microscope with 3D image stitching); (bottom left) A–C crack surface with height profiles (image constructed from a Zeiss LSM700 laser scanning microscope with 5X objective and 0.5 zoom factor); (bottom right) first portion of the C–E crack surface with height profiles (image constructed from the Zeiss LSM700)

Fig. 17

Fracture surface of specimen S4 with A–D–C–E crack path: (larger image) Oblique view of crack path; (inset image) through-thickness view of the A–D crack (image taken by a Keyence VHX-1000 digital microscope with 3D image stitching)

3.2 Confirmation observations from Sandia’s Materials Mechanics Lab

The purpose of a second independent test lab was to confirm the reproducibility of the primary experimental observations from the Structural Mechanics Lab, described in the previous section. For this reason, only three tests were performed, and the focus was on measuring the force-COD response of the challenge specimen to confirm the results of the Structural Mechanics Lab. Each lab was blind to the other lab's measurement approach, to avoid bias in methodology.

The Sandia Materials Mechanics lab utilized a 100-kN MTS servo-hydraulic load frame with standard clevis grips and a 22-kN load cell. The COD gage was a 0 to 5.08-mm displacement gage calibrated against a micrometer-based calibrator at the time of use. The most significant difference between the two labs was that the Materials Mechanics lab utilized a universal joint between the upper grip and the load cell to partially compensate for minor misalignments. A single universal joint was deemed sufficient because of the additional rotational degrees of freedom afforded by the clevis pins. However, the Materials Mechanics lab did not utilize extra LVDTs to monitor in-test rotations as had been used by the Structural Mechanics lab.

The core comparison between the primary results of the Structural Mechanics Lab and the confirmation results of the Materials Mechanics lab is shown in Fig. 18. Note that the Materials Mechanics lab selection of a 5.08-mm range COD gage limited observation of the final stages of crack propagation. The load drop associated with crack initiation out of hole C was not captured due to the limitations of the COD gage used by this lab. Otherwise the two labs demonstrated strikingly comparable results. While 9 of the 10 tests from the Structural Mechanics lab failed along path A–D–C–E, 2 of the 3 tests failed in this same manner in the Materials Mechanics lab. The remaining 2 tests (one from each lab) failed along path A–C–E.

Fig. 18

Comparison of force-displacement curves measured by the two Sandia mechanical testing labs. The Materials Mechanics lab COD data is truncated at 5 mm due to sensor limitations

3.3 Further observations from the University of Texas

The University of Texas volunteered to perform additional tests that were not blind either to the test results from the Sandia Structural Mechanics Laboratory or to the predictions of the teams. This group was motivated by the observation that two different failure paths occurred in the tests, implying non-uniqueness of the results. Because the Sandia Structural Mechanics Lab had used a rigid coupling to connect the specimen to the test frame, it was postulated that loading imperfections might have produced a non-unique response in nominally identical specimens. Therefore, experiments were performed on three additional specimens, S09–S11, at the University of Texas. These samples were obtained from the same sheet as the specimens tested by the two Sandia groups and are therefore nominally the same material, with the same heat-treatment conditions.

The University of Texas experiments utilized a 100-kN Instron electromechanical load frame with a 100-kN load cell. The crosshead rate was maintained at 12.7 \(\upmu \)m/s, the same rate used by the Sandia Structural Mechanics Laboratory. Two universal joints were used, one at the upper grip and one at the lower grip, in order to minimize the effect of loading misalignments. With two joints, the specimen can reorient itself to align with the load, which minimizes loading imperfections. In addition, the clevis holes where the pins connect the specimen to the loading frame were made with a flat portion in order to permit the large pin rotations that arise during the test; this is in accordance with the ASTM guidelines for fracture testing. Instead of using COD gages to measure the displacements of the loading points, a full-field three-dimensional digital image correlation (3D-DIC) method was used to determine the displacements over the entire specimen. Details of the experimental methods, measurement resolution, and results are described by Gross and Ravi-Chandar (2013).

The main comparison between the University of Texas results and the results of the Sandia Structural Mechanics Laboratory is shown in Fig. 19 as a load-COD plot. The COD was determined through post-processing of the 3D-DIC data. The load-COD variation falls within the trends identified by the two Sandia groups. Two of the three samples (S09 and S10) failed along the path A–C–E while the third sample (S11) failed along A–D–C–E. Failure occurred abruptly with two audible ‘pops’ for specimen S11, and with an initial audible ‘pop’ followed by a somewhat more gradual growth of the crack for specimens S09 and S10. It was also noted that in specimen S11, hole A was significantly misaligned with respect to the flat portion of the notch, which made the ligament A–D smaller in this specimen than in the other two. These results suggest that while loading misalignments may be one contributing factor to the crack path selection, geometric imperfections may also play a significant role; these aspects are examined further in Sect. 6.1.3 of the present article, and through additional simulations by Gross and Ravi-Chandar (2013).
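The post-processing step is not described here; the details are in Gross and Ravi-Chandar (2013). As a minimal sketch of one way a COD history could be extracted from full-field data, assume that two points on opposite sides of the notch are tracked in every frame and that the COD is the change in their separation. The function name, the choice of points, and the coordinates below are hypothetical.

```python
import math

def cod_history(upper_points, lower_points):
    """COD per frame: change in distance between two tracked 3D points (mm).

    upper_points, lower_points: sequences of (x, y, z) positions, one per frame,
    for a point above and a point below the notch (an assumed measurement choice).
    """
    separations = [math.dist(u, l) for u, l in zip(upper_points, lower_points)]
    return [s - separations[0] for s in separations]

# Synthetic positions for three frames (illustrative values only).
upper = [(0.0, 5.0, 0.0), (0.0, 5.5, 0.0), (0.0, 6.0, 0.0)]
lower = [(0.0, -5.0, 0.0), (0.0, -5.5, 0.0), (0.0, -6.0, 0.0)]
cod = cod_history(upper, lower)
print(cod)
```

Using the separation of the two points, rather than a single displacement component, makes the result insensitive to rigid-body motion of the specimen.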

Fig. 19

Comparison of the load-crack opening displacement curves measured in the University of Texas tests (red lines) with the data obtained from the Sandia Structural Mechanics Laboratory tests (grey lines). The COD in the UT tests was obtained from 3D DIC measurements rather than clip gages

4 Brief team-by-team synopsis of modeling method

The following is a brief overview of the team-by-team modeling approaches; see “Appendix” for more detailed descriptions of each team’s approach and their respective references. Also, several teams contributed optional companion full-length articles within this Special Issue. The majority of teams used finite element methods with the exception of one team using Peridynamics, another using the Reproducing Kernel Particle Method, and one using the Material Point Method. Most also used fully three-dimensional models for the geometry with the exception of one team which used shell elements. The methods were calibrated with either the uniaxial tension test alone or the combination of uniaxial and compact tension tests. All of the teams used plasticity models with various modifications to capture failure.

Team 1 used a standard von Mises plasticity model for metals with user-prescribed hardening as a function of equivalent plastic strain. In addition to conventional plasticity, this model has an empirical tearing parameter for crack initiation and growth. The model was calibrated based on simulations of the uniaxial tension and compact tension experiments.

Team 2 used a plasticity model with scalar damage. A unique feature of this model is that with dependence on the invariants I1, J2, and J3 this model can distinguish between pressure-dominated and shear-dominated failure. Damage rate depends on plastic strain rate and a reference strain which depends on the three stress invariants. This model was calibrated by simulating the uniaxial tension test only.

Team 3 used Hill’s anisotropy for the plasticity model, with power-law hardening and a modified version of the Johnson-Cook strain-to-failure model. When the material failure criterion of equivalent plastic strain reaching a critical level was met, element stiffness was reduced to zero. Two of the three parameters for this model were calibrated with the tensile and compact tension test data. The final parameter requires a measurement of the failure strain at low triaxiality, and since this was not available it was simply estimated based on past experience.

Team 4 used the Reproducing Kernel Particle Method which is a mesh-free method with displacement enrichments for the crack surface and crack tip. A conventional J2 plasticity model was used and calibrated based on the uniaxial tension experiment. The maximum principal tensile strain is used as the crack initiation and propagation criterion.

Team 5 used plasticity with damage based on a classical Gurson–Tvergaard–Needleman (GTN) fracture model. Failure is modeled based on a void nucleation and growth criterion. This model was calibrated using both the uniaxial and compact tension data.
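For reference, the yield condition usually associated with the GTN model can be written as a short function. The sketch below uses the textbook form of the yield function with commonly quoted constants (q1 = 1.5, q2 = 1.0, q3 = q1 squared); these values, the function name, and the example stresses are illustrative assumptions, not Team 5's calibrated parameters or implementation.

```python
import math

def gtn_yield(sigma_eq, sigma_m, sigma_y, f_star, q1=1.5, q2=1.0, q3=2.25):
    """Textbook GTN yield function.

    sigma_eq: von Mises equivalent stress, sigma_m: mean (hydrostatic) stress,
    sigma_y: matrix flow stress, f_star: effective void volume fraction.
    A value of 0 lies on the yield surface; negative values are elastic.
    """
    return ((sigma_eq / sigma_y) ** 2
            + 2.0 * q1 * f_star * math.cosh(1.5 * q2 * sigma_m / sigma_y)
            - 1.0 - q3 * f_star ** 2)

# With no porosity the function reduces to the von Mises condition.
dense = gtn_yield(300.0, 0.0, 300.0, 0.0)
# With 1 % porosity the same stress state lies outside the yield surface.
porous = gtn_yield(300.0, 0.0, 300.0, 0.01)
# Adding hydrostatic tension pushes the state further outside.
porous_triaxial = gtn_yield(300.0, 300.0, 300.0, 0.01)
print(dense, porous, porous_triaxial)
```

The hyperbolic cosine term is what makes the yield surface sensitive to hydrostatic stress, which a conventional von Mises model is not.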

Team 6 developed a two-scale plasticity model, using Multiresolution Continuum Theory, in which the macro-scale is based on a Gurson type yield surface which is coupled to a modified Fleck–Hutchinson model at the micro-scale. The micro-scale considered both plastic and gradient-plastic mechanisms. An intrinsic length scale captures the inhomogeneous deformation between micro-voids. This model was calibrated based on the tensile test data.

Team 7 took three separate approaches using both Abaqus and FRANC3D software: a damage mechanics approach in Abaqus/Explicit, a cohesive zone approach in Abaqus/Standard with the PPR model, and an explicit geometric crack growth approach in FRANC3D. The given tensile (stress–strain), fracture toughness, and necking data were used to calibrate each model’s requisite material parameters to give three separate predictions of crack growth in the challenge specimen.

Team 8 used the Material Point Method instead of a finite element model. A plasticity model was used combined with the evolution of decohesion based on a discontinuous bifurcation analysis. The model parameters were obtained from simulations of the uniaxial and compact tension experiments.

Team 9 did not use finite elements and instead used Non-local Peridynamic Theory. This method naturally enables crack initiation and growth without an external failure criterion and without remeshing. The yield stretch in the plasticity model is calibrated against the tensile test data, and the critical stretch for material failure is calibrated against the compact tension test.

Team 10 used an extended finite element method (XFEM) for shell elements within the Abaqus framework (XSHELL). A plane strain core approach was developed to capture the stress triaxiality induced by the thickness constraint and its effect on ductile fracture in the vicinity of the crack tip. A mesh-independent kinematic description of crack initiation and propagation is accomplished through element-wise crack insertion with cohesive injection once the accumulated plastic strain in an element reaches a critical value.

Team 11 used a Shear Modified Gurson (SM-G) plasticity model. The model was calibrated with a simulation of the uniaxial tension test and a comparison of the predicted reduction of area on the fracture surface with the experiment.

Team 12 used a von Mises plasticity model with user-prescribed hardening and non-linear elasticity. For one approach, failure was modeled using a cohesive surface model with an exponential potential for mixed mode crack propagation with cohesive surfaces placed along expected crack paths. A second approach used a damage model with damage dependent on the hydrostatic stress.

Team 13 used a von Mises plasticity model with a three-parameter Modified Mohr-Coulomb fracture model. With this failure model the strain to failure is based on stress triaxiality and the normalized Lode angle. Model parameters were calibrated based on the uniaxial tension test only.
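Several of the models above depend on the stress invariants I1, J2, and J3, or on the stress triaxiality and normalized Lode angle derived from them. A minimal sketch of these quantities for a given stress tensor follows, using the standard definitions (triaxiality as mean stress over von Mises stress, and the normalized Lode angle computed from the normalized third invariant). The function name and the example stress states are illustrative, and the sketch assumes a non-zero deviatoric stress.

```python
import math

def stress_measures(sigma):
    """Return (I1, J2, J3, triaxiality, normalized Lode angle) for a 3x3 stress."""
    i1 = sigma[0][0] + sigma[1][1] + sigma[2][2]
    mean = i1 / 3.0
    # Deviatoric stress: subtract the mean stress from the diagonal.
    s = [[sigma[i][j] - (mean if i == j else 0.0) for j in range(3)]
         for i in range(3)]
    j2 = 0.5 * sum(s[i][j] * s[i][j] for i in range(3) for j in range(3))
    # J3 is the determinant of the deviatoric stress.
    j3 = (s[0][0] * (s[1][1] * s[2][2] - s[1][2] * s[2][1])
          - s[0][1] * (s[1][0] * s[2][2] - s[1][2] * s[2][0])
          + s[0][2] * (s[1][0] * s[2][1] - s[1][1] * s[2][0]))
    sigma_eq = math.sqrt(3.0 * j2)  # von Mises equivalent stress
    triaxiality = mean / sigma_eq
    # Normalized third invariant, clipped to [-1, 1] against round-off.
    xi = max(-1.0, min(1.0, 13.5 * j3 / sigma_eq ** 3))
    theta_bar = 1.0 - (2.0 / math.pi) * math.acos(xi)
    return i1, j2, j3, triaxiality, theta_bar

uniaxial = stress_measures([[100.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
shear = stress_measures([[0.0, 50.0, 0.0], [50.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
print(uniaxial[3:], shear[3:])
```

Uniaxial tension gives a triaxiality of one-third and a normalized Lode angle of one, while pure shear gives zero for both, which is why the pair can separate tension-dominated from shear-dominated states.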

5 Comparison of predictions and experiments

5.1 Comparison of scalar quantities of interest and crack path

In real-world engineering scenarios, modeling is often used to predict scalar performance metrics such as the maximum allowable service load that a component can support or how far the component can be deformed before it will form a crack. Motivated by this, the challenge scenario specified certain scalar metrics to be reported. The teams were asked to predict the force and COD when a crack first initiated, and then when a crack later reinitiated from a second feature. The teams were given instructions to report single scalar values for their expected outcome and were also offered the opportunity to bound their predictions with lower and upper limits. This offered teams the possibility of performing uncertainty analyses.

The challenge problem statement specified that a \(100\,\upmu \hbox {m}\) surface crack was the defining characteristic for crack initiation. As described in the Experimental section, the audible crack nucleation event and associated load drop preceded the emergence of a visible surface crack, in some cases by several seconds, suggesting that the crack initiation event occurred entirely subsurface. There was significant quantitative variability in the experimental assessment of the emergence of the visible crack. In hindsight, the load drop would have been a metric that was easier to define and measure. Moreover, the teams' models may not have had the fidelity to distinguish between the nucleation event and the \(100\,\upmu \hbox {m}\) surface crack. For this reason, the experimental results presented in this section include both types of observations.
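To illustrate why the load drop is the easier metric to define, a load drop can be located in a sampled force record with a one-line test. The sketch below is not the procedure used in the experiments; the threshold and the force values are hypothetical and serve only to show the idea.

```python
def find_load_drops(force, min_drop):
    """Indices i where force falls by more than min_drop between samples i-1 and i."""
    return [i for i in range(1, len(force)) if force[i - 1] - force[i] > min_drop]

# Synthetic force record in N (illustrative values, not measured data).
force = [0.0, 4000.0, 8000.0, 8300.0, 6500.0, 6600.0, 6700.0, 5500.0, 5000.0]

# Hypothetical threshold: ignore sample-to-sample decreases of 500 N or less.
drops = find_load_drops(force, min_drop=500.0)
print(drops)
```

In practice the threshold would need to be chosen relative to the sampling rate and the noise in the load cell signal.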

Table 5 provides a numerical comparison of the experimentally observed values from the Sandia Structural Mechanics lab to each of the 13 team predictions. The experimental results include 9 observations of path A–D–C–E and a single result for path A–C–E. The single observation of fracture path A–C–E from this lab occurred for sample D1. Of all manufactured specimens, this particular sample had actual dimensions closest to the nominal dimensions of the challenge geometry, shown in Fig. 9. In fact, only sample D1 was within the requested \(\pm \)50.8 \(\upmu \hbox {m}\) machining tolerance for the placement of holes C and D relative to notch A. Samples S09 and S10, tested in the UT-Austin lab, were out of the specified machining tolerance but had ratios of A–D to A–C ligament lengths closest to the nominal geometry. Samples S09 and S10 also followed crack path A–C–E. The other ten samples followed crack path A–D–C–E. The A–D–C–E crack path selection may be due, at least in part, to geometric deviations from the nominal dimensions. Material variability may also play a role in crack path selection, as well as its obvious role in causing scatter in the forces and displacements required for crack initiation. Based on the current experimental observations, it is not reasonable to eliminate the possibility that some subset of geometries manufactured within machining tolerances may still fail along path A–D–C–E.

Table 5 Comparison of blind predictions to the experimental values observed by the Sandia Structural Mechanics lab

Figure 20 provides a graphical representation of the comparison between computational predictions and the experimentally observed range of force and displacement values. This graph deviates slightly from the numbers reported in Table 5 in that the figure also includes the non-blind experimental data from the UT-Austin lab. The UT-Austin lab provided two additional observations of specimens that failed by the A–C–E crack path. Together, these three observations help to set a more realistic range for the experimental scatter associated with the A–C–E crack path. Because the experiments differed in the crack path selected, it may be more useful to compare predictions for crack path A–D–C–E to the experimental scatter for samples that followed crack path A–D–C–E, and likewise to compare predictions of A–C–E to observations of A–C–E. For this purpose, both the experimental ranges and numerical predictions were color-coded in Fig. 20: red for crack path A–D–C–E and blue for crack path A–C–E.

Fig. 20

Comparison of blind predictions to the experimental range of combined observations from the Sandia Structural Mechanics lab and the UT-Austin lab. Red points and lines correspond to observations and predictions of path A–D–C–E, whereas blue points and lines correspond to observations and predictions of path A–C–E. Data points represent the blind predictions for the expected outcome of the challenge and vertical bars represent each team's bounds on their predictions. The range of experimentally observed values is bounded by upper and lower horizontal lines

5.2 Comparison of force-COD curves

While the scalar metrics discussed in the previous section may provide the most realistic representation of common engineering problems, the force-displacement curve may provide the most insight into the efficacy of the various modeling approaches. Each team was asked to report their best prediction for the force-displacement behavior. The blind predictions for force-displacement behavior are compared to the experimentally observed force-displacement curves in Fig. 21. A detailed discussion comparing predictions to experiments is contained in Sect. 6.

Fig. 21

Comparison of force-COD predictions (colors) to experimental observations (gray lines). The solid gray lines represent the 10 experimental observations from the Sandia Structural Mechanics lab, and the dashed gray lines represent 3 non-blind experimental observations from the UT-Austin group. Path A–D–C–E experimental results are shown in lighter gray and the teams that predicted this path are underlined. Path A–C–E experimental results are shown in darker gray, and the corresponding team numbers are not underlined

6 Discussion

The goal of the present study was to evaluate ductile fracture prediction methods under pseudo-real-world conditions, replicating the conditions that are typical in an engineering environment. The challenge was open to the public so that a large number of participant teams would help represent the breadth of state-of-the-art capabilities across the mechanics community. As a collective effort, this body of work can be used to draw general conclusions about the fidelity of failure prediction, and the specific topic areas that require further investment.

6.1 Assessing agreement and discrepancy between predictions and experiments

6.1.1 Generic categorization of potential sources of discrepancy

Several sources of uncertainty and variability have been identified and categorized (Kennedy and O’Hagan 2001):

  • Parameter uncertainty—setting an input variable to a value that does not reflect nature.

  • Structural uncertainty/model inadequacy—the form of the governing constitutive equations is inaccurate.

  • Residual variability—additional variability in a natural process that is not captured within the fidelity of the model.

  • Parametric variability—allowing a parameter to ‘float’ due to insufficient knowledge of its true value(s).

  • Experimental uncertainty/observation error—calibration based on experiments that do not correctly reflect nature, or incorrectly represent the desired scenario.

  • Algorithmic/numerical/code uncertainty—improper numerical implementation of algorithms.

Systematic isolation and evaluation of each source of discrepancy is time-consuming and not routinely performed. The teams were each given the opportunity to bound their predictions. Most often, when teams did bound their predictions, they focused on parametric variability. They typically performed a sensitivity analysis on certain parameters that were deemed to be inadequately estimated based on the provided material property information. The modeling approaches taken were largely deterministic: calibration was typically done to average material property behavior, and the observed material property scatter was rarely taken into account. Also, no team systematically varied the dimensions of the geometric specimen features across the allowable machining tolerance ranges in the blind predictions. This sort of dimensional tolerance analysis was only performed after the conclusion of the blind phase of the predictions in an attempt to understand why certain specimens would ‘choose’ a particular crack path.

It is worth noting that the range of modeling methods used by the 13 teams represents differing levels of maturity. For example, the use of a Gurson model (Gurson 1977) within a finite element framework has seen many decades of prior development, and the teams that chose such an approach may benefit from the maturity of the technique and the vast available literature from which to draw additional insight that can be brought to bear on the Challenge. In contrast, numerical methods such as Peridynamics have only recently emerged, and the proper application of these techniques to problems in ductile fracture has not been fully explored.

6.1.2 An assessment of the crack path ambiguity

An important ambiguity that arose from this challenge was in the observed crack path. In the experiments performed at three independent laboratories on the same nominal geometry, fabricated from the same sheet, the failure exhibited two different paths: A–C–E and A–D–C–E. There are at least three different approaches that one might adopt in interpreting these experiments prior to embarking on a comparison with the blind predictions. The first approach is purely statistical: nine out of the ten specimens tested as the primary data for this Challenge followed the path A–D–C–E, and therefore, statistically the path A–D–C–E is a higher probability event. In the absence of any additional information, one might be forced to act on such a proposition. However, this does not examine or consider causation; in the present example, additional information is available, both within the experimental results and the underlying theoretical framework within which these experiments were performed and interpreted, that allows additional considerations. The second approach is to take an engineering point of view: both solutions (paths A–C–E and A–D–C–E) were in fact realized in experiments, and could therefore be acceptable engineering solutions to nominally the same problem. If decisions are to be made concerning the reliability of the structure, a conservative approach can be established by using the lower bounds from the measurements for both the load-carrying capacity and the load-line displacement. Such decisions are commonly made in numerous engineering applications. However, they are not predictive since, once again, the underlying causation—why does the failure follow one path or the other—is not understood or examined closely. The third approach, and one that is perhaps the most difficult, but also the most satisfying, is to probe the problem further to determine the underlying reasons for the multiple solutions to the problem. 
It should also be noted that the distinction between these two paths is important, because the A–D fracture was shear dominated whereas the A–C fracture was tensile dominated. Shear versus tensile fracture is a known difficulty in computational predictions, and a phenomenological topic that has been of recent interest. For this reason, it was important to delve into the crack path ambiguity in more detail.
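The second, engineering approach amounts to taking the minimum of each measured quantity across all observed outcomes, regardless of crack path. A minimal sketch follows, using only the two first-load-drop values quoted earlier in the text (specimens D1 and S4) and treating COD as the displacement measure, which is an assumption made for illustration; a real assessment would use the full data set in the tables.

```python
# First load drops quoted in the text: (specimen, crack path, force in N, COD in mm).
first_load_drops = [
    ("D1", "A-C-E", 8066.0, 3.542),
    ("S4", "A-D-C-E", 8305.0, 2.497),
]

# Conservative bounds: the lowest force and the lowest COD over both paths.
force_lower_bound = min(force for _, _, force, _ in first_load_drops)
cod_lower_bound = min(cod for _, _, _, cod in first_load_drops)

print(f"force >= {force_lower_bound} N, COD >= {cod_lower_bound} mm")
```

In this two-specimen illustration the two bounds come from different specimens and different crack paths, which shows that the conservative envelope need not correspond to any single physical outcome.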

Nine of the ten specimens tested in the Structural Mechanics Laboratory followed crack path A–D–C–E, with only one specimen following path A–C–E. The load-COD profiles for the nine A–D–C–E crack-path specimens were similar, particularly in the characteristic features of the load drop with incremental COD. The magnitudes of load drop for propagating the crack to hole C were similar, regardless of whether the crack followed path A–D–C or went directly along path A–C, although the crack for path A–C occurred at significantly higher COD values. The conditions for crack re-initiation out of hole C were similar, regardless of whether the crack had followed path A–D–C–E or A–C–E. Two other labs performed these experiments, one blind set performed in the Sandia Materials Mechanics Laboratory before the predictions were returned and one set performed in Ravi-Chandar’s laboratory at the University of Texas at Austin after the predictions had been reported. These two labs only tested a small population of samples (3 each), yet both labs observed samples failing along both crack paths.

Table 6 Select dimensions for specimens tested at Sandia Structural Mechanics laboratory and at UT-Austin: dimension V12-(V9+V10) is the vertical distance between the horizontal edge of the notch and the top of Hole D; dimension H4-(H-C)-H5 is the horizontal distance between the notch tip and the closest edge of Hole C; the dimensional tolerance on the specimen drawing was \(\pm \)0.0508 mm (\(\pm \)0.002 in)

There are three different potential experimental imperfections that are the focus of discussions regarding crack path selection: (1) material inhomogeneities such as the observed banding, (2) load train alignment issues, and (3) deviations of the specimen geometry from the nominal dimensions. While each of these could bear relevance, the effect of inhomogeneities has been reduced by using the same sheet of material for all specimens, and by specifying geometric feature sizes that were over an order of magnitude larger than the length scale of the sparsest inhomogeneity (spacing between bands). Tests performed at the different labs with different types of loading arrangements indicated similar trends in the failure paths, implying that the imperfections in the loading boundary condition may not be the primary determinant of path selection. This leaves the third source, geometric imperfections, as the main suspected determinant of failure path selection. In this regard, an important quantitative correlation was found between the variations in the measured sample dimensions and the observed crack path. An obvious geometric feature of potential relevance to the crack path was the ligament distance between notch A and hole D; additionally, the ratio of the vertical distance between the notch edge A and hole D to the horizontal distance between the notch tip and hole C may reveal why the crack would prefer a given crack path. Table 6 lists relevant pre-test specimen geometry measurements based on the lengths labeled in Fig. 10, with the dimensions exceeding the prescribed tolerance of \(\pm \)0.0508 mm highlighted. The notch width (V10) is larger than the drawing tolerance for nearly all of the specimens; this led to a smaller vertical ligament distance between the notch edge A and hole D, given by V12-(V9+V10), for the majority of the specimens and for all but one of the specimens with crack path A–D–C–E. 
The horizontal ligament distance between the notch tip and hole C, given by H4-(H-C)-H5, was within tolerance; thus, hole C was located within tolerance for all of the specimens. The ratio of the vertical ligament between the notch edge A and hole D to the horizontal ligament distance between the notch tip and hole C is nominally two-thirds, but most specimens had a smaller ratio. Specimens with crack path A–C–E had a percent error in this ratio from \(-\)1.3 to +1.9 %, while specimens with crack path A–D–C–E had a percent error in this ratio from \(-\)5.4 to \(-\)2.2 %. In other words, the specimens where the ligament between A and D was significantly smaller than specified (relative to the length of the ligament between A and C) tended to fail along A–D–C–E. This exploration of the imperfections appears to indicate a systematic preference for one path over the other depending on the nature of the imperfections, and hence points not to a bifurcation, but to two solutions that are in close proximity.
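The ratio check described above can be illustrated with a minimal sketch. The nominal ratio of two-thirds and the two percent-error ranges are taken from the text; the ligament lengths in the example are hypothetical values chosen for illustration and are not entries from Table 6. The range lookup only reports which experimentally observed band a specimen falls into and should not be read as a predictive rule.

```python
NOMINAL_RATIO = 2.0 / 3.0  # nominal (A-D ligament) / (A-C ligament) ratio stated in the text

# Percent-error ranges reported in the text for each observed crack path.
ACE_RANGE = (-1.3, 1.9)
ADCE_RANGE = (-5.4, -2.2)


def ligament_ratio_error(vertical_ad_mm: float, horizontal_ac_mm: float) -> float:
    """Percent error of the measured ligament ratio relative to the nominal 2/3."""
    if horizontal_ac_mm <= 0.0:
        raise ValueError("horizontal ligament must be positive")
    ratio = vertical_ad_mm / horizontal_ac_mm
    return 100.0 * (ratio - NOMINAL_RATIO) / NOMINAL_RATIO


def observed_band(error_percent: float) -> str:
    """Report which observed range the error falls in (a lookup, not a prediction)."""
    if ADCE_RANGE[0] <= error_percent <= ADCE_RANGE[1]:
        return "A-D-C-E"
    if ACE_RANGE[0] <= error_percent <= ACE_RANGE[1]:
        return "A-C-E"
    return "outside observed ranges"


# Hypothetical ligament lengths in mm (NOT taken from Table 6).
specimens = {"nominal": (2.00, 3.00), "short A-D ligament": (1.90, 3.00)}
for name, (vertical, horizontal) in specimens.items():
    err = ligament_ratio_error(vertical, horizontal)
    print(f"{name}: {err:+.1f} % -> {observed_band(err)}")
```

Note that the two reported ranges do not overlap, which is what makes a simple lookup of this kind possible for the specimens tested.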

6.1.3 Overview of agreement between predictions and experiments

As was the intention of this endeavor, the challenge scenario offered a problem in the area of ductile fracture that was not trivial to predict. In spite of the somewhat simplistic geometry, the common loading conditions, and the wealth of material property information provided, there was a wide range of predictions reported. While there was a wide range of experimental observations, there was a much broader band of computational predictions.

Most of the teams had elements of success in their prediction. From the perspective of crack path, all teams correctly predicted one of the two experimentally observed crack paths: A–C–E or A–D–C–E. Elasticity, yielding, and work hardening regimes were predictable for a majority of the groups. The force-displacement curve in Fig. 21 seemed to show reasonable qualitative agreement for most of the groups, at least through the initial crack initiation load drop. For both crack path A–D–C–E and path A–C–E, nearly all of the teams were able to predict the force for first crack initiation within experimental scatter. Yet only a few teams were able to predict the COD value for first crack initiation. Force was much easier to predict than the COD value for two reasons: (1) in the vicinity of first crack initiation, the force-displacement curve was nearly horizontal, and the force value was insensitive to the precise point of crack initiation whereas COD was highly sensitive, and (2) there was a wide range of experimentally observed force values: the force value dropped rapidly as a result of the first crack initiation, leading to broad experimental scatter in the force value at which a visual crack was detected.

The second cracking event, either out of hole D for path A–D–C–E, or hole C for path A–C–E, was more difficult to predict quantitatively. Based on Fig. 20, only seven teams were able to predict the force at second crack initiation within the experimental error bounds associated with that predicted crack path. Only one team was able to predict the COD value for second crack initiation within experimental bounds for the predicted crack path.

Did any team get the entire challenge completely correct? While Team 2 was the only team that had predictions of the scalar quantities of interest (QoI’s) that were consistently within the experimental scatter (see Table 5), Team 2 was not able to maintain good agreement with the experimental load-COD curve across all cracking events. Specifically, Team 2 did not predict the broad plateau in load at \(\sim \)5,500 N (COD \(\sim \)4-6 mm) prior to crack initiation from hole C. This plateau was observed to be nearly identical for both experimentally observed crack paths, and Team 3 (who predicted crack path A–C–E) was able to predict the load-COD curve correctly through the end of the plateau in load. However, Team 3 did not predict the COD at which the second crack is initiated within the experimental scatter. Although Team 3 did not perform as well as Team 2 in answering the scalar metrics that are representative of engineering analyses, they predicted the load-COD response within experimental scatter over the widest range of COD. Crack initiation from hole C (as inferred from the final load drop) was difficult for all of the teams to predict. This, after all, was the final significant mechanical event, and the teams had to get all previous elasticity, yielding, work hardening, necking, crack initiation, and crack propagation correct to finally predict the correct loads and displacements at which a crack would emerge from hole C.

The challenge approach presented in this work provides a reasonable benchmark of state-of-the-art in ductile fracture prediction, at least within the capabilities represented by the 13 participant teams. However, the approach is limited in its ability to single out the precise strengths and weaknesses of different approaches. The approach intends to mimic that of a real-world engineering scenario, where the challenge does not isolate specific sources of error. Only a subsequent analysis by the participants can identify the specific elements that caused poor predictivity. Likewise, the approach may be insensitive to certain sources of prediction error that would become problematic in other scenarios. Moreover, the challenge scenario only assesses predictivity within the scope of the challenge problem: quasi-static room temperature deformation and fracture of a structural alloy with moderate ductility. For example, the results of this challenge do not speak to the ability of modeling methods to address problems in the area of dynamic fracture, coupled thermomechanical fracture, environmentally-accelerated fracture, etc.

6.2 Future needs for improving predictivity of computational models in the area of ductile fracture

6.2.1 Constitutive modeling

Computational models are dependent on the material characterization experiments that are used to calibrate the constitutive model(s). While there are many handbooks and databases for material property data, these databases often only include rudimentary property information such as yield strength and ultimate tensile strength. Even full stress–strain data is sometimes difficult to obtain. In some cases, even when stress–strain curves are available, crucial details of the tensile geometry are lacking. Mode-I plane strain fracture toughness data is sometimes available, and to a lesser extent, plane strain J\(_{IC}\) data is available, especially for alloys used in high-reliability structures such as nuclear reactors and aerostructures. However, the extension of sharp-crack plane strain fracture toughness values to realistic engineering structures is not always straightforward, as demonstrated in the current Challenge. For these reasons, computational efforts always require material property experiments. While these experiments are both costly and time consuming, there is no substitute: fracture properties in structural metals cannot be obtained from first principles calculations. More efficient methods to gather a sufficient amount of material property information from a minimum number of experiments are needed. What is the minimum number of calibration tests that are needed for a model? Can emerging experimental techniques, such as digital image correlation, provide a richer material property dataset from fewer tests with which to populate model calibration?

One of the difficulties in predicting the Sandia Fracture Challenge was the lack of sufficient material property data to calibrate constitutive models for failure. The Challenge intentionally provided only material property data that would typically be available in a structural analysis for engineering problems. While extensive data was provided for tensile behavior and sharp-crack fracture toughness behavior, many prediction teams would have benefited from more detailed experimental measurements (such as details of 3-dimensional deformation during necking in the uniaxial tensile experiment, and crack extension data from the fracture tests), and more importantly from additional information regarding the shear deformation and shear failure behavior of the material. Currently, the fracture mechanics community lacks a widely-accepted criterion for failure. Moreover, the mechanics community lacks a widely-accepted, standardized test method to evaluate shear deformation and failure. There are several experimental methods that have been proposed for this problem, including the Iosipescu geometry (ASTM D5379), V-notched rail shear (ASTM D7078), the Butterfly geometry (Dunand and Mohr 2011), and punch geometry (ASTM D732). In addition, some sheet forming experiments such as mandrel forming involve extensive shear deformation, but are not loaded in pure shear. While these methods each have utility, the lack of shear material property data likely stems from a lack of standardization for shear test methods. Early modeling methods for ductile failure of metals, such as the Gurson method (Gurson 1977), did not take into account low triaxiality shear failure as a distinct mode of failure.

Another deficiency that some teams revealed was the lack of a material model that captured microstructure and macrostructure of the material. Microstructure includes aspects such as grain size, grain boundary arrangements, precipitate content, crystallographic texture, etc., and macrostructure includes aspects such as macroscopic anisotropy (i.e. plate anisotropy), and inhomogeneous banding of the microstructure (stringers of precipitates). While optical micrographs were provided of the grain structure, none of the teams used this information in their modeling method. Multi-scale computational methods that incorporate the effects of microstructure are under development by a number of research groups (Allison et al. 2011; Horstemeyer 2012; McDowell 2010; Emery et al. 2009). However, these models remain largely developmental, in part due to the challenges of mapping measurable properties to model parameters. Explicit representation of microstructure is computationally expensive and data management is cumbersome. Techniques are needed to connect advances in homogenization theory with characterization of microstructural detail in order to develop continuum-scale constitutive and failure models in a rational way.

6.2.2 Failure modeling

The current challenge highlighted the lack of a widely-accepted criterion for the onset of failure (e.g. void nucleation). Some teams merely used critical strain, some teams used a more complex tearing parameter, and yet others used a modified Gurson model. Teams 2 and 3 reported predictions that were among the most successful, yet their failure models differed significantly: Team 2 used a recently developed pressure-dependent damage model that can distinguish shear and tensile contributions to failure, whereas Team 3 used equivalent plastic strain. These differences expose a lack of maturity or consensus with regard to failure models. While detailed mechanics models exist for void growth and coalescence, there is little agreement on the micro-scale conditions for void nucleation. Failure is generally thought to initiate at pre-existing defects, voids or inclusions; and by that rationale some of the most widely accepted mechanics models require that materials contain some seed volume fraction of voids or pre-existing inclusions. Early reviews of this topic (Goods and Brown 1979) suggested that void nucleation can occur not only at inclusions or second-phase particles, but also at grain boundaries which serve as sites for dislocation pile-up. High-purity single crystalline metals fail by a void nucleation process that is similar (or identical) to the failure process observed in many ductile metals. Deformation-induced subgrain structure may facilitate the nucleation of voids (Boyce et al. 2012). There is clearly a need for continued investigation regarding the critical conditions that lead to void nucleation, especially in the absence of pre-existing defects or hard particle interfaces. Most likely, emerging models for accurate prediction of void nucleation will need to be multiscale to capture details of the evolving microstructure while also capturing the macroscale boundary conditions.

6.2.3 Computational methods

There was a striking inconsistency in the teams’ approaches to uncertainty quantification (UQ). All groups were asked to report not only the expected value for the forces and displacements at fracture, but also lower and upper bounds for these values. Some groups reported only deterministic predictions, and other teams reported large uncertainty bands, even larger than the significant experimental scatter. While UQ is a vibrant research area (Oberkampf and Roy 2010), the current effort demonstrates that UQ is far from mature, at least in the context of ductile fracture prediction. There are several possible explanations for the inconsistency in UQ methods. Most importantly, because this was essentially a volunteer effort on the part of participants, the time needed for detailed UQ analysis was not available. To make UQ a reality, the mechanics community will have to rely not only on improved probabilistic methods, but also on computationally efficient models so that multiple scenarios can be studied in a time- and cost-effective manner. In addition, there is very little guidance or standardization to improve consistency in performing UQ analysis. Moreover, there is similarly little guidance on the appropriate number of calibration experiments needed to quantify material variability.
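To make the request for an expected value with lower and upper bounds concrete, the following is a minimal sketch of one possible sampling-based approach, assuming a computationally cheap model is available. It is not the method of any participating team. The function `toy_peak_force`, the 2 mm nominal ligament, and the 2 % material scatter are hypothetical placeholders; only the \(\pm \)0.0508 mm drawing tolerance comes from the text.

```python
import random
import statistics

DRAWING_TOLERANCE_MM = 0.0508  # dimensional tolerance quoted in the text


def toy_peak_force(strength_scale: float, ligament_mm: float) -> float:
    """Hypothetical stand-in for an expensive fracture simulation (not a physical model)."""
    return 1000.0 * strength_scale * ligament_mm


def monte_carlo_bounds(n_samples: int = 2000, seed: int = 0):
    """Return (lower, expected, upper) using the 5th and 95th percentiles of the samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outputs = []
    for _ in range(n_samples):
        # Assumed 2 % normally distributed scatter in material strength (hypothetical).
        strength_scale = rng.gauss(1.0, 0.02)
        # Hypothetical 2 mm nominal ligament, sampled uniformly within the drawing tolerance.
        ligament = rng.uniform(2.0 - DRAWING_TOLERANCE_MM, 2.0 + DRAWING_TOLERANCE_MM)
        outputs.append(toy_peak_force(strength_scale, ligament))
    cuts = statistics.quantiles(outputs, n=20)  # 19 cut points: first is 5 %, last is 95 %
    return cuts[0], statistics.fmean(outputs), cuts[-1]


lower, expected, upper = monte_carlo_bounds()
print(f"lower={lower:.0f}  expected={expected:.0f}  upper={upper:.0f}")
```

The cost of such a study scales with the number of samples, which is why the availability of computationally efficient models matters for UQ in practice.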

A difficulty discussed among teams was the challenge of scalability. Ductile fracture is known to be a scale-sensitive problem. For this reason, lab-scale test coupons, such as those used in the present study, may not represent fracture behavior in large-scale structures such as ships or buildings. Real world applications span many orders of magnitude in size, from micro- and nano-electronics to civil structures. However, experimental testing and standardization in fracture, such as ASTM E399, focus on lab-scale test specimens, with little guidance on scaling to other engineering scales. Even the Sandia Fracture Challenge geometry itself evaluates fracture modeling only at the lab scale: the material property coupons were of a similar scale to the challenge geometry, and would be relevant to structures where the dimension of critical features is on the scale of a few millimeters. In many large-scale or geometrically complex engineering structures, there is a limit to the number of elements that are computationally practical. The Challenge specimen was small and simple enough that the teams could expend a large number of elements in the features of concern. Team 3 appears to have used the largest number of elements in the prediction: 2 million elements were used to predict the Challenge. Because of this, Team 3 was only able to run a small number of simulations to bound their predictions. In many engineering scenarios, the analyst must carefully trade off computational cost against spatial and geometric accuracy. For example, in some large scale welded structures, even the geometric details of the weld must be ignored or homogenized for computational practicality.

With regard to scale, a pervasive problem in fracture prediction is mesh-size sensitivity and model regularization. It is interesting to note that none of the teams performed a mesh-convergence study on their predictions. Some groups used an extremely fine mesh, and other groups selected a similar mesh size for the material property calibration coupons and the Challenge specimen. The physical mechanisms of fracture possess several intrinsic length scales governing physical phenomena (dislocation core size, grain size, plastic zone size, shear band spacing, etc.), but conventional continuum analysis is a scale-invariant approach. This challenge of incorporating length-scale dependence into continuum methods has been studied extensively for many years, e.g. Chen et al. (2000), Gao and Huang (2003), Needleman (2000). However, there is yet to be a consistent method for incorporating length-scale effects in fracture. One technique employed by some prediction teams to mitigate mesh dependency was to calibrate the material property tests with the same mesh scale as was used to predict the Challenge scenario. Other techniques like cohesive zones, peridynamics, and extended finite elements provide alternative methods to manage dissipation by incorporating additional knowledge of an intrinsic scale. It is possible that the regularization technique itself might depend on the type of fracture problem that is being addressed. The difficulties associated with regularization continue to impede predictivity.
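As an illustration of what a basic mesh-convergence check might look like, the sketch below estimates the observed order of convergence from a scalar result computed on three meshes with a constant refinement ratio, and then applies Richardson extrapolation. This is a generic verification technique and was not part of the Challenge; the function names and the synthetic data are assumptions for illustration. For models with strain softening and no regularization, the results may not converge monotonically at all, in which case the check raises an error rather than returning an estimate.

```python
import math


def observed_order(f_coarse: float, f_medium: float, f_fine: float, r: float) -> float:
    """Observed order of convergence for three meshes with constant refinement ratio r > 1."""
    if r <= 1.0:
        raise ValueError("refinement ratio must exceed 1")
    d_coarse = f_coarse - f_medium
    d_fine = f_medium - f_fine
    if d_fine == 0.0 or d_coarse / d_fine <= 0.0:
        raise ValueError("results are not converging monotonically")
    return math.log(d_coarse / d_fine) / math.log(r)


def richardson_extrapolate(f_medium: float, f_fine: float, r: float, p: float) -> float:
    """Estimate the zero-mesh-size value from the two finest meshes and the order p."""
    if p <= 0.0:
        raise ValueError("observed order must be positive to extrapolate")
    return f_fine + (f_fine - f_medium) / (r ** p - 1.0)


# Synthetic data (not simulation output): f(h) = 10 + 2*h**2 sampled at h = 4, 2, 1.
values = [10.0 + 2.0 * h ** 2 for h in (4.0, 2.0, 1.0)]
p = observed_order(*values, r=2.0)
f0 = richardson_extrapolate(values[1], values[2], r=2.0, p=p)
print(f"observed order = {p:.2f}, extrapolated value = {f0:.2f}")
```

A failure of the monotonic-convergence test is itself informative, since it signals the kind of mesh dependency that regularization is meant to address.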

6.3 Recommendations for future challenge scenarios

6.3.1 Specific topical areas in deformation and fracture where blind assessment is needed

A single Challenge such as the present study only provides limited insight into the predictivity of ductile failure. The efficacy or deficiency of a particular team’s modeling approach should not be overstated based on this single Challenge effort. Instead, additional Challenges will help illuminate methods that consistently produce the most reasonable predictions. A limitation of the blind fracture challenge, as it was issued in the current study, was the difficulty in isolating individual sources of error. Prediction errors can stem from inadequate physics models, poor numerical methods, improper boundary conditions, and several other sources already discussed. A prediction that does not match experimental observations may stem from any one of these errors, and isolation requires subsequent studies. A future challenge could be issued to isolate specific effects, such as mesh dependency.

The Challenge presented in the current work only examines one narrow aspect of failure predictivity. The Challenge represents monotonic tearing failure of millimeter-scale geometric features under slow, quasi-static loading conditions in a plate that is in between plane stress and plane strain conditions, for a material that possesses low work hardening and modest ductility. The current Challenge does not necessarily reflect predictivity in other scenarios. For example, dynamic crack propagation was not addressed in the current challenge. The instantaneous load drop and audible pop associated with a crack forming between notch A and hole D was at first a dynamic crack propagation event, but the crack became stable even before it had propagated to the visible sidewall surfaces. A different Challenge could be devised to more carefully interrogate the prediction of dynamic crack propagation, and the transition between stable and dynamic propagation. Alternatively, even a switch to a more brittle material with the same challenge geometry might provide a better investigation of dynamic crack propagation prediction. The current study focused only on the general material class of ductile metals, but fracture prediction is a challenge for other material classes as well (composites, hierarchical materials, foams and porous materials, metallic glasses, graded materials, etc.) and those other material classes could benefit from a similar blind assessment.

There are several other conditions that would be interesting to evaluate using such a blind assessment technique, such as (1) high temperature or low temperature fracture crossing a ductile-to-brittle transition, (2) dynamic fracture, such as under quasi-adiabatic conditions, impact loading, etc., (3) fracture of a microstructurally-sensitive or intentionally defected material such as aluminum alloys where failure initiates at precipitate phase boundaries, and (4) fracture prediction of a complex large-scale structure where lab-scale material properties must be extrapolated to the length scale of the challenge structure, such as emulating prediction of fracture in an airplane wing.

In addition, a future challenge could intentionally explore methods for uncertainty quantification. Such a challenge would require statistical details provided regarding the variability in a manufacturing process or observed variability in material.

6.3.2 Guidance for execution of a future challenge

The present work revealed several pitfalls in the execution of a blind assessment in the area of solid mechanics. These ‘lessons learned’ can help foster better Challenge exercises in the future. For example, there was a significant issue raised by the manufacture of specimens that were not only deviating from the nominal sample dimension, but also deviated slightly beyond the allowable manufacturing tolerances. While none of the groups used the dimensional tolerances in their initial blind predictions, the subsequent analysis of experimental data and discrepancies with computational predictions revealed that the out-of-tolerance specimens could have emphasized a crack path solution that was not the same as the crack path associated with nominal dimensions. In the future, the actual test articles could be manufactured before the beginning of the challenge and the as-measured dimensions of the actual articles could be provided to the prediction teams. This would be especially useful for an exercise evaluating uncertainty quantification methodologies. In addition, a second deficiency in the current exercise was the selection of a scalar metric for prediction that was not easily measured. Specifically, the initial blind challenge called on the teams to predict the onset of crack initiation as defined by a 100-\(\upmu \hbox {m}\) visible surface crack. This was not only difficult to observe experimentally, but the conditions were vastly different from those of the initial crack formation event, which was better evidenced by an audible signal and a distinct load drop. If more tests had been performed prior to the issuance of the Challenge, then the ambiguities raised by this problem may have been avoided. The additional advantage to running a series of experiments before the issuance of a challenge is that the repeatability of the measurements can be confirmed. 
In the present experiments, the observation of the two crack paths would have been discovered before the onset of the Challenge, and might have led to a modification of the test geometry so that only one crack path was preferred. A concern of performing the experiments before the issuance of the challenge lies in keeping the results confidential so that the prediction teams are blind and unbiased.

The current blind assessment methodology did not constrain the analysts to a particular method. Instead, the participant teams could utilize whatever methodologies within their capabilities that they deemed appropriate. In this way, the current Challenge replicated a pseudo-real-world engineering problem. However, in this approach, the isolation of specific sources of discrepancy between model and experiments can be difficult. There are many areas where discrepancies can arise, such as improper calibration to material properties, improper constitutive models, or numerical algorithm computational errors, to name a few. A sensitivity analysis performed after the comparison to experiments can help to isolate the most important sources of error.

The material property data that was provided to the teams is an important factor in their predictive success. The current challenge attempted to replicate pseudo-real-world engineering conditions by providing material property data that was commensurate with typical engineering scenarios, based on ASTM standard test methods. In many real engineering scenarios, even less material information is available. For example, in the current Challenge, multiple full tensile engineering stress–strain curves for both the longitudinal and transverse plate directions, as well as the dimensions of the tensile bar and the shape of the post-fracture neck, were provided for ease of calibration to a material constitutive model. Often, in real engineering scenarios, details such as the tensile dimensions are not available, and this lack of information can cause problems in constitutive calibration. A future challenge could even explore how engineers and analysts approach problems with less material information or different material information, and how they use this lack of information to drive uncertainty quantification.

The results of this Challenge also bring out some pertinent questions related to the development of the Quantities of Interest (QoI’s). The selection of appropriate QoI’s is extremely important in interpreting the experiments and simulations. Typical engineering practice is to reduce the complexity of the problem by reducing the results of experiments and simulations to a few scalar metrics with which higher level decisions could be made. In this spirit, the current challenge posed a few scalar QoI’s: the load and COD at the onset of the first and second failure as well as the path of the crack. Difficulties associated with identifying the onset of failure and ambiguities in crack path selection have already been discussed. These difficulties suggest that while the QoI’s may be set a priori, one must be aware of their potential limitations and have in place procedures for generating alternate QoI’s a posteriori. In this regard, it might be useful to introduce conditional QoI’s; for example, in the present challenge, considering that there are four distinct phases in the response—elastic, elastic-plastic, localization, and failure—conditional QoI’s that present ‘go–no-go’ decision points along the response could be postulated:

  • First, was the elastic stiffness of the structure captured correctly by the model?

  • If yes, then, was the prediction of the stable plastic response up to the limit load within acceptable range of the experiments?

  • If yes, then, was the onset of any localization predicted correctly?

  • Finally, were the original QoI’s based on the onset of first and second failure predicted correctly?

The reason for positing the ‘go–no-go’ decision points is that some aspects of modeling are well-established, and any new model that is unable to capture the more elementary features of the response may not provide reliable predictions of the more complicated, and less well-established, features of the response. The advantage of this procedure is that while simple scalar measures could be used for higher-level decision-making, the validation of the scalar QoI’s must pass through a much more detailed assessment at a lower level.
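The sequence of conditional QoI’s listed above could be organized as an ordered set of checks that stops at the first ‘no-go’. The following is a minimal sketch of that idea only; the helper names `within` and `evaluate_gates`, the relative tolerances, and all numerical values are hypothetical placeholders rather than Challenge data or an established acceptance procedure.

```python
from typing import Callable, List, Tuple

Gate = Tuple[str, Callable[[], bool]]


def within(predicted: float, measured: float, rel_tol: float) -> bool:
    """True if the prediction lies within a relative tolerance of the measurement."""
    return abs(predicted - measured) <= rel_tol * abs(measured)


def evaluate_gates(gates: List[Gate]) -> List[Tuple[str, bool]]:
    """Evaluate the gates in order and stop at the first failure ('no-go')."""
    results = []
    for name, check in gates:
        passed = check()
        results.append((name, passed))
        if not passed:
            break  # later, less well-established QoI's are not assessed
    return results


# All numbers below are hypothetical placeholders, not Challenge data.
gates = [
    ("elastic stiffness", lambda: within(51.0, 50.0, 0.05)),
    ("plastic response to limit load", lambda: within(8.3, 8.0, 0.05)),
    ("onset of localization", lambda: within(3.9, 3.0, 0.10)),
    ("first and second failure QoI's", lambda: within(5.4, 5.5, 0.10)),
]
for name, passed in evaluate_gates(gates):
    print(f"{name}: {'go' if passed else 'no-go'}")
```

In this illustration the third gate fails, so the failure QoI’s are never evaluated, which mirrors the intent that agreement on a late-stage scalar should not be credited when an earlier stage of the response was missed.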

One final aspect of the comparison of the experiments and predictions involves quantitative measures of comparison. In the present study, a rudimentary statistical comparison is made by comparing the range of the upper and lower bound predictions to the scatter in the experimentally measured response. More sophisticated measures based on Bayesian statistics have been developed in recent years to handle verification, validation and uncertainty quantification. Future challenges should consider implementing such measures to perform quantitative comparison.
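The rudimentary range comparison mentioned above can be sketched as an interval calculation: how much of the experimental scatter band is covered by a team’s predicted band, and how wide the predicted band is relative to the scatter. This is an illustrative assumption about how such a comparison might be coded, not the exact procedure used in the present study, and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    low: float
    high: float

    def __post_init__(self) -> None:
        if self.low > self.high:
            raise ValueError("low must not exceed high")

    @property
    def width(self) -> float:
        return self.high - self.low


def overlap_fraction(predicted: Interval, experimental: Interval) -> float:
    """Fraction of the experimental scatter band covered by the predicted band."""
    if experimental.width == 0.0:
        # Degenerate case: a single measured value either is or is not bracketed.
        return 1.0 if predicted.low <= experimental.low <= predicted.high else 0.0
    overlap = min(predicted.high, experimental.high) - max(predicted.low, experimental.low)
    return max(0.0, overlap) / experimental.width


def width_ratio(predicted: Interval, experimental: Interval) -> float:
    """Predicted band width relative to the experimental scatter (values above 1 are wider)."""
    if experimental.width == 0.0:
        raise ValueError("experimental scatter has zero width")
    return predicted.width / experimental.width


# Hypothetical bands in arbitrary units (not Challenge data).
prediction = Interval(0.0, 10.0)
scatter = Interval(5.0, 15.0)
print(overlap_fraction(prediction, scatter), width_ratio(prediction, scatter))
```

Reporting both quantities together helps distinguish a prediction that brackets the data only because its bounds are very wide from one that is both narrow and well placed.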

7 Summary and conclusions

Sandia proposed a double blind fracture challenge to the international engineering community and thirteen teams submitted blind computational results, representing contributions from 22 institutions. The intent was to assess the predictive accuracy of current methods. It is clear that this blind assessment effort has helped make each of the modeling teams more acutely aware of some of the weaknesses of their methods. Many of these weaknesses are discussed in detail in the Appendices, and as a result of the present effort, many of the teams are working to address these weaknesses. One surprising source of error that became apparent through an honest evaluation of the capabilities was ‘operator’ error, such as misinterpreting the desired prediction quantities, misreporting the results, or making dubious assumptions. It appears that these mistakes can overwhelm any predictivity (or errors in numerics/physics/co) that may be present in the models. Even transcription errors can present real hurdles to reporting accurate predictions. These ‘simple’ mistakes are often quickly discounted after the fact. Yet they can have a quantitatively large effect on blind predictivity.

One common theme that appears to affect all of the modeling methods is the availability of calibration data on the particular alloy of interest. The current effort was restricted to readily available data, which included tensile and fracture test data. All of the methods would benefit from more extensive calibration data beyond traditional material property tests. For example, a suite of test geometries spanning different degrees of stress concentrations, stress state, mode mixity, post-necking behavior, etc. could be useful to calibrate models prior to using them on an ‘unknown’ problem. There already appear to be early discussions regarding the development of such a test suite. Nevertheless, it is important to remember that reliance on such a test suite would mean that each alloy of interest would require extensive experimental evaluation prior to modeling. At a time when high-throughput experimentation, data management, data mining, and exascale computing are becoming commonplace in other fields, it is important that the structural failure prediction community also develops new approaches to take advantage of these emerging capabilities.

In addition, there were several other known but unresolved issues in fracture prediction that were highlighted by the Challenge exercise: (1) in this specific Challenge, geometric uncertainties were shown to have a huge impact on crack path predictions, (2) mesh convergent methods remain an open issue, (3) effects of microstructure may be important but were not included by any of the teams, (4) improved physical descriptions of fracture are necessary to reduce dependence on empirical material testing, (5) there is a trade-off or balance that is necessary between physical realism and computational efficiency. This list only highlights those issues in fracture prediction that were brought to light by this particular Sandia Fracture Challenge. There are other known gaps in failure prediction, such as the need for microstructurally-informed models, which were not elucidated by the present exercise, but may be exposed by future challenges.