This study shows how different FE brain models and kinematic-based metrics rank and rate a large number of bicycle helmets. Seventeen different helmets available on the Swedish market (2015) were ranked and rated based on three oblique impacts that produce rotations about the three anatomical axes of the head. The results from the eight different brain models, with multiple outputs from some models, and eight different kinematic-based metrics showed that the choice of metric could influence the ranking and rating as well as the linear correlation. Comparing the rankings using Kendall’s tau showed a high correlation (above 0.8) for 30 of the 55 model-to-model comparisons (Table 3). Pearson’s r² showed a correlation above 0.8 for 45 of the 55 model-to-model comparisons. Thus, there was generally a good correlation between different models using the bicycle helmet oblique impact dataset. It is important to note that a lower correlation between models is not necessarily a measure of the quality or accuracy of individual models. Evaluating the quality and accuracy of the models requires further specification of quality measures, further validation, good experiments to validate the models against, and an objective evaluation of the validation. In fact, it is an ongoing effort to understand how best to validate a model and rate its quality, as discussed recently.26,54,56
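As an illustration of the two comparison measures used above, the following minimal Python sketch computes Kendall’s tau and Pearson’s r² for one pair of model outputs. The values and the names `model_a` and `model_b` are hypothetical and are not taken from this study. The tau-a variant (no tie correction) is an assumption, since the variant used is not stated here. The count of 55 comparisons is consistent with 11 outputs compared pairwise (eight models plus a second output for three of them).

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumption: no tie correction; tied pairs count as neither.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    score = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            score += 1  # both outputs order helmets i and j the same way
        elif product < 0:
            score -= 1  # the two outputs disagree on the order
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


def pearson_r2(x, y):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical peak strain values for five helmets from two models.
model_a = [0.20, 0.25, 0.30, 0.35, 0.40]
model_b = [0.22, 0.24, 0.33, 0.31, 0.45]

tau = kendall_tau(model_a, model_b)  # one discordant pair out of ten
r2 = pearson_r2(model_a, model_b)

# Eleven outputs compared pairwise give 55 comparisons.
n_comparisons = len(list(combinations(range(11), 2)))
```

In this made-up example only the third and fourth helmets swap places between the two outputs, so nine of the ten pairs are concordant and one is discordant.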
The ratings of the helmets, from 1- to 4-star, were broadly similar. Some helmets had a difference in rating, depending on the choice of brain model. However, two helmets had larger differences. Helmets I and N were rated as 1-star by some brain models, and as either 2- or 3-star by others. For helmet N, most of the models placed it among the highest-ranked 1-star helmets or among the lowest-ranked 2-star helmets. The peak metrics for the different models were rather close, so differences in star rating depended largely on the percentile boundary values that define each rating group, with two exceptions: the UCDBTM and IC models both rated the helmet as a 3-star helmet. The rating presented in Figure 5 was based on the average value of the three loading conditions. As can be seen in the supplementary materials, the difference for these two models compared to the other models was mainly due to the difference in the ranking of the helmets for the Xrot loading condition and especially for the Zrot loading condition. For Yrot, the ranking position for helmet N was almost the same for all models. For helmet I, the difference was mainly for the Zrot loading condition. Zrot was also the direction that had the smallest coefficient of variation, which could contribute to the lower correlation seen for this loading condition. Helmet D also showed some more variation in rating, particularly with the UCDBTM. The UCDBTM rated the helmet as a 2-star helmet, whereas the other brain models rated it as a 4-star helmet. The ranking of the UCDBTM differed mostly for the Xrot loading condition (10–13 positions difference) compared to Yrot (2–4 positions) and Zrot (6–7 positions).
It is not clear what factors related to the construction of one brain model determine its difference from the others. IC, KTH, PIPER and WHIM use the same material model and model parameters derived by Kleiven,33 which account for tension-compression asymmetry. Still, the linear correlation and correlation of ranking were not always highest between these models. Other models such as GHBMC, SIMon, THUMS, and UCDBTM have used a linear viscoelastic material model for the brain tissue, which does not capture the full nonlinear response observed in some tissue studies.17,39 However, differences in the material models and properties do not seem to be a major factor affecting the correlation of model responses and the correlation of ranking. For instance, the KTH (which uses a nonlinear material model for the brain) and GHBMC (which uses a linear model for the brain) show high correlation in ranking compared to models with either linear or nonlinear material models. When comparing the UCDBTM to the other models in terms of the number of elements and nodes, brain volume, material properties, etc., there is no significant difference. The UCDBTM is in the medium range of the selection of models when it comes to the number of nodes and elements. Abaqus is used to solve the UCDBTM, but the same software is also used for WHIM. It should be mentioned that the IC and UCDBTM models were the only models that showed a higher correlation to the PAA compared to PAV, which could make a difference in ranking.
Besides material properties, brain element mesh density, mesh quality, element formulation, solver, and hourglass control could all significantly affect strain predictions. Earlier studies have shown that models with a finer mesh would lead to larger brain strains when other modeling parameters are the same.27,53 Similarly, the variations in mesh density among the eight brain models may contribute to the difference in strain predictions. However, it is difficult to isolate and quantify the effect that the differences in numerical approaches have on the correlations between models and their rankings, because these factors often interact. Future studies may investigate the interactive effects of key modelling choices on predictions of the human head FE models, as a step towards providing guidance on using such models for ranking head protection systems.
IC, GHBMC, and SIMon models were evaluated for different local metrics since these various metrics have been used to evaluate the effect on brain tissue in previous studies.19,45 For both GHBMC and SIMon models, MPS and CSDM were evaluated, and small differences were seen in the correlation between these two metrics (Kendall’s tau of 0.98 and 0.94, respectively). A slightly larger difference was seen for the IC model when evaluating strain and strain rate, with Kendall’s tau of 0.84. These differences in the ranking only had a small influence on the helmet rating, which most often is the only information that is provided to consumers. For the GHBMC model, two helmets had different star ratings (helmets B and N), and for the IC and SIMon models, four helmets had different star ratings (helmets E, I, N, and P for the IC model, and E, H, I, and O for the SIMon model).
This study focused on the helmet ranking and rating using different brain models rather than evaluating the biofidelity of the models, e.g., through comparing their predictions with data from experiments on post-mortem human subjects (PMHS) or human volunteers. Most of the models have been evaluated against different PMHS experiments, and in some cases also against volunteer data (see the Supplementary material Table S3). However, it is difficult to compare the different validation results between the models due to differences in the validation process. There are some exceptions, e.g., Giordano & Kleiven26 evaluated the THUMS, isotropic KTH, and GHBMC models with the same methodology. In total, 40 experiments were evaluated. They found biofidelity ratings, derived from the correlation and analysis (CORA) score, between 5.80 and 6.23, which was indicative of fair biofidelity for all three models. The ranking in this study showed that Kendall’s tau varied between 0.90 and 0.95, with only a minor difference in helmet rating between these models, but this is not necessarily because they have similar correlation against PMHS data. Trotta et al.50 used the same evaluation protocol for one set of experiments with the UCDBTM and found higher scores compared to the GHBMC and THUMS models, but when comparing the correlation of ranking to other models, UCDBTM showed the lowest values. Nevertheless, a recent study by Zhao and Ji54 suggests that CORA may not be effective in discriminating brain injury models in terms of whole-brain strain, as two models with similar CORA scores could produce whole-brain strains that differ by a factor of 2–3 in magnitude.
The ranking of helmets was also evaluated using some kinematic-based metrics. The rating of the helmets was vastly different for the metrics based on linear acceleration compared with the metrics predicted by the brain models. In terms of PLA, helmet Q was rated as a 1-star helmet, whereas all brain models rated that helmet as a 4-star helmet. PLA and HIC have shown better correlation to skull fracture than brain injuries.23,34 In this study, all brain models apply the dummy headform kinematics through a rigid skull, and only the response of the brain tissue is included in the comparison. In this sense, the models are only able to assess the risk of diffuse-type brain injuries (e.g., concussion, diffuse axonal injury), which arise primarily from brain deformation mechanisms22,40 resulting from head rotation. For future test standards and rating methods, it may be necessary to evaluate both the risk of skull fracture and a broader spectrum of brain injuries.
The kinematic-based metrics based on the angular motion had the highest correlation with the metrics predicted by the brain models, but which metric had the highest correlation depended on the model used. In some cases, the different metrics had a rather similar correlation coefficient for both ranking and linear correlation. For most models, PAA showed a lower correlation compared to the other angular metrics, with the exception of the IC model with strain and the UCDBTM model. Some previous studies21,30,32,48 have proposed that angular velocity shows a better correlation to brain responses for short duration pulses (10–20 ms), which are characteristic of helmet pulses, while angular acceleration plays a larger role for pulses with longer durations, e.g., automotive collisions. The present results are only for short duration helmet impact pulses.
Different rating methods have been presented previously, e.g., Deck et al.11 and Stigson.46 Those two studies were based on brain model response, but they rated the helmets based on the injury risk functions. In the present study, we rated the helmets based on the MPS/strain rate or CSDM directly without any assessment against injury risk curves. There were two reasons for this. Firstly, not all brain models used in this study have had injury risk functions developed specifically for them. Secondly, the data and methods for developing injury risk functions are changing rapidly with ongoing research. In the literature, some model developers use different types of brain injuries to create AIS2+ risk curves based on simulations of various accident situations.42 Others have developed the risk functions from reconstructions of concussion cases,5,33 a combination of football reconstructions and volunteer sled test data43,44 or scaled animal data.49 In future, it would also be wise to explore in more detail what is required from these injury risk functions and the underlying cases on which the risk functions are based. Whether and how risk functions should be used ought to be discussed in the research community, in addition to what particular inclusion criteria should be used when choosing the uninjured and injured populations. Ideally, this should be available to the scientific research community through an open-access database.
Since no injury risk functions were used when rating the helmets, the rating should not be interpreted as the absolute real-world performance of the helmet, but rather as the performance of that helmet under the conditions imposed on that particular brain model using that injury metric. As mentioned above, in the present study, the helmet rating was based on the mean value of three different impact conditions, and the star ratings were distributed depending on the 25th, 50th, and 75th percentile values of the seventeen different helmets. With this system the helmets are forced into four different categories. It could be that all helmets had a low risk of injury and should be rated high, or had a high risk of injury and should be rated low, but here the helmets are evenly distributed over the four star categories. For Zrot, the coefficient of variation between the helmets was relatively small, and the mean values of the metrics used to determine the boundaries of the ratings were also relatively close (see supplementary materials). If injury risk functions had been used, it is possible that all the helmets would have been rated in only one or two categories. With the system used in this study, in some cases, two helmets were similar in their performance but were rated differently because their performance lay on opposite sides of the border between two rating categories. This would, of course, influence Kendall’s tau and the rating methods.
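The percentile-based scheme described above can be sketched in Python as follows. This is a minimal illustration, assuming that a lower metric value earns more stars, that a value equal to a boundary falls in the better category, and that percentiles are linearly interpolated. None of these conventions is specified in the text, and the helmet names and values are hypothetical.

```python
from statistics import mean, quantiles


def star_ratings(metric_by_helmet):
    """Map each helmet to 1-4 stars from its mean metric over the impacts.

    Assumptions (not stated in the text): lower metric is better, boundary
    values fall in the better category, and the 25th/50th/75th percentiles
    use the 'inclusive' linear interpolation of statistics.quantiles.
    """
    means = {helmet: mean(values) for helmet, values in metric_by_helmet.items()}
    q25, q50, q75 = quantiles(list(means.values()), n=4, method="inclusive")

    def stars(value):
        if value <= q25:
            return 4
        if value <= q50:
            return 3
        if value <= q75:
            return 2
        return 1

    return {helmet: stars(value) for helmet, value in means.items()}


# Hypothetical peak metric for three impact conditions (Xrot, Yrot, Zrot).
example = {
    "H1": [0.18, 0.20, 0.22],
    "H2": [0.24, 0.25, 0.26],
    "H3": [0.29, 0.30, 0.31],
    "H4": [0.30, 0.31, 0.32],
    "H5": [0.38, 0.40, 0.42],
}

ratings = star_ratings(example)
```

In this made-up example, H3 and H4 differ by only about 0.01 in mean metric yet receive 3 and 2 stars respectively, because they sit on opposite sides of a percentile boundary.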
Another proposed rating method for bicycle helmets is the STAR rating by Virginia Tech.4 They use a combination of peak linear acceleration and peak change in angular velocity for six impact locations using two impact velocities to calculate the STAR value. Since this study only included one impact velocity and three different impact locations, a modified STAR value (STAR*) was used. STAR* shows a lower correlation against the models both for linear correlation and ranking compared to the kinematic-based metrics that only depended on angular motion.
This study included eight different brain models, which to the authors’ knowledge is the most extensive comparison study to date. However, other brain models do exist and are used to assess safety products, although they have not been included in this study. In addition, there are also other metrics based on global kinematics that were not evaluated in this study. For the brain models, the effect on brain tissue has only been evaluated on whole-brain level, but there are studies suggesting that metrics on subregion levels would be a better predictor of brain injuries (e.g., Wu et al.52).
This study is based on experiments of bicycle helmet impacts that were performed without the neck and the rest of the body. It is unclear how important the neck and body are. Previous studies have shown divergent results when it comes to headfirst helmet impacts.14,16,25 The results have been influenced by the impact conditions such as impact point and impact surface. By including the neck and body, a biphasic acceleration with an acceleration phase and a deceleration phase could occur, which could amplify the brain strain.38,55 The deceleration phase is not included in the current study since all tests were performed without the neck. This is a limitation of the current study and of different helmet rating programs,4,11,47 which may be addressed in future by the development of a neck that is more biofidelic in headfirst impacts.
From the models and kinematic-based metrics included in this study, the results show that the ranking and rating can be influenced by the choice of the assessment metric. There is a potential risk if different rating methods present different results depending on which FE model or kinematic-based metric informs their rating method. This is likely to cause confusion among consumers rather than provide constructive advice regarding the relative safety performance of helmets. Rating methods are best used to allow consumers to distinguish between safer and less safe helmets, whereas test standards are intended to exclude unsafe helmets from the marketplace. We strongly suggest that the biomechanics community work collaboratively to reach consensus on a validation procedure for FE models of the head. This procedure should involve both validation against experimental data and comparison to real-life accidents so as to derive trustworthy injury risk functions. However, as discussed above, we do not recommend at present that injury risk curves be used in helmet rating methods because the data and methods for developing injury risk functions are changing rapidly with ongoing research.
Nevertheless, in order to provide specific recommendations, further knowledge and technology developments are necessary. For example, more data based on real-world accidents are required to evaluate the performance of the injury metrics. A consensus on a standardized procedure to validate brain injury models and rate their performance is also needed to establish confidence in their practical applications. In addition, injury risk functions based on real bicycle accidents with injured and non-injured cases are also needed for applications specific to bicycle helmets. At present, depending on which model or injury metric is chosen to evaluate the helmet performance, the ranking and rating can differ. We suggest that all rating organizations should provide clear information regarding the uncertainty in the rating depending on the metric used.