# Longitudinal Monitoring of Athletes: Statistical Issues and Best Practices

## Abstract

Athlete monitoring utilizing strength and conditioning and other sport performance data is increasing in practice and in research. While the use of these data to create more informed training programs and to produce potential performance prediction models is promising, there are statistical considerations that should be addressed by those who hope to use them. The purpose of this review is to discuss many of the statistical issues faced by practitioners and to provide best practice recommendations. Single-subject designs (SSD) appear to be more appropriate for monitoring and statistically evaluating athletic performance than traditional group statistical methods, and this paper discusses several SSD options that produce measures of both statistical and practical significance. Additionally, this paper discusses issues related to heteroscedasticity, reliability, and validity and provides recommendations for each. Finally, if data are incorporated into the decision-making process, they should be returned and utilized quickly. Data visualizations are often incorporated into this process, and this review discusses issues and recommendations related to their clarity, simplicity, and distortion. Awareness of these issues and utilization of best practice methods will likely result in an enhanced and more efficient decision-making process and more informed athlete development programs.

## Keywords

Continuous monitoring · Athlete tracking · Data return · Data visualization

## Athlete Monitoring

A desirable result of any strength and conditioning program is an improved level of preparedness and improved ability to perform [28, 31, 54, 68]. Typically this can be evaluated through some form of testing, but continual maximal effort performance testing may not be practical for athletes, especially those in season. As such, regular testing of physical performance at submaximal levels or during regular practice or competition may be a better approach allowing for more frequent measurement such as in an athlete monitoring program [32, 50]. Regardless of the type of testing or the variable of interest, measurement must be completed if one is to evaluate an athlete’s level of preparedness [28, 38, 50].

Along with aiding practitioners in the evaluation of athlete readiness, athlete monitoring also helps in the evaluation of strength and conditioning programs [28]. Data from monitoring programs may substantiate or contradict a strength and conditioning program. This goes beyond simply evaluating a program based on a win/loss record for a season and provides objective data on the success of a particular program for a team or an individual athlete. Direct objective feedback on an athlete’s progression can be given to coaches and other decision makers [48]. This data can also be used to help coaches and practitioners make data driven decisions for program improvement at the team or individual level [28, 50, 54].

Understanding the demands of a particular sport is an ongoing venture in sport performance. The relationship between competition performance data and collected monitoring data will likely help answer some questions and potentially bring about new questions [28, 38]. Data from athlete monitoring may also aid in talent identification or the identification of variables that contribute to optimal performance of a particular sport or task [60].

Data from athlete monitoring programs may also help clarify some of the ambiguity of the training process from a dose response perspective at the individual athlete level [28, 38, 50, 54]. Not all athletes will respond to training the same way and there should be a focus on individual responses to training. While standard pre/post testing may explain the success of a program with a sufficient sample size, that model does not work with individual athletes. Fortunately, athlete monitoring utilizes more frequent data collection, providing many data points for each athlete [50, 54].

Monitoring for the purpose of understanding training can be broken down into two primary areas: (1) dosage or input and (2) response or output. All of the training sessions, practice sessions, competitions, and anything else that results in a reaction of the athlete can be considered dosage [54]. Much of this can be quantified, but there may be issues with differing units of intensity across the different types of dosage [17]. The response or output is often more difficult to quantify, but a change in performance, whether an improvement or a decrement, is a good signal of a response. It is important to note, that a performance decrease may not always be visible by a single marker such as amount of weight lifted and measures in different areas may be required to show the response [18, 19, 20]. It is also important to note that some short-term performance decrement due to fatigue should be expected during training, but that should be logically planned as part of a functional overreach [24].

While testing and monitoring variables of potentially enhanced performance is important, monitoring recovery is just as important for athletes. Each session, whether a training session, practice, or competition, can be considered a stimulus for adaptation. Recovery is necessary if adaptation or some form of supercompensation is to occur [31, 54, 68]. Even if adaptation has not occurred, at least returning to normal homeostatic levels of preparedness is desirable for athletes in season [30]. Recovery occurs when the amount of training stressors and stimuli is reduced. Unfortunately, practitioners may often forget, or not be able, to quantify all stressors and stimuli [31]. An athlete may have an issue in their social life that is reducing their sleep, or a student athlete may be sacrificing sleep to study for an exam. Each of these examples reduces the amount of sleep an athlete might get, and there are many other factors that might alter optimal recovery that sport performance coaches may not be aware of [30]. Athletes whose recovery is less than adequate for a given amount of stress and stimuli are likely to face consequences if this continues. Short-term underrecovery may result in fatigue and decreased motivation. Long-term underrecovery may result in performance decreases, overtraining, and athlete burnout [40].

There are many areas and variables that can potentially be monitored. Selection of measures to monitor should be done with caution, and they should be relevant. Practitioners should also remember that monitoring data serve as a representation of performance, not the actual performance itself [50]. Goodhart’s law states that “when a measure becomes a target, it ceases to be a good measure” [55]. This law should also be applied to athlete monitoring. Once an individual monitoring variable becomes the objective, it can no longer be considered an adequate monitoring variable. The objective of monitoring should be to predict some future performance, not to serve as an end in itself. This adage is borrowed from economics, but it seems useful for athlete monitoring as well.

While athlete monitoring is increasing in practice and research, it appears that much of the research has focused on specific areas to monitor and not as much on the statistical procedures involved [67]. This may lead to an accumulation of data, but no plan as to what to do with it [50]. A handful of papers have discussed some statistical techniques, but they do not focus much attention on data preparation and evaluation of the quality of data [12, 32, 39, 50]. The assumptions of normality, homogeneity of variance, and reliability of data collection methods can prove problematic if they are not evaluated. If violated, they can often render other statistical significance tests useless, so data screening prior to other analysis is necessary [64]. Even fewer articles discuss the return of the monitoring data to coaches and the visual display of the information [38, 39, 50]. The purpose of this review is to discuss many of the statistical issues and provide some best practice information. Where literature is lacking in our field, best practice information from other fields is borrowed and adapted.

## Statistical Concerns and Best Practices

### Regular Testing

### Single-Subject Designs

### Statistical and Practical Significance

Single-subject designs seem to be more appropriate for athlete monitoring as they focus on each athlete individually. A common statistical concern of single-subject designs is the sample size. A sample size of one is likely disastrous for group designs in terms of achieving statistical significance. Single-subject designs overcome this by utilizing repeated measures of the same subject [32, 50]. Statistical significance is similar between group statistics and single-subject designs in that they are both attempting to quantify the probability that some treatment will reliably produce the same result [41, 58].

Even though single-subject designs have a small sample (*n* = 1), tests of statistical and practical significance still exist, but repeated measures are necessary to establish statistical significance. Instead of using a large sample of measures from different athletes, a single-subject design will likely use numerous data points over a period of time from the same subject. Individual phases may then be identified, and measures of those phases can be compared. The extended celeration line (ECL), improvement rate difference (IRD), percentage of all nonoverlapping data (PAND), nonoverlap of all pairs (NAP), and Tau-U are examples of these procedures [36, 44, 62]. While all are different techniques, each essentially evaluates the amount of data in one phase that overlaps with the data in the other phase or phases. The selection of a particular technique will likely come down to the specific situation encountered. For example, during sport performance assessments, practitioners may be concerned with a potential learning effect. Increases in performance could be due to increases in physical preparedness or due to getting better at the test. The presence of a learning effect in data could lead to a misinterpretation. Furthermore, many of the phase comparison techniques assume that the baseline phase data are not trended. Fortunately, the ECL can control for linear trends and the Tau-U can control for non-linear trends [44]. If practitioners suspect a learning effect might be present, one of these options may be sought. Single-subject design phase comparison techniques are analogous to group means comparison techniques with larger samples. Practitioners looking for a single-subject design alternative to a regression or prediction technique should consider the Theil–Sen slope.
To the current author’s knowledge, the Theil–Sen slope has not been used in sport performance research, but it has been used in student academic progress monitoring in a similar manner [61]. It is worthwhile to note that these procedures may not be widely available in statistical software, as single case research does not make up a large portion of the market share. That being said, these procedures are easy to do by hand, and free web calculators using these methods have been created that will generate *P* values, confidence intervals, and effect sizes [62].
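As an illustration of one phase comparison technique and one robust trend estimate, the sketch below computes NAP by hand and a Theil–Sen slope with SciPy. The jump heights and phase lengths are hypothetical.

```python
import numpy as np
from scipy.stats import theilslopes

def nap(baseline, treatment):
    """Nonoverlap of All Pairs: proportion of (baseline, treatment)
    pairs in which the treatment value improves on the baseline value
    (ties count as half an overlap)."""
    pairs = [(b, t) for b in baseline for t in treatment]
    wins = sum(t > b for b, t in pairs)
    ties = sum(t == b for b, t in pairs)
    return (wins + 0.5 * ties) / len(pairs)

# hypothetical weekly countermovement-jump heights (cm) for one athlete
baseline = [38.1, 37.9, 38.4, 38.0]
treatment = [39.2, 39.8, 40.1, 39.5, 40.4]

print(f"NAP = {nap(baseline, treatment):.2f}")  # 1.00 -> complete nonoverlap

# Theil-Sen slope: median of all pairwise slopes across every session,
# a robust alternative to ordinary least-squares regression
sessions = np.arange(len(baseline) + len(treatment))
slope, intercept, lo, hi = theilslopes(baseline + treatment, sessions)
print(f"Theil-Sen slope = {slope:.2f} cm/session (95% CI {lo:.2f} to {hi:.2f})")
```

A NAP of 1.00 means every treatment-phase value exceeds every baseline value; values nearer 0.50 indicate heavy overlap between phases.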

It should be noted that statistical and practical significance are not the same. Statistical significance is generally expressed as a *P* value and provides information about the reliability of a finding, or the probability that the finding is due to chance alone [41, 58]. Practical significance, often referred to as meaningfulness, is generally reported via some type of effect size estimate [14]. Much of scientific writing and publishing depends heavily on *P* values, but there is a movement to rely less on them [1]. The justification is that findings are generally accepted as “significant” if a *P* value of less than 0.05 is achieved, but *P* values do not indicate the size of the effect. They are also heavily influenced by sample size, so much so that a small effect with a large enough sample may produce a statistically significant *P* value. This may lead practitioners to make misinformed decisions. This has become such an issue in larger fields of science that over 800 researchers recently signed a comment in *Nature* calling for statistical significance to be retired [1].

Specifically concerning athlete development, measures of practical significance may be of primary concern. In order to enhance performance, coaches, athletes, and scientists alike are mainly concerned with meaningful change [14, 32]. Furthermore, even if a group statistical method is chosen, sample sizes in sport performance are generally dictated by the size of the team one is working with. Small samples more often than not result in a low likelihood of achieving statistical significance even if meaningful change has occurred [14]. That being said, this does not necessarily mean that the reliability of a finding should be ignored and only the magnitude of difference or relatedness should be considered. Whenever possible, both *P* values and effect size estimates should be reported [14].

### Pre-analysis Data Screening

The current environment in sport performance and sport science provides ample opportunity to utilize “Big Data” techniques within athlete development programs. This is especially true in elite sports, where ample data are available publicly [29, 34, 52]. These data can be explored and manipulated to evaluate relationships and produce predictive models. This information can then be used to make more informed decisions about player development. The data collected in strength and conditioning and sport science programs can often be used in the same way, as an abundance of data is collected through monitoring programs [3, 9, 11, 56]. Unfortunately, not all data and data collection instruments are useful. As such, there are some concerns that need to be addressed and considered prior to including any data in the decision-making process.

### Reliability

Testing and monitoring the development of an athlete’s bio-motor abilities is vital to determine the progress, maintenance, or regression associated with training. There are numerous instruments and methods of assessing performance [23, 33, 42]. Sport scientists and strength coaches should be concerned with the reliability and validity of these measurement devices and protocols. Reliability concerns the consistency of results of multiple tests while validity concerns the similarity between the measured value and the actual value [2, 27, 65]. While both can be largely affected by the quality of instrumentation, reliability is also affected by the subject and test protocol. Thus, the standardization of testing protocol is an essential component of reliability.

Although investigations of reliability in sport and exercise science are relatively common, the methods of reliability assessment may be quite diverse. Methods include intraclass correlation coefficients (ICC), coefficient of variation (CV), standard error of measurement (SEM), and limits of agreement (LOA) between trials. ICCs provide a relative value of reliability, reflecting the degree to which subjects maintain their rank in the distribution of scores across repeated measures or trials. Investigators should avoid applying ICC values from previous research to current investigations, as ICCs are specific to the sample tested. The range of data is not accounted for in ICC assessment. Therefore, the range of measures could change dramatically while each subject maintains their rank in the sample, still resulting in a high ICC. This could mislead the investigator about the reliability of the measure if the ICC is the only measure used [2].

CV, SEM, and LOA are absolute measures of reliability, in that they measure the level of variability of repeated measures. Formulas for calculating CV, SEM, and LOA include the standard deviation (SD), which somewhat illustrates the disparity between subjects. CV is commonly derived from the SD with the formula \(CV = SD/mean \times 100\). SEM and LOA can also be derived from the SD (\(SEM = SD \times \sqrt {1 - ICC}\) and \(LOA = 1.96 \times \sqrt{2} \times SD\)) [2]. Previous authors have justified the usefulness of CV, SEM, and LOA for the exercise or sport scientist as they provide implications for measurement precision and an improved ability to infer results to other samples [2, 65].
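A minimal sketch of the formulas above, assuming two repeated trials, an externally supplied ICC (e.g. from a dedicated reliability package), and the SD taken across all trial scores:

```python
import numpy as np

def reliability_stats(trial1, trial2, icc):
    """Absolute reliability measures from two repeated trials,
    following the formulas in the text. The ICC is assumed to have
    been computed separately."""
    all_scores = np.concatenate([trial1, trial2])
    sd = all_scores.std(ddof=1)
    cv = sd / all_scores.mean() * 100        # CV = SD / mean * 100
    sem = sd * np.sqrt(1 - icc)              # SEM = SD * sqrt(1 - ICC)
    loa = 1.96 * np.sqrt(2) * sd             # LOA = 1.96 * sqrt(2) * SD
    return cv, sem, loa

# hypothetical test-retest scores for three athletes
cv, sem, loa = reliability_stats([10.0, 12.0, 14.0], [11.0, 12.0, 13.0], icc=0.90)
print(f"CV = {cv:.1f}%, SEM = {sem:.2f}, LOA = +/-{loa:.2f}")
```

The CV is unitless (a percentage of the mean), while SEM and LOA remain in the units of the original measure, which is worth keeping in mind when reporting them to coaches.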

CV, SEM, and LOA are not equal measures of absolute reliability, and distinguishing the appropriate measure is critical. SEM and LOA both assume the data are homoscedastic, meaning that every data point has the same chance of variance regardless of magnitude. CV, on the other hand, assumes the data are heteroscedastic, where the chance of variance depends on measure magnitude. Thus, if heteroscedasticity is present, a CV may be more useful. Heteroscedasticity is commonly present in sport science data, but one should conduct a test of hetero/homoscedasticity prior to applying a measure of reliability to ensure the appropriate measure is used [2]. Therefore, the use of ICC is still recommended when appropriate, and the proper usage of a measure of absolute reliability should also be considered [2, 65].

### Validity

Validity is the similarity between the measured value and the actual value. A measure must first be reliable before it can be considered valid; in fact, validity depends on reliability along with relevance [41, 58]. As such, it is possible for a measure to be reliable but not valid if the measure is not relevant to its objectives. For example, a measure can be incorrect but consistently incorrect, leading to an acceptable level of reliability despite poor validity.

There are many different types of validity. Logical, ecological, and criterion validity are likely the ones most relevant to athlete monitoring. Logical or face validity refers to the way a test looks on the surface: it should logically measure what it claims to. This is quite important for coaches and athletes alike, as they may not fully participate in or support a test that does not show immediate perceived value [38, 58]. Ecological validity is concerned with the application of the findings to actual competition scenarios. Ecological validity is very important in athlete monitoring, as the application of the findings is desired in a very short period of time [38]. Criterion validity utilizes scores on some criterion measure to establish either concurrent or predictive validity. Concerning the data collection and instrumentation part of athlete monitoring, concurrent validity can be established by examining the measures obtained via a specific method along with those simultaneously measured by a previously validated “gold standard” device [58]. For example, the force plate might be considered the criterion measure for analyzing jump performance, as many variables can be attained from force–time data collected at high sampling frequencies. But, depending on software, force–time curve analysis may take a significant amount of time, and there may be difficulties with portability. A switch mat may be a more practical way to measure jump performance, but it should be validated against a force plate, as has been done in research [10]. Predictive validity is concerned with the predictive value of the data obtained with a test. Concerning athlete monitoring, this refers to the ability to predict future sport performance. Sport scientists may be concerned with the predictive validity of a single measure or a combination of measures in a model [38].

Concurrent validity is often evaluated with a Pearson product-moment (PPM) correlation and reported as a single *r* value. This can prove problematic if it is the only measure of validity. Consider the two data sets in Table 1 of theoretical jumping peak power values. Running a PPM correlation yields a perfect *r* value of 1.0, but these data are not the same. A paired samples *t* test reveals a *P* value of < 0.001, and a Cohen’s *d* effect size estimate reveals a value of 1.43, indicating that these data are both statistically and practically different. While these circumstances may be unlikely, it is possible for two measurement devices to be highly correlated yet statistically and practically different, as has been seen previously in research [4, 5]. As a result, statistical validation should include multiple methods, such as a PPM correlation and some form of means comparison (ANOVA, *t* test, limits of agreement, etc.) [6, 41].

**Table 1** Two data sets of countermovement jump peak power (PP) represented in watts (W)

| PP1 | PP2 |
|---|---|
| 2917.97 | 4376.95 |
| 3000.35 | 4500.52 |
| 3397.00 | 5095.50 |
| 3349.39 | 5024.09 |
| 3755.57 | 5633.36 |
| 3530.48 | 5295.72 |
| 3227.84 | 4916.76 |
| 4077.68 | 6116.52 |
| 3206.73 | 4810.10 |
| 3774.15 | 5661.23 |
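The point is easy to verify with the Table 1 data: the two columns are (near-)perfectly correlated, yet a means comparison flags them as clearly different. A sketch using SciPy:

```python
import numpy as np
from scipy.stats import pearsonr, ttest_rel

# the two theoretical peak power (W) data sets from Table 1
pp1 = np.array([2917.97, 3000.35, 3397.00, 3349.39, 3755.57,
                3530.48, 3227.84, 4077.68, 3206.73, 3774.15])
pp2 = np.array([4376.95, 4500.52, 5095.50, 5024.09, 5633.36,
                5295.72, 4916.76, 6116.52, 4810.10, 5661.23])

r, _ = pearsonr(pp1, pp2)   # correlation alone: looks perfect
t, p = ttest_rel(pp1, pp2)  # means comparison: clearly different
print(f"r = {r:.3f}, paired t-test P = {p:.2e}")
```

The correlation is essentially 1.0 while the paired *t* test returns a vanishingly small *P* value, so relying on *r* alone would wrongly suggest the two devices agree.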

### Heteroscedasticity and Measurement Error

It is important to remember that all measurement contains some error, even with the most valid methods. Theoretically, the observed value can be considered the sum of the true value and the error value (observed value = true value + error value). The true value is what one should strive to measure, but it is not actually attainable [41, 58]. There are many sources of potential error, and some, such as methodology- and instrumentation-based error, have been mentioned already. One source of potential error that is often neglected is the magnitude of the measure itself. If athletes who produce extreme values (very high or very low) have a greater chance to vary or produce error, the data are described as heteroscedastic. It is generally desired that measures be homoscedastic, meaning everyone has the same chance for measure variance regardless of measure magnitude [2, 27, 65]. Many statistical tests carry the assumption of homoscedasticity. Along with the SEM mentioned above, linear and nonlinear models also assume homoscedasticity [57]. Heteroscedasticity increases the rate of type I errors, rendering the results of statistical significance tests invalid [64].

Heteroscedasticity can be particularly troubling when dealing with sports performance data where extreme values may be seen on a regular basis. If data are heteroscedastic, practitioners should not be very confident in the reliability or validity of their data or their predictive models. As such, evaluation of heteroscedasticity needs to be completed [2, 27, 64, 65]. This can be evaluated in terms of reliability or validity. Either way, the difference (between trials for reliability or between the measured value and the value of the criterion measure for validity) is compared to the means. If the variance is uniform regardless of the means, it is considered homoscedastic. If there is a trend where more variance occurs at either end, it is considered heteroscedastic [2].

Visually, this can be inspected with a plot of the means on the *x* axis and the between-trial differences on the *y* axis. If there appears to be a relationship between residual size, or variance, and measure magnitude, this would indicate the presence of heteroscedasticity. The visual inspection should be aided by statistical inspection whenever possible. Statistically, this can be evaluated with Levene’s test or the Breusch–Pagan test [8, 37]. Both test the null hypothesis that the data are homoscedastic, so a *P* value of less than 0.05 indicates heteroscedasticity. These tests are routinely included in statistical software and programming languages. A less formal statistical check is to run a correlation between the residuals and the means and interpret the *r* value: heteroscedasticity is indicated if a relationship is present; if not, the data are likely homoscedastic [35].
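The informal residual-versus-mean check can be sketched as follows, using simulated test-retest data whose error grows with magnitude, i.e. heteroscedastic by construction (all values are hypothetical):

```python
import numpy as np
from scipy.stats import pearsonr

# simulate test-retest power outputs (W) where measurement error is
# proportional to the athlete's true score -> heteroscedastic data
rng = np.random.default_rng(42)
true_scores = rng.uniform(800, 2200, 200)
trial1 = true_scores + rng.normal(0, 0.03 * true_scores)
trial2 = true_scores + rng.normal(0, 0.03 * true_scores)

means = (trial1 + trial2) / 2
abs_diffs = np.abs(trial1 - trial2)

# informal check from the text: correlate absolute between-trial
# differences with trial means; a significant positive r suggests
# heteroscedasticity
r, p = pearsonr(means, abs_diffs)
print(f"r = {r:.2f}, P = {p:.4f}")
```

With homoscedastic data the same check should return an *r* near zero; here the proportional error produces a clear positive relationship.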

While the assumption of homoscedasticity is not often evaluated in research, it is likely that much of sport performance data violates that assumption [2]. This seems particularly true in accelerometers and inertial measurement units where several studies have presented heteroscedastic data [3, 13, 42, 45]. Given the increase in usage of wireless sensors, some attention and concern should be given to this area and practitioners should evaluate their own data for the presence of heteroscedasticity.

## Data Return and Visualization

Much of the data collected, the indicators of performance, and the opportunity to adjust a training program based on them are time sensitive [7, 50]. As a result, it is imperative that collected data be utilized and returned to decision makers rapidly. Practitioners will therefore be aided by software that can analyze and represent data quickly. Many software programs perform data analysis and produce data visualizations and dashboards (single-screen presentations of the most meaningful information) quickly [16], but not all organizations and institutions will be able to afford such programs. As a result, Microsoft Excel may be the solution of choice for many, but it is limited in its ability to complete complex data analysis, and creating numerous data visualizations for each athlete and team results in time-consuming, redundant work. Free, open-source programming languages such as R and Python may be a solution, but users must learn the syntax of the languages and additional packages (R Core Team [46]; van Rossum [63]). R and Python offer more freedom in analysis and data visualization than other programs, but the initial coding may be time consuming. The ability to loop, or iterate, over a sequence of data instead of repeatedly completing similar tasks for every athlete or test greatly reduces the time required [66]. This will save time over many of the “out of the box” software programs in the long run, but considerable time must be invested early on.
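The idea of iterating instead of rebuilding each report by hand can be sketched in a few lines. The athlete names and jump heights below are hypothetical, and the same loop could just as easily save a chart per athlete:

```python
# per-athlete monitoring data: weekly jump heights (cm), hypothetical
data = {
    "athlete_01": [38.1, 38.4, 39.0, 39.6],
    "athlete_02": [41.2, 40.8, 41.5, 42.0],
    "athlete_03": [35.6, 36.0, 35.8, 36.4],
}

# one pass produces every athlete's summary instead of copying a
# spreadsheet template for each person
summaries = {}
for athlete, jumps in data.items():
    summaries[athlete] = {
        "latest": jumps[-1],
        "change": round(jumps[-1] - jumps[0], 1),
    }

for athlete, s in summaries.items():
    print(f"{athlete}: latest {s['latest']:.1f} cm, change {s['change']:+.1f} cm")
```

Adding a new athlete then only means adding a row of data; the reporting code itself never changes.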

Regardless of the method chosen, the data given back to coaches or other decision makers will likely come as a collection of charts, plots, or dashboards for each athlete. There are many factors that should be considered when returning data in the form of a visualization. In Edward Tufte’s landmark work on the subject, he promotes several tenets of graphical excellence and best practices. The main ones discussed in this paper are that practitioners should represent as much data as possible in as little space as possible, that graphics should not distort what the data say, and that graphics should be clear in their purpose [59].

### Efficient Data Display

Fundamentally, a radar plot is just a line graph with multiple data series that have been formed into a round shape [16]. One potential concern with the radar plot is that if one is using different measurement scales [e.g. peak force in Newtons (4982 N) and jump height in meters (0.51 m)], data will have to be normalized; otherwise the smaller numbers will not be visible on the shared axis when plotted. The most common way to do this is with the *z* score or *t* score [38, 41]. Most statistical software has formulas built in for each, but they are easy to calculate by hand if not. The decision about which standardized score to use is generally based upon sample size. Both formulas use the standard deviation, but the *z* score is supposed to use the standard deviation of the population that it is representing, not necessarily the standard deviation of the sample being tested. Thus, the general recommendation is to use *z* scores with sample sizes greater than or equal to 30 and *t* scores with smaller samples [41]. That being said, one could argue that a team of 22 athletes is the population as well as the sample, so a *z* score may still be appropriate. A second concern is the desired magnitude direction. For example, Fig. 5 displays baseball monitoring data. For several of the variables it is desirable that the data points be further away from the center [jump height (JH), rate of force development (RFD), peak power (PP)]. There may be other variables where a smaller value is desired, such as the time it takes to reach first base. This may lead to some confusion if not explained well. The final concern is that once data are converted into standardized scores, the units are no longer necessary and magnitudes may be difficult to interpret. All of these concerns should be considered and addressed, but if the graphic causes too much confusion, then it may be time to simplify [16, 59].
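A minimal sketch of standardizing mixed-unit variables for a shared radar-plot axis, including a sign flip for “lower is better” measures; the values and the helper name are hypothetical:

```python
import numpy as np

def z_scores(values, lower_is_better=False):
    """Standardize a variable so mixed units can share one radar-plot
    axis; flip the sign when smaller raw values are better (e.g. the
    time to reach first base)."""
    v = np.asarray(values, dtype=float)
    z = (v - v.mean()) / v.std(ddof=1)
    return -z if lower_is_better else z

# hypothetical squad data in very different units
peak_force_n = [4620.0, 4982.0, 5110.0, 4755.0, 4890.0]  # Newtons
first_base_s = [4.45, 4.31, 4.52, 4.38, 4.29]            # seconds

print(np.round(z_scores(peak_force_n), 2))
print(np.round(z_scores(first_base_s, lower_is_better=True), 2))
```

After the sign flip, "further from the center" means "better" for every axis of the radar plot, which removes the magnitude-direction confusion described above.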

### Misrepresenting the Data

Unfortunately, data visualizations can be misleading. For the purposes of athlete monitoring, misleading the viewer is likely an accident, but it can lead to making an incorrect decision. It is important to follow some best practices or guidelines when producing visualizations so this can be avoided.

One potential way for this to happen in athlete monitoring is by not displaying all the data or by not collecting enough data. If only preseason and postseason data are displayed, one might be misled about what happened along the way, or some effect might seem magnified, as was the case in Fig. 1. Assuming regular testing is occurring, enough data should be available to produce time-series plots that are easily readable for viewers [50].

Manipulation of the *y* axis may be the most common issue in data visualizations. For example, in Fig. 6 the same athlete’s countermovement jump data are used to create both plots. The plot on the left looks highly variable and seems to show a dramatic increase after the first two measurements for those not paying attention to the *y* axis tick marks. The magnitude of difference is misleading here. Standardizing the *y* axis helps avoid this mistake. Fixing the *y* axis at zero illustrates the difference in Fig. 6, and starting the *y* axis at zero is good practice for all plots as a result [16, 53, 59].

Tufte quantifies this distortion with a Lie Factor: the size of the effect shown in the graphic divided by the size of the effect in the data. Data visualizations can distort effects in both directions, so a Lie Factor can be above or below 1.0. According to Tufte, anything outside the range of 0.95–1.05 represents substantial distortion; thus the example shown in Fig. 6 is substantially distorted.
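The Lie Factor can be computed directly; a minimal sketch in which the jump heights and the drawn bar lengths are hypothetical:

```python
def lie_factor(data_v1, data_v2, graphic_len1, graphic_len2):
    """Tufte's Lie Factor: the relative change shown in the graphic
    divided by the relative change in the data (1.0 = no distortion)."""
    data_effect = abs(data_v2 - data_v1) / abs(data_v1)
    graphic_effect = abs(graphic_len2 - graphic_len1) / abs(graphic_len1)
    return graphic_effect / data_effect

# hypothetical: a jump improves from 50 to 53 cm (a 6% change), but a
# truncated y axis starting at 49 draws the bars 1 and 4 units tall
# (a 300% change in drawn length)
print(round(lie_factor(50, 53, 1, 4), 1))  # 50.0
```

A truncated axis that turns a 6% improvement into a 300% visual change yields a Lie Factor of 50, far outside the 0.95–1.05 range.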

Humans also struggle to judge angles and areas accurately, and when a pie slice is not anchored to the *x* or *y* axis, perception is weakened further. Looking at the pie chart in Fig. 7, it may be relatively easy to determine that Catchers (C) represent 25% of the data because both sides of the slice lie on the *x* and *y* axes. Turning your head or the image slightly increases the difficulty of determining its value [51]. Determining the value of any of the other positions is likely much more difficult. Bar charts are easier to interpret and can illustrate the same information, leading to the recommendation to replace pie charts with bar charts or a simple table whenever possible [15]. Speedometer or gauge plots are popular in performance-based dashboards, but they are fundamentally just a different version of a pie chart, as they represent fractional components. They are often more complicated to produce and perform extremely poorly on Tufte’s data-ink ratio, as they represent only one value (e.g. 85% of total) [16, 59]. While some plots may appear elegant or visually appealing, if they offer little information relative to the amount of ink required to create the graphic, space and time are not being used efficiently. Finally, some attention should be paid to the choice of color palette. The ‘viridis’ color palette (available in R and Python, and used in Fig. 7) is accessible to those with different types of colorblindness, so it will be clear to most who view graphics with it [21].

## Conclusion

Data collection during strength and conditioning and sport performance is on the rise and its use in athlete monitoring is also increasing. While the usage of this data for purposes of creating more informed training programs and potential performance prediction is promising, there are some statistical concerns that should be addressed by those who use this data. At minimum, analyses of reliability and the assumption of homoscedasticity should be evaluated. This should be done by all practitioners with their data, not relying on published findings of other samples. If possible, concurrent validity of devices should also be evaluated. Following any analysis, the data return process should not be overlooked. Data should be visualized in a simple and clear manner that does not result in distortion. This will likely result in an efficient decision making process and more informed athlete development programs.

## References

- 1. Amrhein V, Greenland S, McShane B. Retire statistical significance. Nature. 2019;567(7748):305–7.
- 2. Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med. 1998;26(4):217–38.
- 3. Bailey C, McInnis T, Batcher J. Bat swing mechanical analysis with an inertial measurement unit: reliability and implications for athlete monitoring. J Trainol. 2016;5(2):43–5.
- 4. Bampouras T, Relph N, Orne D, Esformes J. Validity and reliability of the Myotest Pro wireless accelerometer. Br J Sports Med. 2010;44(14):i20.
- 5. Batcher J, Nilson K, North T, Brown D, Raszeja N, Bailey C. Validity of jump performance measures assessed with field-based devices and implications for athlete monitoring. J Strength Cond Res. 2017;31(p):s82–162.
- 6. Bland J, Altman D. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307–10.
- 7. Bosco C, Colli R, Bonomi R, von Duvillard SP, Viru A. Monitoring strength training: neuromuscular and hormonal profile. Med Sci Sports Exerc. 2000;32(1):202–8.
- 8. Breusch T, Pagan A. A simple test for heteroscedasticity and random coefficient variation. Econometrica. 1979;47(5):1287–94.
- 9. Bricker J, Bailey C, Driggers A, McInnis T, Alami A. A new method for the evaluation and prediction of base stealing performance. J Strength Cond Res. 2016;30(11):3044–50.
- 10. Buckthorpe M, Morris J, Folland J. Validity of vertical jump measurement devices. J Sports Sci. 2012;30(1):63–9.
- 11. Camp C, Tubbs T, Fleisig G, Dines J, Dines D, Altchek D, Dowling B. The relationship of throwing arm mechanics and elbow varus torque: within-subject variation for professional baseball pitchers across 82,000 throws. Am J Sports Med. 2017;45(13):3030–5.
- 12. Clubb J, McGuigan M. Developing cost-effective, evidence-based load monitoring systems in strength and conditioning practice. Strength Cond J. 2018;40(6):7–14.
- 13. Driggers A, Bingham G, Bailey C. The relationship of throwing arm mechanics and elbow varus torque: letter to the editor. Am J Sports Med. 2018;47(1):1–5.
- 14. Ellis P. The essential guide to effect sizes: statistical power, meta-analysis, and the interpretation of research results. 1st ed. Cambridge: Cambridge University Press; 2010.
- 15. Few S. Save the pies for dessert. Visual Business Intelligence Newsletter; 2007.
- 16. Few S. Information dashboard design: displaying data for at-a-glance monitoring. 2nd ed. Burlingame: Analytics Press; 2013.
- 17. Foster C. Monitoring training in athletes with reference to overtraining syndrome. Med Sci Sports Exerc. 1998;30(7):1164–8.
- 18. Fry A, Kraemer W, van Borselen F, Lynch J, Marsit J, Roy E, Knuttgen H. Performance decrements with high-intensity resistance exercise overtraining. Med Sci Sports Exerc. 1994;26(9):1165–73.
- 19. Fry A, Kraemer W, Lynch J, Triplett NT, Koziris L. Does short-term near-maximal intensity machine resistance exercise induce overtraining? J Strength Cond Res. 1994;8(3):75–81.
- 20. Fry A, Webber J, Weiss L, Fry M, Li Y. Impaired performance with excessive high-intensity free-weight training. J Strength Cond Res. 2000;14(1):54–61.
- 21. Garnier S. viridis: default color maps from 'matplotlib'. 2018. https://CRAN.R-project.org/package=viridis. Accessed 9 Jul 2019.
- 22. Gonzalez-Badillo J, Gorostiaga E, Arellana R, Izquierdo M. Moderate resistance training volume produces more favorable strength gains than high or low volumes during a short-term training cycle. J Strength Cond Res. 2005;19(3):689–97.
- 23. Haff G, Carlock J, Hartman M, Kilgore J, Kawamori N, Jackson J, Stone M. Force-time curve characteristics of dynamic and isometric muscle actions of elite women Olympic weightlifters. J Strength Cond Res. 2005;19(4):741–8.
- 24. Halson S, Jeukendrup A. Does overtraining exist? An analysis of overreaching and overtraining research. Sports Med. 2004;34(14):967–81.
- 25. Hickey J, Shield A, Williams M, Opar D. The financial cost of hamstring strain injuries in the Australian Football League. Br J Sports Med. 2014;48(8):729–30.
- 26. Hoffman J, Kaminsky M. Use of performance testing for monitoring overtraining in youth basketball players. Strength Cond J. 2000;22(6):54–62.
- 27. Hopkins W. Measures of reliability in sports medicine and science. Sports Med. 2000;30(1):1–15.
- 28. Joyce D, Lewindon D. High-performance training for sports. 1st ed. Champaign: Human Kinetics; 2014.
- 29. Kagan D. The anatomy of a pitch: doing physics with PITCHf/x data. Phys Teach. 2009;47(7):412.
- 30. Kellmann M. Enhancing recovery: preventing underperformance in athletes. 1st ed. Champaign: Human Kinetics; 2002.
- 31. Kellmann M, Beckmann J. Sport, recovery, and performance: interdisciplinary insights. 1st ed. New York: Routledge; 2018.
- 32. Kinugasa T, Cerin E, Hooper S. Single-subject research designs and data analyses for assessing elite athletes' conditioning. Sports Med. 2004;34(15):1035–50.
- 33. Krustrup P, Mohr M, Nybo L, Jensen J, Nielsen N, Bangsbo J. The Yo-Yo IR2 test: physiological response, reliability, and application to elite soccer. Med Sci Sports Exerc. 2006;38(9):1666–73.
- 34. Lage M, Ono J, Cervone D, Chiang J, Dietrich C, Silva C. StatCast dashboard: exploration of spatiotemporal baseball data. IEEE Comput Graph Appl. 2016;36(5):28–37.
- 35. Lani J. Heteroscedasticity. 2019. https://www.statisticssolutions.com/heteroscedasticity/. Accessed 9 Jul 2019.
- 36. Lee J, Cherney L. Tau-U: a quantitative approach for analysis of single-case experimental data in aphasia. Am J Speech Lang Pathol. 2018;27(1S):495–503.
- 37. Levene H. Robust tests for equality of variances. In: Olkin I, editor. Contributions to probability and statistics: essays in honor of Harold Hotelling. Palo Alto: Stanford University Press; 1960. p. 278–92.
- 38. McGuigan M. Monitoring training and performance in athletes. 1st ed. Champaign: Human Kinetics; 2017.
- 39. McGuigan M, Cormack S, Gill N. Strength and power profiling of athletes: selecting tests and how to use information for program design. Strength Cond J. 2013;35(6):7–14.
- 40. Meeusen R, Duclos M, Foster C, Fry A, Gleeson M, Nieman D, Urhausen A. Prevention, diagnosis and treatment of the overtraining syndrome: joint consensus statement of the European College of Sport Science (ECSS) and the American College of Sports Medicine (ACSM). Med Sci Sports Exerc. 2013;45(1):186–205.
- 41. Morrow J, Mood D, Disch J, Kang M. Measurement and evaluation in human performance. 5th ed. Champaign: Human Kinetics; 2016.
- 42. Nuzzo J, Anning J, Scharfenberg J. The reliability of three devices used for measuring vertical jump height. J Strength Cond Res. 2011;25(9):2580–90.
- 43. Ozturk S, Kilic D. What is the economic burden of sports injuries? Jt Dis Relat Surg. 2013;24(2):108–11.
- 44. Parker R, Vannest K, Davis J. Effect size in single-case research: a review of nine nonoverlap techniques. Behav Modif. 2011;35(4):303–22.
- 45. Perez-Castilla A, Piepoli A, Delgado-Garcia G, Garrido-Blanca G, Garcia-Ramos A. Reliability and concurrent validity of seven commercially available devices for the assessment of movement velocity at different intensities during the bench press. J Strength Cond Res. 2019;33(5):1258–65.
- 46. R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2017. https://www.R-project.org/. Accessed 9 Jul 2019.
- 47. Rose T. The end of average. 1st ed. New York: HarperOne; 2016.
- 48. Sands W. Monitoring the elite female gymnast. Natl Strength Cond Assoc J. 1991;13(4):66–72.
- 49. Sands W, Stone M. Monitoring the elite athlete. Olymp Coach. 2005;17(3):4–12.
- 50. Sands W, Kavanaugh A, Murray S, McNeal J, Jemni M. Modern techniques and technologies applied to training and performance monitoring. Int J Sports Physiol Perform. 2017;12(Suppl 2):S263–72.
- 51. Schwabish J. An economist's guide to visualizing data. J Econ Perspect. 2014;28(1):209–34.
- 52. Sikka R, Baer M, Raja A, Stuart M, Tompkins M. Analytics in sports medicine: implications and responsibilities that accompany the era of big data. J Bone Jt Surg. 2019;101(3):276–83.
- 53. Smith M. Conversations with data #31: bad charts. 2019. https://datajournalism.com/read/newsletters/bad-charts. Accessed 21 Jul 2019.
- 54. Stone M, Stone M, Sands W. Principles and practice of resistance training. 1st ed. Champaign: Human Kinetics; 2007.
- 55. Strathern M. 'Improving ratings': audit in the British University system. Eur Rev. 1997;5(3):305–21.
- 56. Suchomel T, Bailey C. Monitoring and managing fatigue in baseball players. Strength Cond J. 2014;36(6):39–45.
- 57. Tabachnick B, Fidell L. Using multivariate statistics. 5th ed. Boston: Pearson; 2015.
- 58. Thomas J, Nelson J, Silverman S. Research methods in physical activity. 7th ed. Champaign: Human Kinetics; 2015.
- 59. Tufte ER. The visual display of quantitative information. Cheshire: Graphics Press; 2001.
- 60. Vaeyens R, Lenoir M, Williams A, Philippaerts R. Talent identification and development programmes in sport: current models and future directions. Sports Med. 2008;38(9):703–14.
- 61. Vannest K, Parker R, Davis J, Soares D, Smith S. The Theil-Sen slope for high-stakes decisions from progress monitoring. Behav Disord. 2012;37(4):271–80.
- 62. Vannest K, Parker R, Gonen O, Adiguzel T. Single case research: web-based calculators for SCR analysis. 2016. http://www.singlecaseresearch.org/. Accessed 5 Jul 2019.
- 63. Van Rossum G. Python tutorial. Technical Report CS-R9526. Amsterdam: Centrum voor Wiskunde en Informatica (CWI); 1995. http://www.python.org/. Accessed 9 Jul 2019.
- 64. Vincent W, Weir J. Statistics in kinesiology. 5th ed. Champaign: Human Kinetics; 2012.
- 65. Weir J. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. 2005;19(1):231–40.
- 66. Wickham H, Grolemund G. R for data science. 1st ed. Sebastopol: O'Reilly Media; 2017.
- 67. Wing C. Monitoring athlete load: data collection methods and practical recommendations. Strength Cond J. 2018;40(4):26–39.
- 68. Zatsiorsky V, Kraemer W. Science and practice of strength training. 2nd ed. Champaign: Human Kinetics; 1995.