Individualization of osteoporosis risk
Anti-fracture therapy is often pursued in individuals with osteoporosis or low bone mineral density with or without a prior fracture [1, 2, 3], because they are at high risk of subsequent fracture [4, 5]. In recent years, there has been a proliferation of clinical prediction rules, including clinical risk indices and scores, for identifying individuals with osteoporosis. The main problem with these clinical prediction rules is that they categorize individuals into low-risk versus high-risk groups based on some arbitrary threshold. As a result, when this risk stratification-based approach is applied to an individual, its prognostic performance is often poor.
Among the clinical indices for predicting osteoporosis, the Osteoporosis Self-Assessment Tool (OST) is perhaps the simplest score and has perhaps been the most widely studied. However, the clinical usefulness of this score in different populations has not been systematically studied. In this issue, Rud and colleagues [7] systematically summarize some key prognostic properties of the OST score across different Asian and Caucasian populations. They focus in particular on the prognostic value of OST in ruling out osteoporosis (i.e., the negative likelihood ratio, LR−). Their meta-analysis showed that the overall LR− was 0.19 (95% confidence interval: 0.17 to 0.21), with considerable heterogeneity between populations.
Summary of diagnostic values for Asians and Caucasians
Measure | Asians | Caucasians
Prevalence of osteoporosis (%) | [missing] | [missing]
Sensitivity | 0.90 (0.88, 0.92) | 0.94 (0.93, 0.95)
Specificity | 0.57 (0.56, 0.58) | 0.37 (0.36, 0.37)
Likelihood ratio positive (LR+) | 2.06 (1.70, 2.51) | 1.66 (1.51, 1.82)
Likelihood ratio negative (LR−) | 0.19 (0.13, 0.28) | 0.19 (0.17, 0.21)
Diagnostic odds ratio (DOR) | 11.2 (7.3, 17.1) | 7.83 (7.08, 8.67)
Summary AUC (mean, SE) | [missing] | [missing]
Several factors contribute to the relatively poor performance of OST. First, neither the prevalence of osteoporosis nor the relationship between age, weight and BMD is constant across the populations under review. For example, among the Asian populations under review, the prevalence of osteoporosis varied from 7% to 21%, and the LR− ranged from 0.02 to 0.49. Indeed, a regression analysis of log DOR against the measure of threshold among the Asian populations yielded a slope of 0.41 (standard error = 0.14, p = 0.014), suggesting that the performance of OST varies with threshold across populations.
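To see why prevalence matters when a negative OST result is used to rule out osteoporosis, the LR− can be converted into a post-test probability by way of odds. The sketch below is illustrative only: the function name is hypothetical, and it simply applies the pooled LR− of 0.19 to the two ends of the 7% to 21% prevalence range quoted above.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability into a post-test probability via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


LR_NEGATIVE = 0.19  # pooled LR- quoted in the text

# Prevalence range quoted for the Asian populations under review.
for prevalence in (0.07, 0.21):
    p = post_test_probability(prevalence, LR_NEGATIVE)
    print(f"prevalence {prevalence:.0%}: probability of osteoporosis "
          f"after a negative OST result = {p:.1%}")
```

With the same LR−, the residual probability of osteoporosis after a negative result is about 1.4% at a prevalence of 7% and about 4.8% at a prevalence of 21%, so the reassurance a "low risk" result provides depends on the population in which the score is applied.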
Second, the correlation between age or weight and BMD is moderate, and when these variables are dichotomized there is a loss of information. In the Dubbo Osteoporosis Epidemiology Study, for example, age and body weight, measured on their continuous scales, collectively account for 36% of the total variance in femoral neck BMD. When BMD is dichotomized into osteoporosis or non-osteoporosis, and this dichotomous variable is modeled as a function of continuous age and weight in a logistic regression model, the pseudo-R² falls to 24%. Furthermore, when the dichotomous variable is modeled as a function of OST category (i.e., “low risk” versus “high risk”) in the logistic regression model, the pseudo-R² falls further to 13%, a reduction of 23 percentage points from the R² of the original continuous model. Therefore, the dichotomization of OST scores into “low risk” and “high risk” categories can result in a considerable loss of information. With such a double loss of information (from the dichotomization of both BMD and OST scores), the discriminative power of the OST score is expected to be low.
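The loss of information from dichotomization can be reproduced on synthetic data. The sketch below does not use the Dubbo data: it draws a standard-normal predictor and an outcome correlated with it at 0.6 (so that the continuous R² is about 0.36), and then dichotomizes first the outcome and then also the predictor. The cut points, the sample size, the seed and the use of the squared Pearson correlation as a simple stand-in for the R² and pseudo-R² values are all assumptions made for illustration.

```python
import math
import random


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


random.seed(1)
N = 20000
RHO = 0.6  # assumed correlation, so continuous R^2 is about 0.36

predictor = [random.gauss(0.0, 1.0) for _ in range(N)]
outcome = [RHO * p + math.sqrt(1.0 - RHO ** 2) * random.gauss(0.0, 1.0)
           for p in predictor]

# Dichotomize the outcome (arbitrary cut at -1 SD) and the predictor
# (arbitrary cut at 0), mimicking "osteoporosis" and "high risk" labels.
outcome_low = [1.0 if o <= -1.0 else 0.0 for o in outcome]
predictor_high_risk = [1.0 if p <= 0.0 else 0.0 for p in predictor]

r2_continuous = squared_correlation(predictor, outcome)
r2_outcome_dichotomized = squared_correlation(predictor, outcome_low)
r2_both_dichotomized = squared_correlation(predictor_high_risk, outcome_low)

print(f"both continuous:        {r2_continuous:.2f}")
print(f"outcome dichotomized:   {r2_outcome_dichotomized:.2f}")
print(f"both dichotomized:      {r2_both_dichotomized:.2f}")
```

The three values decrease in that order, which mirrors the direction, though not the exact magnitude, of the decline described above.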
Third, BMD is measured with random error and fluctuates randomly within individuals. This variation, albeit small relative to the mean, can nevertheless result in substantial misclassification of osteoporosis versus non-osteoporosis. An empirical analysis suggested that in individuals aged between 60 and 69 years, among whom the prevalence of osteoporosis is estimated to be 10.5%, the false positive rate could be as high as 12.4%. Therefore, when two imperfect and dichotomized measurements (OST scores and BMD) are correlated, the discrimination and calibration can never be expected to be perfect.
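A rough calculation shows how measurement error translates into misclassification near a diagnostic cut-off. The sketch below assumes that osteoporosis is defined by a measured T-score at or below −2.5, that the measurement error is normally distributed, and that its standard deviation is 0.2 T-score units. The error size and the function names are hypothetical and are not taken from the empirical analysis cited above.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_classified_osteoporotic(true_t_score: float,
                                 measurement_sd: float,
                                 threshold: float = -2.5) -> float:
    """P(measured T-score <= threshold) for a given true T-score,
    assuming normally distributed measurement error."""
    return normal_cdf((threshold - true_t_score) / measurement_sd)


ASSUMED_SD = 0.2  # hypothetical measurement error, in T-score units

for true_t in (-2.9, -2.6, -2.4, -2.1):
    p = prob_classified_osteoporotic(true_t, ASSUMED_SD)
    print(f"true T-score {true_t:+.1f}: "
          f"P(classified as osteoporotic) = {p:.2f}")
```

Under these assumptions, an individual whose true value sits exactly on the threshold is classified either way with equal probability, and individuals within a few tenths of a unit of the threshold still have an appreciable chance of being placed on the wrong side.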
Fourth, the derivation of the OST score was based on the concept of risk stratification, in which individuals are classified into subsets using cut-off values, even though there is no natural threshold between high and low scores but rather a progressive increase in the risk of osteoporosis as the OST score falls. For example, the OST is commonly calculated as OST = 0.2 × (weight − age), with weight in kilograms and age in years; individuals with an OST score less than or equal to −1 are classified as “high risk”, and all others as “low risk”. Thus, by the OST classification, a 50-year-old woman weighing 55 kg is assigned the same risk of osteoporosis as a 60-year-old woman weighing 65 kg (both score 1 and are therefore “low risk”). On the other hand, a 55-year-old woman weighing 50 kg is classified as “high risk”, whereas a woman of the same age weighing 51 kg is classified as “low risk”. It is therefore not surprising that when such a risk stratification is applied to an individual, the calibration and discrimination cannot be expected to be high. Indeed, even if the OST score could perfectly resolve the population into two distinct groups of osteoporosis and non-osteoporosis at the cut-off value of −1, the calibration would still not be perfect, because the probability of osteoporosis associated with an OST score of −1 is treated as exactly the same as that associated with an OST score of −5!
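The formula and cut-off quoted above can be written out as a short sketch. The function names are hypothetical, and the formula is applied exactly as stated in the text, with no rounding or truncation of the score.

```python
def ost_score(weight_kg: float, age_years: float) -> float:
    """OST = 0.2 x (weight - age), as quoted in the text."""
    return 0.2 * (weight_kg - age_years)


def ost_category(score: float, cutoff: float = -1.0) -> str:
    """Scores at or below the cut-off are labelled 'high risk'."""
    return "high risk" if score <= cutoff else "low risk"


# (age in years, weight in kg) for the women described in the text,
# plus one additional hypothetical woman who scores -5.
examples = [(50, 55), (60, 65), (55, 50), (55, 51), (75, 50)]

for age, weight in examples:
    score = ost_score(weight, age)
    print(f"age {age}, weight {weight} kg: "
          f"OST = {score:+.1f} -> {ost_category(score)}")
```

The first two women receive the same score and hence the same category, a difference of 1 kg moves the 55-year-old woman across the cut-off, and a score of −5 receives the same label as a score of −1.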
Prognosis is about imparting information on risk to an individual, and each individual is a unique case, in the sense that any two individuals are unlikely to have the same risk profile. Therefore, the risk of osteoporosis should ideally be individualized. One approach to increasing the uniqueness of prediction is to consider the risk on its continuous scale, rather than on a dichotomized scale based on a cut-off value. The uniqueness of prediction, or individualization of risk, can also be increased by considering multiple risk factors in a multivariable model, since the more factors are considered, the more likely it is that an individual’s risk profile is uniquely defined. Several studies in the cancer field have suggested that such nomogram-based risk prediction performs better than a risk-grouping approach [13, 14], because the nomogram recognizes and can define the unique risk profile of an individual. In osteoporosis, it has recently been shown that a multivariable nomogram incorporating quantitative ultrasound measurement and the OST components (i.e., age and weight) could significantly increase the predictive value for osteoporosis.
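As a minimal sketch of what risk on a continuous scale looks like, the code below maps an OST score to a probability through a logistic function. The intercept and slope are hypothetical placeholders chosen only to make the example run. They are not estimated from any data set and do not correspond to any published nomogram.

```python
import math

# Hypothetical coefficients, for illustration only (not fitted to data).
INTERCEPT = -2.0
SLOPE_PER_OST_UNIT = -0.4  # lower OST score -> higher risk


def osteoporosis_probability(ost_score: float) -> float:
    """Logistic mapping from a continuous OST score to a probability."""
    logit = INTERCEPT + SLOPE_PER_OST_UNIT * ost_score
    return 1.0 / (1.0 + math.exp(-logit))


for score in (1.0, -1.0, -3.0, -5.0):
    print(f"OST {score:+.1f}: predicted probability = "
          f"{osteoporosis_probability(score):.2f}")
```

Whatever the actual coefficients, a model of this form assigns a different probability to each score, so that scores of −1 and −5 are no longer treated as equivalent. Additional risk factors would enter as further terms in the logit.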
An important weakness of all current prognostic models is that they are based on a single measurement, with the underlying but unstated assumption that the measurement does not change with time. This assumption clearly does not hold for measurements such as BMD and body weight, which are known to decline or change with time at rates that vary substantially among individuals. Therefore, the development of clinical risk scores should take the time-varying nature of risk factors into account in order to achieve a better prediction of risk for an individual.
Osteoporosis as operationally defined by BMD is one of many biologic, genetic and environmental risk factors that affect the likelihood of fracture in an individual. While osteoporosis is the best predictor of fracture risk, it cannot account for all fractures in the elderly population. Indeed, in individuals aged 60 years and older, 55% and 74% of fracture cases occurred in non-osteoporotic women and men, respectively. Thus, from a clinical point of view, the prediction of osteoporosis is not as useful as the prediction of fracture risk, and to this end, the OST does not perform well.
The goal of a clinical risk score or any clinical prediction rule is to suggest a prognosis or therapeutic action for an individual. Because of its modest predictive properties, the OST score is far from suitable for routine clinical use, although it may be used to rule out the need for a BMD scan or for counselling purposes. Nevertheless, there is room for further refinement of the score by making use of its continuous measurements and other clinical measurements to better characterize the risk of osteoporosis and/or fracture for an individual.
- 1. National Osteoporosis Foundation (2003) National Osteoporosis Foundation physician’s guide to prevention and treatment of osteoporosis. National Osteoporosis Foundation, Washington, DC
- 7. Rud B, Hilden J, Hyldstrup L, Hrobjartsson A (2007) Performance of the Osteoporosis Self-Assessment tool in ruling out low bone mineral density in postmenopausal women: a systematic review. Osteoporos Int (18). DOI 10.1007/s00198-006-0319-3