Introduction

One of the main purposes of identifying biomarkers is the diagnosis of diseases. In case–control studies, the potential relationship between a biomarker and the disease is examined by comparing the frequencies of the biomarker in diseased and non-diseased subjects, and the efficacy of a biomarker is usually described in terms of change in consistency, which is indicated by the Youden index (J)1,2,3. In a cohort study, a suspected biomarker is considered as an exposure factor, and exposed and unexposed subjects are observed until they develop the disease. The difference in the disease’s incidence between an exposed and non-exposed group, which is referred to as the consistency rate in cohort studies (Crc), indicates the role of the observed factor in the disease’s pathogenesis4,5,6. This type of research design is chronologically consistent in that the biomarker is the starting point for the diagnosis of the disease; therefore, a cohort study is probably more apt for identifying and analyzing biomarkers7,8,9. However, most studies that identify biomarkers use a case–control design rather than a cohort design1,2.

The relationship between the results of a case–control study and a cohort study is represented by the following formula10.

$$PPV = 1 - \frac{(1 - Spe)*(1 - m)}{{(1 - Spe)*(1 - m) + Sen*m}};\quad NPV = \frac{m*(1 - Sen)}{{Spe*(1 - m) + m*(1 - Sen)}}$$
$$Crc = PPV - NPV$$
(1)

PPV and NPV represent the disease’s incidence (the frequency with which disease occurs) in the exposed (biomarker-positive) and non-exposed (biomarker-negative) groups, respectively; Sen and Spe represent the positive rate of the biomarker in the disease group and the negative rate of the biomarker in the control group, respectively, in the case–control study; “m” represents the incidence in the total population; and Crc represents the consistency rate in the cohort study, which is the difference in incidence between the two groups and also the mean probability of incidence for a biomarker11.

The results of a case–control study and a cohort study are not always parallel. For example, if the occurrence probability of a biomarker is assumed to be 0.85 in the disease group and 0.05 in the control group (Sen = 0.85, Spe = 0.95), then J would be 0.80 (0.85 − 0.05) and Crc would be 0.145 (m = 0.01). When the cardinal number, that is, the probability in the control group, is relatively large, for example, 0.90 in the disease group versus 0.10 in the control group, J is again 0.80 but Crc is only 0.082. Thus, for a low-probability event (for example, m = 0.01), J and Crc differ substantially. Because the occurrence of a disease is usually a low-probability event, J would be considerably larger than Crc. This indicates that the overestimation of J in case–control studies is a serious problem in determining the efficacy of a biomarker.
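The two numerical examples above can be reproduced with a few lines of Python. This is a minimal sketch rather than the author's code: the function names are hypothetical, and NPV is computed, following the definition used in this paper, as the disease incidence among biomarker-negative subjects via Bayes' rule.

```python
def incidence_by_exposure(sen, spe, m):
    """Disease incidence among biomarker-positive subjects (PPV) and among
    biomarker-negative subjects (NPV in this paper's sense)."""
    ppv = sen * m / (sen * m + (1 - spe) * (1 - m))
    npv = m * (1 - sen) / (m * (1 - sen) + spe * (1 - m))
    return ppv, npv


def youden_j(sen, spe):
    """Youden index from the case-control study."""
    return sen + spe - 1


def crc(sen, spe, m):
    """Consistency rate in the cohort study: difference in incidence."""
    ppv, npv = incidence_by_exposure(sen, spe, m)
    return ppv - npv


# The two scenarios from the text, both with incidence m = 0.01.
for sen, spe in [(0.85, 0.95), (0.90, 0.90)]:
    print(f"Sen={sen:.2f} Spe={spe:.2f} "
          f"J={youden_j(sen, spe):.2f} Crc={crc(sen, spe, 0.01):.3f}")
# Sen=0.85 Spe=0.95 J=0.80 Crc=0.145
# Sen=0.90 Spe=0.90 J=0.80 Crc=0.082
```

Both scenarios share J = 0.80, yet Crc nearly halves when the control-group frequency rises from 0.05 to 0.10.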

In the present study, I propose a comprehensive index for biomarkers, namely, CIB, that is based on a combination of consistency determined through both case–control and cohort studies, that is, J and Crc. CIB could overcome the limitations of J in low-probability events and have potential for determining the diagnostic efficacy of a biomarker and the difference between its diagnostic efficacy and screening efficacy.

Materials and methods

Calculation of CIB

The principle of the current analysis is to comprehensively evaluate the consistency of a biomarker in a case–control study and a cohort study in order to determine its efficacy. The efficacy of a biomarker is normally described in terms of J3, which is the sum of the positive rate of the biomarker in the disease group (referred to as sensitivity or Sen) and the negative rate of the biomarker in the control group (referred to as specificity or Spe) minus 1.

$${\text{J}} = {\text{Sen}} + {\text{Spe}} - 1$$

The consistency in a cohort study (Crc) is the sum of the incidence in the exposure group (positive group for a biomarker) (PPV) and the non-diseased rate (percentage of healthy individuals) in the non-exposure group (negative group for a biomarker), which equals 1 − NPV under the definition of NPV in Eq. (1), minus 1, as follows11:

$${\text{Crc}} = {\text{PPV}} + (1 - {\text{NPV}}) - 1 = {\text{PPV}} - {\text{NPV}}$$

Using J and Crc, CIB is calculated as follows.

$${\text{CIB}} = \left( {{\text{J}} + {\text{Crc}}} \right)/2$$
(2)

In fact, CIB comprehensively incorporates Sen, Spe, PPV, and NPV.

When evaluating the diagnostic efficacy of a biomarker, the incidence of the disease in the total population (m) is assumed to be 0.50 because patients are typically symptomatic. For evaluating screening efficacy, including predictive power, the incidence (m) is assumed to be 0.05, because the subjects are usually healthy individuals without any symptoms. The range of CIB is 0–1, with a greater CIB value implying stronger predictive power of the biomarker.
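The calculation of CIB under the two assumed incidences can be sketched as follows. The constant and function names are hypothetical, and the biomarker with Sen = 0.85 and Spe = 0.95 is an illustrative assumption, not a value from the study data.

```python
DIAGNOSIS_M = 0.50  # incidence assumed for symptomatic patients
SCREENING_M = 0.05  # incidence assumed for asymptomatic subjects


def cib(sen, spe, m):
    """Comprehensive index for biomarkers: the mean of J and Crc."""
    j = sen + spe - 1
    ppv = sen * m / (sen * m + (1 - spe) * (1 - m))        # incidence if positive
    npv = m * (1 - sen) / (m * (1 - sen) + spe * (1 - m))  # incidence if negative
    return (j + (ppv - npv)) / 2


sen, spe = 0.85, 0.95  # hypothetical biomarker with J = 0.80
print(round(cib(sen, spe, DIAGNOSIS_M), 3))  # 0.804
print(round(cib(sen, spe, SCREENING_M), 3))  # 0.632
```

Under the diagnostic assumption CIB stays close to J, whereas under the screening assumption it falls well below J.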

Evaluation of data from a case–control study

The basic principle of the analysis is to determine whether J can accurately reflect CIB.

Evaluation of J in a case–control study based on CIB calculated from both the case–control study and the cohort study was performed using Eq. (1) (which represents a definite relationship between the outcomes of a case–control study and a cohort study) and Eq. (2). The data for the test set were generated based on J, with large and small cardinal numbers in the control group and CIB calculated as shown in Table 1. The data in Table 1 show that the incidence of the disease influences the relationship between J and CIB. When the incidence is 0.50, the value of J is similar to (but not equal to) that of CIB. Therefore, in the case of a high-probability event (probability = 0.50), the efficacy of a biomarker can be described in terms of J. However, there was a significant difference between J and CIB in the case of a low-probability event (probability = 0.05).

Table 1 Relationship of comprehensive index of biomarker (CIB) with Youden’s index (J) according to incidence in the total population.

Evaluation of sensitivity and specificity

In case–control studies, biomarkers are assessed in already diseased individuals, and the power of a biomarker is typically expressed as the positive rates of the biomarker in the disease group (Sen) and the negative rates of the biomarker in the control group (Spe)3. As explained in the previous subsection, the diagnostic power of J may differ from that of CIB in the case of low-probability events. In this analysis, we examined whether Sen or Spe is more relevant with regard to CIB for biomarkers with the same J value for low-probability events. Evaluation of Sen and Spe in a case–control study based on CIB values showed that the J value differed for different Sen and Spe values. A scatter diagram was plotted with J on the X-axis and CIB on the Y-axis.

Receiver operating characteristic analysis of CIB

Receiver operating characteristic (ROC) analysis is a common method used to evaluate the effectiveness of a diagnosis made using a biomarker12,13. The present analysis was performed to determine whether ROC analysis remains applicable when CIB is used instead of J.

A model comprising four sets of simulated data was established. Four sets of normally distributed random numbers (100 ± 20, n = 5000; 115 ± 20, n = 5000; 125 ± 20, n = 5000; 140 ± 20, n = 5000) were generated using the SPSS statistical software (IBM Corp., Armonk, NY, USA). Model A consisted of the datasets 100 ± 20 and 115 ± 20; Model B consisted of the datasets 100 ± 20 and 125 ± 20; and Model C consisted of the datasets 100 ± 20 and 140 ± 20. ROC analysis was performed as shown in Fig. 1.
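A simulation of this kind can be approximated in Python, although the study itself used SPSS. The sketch below rests on several assumptions that the text does not state: "100 ± 20" is read as mean ± SD, the 100 ± 20 dataset is treated as the control group, values at or above the cut-off are called positive, and the control-group frequency is floored at 0.05 when computing CIB. All function names and seeds are hypothetical.

```python
import random

FLOOR = 0.05  # assumed minimum control-group frequency when computing CIB


def simulate(mean, sd=20.0, n=5000, seed=0):
    """Draw n normally distributed values (mean and SD as given)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


def sen_spe(controls, cases, cutoff):
    """Sensitivity and specificity when values >= cutoff are positive."""
    sen = sum(x >= cutoff for x in cases) / len(cases)
    spe = sum(x < cutoff for x in controls) / len(controls)
    return sen, spe


def youden_j(sen, spe):
    return sen + spe - 1


def cib(sen, spe, m, floor=FLOOR):
    """CIB = (J + Crc) / 2 with the control-group frequency floored."""
    fpr = max(1 - spe, floor)
    spe = 1 - fpr
    ppv = sen * m / (sen * m + fpr * (1 - m))
    neg = m * (1 - sen) + spe * (1 - m)
    npv = m * (1 - sen) / neg if neg > 0 else 0.0
    return ((sen + spe - 1) + (ppv - npv)) / 2


def optimum_cutoffs(controls, cases, m, cutoffs):
    """Return the cut-offs that maximise J and CIB, respectively."""
    scored = [(c, *sen_spe(controls, cases, c)) for c in cutoffs]
    best_j = max(scored, key=lambda t: youden_j(t[1], t[2]))[0]
    best_cib = max(scored, key=lambda t: cib(t[1], t[2], m))[0]
    return best_j, best_cib


# Model C: 100 +/- 20 (assumed controls) versus 140 +/- 20 (assumed cases).
controls = simulate(100, seed=1)
cases = simulate(140, seed=2)
best_j, best_cib = optimum_cutoffs(controls, cases, m=0.05,
                                   cutoffs=range(60, 181))
print("optimum cut-off by J:", best_j, "| by CIB:", best_cib)
```

With these assumptions the J-optimal cut-off sits near the midpoint of the two means, while the CIB-optimal cut-off at an incidence of 0.05 shifts upward toward higher specificity.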

Figure 1

Receiver operating characteristic (ROC) analysis of simulated data in Model A (A), Model B (B) and Model C (C).

When the cardinal number (frequency in the control group) is relatively small (and Spe is correspondingly high), PPV approaches 1 and Crc can approach its upper limit (Crc = 1). Therefore, if the frequency of a biomarker is less than 0.05 in the control group, it should be assigned a value of 0.05.
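The effect of this floor can be illustrated with a hypothetical biomarker that is rare in controls (frequency 0.001) but also insensitive (Sen = 0.30). These values are assumptions chosen for illustration, and applying the floor to both the J and Crc terms of CIB is one reading of the rule rather than a confirmed detail of the method.

```python
def cib(sen, control_freq, m, floor=None):
    """CIB from Sen and the biomarker frequency in controls (1 - Spe).

    If `floor` is given, control frequencies below it are raised to it.
    """
    if floor is not None:
        control_freq = max(control_freq, floor)
    spe = 1 - control_freq
    j = sen + spe - 1
    ppv = sen * m / (sen * m + control_freq * (1 - m))
    npv = m * (1 - sen) / (m * (1 - sen) + spe * (1 - m))
    return (j + (ppv - npv)) / 2


raw = cib(0.30, 0.001, m=0.05)                  # no floor
floored = cib(0.30, 0.001, m=0.05, floor=0.05)  # control frequency set to 0.05
print(round(raw, 3), round(floored, 3))  # 0.602 0.226
```

Without the floor, this weak marker would score a CIB above 0.50 purely because PPV is close to 1; with the floor, its low sensitivity is reflected in the index.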

Efficacy of CIB based on an actual dataset

Our previous research found that the tumor marker index (TMI) calculated from serial tumor markers can be considered as a simple tool for the diagnosis of gastric cancer1, so these results were considered to be apt for comparing the diagnostic and screening efficacy of J and CIB.

Results

The relationship between J and CIB is shown in Fig. 2. The scatter diagram revealed that when J was 0.90, CIB was only 0.70 for an incidence rate of 0.05 in the total population.
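As a rough check of this value, consider a hypothetical biomarker with Sen = Spe = 0.95, so that J = 0.90 (these values are an illustrative assumption, not taken from the study data). With m = 0.05, and with NPV denoting the disease incidence among biomarker-negative subjects:

$$PPV = \frac{0.95 \times 0.05}{0.95 \times 0.05 + 0.05 \times 0.95} = 0.50;\quad NPV = \frac{0.05 \times 0.05}{0.05 \times 0.05 + 0.95 \times 0.95} \approx 0.003$$
$$Crc \approx 0.50 - 0.003 = 0.497;\quad CIB \approx (0.90 + 0.497)/2 \approx 0.70$$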

Figure 2

Relationship between Youden index (J) and comprehensive index of biomarker (CIB) (incidence = 0.05). CIB showed an unsteady increase with J for low-probability events.

The Sen and Spe of biomarkers in a case–control study were evaluated based on the CIB values, as shown in Fig. 3 and Table 2. For a low-probability event (m = 0.05), biomarkers with the same J but different Sen and Spe values differed markedly in CIB. As shown in Table 2, for biomarkers with the same J, a higher Spe (that is, a lower false-positive rate) corresponded to a higher CIB.
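The dependence of CIB on Spe at a fixed J can be checked with a short sketch. The three (Sen, Spe) pairs below are hypothetical and were chosen only because they all give J = 0.80; they are not the entries of Table 2.

```python
def cib(sen, spe, m):
    """CIB = (J + Crc) / 2, with Crc the difference in disease incidence
    between biomarker-positive and biomarker-negative subjects."""
    j = sen + spe - 1
    ppv = sen * m / (sen * m + (1 - spe) * (1 - m))
    npv = m * (1 - sen) / (m * (1 - sen) + spe * (1 - m))
    return (j + (ppv - npv)) / 2


# Three hypothetical biomarkers with the same J = 0.80, ordered by rising Spe.
pairs = [(0.95, 0.85), (0.90, 0.90), (0.85, 0.95)]
values = [cib(sen, spe, m=0.05) for sen, spe in pairs]
for (sen, spe), value in zip(pairs, values):
    print(f"Sen={sen:.2f} Spe={spe:.2f} CIB={value:.3f}")
# Sen=0.95 Spe=0.85 CIB=0.523
# Sen=0.90 Spe=0.90 CIB=0.558
# Sen=0.85 Spe=0.95 CIB=0.632
```

At an incidence of 0.05, trading sensitivity for specificity raises CIB even though J is unchanged, because false positives from the large non-diseased group dominate PPV.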

Figure 3

Relationship between comprehensive index of biomarker (CIB) and Youden index (J) for different sensitivity (Sen) and specificity (Spe) values (incidence = 0.05). Among biomarkers with the same J, CIB differed between those with higher Spe and those with higher Sen.

Table 2 Evaluation of sensitivity (Sen) and specificity (Spe) in a case–control study based on comprehensive index of biomarker (CIB) (Incidence = 5%).

For ROC analysis, the simulated sample size was 5000, and the results for the case–control study are shown in Table 3. The results showed that the optimum cut-off values of J and CIB were different when the incidence was 0.05.

Table 3 ROC analysis of J and CIB (incidence = 0.05) to determine the optimum cut-off value in a simulated dataset.

Actual data from our previous research were used for evaluating biomarker efficacy. In our previous research, TMI derived from serial tumor markers was found to be useful for the diagnosis of gastric cancer based on ROC analysis (Fig. 4 and Table 4). As shown in Fig. 4, the optimum cut-off values for diagnosis (incidence = 0.50) and for screening (incidence = 0.05) were different. The results indicate that if the cardinal number (value in the control group) is very small (and Spe is much higher), there could be an unsteady increase in CIB. Therefore, this frequency should be considered as 0.05 to calculate CIB, as shown in Table 4.

Figure 4

ROC analysis of tumor marker index (TMI) for the diagnosis and screening of gastric cancer. (A) optimum cut-off value for diagnosis (incidence = 0.50); (B) optimum cut-off value for screening (incidence = 0.05).

Table 4 Evaluating the diagnostic and screening efficacy of tumor marker index (TMI) for gastric cancer with ROC analysis (incidence = 0.05).

Discussion

In the present study, we have proposed and evaluated an index for evaluating the diagnostic and screening efficacy of biomarkers for specific diseases. This index, CIB, is calculated using the consistency rate determined from case–control studies (J) and cohort studies (Crc). In fact, CIB comprehensively incorporates Sen, Spe, PPV, and NPV.

Our results show that when the incidence is 0.50, the J score is similar to CIB. As the subjects considered for diagnosis are usually symptomatic, the occurrence of a disease can be assumed to be a high-probability event for which the incidence can be set as 0.50. Therefore, for determining the diagnostic efficacy of a biomarker, J has similar power as CIB. In contrast, there is a significant difference between J and CIB in a low-probability event (probability = 0.05). As the subjects considered in screening for a disease are usually healthy and asymptomatic, the occurrence of a disease is assumed to be a low-probability event for which the incidence can be set as 0.05. Therefore, for determining the screening efficacy of a biomarker, J may not have as much power as CIB. Overall, our findings indicate that CIB may have potential for evaluating the screening efficacy of disease biomarkers.

For determining the screening efficacy based on CIB, the incidence (m) should be considered as 0.05 because test indicators usually include a 95% population interval as a reference range, with 5% of the population outside the normal reference range. The results showed that at an incidence of 0.05, ROC analysis of CIB showed an increase in the area under the curve. Thus, ROC analysis could be used to determine the cut-off values for screening purposes. The results indicated that higher Spe at a similar J value could indicate better power (and higher CIB), as shown in Table 2. Thus, CIB could increase unsteadily with J. Therefore, if the cardinal number (frequency in the control group) is very small (and Spe is much higher), this value should be assumed as 0.05 to calculate CIB.

Because the CIB range is typically 0–1, we propose that a CIB value of > 0.50 be considered to have clinical value3. However, diagnostic value is not necessarily equivalent to screening value, as shown in Table 3. Evaluation of biomarker efficacy using actual data from our previous research also showed that TMI, which is derived from serial tumor markers, was more suitable for diagnosis than screening (Table 4). From analysis of the actual data, we also found that the J value from the case–control design was significantly larger than the CIB value for a low-probability event. This confirms the overestimation of J in low-probability events. Another example is the analysis of genetic associations (screening based on genetic markers), which has been successful in mapping genes, but is clinically inefficient because of inconsistent findings that have been partly attributed to overestimations in case–control studies. With the exception of Mendelian diseases, significant associations are difficult to detect. Because genetic diagnosis is usually used to screen healthy individuals for a disease and few genes have a CIB over 0.5, it might be misleading to pay attention only to the results for J from case–control studies. A statistical difference does not necessarily represent strong clinical effects, and diagnostic value does not always imply screening value.

It should be pointed out that to simplify the calculation, the incidence value in the present study was assumed to be 0.50 for diagnosis and 0.05 for screening. However, a more accurate estimation of CIB could be obtained based on the actual incidence of a disease. This is a line of investigation to pursue in the future.

In conclusion, CIB, which combines the consistency rates obtained from both case–control and cohort studies, could be more useful than J for determining the efficacy of a biomarker for screening purposes. It was also found that the efficacy of a biomarker could differ for diagnostic, screening, predictive, and prognostic purposes, and it would be better to evaluate the efficacy of biomarkers for specific systems or contexts.