Introduction

\(\beta\)-thalassemia is clinically and epidemiologically one of the most significant hemoglobinopathies prevalent in the Indian subcontinent. It results from insufficient (\(\beta^{+}\)) or absent (\(\beta^{0}\)) production of \(\beta\)-globin polypeptide chains caused by a mutation in the \(\beta\)-globin gene [1,2,3]. The clinical manifestations of \(\beta\)-thalassemia are diverse, ranging from asymptomatic microcytic hypochromic red cells in the heterozygous state, known as \(\beta\)-thalassemia minor or \(\beta\)-thalassemia trait (BTT), to profound anemia in the homozygous state (\(\beta\)-thalassemia major or \(\beta\)-TM), which is fatal in the first few years of life if not supported by regular blood transfusions [4,5,6]. Occasionally, under conditions of hematopoietic stress, e.g., during pregnancy or intercurrent infections, persons with BTT may also develop anemia and require blood transfusions. In most instances, however, these asymptomatic individuals are unaware of their carrier status and thus serve as a reservoir of the disease [7,8,9]. Carrier screening is therefore essential to reduce the burden of the disease [10, 11].

Approximately 5% of the world’s population are carriers of \(\beta\)-thalassemia genes, particularly in the Mediterranean countries, south-east Europe, Arab nations, Asia, and parts of Africa [12,13,14,15]. In India, approximately 10,000 children are born with \(\beta\)-TM every year, and there are nearly 42 million carriers of BTT, with some communities such as Sindhis, Gujaratis, Mahars, Kolis, Saraswats, Lohanas, and Gaurs exhibiting high prevalence [16, 17]. Although the Government of India has made provision for the care and management of patients with thalassemia syndrome, the existing resources and infrastructure remain insufficient [2, 15, 16, 18]. Methods for differential diagnosis between BTT and iron deficiency anemia (IDA) include quantitative detection of hemoglobin A2 (HbA2) by high-performance liquid chromatography (HPLC) and DNA studies [18, 19]. However, HPLC and DNA tests are expensive, and testing a large at-risk population imposes a substantial healthcare burden. Therefore, developing cost-effective screening formulae remains a research priority. Over the years, more than forty formulae based on RBC parameters have been developed, as shown in Table 1.

Table 1 Forty-two discriminant formulae proposed in the literature

In recent years, several authors have independently evaluated the efficiency of the above formulae [48,49,50,51] and reported that most of them suffer from interference from iron deficiency and other nutritional deficiency anemias [48, 52]. The sensitivity (SE) and specificity (SP) values of some formulae varied considerably [19, 53,54,55]. Therefore, it remains critical to understand the strengths and limitations of all the existing discrimination formulae in a heterogeneous data set consisting of different hemoglobinopathies [56, 57]. Performance measures such as SE and SP, or positive predictive value (PPV) and negative predictive value (NPV), represent a trade-off. The SE (or SP) quantifies the ability of a screening formula to correctly identify subjects with (or without) the disease condition. The critical point is that focusing only on sensitivity, even when the consequence of a false negative is exceptionally high, can mean that applying such a formula has an adverse effect. Similarly, if we compare screening formulae directly in terms of the area under the curve (AUC), a test with a smaller AUC might also be acceptable in certain circumstances [58]. Considering PPV and NPV might reduce this confusion, but does not eliminate it [59, 60]. Consequently, many other performance measures, such as accuracy (ACC) and F1 score, are also recommended. Therefore, a trade-off among the possible performance metrics is necessary when recommending the best-performing formula [61].

We aimed to rank forty-two formulae based on eight performance measures: SE, SP, Youden Index (YI), AUC-ROC, PPV, NPV, accuracy, and false omission rate (FOR). Three multi-criteria decision-making (MCDM) methods were employed: (i) Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), (ii) COmplex PRoportional Assessment (COPRAS), and (iii) Simultaneous Evaluation of Criteria and Alternatives (SECA). By combining multiple measures and methods, the proposed approach for evaluating formulae can reduce bias. Additionally, as shown in Table 1, some researchers recommend the use of a combination of formulae; e.g., [47] developed the Janel 11T formula by aggregating the performance of eleven different formulae. The authors suggested a cut-off of 8 out of the 11 existing formulae in favor of BTT (i.e., at least 8 formulae recommended BTT). However, these eleven formulae include some well-performing formulae (e.g., Shine & Lal), which raises the question of whether such aggregation is necessary and motivates a rigorous evaluation.

The RBC-based discrimination formulae are cost-effective and applicable in low-resource settings but have yet to make the transition to an effective mass-screening decision support system. In this study, we first evaluated the performance of forty-two formulae on a heterogeneous data set based on eight different performance measures through MCDM methods. We found that SCS\(_{BTT}\) [18] ensured a higher rank in two of the three MCDM methods. Given this encouraging result, we proposed a modification of SCS\(_{BTT}\) and validated it with 939 samples collected from the Nil Ratan Sircar Medical College and Hospital, Kolkata, India. We found that SCS\(_{BTT}\) and its web application, SUSOKA, can be used for mass screening in the Indian context to reduce the cost of molecular diagnosis for a heterogeneous set of populations.

Methods

Population evaluated

This retro-prospective laboratory-based study was conducted in the Hemolytic and Nutritional Anemia Laboratory, Department of Hematology, PGIMER, in collaboration with the Departments of Obstetrics and Gynecology, Clinical Hematology and Medical Oncology, and Pediatrics (Pediatric Hematology-Oncology Unit). Active patient recruitment was done between January 2020 and March 2022. Retrospective record mining was done for cases tested between January 2015 and December 2020. The test was conducted on 6,388 samples: 5,035 normal subjects (NS), 65 HbE, 169 HbD-Punjab/Los Angeles, 203 sickle cell traits (SCT), 194 iron deficiency anemias (IDA), and 722 BTT. Out of the 722 subjects identified as BTT carriers, 40 also had HbE traits (double heterozygous E\(\beta\)), 17 had concomitant IDA, and 4 were identified as HbD\(\beta\). We excluded samples from the following subjects: (i) recently transfused subjects, (ii) subjects for whom the complete hemogram and HPLC data were not available, (iii) subjects who did not have a clear-cut diagnosis, and (iv) subjects with an acute bleeding episode within the last three months. Complete blood count (CBC) data were collected during routine diagnostic analysis; no additional information was collected and no extra experiments, such as Vitamin B12 studies, were performed for this study protocol. The laboratory at PGIMER participates in the United Kingdom National External Quality Assessment Service (UK NEQAS) Hematology program and the BioRad HbA2 EQA program.

Validation set

The validation set consisted of 939 samples (490 normal individuals, 170 HbE, 4 HbD, and 275 BTT) collected from the Nil Ratan Sircar (NRS) Medical College and Hospital, Kolkata, India. Out of the 275 BTT samples, 213 were identified as BTT carriers, 11 had \(\beta\)-TM, 3 had \(\beta\) trait with high fetal hemoglobin, 44 had HbE trait (double heterozygous E\(\beta\)), and 4 had HbS trait (double heterozygous S\(\beta\)). Active patient recruitment was done between January 2020 and December 2021, and similar exclusion criteria were used.

Diagnostic performance

Based on the literature, 42 discrimination formulae were evaluated using the following eight measures: \(\text{Sensitivity (SE)} = \frac{TP}{TP+FN}\); \(\text{Specificity (SP)} = \frac{TN}{TN+FP}\); \(\text{Youden's Index (YI)} = TPR + TNR - 1\); \(\text{AUC-ROC} = \frac{1}{2} - \frac{FPR}{2} + \frac{TPR}{2} = \frac{1}{2} - \frac{FP}{2(FP+TN)} + \frac{TP}{2(TP+FN)}\); \(\text{Accuracy (ACC)} = \frac{TP+TN}{TP+TN+FP+FN}\); \(\text{Positive predictive value (PPV)} = \frac{TP}{TP+FP}\); \(\text{Negative predictive value (NPV)} = \frac{TN}{TN+FN}\); and \(\text{False omission rate (FOR)} = \frac{FN}{TN+FN}\). Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively, and \(TPR = SE\), \(TNR = SP\), and \(FPR = 1 - SP\) denote the true positive, true negative, and false positive rates. FOR is a cost-type measure (a lower value is better), whereas all the others are benefit-type measures (a higher value is better).
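
As an illustration of the definitions above, the following minimal Python sketch computes the eight measures from the four cells of a confusion matrix. The function name `performance_measures` and the example counts are hypothetical and are not taken from the study data.

```python
def performance_measures(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return the eight screening measures for one formula.

    tp, tn, fp, fn are counts of true positives, true negatives,
    false positives, and false negatives (all assumed non-degenerate,
    i.e. no denominator below is zero).
    """
    se = tp / (tp + fn)            # sensitivity = TPR
    sp = tn / (tn + fp)            # specificity = TNR
    fpr = fp / (fp + tn)           # false positive rate = 1 - SP
    return {
        "SE": se,
        "SP": sp,
        "YI": se + sp - 1,                 # Youden's Index
        "AUC": 0.5 - fpr / 2 + se / 2,     # single-cut-off AUC-ROC
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "FOR": fn / (tn + fn),             # cost-type: lower is better
    }


# Hypothetical counts, for illustration only.
example = performance_measures(tp=90, tn=80, fp=20, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With a single cut-off, the AUC-ROC expression reduces to the mean of SE and SP, and NPV and FOR always sum to one, which is why FOR is the only cost-type criterion among the eight.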

Statistical analysis

Descriptive statistics, the Kruskal-Wallis test, Student’s t-test, and ANOVA were used, with the significance level set at p<0.01. Statistical analyses were performed using IBM SPSS-27 for Windows (IBM Corp., NY, USA). The optimization problem of the SECA method was solved using Wolfram Mathematica, and the final ranking for all three MCDM methods was computed in Python.

MCDM methods

We present the complete methodology for the three MCDM methods, namely TOPSIS [62], COPRAS [63, 64], and SECA [65], with a detailed explanation in the Supplementary file (Section B). Notably, the characteristics of the three methods are different [66]. The final rank obtained by the COPRAS method is based on the ratios to the ideal and the anti-ideal solutions, whereas the TOPSIS method considers the Euclidean distance to these solutions. The SECA method is characteristically different from the previous two methods, as the weights for each criterion are determined by solving a non-linear multi-objective optimization problem. We refer to the work by [67], where the authors proposed a framework of formal guidelines for the selection of MCDM methods.
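
To make the distance-based idea behind TOPSIS concrete, the sketch below implements a generic textbook version in plain Python: vector normalization, weighting, Euclidean distances to the ideal and anti-ideal solutions, and a closeness score. It is not the authors' implementation (which is described in the Supplementary file); the function names, the equal weights, and the three example rows are assumptions made for illustration.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness score per alternative (higher is better).

    matrix  : rows = alternatives (formulae), columns = criteria
    weights : one weight per criterion (assumed to sum to 1)
    benefit : True for benefit-type criteria, False for cost-type
    Assumes no all-zero column and at least two distinct rows.
    """
    n_crit = len(weights)
    # Vector (Euclidean) normalization of each column, then weighting.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_plus = math.dist(row, ideal)    # distance to ideal solution
        d_minus = math.dist(row, anti)    # distance to anti-ideal solution
        scores.append(d_minus / (d_plus + d_minus))
    return scores


def rank_from_scores(scores):
    """Rank 1 goes to the highest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


# Hypothetical values for three formulae on (SE, SP, FOR); FOR is cost-type.
example_matrix = [
    [0.95, 0.90, 0.02],
    [0.80, 0.85, 0.10],
    [0.60, 0.70, 0.20],
]
example_weights = [1 / 3, 1 / 3, 1 / 3]   # illustrative equal weights
example_benefit = [True, True, False]

example_scores = topsis(example_matrix, example_weights, example_benefit)
print(rank_from_scores(example_scores))
```

In this toy example the first row is best on every criterion, so it coincides with the ideal solution and is ranked first; real formulae rarely dominate in this way, which is what makes the weighting and the choice of method matter.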

Results

In this study, we used seven parameters: hemoglobin (Hb), hematocrit (HCT), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), red blood cell count (RBC), and red cell distribution width (RDW) of 6,388 subjects for the performance evaluation of forty-two formulae (Table 1). Samples were divided into multiple groups, and an overview of each parameter in each group is presented in the Supplementary file (Section A, Table A1). Applying the Kruskal-Wallis test, we found that all the parameters differed significantly between the groups (p<0.001). The mean values of Hb, MCV, and MCH were higher, and those of RBC and RDW were lower, for the normal subjects (NS) compared to the IDA, BTT, SCT, and HbE samples. BTT subjects showed lower values of Hb, MCV, and MCH and higher values of RBC and RDW than all the other groups. The performance of the formulae on the test data set is presented in Table 2.

Table 2 Performance analysis of discriminant formulae

The results demonstrate that the best-performing formula is Kurman-II in terms of ACC; Zaghloul-1 and Hameed in terms of SE and FOR; Janel 11T in terms of SP and PPV; SCS\(_{BTT}\) in terms of NPV; and S&L in terms of YI and AUC-ROC. However, Zaghloul-1 shows the worst performance in terms of ACC and SP. RDW indices perform poorly in terms of SE and PPV. It is somewhat expected that an index with higher SE might have lower SP [68]. Conversely, some indices with higher SP have significantly lower SE. Interestingly, YI is negative for Zaghloul-1 as well as for RDW, Huber-Herklotz, Sirachainan, Hameed, Zaghloul-2, and Kandhro-1. We removed these seven indices from the final analysis. Note that SP is also an important measure, because it reflects how well a formula excludes samples without BTT. If specificity is too low, samples without BTT are often recommended for further evaluation, which contradicts the objective of mass screening, namely to avoid over-utilization of resources and reduce the financial burden. The critical insight is that no single formula achieves the best value on all performance measures. Given this diversity, a trade-off among performance measures is required before recommending any specific formula. Consequently, we applied three MCDM methods to obtain the ranking of the formulae, as presented in Table 3.

Table 3 Final ranking for 35 formulae

We refer to the Supplementary document for step-wise details on each method. The S&L formula ensured a higher rank by the TOPSIS method, and SCS\(_{BTT}\) ensured a higher rank by the COPRAS and SECA methods. The rationale for this outcome is that S&L shows considerably better values of YI, AUC-ROC, NPV, and FOR. Similarly, SCS\(_{BTT}\) shows relatively better values of SE, YI, AUC-ROC, and NPV. We computed Spearman’s rank correlation (see Table B5) among the MCDM methods, which indicated a strong correlation among the methods [66] and thus consistency of the rankings. Next, we identified the parameter ranges in which the best two formulae misclassified BTT subjects. We refer to Table B6 for the details of the ranges shown in Fig. 1.
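
As a rough illustration of how agreement between two rankings can be quantified, the sketch below computes Spearman's rank correlation with the classical formula \(\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}\), which is valid only when there are no tied ranks. The function name and the two five-item rankings are hypothetical and do not reproduce Table B5.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two rankings without ties.

    ranks_a, ranks_b: rank of each alternative under two methods,
    listed in the same order of alternatives.
    """
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must have the same length")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two alternatives")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks of five formulae under two MCDM methods.
method_1 = [1, 2, 3, 4, 5]
method_2 = [2, 1, 3, 4, 5]   # only the top two formulae swap places
print(spearman_rho(method_1, method_2))
```

A value close to 1 means the two methods order the formulae almost identically, while a value near -1 would indicate reversed orderings.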

Fig. 1 Ranges of seven parameters that lead to false negative measures in discriminating BTTs and implementation scheme

None of the formulae showed 100% sensitivity (Table 2). SCS\(_{BTT}\) misclassified 26 BTT subjects, nine of whom were double heterozygous for HbE and BTT. For S&L, fourteen HbE and BTT subjects appeared among the 53 misclassified samples. However, it appears that SCS\(_{BTT}\) can potentially discriminate all BTTs if individual samples have MCV \(\le\) 80 fl. Note that Table B6 shows that the lower limit of Hb or MCH found for both formulae is relatively low. However, the lower limit of MCV for SCS\(_{BTT}\) is above the recommended lower bound in the literature [69, 70]. Therefore, we proposed a modification of SCS\(_{BTT}\) that gives more precise recommendations through a two-step procedure. We hypothesized that SCS\(_{BTT}\) can provide 100% sensitivity if MCV \(\le\) 80 fl. To validate the hypothesis, we used a separate data set; descriptive statistics of the RBC parameters for that validation set are presented in Table A2 of the Supplementary file. The validation revealed that only three out of 275 BTT samples were missed by SCS\(_{BTT}\), and these three samples had MCV \(\ge\) 80 fl; one of them was double heterozygous \(E\beta\). Notably, all four samples with the HbS trait (double heterozygous \(S\beta\)) were classified as BTTs.

Discussion

The results indicate that no single formula achieves the highest performance with respect to all eight measures. We found five formulae that exhibited the best result for one or two performance measures. This result is consistent with recent evaluation studies; for instance, [52] reported that the S&L formula ensured the highest SE (we observed the best YI and AUC-ROC for S&L) and that the E&F formula presented the highest SP and PPV, while the lowest NPV was obtained with the RBC formula. Similarly, [71] reported that the E&F formula showed the highest SE and SP. In this regard, we introduced MCDM methods for ranking the formulae based on eight relevant measures. Since MCDM methods can establish the trade-off among multiple criteria while determining the final ranking, they help decision-makers obtain the final Pareto solution. Moreover, the ranking shows that the S&L formula is one of the best performers, which is also in line with some evaluation studies [72,73,74]. The performance of SCS\(_{BTT}\) is also consistent with a recent evaluation study on a data set of 2,942 antenatal female samples [72]. Note that for TOPSIS and COPRAS, the Shannon entropy method is used to assign the criterion weights in order to remove the bias of the decision maker. The SECA method is designed so that the weights are set automatically. The Spearman rank correlation establishes that the final rankings are closely aligned, demonstrating the utility of applying MCDM methods in selecting the best formula. Moreover, [75] proposed the clinical utility index and recommended the use of sensitivity \(\times\) PPV and specificity \(\times\) NPV when positive and negative test results, respectively, are under scrutiny. Note that the ranking under the MCDM methods is also consistent with this measure, as S&L and SCS\(_{BTT}\) both appear within the first quarter of the ranking.
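
The two products mentioned above can be computed directly from a confusion matrix. The sketch below is only an illustration of those two expressions; the function name `clinical_utility` and the counts are hypothetical.

```python
def clinical_utility(tp: int, tn: int, fp: int, fn: int) -> tuple:
    """Return (sensitivity x PPV, specificity x NPV).

    The first product is relevant when positive results are under
    scrutiny, the second when negative results are under scrutiny.
    Assumes no denominator is zero.
    """
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return se * ppv, sp * npv


# Hypothetical counts, for illustration only.
cui_positive, cui_negative = clinical_utility(tp=90, tn=80, fp=20, fn=10)
print(f"SE x PPV = {cui_positive:.3f}, SP x NPV = {cui_negative:.3f}")
```

Because each product multiplies a rate conditioned on disease status by a rate conditioned on the test result, a formula must do well on both to score highly.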

Notably, discrimination formulae for BTT screening are developed on the principle of binary categorization, and the diagnostic classification of patients depends on whether the measurement of a trait is above or below some specific cut-off point. The rationale behind variations in recommendation is that individuals with actual levels close to that cut-off point are more likely to be misclassified, owing to intra-individual variability of the underlying traits or to the influence of uncontrolled covariates. In that sense, the evaluation conducted in this study is exhaustive in terms of the total number of formulae included and the inclusion of different variants in the test and validation sets. For instance, [48] reported that Index 26 is one of the best-performing formulae. However, we found a lower performance; this might be due to the variation in samples and the trade-off among the multiple measures under consideration. From the perspective of clinical implementation, the validation of Index 26 also requires considerable effort. Additionally, the formula might be biased due to duplication within the twenty-six formulae: as reported by [76], the Kandhro-II formula [46] is identical to the Ricerca formula [26], and the Keikhaei formula [35] is a duplicate of the Jayabose RDW formula [30]. Similarly, Janel 11T aggregates the performance of eleven indices. Although it outperformed Index 26 in the final ranking, such aggregation introduces complexity into the mass screening process.

Discrimination of BTT in mass screening has been a research priority in recent years. Formulae based on RBC parameters have several advantages, the most important being that they are less expensive for mass screening because RBC parameters are generated automatically during hemogram testing, independent of clinical suspicion or requisition. We found as many as seven CBC parameters used to construct the discrimination formulae, as shown in Table 1. In addition, formulae such as \(\text{M/H ratio} = \frac{\%MicroR}{\%HypoHe}\) [77] and \(MSI = \frac{\%MicroR}{MCV} \times MCHC \times Hb\) [19] use parameters that require more costly analyzers. We excluded these formulae from the present evaluation. However, the consensus is that a formula must contain MCV, RDW, and RBC to ensure the best outcomes [76]. According to the WHO, samples with MCV<80 fl are to be considered as microcytic anemia. In India, where anemia is an epidemic, the discrimination of BTT is a challenging task. In this context, the recommended range of MCV for BTT screening is 50 - 80 fl. Similarly, [78] recommend an MCV range of < 80 fl for adults, < 70 fl for children six months to six years of age, and < 76 fl for children seven to 12 years of age. [10] recommended that the MCV range should be 60 - 70 fl for BTT carriers. Note that a cut-off value for MCV is widely used in defining the screening strategy for BTT; for example, we found the following recommendations for the upper threshold: \(<79\) [79]; \(<80\) [69]; \(<76\) [80]; and \(<76\) [70]. Additionally, researchers have highlighted the importance of MCV, along with other RBC parameters, for discriminating BTTs [81,82,83]. Our study identifies the range as MCV \(\le\) 80 fl, which is also within the recommended bounds. The validation data also support this finding.
In the original study [18] that derived SCS\(_{BTT}\), the authors emphasized securing 100% sensitivity and considered the infimum and supremum of each parameter when defining the cut-off values in the machine learning algorithm. In this study, we undertook extensive validation and found that the formula can be applied to heterogeneous samples within the MCV threshold. Additionally, in the Indian context, \(\beta\)-thalassemia carriers with MCV>80 fl are not exceptional [84]. The authors also reported that twenty out of 149 \(\beta\)-thalassemia carriers in their samples showed HbA2<3.5%. We also found some samples with similar characteristics (6 samples with HbA2 \(\le\) 4) in our data set. Therefore, excluding samples with HbA2 \(\le\) 4 or MCV>80 might mislead the outcome. Accordingly, as presented in Fig. 1b, we recommend further examination for those individuals to minimize the risk of missing carriers.
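
One possible reading of the two-step recommendation can be sketched as follows. This is an illustrative assumption, not the scheme of Fig. 1b or the SUSOKA application: the order of the steps, the returned labels, and the function names are hypothetical, and the SCS\(_{BTT}\) rule itself is not reproduced here, so it is passed in as a callable and replaced by a clearly marked placeholder in the example.

```python
from typing import Callable, Mapping

MCV_CUTOFF_FL = 80.0  # MCV threshold (fl) discussed in the text


def two_step_screen(
    cbc: Mapping[str, float],
    scs_btt_positive: Callable[[Mapping[str, float]], bool],
) -> str:
    """Hypothetical two-step triage of one CBC record.

    Step 1: samples with MCV above the cut-off fall outside the range
            in which 100% sensitivity was hypothesized, so they are
            flagged for further examination.
    Step 2: otherwise the supplied screening rule decides.
    """
    if cbc["MCV"] > MCV_CUTOFF_FL:
        return "further examination"
    if scs_btt_positive(cbc):
        return "suspected BTT"
    return "BTT unlikely"


def placeholder_rule(cbc: Mapping[str, float]) -> bool:
    """Stand-in rule for demonstration only. This is NOT SCS_BTT."""
    return cbc["RBC"] > 5.0


# Hypothetical CBC records (MCV in fl, RBC in 10^12/L).
print(two_step_screen({"MCV": 85.0, "RBC": 5.5}, placeholder_rule))
print(two_step_screen({"MCV": 65.0, "RBC": 5.8}, placeholder_rule))
print(two_step_screen({"MCV": 78.0, "RBC": 4.2}, placeholder_rule))
```

Passing the rule as a parameter keeps the triage logic separate from the formula, so the same skeleton could be exercised with any of the discrimination formulae under evaluation.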

Future research

Some limitations of the present study should also be considered. First, because thalassemia data were collected from two hospitals, we did not obtain sufficient data with demographic variables for evaluation. Second, the available data mainly cover particular age groups, which could affect the results of our study. Therefore, sufficient data, as proposed by [85], are needed for further evaluation. Further research is warranted to validate the diagnostic value of SCS\(_{BTT}\) in a population consisting of microcytic anemia and various other types of anemia.

Conclusion

Therapies for \(\beta\)-thalassemia are not only costly but also require a lifelong commitment to sustain life. Early detection through screening based on RBC parameters is thus a feasible cost- and resource-saving option. The results of the present study showed that, in a two-step procedure, SCS\(_{BTT}\) can classify all the BTT samples with 100% sensitivity when MCV \(\le\) 80 fl, even if the sample includes borderline cases for the HbA2 measure or double heterozygotes. The key conclusions from this study are as follows. First, no single formula achieves the best value on all performance measures for screening BTT; applying MCDM methods to obtain the final ranking is therefore an effective way to select formulae. Second, formulae based on RBC parameters have several advantages, as these parameters are available independent of clinical suspicion or requisition. Third, an MCV value \(\le\) 80 fl can be an essential cut-off for discriminating BTT, but it is insufficient on its own. Therefore, SCS\(_{BTT}\) together with the condition MCV \(\le\) 80 fl is recommended, following validation with data collected from a different institute.