Introduction

Beta thalassemia, the most common inherited monogenic disorder, is a major public health issue in the tropics and is being increasingly recognized worldwide. [1, 2] Considerable expenditure of public money is needed for prevention, control and treatment of the disease. [3, 4] India has the largest number of children with thalassemia major worldwide, nearly 150,000, with 10,000–15,000 new cases added every year. In addition, there are around 42 million β-thalassemia carriers in the country, with an average prevalence rate of 3–4%. [5] Although certain haemoglobin (Hb) variants are predominantly found in particular areas of the country (e.g. HbE and HbE-beta thalassemia are mainly found in West Bengal and the north-eastern parts of the country), population migration and admixture mean that the entire spectrum of thalassemia and Hb variants is now found all over the country. [6] In an inherited disorder with variable genotype-phenotype associations, the only way to eradicate the disease is to have a robust screening program in place, one which can conclusively detect the carrier state and support genetic counselling of couples planning a family. Thus, carrier detection has a major impact on the control of this disease. Current population screening programs are based on Hb high-performance liquid chromatography (HPLC). [7] The aim of this paper was to analyse the data collected at thalassemia screening camps and determine carrier prevalence rates, demographic profiles and the distribution of the carrier state across various communities in the region. RBC indices were analysed to generate a novel formula to distinguish between normal individuals and β-thalassemia carriers.

Material and methods

Study design

This was a retrospective data analysis of 21,695 individuals screened for thalassemia. The study covered a period of 5 years (January 2014–December 2019). Informed consent was taken from all participants. Demographic data, HPLC data and haemogram parameters were recorded for analysis. The individuals in the study population were screened at camps, outreach centres, schools and antenatal clinics organised by the Thalassemia Control Unit of the Institute of Haematology and Transfusion Medicine (IHTM), Medical College Hospital, Kolkata, West Bengal. IHTM, which is the nodal centre for thalassemia control in the state, performs routine antenatal screening for thalassemia in the form of pre-conceptional testing, prenatal testing at 8–12 weeks of gestation and chorionic villus sampling (CVS) in applicable cases. However, this study was done only to determine the prevalence of thalassemia carriers and to generate a novel screening formula; thus, prenatal diagnosis and CVS data have not been included here. Thalassemia major patients detected by HPLC were also excluded.

Haemogram and HPLC analysis

Blood samples were run on the Sysmex KX 21 haematology analyser [Sysmex Corporation, Japan] and HPLC was done on the Bio-Rad Variant Haemoglobin Testing System (Beta-thalassemia short program) [Bio-Rad, California, USA]. Peripheral blood smear examination was done to validate the haemogram results. For each of the RBC parameters, viz. haemoglobin, mean corpuscular volume (MCV), mean corpuscular haemoglobin (MCH), mean corpuscular haemoglobin concentration (MCHC), RBC count and red-cell distribution width (RDW), mean values ± 2 standard deviations were recorded. Two groups were defined for the subsequent analysis: group A, with normal Hb HPLC, and group B, with β-hemoglobinopathy carrier status on Hb HPLC.

Statistical analysis

The data were collected, compiled and analysed using SPSS version 19.0 [IBM Inc., Armonk, New York, USA]. Qualitative variables were expressed as percentages. Quantitative variables were either categorised and expressed as percentages or expressed as means and standard deviations. The difference between two proportions was analysed using the chi-square or Fisher's exact test. All analyses were two-tailed and the significance level was set at 0.05. Stepwise discriminant analysis was applied to RBC indices to generate a formula to distinguish between thalassemia carriers and normal individuals.
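The comparison of two proportions described above can be illustrated with a minimal sketch. It computes the Pearson chi-square statistic for a 2×2 table without continuity correction and compares it with 3.841, the critical value for one degree of freedom at a significance level of 0.05. The function names and all counts are hypothetical and chosen only for illustration; the analysis in this study was run in SPSS, and Fisher's exact test (preferred when expected cell counts are small) is not shown.

```python
# Pearson chi-square test for a 2x2 table (no continuity correction).
# All counts used below are hypothetical, not data from the study.

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_ALPHA_005 = 3.841


def chi_square_2x2(carriers_1, normals_1, carriers_2, normals_2):
    """Return the Pearson chi-square statistic for two groups."""
    a, b, c, d = carriers_1, normals_1, carriers_2, normals_2
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / margins


def proportions_differ(carriers_1, normals_1, carriers_2, normals_2):
    """True when the two carrier proportions differ at the 0.05 level."""
    chi2 = chi_square_2x2(carriers_1, normals_1, carriers_2, normals_2)
    return chi2 > CHI2_CRITICAL_DF1_ALPHA_005


if __name__ == "__main__":
    # Hypothetical groups: 10 carriers of 100 versus 30 carriers of 100.
    print(chi_square_2x2(10, 90, 30, 70))
    print(proportions_differ(10, 90, 30, 70))
```

Two groups with identical carrier proportions give a statistic of zero, so the function reports no difference in that case.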

Results

Demographic data

A total of 17,764 female and 3931 male individuals were screened. Mean age of the population was 22 years (age range: 1 to 75 years). Most of the respondents in the study were antenatal mothers (n = 14,891; 68.6% of the total number screened). Of the 21,695 screened individuals, 19,432 had normal Hb HPLC and 2263 were β-thalassemia/hemoglobinopathy carriers (by HPLC). The carriers comprised beta thalassemia trait (941/2263; 41.5%), HbE trait (1128/2263; 49.84%), HbS trait (100/2263; 4.41%), HbD trait (18/2263; 0.79%) and delta beta trait (18/2263; 0.79%). Other rare types constituted 58/2263 (2.56%) cases and included Hb Lepore and HPFH trait. HbE trait and beta thalassemia trait comprised 5.2% and 4.3% of the total individuals, respectively, making HbE trait the most common hemoglobinopathy in this region. Combining these two most common traits, the average thalassemia carrier rate in the screened population was around 10%. The carrier rate of all hemoglobinopathies was 9.2% in females and 15.9% in males. The major districts covered by the camps were those in the vicinity of the hospital and included Kolkata (11,017/21,695; 50.8%) and South-24-Parganas (9160/21,695; 42.2%). The other districts, viz. North-24-Parganas, Coochbehar, Birbhum, Bankura, Hooghly and Darjeeling, to name a few, comprised the remaining 7% of the total population. There was no significant difference in the prevalence of the carrier state between the two major religious communities (Hindus and Muslims) in the region, which comprised 47.68% (n = 1079/2263) and 52.01% (n = 1177/2263) of the total carrier population, respectively (Tables 1 and 2).
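The proportions quoted above follow directly from the reported counts. The short sketch below recomputes each trait's share of the carrier group and of the whole screened population; the variable and function names are illustrative assumptions, while the counts are those given in the text. Values obtained by rounding may differ in the last digit from the reported percentages.

```python
# Recompute carrier proportions from the counts reported in the text.
carrier_counts = {
    "beta thalassemia trait": 941,
    "HbE trait": 1128,
    "HbS trait": 100,
    "HbD trait": 18,
    "delta beta trait": 18,
    "other rare types": 58,
}
total_screened = 21695
total_carriers = sum(carrier_counts.values())


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Overall carrier rate among all screened individuals.
overall_rate = percent(total_carriers, total_screened)

if __name__ == "__main__":
    for trait, count in carrier_counts.items():
        share_of_carriers = percent(count, total_carriers)
        share_of_screened = percent(count, total_screened)
        print(f"{trait}: {share_of_carriers:.2f}% of carriers, "
              f"{share_of_screened:.2f}% of screened")
    print(f"all carriers: {overall_rate:.2f}% of screened")
```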

Table 1 Demographic details of the study population
Table 2 Hb and RBC indices of the study population

Analysis of haemogram parameters

The means of all the haemogram parameters were compared between the normal individuals and thalassemia carriers. The difference of means was found to be statistically significant for all parameters (p < 0.001) (Table 3). As expected, the mean haemoglobin, MCV, MCH, MCHC and haematocrit were higher in normal individuals while the RDW and RBC counts were higher in the carriers (Table 2).

Table 3 Analysis of haemogram parameters in the two groups

Development of the novel index

After the analysis of haemogram parameters, stepwise discriminant analysis was applied to the RBC indices. The outcome was then expressed as an equation:

$$ \mathrm{Outcome} = 6.59 \times \mathrm{RBC} + 0.527 \times \mathrm{MCH} - \left( 0.782 \times \mathrm{HCT} + 0.395 \times \mathrm{MCHC} + 0.02 \times \mathrm{RDW} + 1.365 \right) $$

An outcome value greater than 0.5 indicated that the individual was probably a thalassemia carrier, while a value of 0.5 or less implied a normal individual. The formula was validated on a separate dataset of 1200 individuals, also collected through screening camps, and was found to have a sensitivity of 74.51%, specificity of 90.82%, positive predictive value of 47.99%, negative predictive value of 96.91% and an accuracy of 89.2%.
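As a minimal sketch of how the formula and its cut-off could be applied, the code below evaluates the outcome for one set of RBC indices and derives the validation measures from a 2×2 table of counts. The function names, the assumed units (RBC in million/µL, MCH in pg, HCT in %, MCHC in g/dL, RDW in %), the example haemogram values and the confusion-matrix counts are all hypothetical; they are not the study's validation data.

```python
# Apply the discriminant formula and the 0.5 cut-off described in the text.
# Units are assumed: RBC million/uL, MCH pg, HCT %, MCHC g/dL, RDW %.

CUTOFF = 0.5


def carrier_outcome(rbc, mch, hct, mchc, rdw):
    """Evaluate the outcome equation for one haemogram."""
    return (rbc * 6.59 + mch * 0.527
            - (hct * 0.782 + mchc * 0.395 + rdw * 0.02 + 1.365))


def is_probable_carrier(rbc, mch, hct, mchc, rdw):
    """Outcome above 0.5 flags a probable carrier; 0.5 or less is normal."""
    return carrier_outcome(rbc, mch, hct, mchc, rdw) > CUTOFF


def screening_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from true/false positives and negatives."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical haemograms, for illustration only.
    print(is_probable_carrier(rbc=5.8, mch=20, hct=36, mchc=31, rdw=16))
    print(is_probable_carrier(rbc=4.5, mch=29, hct=40, mchc=33, rdw=13))
    # Hypothetical confusion-matrix counts, not the study's validation set.
    print(screening_metrics(tp=75, fp=90, tn=810, fn=25))
```

A rule of this form is meant only to flag individuals for confirmatory Hb HPLC, so a high negative predictive value is the property that matters most for a first-line screen.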

Discussion

There are some studies on the prevalence and distribution of thalassemia traits in India, and the reported figures closely match ours. Sachdev et al. found a beta-thalassemia trait prevalence rate of 8.9% in northern India, while Dolai et al. found prevalence rates of 10.38% and 4.30% for β-thalassemia trait and HbE trait, respectively. [8, 9] In those studies, β-thalassemia trait appears to be the most common hemoglobinopathy, whereas in our study the prevalence of HbE trait was slightly higher than that of β-thalassemia trait. [8,9,10] This is in keeping with the increased regional prevalence of HbE in Eastern India. There were no community-specific or regional variations in the prevalence of thalassemia carriers in the study population. Consanguinity is prevalent in parts of the country where this study was done and becomes an important issue when evaluating a couple for prenatal/pre-conceptional testing. Since prenatal diagnosis was not the primary focus of this study, we did not evaluate this aspect. Contrary to the principles of thalassemia genetics and inheritance patterns [11], a higher carrier frequency was noted in males (15.9% versus 9.2% in females); this is because the men screened were husbands of women detected to be thalassemia carriers at antenatal clinics.

In this study, we used discriminant analysis, similar to Zaghloul et al. and Schoorl et al., and generated the formula utilizing most of the RBC indices. [12, 13] The formula has very good specificity, negative predictive value and accuracy, and can differentiate true carriers from normal individuals in the general population. It was generated from a large dataset representing all age groups, sexes, haemoglobin levels and ethnicities. It is a more inclusive formula, taking into consideration a large spectrum of haemogram values and a wide variety of thalassemia carrier states. Numerous attempts have previously been made to distinguish normal individuals, individuals with iron deficiency and thalassemia carriers based on RBC parameters. Of the available formulae/indices, Shine and Lal provided an index which utilized MCV and MCH to discriminate carriers from normal individuals. Their index gave a sensitivity of 98.74% and a specificity or accuracy of 88%. [14] HbA2 ≥ 4% is the usual cut-off used for detection of β-thalassemia trait, while values between 3.6 and 3.9% are considered borderline, and individuals with these values are advised partner screening as part of pre-natal counselling. It is also known that HbA2 levels are lower in individuals with concomitant iron deficiency, and it is recommended that iron deficiency anaemia (IDA) be treated before performing HPLC. [15] However, Khera et al. in 2015 reported that, although HbA2 values could be low in IDA, they were discriminatory enough to clearly identify true carriers. [16] Roth et al., in 2018, published an excellent review of the available literature on this topic and generated a formula using a support vector machine (SVM) algorithm. Sensitivity using the SVM approach was > 99% with an accuracy of 88%. They used only MCV and MCH in this mathematical model to derive their results.
It is interesting to note that Roth et al. analysed data after excluding cases with haemoglobin levels less than 9 g/dl, as a lower Hb value was thought to interfere with the accuracy of Hb HPLC. [17] Das et al. have recently published a decision tree to distinguish β-thalassemia trait and HbE from normal individuals using scores developed by artificial neural networks. They used all the RBC indices and achieved very good sensitivities and negative predictive values of 100%, with false positive rates ranging from 20 to 40% in validation cohorts. [18] We also used all the RBC indices and a wide range of haemoglobin levels (7 g/dl to 15 g/dl) to obtain satisfactory results. Our dataset also included age groups in the range of 1–75 years; it is therefore more inclusive and accounts for age-related variation in RBC indices.

This study has a few limitations. A major one is the lack of iron profile data for the individuals screened. This precluded stratification of the individuals into iron-deficient and iron-replete groups, which would have allowed a more meaningful evaluation of RBC indices and generation of the formula. Another limitation is the skew of the respondents towards antenatal mothers, together with the apparent increase in the prevalence of male carriers for the reasons discussed above. In addition, the stepwise discriminant analysis did not give results comparable to methods such as artificial neural networks used by other authors to achieve the same goal.

Conclusion

Knowledge of the demographics of the thalassemia carrier state in the regions where control projects are running is of utmost importance. It helps in identifying endemic zones, changes in prevalence, common variants and the effectiveness of the screening and control programme and, above all, in formulating health-care, administrative and clinical practice guidelines. In addition, specialized tests such as pre-natal screening for mutations can be designed in a much more cost-effective way, to specifically pick up the common mutations or haemoglobin variants in that region. For example, regional differences exist in India; Eastern India, with a higher prevalence of HbE than the national average, is a case in point. The only method of eradicating thalassemia is proper screening and genetic counselling of couples. The future of screening for thalassemia would be to develop a robust computer algorithm (using machine learning and artificial intelligence) which would utilize RBC parameters to distinguish thalassemia carriers from normal individuals with considerable accuracy. Such software could be incorporated within haematology auto-analysers and mobile phone and web-based applications. It would be much cheaper than Hb HPLC, and HPLC could then be used only for confirmation of carrier status in the flagged individuals.