Abstract
Background
While precision medicine algorithms can be used to improve health outcomes, concerns have been raised about racial equity and unintentional harm from encoded biases. In this study, we evaluated the fairness of using common individual- and community-level proxies of pediatric socioeconomic status (SES) such as insurance status and community deprivation index often utilized in precision medicine algorithms.
Methods
Using 2012–2021 vital records obtained from the Ohio Department of Health, we geocoded and matched each residential birth address to a census tract to obtain community deprivation index. We then conducted sensitivity and specificity analyses to determine the degree of match between deprivation index, insurance status, and birthing parent education level for all, Black, and White children to assess if there were differences based on race.
Results
We found that community deprivation index and insurance status fail to accurately represent individual SES, either alone or in combination. We found that deprivation index had a sensitivity of 61.2% and specificity of 74.1%, while insurance status had a higher sensitivity of 91.6% but lower specificity of 60.1%. Furthermore, these inconsistencies were race-based across all proxies evaluated, with greater sensitivities for Black children but greater specificities for White children.
Conclusion
These race-based inaccuracies may explain some of the racial disparities present in precision medicine algorithms that utilize SES proxies. Future studies should examine how to mitigate the biases introduced by using SES proxies, potentially by incorporating additional data on housing conditions.
Background
Precision medicine, also called personalized medicine, involves the collection, integration, and analysis of multiple sources of health data to develop individualized insights about health and disease [1]. It has the potential to improve health outcomes by enhancing clinical decisions and treatment plans [1]. Ideally, precision medicine tools can avoid the biases inherent in the current practice of using heuristics and prior experiences to make medical decisions. In reality, however, these algorithms can recapitulate longstanding health disparities and unintentionally perpetuate unequal distribution of resources [1, 2]. There are growing concerns about racial equity in precision medicine algorithms because race is often used as a predictor even though inequities can also arise due to historical and systemic disparities between racial groups, including access to and utilization of healthcare [2, 3]. Misclassification and misprediction of disease among marginalized racial groups could exacerbate racial disparities by programming further inequities in healthcare access [1, 3]. Biases can be reflected in various stages of algorithm development, from collecting data to designing and implementing algorithms in clinical practice. Machine learning algorithms that are trained on datasets lacking racial and ethnic diversity can learn to associate certain characteristics with specific racial or ethnic groups, leading to biased predictions and recommendations. This can result in inaccurate assessments and suboptimal treatment for patients from underrepresented racial and ethnic backgrounds.
Using community-level socioeconomic status as a proxy for individual-level socioeconomic status is common in research that relies only on electronic health records, because individual-level socioeconomic data may not be readily available or easily obtained from those records alone. However, in more detailed cohort or registry studies, additional data on individual-level socioeconomic factors like education, income, and wealth can be collected and used in combination with community-level proxies to better understand the relationship between socioeconomic position and health outcomes. The choice of defining, operationalizing, and using individual- and area-level measures ultimately depends on the research question, the available data, and the level of detail needed to address the research objectives. Regardless, systemic problems with data systems and institutional racism mean that race, income, wealth, and education are highly correlated [4, 5].
Indeed, prediction algorithms often incorporate race as a proxy for otherwise unmeasured sociodemographic and environmental conditions such as socioeconomic status (SES) and air pollution that mediate most of the racial disparities in asthma [6] and a range of child health outcomes [7]. Birthing parent education or family income are ideal measures of individual SES, but many studies are retrospective and can only access information collected at the time of the study or available in existing electronic health records. As a result, researchers often use insurance type or community-level measures of SES in precision medicine algorithms. This may cause race-based differences in the accuracy of pediatric precision medicine tools. In this study, we evaluate the accuracy and fairness of using individual- and community-level proxies of SES such as insurance status and community deprivation index.
Methods
For this study, we obtained vital records from the Ohio Department of Health for children born in Ohio from 2012 to 2021, detailing information such as birthing parent age, child race as reported by birthing parent, insurance type, birthing parent education, and birthing parent’s residential address. This study was approved with a full board review by the Ohio Department of Health Human Subjects Institutional Review Board (IRB 00002180, protocol number 2023–10).
Because our gold standard of birthing parent education level used adult educational attainment, we excluded observations if the birthing parent age was less than 18, if insurance status or education level or deprivation index were missing, or if insurance was classified as “Other.” Insurance status was dichotomized into “private” and “non-private,” where “non-private” included those individuals on Medicaid or self-pay insurance. To perform pairwise comparisons between the different measures of SES, we dichotomized birthing parent education level into “low” (less than 12 years) and “high” (12 years or greater). We used an existing categorical measure of birth parent education that was defined as “Less than 12 years (did not complete HS),” “High school graduate or GED,” and “More than high school.” Similarly, we used a racial category designed for tabulation in the Ohio Department of Health’s Public Information Warehouse that consisted of five groups: “White,” “Black,” “Native American,” “Asian,” and “Pacific Islander/Hawaiian.” For binary comparisons, we considered “Black” and “White” racial groups in order to capture the social context of membership in a historically marginalized group, specifically with respect to housing and employment opportunities that uniquely define a person’s multifaceted socioeconomic status.
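The exclusion and dichotomization rules above can be sketched in code. This is a minimal illustration, not the authors' analysis code (which was written in R and is available only on request). The record layout, the field names, and the insurance labels "Private", "Medicaid", and "Self-pay" are assumptions made for the example; only the education category text and the "Other" insurance label are taken from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# Education category text as quoted in the Methods; the lowest category is "low".
LOW_EDUCATION = "Less than 12 years (did not complete HS)"

# Assumed labels for the payor types grouped as "non-private" in the text.
NON_PRIVATE_LABELS = {"Medicaid", "Self-pay"}


@dataclass
class BirthRecord:
    """Hypothetical record layout; not the Ohio Department of Health schema."""
    parent_age: int
    insurance: Optional[str]          # None when missing
    education: Optional[str]          # None when missing
    deprivation_index: Optional[float]  # None when missing


def is_eligible(record: BirthRecord) -> bool:
    """Apply the exclusions: age under 18, any missing measure, or 'Other' insurance."""
    if record.parent_age < 18:
        return False
    if (record.insurance is None
            or record.education is None
            or record.deprivation_index is None):
        return False
    return record.insurance != "Other"


def is_non_private(record: BirthRecord) -> bool:
    """Dichotomized insurance status: True for Medicaid or self-pay."""
    return record.insurance in NON_PRIVATE_LABELS


def is_low_education(record: BirthRecord) -> bool:
    """Dichotomized education: True for less than 12 years."""
    return record.education == LOW_EDUCATION
```

Keeping the eligibility check separate from the two dichotomizations makes it easy to count how many records each exclusion removes before any comparison is made.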
Each residential birth address was geocoded to a census tract and linked to a community deprivation index value to quantify community-level SES. Six different census tract-level measures derived from the American Community Survey (fraction of the population with income in the past 12 months below the poverty level; median household income in the past 12 months; fraction of the population ages 25 and older who have attained an education level of at least high school graduation or GED; fraction of the population with no health insurance coverage; fraction of households receiving public assistance income, food stamps, or SNAP in the past 12 months; and fraction of houses that are vacant) were reduced via principal component analysis to a single continuous variable that ranges between 0 and 1. High deprivation index was defined using a threshold equal to the 75th percentile of 2018 tract-level deprivation indices as weighted by their populations under the age of 18 (0.43) [8].
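A population-weighted percentile of the kind used for the high-deprivation threshold can be illustrated as follows. The text does not state which weighted-quantile definition was used, so this sketch assumes one common definition: the smallest index value at which the cumulative share of the under-18 population reaches the requested fraction. Whether a tract exactly at the threshold counts as high deprivation is also not stated; the `>=` comparison below is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def weighted_percentile(values: Sequence[float],
                        weights: Sequence[float],
                        q: float) -> float:
    """Smallest value whose cumulative weight share is at least q (0 < q <= 1).

    values  : tract-level deprivation indices
    weights : tract populations under age 18 (assumed non-negative)
    """
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and equal in length")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")

    pairs = sorted(zip(values, weights))  # ascending by index value
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]  # guard against floating-point shortfall


def is_high_deprivation(index: float, threshold: float = 0.43) -> bool:
    """Binary classification; treating ties as 'high' is an assumption."""
    return index >= threshold
```

Weighting by the child population, rather than counting each tract once, means the threshold describes the communities where children actually live, so roughly a quarter of children statewide would fall above it in the reference year.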
We created tables and plotted individual- and community-level SES proxies alongside birthing parent education level, both overall and by race, to visually detect patterns. We compared the binary classification of SES proxies (community deprivation index and insurance status) against birthing parent education level to quantify their sensitivity and specificity with respect to identifying a child with lower socioeconomic status. Sensitivity is the fraction of children correctly identified as lower SES among all children with lower SES. Specificity is the fraction of children correctly identified as high SES among all children with high SES. Sensitivity and specificity were calculated and plotted per racial group and overall to detect differences. We also compared the community material deprivation index across birthing parent education levels when defined as a three-level category (“Less than 12 years,” “High school graduate or GED,” “More than high school”). All data analyses were conducted in R, version 4.2.3.
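The sensitivity and specificity definitions above translate directly into counts from a two-by-two table. The sketch below is an illustration in Python rather than the R code used for the study; the function names and the boolean encoding (True meaning "classified as low SES") are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def sensitivity_specificity(proxy_low: Sequence[bool],
                            truth_low: Sequence[bool]) -> Tuple[float, float]:
    """Compare a binary SES proxy against the education-based reference.

    proxy_low : True when the proxy classifies the child as low SES
    truth_low : True when birthing parent education is 'low'
    Returns (sensitivity, specificity); a value is NaN when undefined.
    """
    tp = fn = tn = fp = 0
    for proxy, truth in zip(proxy_low, truth_low):
        if truth and proxy:
            tp += 1          # low SES, correctly flagged
        elif truth and not proxy:
            fn += 1          # low SES, missed
        elif not truth and not proxy:
            tn += 1          # high SES, correctly not flagged
        else:
            fp += 1          # high SES, wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def stratified_sensitivity_specificity(
        proxy_low: Sequence[bool],
        truth_low: Sequence[bool],
        groups: Sequence[Hashable]) -> Dict[Hashable, Tuple[float, float]]:
    """Compute the same metrics overall and within each group label."""
    buckets = defaultdict(lambda: ([], []))
    for proxy, truth, group in zip(proxy_low, truth_low, groups):
        buckets[group][0].append(proxy)
        buckets[group][1].append(truth)
    results = {"Overall": sensitivity_specificity(proxy_low, truth_low)}
    for group, (proxies, truths) in buckets.items():
        results[group] = sensitivity_specificity(proxies, truths)
    return results
```

Computing the metrics within each group from the same inputs as the overall estimate makes any gap between groups directly comparable, since only the subset of records changes.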
Results
Between 2012 and 2021, 1,257,391 children (93.8% of all birth records) met study inclusion criteria (Table 1). Newborns were primarily Non-Hispanic (94.4%) and White (76.0%), but 17.7% of newborns were Black and 6.3% were some other race. Most birthing parents had the highest level of education (60.2%, more than high school), but 13.0% had less than 12 years of education. Insurance status was roughly equal between private (51.1%) and non-private (48.9%) payors. Almost a third of all newborns (30.8%) were living in a community with high levels of material deprivation.
We first evaluated the relationships between birthing parent education level, the material community deprivation index, and non-private insurance status (Table 2). When defined as a three-level category (“Less than 12 years,” “High school graduate or GED,” “More than high school”), increasing birthing parent education level was associated with decreasing median community deprivation index among all births (0.47, 0.40, and 0.32, respectively). Although this trend is present within the subgroups of Black and White children, the median community deprivation index among Black children born to parents with more than a high school level of education (0.44) was equal to the median community deprivation index among White children born to parents with less than 12 years of education (0.44). Similarly, the fraction of births with non-private insurance decreases as birthing parent education level increases (Table 2); however, even among those with more than a high school-level education, a much larger percentage of Black children, compared to White children (62.5% versus 21.5%), had non-private insurance. Figure 1 displays the distribution of the community material deprivation index across birth parent education level when stratified by insurance status. Although community deprivation index decreases with increasing birthing parent education level and with private insurance status, Black children tend to be born into families living in communities with higher levels of material deprivation regardless of birthing parent education level or insurance status.
When SES proxies were evaluated by racial group, they were consistently more sensitive among Black children and more specific among White children (Fig. 2). For example, the community deprivation index was more sensitive among Black children (81.1%) compared to White children (53.7%) but was less specific (40.4% versus 81.0%). Likewise, insurance status was more sensitive among Black children (94.0%) compared to White children (90.4%) but was less specific (27.0% versus 67.0%). Overall, this means that these proxies identify low-SES Black children better than low-SES White children, but at the cost of identifying high-SES Black children less accurately than high-SES White children.
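The cost described above can be made concrete by converting the reported specificities into false positive rates, that is, the share of high-SES children whom a proxy classifies as low SES (100% minus specificity). The short calculation below uses only the percentages reported in this section; the dictionary layout and function name are illustrative.

```python
# Specificities (%) as reported in the Results, keyed by (proxy, racial group).
REPORTED_SPECIFICITY = {
    ("deprivation index", "Black"): 40.4,
    ("deprivation index", "White"): 81.0,
    ("insurance status", "Black"): 27.0,
    ("insurance status", "White"): 67.0,
}


def false_positive_rate(specificity_pct: float) -> float:
    """Percent of high-SES children misclassified as low SES."""
    return round(100.0 - specificity_pct, 1)


# False positive rate (%) for every proxy and group.
FALSE_POSITIVE_RATE = {
    key: false_positive_rate(spec) for key, spec in REPORTED_SPECIFICITY.items()
}

if __name__ == "__main__":
    for (proxy, group), rate in FALSE_POSITIVE_RATE.items():
        print(f"{proxy:18s} {group:6s} {rate:.1f}% of high-SES children flagged as low SES")
```

Under these figures, the deprivation index misclassifies 59.6% of high-SES Black children compared with 19.0% of high-SES White children, and insurance status misclassifies 73.0% compared with 33.0%.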
Discussion
In this study, we found that community deprivation index and insurance status, alone or in combination, fail to accurately represent individual-level SES, as measured by birthing parent education level, and that inconsistencies differ by race. Specifically, these SES proxies tend to overestimate the amount of socioeconomic deprivation experienced by Black children as compared to White children. This means epidemiologic studies that utilize these SES proxies could be introducing exposure assessment error that is differential by race.
Because asthma is a leading cause of pediatric health disparities [9], one area of concern is racial equity in pediatric asthma prediction algorithms [10]. The goal of precision medicine in pediatric asthma is to identify children at risk earlier so that they can receive necessary preventative care. As this is contingent on accurate prediction, studies have begun investigating the reliable prediction of asthma risk to understand biological mechanisms, inform early interventions, and improve long-term respiratory outcomes [4, 11, 12]. Among the available prediction tools for childhood asthma, we previously found that racial disparities existed in three of four criteria for both the Asthma Predictive Index (API) and Pediatric Asthma Risk Score (PARS) [10]. Race is often incorporated as an approximation of not only environmental factors such as air pollution but also unmeasured sociodemographic factors such as poverty and income, which are considered as proxies for SES and therefore risk factors for the development of pediatric asthma [6, 13]. Clinically, the disparities in inaccuracies we found in this study mean that using proxies for SES will overpredict morbidity among Black children, as Black children from high SES backgrounds will more often be misclassified as low SES and therefore at higher risk. Furthermore, the misclassification of Black children from high SES backgrounds and the increased rate of false positives mean that future health outcomes for Black children who are erroneously believed to be from low SES backgrounds may appear to be less related to SES. In health policy, underestimating the relationship between SES and health outcomes among Black children may lead to redistribution of resources away from Black children truly in need.
One limitation of our study is the restriction to records from the State of Ohio, which means that our results may not be transportable to other settings or states; however, our population-level coverage of all births in Ohio means that selection biases related to our target population of children in Ohio were likely minimal. Future research could examine how to identify and mitigate bias caused by using SES proxies in epidemiologic studies. Other sources of extant data, including information about property value and tenure at an address-specific level, could be used to supplement SES information for clinical and research use [13, 14]. Additionally, there are numerous other individual SES indicators that should be tested in the future. Regardless, clinicians and researchers should recognize the challenges that arise with measuring SES using community- and individual-level proxies and interpret findings accordingly. In conclusion, considering fairness in precision medicine should be a routine part of tool development and validation in order for healthcare professionals to be able to accurately screen at-risk children and provide more personalized care, thereby helping reduce disparities in health outcomes.
Data Availability
Vital records are housed at the Ohio Department of Health and are not available to non-study personnel.
Code Availability
The material deprivation index is openly available at https://geomarker.io/dep_index. Code for comparing racial fairness of socioeconomic measures is available upon request.
References
1. Ferryman K, Pitcan M. Fairness in precision medicine. Data Soc Res Inst. 2018;1:1–54.
2. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447–53. https://doi.org/10.1126/science.aax2342.
3. Vyas DA, Eisenstein LG, Jones DS. Hidden in plain sight - reconsidering the use of race correction in clinical algorithms. N Engl J Med. 2020;383(9):874–82. https://doi.org/10.1056/NEJMms2004740.
4. Braveman PA, Arkin E, Proctor D, Kauh T, Holm N. Systemic and structural racism: definitions, examples, health damages, and approaches to dismantling: study examines definitions, examples, health damages, and dismantling systemic and structural racism. Health Aff. 2022;41(2):171–8.
5. Cheng TL, Goodman E. Race, ethnicity, and socioeconomic status in research on child health. Pediatrics. 2015;135(1):e225. https://doi.org/10.1542/peds.2014-3109.
6. Biagini Myers JM, Schauberger E, He H, Martin LJ, Kroner J, Hill GM, et al. A Pediatric Asthma Risk Score to better predict asthma development in young children. J Allergy Clin Immunol. 2019;143(5):1803-1810.e2. https://doi.org/10.1016/j.jaci.2018.09.037.
7. Biagini JM, Martin LJ, He H, Bacharier LB, Gebretsadik T, Hartert TV, Jackson DJ, Kim H, Miller RL, Rivera-Spoljaric K, Schauberger EM. Performance of the Pediatric Asthma Risk Score across diverse populations. NEJM Evidence. 2023;EVIDoa2300026.
8. Brokamp C, Beck AF, Goyal NK, Ryan P, Greenberg JM, Hall ES. Material community deprivation and hospital utilization during the first year of life: an urban population–based cohort study. Ann Epidemiol. 2019;30:37–43. https://doi.org/10.1016/j.annepidem.2018.11.008.
9. Urquhart A, Clarke P. US racial/ethnic disparities in childhood asthma emergent health care use: National Health Interview Survey, 2013–2015. J Asthma. 2020;57(5):510–20. https://doi.org/10.1080/02770903.2019.1590588.
10. Pennington J, Rasnick E, Martin LJ, Biagini JM, Mersha TB, et al. Racial fairness in precision medicine: pediatric asthma prediction algorithms. Am J Health Promot. 2023;37(2):239–42. https://doi.org/10.1177/08901171221121639.
11. Savenije OEM, Kerkhof M, Koppelman GH, Postma DS. Predicting who will have asthma at school age among preschool children. J Allergy Clin Immunol. 2012;130(2):325–31. https://doi.org/10.1016/j.jaci.2012.05.007.
12. Castro-Rodriguez JA. The Asthma Predictive Index: a very useful tool for predicting asthma in young children. J Allergy Clin Immunol. 2010;126(2):212–6. https://doi.org/10.1016/j.jaci.2010.06.032.
13. Nkoy FL, Stone BL, Knighton AJ, Fassl BA, Johnson JM, Moloney CG, Savitz LA. Neighborhood deprivation and childhood asthma outcomes, accounting for insurance coverage. Hosp Pediatr. 2018;8(2):59–67. https://doi.org/10.1542/hpeds.2017-0032.
14. Cusick MM, Sholle ET, Davila MA, Kabariti J, Cole CL, Campion TR Jr. A method to improve availability and quality of patient race data in an electronic health record system. Appl Clin Inform. 2020;11(5):785–91. https://doi.org/10.1055/s-0040-1718756.
Acknowledgements
The data used in this study were obtained from the Ohio Department of Health (ODH) Bureau of Vital Statistics. Use of these data does not imply ODH agrees or disagrees with any presentations, analyses, interpretations, or conclusions.
Funding
This work was supported by the National Institutes of Health [award numbers 2UL1TR001425-05A1, 2T35HL113229-01].
Author information
Contributions
All authors contributed to the study conception and design. The first draft of the manuscript was written by Amisha Saini, Harsimran Makkad, and Cole Brokamp. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics Approval
This study was approved with a full board review by the Ohio Department of Health Human Subjects Institutional Review Board (IRB 00002180, protocol number 2023–10).
Consent to Participate
This study used data previously collected on research participants. This study did not interact with, collect data from, or consent any study participants.
Consent for Publication
We consent to publishing this manuscript. It has not been published elsewhere and is not under consideration for publication elsewhere.
Conflict of Interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Makkad, H., Saini, A., Manning, E.R. et al. Racial Fairness of Individual- and Community-Level Proxies of Socioeconomic Status Among Birthing Parent–Child Dyads. J. Racial and Ethnic Health Disparities (2024). https://doi.org/10.1007/s40615-024-02050-9
DOI: https://doi.org/10.1007/s40615-024-02050-9