Background

A growing body of literature has documented the disproportionate impact of the coronavirus disease 2019 (COVID-19) pandemic on underserved communities in the USA [1, 2]. In the first year of the pandemic, however, due to limitations in access to testing and the high frequency of asymptomatic infections, official reports underestimated actual infection rates with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19. As a result, public health officials were left with an incomplete picture of viral spread, associated risk factors, and potential disparities [3, 4]. To address these limitations, seroepidemiologic prevalence studies (serosurveys), which measure the presence of anti-SARS-CoV-2 antibodies as a marker of prior infection, are an important public health tool to estimate incidence and guide public health responses [5, 6].

In the USA, the Commonwealth of Massachusetts (MA), fueled by a super-spreader event of SARS-CoV-2, became one of the early epicenters of the pandemic [7]. Early indications were that Black, Hispanic, and other vulnerable communities were being disproportionately hospitalized, but robust data on the extent to which this was happening and whether it was related to higher infection rates, disparities in testing, social vulnerability, or other unknown factors was missing [8,9,10,11]. Holyoke is a post-industrial, majority Hispanic/Latino/Latina city in Massachusetts (53.9%) with high levels of socio-economic disadvantage [12]. Based on 2018 census data, the poverty rate and median household income were 29.7% and $40,656, respectively, compared to 10.8% and $79,835 for MA [13]. The city was disproportionately affected by the first surge of COVID-19 as evidenced by a high initial case count relative to the rest of MA [14, 15]. To fill the gap in our knowledge regarding risk factors for and disparities in SARS-CoV-2 infection and inform local public health responses, we conducted a serosurvey of SARS-CoV-2 antibodies in Holyoke, MA, between November 2020 and January 2021, shortly prior to the second wave of the pandemic in Massachusetts, which at the time was the largest and second deadliest wave in terms of incident cases and deaths, respectively.

Methods

Study Design and Sampling

We conducted a representative, cross-sectional household-based SARS-CoV-2 serosurvey in Holyoke, MA. We obtained an official list of the 17,828 addresses in the city’s 11 census tracts from city records. We identified and removed listings corresponding to congregate living facilities, vacant buildings, duplicate entries, and empty listings. The remaining addresses were considered eligible for sampling. We then randomly sampled 2000 addresses from this list expecting a 25% response rate (Supplementary material) [16].

Participant Recruitment

All study protocols were implemented using Spanish and English materials by bilingual study staff. Participants were enrolled from November 5, 2020, to December 31, 2020. The study team mailed an invitation letter to sampled addresses that contained a QR code and unique ID for an online bilingual survey hosted on REDCap (Research Electronic Data Capture) at Massachusetts General Hospital [17]. Among selected addresses, all individual household members aged ≥ 6 months and residing at the address were eligible. A household was defined as a group of persons who slept under the same roof most nights. Households were given a period of approximately 5 days to respond by either taking the online survey or contacting the study team. Households could opt out of future communication by calling a study phone number. Following that period, data collectors made reminder follow-up calls, where participants had the option of completing the survey over the phone. They also conducted home visits to follow up with households that lacked a telephone number, were unresponsive to calls, or requested a visit from the team. To raise awareness about the study and increase community engagement, the study team distributed fliers and made announcements via Facebook and local media in both English and Spanish.

Following the first month of recruitment, we mailed a second wave of invitation letters to the originally sampled households, excluding households that had either already completed their surveys or opted out of the study. Data collectors made follow-up phone calls to households with incomplete surveys and those who had not mailed back their samples.

Data Collection

The survey tool consisted of an eligibility and informed consent form, one household-level survey, and individual adult and child surveys for each consenting adult and child household member. Surveys included questions regarding sociodemographic characteristics, occupation and employment history, clinical history, symptoms, COVID-19 testing, and exposure history. Upon completion of these surveys, blood test kits were either mailed to participant addresses or physically provided during home visits. We provided a $25 gift card to each household in which at least one person completed the entire study.

Sample Collection, Transportation, and Laboratory Testing

Test kits contained supplies and instructions (in English and Spanish) for the self-collection of dried blood spot (DBS) samples. Individuals were instructed to perform a pinprick on the finger using a lancet and apply it to Whatman® Protein Saver 903 filter paper collection cards (https://www.cytivalifesciences.com). Once obtained, samples were return-mailed using a pre-addressed envelope to Massachusetts General Hospital (Boston, MA, USA) where they were stored at − 20°C with desiccant until tested. DBS sampling has been validated for use in antibody testing of SARS-CoV-2 and other pathogens [18,19,20].

DBS were eluted and tested for the presence of SARS-CoV-2 IgG and IgM receptor-binding domain of the spike protein of SARS-CoV-2 using a quantitative ELISA previously developed and validated using reverse transcriptase polymerase chain reaction (RT-PCR)-positive mild and severe SARS-CoV-2 infections and pre-pandemic samples at Massachusetts General Hospital [20,21,22]. The test specificity and sensitivity estimates were respectively 99.5% (95% CI 99.0–99.8) and 70.6% (95% CI 61.2–79.3%) for IgG antibodies and were calculated using samples from the Boston area (Supplementary Fig. 1). Further details are provided in Sect. 2 of the  Supplementary Materials (Supplementary Tables 2 and 3).

Statistical Analysis

Our main SARS-CoV-2 seroprevalence estimate was the proportion of the population that had IgG antibodies detected as this aligns with prior studies [23]. We also calculated seroprevalence estimates with corresponding 95% credible intervals (CI) for the following combinations of IgG and IgM antibody positivity: any IgG, any IgM, IgG only, IgM only, and IgG or IgM.

We used a modified version of the Bayesian procedure proposed by Stringhini et al. [24] to calculate prevalence estimates with corresponding 95% CI’s. The procedure accounted for uncertainty in the antibody test sensitivity and specificity. Within the procedure, a random intercept was used to account for clustering by household, and weighting was applied to ensure the seroprevalence estimates were representative of the population of Holyoke. Details on the procedure are provided in Sect. 1 of the Supplementary Materials.

A raking procedure was used to construct weights based on the distribution of age, race and ethnicity, gender, and census tract in Holyoke from the 2019 American Community Survey (ACS) (Table 1). Notably, gender is reported as the percent of “Female persons” in Holyoke in the ACS survey results, such that we only have two categories for gender: Female and non-Female, which includes male, transgender, and non-binary persons. For the purposes of constructing weights, we collapsed sparse categories of race and ethnicity into a “Grouped category.” The census tracts were also collapsed into “high vulnerability” and “low vulnerability” groups based on the Centers for Disease Control and Prevention (CDC) social vulnerability index (SVI) [25]. The CDC SVI uses 16 US census indicators characterizing four domains (socioeconomic status, household characteristics, racial and ethnic minority status, and housing type/transportation) to generate domain-specific and overall social vulnerability rankings of census tracts relative to each other. These rankings can then be used by local officials to identify communities that may need support during emergencies such as pandemics. For this study, “high vulnerability” was defined as having an SVI greater than the 75th percentile of census tracts in Massachusetts—9 of the 11 census tracts were considered highly vulnerable.

Table 1 Unweighted demographic characteristics of survey participants compared with 2019 American Community Survey estimates for the city

We compared seroprevalence estimates and 95% CI’s across subgroups. To investigate disparities in known risk factors for SARS-CoV-2 infection, we reported sociodemographic and clinical factors stratified by race and ethnicity. Analyses were conducted in R V4.0.0 using survey and rstan packages [26, 27].

Patient Consent Statement

Written informed consent was obtained from adults 18 years or older. Informed parental consent and assent was obtained for children ages 14–17. Parental consent was obtained for children under the age of 14, with documented verbal assent by the caregiver sought for minors between the ages of 7 and 13.

Ethics Approval Statement

This protocol was reviewed by the Mass General Brigham Human Research Committee Institutional Review Board (Protocol ID: 2020P002560, November 2nd, 2020).

Results

Study Population

From the final list of 17,204 addresses, we randomly sampled 2000 and mailed invitation letters and followed up recruitment as described above. Two hundred eighty households (14%) with 472 individuals agreed to participate and completed household and individual-level questionnaires. Figure 1 demonstrates a complete flow diagram of participant progress through study phases. Supplementary Fig. 3 demonstrates timing of survey completion and dried blood spot sample collection. Mean household size was 2.28 individuals (Standard deviation [SD]: 1.2). Within participating households, an average mean of 1.77 household members (SD: 1.03) consented to participate. Of these, 197 households (70.4%) and 330 individuals (69.9%) completed and returned a DBS sample for analysis. A total of 328 samples from 195 households were analyzed. The mean household size for households with individuals that provided a DBS sample and were analyzed was 2.09 individuals (SD: 1.13), and the mean number of individuals per household that consented to participate was 1.71 (SD: 0.96).

Fig. 1
figure 1

Study participant flow diagram

Among the 280 households that completed the survey, 37 reported hospitalization of a household member since February 2020 (13.3%, 1 missing response). Only one hospitalization was reported to be a result of COVID-19. There were 2 reported deaths since February 2020 among 277 households (3 missing responses). Two hundred forty-eight of 471 (52.7%, 1 missing) respondents reported being tested for COVID-19 at some point prior to the survey. Twenty-five of 471 respondents (5.31%, 1 missing response) described being diagnosed by a healthcare provider with pneumonia or other respiratory infection (may include COVID-19 diagnosis) since February 2020. Twenty of the 468 respondents (4.3%, 4 missing response) reported testing positive for COVID-19 at least once.

Demographic characteristics including gender, age group, and race and ethnicity breakdown for the entire study population are listed in Table 1. Individuals from younger age groups (0–19 and 20–44 years of age) and individuals identifying as Hispanic/Latino/Latina were underrepresented in the study population relative to the population of Holyoke. We addressed this with a second round of invitation letters and by weighting the Bayesian model.

Citywide Seroprevalence of SARS-CoV-2 Antibodies

Of 328 individual samples tested, 27 individuals from 20 households were positive for SARS-CoV-2 IgG or IgM antibodies; after adjusting for clustering, differential response rates, and imperfect test sensitivity, this corresponded to a citywide seroprevalence estimate of 13.6% (95%CI 6.7–23.7). Twenty-five individuals were positive for SARS-CoV-2 IgG antibodies, corresponding to an adjusted citywide seroprevalence estimate of 13.1% (6.9–22.3%). Seroprevalence estimates for individuals with IgG only, IgM only, any IgG, any IgM, and IgG or IgM are listed in Table 2. Supplementary Fig. 2 provides seroprevalence estimates under various sensitivity estimates ranging from 60.2 to 78.8%, corresponding to overall IgG seroprevalence estimates ranging from 15.3 to 11.7%, respectively.

Table 2 Seroprevalence by antibody positivity profile

Seroprevalence of SARS-CoV-2 Antibodies by Sociodemographic Characteristics and Exposure History

Seroprevalence estimates were calculated for the following subgroups: age groups, gender, race and ethnicity, social vulnerability index, primary language spoken in the household, and other sociodemographic characteristics (Table 3). Seropositivity varied across multiple subgroups, though credible intervals tended to be wide. The seroprevalence estimate was highest among individuals 20–44 years old (17.6%; 95% CI 7.5–32.4) and decreased with age for ages > 44. The seroprevalence estimate was higher at 16.1% (95% CI 6.2–31.8) among individuals identifying as Hispanic/Latino/Latina compared to a seroprevalence estimate of 9.4% (95% CI 4.6–16.4) among individuals identifying as non-Hispanic white, corresponding to a risk difference of 6.6% (95% CI − 4.3 to 21.8). The seroprevalence estimate among Spanish-speaking households was 21.9% (95% CI 8.3–43.9) compared to 10.2% (95% CI 5.2–18.0) among English-speaking households, with a risk difference of 11.6% (95% CI − 2.3 to 32.2). Individuals living in high vulnerability areas (14.4%; 95% CI 7.1–25.5) had a higher seroprevalence estimate than individuals living in low vulnerability areas (8.2%; 95% CI 3.1–16.9), corresponding to a risk difference of 6.0% (95% CI − 3.6 to 17.5). The seroprevalence among individuals reporting an exposure to a household member was 72.4% (95% CI 32.6–99.7), while only 21.3% (95% CI 8.0–41.5%) among those reporting an exposure to a non-household member and 5.8% (95% CI 1.9–13.4%) among those reporting no known exposure to someone with COVID-19.

Table 3 Seroprevalence1 by sociodemographic characteristics

Sociodemographic, Symptom Testing, and Exposure History by Race and Ethnicity

Compared to non-Hispanic white individuals, individuals identifying as Hispanic/Latino/Latina were younger and had attained lower education levels (Supplementary Table 1). They had higher rates of unemployment at the start of the pandemic and were more likely to have their salary impacted by COVID-19. They were more likely to work at a place that offered no benefits such as paid sick leave or work-from-home and were more likely to use the bus as a means of transportation. Housing conditions were also different, with Hispanic/Latino/Latina individuals more likely to live in apartments or condominiums, rent rather than own their homes, and report a higher density of individuals living in the household (Supplementary Table 1).

Discussion

We estimated the citywide prevalence of SARS-CoV-2 IgG antibodies to be 13.1% in Holyoke at the end of a second surge of the pandemic in the Commonwealth of Massachusetts between November 2020 and January 2021, prior to widespread vaccination in this community, and shortly before the second wave of the pandemic, which at the time was the largest and second deadliest wave in terms of incident cases and deaths. Several groups demonstrated higher risk of prior infection than their counterparts in the city, based on the presence of SARS-CoV-2 antibodies.

Individuals identifying as Hispanic/Latino/Latina had a higher seroprevalence estimate compared to those identifying as non-Hispanic white, suggesting that these members of the community were at high risk of SARS-CoV-2 infection. Although credible intervals around effect estimates were wide, this finding is consistent with prior studies documenting racial and ethnic disparities in SARS-CoV-2 metrics affecting minoritized communities nationally and in Massachusetts, including disparities in ability to follow non-pharmacologic interventions, testing, infections, hospitalizations, and deaths [2, 9, 28, 29]. Additionally, a nationwide SARS-CoV-2 serosurvey of blood donations around this time period demonstrated that individuals identifying as Hispanic had the highest seroprevalence of SARS-CoV-2 antibodies of all racial and ethnic groups [30]. There are multiple potential mediators of these disparities, lending support to a true difference in risk. In Holyoke, most Hispanic communities live in census tracts characterized by high socioeconomic deprivation [25]. In our study population, compared to non-Hispanic white individuals, individuals identifying as Hispanic reported lower education levels, higher unemployment rates, had lower access to benefits such as paid sick leave or work-from-home, and were more likely to live in high-density housing. Public health responses to COVID-19 and future pandemics should be designed to directly mitigate these risk factors. For example, the provision of financial and social supports would aid individuals in adhering to public health efforts that mitigate disease spread.

Individuals from predominantly Spanish-speaking households, almost all of whom identified as Hispanic/Latino/Latina, had a seroprevalence estimate higher than individuals from English-speaking households. Our experience suggests that the availability of Spanish-language-concordant public health outreach was limited in Massachusetts during the early phases of the pandemic. The absence of linguistically concordant public health interventions may directly impact an individual’s ability to understand and apply preventive guidance and thus mediate SARS-CoV-2 infection risk [31, 32]. Further studies are needed to identify the key mediators of the relationship between race and ethnicity, social vulnerability, language, and risk of SARS-CoV-2 infection, which can then inform public health interventions tailored to these populations.

The overall seroprevalence measured in this study, when interpreted in the context of other seroprevalence studies and routine case-surveillance data, provides insight into the dynamics of the pandemic in the region. In April 2020, a serosurvey using convenience sampling of asymptomatic individuals in the predominantly Hispanic community of Chelsea, MA, demonstrated a seroprevalence of 31.5% [33]. This serosurvey was limited by non-representative convenience sampling that likely resulted in a biased estimate, and the use of a rapid lateral flow immunoassay. Later, between July and August 2020, a university-related population and their household members in Massachusetts demonstrated a lower seroprevalence of 4–5.3% [34]. Our findings in this study conducted several months later are consistent with the increasing number of reported cases during the second surge of COVID-19 in MA. However, the seroprevalence measured in this study was not as high as might be expected approximately 10 months into the pandemic, especially since by various metrics and media reports, Holyoke was one of the most COVID-19-impacted communities in the commonwealth early on in the pandemic [14, 15]. This may be in part due to the impact of decaying antibody titers, the kinetics of which vary depending on the population [35].

We also identified an important testing gap. Overall, the prevalence of any anti-SARS-CoV-2 antibodies (measured by IgG or IgM) in this survey corresponds to a cumulative case count of 5593 compared to the city’s actual case count of 3963 on January 28th, 2021 based on testing by RT-PCR [36]. By this estimate, nearly one third of all SARS-CoV-2 infections in Holyoke were undetected by existing surveillance and screening mechanisms. This level of underascertainment is lower than that demonstrated in other serosurveys throughout the USA, a finding that may be explained by the high availability of testing throughout Holyoke, where two public testing sites were established [37, 38]. However, this discrepancy highlights an important testing gap that should be addressed as we continue to respond to ongoing outbreaks of SARS-CoV-2. Missed infections, especially in a community that is already socially vulnerable, can result in delays in testing and appropriate care, and individuals being overlooked when public health resources are distributed.

In our study, seroprevalence was higher among individuals reporting a COVID-19 exposure that was a household member compared to a COVID-19 exposure to a non-household member. This finding corroborates the importance of intra-household exposures in the control of COVID-19 [39]. Given the role of intra-household transmission, public health interventions and resources should be targeted to preventing transmission in the household, such as providing isolation and quarantine sites outside the home, PPE for individuals taking care of sick family members, timely and sequential testing for exposed household members, and guidance on how to safely distance within the home.

This study has several limitations including the small sample size leading to estimates with wide 95% CI’s, limiting multivariable analyses. Second, our analysis does not account for waning antibody levels, which decay over time meaning that our estimation of prior infection may not include points in time early in the pandemic [20]. Third, because we did not interview individuals that declined to participate or did not respond to survey invitation, it is possible that non-response bias may be affecting our findings. For example, it is possible that response rates could be different between individuals that had already previously tested positive for COVID-19 and those that had not. To address non-response bias, we proactively followed up with households that did not respond to the study via telephone calls and home visits. Fourth, this study was conducted using a single assay. Studies have shown that variation in the sensitivity and specificity of the serologic test used in a serosurvey can affect seroprevalence estimates [40]. To limit this effect, we used a test that was previously validated in Massachusetts and conducted a sensitivity analysis using other plausible test sensitivity and specificity values.

Conclusion

In conclusion, in Holyoke, Massachusetts, a post-industrial, majority Hispanic/Latino/Latina city with high levels of socio-economic disadvantage in MA, USA, seroprevalence of SARS-CoV-2 IgG antibodies was 13.1% at the end of the first year of the pandemic. The risk of infection was higher among the Hispanic/Latino/Latina community, Spanish-speaking households, and communities with high social vulnerability. This knowledge contributed to the city’s targeted public health responses to ensure that high-risk groups would be equitably served. In Holyoke, MA—and in other areas of the USA—disparities in SARS-CoV-2 risk must be addressed through proactive public health interventions that respond to disparities in socially vulnerable communities. These efforts can be supported by rapid, methodologically robust seroprevalence studies undertaken by local boards of health.