Item Response Methodology
To inform the design of our unit self-response analysis, we investigate whether households with noncitizens, in particular, exhibit behavior consistent with citizenship question sensitivity by examining citizenship question nonresponse among households that returned the questionnaire and the consistency of answers with ARsFootnote 14 when the person being reported about (hereafter, person of interest) is an AR citizen versus an AR noncitizen. If only households containing noncitizens have concerns about the citizenship question, then we should see a higher incidence of problematic responses (skipping the question or providing an answer inconsistent with ARs) when respondents are asked about AR noncitizens, controlling for other relevant factors. This will help determine whether it is useful to compare all-citizen households with those potentially containing at least one noncitizen in our unit (household) self-response analysis.
Respondents could skip a question or provide an inconsistent response for other reasons, such as lack of knowledge regarding the person of interest’s characteristics or record linkage errors (the AR is for a different person) (see Tourangeau and Yan 2007). We control for these other reasons in several ways. First, we conduct a difference-in-differences analysis comparing problematic responses to the citizenship question with those to the age question for the same person of interest, separately for AR citizens and AR noncitizens. Problematic responses could occur for the same reasons for age and citizenship, with the exception that age responses are less likely to be related to citizenship question sensitivity. We classify age as inconsistent between the survey and ARs if the values differ by more than one year.
Second, we control for other relevant factors that could explain differences in problematic responses to age and citizenship by estimating multivariate regressions with controls that proxy for such factors. Then, we conduct a Blinder-Oaxaca decomposition (Blinder 1973; Oaxaca 1973)Footnote 15 of the differences between AR citizens and AR noncitizens into differences between the groups’ observed characteristics (explained portion) and other unobserved factors (unexplained portion). The explained portion includes differences in incidence across AR citizens and noncitizens of factors such as linguistic isolation, which may be associated with both citizenship status and a problematic response (via ability to understand the question). We attribute the unexplained portion to citizenship question sensitivity.
Before conducting the Blinder-Oaxaca decomposition, we estimate regressions for age and citizenship item nonresponse and age and citizenship status disagreement between the 2017 ACS and contemporaneous ARs. The regressions are of the following form:
$$ {Y}_{G_j AGE}={\mathbf{X}}_{G_j}^{\prime }{\upbeta}_{G_j AGE}+{\upvarepsilon}_{G_j AGE}. $$
(1)
$$ {Y}_{G_j CITIZENSHIP}={\mathbf{X}}_{G_j}^{\prime }{\upbeta}_{G_j CITIZENSHIP}+{\upvarepsilon}_{G_j CITIZENSHIP}. $$
(2)
Person of interest j belongs to one of two groups G ∈ {N, C}, where the N group (AR noncitizens) could be harmed by confidentiality breaches regarding a citizenship question or is otherwise sensitive to the question, while the C group (AR citizens) could not be. Eqs. (1) and (2) are estimated separately for the N and C groups. Y is the dependent variable for person j in group G, X is a vector of characteristics, β contains the slope parameters and intercept, and ε is a regression error term with a conditional mean of 0, given X.
In the item nonresponse regressions, Y is equal to 1 if there is no response for the question for person of interest j in group G, and 0 otherwise (even if the response was later edited or allocated). In the ACS–AR age disagreement regressions, Y is equal to 1 if the difference in age across sources is more than one year, and 0 otherwise. Persons who have age in AR data and reported age in the 2017 ACS are included in these regressions. For the ACS–AR citizenship disagreement regressions, Y is equal to 1 if the two sources indicate different citizenship statuses, and 0 if both sources agree. Persons who have AR citizenship and reported citizenship in the 2017 ACS are included in the citizenship disagreement regressions.
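The three dependent-variable definitions above can be sketched in code. This is a minimal illustration, not the authors' implementation: the function names and the use of `None` to represent a missing value are assumptions made for the example.

```python
from typing import Optional


def item_nonresponse(reported_value: Optional[object]) -> int:
    """Return 1 if the respondent gave no answer to the item, 0 otherwise.

    `reported_value` is the raw answer before any edit or allocation
    (hypothetical representation: None means the item was left blank).
    """
    return 1 if reported_value is None else 0


def age_disagreement(acs_age: Optional[int], ar_age: Optional[int]) -> Optional[int]:
    """Return 1 if ACS and AR ages differ by more than one year, 0 otherwise.

    Returns None when either source is missing, so the person can be
    dropped from the age disagreement regression sample.
    """
    if acs_age is None or ar_age is None:
        return None
    return 1 if abs(acs_age - ar_age) > 1 else 0


def citizenship_disagreement(
    acs_citizen: Optional[bool], ar_citizen: Optional[bool]
) -> Optional[int]:
    """Return 1 if the two sources indicate different statuses, 0 if they agree.

    Returns None when either source is missing (person excluded).
    """
    if acs_citizen is None or ar_citizen is None:
        return None
    return 1 if acs_citizen != ar_citizen else 0
```

Returning `None` rather than 0 for missing inputs keeps the sample restriction (persons observed in both sources) separate from the outcome coding.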
The X variables include person of interest j’s relationship to the reference person,Footnote 16 working in the last week, searching for a job in the last four weeks, race/ethnicity, and an indicator for better- or worse-quality person linkage;Footnote 17 reference person sex and educational attainment (less than high school, high school but less than bachelor’s degree, bachelor’s degree, and graduate degree); six household income categories; a household linguistic isolation indicator with three categories, including linguistically isolated households (no person 14 years or older speaks only English or reports speaking it “very well”), not linguistically isolated households (at least one person 14 years or older speaks another language at home, and at least one person 14 years or older speaks only English or reports speaking it “very well”), and only English (all persons 14 years and older speak only English at home); an indicator for self-response (equal to 1 for mail or Internet response, and 0 for in-person or telephone interview); share of households by block group with at least one noncitizen in the 2012–2016 five-year ACS; and share of households below the poverty level by block group in the 2012–2016 five-year ACS.
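The three-category linguistic isolation indicator described above can be written as a short classification rule. The sketch below is illustrative only: the `Member` record and its field names are hypothetical, and the treatment of a household with no member aged 14 or older is an assumption, since the text does not define that case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    age: int
    only_english_at_home: bool       # speaks only English at home
    speaks_english_very_well: bool   # relevant when another language is spoken


def linguistic_isolation(members: List[Member]) -> str:
    """Classify a household using persons aged 14 or older."""
    older = [m for m in members if m.age >= 14]
    # Assumption: a household with no member aged 14+ falls through to
    # "only_english" because all() over an empty list is True.
    if all(m.only_english_at_home for m in older):
        return "only_english"
    # At least one person 14+ speaks another language at home here.
    if any(m.only_english_at_home or m.speaks_english_very_well for m in older):
        return "not_isolated"
    return "isolated"
```

The order of the checks matters: ruling out the only-English case first guarantees that the second branch satisfies both conditions of the not linguistically isolated definition.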
Relationship may proxy for the amount of knowledge the reference person has about the person of interest. If so, less item nonresponse and disagreement would be expected when respondents report about themselves than about others, especially nonrelatives.Footnote 18 Alternatively, respondents may feel they have less right to disclose sensitive information about others. Social desirability could also lead to discrepancies with administrative data, and it is likely to be more of a factor when respondents report about themselves.
Linguistic isolation could be associated with misunderstandings from translation or interpretation, leading to item nonresponse and inconsistent reporting.Footnote 19 It could also proxy for how well the household is integrated into U.S. society. Households that are less well integrated may have less understanding about the survey, for example, leading to a less complete and accurate response. Reference person education and household income may also be associated with question comprehension. Reference person sex and person of interest race/ethnicity may be associated with different sensitivity to questions not specific to citizenship. Person of interest labor market activity could be associated with greater reference person knowledge about the person of interest’s citizenship status because the status may affect the person’s employment eligibility. Record linkage errors could cause inconsistent reporting because the AR and ACS persons would be different.
As mentioned, Tourangeau and Yan (2007) reported that studies have found less item nonresponse and inconsistent reporting about sensitive questions in self-responses (as opposed to interviewer-administered surveys), consistent with social desirability being a factor in interviews. McGovern (2004), however, reported item allocation rates for citizenship and other related questions that are twice as high in mail responses compared with telephone or personal interviews in the ACS.
Neighborhood shares of households below the poverty line or with noncitizens could be associated with different levels of openness on government surveys.
For the Blinder-Oaxaca decomposition, we create summary measures of problematic response to the age and citizenship questions. Each variable is set to 1 if the respondent does not provide a response to the question, the respondent’s answer is edited,Footnote 20 or the answer is inconsistent with ARs; and it is 0 if an answer is provided that is consistent with ARs. Cases in which ARs are missing are excluded. We set the problematic-response dependent variables \( {Y}_{G_j AGE} \) and \( {Y}_{G_j CITIZENSHIP} \) equal to 1 if the response regarding person of interest j in group G is problematic for the age and citizenship questions, respectively, and 0 otherwise.Footnote 21 The difference between the responses is
$$ \Delta {Y}_{G_j}={Y}_{G_j CITIZENSHIP}-{Y}_{G_j AGE}. $$
(3)
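A minimal sketch of the summary measure and of Eq. (3) follows. The function and argument names are hypothetical, and `None` is used here as an assumed way of marking cases excluded because ARs are missing.

```python
from typing import Optional


def problematic_response(
    ar_available: bool, answered: bool, edited: bool, matches_ar: bool
) -> Optional[int]:
    """Summary measure of a problematic response for one question.

    Returns None when the AR value is missing (case excluded), 1 when the
    item was skipped, edited, or inconsistent with the AR, and 0 when an
    unedited answer consistent with the AR was provided.
    """
    if not ar_available:
        return None
    if (not answered) or edited or (not matches_ar):
        return 1
    return 0


def delta_y(y_citizenship: Optional[int], y_age: Optional[int]) -> Optional[int]:
    """Eq. (3): citizenship indicator minus age indicator for the same person."""
    if y_citizenship is None or y_age is None:
        return None
    return y_citizenship - y_age


# Example: citizenship was skipped, age was answered consistently with ARs.
y_cit = problematic_response(ar_available=True, answered=False, edited=False, matches_ar=False)
y_age = problematic_response(ar_available=True, answered=True, edited=False, matches_ar=True)
example_delta = delta_y(y_cit, y_age)
```

The difference takes values in {-1, 0, 1}: it is positive when only the citizenship response is problematic and negative when only the age response is.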
We estimate regression models for each group:
$$ \Delta {Y}_{N_j}={\mathbf{X}}_{N_j}^{\prime }{\upbeta}_N+{\upvarepsilon}_{N_j}. $$
(4)
$$ \Delta {Y}_{C_j}={\mathbf{X}}_{C_j}^{\prime }{\upbeta}_C+{\upvarepsilon}_{C_j}. $$
(5)
The difference-in-differences in expected problematic response rates across the two questions for the two groups N and C is
$$ \Delta \Delta {Y}_{NC}=E\left({\Delta Y}_N\right)-E\left({\Delta Y}_C\right). $$
(6)
We decompose this as follows:
$$ \Delta \Delta {Y}_{NC}={\left[E\left({\mathbf{X}}_N\right)-E\left({\mathbf{X}}_C\right)\right]}^{\prime }{\upbeta}_C+\left[E{\left({\mathbf{X}}_N\right)}^{\prime}\left({\upbeta}_N-{\upbeta}_C\right)\right]. $$
(7)
The first term (explained variation) applies the coefficients for the AR citizen group to the difference between the expected value of the AR noncitizen group’s predictors and those of the AR citizen group. The second (unexplained variation) is the difference between the expected value of the AR noncitizen group’s predictors applied to the AR noncitizen group’s coefficients and the same predictors applied to the AR citizen group’s coefficients. The interpretation that the unexplained variation represents the variation due to the AR citizenship status of the person of interest is dependent on the assumption that there are no unobserved variables relevant to the difference-in-differences in problematic response across the two questions and AR citizenship groups.
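The decomposition can be illustrated numerically. The sketch below is not the authors' code: it uses simulated data with a single hypothetical predictor, estimates the two group regressions by ordinary least squares with NumPy, and uses sample means in place of expectations. With an intercept in each regression, the explained and unexplained terms sum to the raw difference-in-differences.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def twofold_decomposition(X_n, y_n, X_c, y_c):
    """Two-fold decomposition using the C-group coefficients as reference."""
    beta_n, beta_c = ols(X_n, y_n), ols(X_c, y_c)
    xbar_n, xbar_c = X_n.mean(axis=0), X_c.mean(axis=0)
    explained = float((xbar_n - xbar_c) @ beta_c)     # first term
    unexplained = float(xbar_n @ (beta_n - beta_c))   # second term
    raw_gap = float(y_n.mean() - y_c.mean())          # sample analogue of the gap
    return raw_gap, explained, unexplained


def simulate_group(rng, n, x_mean, slope):
    """Simulated outcomes in {-1, 0, 1}; all parameters are hypothetical."""
    x = rng.normal(x_mean, 1.0, size=n)
    latent = 0.1 + slope * x + rng.normal(0.0, 0.5, size=n)
    y = np.clip(np.rint(latent), -1, 1)
    X = np.column_stack([np.ones(n), x])  # intercept plus one predictor
    return X, y


rng = np.random.default_rng(0)
X_n, y_n = simulate_group(rng, 500, x_mean=0.5, slope=0.4)
X_c, y_c = simulate_group(rng, 500, x_mean=0.0, slope=0.2)
raw_gap, explained, unexplained = twofold_decomposition(X_n, y_n, X_c, y_c)
```

The additivity holds because OLS residuals average to zero within each group when an intercept is included, so each group's mean outcome equals its mean predictor vector times its own coefficients.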
Housing Unit Self-response Methodology
There are several elements to our method for predicting the effect of adding a citizenship question to the 2020 census on housing unit self-response rates. We take advantage of a natural experiment setting. In 2010, a subset of housing units that responded to the census were randomly selected to also participate in the 2010 ACS using a probability sampling scheme that did not depend on the citizenship status of individuals in the selected households. The ACS questionnaire contained 75 questions, including a battery of three questions that asked about nativity, citizenship status, and year of immigration. These same households also received a list of 10 questions from the full-count census questionnaire that did not include citizenship. Both the ACS and the census are mandatory Title 13 surveys that households are required by law to complete. We focus on census housing unitsFootnote 22 that received both questionnaires by mail from the initial mailing, did not have the questionnaire returned as undeliverable as addressed by the U.S. Postal Service, and were not classified as vacant or as a delete (meaning unoccupied, uninhabitable, or nonexistent). We define a 2010 census self-response as a returned questionnaire from the first mailing that is not blank. For the 2010 ACS, a self-response is a mail response, also from the first contact mailing.
The simple difference in self-response rates (mail response) between the two surveys does not control for other reasons a household might respond to one survey and not the other besides the presence/absence of a citizenship question. Census self-response is bolstered by a media campaign and intensive community advocacy group support, and the ACS questionnaire involves much greater respondent burden (Office of Management and Budget 2008, 2009).Footnote 23
We control for the effects of other factors on the difference between ACS and census self-response rates by comparing the difference for households likely to have concerns about the citizenship question with the difference for households unlikely to have such concerns. AR noncitizens could be put at risk if their personal information regarding citizenship status and location were shared with immigration enforcement agencies, but AR citizens would not be put at risk. Households containing at least one noncitizen may thus have concerns about participating in a survey specifically containing a citizenship question, but all-citizen households presumably do not have such concerns. Our analysis assumes that any reduction in self-response to the ACS versus the census for all-citizen households is due to factors other than the presence of a citizenship question.
In our dichotomy, the less-sensitive group is “all-citizen households,” those households where all persons reported in the ACS to be living in the household at the time of the survey are AR citizens, and all are reported citizens in the ACS as well. The more sensitive group, “other households,” includes those households where (1) some residents may be both AR citizens and as-reported citizens but at least one resident is not; (2) there is disagreement between the survey report and AR response; or (3) citizenship status is not reported in one or both sources. This expands the group of people potentially having citizenship question confidentiality concerns compared with those we are using in the problematic response analysis. AR noncitizens are probably not the people most sensitive to a citizenship question, given that most of them are legal residents. Because we are unable to distinguish undocumented residents without SSNs or ITINs from citizens or noncitizen legal residents who have SSNs or ITINs but whose personally identifiable information discrepancies prevent a link to ARs, we include all persons with missing AR citizenship in the sensitive group here. We use the ACS household roster to define which people are living in the household.
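The household dichotomy can be expressed as a single rule over the ACS roster. The sketch below is an illustration under assumed data structures: the `Person` record, its field names, and the use of `None` for a missing citizenship value are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    ar_citizen: Optional[bool]    # None if AR citizenship is missing
    acs_citizen: Optional[bool]   # None if citizenship was not reported in the ACS


def household_group(roster: List[Person]) -> str:
    """Classify a household from its ACS roster.

    "all-citizen": every rostered person is a citizen in both sources.
    "other": any noncitizen, any disagreement between sources, or any
    missing citizenship value in either source.
    """
    if not roster:
        raise ValueError("roster must contain at least one person")
    all_citizen = all(
        p.ar_citizen is True and p.acs_citizen is True for p in roster
    )
    return "all-citizen" if all_citizen else "other"
```

Because a single person who is missing, discordant, or a noncitizen in either source moves the whole household to the "other" group, conditions (1) through (3) in the text collapse into the negation of one `all(...)` check.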
We assume that all-citizen households are less sensitive to the citizenship question than other households because, as we show, respondents have demonstrated a willingness to provide citizenship status answers for AR citizens, and those answers are quite consistent with ARs and thus are likely truthful responses. In comparison with others, more of the all-citizen household group’s reluctance to self-respond to the ACS should be due to reasons other than the citizenship question, such as unwillingness to answer a longer questionnaire. Note that if some of the reluctance by all-citizen households to self-respond is due to the citizenship question in the ACS, that will downwardly bias our estimate of the citizenship question unit self-response effect.Footnote 24
A different magnitude for the decline in self-response rates for the other household group relative to all-citizen households may not actually be due to greater sensitivity. Other characteristics besides citizenship status could be associated with different ACS self-response, and the two household groups could have different propensities to have such characteristics. To control for this possibility, we perform Blinder-Oaxaca decompositions to isolate citizenship question concerns. We use multiple methods for the Blinder-Oaxaca decomposition. Our main findings come from the traditional method of relying on the literature to select observed characteristics that may drive self-response. Robust models using lasso and principal components techniques to identify the main observable factors explaining variation are included in the online appendix.
In our model, households belong to one of two groups G ∈ {S, U}, where the S group is thought to be potentially sensitive to a citizenship question (other households), and the U group is not (all-citizen households). We set the self-responses \( {R}_{G_i{ACS}_t} \) and \( {R}_{G_i{Census}_t} \) equal to 1 if household i in group G self-responds in year t to the ACS and census, respectively, and 0 otherwise.Footnote 25 The difference between the survey responses is
$$ \Delta {R}_{G_it}={R}_{G_i{ACS}_t}-{R}_{G_i{Census}_t}. $$
(8)
Our choice for the vector of predictors X draws from Erdman and Bates (2017), who developed a block group–level model to predict census self-response rates.Footnote 26 Factors that predict census self-response may be even more important for a more burdensome questionnaire. We use household-level or household reference person equivalents for their variables:Footnote 27 log household size and its square, owned versus other, housing structure type (single-unit structure, multiunit, and other), household income, presence of children (related under 5, related 5–17, unrelated under 5, and unrelated 5–17), presence of an unrelated adult, all adults worked in the last week, reference person characteristics (married male, married female, unmarried male, unmarried female, race/ethnicity, age categories, educational attainment, moved here two to five years ago, and moved here within the last year), tract population density in the 2010 census,Footnote 28 and the shares of housing units in the block group that are vacant and under the poverty level. We add indicators for linguistically isolated households and not linguistically isolated households given McGovern’s (2004) finding that linguistically isolated households self-respond to the ACS at lower rates than only English-speaking households. Because immigrants tend to be concentrated in particular neighborhoodsFootnote 29 and such neighborhoods are more exposed to community outreach encouraging census response (see U.S. Census Bureau 2019),Footnote 30 we also control for the block group–level share of housing units with at least one noncitizen.
We estimate regression models for each household group where β contains the slope parameters and intercept, and ε is a regression error term with conditional mean of 0, given X.
$$ \Delta {R}_{S_{it}}={\mathbf{X}}_{S_{it}}^{\prime }{\upbeta}_{S_t}+{\upvarepsilon}_{S_{it}}. $$
(9)
$$ \Delta {R}_{U_{it}}={\mathbf{X}}_{U_{it}}^{\prime }{\upbeta}_{U_t}+{\upvarepsilon}_{U_{it}}. $$
(10)
The difference-in-differences in expected self-response rates across the two surveys for the two groups S and U in year t is
$$ \Delta \Delta {R}_{SU_t}=E\left({\Delta R}_{S_t}\right)-E\left({\Delta R}_{U_t}\right). $$
(11)
We decompose this as follows:
$$ \Delta \Delta {R}_{SU_t}={\left[E\left({\mathbf{X}}_{S_t}\right)-E\left({\mathbf{X}}_{U_t}\right)\right]}^{\prime }{\upbeta}_{U_t}+\left[E{\left({\mathbf{X}}_{S_t}\right)}^{\prime}\left({\upbeta}_{S_t}-{\upbeta}_{U_t}\right)\right]. $$
(12)
The first term (explained variation) applies the coefficients for the less-sensitive group to the difference between the expected value of the sensitive group’s predictors and those of the less-sensitive group. The second (unexplained variation) is the difference between the expected value of the sensitive group’s predictors applied to the sensitive group’s coefficients and the same predictors applied to the less-sensitive group’s coefficients. The interpretation that the unexplained variation represents the citizenship question effect is dependent on the assumption that there are no unmeasured confounding variables relevant to the difference-in-differences in self-response across the two surveys.
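For readers who want the intermediate step, the decomposition follows from the group regressions and the conditional-mean-zero assumption on the error terms. Taking expectations within each group gives

$$ E\left({\Delta R}_{S_t}\right)=E{\left({\mathbf{X}}_{S_t}\right)}^{\prime }{\upbeta}_{S_t},\kern1em E\left({\Delta R}_{U_t}\right)=E{\left({\mathbf{X}}_{U_t}\right)}^{\prime }{\upbeta}_{U_t}. $$

Subtracting the second from the first, then adding and subtracting \( E{\left({\mathbf{X}}_{S_t}\right)}^{\prime }{\upbeta}_{U_t} \), yields

$$ \Delta \Delta {R}_{SU_t}=E{\left({\mathbf{X}}_{S_t}\right)}^{\prime }{\upbeta}_{S_t}-E{\left({\mathbf{X}}_{S_t}\right)}^{\prime }{\upbeta}_{U_t}+E{\left({\mathbf{X}}_{S_t}\right)}^{\prime }{\upbeta}_{U_t}-E{\left({\mathbf{X}}_{U_t}\right)}^{\prime }{\upbeta}_{U_t}. $$

Grouping the last two terms gives the explained variation \( {\left[E\left({\mathbf{X}}_{S_t}\right)-E\left({\mathbf{X}}_{U_t}\right)\right]}^{\prime }{\upbeta}_{U_t} \), and grouping the first two gives the unexplained variation \( E{\left({\mathbf{X}}_{S_t}\right)}^{\prime}\left({\upbeta}_{S_t}-{\upbeta}_{U_t}\right) \).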
To study how changes in predictors over time might affect the magnitude of the unexplained variation (UV) in the decomposition, we apply the coefficients from the 2010 models to the predictors as measured in the 2017 ACS:Footnote 31
$$ {UV}_{2017}=E{\left({\mathbf{X}}_{S_{2017}}\right)}^{\prime }{\upbeta}_{S_{2010}}-E{\left({\mathbf{X}}_{S_{2017}}\right)}^{\prime }{\upbeta}_{U_{2010}}. $$
(13)
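Eq. (13) amounts to two dot products with the same predictor means. The sketch below uses made-up numbers for illustration only: the two-element vectors (an intercept and one hypothetical predictor) are not estimates from the paper.

```python
import numpy as np


def unexplained_variation(xbar_s, beta_s, beta_u):
    """Eq. (13): sensitive-group predictor means applied to each group's
    coefficients, then differenced."""
    xbar_s = np.asarray(xbar_s, dtype=float)
    beta_s = np.asarray(beta_s, dtype=float)
    beta_u = np.asarray(beta_u, dtype=float)
    return float(xbar_s @ beta_s - xbar_s @ beta_u)


# Hypothetical inputs: first element is the intercept term.
xbar_s_2017 = [1.0, 0.3]       # 2017 predictor means, sensitive group
beta_s_2010 = [-0.40, -0.10]   # 2010 coefficients, sensitive group
beta_u_2010 = [-0.35, -0.05]   # 2010 coefficients, less-sensitive group

uv_2017 = unexplained_variation(xbar_s_2017, beta_s_2010, beta_u_2010)
```

Holding the 2010 coefficients fixed means any change in the result relative to 2010 comes only from shifts in the sensitive group's predictor means between the two years.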