Empirical characteristics of legal and illegal immigrants in the USA


We combine the New Immigrant Survey (NIS), which contains information on US legal immigrants, with the American Community Survey (ACS), which contains information on legal and illegal immigrants to the USA. Using an econometric methodology proposed by Lancaster and Imbens (J Econ 71:145–160, 1996), we compute the probability that each observation in the ACS data refers to an illegal immigrant, conditional on observed characteristics. These results are novel, since no other work has quantified the characteristics of illegal immigrants from a random sample representative of the population. Using these conditional probabilities as weights on the ACS data, we uncover several facts about illegal immigrants. We find that, while illegal immigrants suffer a large wage penalty compared to legal immigrants at all education levels, the penalty decreases with education. We also find that the total fertility rate among illegal immigrant women is significantly higher than that among legal ones, in particular for women with middle and higher education. Looking at the sector of activity, we document that the sectors attracting the most illegal immigrants are construction and agriculture. We also generate empirical distributions for state of residence, country of origin, age, sex, and number of legal and illegal immigrants. Our estimates of the aggregate distribution of legal and illegal characteristics match imputations by the Department of Homeland Security.
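
As a rough illustration of how conditional probabilities of this kind can serve as weights, the sketch below splits each observation's sampling weight into an expected legal part and an expected illegal part and computes the two group means. The function name `weighted_group_means`, the argument names, and the toy numbers are hypothetical, and the exact weighting scheme is an assumption rather than the paper's implementation.

```python
from math import isclose


def weighted_group_means(x, p_legal, w):
    """Return (mean_legal, mean_illegal, q) for one characteristic.

    x       : value of the characteristic for each respondent (e.g. log wage)
    p_legal : estimated Pr(legal | characteristics) for each respondent
    w       : survey sampling weight of each respondent
    """
    # Split each sampling weight into its expected legal and illegal mass.
    w_legal = [wi * pi for wi, pi in zip(w, p_legal)]
    w_illegal = [wi * (1.0 - pi) for wi, pi in zip(w, p_legal)]

    mean_legal = sum(a * xi for a, xi in zip(w_legal, x)) / sum(w_legal)
    mean_illegal = sum(a * xi for a, xi in zip(w_illegal, x)) / sum(w_illegal)

    # Unconditional probability of being legal implied by the weights.
    q = sum(w_legal) / sum(w)
    return mean_legal, mean_illegal, q


# Toy data, invented purely for illustration.
x = [10.0, 12.0, 8.0, 15.0]
p = [0.9, 0.6, 0.2, 0.8]
w = [100.0, 150.0, 120.0, 80.0]

m_legal, m_illegal, q = weighted_group_means(x, p, w)
overall = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

# The overall mean is the q-weighted average of the two group means.
assert isclose(overall, q * m_legal + (1.0 - q) * m_illegal)
```

Under this construction the overall weighted mean always equals `q * mean_legal + (1 - q) * mean_illegal`, which is the decomposition described in the footnote on the overall immigrant mean.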




  1.

    Since its inception in 1987, the Mexican Migration Project (MMP) has surveyed between four and eight communities per year in earlier survey years and between two and five communities per year in more recent survey years, for an overall total of 81 selected communities. For each household head and spouse, full migration and labor market histories are constructed from recall information; other household members are also interviewed about their first and last trips to the USA.

  2.

    Related literature investigates the effect of US border enforcement in stemming the flow of illegal Mexican immigration. This approach, referred to as the Apprehensions Method, does not provide an estimate of the number of illegal immigrants in the USA, nor of their characteristics. However, it does provide information on the change over time in the inflow of legal and illegal immigrants (Rosenblum 2012).

  3.

    Somewhat related to our approach, Burtless and Singer (2011) combine data from the MMP with CPS data to measure how many illegal Mexicans contribute to Social Security (being illegal, they have no hope of withdrawing benefits, despite contributing to Social Security). Because they need to identify who in the CPS data is an undocumented immigrant, they use a matching algorithm, which they call “cold decking,” to infer who in the representative CPS data would be a legal or an illegal Mexican migrant based on the observed characteristics of legal and illegal migrants in the MMP data. While their approach has a very different context, it still suffers from the fact that the MMP is a nonrandom sample, so the characteristics of legal and illegal migrants in the MMP may differ from the overall characteristics of legal and illegal migrants in the Mexican migrant population.

  4.

    The ACS, which has been piloted since 1996, is intended as a replacement for the Census long form. While estimates from the ACS are slightly less precise than those from the Census long form, a comparison of data from the 2000 Census with data from the 1999–2001 ACS indicated that the data quality of the ACS was very close to that of the Census (Camarota and Jeffrey 2004). The obvious advantage of the ACS over Census data is that it is a yearly survey and thus provides timely information on the characteristics of the foreign-born population. Compared to the CPS, estimates from the ACS on the characteristics of the immigrant population are more precise.

  5.

    See Hoefer et al. (2008).

  6.

    We are less concerned about refugees because they are fewer than temporary visa holders. According to the DHS (Yearbook of Immigration Statistics), there were 40,705 refugee status applications in 2003, out of which 25,329 were approved. Therefore, we expect our sample to have about 25,000 nonpermanent resident immigrants who entered the USA in 2003 and who live in the USA legally as refugees, representing less than 1 % of the immigrants who entered in 2003. The other 15,000 applicants, who were rejected, may also still live in the USA in 2007; however, they are more likely to be undocumented.

  7.

    Exchange visitors are for the most part students on J-1 visas. A small fraction of them are workers, who typically obtain a J-1 visa through short work programs. These programs take no longer than 2 years to complete and often are only summer jobs.

  8.

    We conduct further sensitivity analysis to look at the effect of alternative assumptions on the effective legal status of students. That is, we compare our aggregate estimates with those of the DHS under three hypotheses: (i) students are effectively all legally residing in the USA; (ii) students are all “posers” and effectively illegal immigrants (this happens when students have legal visas but do not limit their activity to what the visa prescribes, i.e., they work instead of attending a school program); and (iii) students are similar to the rest of the foreign-born population, in which case we allow our methodology to assign probabilities of being legal based on their characteristics. We show that our estimates are closest to those of the DHS when we assume that students are all residing legally in the USA.

  9.

    Of these, 434,281 nonimmigrants are intracompany transferees and their spouses (L visas); about 30,000 are representatives and staff of international organizations; 10,000 are persons with extraordinary ability in the sciences, arts, education, business, or athletics (O visas); 12,000 are representatives of foreign media (I visas); and 361,470 are workers of distinguished merit and ability (H-1 visas).

  10.

    See Lancaster and Imbens (1996) page 149 for a discussion on this point.

  11.

    Tables 18 to 23 in Appendix 4 report sensitivity analysis with different assumptions on the undercounting rates for legal and illegal immigrants.

  12.

    For Mexicans, the total effect of schooling on the propensity of legal immigration is the sum of the effect of education (relative to elementary) plus the effect of Mex ∗ education (relative to elementary). In our specification, it should be computed as the difference between the “Mex ∗ education” interaction coefficient and the coefficient of “Mex ∗ < elementary” (which is −0.74 in the adjusted and −0.75 in the unadjusted specification). This overall effect is negative and small.

  13.

    Canadian observations are left out of the estimation, so America includes all of Central and South America except for Mexico, which is estimated alone.

  14.

    The mean in the overall immigrant population is a weighted average of the legal and illegal means, with weights given by the unconditional probabilities of being legal (q) and illegal (1−q).

  15.

    One issue in comparing our estimates with those from the DHS is that we drop several observations because we implicitly assume that, although they are not legal permanent residents (LPR), they belong with very high probability to legally resident immigrants. In particular, students make up the largest share of the dropped immigrants. The DHS estimates include students as well. In Appendix 2, we report three tables on students. Table 12 reports the results assuming that students are all illegal, while the weights for the rest of the immigrants are extrapolated using our estimation results in Table 3. Table 13 assumes that students are all legal. In Table 14, students are given the same weights as all other immigrants. From Tables 12, 13, and 14, it is clear that the assumption that generates statistics closest to the DHS is the one we implicitly make by dropping students from our analysis, that is, that students are most likely legally present in the USA.

  16.

    Note that the DHS estimates cover the entire population of immigrants who arrived after 1980. In Tables 15, 16, and 17 in Appendix 3, we report estimates made using data on all immigrants from 1980; while the results are slightly different, they do not change qualitatively. Indeed, our estimates using different samples are in line with the estimates reported by the DHS for 2000 and 2007.

  17.

    The total fertility rate is calculated as the average of the fertility rates for each cohort, times 5 (we consider seven 5-year cohorts). The fertility rate for each cohort is calculated as the percentage of women who have had a live birth in the 12 months prior to the interview. We restrict the sample to immigrants who immigrated after the year 2000. Some women who intended to migrate and have children are more likely to have waited until after a successful migration to have children. We also restrict the sample to nonstudents, which increases the fertility rate of the younger women left in the sample.
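
The fertility calculation in footnote 17 can be sketched as follows, assuming the conventional period definition in which cohort-specific rates are expressed as fractions, summed over the cohorts, and multiplied by the cohort width of 5 years. The function name and the rates are invented for illustration, and how the paper scales its cohort average is an assumption here, so this is only one plausible reading.

```python
def total_fertility_rate(cohort_rates, cohort_width=5):
    """Period total fertility rate from cohort-specific fertility rates.

    cohort_rates : share of women in each age cohort who had a live birth
                   in the 12 months before the interview (0.08 means 8 %)
    cohort_width : number of years spanned by each cohort
    """
    # Each woman spends `cohort_width` years in every cohort, so her expected
    # number of births is the yearly rate times the width, summed over cohorts.
    return cohort_width * sum(cohort_rates)


# Seven 5-year cohorts; the rates below are made up for illustration.
rates = [0.03, 0.10, 0.12, 0.09, 0.05, 0.02, 0.01]
tfr = total_fertility_rate(rates)
print(f"Total fertility rate: {tfr:.2f} children per woman")
```

With seven cohorts, this is the same as multiplying the average cohort rate by the 35 years the cohorts cover.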


  1. Burtless G, Singer A (2011) The earnings and social security contributions of documented and undocumented Mexican immigrants, No. 2 in Working Paper. Boston College Retirement Research Center

  2. Camarota S, Jeffrey C (2004) Assessing the quality of data collected on the foreign born: an evaluation of the American Community Survey (ACS). Methodology and data quality. COPAFS (The Council of Professional Associations on Federal Statistics)

  3. Durand J, Massey D (2006) What we learned from the Mexican Migration Project. In: Crossing the border: research from the Mexican Migration Project. Russell Sage Foundation, New York

  4. Fortin N, Lemieux T, Firpo S (2011) Decomposition methods in economics. In: Handbook of labor economics, vol 4, chap. 1. Elsevier, New York, pp 1–102

  5. Hoefer M, Rytina N, Baker B C (2008) Estimates of the unauthorized immigrant population residing in the United States: January 2007, Population Estimates. U.S. Department of Homeland Security, Office of Immigration Statistics

  6. Lancaster T, Imbens G (1996) Case-control studies with contaminated controls. J Econ 71(1–2):145–160


  7. Passel J (2006) The size and characteristics of the unauthorized migrant population in the U.S.: estimates based on the March 2005 Current Population Survey. Research Report, Pew Hispanic Center

  8. Passel J, Randolph C, Fix M (2004) Undocumented immigrants: facts and figures. Immigration Studies Program. Urban Institute, Washington DC

  9. Ridder G, Moffitt R (2007) The econometrics of data combination, chapter 75. Part 2 of Handbook of econometrics, vol 6. Elsevier, New York, pp 5469–5547

  10. Rosenblum M (2012) Border security: immigration enforcement between ports of entry. No. 2 in Congressional Research Service. Washington DC


Author information



Corresponding author

Correspondence to Vincenzo Caponi.

Additional information

Responsible editor: Klaus F. Zimmermann


Appendix: Using 2007 ACS with sampling weights not adjusted for differences in nonresponse rates

Table 11 Legal and illegal distributions from ACS 2007, including professionals, survey sampling weights

Appendix: Students

This appendix presents the distribution of legal and illegal immigrants under different hypotheses about students. The first table shows the distribution when we assume that students are all illegal, i.e., the distributions are calculated by assigning students a probability of being legal equal to 0. The second table assigns students a probability of being legal equal to 1. This is the benchmark case, since it is what we implicitly assume by taking students out of the sample when we calculate the probabilities of being illegal. The difference is that in this case, the distribution of legal immigrants changes because students are counted in this subpopulation. Note that we do not include students in the probit step of the computation of the probabilities of being illegal, as we believe that this would bias our estimation. Our estimation suggests that younger and less educated immigrants are more likely to be illegal; this, however, is probably not true for students, who are younger because of their status as students and less educated because they are still acquiring education. The last table shows the distribution assuming that students are like all other immigrants. Clearly, the second table is the one that comes closest to the DHS estimates. Therefore, our hypothesis that students are mostly legally present in the USA is the one that best reproduces the aggregate DHS statistics.
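
The three hypotheses about students can be sketched as a simple override of each student's probability of being legal before the weights are aggregated. The function names, the hypothesis labels, and the toy data below are hypothetical; the sketch assumes that a model-based probability is already available for every respondent and is not the paper's actual code.

```python
def apply_student_hypothesis(p_legal, is_student, hypothesis):
    """Return Pr(legal) per respondent under one hypothesis about students.

    hypothesis : "illegal" -> students get Pr(legal) = 0
                 "legal"   -> students get Pr(legal) = 1 (benchmark)
                 "general" -> students keep their model-based probability
    """
    override = {"illegal": 0.0, "legal": 1.0, "general": None}[hypothesis]
    if override is None:
        return list(p_legal)
    return [override if s else p for p, s in zip(p_legal, is_student)]


def expected_illegal_count(p_legal, w):
    """Expected number of illegal immigrants implied by the weights."""
    return sum(wi * (1.0 - pi) for wi, pi in zip(w, p_legal))


# Toy data, invented purely for illustration.
p = [0.9, 0.4, 0.7, 0.2]               # model-based Pr(legal)
student = [False, True, True, False]   # student indicator
w = [100.0, 50.0, 50.0, 200.0]         # survey sampling weights

for h in ("illegal", "legal", "general"):
    adjusted = apply_student_hypothesis(p, student, h)
    print(h, round(expected_illegal_count(adjusted, w), 1))
```

Comparing the aggregate counts from the three runs against an external benchmark mirrors the comparison with the DHS estimates described above.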

Table 12 Legal and illegal distributions: with students assumed to be illegal (Pr(illegal) = 1)
Table 13 Legal and illegal distributions: with students assumed to be legal (Pr(illegal) = 0), benchmark
Table 14 Legal and illegal distributions: with students with general weights (i.e., assumed to be similar to others)

Appendix: Different samples

This appendix shows the distribution of legal and illegal immigrants using different samples of immigrants from the ACS. We use samples of immigrants from the ACS 2007 who migrated after 2000, after 1990, or after 1980.

Table 15 Legal and illegal distributions from ACS 2007—with immigrants since 1980
Table 16 Legal and illegal distributions from ACS 2007—with immigrants since 1990
Table 17 Legal and illegal distributions from ACS 2007—with immigrants since 2000

Appendix: Sensitivity on different response rates

The following tables present a sensitivity analysis that varies the assumptions on nonresponse rates. The first two tables summarize all the estimation results, and the following tables use those results to create the weights for legal and illegal immigrants and compute the means of several characteristics. The assumed nonresponse rates of illegal and legal immigrants, respectively, for models M1 to M6 (M6 is the correction used in the main text) are as follows:

$$\begin{array}{lcc}
\text{Model} & \text{Illegal} & \text{Legal}\\
\mathrm{M1} & 15\,\% & 10\,\%\\
\mathrm{M2} & 15\,\% & 5\,\%\\
\mathrm{M3} & 15\,\% & 2.5\,\%\\
\mathrm{M4} & 10\,\% & 10\,\%\\
\mathrm{M5} & 10\,\% & 5\,\%\\
\mathrm{M6} & 10\,\% & 2.5\,\%
\end{array}$$
Table 18 Probit results: conditional probability of being a legal immigrant from NIS and ACS 2007
Table 19 Legal and illegal distributions from ACS 2007 using immigrant weights from 2007 ACS M1
Table 20 Legal and illegal distributions from ACS 2007 using immigrant weights from 2007 ACS M2
Table 21 Legal and illegal distributions from ACS 2007 using immigrant weights from 2007 ACS M3
Table 22 Legal and illegal distributions from ACS 2007 using immigrant weights from 2007 ACS M4
Table 23 Legal and illegal distributions from ACS 2007 using immigrant weights from 2007 ACS M5


Cite this article

Caponi, V., Plesca, M. Empirical characteristics of legal and illegal immigrants in the USA. J Popul Econ 27, 923–960 (2014). https://doi.org/10.1007/s00148-014-0524-x



Keywords

  • Legal immigrants
  • Illegal immigrants
  • Contaminated controls

JEL Classifications

  • J15
  • F22