Defined Contribution Pension Plans: Determinants of Participation and Contributions Rates


Records of 793,794 employees eligible to participate in 647 defined contribution pension plans are studied. About 71% of them choose to participate in the plans, and of the participants, 12% choose to contribute the maximum allowed, $10,500. The main findings are (other things equal): (1) participation rates, contributions and (most remarkably) savings rates increase with compensation; on average, a $10,000 increase in compensation is associated with a 3.7% higher participation probability and a $900 higher contribution; (2) women’s participation probability is 6.5% higher than men’s and they contribute almost $500 more than men; (3) participation probabilities are similar for employees covered and not covered by DB plans, but those covered by DB plans contribute more to the DC plans; (4) the availability of a match by the employer increases employees’ participation and contributions; the effect is strongest for low-income employees; (5) participation rates, especially among low-income employees, are higher when company stock is an investable fund.

  1. Here we only consider maximum contribution to the IRS limit ($10,500 or 25% of compensation). Section 3.3 discusses potential plan-specific limits that are lower than the IRS limit.

  2. The 0.5% is to allow for rounding error. We err on the side of over-classifying “potentially limited plans” in order to be conservative.

  3. It could be that all employees in a plan voluntarily contribute less than a certain percentage of their income, and a handful of them (five or more) contribute very close to the top (c%). This is more likely to be the case when c% is high, for example greater than 15%. It is plausible that nobody contributes more than 20% of their compensation, even in the absence of any plan-specific limit. For people with compensation greater than $52,500, the $10,500 IRS limit binds first (20% of $52,500 is $10,500). For people in the lower compensation group, contributing 20% or more would imply low take-home pay. On the other hand, one can also argue that the classification method may miss some limited plans. Such under-classification is quite innocuous for estimation purposes. Undetectable plan limits are essentially non-binding (except for perhaps fewer than five people in a plan), and a plan-specific limit only affects contributions when it is binding, that is, when employees are constrained.

  4. In an experimental setting, Duflo et al. (2006) find that, compared to no match, a 50% match increases the take-up rate of IRA contributions among low- and middle-income subjects by 11 percentage points and increases the contribution by $345. These numbers appear to be of the same order of magnitude as our findings.


The authors are grateful to Steve Utkus and Gary Mottola from Vanguard® Center for Retirement Research. They made the data available and provided us with constructive comments throughout the process. We also thank Pierre Azoulay, Geert Bakeart, Charlie Calomiris, Sarah Holden, Brigitte Madrian, Jim Powell, Thomas Steinberger, Ed Vytlacil, Elke Weber, Steve Zeldes, an anonymous referee, and seminar participants at Columbia, Rice, Wharton Pension Research Council, New York Federal Reserve Bank, CEPR workshop on Financing Retirement in Europe, and Wharton Workshop on Household Portfolio Choice and Financial Decision Making for their helpful comments.

Data Construction

The original data set provided by Vanguard consists of 926,104 records of 401(k)-eligible employees. An observation is eliminated if it meets any of the following criteria: (1) The employee was hired after January 1, 2001. (The recorded annual contribution might be inaccurate.) (2) The employee is less than 18 years old. (The employee might not be the decision maker.) (3) The employee’s annual compensation is less than $10,000 or greater than $1 million. After these exclusions, 793,794 records remain.
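The three exclusion criteria can be written as a simple record filter. The following is a minimal sketch, not the authors' code: the `Record` type and its field names are hypothetical, and treating an unreported age as passing the age test is an assumption (the text below notes that missing ages are imputed rather than dropped).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical record layout; the field names are illustrative only.
@dataclass
class Record:
    hire_date: date
    age: Optional[float]      # None when age is not reported
    compensation: float       # annual compensation in dollars

HIRE_CUTOFF = date(2001, 1, 1)
MIN_AGE = 18
MIN_COMP = 10_000
MAX_COMP = 1_000_000

def survives(rec: Record) -> bool:
    """Return True if the record passes all three screening criteria."""
    # (1) Hired after January 1, 2001: annual contribution may be inaccurate.
    if rec.hire_date > HIRE_CUTOFF:
        return False
    # (2) Younger than 18. Assumption: an unreported age does not trigger
    #     exclusion, because missing ages are imputed later.
    if rec.age is not None and rec.age < MIN_AGE:
        return False
    # (3) Compensation below $10,000 or above $1 million.
    if rec.compensation < MIN_COMP or rec.compensation > MAX_COMP:
        return False
    return True

def screen(records: list[Record]) -> list[Record]:
    """Keep only the records that survive the screening."""
    return [r for r in records if survives(r)]
```

Applied to the raw file, a filter of this kind would reduce the 926,104 original records to the 793,794 that are analyzed.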

The key variables (deferral rate, contribution, and compensation) appear in all the records. All other individual variables have missing values, which are concentrated in the non-participant sub-sample. Gender is not reported for 12.8% of the observations, of which 62.5% are non-participants; the corresponding percentages for age, tenure and wealth are 12.3% (62.2%), 12.2% (62.1%) and 25.6% (76.4%). Eliminating all observations with missing values would leave a partially truncated sample, which is likely to bias the results through selection. Hence the choice to replace missing values with imputed values.

The imputed values are calculated as follows: (1) unidentified gender variables are recorded as the percentage of females in the plan (a record identified as a female being 1 and as a male being 0); (2) missing age and tenure values are replaced with their respective plan mean age or tenure. To fill in the missing values of wealth, the following regression is estimated on non-missing values:

$$\text{Log}(\text{WEALTH}) = \beta_{0} + \beta_{1}\,\text{Log}(\text{COMP}) + \beta_{2}\,\text{FEMALE} + \beta_{3}\,\text{AGE} + \beta_{4}\,\text{AGE}^{2} + \beta_{5}\,\text{Log}(\text{DCASSETS}) + \varepsilon,$$

where DCASSETS is the total assets in the defined-contribution accounts. The specification above was chosen among various models for both within-sample goodness-of-fit and out-of-sample robustness. The next step is to predict out-of-sample values and map the predicted values to the closest IXI brackets.
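The imputation steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the dictionary keys, the coefficient vector `beta`, and the bracket values are all hypothetical, "closest" is taken to mean smallest absolute dollar distance, and the sketch assumes positive DC assets because the text does not say how zero balances enter the logarithm.

```python
import math
from collections import defaultdict

PLAN_LEVEL_VARS = ("female", "age", "tenure")

def impute_plan_means(records):
    """Fill missing female/age/tenure with the plan mean of reported values.

    For 'female' (1 = female, 0 = male) the plan mean is the share of
    females in the plan. Keys are hypothetical; None marks a missing value.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for r in records:
        for key in PLAN_LEVEL_VARS:
            if r[key] is not None:
                sums[r["plan"]][key] += r[key]
                counts[r["plan"]][key] += 1
    filled = []
    for r in records:
        new = dict(r)  # leave the input records untouched
        for key in PLAN_LEVEL_VARS:
            n = counts[r["plan"]][key]
            if new[key] is None and n > 0:
                new[key] = sums[r["plan"]][key] / n
        filled.append(new)
    return filled

def predict_log_wealth(rec, beta):
    """Evaluate the fitted log-wealth equation for one record.

    beta = (b0, ..., b5) are coefficients estimated elsewhere on the
    non-missing sample. Assumes rec['comp'] > 0 and rec['dc_assets'] > 0.
    """
    b0, b1, b2, b3, b4, b5 = beta
    return (b0
            + b1 * math.log(rec["comp"])
            + b2 * rec["female"]
            + b3 * rec["age"]
            + b4 * rec["age"] ** 2
            + b5 * math.log(rec["dc_assets"]))

def closest_bracket(wealth, bracket_values):
    """Map a predicted wealth level to the nearest bracket value.

    Assumption: nearest in absolute dollar distance.
    """
    return min(bracket_values, key=lambda v: abs(v - wealth))
```

A record with missing wealth would then be assigned `closest_bracket(math.exp(predict_log_wealth(rec, beta)), brackets)`, after its gender and age have been filled in by `impute_plan_means`.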

With the exception of the wealth level, IXI, missing values do not account for a significant proportion (about 10%) of observations, and the symmetric trimming method (Honore et al. 1997) is applied as a sensitivity check. That is, for each variable, an artificial sample is created by first taking only the records that report that variable (but may be missing other variables). The second step eliminates a given number of participants’ records at random, so that the participation rate in the sub-sample matches that of the original sample. (This second step eliminates records of participants because it is systematically the non-participants whose records are likely to have missing values.) Coefficients are then estimated on the resulting sub-sample. When this process is repeated many times (e.g., 30 or 50), the coefficients estimated on the full sample (with missing values imputed) are close to the average of the coefficients estimated on the symmetrically trimmed sub-samples.
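The resampling procedure just described can be sketched as below. This is a hedged illustration of the two steps in the text, not the estimator of Honore et al. (1997) itself: the record keys are hypothetical, `estimate` stands for any caller-supplied function that returns a list of coefficients, and rounding the target number of participants to the nearest integer is an assumption.

```python
import random

def trimmed_subsample(records, variable, rng):
    """Build one trimmed sub-sample for a given variable.

    Step 1: keep only records that report `variable` (None = missing).
    Step 2: drop participants at random until the participation rate
            matches that of the full sample.
    """
    original_rate = sum(1 for r in records if r["participant"]) / len(records)
    reporting = [r for r in records if r[variable] is not None]
    if original_rate >= 1.0:
        return reporting  # no non-participants to balance against
    participants = [r for r in reporting if r["participant"]]
    non_participants = [r for r in reporting if not r["participant"]]
    # Solve k / (k + n_non) = original_rate for the number k of participants
    # to keep; rounding to the nearest integer is an assumption.
    target = round(original_rate * len(non_participants) / (1.0 - original_rate))
    target = min(target, len(participants))
    return rng.sample(participants, target) + non_participants

def average_estimates(records, variable, estimate, repetitions=30, seed=0):
    """Average coefficient vectors over repeated trimmed sub-samples."""
    rng = random.Random(seed)
    draws = [estimate(trimmed_subsample(records, variable, rng))
             for _ in range(repetitions)]
    n_coef = len(draws[0])
    return [sum(d[j] for d in draws) / repetitions for j in range(n_coef)]
```

The averaged coefficients returned by `average_estimates` are the quantities that the text compares with the full-sample estimates obtained after imputation.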

For the wealth variable the same sensitivity check cannot be used because only a low proportion of non-participant records have this information. Instead, the comparison is between the imputed wealth variable and the general distribution of wealth in the population. The following figure plots the histogram of the wealth distribution for the population (IXI), for the non-missing Vanguard data, and for the amended Vanguard data. After the missing values are filled in, the sample studied here no longer over-represents the wealthy households. The sample still under-represents the very poor households (those with negative balances or balances below $1,000) and over-represents the lower-middle to middle households (with balances ranging from $5,000 to $100,000). This is consistent with evidence from the Survey of Consumer Finances and the Current Population Survey that, at the lower end of the distribution, 401(k)-eligible employees are financially better off than the general population.

Further, when the main regressions are estimated without the wealth variable, the coefficients on all variables except compensation show little variation. (The loading on compensation increases, as expected.)

