Background

Determination of mortality status is an important part of epidemiological studies and many clinical research investigations. Internet sites, such as the Social Security Death Index (SSDI) based on the Social Security Administration (SSA) Death Master File (DMF), are available to researchers for this purpose [1]. The SSA DMF is a publicly available database containing death notices for enrollees in the U.S. Social Security program. This free service is available on the World Wide Web and is updated monthly. The use of databases for ascertainment of mortality status in epidemiological research is common practice. Many prospective cohort studies evaluate the relation between baseline risk factors and total mortality; by linking baseline records with such databases, the mortality status of study participants can be ascertained. For example, Gragoudas et al. [2] developed risk score equations to estimate probabilities of death based on an analysis of 2069 patients treated with proton beam radiation for intraocular melanoma and linked to the SSA DMF and to the National Death Index (NDI), a computerized index of death records maintained by the National Center for Health Statistics for research purposes.

Previous reports have shown the sensitivity of Internet sources for death ascertainment as high as 97.5% among males but as low as 31.1% among females [3] using the NDI as gold standard. The purpose of this paper is to analyze the ability of Internet sites based upon the SSA Death Master File to determine mortality status as a function of gender, ethnic background and additional demographic variables among 374 confirmed decedents.

Methods

Patient population

For the present study we selected 374 consecutive patients followed up between January 1993 and January 2001 from a population involved in the Myocardial Perfusion Imaging/Patient Outcome (MPI/PO) Study at Cedars-Sinai Medical Center (CSMC), a large hospital in Los Angeles, California, whose deaths occurred at CSMC. Date and cause of death were confirmed by physician review. All demographic information, including name, social security number, place of birth, ethnic group and date of birth, was taken from the hospital admission information.

Internet sources of vital status

Internet sites such as http://Ancestry.com [1] provide free access to the SSA Death Master File, maintained by the Social Security Administration. The Death Master File contained 65,445,243 records of decedents with social security numbers whose deaths were reported to the SSA and was current through January 2001 at the time of this study. Search tools such as the Social Security Death Index (SSDI), available as a free service on the Internet, contain information fields for social security number, surname, given name, date of death, date of birth, last known residence, location of last benefit, and date and place of issuance. The database is not downloadable; however, software to allow for multiple searches can be easily implemented in a language such as Java. Searches can be conducted with any one field or a combination of fields. For this study we used date of birth, social security number and/or first and last name.

Searches were conducted individually and without use of a data matching software package. We considered positive identification for records with exact matches of name, social security number, and dates of death and birth as well as for inexact matches of name with exact match of social security number and/or dates of birth and death [4, 5].
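The acceptance rule above can be made concrete with a short sketch. The Python fragment below is an illustration only and not the procedure used in the study, where searches were run and judged by hand. The record fields, the name-similarity measure (the standard library's difflib.SequenceMatcher) and the 0.8 threshold are all assumptions introduced here to give "inexact match of name" a working definition. An exact name match is treated as the limiting case of an inexact one.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical cut-off for an "inexact" name match; not taken from the study.
NAME_SIMILARITY_THRESHOLD = 0.8


@dataclass(frozen=True)
class Record:
    """Identifiers compared in a search (field names are hypothetical)."""
    name: str
    ssn: Optional[str]
    birth: Optional[date]
    death: Optional[date]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after lower-casing and collapsing whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def is_positive_match(patient: Record, hit: Record) -> bool:
    """Accept a hit when the name is exact or close AND either the SSN
    matches exactly or both dates (birth and death) match exactly."""
    ssn_exact = patient.ssn is not None and patient.ssn == hit.ssn
    dates_exact = (
        patient.birth is not None
        and patient.death is not None
        and patient.birth == hit.birth
        and patient.death == hit.death
    )
    name_close = (
        name_similarity(patient.name, hit.name) >= NAME_SIMILARITY_THRESHOLD
    )
    return name_close and (ssn_exact or dates_exact)


# Invented example: misspelled given name, exact (fictitious) SSN.
patient = Record("John Smith", "123-45-6789", date(1920, 5, 1), date(1999, 3, 2))
hit = Record("Jon Smith", "123-45-6789", None, None)
print(is_positive_match(patient, hit))
```

Requiring some name similarity even when the social security number agrees is a design choice of this sketch; it guards against a mistyped number pointing at an unrelated person.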

Statistical methods

Continuous variables are expressed as mean value (standard deviation). The mean differences for continuous variables were compared by t-test (2-tailed). Categorical values are expressed as percentage (standard deviation) and compared using chi-square statistics. Sensitivity and 95% confidence intervals (CI) were estimated. Analysis of variance was performed to estimate adjusted means. Age was divided into four categories based on distribution quartiles. Logistic regression was used to identify the variables that best predict positive detection by the Internet mortality database.

Results

The 374 decedents are characterized by identification status in Table 1. The Internet source of the SSA Death Master File identified 330 (88.2%) as dead and failed to identify 44 (11.8%). Those not identified as dead were significantly more likely to be younger and foreign born. We did not observe significant differences in calendar year of birth and of death, gender, or ethnic background.
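The overall sensitivity follows directly from these counts. The sketch below recomputes it in Python together with a 95% confidence interval based on the normal approximation. The choice of interval is an assumption: the text does not state which method was used, although the symmetric intervals reported for the subgroups are consistent with it. The function name is illustrative.

```python
import math


def sensitivity_with_ci(found: int, total: int, z: float = 1.96):
    """Return (sensitivity, lower, upper) using a normal-approximation
    (Wald) interval, clipped to the range [0, 1]."""
    p = found / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts from the text: 330 of 374 confirmed decedents were identified.
p, lower, upper = sensitivity_with_ci(330, 374)
print(f"sensitivity = {100 * p:.1f}% "
      f"(95% CI: {100 * lower:.1f}, {100 * upper:.1f})")
```

For small subgroups or proportions close to 100%, an exact or Wilson interval would generally be preferable to this approximation.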

Table 1 Comparison of characteristics of study participants from the MPI/PO Study by Identification Status on the SSA DMF.

The sensitivity of the Internet-accessed SSA Death Master File as a function of gender and place of birth is displayed in Table 2. Sensitivity for American-born males, 91.5 percent (95 percent confidence interval: 86.9, 96.1), and females, 93.3 percent (95 percent confidence interval: 88.1, 98.5), is similar. The sensitivity for foreign-born decedents is substantially lower. For foreign-born males it drops to 83.7 percent (95 percent confidence interval: 76.4, 91.0) and for foreign-born females to 77.8 percent (95 percent confidence interval: 65.6, 89.9).

Table 2 Sensitivity and 95% confidence intervals (CI) for the SSA DMF in determination of mortality status of decedents from the MPI/PO Study by gender and country of birth.

Sensitivity as a function of age is displayed in Table 3. Significant differences in sensitivity were found for age quartiles, ranging from 82.8 percent for ages 41 through 70 (95 percent confidence interval: 75.1, 90.5), to 95.7 percent for ages 85 through 97 (95 percent confidence interval: 91.7, 99.8).

Table 3 Sensitivity and 95% confidence intervals (CI) for the SSA DMF in determination of mortality status of decedents from the MPI/PO Study by quartiles of age at death.

The results of logistic regression are shown in Table 4. Foreign-born decedents had 67% lower odds of being found by the Internet mortality database than their American-born counterparts. Decedents in the first age quartile had approximately 85% lower odds of being found than those in the fourth age quartile. African American decedents had 68% lower odds of being found than Caucasian decedents (p = 0.07). We did not observe statistically significant odds ratios for reaching retirement age (i.e. age 62), gender, or marital status.
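Statements such as "67% lower odds" are restatements of an odds ratio, here an odds ratio of about 0.33, and a reduction in odds is not the same as a reduction in probability. The Python sketch below shows both conversions. The odds ratio of 0.33 is inferred from the wording of the text rather than read from Table 4, and applying it to the 92.2% sensitivity reported for American-born decedents is purely illustrative, since the regression estimates are adjusted for other variables.

```python
def percent_lower_odds(odds_ratio: float) -> float:
    """Express an odds ratio below 1 as a percentage reduction in odds."""
    return (1.0 - odds_ratio) * 100.0


def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Probability obtained by multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# An odds ratio of 0.33 corresponds to 67% lower odds.
print(round(percent_lower_odds(0.33), 1))

# Illustrative only: baseline sensitivity 92.2%, odds ratio 0.33.
print(round(apply_odds_ratio(0.922, 0.33), 3))
```

In this illustration a 67% reduction in odds corresponds to a drop in sensitivity from about 92% to about 80%, far smaller than a 67% drop in the probability of detection.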

Table 4 Odds Ratios and 95% Confidence Intervals for determination of mortality status of decedents from the MPI/PO Study by the SSA DMF.

Discussion

In our study, the Internet source of information from the SSA Death Master File demonstrated high and consistent sensitivity for detecting mortality status of both American-born men and women. The sensitivity for American-born decedents was 92.2%, comparable to documented sensitivity for the National Death Index, 87–98% [4, 6–8]. However, in foreign-born individuals there is a nearly 10% reduction in sensitivity. The results also suggest that African Americans may have as much as 68% lower odds of being identified by Internet databases. Moreover, in our study the odds of Internet sources of the SSA Death Master File finding the youngest decedents were 87% lower than those for the oldest decedents.

The SSA Death Master File is composed of decedents with social security numbers whose deaths were reported to the Social Security Administration. The SSA reports that in most cases a report of death was made in connection with a claim for Social Security death benefits. In some cases, the death is reported to stop Social Security benefits to the deceased. The primary sources of information utilized for the SSA DMF are relatives of deceased individuals, funeral directors, financial institutions, postal services, as well as other government agencies [9]. Thus the reasons for exclusion from the SSDI include not having a social security number and not having the death reported to the SSA [5, 10].

The SSA was originally founded by an act of Congress in 1935 as a retirement program. In 1972 the SSA was required to issue social security numbers (SSNs) to all legally admitted aliens at entry; SSNs are assigned to all persons authorized to work in the US who request them, including newborns. SSNs are required for tax purposes, to get medical coverage or apply for government services. As a result, most Americans and legal aliens have SSNs [11, 12].

A recent study compared the SSDI to the NDI using the NDI as the "gold standard" and demonstrated a high sensitivity among men (94.7%), but much lower among women (31.1%) using the first and last name search fields [3]. Our study employed social security number as the primary search field and name as secondary. We found an overall sensitivity of 88.3% for men and 88.1% for women, using confirmed mortality as our "gold standard". We believe one source of this discrepancy to be related to the disproportionate frequency of name changes in women. Having information on social security number has been shown to greatly improve sensitivity, as well as specificity, for sources of mortality [7, 9, 13, 14], possibly by reducing the impact of inexact matches of name (e.g. nicknames, misspelling) [7, 15, 16]. Investigators using this information have had similar findings among some demographic groups [6, 14]. While the Health Insurance Portability and Accountability Act (HIPAA) places certain restrictions on personal information available to researchers, identifiers such as social security number are frequently accessible for studies [17].

We found that foreign-born decedents had 67% lower odds of being identified by the Internet-accessed SSA Death Master File than American-born individuals. A possible explanation for the differential misclassification is related to the eligibility criteria defined by the SSA for receiving death benefits. Foreign nationals and naturalized citizens may have less opportunity to achieve the necessary 40 quarters (10 years) of work in the US to qualify for benefits and thus reduced incentive to report deaths to the SSA. Foreign-born decedents comprised 38% of our study population. The U.S. Census Bureau recently determined that 10.4% of the American population was foreign-born as of March 2000. Immigrant proportions were highest in major urban areas, with Los Angeles, New York City, and San Francisco accounting for the majority of such individuals [18].

Age at death was another determining factor in identification by the SSDI in our study. Older decedents were significantly more likely to be identified as dead, similar to previous reports [19, 20]. In general, as with immigrants who have not had sufficient opportunity to work the necessary 10 years, younger decedents are less likely to have achieved qualification for benefits. In this study, however, the first age quartile ranged from 41–70, an age range that is unlikely to have greatly affected the ability to qualify for benefits. We found a significant increase in sensitivity only for decedents older than 85 years at death; sensitivity was approximately 85% for the first 3 quartiles. We also found reduced sensitivity for determining mortality status of African American decedents, although this did not reach statistical significance (p = 0.07). Previous studies have reported difficulties in ascertainment of mortality status in African Americans using databases of such information [7, 8, 21]. However, these results should be interpreted with caution because of the small sample size of African Americans on which they are based.

Our study suggests that the use of the SSDI as the sole source for verification of mortality status might have detrimental effects on research findings if misclassification of mortality status is not accounted for in the analysis. Differential misclassification of mortality status can lead to under- or overestimation of the prevalence of outcomes and to undesired bias in risk estimates of exposures of interest and their variances. As shown in this paper, this is especially the case if the exposures of interest are, or are related to, age, gender, country of birth or race.

Correction methods for bias due to misclassification are available in the literature [22–25]. The matrix method described by Greenland et al. [25] is one alternative to correct odds ratios for misclassification in 2 × 2 tables. Magder et al. [26] showed that when the sensitivity and specificity of a diagnostic test are assumed to be known or can be estimated, this information can be incorporated into the fitting of logistic regression models to estimate risk. They also described an EM algorithm that produces unbiased estimates of the odds ratios and their variances.
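As a rough illustration of how a correction of this kind works in a 2 × 2 table, the Python sketch below back-calculates the expected true number of deaths in each exposure group from the observed number, given group-specific sensitivity and specificity, and then recomputes the odds ratio. It is a simplified sketch of the general idea, not a reproduction of the procedures in the cited papers. The cohort sizes and death counts are invented, the sensitivities of 0.82 and 0.92 are round figures chosen to resemble the foreign-born and American-born estimates, and specificity is set to 1.0 as an assumption.

```python
def corrected_deaths(observed_deaths: float, total: int,
                     sensitivity: float, specificity: float = 1.0) -> float:
    """Expected true number of deaths given the observed number.

    Solves observed = Se * true + (1 - Sp) * (total - true) for `true`.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (observed_deaths - (1.0 - specificity) * total) / denominator


def odds_ratio(deaths_exposed: float, n_exposed: int,
               deaths_unexposed: float, n_unexposed: int) -> float:
    """Odds ratio of death, exposed versus unexposed."""
    odds_exposed = deaths_exposed / (n_exposed - deaths_exposed)
    odds_unexposed = deaths_unexposed / (n_unexposed - deaths_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical cohort: 1000 foreign-born and 1000 American-born participants.
n_foreign, n_us = 1000, 1000
observed_foreign, observed_us = 82, 92   # deaths found by database search
se_foreign, se_us = 0.82, 0.92           # assumed group-specific sensitivity

observed_or = odds_ratio(observed_foreign, n_foreign, observed_us, n_us)
corrected_or = odds_ratio(
    corrected_deaths(observed_foreign, n_foreign, se_foreign), n_foreign,
    corrected_deaths(observed_us, n_us, se_us), n_us,
)
print(f"observed OR = {observed_or:.2f}, corrected OR = {corrected_or:.2f}")
```

In this invented example the two groups have identical true mortality, yet the uncorrected odds ratio falls below 1 solely because deaths are detected less completely in one group.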

This study is limited in its generalizability; the patient population is entirely composed of patients seen for potential heart problems in the Nuclear Cardiology department who agreed to be part of an observational follow-up study. Additionally, while the number of decedents studied is comparable to that of similar studies, it is still too low for the analysis of certain subgroups. However, despite the use of a convenience sample for this study, we have no reason to suspect that estimates of overall sensitivity or sensitivity as a function of study variables would be grossly different from population values. Regardless, we encourage application of sensitivity analysis techniques to evaluate different levels of uncertainty with respect to bias. Though we have demonstrated the sensitivity of the SSDI using confirmed decedents, we have not obtained similar information for other databases of mortality status, such as the NDI. We have not presented information regarding the specificity of the Internet-accessed SSA Death Master File; however, our experience agrees with previous studies that have shown it to be nearly 100% [16, 27]. It should be noted that our sole source of demographic information is the hospital admission records. Findings could reflect variance in the accuracy of these records as a function of our study variables. However, such information is frequently all that is available to investigators.

Conclusions

Internet sources provide accurate information for determination of mortality status and may be accessed using the web quickly and inexpensively. The SSA Death Master File from which Internet sources are generated is updated monthly, thus making it particularly useful for researchers conducting prospective studies with mortality as an endpoint. While gender and marital status had no effect on the sensitivity of the SSA Death Master File in our sample, other demographic factors did. There are significant decreases in sensitivity among foreign-born decedents, especially women, and a possible decrease among African Americans. For study populations composed largely of these groups, as urban study samples are likely to be, the SSDI may be less effective for determining mortality. Investigators conducting prospective studies should note this as well as the importance of correct information concerning social security number [7, 13]. For studies without this information other sources of mortality information should be consulted.