Comparing Respondent-Driven Sampling and Targeted Sampling Methods of Recruiting Injection Drug Users in San Francisco
- Cite this article as:
- Kral, A.H., Malekinejad, M., Vaudrey, J. et al. J Urban Health (2010) 87: 839. doi:10.1007/s11524-010-9486-9
The objective of this article is to compare demographic characteristics, risk behaviors, and service utilization among injection drug users (IDUs) recruited in two separate studies in San Francisco in 2005, one of which used targeted sampling (TS) and the other respondent-driven sampling (RDS). IDUs were recruited using TS (n = 651) and RDS (n = 534) and participated in quantitative interviews that covered demographic characteristics, risk behaviors, and service utilization. Prevalence estimates and 95% confidence intervals (CIs) were calculated to assess whether these variables differed by sampling method. The 95% CIs overlapped for all demographic variables except African American race (95% CI: TS 45% to 53%; RDS 29% to 44%). Maps showed that the distribution of IDUs across zip codes was similar for the TS and RDS samples, with the exception of a single zip code that was more heavily represented in the TS sample. This zip code includes an isolated, predominantly African American neighborhood where only the TS study had a field site. Risk behavior estimates were similar for the TS and RDS samples, although self-reported hepatitis C infection was lower in the RDS sample. In terms of service utilization, more IDUs in the RDS sample reported no recent use of drug treatment and syringe exchange program services. Our study suggests that a hybrid sampling plan may be best suited for recruiting IDUs in San Francisco, whereby the more intensive ethnographic and secondary analysis components of TS would aid in the planning of seed placement and field locations for RDS.
In order to establish the prevalence of a disease or health-related behavior in a population, the gold standard is to develop a probability-based representative sample. However, with populations defined by illicit or stigmatized behavior, a probability-based sampling frame is not usually feasible. Instead, researchers make inferences based upon a portion of the target population gathered using nonprobability-based methods. The validity of these studies depends upon how representative the sample is of the target population.1,2 To optimize representativeness of samples, it is important to plan and execute recruitment of study participants who proportionately have the same characteristics as the target population. Sampling of drug users has been hampered because people who use illicit substances are involved in stigmatized behaviors that make a substantial proportion of them unwilling to identify themselves as being drug users.3–6
Studies of injection drug users (IDUs) in the mid-20th century relied on institutional settings such as hospitals, jails, and drug treatment centers as sampling frames. These sampling frames had inherent selection biases because of differences in prevalence of disease or risk behaviors among IDUs in or entering institutional settings compared to IDUs in community-based settings.7,8 As the HIV/AIDS epidemic emerged in the United States in the mid-1980s, IDUs were identified as a high-risk group that needed to be studied outside of institutional settings. Two sampling methods were developed and utilized: targeted sampling (TS) and snowball sampling, which later was modified to develop respondent-driven sampling (RDS).
In 1986, Watters and Biernacki9 developed TS methods in San Francisco to recruit IDUs directly from communities. This method involves using secondary analysis of existing data (from drug treatment programs, hospitals, jails, etc.) and primary collection of ethnographic data to first establish characteristics of the target population. Then targeted enrollment plans (quotas) for each geographic area and demographic characteristic are established, and recruitment is conducted through the use of community health outreach workers.9 TS is an iterative process designed to assess the characteristics of the sample at several points so that sampling can be adjusted in service of obtaining a final sample similar to that of the hypothesized target population. It quickly became the most common recruitment method to study IDUs in the United States, including its use in the 23-city, National Institute on Drug Abuse (NIDA)-funded Cooperative Agreement study in the 1990s.10,11 While TS has been a successful sampling method of IDUs, there is no formal way to assess whether the samples are representative. As such, it is not feasible to calculate valid prevalence estimates. In more recent years, researchers have enhanced TS by estimating the density of drug users in the target areas and using proportional sampling quotas.12,13 These enhancements, however, have been shown to have significant selection biases.4,8
Snowball sampling (also called chain referral) was developed in the 1960s by Goodman14 and is a logistically simple way of recruiting a convenience sample when the target population is hidden, and there is reason to believe that there are large social networks within the target population who would be willing to recruit members into a study. It involves starting with a few members of the target population and then giving them an incentive to bring in their acquaintances who qualify for the study. Those new recruits are also given incentives to bring in their acquaintances, and so on. This method of sampling was used to recruit IDUs and heroin users in the 1980s,15 including in the NIDA-funded 29-city National AIDS Demonstration Research project from 1987 to 1992.16,17 An important tenet of snowball sampling is that initial recruits (also called “seeds”) be selected at random from the target population.14 However, because selecting the initial recruits at random is not possible with IDUs, the studies using snowball sampling to recruit IDUs in the 1980s and early 1990s were essentially considered convenience samples. However, in 1997, Heckathorn18 used mathematical modeling and data simulation to demonstrate that it is possible to generate population-based estimates by tracking the social networks of participants who refer each other to the study, given certain assumptions. He called this refined snowball sampling method RDS. By keeping track of who is referring whom, it is possible to identify a sequence of recruiter–recruit chains called “recruitment waves.” If these chains are sufficiently large, the composition of the sample (i.e., with respect to its demographic and behavioral characteristics) gradually stabilizes. At this point, the sample is said to have reached “equilibrium,” meaning that subsequent recruitment does not greatly alter the makeup of the sample and that sufficient data are collected to estimate and adjust for recruitment biases. 
Throughout the survey, the recruiter–recruit chains need to be carefully monitored, and the size of the social network of each participant needs to be recorded. Using these network-size data and the characteristics of recruits and recruiters, statistical adjustments are made to the survey results to account for different probabilities of inclusion and the cluster effects introduced by the networked nature of the sample.19 In the 2000s, RDS became a very common method of recruiting IDUs in the United States, including in the Centers for Disease Control and Prevention (CDC)-funded 25-city National HIV Behavioral Surveillance (NHBS) study, and internationally.20–28
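The idea of a sample reaching "equilibrium" can be illustrated with a minimal sketch that treats recruitment as a simple Markov process between two groups. The group labels, the recruitment probabilities, and the 0.01 tolerance are hypothetical values chosen for illustration; they are not estimates from either San Francisco study, and this is not the model implemented in any RDS software.

```python
# Hypothetical two-group recruitment model. The probabilities are illustrative
# assumptions, not estimates from either San Francisco study.
TRANSITIONS = {
    "A": {"A": 0.7, "B": 0.3},  # whom group-A recruiters tend to recruit
    "B": {"A": 0.4, "B": 0.6},  # whom group-B recruiters tend to recruit
}


def next_wave(composition):
    """Expected group shares of the next wave, given the current wave's shares."""
    groups = list(TRANSITIONS)
    return {
        g: sum(composition[r] * TRANSITIONS[r][g] for r in groups)
        for g in groups
    }


def waves_to_equilibrium(seed_composition, tolerance=0.01, max_waves=100):
    """Iterate waves until no group's share changes by more than `tolerance`."""
    current = dict(seed_composition)
    for wave in range(1, max_waves + 1):
        following = next_wave(current)
        if all(abs(following[g] - current[g]) <= tolerance for g in current):
            return wave, following
        current = following
    return max_waves, current


# Start once with all seeds from group A and once with all seeds from group B.
for seeds in ({"A": 1.0, "B": 0.0}, {"A": 0.0, "B": 1.0}):
    waves, shares = waves_to_equilibrium(seeds)
    print(seeds, "->", waves, {g: round(s, 3) for g, s in shares.items()})
```

Under these assumed probabilities, both starting points settle near the same composition within a handful of waves, which is the property that makes the final sample makeup largely independent of how the seeds were chosen.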
Currently, TS and RDS are the two most common methods used to sample IDUs in the United States. Robinson et al.20 conducted a study that compared the logistics and costs of using TS and RDS to recruit IDUs in Detroit, Houston, and New Orleans. Due to small sample size, they were not able to conduct a thorough analysis comparing the characteristics of the samples by method. This article is among the first to compare demographic and behavioral characteristics of IDUs from two separate studies, one of which used TS (n = 651) and the other RDS (n = 534).
We conducted two separate studies of IDUs in San Francisco in 2005, one using TS and the other using RDS. Below, we describe the methods for each study.
Eligibility criteria for the TS study were as follows: participants (1) reported injecting illicit drugs within the past 30 days, (2) had visible signs of injection (“tracks”), (3) were at least 18 years of age at the time of interview, and (4) were able to speak English or Spanish. Participants from previous serial cross-sections did not receive any special recruitment contact, but they were automatically eligible for future cross-sections (6 months apart), even if they had switched to noninjecting methods or had quit using drugs. Each participant was assigned a unique ID code, and we checked their identification by asking five questions: sex, birth year, age, race/ethnicity, state of birth, and first two letters of mother's maiden name. The answers helped to determine which observations were duplicates. For this analysis, we included only active IDUs who reported injecting drugs in the previous 30 days.
After providing informed consent to this anonymous study, participants were interviewed in person by a trained interviewer in a private space, using a structured questionnaire administered as a computer-assisted personal interview (CAPI) in QDS software (QDS; NOVA Research Company, Bethesda, MD). They were paid $15 for contributing to the study. Questions covered demographic information, injecting and sexual risk behaviors, and utilization of health care, drug treatment, and HIV prevention programs. The questions pertained to the 6-month period preceding the interview date. Blood specimens were drawn following the interview to assess HIV status using enzyme immunoassay and Western blot assay, following standard laboratory methods. Participants were asked to return for HIV serology results in 2 weeks. They were offered HIV counseling, provided with referrals to medical and social services, and paid $15. Study protocols were approved by the University of California, San Francisco (UCSF) Committee on Human Research.
The San Francisco Department of Public Health (SFDPH) conducted a cross-sectional behavioral surveillance survey using RDS among IDUs in San Francisco in 2005 as part of the CDC NHBS study. A standard protocol developed in collaboration with the CDC and researchers from 24 other US metropolitan areas was implemented.32 Below is a brief description of RDS as implemented in this study.
After a review of AIDS case surveillance data and existing secondary data, the SFDPH team drew up a preliminary list of diverse characteristics (i.e., race/ethnicity, gender, age, neighborhood, and injection drug of choice) that were desired in the initial recruitment of seed subjects. With the help of several key informants, the team identified and approached active IDUs who met the initial criteria and who also had relatively large network sizes (i.e., IDUs who had social ties with several other IDUs and were well known in the IDU community). Eight seeds were selected during the first month of the study, and an additional eight seeds were recruited during data collection to form a group of seeds diverse in age, race, sex, and drug of choice. The seeds were given $40 in cash as an incentive for participating in the study and were given three recruitment coupons when they completed the survey. Each coupon listed the objectives of the project, contact information and working hours of the study site, the amount of the incentive for participation of prospective recruits, and a unique tracking code. The seeds were trained in how to use each coupon to recruit an IDU from their network of peers to participate in the survey. They were told that they would receive an additional $10 for each recruit who brought in his or her coupon and completed the survey. Recruited IDUs went through the same procedures as the initial seeds, including eligibility screening, interview, delivery of three new coupons, and training in how to recruit other IDUs. Unique tracking codes on coupons were used to document who recruited whom and to facilitate payment of the monetary incentives. A customized version of the database program, Respondent Driven Sampling Coupon Manager (RDSCM), was used to collect recruitment data.33
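The coupon tracking described above amounts to recording, for each participant, whose coupon they presented. The following is a minimal sketch of how such a log could be used to derive recruitment waves and referral payments. The coupon codes, the dictionary layout, and the function names are hypothetical and do not reflect how RDSCM stores its data; only the $10 referral incentive comes from the study description.

```python
from collections import Counter

# Hypothetical coupon log: each participant's tracking code mapped to the code
# of the person who recruited them (None marks a seed).
RECRUITED_BY = {
    "S1": None,
    "S2": None,
    "C101": "S1",
    "C102": "S1",
    "C201": "C101",
    "C301": "C201",
    "C103": "S2",
}

REFERRAL_INCENTIVE = 10  # dollars per recruit who completed the survey


def wave_of(code, recruited_by):
    """Number of recruiter-recruit links between `code` and its seed (seeds are wave 0)."""
    wave = 0
    seen = {code}
    while recruited_by[code] is not None:
        code = recruited_by[code]
        if code in seen:
            raise ValueError("cycle in recruitment records")
        seen.add(code)
        wave += 1
    return wave


# Wave number for every participant, and the length of the longest chain.
waves = {code: wave_of(code, RECRUITED_BY) for code in RECRUITED_BY}
longest_chain = max(waves.values())

# Referral payments owed to each recruiter.
recruit_counts = Counter(r for r in RECRUITED_BY.values() if r is not None)
referral_payments = {r: n * REFERRAL_INCENTIVE for r, n in recruit_counts.items()}

print(waves)
print(longest_chain)
print(referral_payments)
```

Keeping the log as a simple recruit-to-recruiter mapping is enough to reconstruct every chain, because each participant presents exactly one coupon.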
Eligibility criteria included the following: (1) injection drug use within the past 12 months; (2) visible signs of injection (e.g., track marks, scars, needle-sized scabs) or the ability to correctly describe injection practices; (3) age of at least 18 years at the time of interview; (4) ability to speak English; and (5) residence in San Francisco. To be eligible for the survey, participants were required to present a recruitment coupon to SFDPH staff. After screening participants for eligibility, explaining the study's procedures, and obtaining informed consent, interviewers conducted a face-to-face interview with each participant using a computerized questionnaire on a handheld device. The questionnaire was developed using the same software as for the TS study above (QDS). Interviews took place in private rooms at SFDPH (marked by a star in Fig. 1a).
The survey instrument contained two parts. The first part sought specific information about the social networks of the participants required for RDS-specific data analysis. Such information included the size of the participant's social network (e.g., the number of other injectors he or she had been acquainted with during the previous 6 months), rough estimations of the sex and race/ethnicity composition of the members of their social networks, and the relationship of each recruit to his or her recruiter (e.g., friend, sex partner). To determine the size of the participants' networks, we asked all participants the questions, “How many people do you know personally who inject?” and “Of these injectors, how many have you seen at least once in the last 6 months?” The second part of the survey instrument dealt with the participants' demographic characteristics, drug use, sexual risk behaviors, and access to types of drug treatment and HIV prevention programs (e.g., methadone treatment programs, syringe exchange programs [SEPs]). There was no HIV testing or counseling in the RDS study.
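To show why the network-size answers matter, here is a minimal sketch of inverse-degree weighting, a simple approach often described as the RDS-II (Volz-Heckathorn) estimator. It is offered only as an illustration: it is not necessarily the estimator RDSAT applied in this study, the four respondent records are invented, and dropping records without a usable network size is an assumption of the sketch.

```python
def weighted_proportion(respondents, trait):
    """Inverse-degree weighted share of respondents with `trait`.

    Each respondent is weighted by 1 / network size, so people with many
    ties (who are more likely to be recruited) count for less.
    """
    total_weight = 0.0
    trait_weight = 0.0
    for person in respondents:
        degree = person["network_size"]
        if degree <= 0:
            continue  # assumption: skip records without a usable network size
        weight = 1.0 / degree
        total_weight += weight
        if person[trait]:
            trait_weight += weight
    if total_weight == 0:
        raise ValueError("no respondents with a positive network size")
    return trait_weight / total_weight


# Invented records: well-connected respondents report SEP use, others do not.
SAMPLE = [
    {"network_size": 20, "uses_sep": True},
    {"network_size": 10, "uses_sep": True},
    {"network_size": 5, "uses_sep": False},
    {"network_size": 4, "uses_sep": False},
]

crude = sum(p["uses_sep"] for p in SAMPLE) / len(SAMPLE)
adjusted = weighted_proportion(SAMPLE, "uses_sep")
print(crude, round(adjusted, 3))
```

In this invented sample the crude proportion is one half, while the weighted proportion is lower because the respondents reporting the trait are the ones with the largest networks.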
Participation was anonymous. This survey was part of the larger NHBS system and was classified as a nonresearch survey by the CDC Institutional Review Board and the UCSF Committee on Human Research.
To compare RDS and TS, we focused our analysis on key indicators in three basic domains: (1) sociodemographic characteristics, (2) drug of choice, and (3) access to and use of HIV prevention and treatment programs, including drug and alcohol treatment programs and receipt of sterile needles for injection.
To analyze the RDS-generated data, we used Respondent-Driven Sampling Analysis Tool (RDSAT v. 6.0). We calculated population-based proportions and 95% confidence intervals (CIs) for selected key variables listed above. RDSAT adjusts for each individual's network size and characteristics in relation to the other recruits. For TS data, we computed point prevalence estimates and 95% CIs for select variables using Stata version 9 (StataCorp LP, College Station, TX). To determine whether there were statistically significant differences, we assessed whether there was an overlap in 95% CI between the same variables in the two studies.
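The interval-overlap comparison can be sketched as follows. The Wald (normal approximation) interval is an assumption made for illustration, since the text does not state which interval method Stata was asked to use, and the count of 319 out of 651 is a hypothetical figure, not a number reported in the article. The two intervals for African American race are the bounds reported in the abstract.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% CI for a proportion (an assumed method)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def intervals_overlap(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# African American race: 95% CI bounds as reported in the abstract.
ts_ci = (0.45, 0.53)
rds_ci = (0.29, 0.44)
print(intervals_overlap(ts_ci, rds_ci))

# Hypothetical count, for illustration only: 319 of 651 TS participants.
low, high = wald_ci(319, 651)
print(round(low, 3), round(high, 3))
```

Here the two reported intervals do not share any point, which is the criterion the analysis used to flag a difference between the samples.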
IDU recruitment using TS was completed in 16 weeks. Because TS studies are advertised informally, there is no way of knowing how many IDUs were informed about the study and decided not to participate. A total of 651 IDUs participated in the study and are included in the analyses. IDU recruitment using RDS lasted 32 weeks, and the 16 seeds selected for the study generated 27 waves of recruitment. A total of 1,435 coupons were distributed through these recruits, and 630 IDUs bearing recruitment coupons presented themselves for the study, for a 44% coupon return rate. Of these recruits, 571 (91%) were deemed eligible and were enrolled into the study. In total, 534 subjects who completed valid interviews were included in the current analysis. There was some overlap in participation between the two studies, with 24.7% (adjusted; 22% crude) of IDUs in the RDS study reporting that they had participated in the TS study in the past year.
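The recruitment percentages reported above follow from simple arithmetic on the stated counts. The sketch below reproduces them; the counts are taken from the text, while the variable names are chosen here for illustration.

```python
# Counts reported for the RDS study.
coupons_distributed = 1435
coupons_returned = 630
eligible_enrolled = 571
included_in_analysis = 534

# Share of distributed coupons that were brought back to the study site.
return_rate = coupons_returned / coupons_distributed

# Share of coupon-bearing recruits who were eligible and enrolled.
eligibility_rate = eligible_enrolled / coupons_returned

# Share of enrolled participants whose interviews were valid for analysis.
analysis_rate = included_in_analysis / eligible_enrolled

print(f"coupon return rate: {return_rate:.1%}")
print(f"eligible among returned: {eligibility_rate:.1%}")
print(f"analyzed among enrolled: {analysis_rate:.1%}")
```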
Table. Comparison of IDUs recruited using targeted sampling and RDS, San Francisco, 2005
Columns: RDS (NHBS) (n = 534), adjusted, 95% CI; TS (UHS) (n = 651), unadjusted, 95% CI
Rows: Native Hawaiian/Pacific Islander; Ever tested for HIV; Self-reported HIV status; Did not receive results; Ever tested positive for HCV; Ever participated in alcohol/drug treatment programs; Programs used in the past 12 months; In-patient drug treatment; Residential drug treatment; Outpatient drug treatment; Purchased syringes at pharmacy in 12 months; Obtained syringes via SEP in the past 12 months
To assess the geographic distribution of each sample, we examined the proportion of recruited IDUs by the zip code where they usually live (Fig. 1a and b). The geographic distribution of IDUs for each sample shows a similar pattern across zip codes. One notable exception is that, compared to the TS sample, the RDS sample had a significantly lower proportion of participants in zip code 94124, which is an impoverished, geographically isolated neighborhood consisting largely of African Americans. Only the TS study had a field site located in zip code 94124 (marked by stars in Fig. 1b).
Our research shows that TS and RDS both resulted in sizable and diverse samples of IDUs in San Francisco. IDUs in San Francisco are easily accessible through community-based street-intercept methods, as demonstrated by the TS study. They are socially networked and suitable for peer-recruitment sampling, as demonstrated by the RDS study. We were able to satisfy the RDS methodological requirements, which means that we were theoretically able to generate representative estimates of demographic variables and indicators of access to health care of the IDU population.
We found that the TS and RDS studies reached similar samples of IDU in terms of demographic characteristics, with the exception of African Americans. African Americans represent a small minority of the population in San Francisco (6.9% in 2007 per the US Census Bureau39) and are largely concentrated in one neighborhood in the southeastern part of the city (Bayview/Hunter's Point, zip code 94124), which in 2005 was isolated from the rest of the city geographically by highway systems and poor access via public transportation (two bus lines, no streetcar, no subway). Through the secondary analysis and ethnography components of TS methodology, this neighborhood was identified as having a large population of IDUs, and a field site was placed in its midst. While the RDS study attempted to include IDUs from this neighborhood (three seeds), we hypothesize that the long travel time between the neighborhood and the RDS data collection site may have limited study participation. This finding suggests that when utilizing RDS, it may be wise to implement some of the steps of TS during the planning stages, which can be used to decide how many and where to establish data collection sites. This could include collecting secondary indicator data and conducting a brief ethnography to figure out where IDUs are located and what cultural factors may be important to consider in designing the procedures of the study. The finding also underscores the importance of including geographic markers (zip codes or census tracts) on surveys of IDUs to assess geographic reach.
The two samples differed substantially with respect to the proportion of participants who had utilized current prevention and care programs in San Francisco. Fewer RDS participants reported use of drug treatment and SEP, although SEP use was very high overall. This finding suggests that RDS may be more effective than TS at reaching IDUs not receiving services. If the use of RDS is coupled with a proactive system of referral to services, the study can potentially bring IDUs with less access to care into prevention and treatment programs.
The TS sample had a higher proportion of IDU who had reported “ever testing positive” for HCV, compared to the RDS sample. This finding might be because the TS sample included more IDU who had ever been in drug treatment. Those who have access to care are more likely to be tested for HCV. Another explanation is that the TS study provided HCV testing for participants from 1998 to 2001, and many in the 2005 TS sample had participated in the earlier cross-sections. HCV prevalence among IDUs in the TS study during those years was 91%.34
There are several limitations to this study that need to be considered when interpreting its results. Although TS and RDS were both very effective in generating diverse samples, there is no way of knowing whether these samples are representative of the target population as a whole. TS has several limitations that should be noted. It requires that IDUs are part of a street culture that is easily accessed by an outreach worker, and it relies on the talents of the outreach workers who are involved in the ethnography and recruitment. For example, younger IDUs may not be interested in talking with or may not trust an older outreach worker. In RDS, it is possible to assess whether homophily exists and then correct for it using weighting in RDSAT. Another limitation of the study is that the TS sample consisted of the 37th cross-section of a long-standing study, which may have generated a different sample than if it had been a first-time TS sample. The reputation of the study in the community may have biased the attributes of those who were willing to participate. RDS also has several limitations. It cannot access those who are isolated or not socially networked. For example, in an RDS survey of IDUs in Cairo, Egypt, chains of referrals did not reach women.35 In a survey of IDUs in Tehran, Iran, the final sample lacked women and Afghan IDUs, despite empirical evidence supporting the existence of such groups.36 Because RDS does not generally involve an intensive formative research phase, it is not easy to understand how the study procedures might bias who decides to participate.
There are also limitations common to both of these studies. Response bias is a limitation of all surveys, regardless of sampling methods, particularly when studying populations most at risk for HIV. Measuring the response rate is more challenging in surveys of IDUs, given that researchers usually are not present when study subjects are recruiting their peers to find out how many potential subjects are approached by recruiters. There were some differences in eligibility criteria in the two studies, which may account for some of the observed differences. Specifically, the RDS study was limited to English speakers, while the TS study also included questionnaires in Spanish. In practice, however, no participants in the TS study chose to be interviewed in Spanish. The time frame for injection criteria was 12 months in the RDS study and 30 days in the TS study. This may mean that some of the participants in the RDS study were less likely to be active IDUs. This could have biased the drug treatment estimates, for example, even though drug treatment was less prevalent in the RDS sample. Finally, the RDS study required San Francisco residency. From our 20 years of experience conducting research with IDUs in San Francisco, we feel it is highly unlikely that more than 2% of the TS sample consisted of IDUs who reside outside of San Francisco. Most of the variables are self-reported and subject to recall bias and social desirability responses. However, we expect that these sources of bias would affect both samples similarly, given that the populations and types of questions were relatively comparable. Moreover, previous studies of IDUs have found good reliability with respect to the measures we used to assess the main outcomes in this study.37,38 Finally, because of the different statistical methods needed for analyzing RDS and TS data, we were not able to combine the datasets and conduct multivariate comparisons to assess whether the data are significantly different. Instead, we relied on assessing whether the 95% CIs overlapped in the estimates.
In order for quantitative studies of IDUs to be useful for identifying the prevalence and factors associated with various social and medical outcomes, it is important they use a sampling methodology that optimizes generalizability to the target population. Given that it is not feasible to carry out population-based randomized sampling of drug users, it is important to choose methods that are most likely to fit the geographic, social, and political characteristics of the area. It appears that in San Francisco, both RDS and TS are useful tools for recruiting IDUs. Our study suggests that perhaps a hybrid model is best suited for San Francisco, whereby the ethnographic and secondary analysis components of TS would precede initiation of RDS. This would optimize the benefits of both methods by assuring that study procedures enable RDS sampling to actualize its promise of representation.
This study was supported by the National Institute on Drug Abuse (R01DA023377), the Centers for Disease Control and Prevention, and the SFDPH. None of the authors have any financial interest in anything related to this manuscript.
We would like to thank the following individuals for assistance with the studies: Allison Futeral, Brent Herrera, Theresa Ick, Steve King, Binh Le, Jason Mehrtens, and Askia Muhammad.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.