Journal of Urban Health, Volume 83, Issue 3, pp 459–476

Effectiveness of Respondent-Driven Sampling for Recruiting Drug Users in New York City: Findings from a Pilot Study

Authors

  • A.S. Abdul‐Quader (Behavioral and Clinical Surveillance Branch, Division of HIV/AIDS Prevention-Surveillance and Epidemiology, National Center for HIV, STD and TB Prevention, Centers for Disease Control and Prevention)
  • Douglas D. Heckathorn
  • Courtney McKnight
  • Heidi Bramson
  • Chris Nemeth
  • Keith Sabin
  • Kathleen Gallagher
  • Don C. Des Jarlais
Article

DOI: 10.1007/s11524-006-9052-7

Cite this article as:
Abdul‐Quader, A.S., Heckathorn, D.D., McKnight, C. et al. JURH (2006) 83: 459. doi:10.1007/s11524-006-9052-7

Abstract

A number of sampling methods are available to recruit drug users and collect HIV risk behavior data. Respondent-driven sampling (RDS) is a modified form of chain-referral sampling with a mathematical system for weighting the sample to compensate for its not having been drawn randomly. It is predicated on the recognition that peers are better able than outreach workers and researchers to locate and recruit other members of a “hidden” population. RDS provides a means of evaluating the reliability of the data obtained and also allows inferences about the characteristics of the population from which the sample is drawn. In this paper we present findings from a pilot study conducted to assess the effectiveness of RDS to recruit a large and diversified group of drug users in New York City. Beginning with eight seeds (i.e., initial recruits) we recruited 618 drug users (injecting and non-injecting) in 13 weeks. The data document both cross-gender and cross-race and -ethnic recruitment as well as recruitment across drug-use status. Sample characteristics are similar to the characteristics of the drug users recruited in other studies conducted in New York City. The findings indicate that RDS is an effective sampling method for recruiting diversified drug users to participate in HIV-related behavioral surveys.

Keywords

Human immunodeficiency virus; Recruitment of drug users; Respondent-driven sampling; Sampling hidden populations

Introduction and Background

In 2005, the Centers for Disease Control and Prevention (CDC) implemented the National HIV Behavioral Surveillance (NHBS) among injecting drug users (IDUs) in 25 U.S. metropolitan statistical areas (MSAs). Prior to the implementation of NHBS, CDC conducted a pilot study in New York City to assess the effectiveness of respondent-driven sampling to recruit a large and diverse sample of drug users. Although NHBS is implemented among IDUs, this pilot study recruited both injecting and non-injecting drug users.

Appropriate development and implementation of HIV prevention services for drug users (DUs) at risk for HIV, Hepatitis B and Hepatitis C require gathering data on risk behaviors from a non-biased sample of DUs. In the last two decades, a variety of sampling methods have been used to recruit DUs in order to collect risk behavior data and to direct DUs to prevention services. These include venue-based time and space sampling, targeted sampling, and snowball sampling. While all of these methods have been successful in recruiting DUs and collecting useful data on risk behaviors among this group, these methods have a number of limitations.

Time and space and targeted sampling provide only limited coverage of target populations, and those members of the target population who are missed may differ from those who are captured.1 Venue-based time and space sampling is best suited for populations clustered in large public venues, where probability samples can be drawn from those frequenting these venues. This method has limited coverage because it excludes those who do not attend those settings.

Targeted sampling fares well when compared to other forms of convenience sampling in recruiting large and varied groups of DUs.2–5 This type of sampling may allow access to a large number of non-institutionalized DUs; however, the probability of an individual's selection is unknown and non-random. In addition, a significant amount of formative research is required to develop the initial sampling frame and to continue to update sampling frames, as needed.

Snowball sampling can achieve broader coverage because respondents, including those who do not attend public venues, are reached through their social networks. In snowball sampling, the researcher recruits a few eligible individuals who are then asked to bring in other potential respondents or provide references (contact details) for other potential respondents. These persons are then recruited and are also asked to bring in or provide references for other potential respondents and so on. Because the respondents are not randomly drawn and are dependent on the subjective choices of the first respondents, snowball samples are biased and do not provide the basis for valid generalizations to the populations from which the sample was drawn.6 This method cannot reach those who are not connected to any network, and it oversamples those who have more inter-relationships.7

A recent development in sampling methodology, respondent-driven sampling (RDS), was designed to overcome these limitations by providing breadth of coverage with statistical validity.8 RDS combines a modified form of chain-referral or snowball sampling with a mathematical system for weighting the sample to compensate for its not having been drawn randomly. RDS is based on the premise that peers are better able than outreach workers and researchers to locate and recruit other members of a hidden population.8,9 RDS provides means for sample selection and evaluation of the reliability of the data obtained. As such, it allows for inferences about the characteristics of the population from which the sample is drawn.10 Similar to other chain-referral sampling methods, RDS starts with a small number of peers (called seeds) and expands through successive ‘waves’ of peer recruitment. First-wave respondents recruit second-wave respondents, second-wave respondents in turn recruit the third-wave, and this continues until the desired sample size is reached. Respondents generally recruit those with whom they have a preexisting relationship. In aggregate, respondents have been found to recruit as though they are sampling randomly from their personal social networks.11 The procedures in RDS incorporate the direct recruitment of peers by their peers, recruitment quotas, and a dual system of incentives. Respondents are remunerated for completing the study and also for successfully recruiting other respondents from within their networks.12 With its reliance on social networks, RDS has the potential, like other chain referral methods, to reach individuals who do not go to public venues.

However, unlike other chain referral methods, RDS also allows for the assessment of relative inclusion probabilities for members of the population based on a mathematical model of the recruitment process. This model is derived from a synthesis and extension of Markov chain theory13 and biased network theory14 and provides the basis for calculating both unbiased estimators and standard errors or confidence intervals.10 These calculations are based on information collected from respondents regarding their relationship with both their recruiters and recruits and the size of their own social networks.

The statistical theory upon which RDS is based suggests that if peer recruitment proceeds through a sufficiently large number of waves, the composition of the sample will stabilize, becoming independent of the seeds from which recruitment began and thereby overcoming any bias the nonrandom choice of seeds may have introduced. This stable sample composition is termed the “equilibrium” (see Appendix). Furthermore, RDS statistical theory suggests that the number of waves required for equilibrium to be attained depends on network homophily. When groups are mutually isolated, that is, when network homophily is strong, equilibrium is attained slowly because recruitment chains have difficulty moving across group boundaries. In contrast, when homophily is moderate or weak, recruitment chains are able to move more easily across group boundaries, thereby potentially including even the most isolated members of the target population. Homophily is measured using an index that has a value of one when homophily is maximal, so all groups are mutually isolated with no cross-cutting ties. The index has a value of zero in the absence of homophily, so social ties are formed irrespective of group membership, through random mixing. The homophily index has a value of negative one when groups have only out-group but no in-group ties. Based on this index value, the number of waves required to attain equilibrium can be calculated, and this establishes the minimum number of waves required to overcome bias introduced by the choice of seeds. Attainment of equilibrium can also be confirmed by comparing the sample composition to the equilibrium value, as calculated using Markov chain theory.8,9
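
The three anchor values of the homophily index described above (one, zero, and negative one) can be illustrated with a short sketch. The formula below is the form commonly given in the RDS literature rather than one stated in this paper, and the function and argument names (`homophily_index`, `in_group_share`, `population_share`) are illustrative assumptions.

```python
def homophily_index(in_group_share: float, population_share: float) -> float:
    """Homophily index in [-1, 1] (assumed formula; not stated in the paper).

    in_group_share:   proportion of a group's recruits drawn from its own group.
    population_share: the group's estimated share of the population, i.e. the
                      in-group share expected under random mixing.
    """
    if not 0.0 < population_share < 1.0:
        raise ValueError("population_share must lie strictly between 0 and 1")
    excess = in_group_share - population_share
    if excess >= 0:
        # Positive homophily: scale by the largest possible excess.
        return excess / (1.0 - population_share)
    # Negative homophily: scale by the largest possible deficit.
    return excess / population_share


# Anchor cases from the text, for a hypothetical group that is 40% of the population.
print(homophily_index(1.0, 0.4))  # only in-group ties
print(homophily_index(0.4, 0.4))  # ties formed through random mixing
print(homophily_index(0.0, 0.4))  # only out-group ties
```

Scaling by the largest attainable excess or deficit is what keeps the index bounded between negative one and one regardless of group size.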

The statistical theory on which RDS is based suggests that well-connected respondents within the target population also affect the sampling process. What matters is their personal network size as defined by the number of friends and acquaintances they have within that population. Groups with larger average network sizes are over-sampled because more recruitment paths lead to them. Consequently, population estimates derived from RDS are weighted to compensate for this over-sampling, through a process termed “post-stratification.”

In this paper, we present findings from a study conducted to assess the effectiveness of RDS to recruit a large and diverse sample of drug users in a large urban area in preparation for behavioral surveillance. The study also posed three additional questions:

  1. Can a set of seeds from a small geographic area be used to recruit DUs from a broad geographic area?
  2. Does RDS yield a sample with sufficient sociometric depth (i.e., the number of network links) to attain an equilibrium distribution of respondents by gender, race/ethnicity, and drug use irrespective of the choice of seeds?
  3. Using RDS, how long does it take to recruit 500 DUs for enrollment into a survey?

Materials and Methods

In order to address the first research question, we chose to enroll seeds from a single location, the Lower East Side Syringe-Exchange program in New York City. In other respects, the seeds were selected to ensure diversity to make certain that we included individuals from different social networks, as socio-demographic characteristics tend to shape social networks. The distribution of seeds along race/ethnicity and gender was based on existing data on characteristics of DUs on the Lower East Side of New York City.

The syringe-exchange program staff were asked to identify injecting drug users who met the following criteria: trusted and well-liked by other drug users, connected to many people including other drug users, and having good verbal communication skills. Eight IDUs (syringe-exchange program participants) were identified by the syringe-exchange program staff and referred to the research project staff. Individuals were briefly screened to ensure that they met the above-mentioned criteria. They were also asked to show track marks to ensure that they had injected illicit drugs in the past six months. The seeds were asked to come to a research storefront on the Lower East Side the following day to complete a computer-assisted interviewer-administered personal interview (CAPI) and to have their blood drawn for an HIV test. Each seed was told that they would receive $20 compensation for their time.

After recruitment, seeds were assessed for eligibility, which included the following: 1) injected illicit drugs in the recent past (6 months); 2) was aged 18 years or older; 3) spoke English adequately to permit informed consent; and 4) resided or used or purchased drugs on the Lower East Side. Following eligibility assessment and provision of informed consent the seeds were interviewed using a computer-assisted interviewer-administered personal interview (CAPI). In addition to basic demographic information, the interview focused on drug and sexual risk behaviors, HIV testing history, exposure to HIV prevention services, and health status. The study was conducted at a storefront on the Lower East Side of New York City.

After the interview was completed, the study staff asked the seeds if they would be willing to help recruit other respondents if given a small incentive ($10 for each eligible drug user recruited by the respondent that completed the study). All seeds agreed to recruit, and they were given a brief training on the recruitment process, such as whom to recruit and how to recruit. They were then given three uniquely coded, non-replicable coupons and were told to give these coupons to three injecting or non-injecting drug users they knew. Each coupon was printed with a serial number, the study name, study location, and a brief explanation of the study.

Subsequently, all persons who came to the study storefront with a valid coupon were assessed for eligibility. Eligibility criteria for recruits included the following: 1) recent “hard drug” use (past 6 months; not marijuana); 2) aged 18 or older; and 3) spoke English adequately to permit informed consent. Unlike the criteria for seeds, injection drug use and residence or drug use or drug purchase on the Lower East Side were not required. All eligible recruits then received an explanation of the study's purpose and the nature of the questions to be asked and were asked to provide informed consent. Following provision of consent, recruits were enrolled and interviewed. Eligibility was determined using a screener in which injectors were asked to show track marks, and non-injectors were asked about the types of drugs they used, how often they used them, and how they used them.

After completion of the interview, all respondents were asked if they would be willing to help to recruit other respondents for a small incentive. If they agreed, they received training on recruitment and were given three coupons. As part of the training, the respondents were told to give these coupons only to drug users they knew and not to strangers whose behaviors they were unsure of. They were told that someone they knew would be more likely to come in for the study, and they (recruiters) would receive incentives. They were also told that the coupons should be considered currency, and they should recruit people who they thought would come in to participate in the study. A recruiter was given only three coupons to limit earnings from peer recruitment and thereby prevent the creation of professional recruiters and also to produce longer recruitment chains. The process continued until the sample size exceeded 500 persons. Coupon distribution was stopped 3 weeks prior to the termination of data collection to ensure that anyone with a valid coupon and eligible to participate was interviewed. Interviews lasted about 1 hour, and respondents received $20 for completing the interview and having their blood drawn for an HIV test and later were given $10 for each of up to three eligible drug users they recruited.

Tracking of Coupons and Network Information

The coupons that were given to respondents were recorded using custom software—IRISPlus—developed for tracking coupons. The software keeps track of recruitment data, including who recruited whom and when they were interviewed; data from IRISPlus were used to calculate the length of time to complete 500 interviews. When eligible respondents called or came to the storefront to participate in the study, they were assigned a unique identification number. The identification number and other identifying criteria, such as any physical traits (e.g., tattoos), were also noted. This allowed the study staff to check the validity of the coupon and verify the identity of the respondent who had recruited the eligible respondent. Once verified, the recruiter could receive payment for recruiting. All respondents were asked to come back to the storefront to check to see if any of the recruits had completed an interview, and after verification, they were paid $10 for each eligible recruit. Only the recruiter was allowed to redeem incentives, so recruitment rights were non-transferable.
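
The coupon rules described above (three coupons per recruiter, each coupon usable once, payment only to the issuing recruiter) can be sketched as a small ledger. This is a hypothetical illustration and not the IRISPlus data model; the class, method, and field names are all assumptions, and the sketch assumes a coupon is recorded as redeemed only after the recruit has been found eligible and has completed the interview.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

COUPONS_PER_RECRUITER = 3   # quota described in the text
RECRUIT_INCENTIVE_USD = 10  # paid per eligible recruit who completes the study


@dataclass
class Respondent:
    respondent_id: int
    recruiter_id: Optional[int]  # None for seeds
    wave: int                    # 0 for seeds, recruiter's wave + 1 otherwise


@dataclass
class CouponLedger:
    """Hypothetical ledger; not the IRISPlus data model."""
    issued: Dict[str, int] = field(default_factory=dict)    # serial -> recruiter id
    redeemed: Dict[str, int] = field(default_factory=dict)  # serial -> recruit id
    respondents: Dict[int, Respondent] = field(default_factory=dict)

    def enroll_seed(self, respondent_id: int) -> Respondent:
        seed = Respondent(respondent_id, None, 0)
        self.respondents[respondent_id] = seed
        return seed

    def issue(self, recruiter_id: int, serials: List[str]) -> None:
        """Give coupons to an enrolled respondent, enforcing the quota."""
        if recruiter_id not in self.respondents:
            raise KeyError(f"respondent {recruiter_id} is not enrolled")
        held = sum(1 for owner in self.issued.values() if owner == recruiter_id)
        if held + len(serials) > COUPONS_PER_RECRUITER:
            raise ValueError("coupon quota exceeded")
        if len(set(serials)) != len(serials) or any(s in self.issued for s in serials):
            raise ValueError("coupon serial numbers must be unique")
        for serial in serials:
            self.issued[serial] = recruiter_id

    def redeem(self, serial: str, recruit_id: int) -> Respondent:
        """Enroll a recruit who presents a valid, unused coupon."""
        if serial not in self.issued:
            raise ValueError("unknown coupon")
        if serial in self.redeemed:
            raise ValueError("coupon already redeemed")
        if recruit_id in self.respondents:
            raise ValueError("respondent already enrolled")
        recruiter = self.respondents[self.issued[serial]]
        recruit = Respondent(recruit_id, recruiter.respondent_id, recruiter.wave + 1)
        self.respondents[recruit_id] = recruit
        self.redeemed[serial] = recruit_id
        return recruit

    def incentive_due(self, recruiter_id: int) -> int:
        """Dollars owed to a recruiter for the coupons redeemed so far."""
        recruits = sum(1 for s in self.redeemed if self.issued[s] == recruiter_id)
        return recruits * RECRUIT_INCENTIVE_USD


ledger = CouponLedger()
ledger.enroll_seed(1)
ledger.issue(1, ["A001", "A002", "A003"])
recruit = ledger.redeem("A001", 2)
print(recruit.wave, ledger.incentive_due(1))
```

Storing the recruiter's identifier and wave on each respondent is what later allows recruitment chains and wave counts to be reconstructed from the coupon records alone.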

All eligible respondents were asked several questions regarding their networks, including the size of their drug using networks, how many of the people in this network they had seen in the last 6 months, and the demographic characteristics of those with whom they had had contact in the last 6 months. In addition, when a respondent returned to collect the incentive for successfully enrolled recruits, s/he was asked a number of questions about whether anyone refused to accept coupons and the characteristics of those who refused them. These data were combined with basic demographic and risk behavior information to determine if an equilibrium distribution of respondents was enrolled and if adequate sociometric depth was achieved.

Data Analysis

Recruitment matrices and confidence intervals based on coupon referral data were calculated using RDS Analysis Tool (RDSAT), Version 5.0.1. The link between the recruiter and the recruit is documented by matching the serial numbers of the recruitment coupons given to each respondent with the serial numbers of the coupons returned to the project by the recruits. A matrix is then constructed based on the relevant characteristics of the recruiter and recruit. RDS population estimates were calculated based on these recruitment matrices and on the estimated size of the network for each category of respondents.
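
The matching step described above can be sketched as follows. This is an illustrative assumption about how such a matrix might be tabulated, not RDSAT's implementation; the function names and the toy records (coupon serials, respondent identifiers, and groups) are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def recruitment_matrix(
    issued: Dict[str, str],    # coupon serial -> recruiter id
    returned: Dict[str, str],  # coupon serial -> recruit id
    group: Dict[str, str],     # respondent id -> category (e.g., gender)
    categories: List[str],
) -> List[List[int]]:
    """Count recruitments by (recruiter category, recruit category)."""
    counts: Counter = Counter()
    for serial, recruit in returned.items():
        recruiter = issued.get(serial)
        if recruiter is None:
            continue  # undocumented recruitment: no matching issued coupon
        counts[(group[recruiter], group[recruit])] += 1
    return [[counts[(r, c)] for c in categories] for r in categories]


def row_proportions(matrix: List[List[int]]) -> List[List[float]]:
    """Convert each row of counts into the recruiter group's recruit shares."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([cell / total if total else 0.0 for cell in row])
    return result


# Toy records, invented for illustration only.
issued = {"c1": "s1", "c2": "s1", "c3": "r1"}
returned = {"c1": "r1", "c2": "r2", "c3": "r3", "c9": "r4"}
group = {"s1": "M", "r1": "F", "r2": "M", "r3": "M", "r4": "F"}

matrix = recruitment_matrix(issued, returned, group, ["M", "F"])
print(matrix)
print(row_proportions(matrix))
```

Rows index the recruiter's category and columns the recruit's, which is the layout used in the recruitment tables reported in the Findings.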

Findings

Using RDS, 618 DUs were recruited during 18 waves. Data collection began with eight seeds recruited from the Lower East Side syringe exchange program. The eight seeds produced a total of 583 documented peer recruitments and 27 cases for which recruitment data were missing. Table 1 provides a description of the sample composition. Seventy-six percent of the respondents were male and 24% female. The race/ethnicity breakdown included 35% Hispanic, 46% black, and 14% white. The mean age of the sample was 44 years and the mean age of first drug use was 19 years.
Table 1

Socio-demographic characteristics of drug users in New York City recruited through RDS, 2004 (N=618)

| Characteristic | n | % |
|---|---|---|
| Gender | | |
| Male | 469 | 76 |
| Female | 149 | 24 |
| Race/Ethnicity | | |
| Black | 285 | 46 |
| Hispanic | 218 | 35 |
| White | 88 | 14 |
| Native American | 4 | <1 |
| Multi-racial not specified | 23 | 4 |
| Education | | |
| No high school diploma or GED | 231 | 37 |
| High school diploma | 272 | 44 |
| >12 years of education including college | 115 | 19 |
| Income | | |
| From regular and legal source | 143 | 23 |
| From welfare/disability | 298 | 43 |
| From other sources including illegal means | 174 | 28 |
| Live on the Lower East Side | 358 | 58 |

Mean age: 44 years
Mean age of first drug use: 19 years
Mean age of first injection drug use (current and former IDUs, n=382): 22 years

Table 2 provides a description of drug use status, recruitment status, and whether respondents bought and used drugs on the Lower East Side. More than half (58%) of the sample lived on the Lower East Side, where the study was conducted. A large majority bought drugs on the Lower East Side (64%) and used drugs on the Lower East Side (69%). Forty-two percent were in some type of drug treatment program at the time of the interview. Forty-three percent were current injectors (injected within the last 6 months), 19% had injected more than 6 months ago, and 38% never injected any drugs. The never injectors either sniffed or smoked heroin, cocaine, speedball, crack, amphetamines, or street methadone. Frequency of use ranged from less than once a month to ten or more times a day, almost every day.
Table 2

Drug use status and recruitment status of drug users in New York City recruited through RDS, 2004 (N=618)

| Characteristic | n | % |
|---|---|---|
| Injection status | | |
| Injected <6 months ago | 263 | 43 |
| Injected >6 months ago | 119 | 19 |
| Never injected | 236 | 38 |
| Any drug purchase on the Lower East Side | 398 | 64 |
| Any drug use on the Lower East Side | 424 | 69 |
| Currently in drug treatment | 257 | 42 |
| Recruitment status | | |
| Seed | 8 | 1 |
| Peer recruit | 583 | 94 |
| Undocumented recruit* | 27 | 4 |

*Recruitment records were unavailable.

Table 3 shows the recruitment patterns by gender. There was a tendency towards within-gender recruitment. Though females made up an estimated 23% of the population, women recruited 38% other females and 62% males. Similarly, though males made up an estimated 77% of the population, men recruited 81% other males and 19% females. However, cross-gender recruitment was substantial (29%), implying that recruitment chains did not become trapped within a single gender group but instead had little difficulty crossing gender lines. This explains why there is a strong convergence between the sample composition (76% males) and the equilibrium sample composition (77% males).
Table 3

Cross-gender recruitment of drug users in New York City, 2004

| Gender of recruiter | Recruit: Male | Recruit: Female | Total |
|---|---|---|---|
| Male | 352 (81%) | 81 (19%) | 433 (100%) |
| Female | 90 (62%) | 56 (38%) | 146 (100%) |
| Total | 442 | 137 | 579* |
| Sample composition | 76% | 24% | 100% |
| Population estimate (P) | 77% | 23% | 100% |
| Standard error of P | 0.03 | 0.03 | |
| Equilibrium sample composition | 77% | 23% | 100% |
| Homophily | 0.18 | 0.19 | |
| Estimated network size | 6.51 | 6.60 | |

*Because of missing information, this table is based on 579 respondents.

Table 3 also reports on two above-discussed terms that have potentially important effects on the referral process, homophily and estimated network size. Homophily by gender was modest and similar across gender, 0.18 for males and 0.19 for females. This indicates that males formed networks as though 18% of their ties were to other males, and the other 82% of ties were formed independent of gender, through random mixing; the corresponding figure for females was 19% ties to other females and 81% ties through random mixing. Females on average had slightly larger networks; the estimated network size for males was 6.51, and for females it was 6.60. In this study the difference by gender in homophily and network size is small, so the population estimate for female DUs (23%) remains close to the sample composition (24%).
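
With only two groups, the equilibrium composition and a network-size-adjusted population estimate have simple closed forms, which makes Table 3 a convenient worked example. The sketch below uses the recruitment counts and network sizes from Table 3. The population estimator is a two-group form often given in the RDS literature; the paper does not state its formula and RDSAT's computation may differ, so this is an assumption. All variable names are illustrative.

```python
# Recruitment counts from Table 3 (outer key: recruiter; inner key: recruit).
counts = {
    "male": {"male": 352, "female": 81},
    "female": {"male": 90, "female": 56},
}
# Estimated network sizes from Table 3.
network_size = {"male": 6.51, "female": 6.60}


def transition(table):
    """Row-normalize counts into recruitment (transition) probabilities."""
    return {
        recruiter: {recruit: n / sum(row.values()) for recruit, n in row.items()}
        for recruiter, row in table.items()
    }


S = transition(counts)
s_mf = S["male"]["female"]  # probability that a male recruits a female
s_fm = S["female"]["male"]  # probability that a female recruits a male

# Equilibrium of a two-state Markov chain: the share of males at which the
# flow of recruitment from males to females equals the reverse flow.
equilibrium_male = s_fm / (s_fm + s_mf)

# Assumed two-group population estimator: cross-group recruitment
# probabilities weighted by the recruiting group's mean network size.
pop_male = (s_fm * network_size["female"]) / (
    s_fm * network_size["female"] + s_mf * network_size["male"]
)

print(f"equilibrium share of males: {equilibrium_male:.3f}")
print(f"population estimate for males: {pop_male:.3f}")
```

Under these assumptions both quantities round to 77%, in line with the values reported in Table 3; because the two network sizes are nearly equal, the network-size adjustment barely moves the estimate.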

The recruitment patterns by race/ethnicity indicate inter-ethnic mixing (Table 4). As with the case of gender, a tendency toward in-group recruitment coexists with considerable cross-group recruitment. Overall, Hispanics recruited 58% Hispanics, 13% whites, and 25% blacks. Similarly, blacks recruited 33% non-blacks and whites recruited 62% non-whites. There is a strong convergence between the sample composition (14% white, 47% black, and 35% Hispanic) and the equilibrium sample composition (15% white, 45% black, and 36% Hispanic). Homophily by race/ethnicity is positive: 0.31 for whites, 0.35 for blacks, and 0.34 for Hispanics. Whites had a larger network size (9.10) compared to blacks (5.76) and Hispanics (6.72). The population estimates differed slightly from the sample compositions. For whites it was 10%, and for blacks it was 51%. For Hispanics, it was close to the sample composition (34% versus 35%).
Table 4

Cross-race/ethnic recruitment of drug users in New York City, 2004

| Race and ethnicity of recruiter | Recruit: White | Recruit: Black | Recruit: Hispanic | Recruit: Other | Total |
|---|---|---|---|---|---|
| White | 29 (38%) | 19 (25%) | 22 (29%) | 6 (8%) | 76 (100%) |
| Black | 19 (6%) | 192 (67%) | 63 (22%) | 11 (4%) | 285 (100%) |
| Hispanic | 26 (13%) | 49 (25%) | 113 (58%) | 8 (4%) | 196 (100%) |
| Other | 7 (27%) | 13 (50%) | 6 (23%) | 0 (0%) | 26 (100%) |
| Total | 81 | 273 | 204 | 25 | 583 |
| Sample composition | 14% | 47% | 35% | 4% | 100% |
| Population estimate (P) | 10% | 51% | 34% | 4% | 100% |
| Standard error of P | 0.02 | 0.04 | 0.03 | 0.01 | |
| Equilibrium sample composition | 15% | 45% | 36% | 4% | 100% |
| Homophily | 0.31 | 0.35 | 0.34 | −1.0 | |
| Estimated network size | 9.10 | 5.76 | 6.72 | 7.80 | |

 
Figure 1 depicts the recruitment networks by race/ethnicity generated by the chain-referral process. As in most RDS studies, a single large recruitment network is dominant (Fig. 1). The largest network was initiated by a Hispanic seed who recruited a non-Hispanic white and two non-Hispanic blacks. This network includes more than 90% of the respondents and 18 waves. The next largest network was also initiated by a Hispanic seed, as was the smallest network (bottom center). The third largest network (right center) was initiated by a black seed. In this study, there were four unproductive seeds (i.e., seeds that did not successfully recruit any additional participants). Unproductive seeds (bottom right) included one black seed and all three of the white seeds.
https://static-content.springer.com/image/art%3A10.1007%2Fs11524-006-9052-7/MediaObjects/11524_2006_9052_Fig1_HTML.gif
Figure 1

Recruitment networks of NYC drug users; arrows point from recruiter to recruit.**

**Color coded by race/ethnicity (Non-Hispanic Black = white, Non-Hispanic White = light grey, Hispanic = black, Other = dark grey). Seeds who recruited are enlarged, and seeds who did not recruit are shown at the bottom right.

Both injecting and non-injecting drug users were recruited for the study. Table 5 shows the recruitment patterns in terms of drug use status. There were three groups of DUs: current injecting drug users (injected drugs within the last 6 months), former injectors (injected more than 6 months ago), and non-injectors (used other illicit drugs). The table indicates substantial recruitment across drug use statuses. Current injectors recruited other current injectors, former injectors, and non-injectors. Similarly, non-injectors recruited current injectors, former injectors, and other non-injectors. The table shows the strong convergence between sample compositions and the equilibrium sample compositions. The population estimates, however, differ from the sample compositions: 25% for current injectors, 24% for former injectors, and 51% for non-injectors. This is due, principally, to differences in network sizes. For example, current injectors had networks (9.9) nearly twice as large as those of either former injectors (5.2) or non-injectors (4.9); current injectors can therefore be expected to have been over-sampled, and the RDS population estimator compensates by deflating the sample composition (42%) to the population estimate of 25%.
Table 5

Recruitment by drug use status in New York City, 2004

| Drug use status of recruiter | Recruit: Current injectors* | Recruit: Former injectors** | Recruit: Non-injectors*** | Total |
|---|---|---|---|---|
| Current injectors | 147 (56%) | 51 (19%) | 65 (25%) | 263 (100%) |
| Former injectors | 45 (36%) | 29 (23%) | 51 (41%) | 125 (100%) |
| Non-injectors | 53 (27%) | 36 (18%) | 106 (54%) | 195 (100%) |
| Total | 245 | 116 | 222 | 583 |
| Sample composition | 42% | 20% | 38% | 100% |
| Population estimate (P) | 25% | 24% | 51% | 100% |
| Standard error of P | 0.03 | 0.03 | 0.03 | |
| Equilibrium sample composition | 42% | 20% | 39% | 100% |
| Homophily | 0.41 | −0.02 | 0.07 | |
| Estimated network size | 9.91 | 5.21 | 4.86 | |

*Those who injected within the last 6 months.

**Those who injected more than 6 months ago.

***Those who never injected any drug.
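
The deflation of the current-injector share discussed for Table 5 can be approximated by weighting each group's equilibrium share by the inverse of its estimated network size and renormalizing. This is a simplified sketch under that assumption, not the RDSAT estimator, and the function name is illustrative. The inputs are the rounded values printed in Table 5.

```python
# Rounded values from Table 5.
equilibrium = {"current": 0.42, "former": 0.20, "non": 0.39}
network_size = {"current": 9.91, "former": 5.21, "non": 4.86}


def degree_weighted_estimate(sample_share, mean_degree):
    """Down-weight groups with larger networks, then renormalize to sum to 1."""
    weights = {g: sample_share[g] / mean_degree[g] for g in sample_share}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


estimate = degree_weighted_estimate(equilibrium, network_size)
for name, share in estimate.items():
    print(f"{name}: {share:.3f}")
```

With these rounded inputs the sketch lands within about two percentage points of the reported population estimates (25%, 24%, and 51%) and reproduces the direction of the adjustment: the group with the largest networks is deflated and the other two are inflated.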

Because the seeds in this study were recruited from a needle exchange program on the Lower East Side, geographic diversity of subsequent enrollees was of special interest. Table 6 shows recruitment patterns based on whether respondents lived on the Lower East Side or elsewhere. Though within-area recruitment predominated, cross-area recruitment occurred as well, with Lower East Side residents recruiting outsiders (31%) and others outside the Lower East Side area recruiting Lower East Side residents (43%). There is strong convergence between the sample compositions and the equilibrium sample compositions. The population estimates also remain close to the sample compositions.
Table 6

Recruitment by residency of drug users in New York City, 2004

| Recruiter lives on the Lower East Side | Recruit lives on the Lower East Side: No | Recruit lives on the Lower East Side: Yes |
|---|---|---|
| No | 127 (57%) | 95 (43%) |
| Yes | 112 (31%) | 249 (69%) |
| Total | 239 | 344 |
| Sample composition | 41% | 59% |
| Population estimate (P) | 40% | 60% |
| Standard error of P | 0.03 | 0.03 |
| Equilibrium sample composition | 43% | 57% |
| Homophily | 0.28 | 0.23 |
| Estimated network size | 6.83 | 6.34 |

Geographic diversity was further analyzed based on the zip code of respondents' place of residence. Zip codes provide a useful proxy for location because in high-density areas such as Manhattan, zip code size is only about one half square mile. Respondents were drawn from a total of 70 zip codes, the most distant of which was more than 200 miles away in upstate New York. A majority of respondents lived within a few miles of the center of the needle exchange, but 25% lived more than 5 miles away, and 10% lived more than 8 miles away.

The study also looked at the number of recruitment waves required to reach equilibrium. This number is positively related to the strength of homophily; the stronger the homophily, the greater the tendency for recruitment chains to become trapped within specific groups. This would increase the number of waves required for the sample to attain a stable equilibrium where any bias due to selection of seeds would be eliminated. For example, for gender only four waves were required to reach equilibrium. For race/ethnicity, six waves were needed to reach equilibrium, reflecting the stronger homophily based on race/ethnicity, as compared to gender. Among those who injected heroin within the last 6 months, four waves were required to reach equilibrium. Given that the data set has 18 waves, this suggests that the sample had more than ample sociometric depth to reach even the most isolated members of the target population.
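
The link between homophily and the number of waves can be explored by iterating the recruitment matrix as a Markov chain. The sketch below uses the race/ethnicity counts from Table 4, starts from seeds drawn entirely from one group, and counts the waves until every group share is within a tolerance of the equilibrium. The 2% tolerance and the function names are assumptions; the convergence criterion behind the wave counts reported above is not stated here, so the output need not reproduce them.

```python
from typing import List

# Recruitment counts from Table 4 (rows: recruiter; columns: recruit),
# ordered White, Black, Hispanic, Other.
counts = [
    [29, 19, 22, 6],
    [19, 192, 63, 11],
    [26, 49, 113, 8],
    [7, 13, 6, 0],
]
groups = ["White", "Black", "Hispanic", "Other"]


def row_normalize(matrix: List[List[int]]) -> List[List[float]]:
    """Turn recruitment counts into transition probabilities, row by row."""
    return [[cell / sum(row) for cell in row] for row in matrix]


def next_wave(dist: List[float], T: List[List[float]]) -> List[float]:
    """Expected composition of the next wave given the current one."""
    n = len(T)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]


def equilibrium(T: List[List[float]], iterations: int = 200) -> List[float]:
    """Approximate the equilibrium by repeated multiplication."""
    dist = [1.0 / len(T)] * len(T)
    for _ in range(iterations):
        dist = next_wave(dist, T)
    return dist


def waves_to_equilibrium(
    T: List[List[float]],
    seed_index: int,
    tolerance: float = 0.02,  # assumed convergence tolerance
    max_waves: int = 50,
) -> int:
    """Waves until all shares are within `tolerance` of equilibrium,
    starting from seeds drawn entirely from one group."""
    target = equilibrium(T)
    dist = [1.0 if i == seed_index else 0.0 for i in range(len(T))]
    for wave in range(1, max_waves + 1):
        dist = next_wave(dist, T)
        if max(abs(d - t) for d, t in zip(dist, target)) < tolerance:
            return wave
    raise RuntimeError("no convergence within max_waves")


T = row_normalize(counts)
print([round(share, 2) for share in equilibrium(T)])
for index, name in enumerate(groups):
    print(name, waves_to_equilibrium(T, index))
```

Starting from the most extreme seed compositions gives a conservative wave count, since any mixed set of seeds begins closer to the equilibrium.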

Figure 2 provides information on the rate of recruitment during our study. Only 13 weeks were required to enroll the required sample of 500. During the first week of data collection, 30 drug users were recruited and interviewed. The recruitment rate was faster during the next 12 weeks. Recruitment and data collection were completed within 13 weeks and, as stated above, exceeded the required sample size.
https://static-content.springer.com/image/art%3A10.1007%2Fs11524-006-9052-7/MediaObjects/11524_2006_9052_Fig2_HTML.gif
Figure 2

Weekly and cumulative recruitment of 618 drug users in New York City, 2004

Discussions and Conclusion

Prior to this study, RDS was primarily used to recruit DUs to participate in research intervention studies where they also received HIV prevention services. Other studies have focused on the statistical validity of RDS.9–11 In contrast, this study examined the effectiveness of RDS for use in HIV behavioral surveillance and was designed to answer the three research questions stated above regarding geographic depth, sociometric depth, and speed.

Enrollees in our study were ethnically diverse and varied by drug use status and type of drug used. The demographic characteristics of this study's respondents are similar to the characteristics of drug users recruited in other studies conducted in New York City.15,16 According to a study of injecting drug users (IDUs) conducted in 1997–1999,15 77% of the IDUs on the Lower East Side were white compared to 13% in East Harlem. During the last few years, the Lower East Side has gone through significant demographic changes, and these changes are reflected in the study sample. Gender characteristics of the study respondents are similar to the gender characteristics of the 1997–1999 study.15 Sixty-nine percent of the sample recruited from the Lower East Side and 70% of the sample recruited from East Harlem were male. This is similar to the present study sample, in which 76% were male. Mean age of the sample was similar to the mean age of IDUs recruited in other studies.15,16 Mean age at first injection was 22 years, which was slightly higher than the mean age at first injection among the 1997–1999 study respondents recruited from the Lower East Side (18 years) and from East Harlem (20 years). However, it should be noted that the current sample includes both injecting and non-injecting drug users, and the comparison is made between IDUs in the earlier study and both injecting and non-injecting drug users in the current study.

The 1997–1999 study15 was based on a street-recruited sample, and the second study16 was based on IDUs entering the Beth Israel Center drug detoxification program. The similarities between these two study samples and the current RDS-based sample may indicate that, in this instance, street recruitment and recruitment from a detoxification program did not differ from the RDS recruitment strategy. However, the other two samples depended on outreach workers or interviewers to conduct the recruitment, so they were convenience samples, and hence their results could not be validly generalized to the population of drug users. Moreover, the RDS sample had greater coverage, because it included individuals who were in drug treatment as well as those not in any treatment, and individuals who were accessible on the street as well as those who avoided the streets in drug-use areas. In addition, RDS provides a probability sample without requiring the extensive formative research required by other probability methods, such as time-location sampling.

Findings from our study suggest that, using RDS, a small number of seeds recruited from a small, well-defined geographic area could successfully recruit DUs from a broader geographic area. RDS produced a sample that is geographically diverse. Although the majority of enrollees were from Manhattan, the sample was spread among 70 zip codes and included residents of four of the city's five boroughs (all but Staten Island).

All seeds were current injectors. However, they were asked to recruit any drug users—both injecting and non-injecting drug users. We wanted to assess the extent of drug users' networks by recruiting IDU seeds only and having them recruit both IDUs and non-IDUs. Our recruitment matrix (Table 5) showed that despite having only IDU seeds, we were able to recruit a diverse (both IDU and non-IDU) sample of drug users.

Our study demonstrated that RDS was able to yield a sample with sufficient sociometric depth to attain an equilibrium distribution after only a relatively small number of waves. In order to achieve equilibrium by race, gender, and drug use status, we needed to enroll 6, 4, and 4 waves, respectively.

Data collection required about 13 weeks. We had one full-time research assistant who conducted the initial eligibility assessment, including coupon management, and two full-time interviewers who conducted the interviews. In the first two weeks the coupons were redeemed on the same day. As more coupons were distributed, the project staff instituted an appointment system both for interviews and for receiving reimbursement for coupon distribution. To avoid further overcrowding at the storefront, the project staff began to post-date all distributed coupons by two days.

Our results suggest that RDS provides an effective method for recruiting a large sample of DUs for conducting HIV-related behavioral surveys in a large city. The substantial level of cross-gender and inter-ethnic mixing facilitated the emergence of diversity within each recruitment chain by reducing the number of waves required for equilibrium to be approximated. In other studies on RDS, usually after a modest number of waves, the sample composition reaches a projected equilibrium depending on the patterns of cross-gender and cross-race/ethnic recruitment. As mentioned earlier, a recruiter was only given three coupons. Limiting the number of coupons facilitated the lengthening of recruitment chains. Long recruitment chains provide the means to overcome bias from the choice of seeds. After at most six waves, the sample composition stabilizes, remaining unchanged during future waves.8,12

When sampling from a hidden population, such as DUs, direct assessment of the validity of the sample is not possible. However, comparison with other studies provides useful information for assessing potential bias. The convergence of our study results with those from previous studies of DUs conducted on the Lower East Side provides empirical support for theoretical deductions showing that RDS produces statistically unbiased results when the assumptions of the statistical theory upon which it is based are satisfied.11

Even though the study was successful in recruiting a diverse group of drug users in New York City, there are some limitations because New York City is unique in several respects. New York City is the largest city in the United States. The sheer size and population density make the city demographically unique, and this may well have influenced the network characteristics and network size of DUs. Illicit drug use, especially heroin use, has a long history in New York City and particularly on the Lower East Side. Illicit drug markets developed early and rapidly in New York City.17 Because of this long history, the drug culture is much less segregated in terms of race/ethnicity and geographic location in New York City than in other cities in the USA. Drug markets are more integrated in terms of race/ethnicity, whereas some other cities may have very segregated markets.18,19 HIV among injecting drug users was traced back to 1978, and prevention efforts have significantly reduced HIV prevalence among IDUs in New York City.20,21 The drug users on the Lower East Side in particular, and in New York City in general, are among the most studied groups. The city has a very good public transportation system, which also facilitated social linkages among drug users living in different parts of the city. Cities without good public transportation may have much less geographic linking. The study findings were also limited to English-speaking drug users. In addition, as 42% of the respondents were in some type of drug treatment program, the study findings should be treated with some caution.

Currently, CDC is using RDS to conduct HIV behavioral surveillance among IDUs in 25 U.S. metropolitan statistical areas. This use of RDS will provide further opportunities to assess its effectiveness in multiple and diverse settings.

Human Participant Protection

The study protocol was reviewed and approved by the Centers for Disease Control and Prevention, Beth Israel Medical Center and the New York State Department of Health institutional review boards.

Appendix

A Technical Summary of Respondent-Driven Sampling

Respondent-driven sampling has three analytic components. These consist of procedures for computing: (1) the composition of the equilibrium sample and the number of waves required to reach equilibrium; (2) the estimated population composition while controlling for the effects of differentials in network size, homophily, and recruitment effectiveness; and (3) the homophily level of each group. This appendix summarizes each of these elements.

Computing the Composition of the Equilibrium Sample

Respondent-driven samples attain a stable composition after a modest number of recruitment waves. Computing this equilibrium requires solving a system of n linear equations, where n is the number of groups into which respondents are divided.8 Where respondents are divided into groups a, b,..., n; Sxy is the selection proportion, that is, the proportion of the recruits of members of group X who are members of group Y; and Ex is the proportion of members of group X in the equilibrium sample E = (Ea, Eb,...En), the system of linear equations is:
$$\begin{array}{*{20}c} {1 = E_{a} + E_{b} + \ldots + E_{n} } \\ {E_{a} = S_{aa} E_{a} + S_{ba} E_{b} + \ldots + S_{na} E_{n} } \\ {E_{b} = S_{ab} E_{a} + S_{bb} E_{b} + \ldots + S_{nb} E_{n} } \\ \ldots \\ {E_{n - 1} = S_{a,n - 1} E_{a} + S_{b,n - 1} E_{b} + \ldots + S_{n,n - 1} E_{n} } \\ \end{array} $$
(1)
In a two-category system, this reduces to:
$$\begin{array}{*{20}c} {1 = E_{a} + E_{b} } \\ {E_{a} = S_{{aa}} E_{a} + S_{{ba}} E_{b} } \\ \end{array} $$
(2)
Substituting 1−Ea for Eb, and 1−Sab for Saa, and solving for Ea yields,
$$E_{a} = \frac{{S_{{ba}} }}{{S_{{ba}} + S_{{ab}} }}$$
(3)
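The intermediate algebra between Eq. 2 and Eq. 3 is not shown in the text; a sketch of the steps, using only the substitutions named above, is:
$$\begin{array}{l} {E_{a} = (1 - S_{ab}) E_{a} + S_{ba} (1 - E_{a})} \\ {E_{a} = E_{a} - S_{ab} E_{a} + S_{ba} - S_{ba} E_{a}} \\ {E_{a} (S_{ab} + S_{ba}) = S_{ba}} \\ \end{array} $$
Dividing both sides by the sum of the two cross-recruitment proportions gives Eq. 3, which is defined as long as at least one of the two groups recruits across group lines.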
For example, in the analysis of recruitment by gender presented in Table 3, where group A are males and B are females, the proportion of males recruited by females, Sba, is 0.626 (90/146). Similarly, the proportion of females recruited by males, Sab, is 0.190 (81/433). The equilibrium sample composition for males is therefore
$$E_{a} = \frac{{0.626}}{{0.626 + 0.190}} = 0.767$$
(4)
Finally, given that the equilibrium proportions must sum to 1, the equilibrium composition of the sample for females, Eb, is 1 − 0.767 = 0.233.
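The two-group calculation can be checked numerically. The following is a minimal Python sketch, not part of RDSAT; the function name `equilibrium_two_groups` and its argument names are illustrative assumptions, and the inputs are the rounded gender values quoted above.

```python
from typing import Tuple


def equilibrium_two_groups(s_ab: float, s_ba: float) -> Tuple[float, float]:
    """Return (E_a, E_b) from Eq. 3.

    s_ab: proportion of group A's recruits who belong to group B.
    s_ba: proportion of group B's recruits who belong to group A.
    """
    if s_ab + s_ba == 0:
        raise ValueError("no cross-group recruitment; Eq. 3 is undefined")
    e_a = s_ba / (s_ba + s_ab)
    return e_a, 1.0 - e_a


# Gender example from the text: A = males, B = females.
S_AB, S_BA = 0.190, 0.626
e_males, e_females = equilibrium_two_groups(S_AB, S_BA)
print(round(e_males, 3), round(e_females, 3))

# Consistency check against Eq. 2: E_a = S_aa * E_a + S_ba * E_b, with S_aa = 1 - S_ab.
residual = (1 - S_AB) * e_males + S_BA * e_females - e_males
print(abs(residual) < 1e-12)
```

The residual check simply confirms that the closed-form value satisfies the second line of Eq. 2.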

Computing the Number of Recruitment Waves Required to Approximate Equilibrium

To compute the number of waves required to approximate the composition of the equilibrium sample, it is necessary first to identify the way in which the composition of the sample changes from wave to wave. Where \(X^{i}_{a} \) is the proportion of group A recruited during wave i, the proportional distribution of the n groups of respondents during any wave i is defined by the vector \(X^{i} = {\left( {X^{i}_{a} ,X^{i}_{b} , \ldots X^{i}_{n} } \right)}\). The composition of the sample during the subsequent wave, i+1, can be computed as follows:
$$\begin{array}{*{20}c} {X^{i + 1}_{a} = S_{aa} X^{i}_{a} + S_{ba} X^{i}_{b} + \ldots + S_{na} X^{i}_{n} } \\ {X^{i + 1}_{b} = S_{ab} X^{i}_{a} + S_{bb} X^{i}_{b} + \ldots + S_{nb} X^{i}_{n} } \\ {X^{i + 1}_{c} = S_{ac} X^{i}_{a} + S_{bc} X^{i}_{b} + \ldots + S_{nc} X^{i}_{n} } \\ \ldots \\ {X^{i + 1}_{n} = S_{an} X^{i}_{a} + S_{bn} X^{i}_{b} + \ldots + S_{nn} X^{i}_{n} } \\ \end{array} $$
(5)

Computations begin with wave 0, X0, which specifies the seeds from which sampling began. Each subsequent wave is computed from the preceding wave. After the composition of a wave has been computed, it can be compared to the composition of the equilibrium sample. We consider equilibrium to have been approximated when the discrepancy is less than 2% between the equilibrium and wave-specific composition of the sample for each of the sample's n constituent groups.
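The wave-by-wave procedure of Eq. 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the RDSAT implementation: the function names are hypothetical, the 2% criterion is read as an absolute difference of 0.02 in each group's proportion, and the all-male and all-female seed compositions are hypothetical starting points rather than the study's actual seeds. The selection proportions are the rounded gender values from the appendix, with Saa and Sbb obtained as 1 − Sab and 1 − Sba.

```python
from typing import Dict

Proportions = Dict[str, float]
# selection[x][y]: share of group x's recruits who belong to group y (each row sums to 1)
Selection = Dict[str, Dict[str, float]]


def next_wave(current: Proportions, selection: Selection) -> Proportions:
    """Apply Eq. 5 once to get the composition of wave i+1 from wave i."""
    return {
        y: sum(selection[x][y] * current[x] for x in current)
        for y in current
    }


def waves_to_equilibrium(seeds: Proportions, selection: Selection,
                         equilibrium: Proportions, tol: float = 0.02,
                         max_waves: int = 100) -> int:
    """Return the first wave whose composition is within tol of equilibrium for every group."""
    current = dict(seeds)
    for wave in range(max_waves + 1):
        if all(abs(current[g] - equilibrium[g]) < tol for g in current):
            return wave
        current = next_wave(current, selection)
    raise RuntimeError("equilibrium not approximated within max_waves")


selection = {
    "male": {"male": 0.810, "female": 0.190},
    "female": {"male": 0.626, "female": 0.374},
}
e_male = 0.626 / (0.626 + 0.190)  # Eq. 3
equilibrium = {"male": e_male, "female": 1.0 - e_male}

# Hypothetical seed compositions, chosen only to show independence from the seeds.
from_male_seeds = waves_to_equilibrium({"male": 1.0, "female": 0.0}, selection, equilibrium)
from_female_seeds = waves_to_equilibrium({"male": 0.0, "female": 1.0}, selection, equilibrium)
print(from_male_seeds, from_female_seeds)
```

The wave counts produced by this sketch depend on the assumed seeds and on the rounded inputs, so they need not match the counts reported for the study's actual recruitment.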

Figure 3 illustrates the process by which an RDS sample approximates equilibrium, using data from Table 4's analysis of recruitment by race/ethnicity. Figure 3A simulates how sample composition would have changed wave by wave had recruitment begun with only Hispanic seeds. At wave 0 (i.e., the seeds), Hispanics would compose 100% of the sample, but this group's representation declines wave by wave, reaching the equilibrium of 36% after six waves. Similarly, Fig. 3B simulates the recruitment process had it begun with only black seeds. In this scenario, Hispanics would compose 0% of the sample at wave zero and increase wave by wave, reaching the equilibrium of 36% after five waves. Thus, consistent with the statistical theory underlying RDS, the sample composition stabilizes, reaching a stable equilibrium after only a modest number of waves that is independent of the seeds from which recruitment began.
https://static-content.springer.com/image/art%3A10.1007%2Fs11524-006-9052-7/MediaObjects/11524_2006_9052_Fig3_HTML.gif
Figure 3

Sample composition stabilizes, reaching equilibrium independent of the choice of seeds. Recruitment by race/ethnicity, NYC drug users

Computing the Estimated Composition of the Population

Estimates of population composition that compensate for differentials in network size, homophily, and recruitment effectiveness are based on the reciprocity model.9 In RDS, respondents recruit overwhelmingly from those with whom they have network ties, generally friends, acquaintances, or relatives. Such ties are reciprocal because a link from any individual X to Y implies that a link also exists from Y to X. Hence for two groups A and B, the number of links from A to B (Tab) will equal the number from B to A (Tba), i.e., Tab = Tba. Furthermore, the number of ties from any group X to Y is the product of four terms: the number of persons in the population (Z), the proportional size of the group (Px), the mean network size of group members (Nx), and the proportion of ties that go from X to Y (Sxy), i.e., Txy = ZPxNxSxy. Hence, for two groups, A and B, ZPaNaSab = ZPbNbSba. Because group size is expressed as a proportion, 1−Pa can be substituted for Pb, and this expression can be solved for group A's size, Pa, as follows:
$$P_{a} = \frac{{S_{{ba}} N_{b} }}{{S_{{ba}} N_{b} + S_{{ab}} N_{a} }}$$
(6)
Note that the term for total population size, Z, drops out, so the estimate refers to proportional group size. It provides the means for controlling for three sources of bias: those due to differentials in network size, homophily, and differential recruitment.9 For example, consider again the case of gender. From Table 3, where males are group A and females are group B, the proportion of females recruited by males was Sab = 0.190, and the proportion of males recruited by females was Sba = 0.626. Furthermore, the mean network size was Na = 6.51 for males and Nb = 6.60 for females. Substituting these values into the above expression yields the estimated proportion of males in the population:
$$P_{a} = \frac{0.626 \times 6.60}{0.626 \times 6.60 + 0.190 \times 6.51} = 0.769$$
(7)

Note that the estimated proportion of males, 0.769, closely approximates the equilibrium sample composition, i.e., 0.767. This reflects similarities across genders in network sizes, levels of homophily, and recruitment effectiveness, factors that in combination made the sample approximately self-weighting. However, these factors are generally divergent [e.g., see Table 4's analysis of recruitment by race/ethnicity, where the equilibrium percentage of white DUs differs by 50% from the estimated percentage of white DUs (15 versus 10%, respectively)]. RDS samples are generally not self-weighting and hence the post-stratification provided by the RDS population estimator is typically needed.
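The two-group estimator of Eq. 6 is simple enough to reproduce directly. The sketch below is illustrative only; the function name `estimate_group_a` is an assumption, and the inputs are the rounded values quoted in the text, so the computed proportion may differ from the reported figure in the third decimal place.

```python
def estimate_group_a(s_ab: float, s_ba: float, n_a: float, n_b: float) -> float:
    """Eq. 6: estimated population proportion of group A.

    s_ab, s_ba: cross-group selection proportions (A's recruits in B, B's recruits in A).
    n_a, n_b: mean network sizes of groups A and B.
    """
    numerator = s_ba * n_b
    denominator = s_ba * n_b + s_ab * n_a
    if denominator == 0:
        raise ValueError("no cross-group ties; Eq. 6 is undefined")
    return numerator / denominator


# Gender example: A = males, B = females (rounded inputs from the text).
p_males = estimate_group_a(s_ab=0.190, s_ba=0.626, n_a=6.51, n_b=6.60)
e_males = 0.626 / (0.626 + 0.190)  # equilibrium sample composition, Eq. 3

print(round(p_males, 4), round(e_males, 4))
# When network sizes are equal, Eq. 6 collapses to Eq. 3 and the sample is self-weighting.
print(abs(estimate_group_a(0.190, 0.626, 5.0, 5.0) - e_males) < 1e-12)
```

The last line shows why similar network sizes across groups make the estimate and the equilibrium composition nearly coincide.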

This estimation procedure generalizes to systems with more than two groups. The solution for a system with n groups requires solving a system of linear equations:
$$\begin{array}{*{20}c} {1 = P_{a} + P_{b} + \ldots + P_{n} } \\ {P_{a} N_{a} S_{{ab}} = P_{b} N_{b} S_{{ba}} } \\ {P_{a} N_{a} S_{{ac}} = P_{c} N_{c} S_{{ca}} } \\ \ldots \\ {P_{a} N_{a} S_{{an}} = P_{n} N_{n} S_{{na}} } \\ \end{array} $$
(8)
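Because every reciprocity equation in Eq. 8 involves group a, the system can be solved by expressing each group's size relative to Pa and then normalizing so that the proportions sum to 1. The Python sketch below takes that approach; it is one possible way to solve Eq. 8 as written, not a description of how RDSAT solves it, and the three groups and all numeric inputs are hypothetical values chosen only for illustration.

```python
from typing import Dict


def estimate_proportions(selection: Dict[str, Dict[str, float]],
                         network_size: Dict[str, float],
                         anchor: str) -> Dict[str, float]:
    """Solve Eq. 8 by writing each P_x as a multiple of the anchor group's P_a.

    selection[x][y]: share of group x's recruits who belong to group y.
    network_size[x]: mean network size of group x.
    """
    ratios = {}
    for g in network_size:
        if g == anchor:
            ratios[g] = 1.0
            continue
        back = network_size[g] * selection[g][anchor]
        if back == 0:
            raise ValueError(f"group {g!r} has no ties back to the anchor group")
        # From P_a N_a S_ag = P_g N_g S_ga:  P_g / P_a = N_a S_ag / (N_g S_ga)
        ratios[g] = network_size[anchor] * selection[anchor][g] / back
    total = sum(ratios.values())
    return {g: r / total for g, r in ratios.items()}


# Hypothetical three-group inputs (not taken from the study's tables).
selection = {
    "a": {"a": 0.5, "b": 0.3, "c": 0.2},
    "b": {"a": 0.4, "b": 0.4, "c": 0.2},
    "c": {"a": 0.3, "b": 0.3, "c": 0.4},
}
network_size = {"a": 5.0, "b": 8.0, "c": 10.0}

estimates = estimate_proportions(selection, network_size, anchor="a")
print({g: round(p, 3) for g, p in estimates.items()})
```

With two groups this reduces to Eq. 6, which offers a quick sanity check on any implementation.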
Finally, homophily is computed based on the population estimate derived from the reciprocity model (i.e., the P terms) and from the recruitment selection proportions (i.e., the S terms).
$$H_{x} = \left\{ \begin{array}{ll} {\frac{P_{x} - S_{xx}}{P_{x} - 1}} & {\mathrm{if}\;S_{xx} \geqslant P_{x}} \\ {\frac{S_{xx} - P_{x}}{P_{x}}} & {\mathrm{if}\;S_{xx} < P_{x}} \\ \end{array} \right.$$
(9)

In this expression, homophily is positive if self-recruitment (Sxx) exceeds that which random mixing would have produced (Px) and negative if there is a bias toward out-group recruitment (i.e., if Sxx < Px). In Table 3's analysis of recruitment by gender, homophily is positive for both males and females, e.g., for females, homophily is (0.23−0.38)/(0.23−1) = 0.19. This indicates that females recruited as though 19% of the time they recruited another female, and the other 81% of the time they recruited randomly, without respect to gender. Similarly, the homophily for males was 0.18, so for the gender variable, differences in homophily were minor. Other variables reveal more substantial differences, e.g., current injectors had a substantial homophily level (0.41), whereas former injectors and non-injectors had near zero homophily (−0.02 and 0.07, respectively). Given that higher homophily groups tend to be over-sampled,9 the post-stratification process compensates by deflating the estimated percentage of injectors, from 42% in the actual and equilibrium samples, to an estimate of only 25% injectors in the DU population. As this example illustrates, homophily levels may vary substantially across variables within the same data set.
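The piecewise definition in Eq. 9 translates directly into code. The following Python sketch is illustrative; the function name `homophily` is an assumption, and the only data used are the rounded female values quoted above (Px = 0.23, Sxx = 0.38).

```python
def homophily(p_x: float, s_xx: float) -> float:
    """Eq. 9: homophily of group X.

    p_x: estimated population proportion of group X (must lie strictly between 0 and 1).
    s_xx: proportion of group X's recruits who also belong to group X.
    """
    if not 0.0 < p_x < 1.0:
        raise ValueError("p_x must lie strictly between 0 and 1")
    if s_xx >= p_x:
        # In-group preference: scaled so that exclusive in-group recruitment gives 1.
        return (p_x - s_xx) / (p_x - 1.0)
    # Out-group preference: scaled so that no in-group recruitment gives -1.
    return (s_xx - p_x) / p_x


h_females = homophily(p_x=0.23, s_xx=0.38)
print(round(h_females, 2))
```

The two branches keep the index between −1 and 1, with 0 corresponding to recruitment in proportion to group size.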

Respondent-Driven Sampling Software

To facilitate computation of the sampling equilibrium, the number of waves required for equilibrium to be attained, estimated population composition, homophily, and other relevant terms, custom software has been developed. It is termed the RDS Analysis Tool (RDSAT). Software is also available for managing field operations, including calculating respondent fees for peer recruitment and implementing measures designed to reduce subject duplication and impersonation. This is termed the RDS Coupon Manager (RDSCM). This software is free for non-commercial use and can be downloaded at: http://www.respondentdrivensampling.org/.

Copyright information

© The New York Academy of Medicine 2006