
Surveying Hard-to-Reach Groups Through Sampled Respondents in a Social Network

A Comparison of Two Survey Strategies


The sampling frame in most social science surveys misses members of certain groups, such as the homeless or individuals living with HIV. These groups are known as hard-to-reach groups. One strategy for learning about these groups, or subpopulations, is to reach their members through their social networks. In this paper we compare the efficiency of two common methods for estimating subpopulation size using data from standard surveys. Both methods are examples of mental link tracing designs, which begin with a randomly sampled set of network members (nodes) and then reach other nodes indirectly through questions asked of the sampled nodes. Mental link tracing designs cost significantly less than traditional link tracing designs, yet they introduce additional sources of potential bias. We examine the influence of one such source of bias using simulation studies. We then demonstrate our findings using data from the General Social Survey collected in 2004 and 2006. Finally, we provide design suggestions for future surveys that incorporate such designs.
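
To make the idea of a mental link tracing design concrete, the following minimal sketch simulates one. It is an illustration only, not the estimators compared in the paper, which the abstract does not specify. It assumes a simple random graph, a randomly assigned hard-to-reach group, respondents who report their contacts without error, and known respondent degrees; the estimate is the basic network scale-up ratio. The names `simulate_ard` and `scale_up_estimate` and all parameter values are hypothetical.

```python
import random


def simulate_ard(n=1000, p_edge=0.02, hidden_frac=0.05, n_respondents=300, seed=0):
    """Simulate "how many X's do you know?" answers on a random graph.

    Assumptions (not from the paper): ties form independently with
    probability p_edge, group membership is assigned at random, and
    respondents report their contacts without error.
    """
    rng = random.Random(seed)
    # Mark each node as a member of the hard-to-reach group or not.
    hidden = [rng.random() < hidden_frac for _ in range(n)]
    # Build an undirected random graph as a list of neighbor sets.
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_edge:
                neighbors[i].add(j)
                neighbors[j].add(i)
    # Draw a simple random sample of respondents (the sampled nodes).
    respondents = rng.sample(range(n), n_respondents)
    # Each respondent's degree, treated as known in this sketch.
    degrees = [len(neighbors[i]) for i in respondents]
    # Each respondent's count of contacts in the hard-to-reach group.
    counts = [sum(hidden[j] for j in neighbors[i]) for i in respondents]
    return hidden, degrees, counts


def scale_up_estimate(degrees, counts, population_size):
    """Basic scale-up ratio: N * (total reported contacts) / (total degree)."""
    total_degree = sum(degrees)
    if total_degree == 0:
        raise ValueError("respondents report no ties; estimate is undefined")
    return population_size * sum(counts) / total_degree


if __name__ == "__main__":
    hidden, degrees, counts = simulate_ard()
    estimate = scale_up_estimate(degrees, counts, len(hidden))
    print("true group size:", sum(hidden))
    print("scale-up estimate:", round(estimate, 1))
```

In this idealized setting the ratio tracks the true group size closely. Relaxing the assumptions, for example by letting respondents be unaware of a contact's group membership, is one way to explore the kinds of bias the abstract refers to.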




  1. Slashdot is a technology news blog. Slashdot recently introduced a feature, known as Slashdot Zoo, which allows users to connect to one another as friends.




The authors gratefully acknowledge the support of the SAMSI Complex Networks Program.

Tyler McCormick is partially supported by NIAID grant R01 HD54511. This work was partially completed while McCormick was supported by a Google PhD Fellowship in Statistics. The research of Tian Zheng is supported in part by NSF grants DMS-0714669 and SES-1023176, NIH grant R01 GM070789, and a 2010 Google research award. Eric Kolaczyk is supported by ONR award N000140910654.

Author information


Corresponding author

Correspondence to Tyler H. McCormick.


About this article

Cite this article

McCormick, T.H., He, R., Kolaczyk, E. et al. Surveying Hard-to-Reach Groups Through Sampled Respondents in a Social Network. Stat Biosci 4, 177–195 (2012).



  • Aggregated relational data
  • Egocentric nominations
  • Hard-to-reach groups
  • Mental link tracing design
  • Sampling
  • Social network