Prevention Science, Volume 21, Issue 2, pp 194–202

Using Security Questions to Link Participants in Longitudinal Data Collection

  • Shu Xu
  • Anthea Chan
  • Michael F. Lorber
  • Justin P. Chase


Anonymous data collection systems are often necessary when assessing sensitive behaviors but can pose challenges to researchers seeking to link participants over time. To assist researchers in anonymously linking participants, we outlined and tested a novel security question linking (SEEK) method. The SEEK method includes four steps: (1) data management and standardization, (2) many-to-many matching, (3) fuzzy matching, and (4) rematching and verification. The method is demonstrated in SAS with two samples from a longitudinal study of adolescent dating violence. After an initial assessment during a laboratory visit, participants were asked to complete an online assessment either (a) once, 3 months later (Sample 1, n = 60), or (b) three times at 1-month intervals (Sample 2, n = 140). Demographics, eye color, and responses to nine security questions were used as key variables to link responses from the laboratory and online follow-up assessments. The rate of matched cases was 100% in Sample 1 and ranged from 94.3% to 98.3% in Sample 2. To quantify confidence in the data quality of successfully matched pairs, we reported the means and standard deviations of the number of matched security questions. In addition, we reported the rank order and counts of the mismatched components in key variables. Results indicate that the SEEK method provides a feasible and reliable solution for linking responses in longitudinal studies with sensitive questions.
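The four steps named above can be illustrated with a minimal sketch. The authors demonstrate the method in SAS; the version below is written in Python and is not their procedure. It assumes a simplified reading of each step, uses the standard-library `difflib.SequenceMatcher` as a stand-in similarity measure, and relies on hypothetical field names (`eye_color`, `first_pet`, `birth_city`), a hypothetical similarity threshold of 0.8, and invented example records.

```python
from difflib import SequenceMatcher
from statistics import mean

def standardize(value):
    """Step 1 (assumed form): trim, lowercase, and collapse whitespace."""
    return " ".join(str(value).strip().lower().split())

def similar(a, b, threshold=0.8):
    """Step 3 (stand-in): treat two answers as matching when their
    similarity ratio reaches the threshold. The threshold is illustrative."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

def count_matches(rec_a, rec_b, keys):
    """Number of key variables on which two records agree after
    standardization and fuzzy comparison."""
    return sum(similar(standardize(rec_a[k]), standardize(rec_b[k])) for k in keys)

def link(baseline, followup, keys, min_matches):
    """Steps 2 and 4 (assumed form): score every follow-up record against
    every baseline record (many-to-many), then accept a link only when
    exactly one baseline record attains the best score and that score
    reaches min_matches."""
    links = {}
    for f_id, f_rec in followup.items():
        scores = {b_id: count_matches(b_rec, f_rec, keys)
                  for b_id, b_rec in baseline.items()}
        if not scores:
            continue
        best = max(scores.values())
        winners = [b_id for b_id, s in scores.items() if s == best]
        if best >= min_matches and len(winners) == 1:
            links[f_id] = (winners[0], best)
    return links

# Invented example records; real key variables would come from the study.
keys = ["eye_color", "first_pet", "birth_city"]
baseline = {
    "B1": {"eye_color": "Brown", "first_pet": "Rex", "birth_city": "Albany"},
    "B2": {"eye_color": "blue", "first_pet": "Milo", "birth_city": "Buffalo"},
}
followup = {
    "F1": {"eye_color": "brown ", "first_pet": "rex", "birth_city": "Albny"},
    "F2": {"eye_color": "Blue", "first_pet": "Mylo", "birth_city": "buffalo"},
}

links = link(baseline, followup, keys, min_matches=2)

# Summary in the spirit of the reported data-quality check: the mean number
# of matched key variables among linked pairs.
matched_counts = [n for _, n in links.values()]
print(links, mean(matched_counts))
```

In this sketch, a lower count of matched key variables for a linked pair signals weaker evidence for that link, which is why the per-pair counts are retained alongside the link itself.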


Keywords: Security questions, Linking, Longitudinal studies, SEEK, Online studies



Support for the Dating Study data collection was provided by Grants 2014-VA-CX-0066 and 1R21HD077345. We thank Gabriella Damewood, Ashley Dills, Nicole Graziano, and Angela Marinakis for their assistance in data collection.

Funding Information

The third author received research grants from the National Institutes of Health (1R21HD077345) and the National Institute of Justice (2014-VA-CX-0066) to support this study.

Compliance with Ethical Standards

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

Conflicts of Interest

The first, second, and fourth authors declare that they have no conflict of interest.

Supplementary material

ESM 1 (DOCX 17 kb)



Copyright information

© Society for Prevention Research 2019

Authors and Affiliations

  • Shu Xu (1)
  • Anthea Chan (2)
  • Michael F. Lorber (3)
  • Justin P. Chase (3)

  1. Department of Biostatistics, New York University, New York, USA
  2. Columbia University, New York, USA
  3. Family Translational Research Group, New York University, New York, USA
