Using Security Questions to Link Participants in Longitudinal Data Collection
Anonymous data collection systems are often necessary when assessing sensitive behaviors, but they can pose challenges to researchers seeking to link participants over time. To assist researchers in anonymously linking participants, we outlined and tested a novel security question linking (SEEK) method. The SEEK method includes four steps: (1) data management and standardization, (2) many-to-many matching, (3) fuzzy matching, and (4) rematching and verification. The method is demonstrated in SAS with two samples from a longitudinal study of adolescent dating violence. After an initial assessment during a laboratory visit, participants were asked to complete an online assessment either (a) once, 3 months later (Sample 1, n = 60), or (b) three times at 1-month intervals (Sample 2, n = 140). Demographics, eye color, and responses to nine security questions were used as key variables to link responses from the laboratory and online follow-up assessments. The rate of matched cases was 100% in Sample 1 and ranged from 94.3% to 98.3% in Sample 2. To quantify confidence in the data quality of successfully matched pairs, we reported the means and standard deviations of the number of matched security questions. In addition, we reported the rank order and counts of the mismatched components in the key variables. Results indicate that the SEEK method provides a feasible and reliable solution for linking responses in longitudinal studies with sensitive questions.
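The four steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' SAS implementation: the function names, the sample records, the 0.8 similarity cutoff, and the minimum of two matching answers are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever fuzzy comparison the original SAS code uses. The uniqueness check in `link` is likewise only an assumed stand-in for the rematching and verification step, which the abstract does not describe in detail.

```python
from difflib import SequenceMatcher
from statistics import mean, stdev


def standardize(answer):
    """Step 1 (sketch): lowercase and keep only letters and digits."""
    return "".join(ch for ch in answer.lower() if ch.isalnum())


def answers_match(a, b, threshold=0.8):
    """Step 3 (sketch): fuzzy comparison of two standardized answers.

    The 0.8 cutoff is an assumption, not a value taken from the study.
    """
    a, b = standardize(a), standardize(b)
    if not a or not b:
        return False  # a blank answer never counts as a match
    return SequenceMatcher(None, a, b).ratio() >= threshold


def count_matches(rec_a, rec_b, keys):
    """Number of key variables on which two records agree (fuzzily)."""
    return sum(answers_match(rec_a.get(k, ""), rec_b.get(k, "")) for k in keys)


def link(baseline, followup, keys, min_matches):
    """Steps 2 and 4 (sketch): compare every follow-up record with every
    baseline record, then keep the best candidate only if it reaches
    min_matches and strictly beats the runner-up."""
    links = {}
    for f_id, f_rec in followup.items():
        scores = sorted(
            ((count_matches(b_rec, f_rec, keys), b_id)
             for b_id, b_rec in baseline.items()),
            reverse=True,
        )
        best_count, best_id = scores[0]
        runner_up = scores[1][0] if len(scores) > 1 else -1
        if best_count >= min_matches and best_count > runner_up:
            links[f_id] = (best_id, best_count)
    return links


# Hypothetical records: three security questions instead of the study's nine.
keys = ["pet", "street", "teacher"]
baseline = {
    "B1": {"pet": "Buddy", "street": "Maple Ave.", "teacher": "Mrs. Smith"},
    "B2": {"pet": "Luna", "street": "Oak Street", "teacher": "Mr. Jones"},
}
followup = {
    "F1": {"pet": "luna ", "street": "Oak Stret", "teacher": "Mr Jones"},
    "F2": {"pet": "buddy", "street": "Maple Ave", "teacher": "Ms. Lee"},
}

links = link(baseline, followup, keys, min_matches=2)

# Summary of match quality: mean and SD of matched questions per linked pair.
counts = [n for _, n in links.values()]
summary = (mean(counts), stdev(counts))
print(links, summary)
```

In this toy data, the typo in "Oak Stret" and the punctuation differences are absorbed by standardization and fuzzy comparison, while the changed teacher answer for F2 lowers that pair's count of matched questions without preventing the link. The per-pair counts are the kind of quantity the abstract summarizes with means and standard deviations.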
Keywords: Security questions; Linking; Longitudinal studies; SEEK; Online studies
Support for the Dating Study data collection was provided by Grants 2014-VA-CX-0066 and 1R21HD077345. We thank Gabriella Damewood, Ashley Dills, Nicole Graziano, and Angela Marinakis for their assistance in data collection.
The third author received research grants from the National Institutes of Health (1R21HD077345) and the National Institute of Justice (2014-VA-CX-0066) to support this study.
Compliance with Ethical Standards
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent was obtained from all individual participants included in the study.
Conflicts of Interest
The first, second, and fourth authors declare that they have no conflict of interest.