Abstract
This paper offers an easy-to-implement approach to protecting multivariate survey data common in marketing, such as attitudes and demographics. Our approach releases a protected data set that preserves the multivariate distributions of the original data while adding privacy protections. The data we use represent a highly detailed multivariate survey with severe privacy issues, which enables us to demonstrate the tradeoff between data utility and data privacy. We create a data privacy metric that quantifies the ability of a data intruder to successfully identify survey respondents and their sensitive responses. We provide data privacy measurements for a variety of competitor methods, such as sampling and random noise addition, and we show that, by comparison, our approach can prevent a data intruder from targeting individuals while maintaining a very high level of data utility.
Notes
Note that we are not singling Austin out for criticism per se; many similar databases are publicly accessible.
We have blanked out the actual ZIP code because a colleague mentioned being uncomfortable when reading the precision of identification of this 88-year-old woman. It is precisely this discomfort that data privacy efforts like ours are intended to minimize. However, to respect the fact that a reader could indeed go on to identify this woman, we have blinded that data in the illustration.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Schneider, M.J., Iacobucci, D. Protecting survey data on a consumer level. J Market Anal 8, 3–17 (2020). https://doi.org/10.1057/s41270-020-00068-6