Abstract
Governments face new and serious risks when striving to protect their citizens. Among the various information technology tools discussed in the political and legal sphere, data mining applications for the analysis of personal information have probably generated the greatest interest. Data mining has captured the imagination as a tool that can potentially close the ever-deepening intelligence gap between governments and their targets. In the US, data mining initiatives are popping up everywhere. The reaction to the data mining of personal information by governmental entities came to life in a flurry of reports, discussions, and academic papers. The general notion in these sources is one of fear and even awe. Striving to understand what lies behind this strong visceral response is difficult and complex. An important methodological step must be part of every one of the inquiries mentioned above: the adequate consideration of alternatives. This chapter is devoted to bringing this step to the attention of academics and policy makers.
Notes
- 1.
- 2.
For a paper discussing these initiatives in the Netherlands, see van der Veer et al. (2009).
- 3.
For a discussion of the building blocks of data mining, see Zarsky (2002–2003).
- 4.
Such success has been recently detailed in several popular books; see Baker (2008).
- 5.
This outcome is interesting, as stories related to privacy in general have generated limited interest unless they involve an actual catastrophe: personal data about a judge blocking his nomination, information regarding the address of an actress leading to her murder, and many other examples. Yet the data mining stories addressed here focus on potential harms, which have yet to materialize. This outcome tells an interesting story about the risks of data mining.
- 6.
In some instances, the services rendered are not essential, thus allowing for consumer choice, an option which requires rethinking many of the elements to be addressed below. Finally, the obligations and motivations of governmental entities are different from those of their commercial counterparts, thus altering the internal calculus leading to the final recommendations.
- 7.
Since the matters addressed here were drawn out elsewhere, the analysis is brief. For a more in-depth discussion, see DeRosa (2004) and Zarsky (2002–2003). See also Taipale (2003).
- 8.
For a discussion regarding the distinction between the two, see Cate (2008).
- 9.
For a discussion as to how these data mining techniques are carried out, see Zarsky (2002–2003).
- 10.
This is done both in advance and after the fact, by "weeding out" results she might consider random, wrong, or insignificant.
- 11.
I was told by data mining experts that this is usually the case with face and image recognition software.
- 12.
However, "building" a theoretical justification for a statistical correlation is usually easy and merely requires some imagination. Thus, one can easily question the extent of protection from arbitrary results that a call for "causation" provides.
- 13.
Transparency is an additional category which requires scrutiny and discussion, yet it calls for a very different form of analysis. For more on this issue, see Zarsky (2012).
- 14.
For an empirical study pointing in this direction, see Slobogin (2007).
- 15.
In the United States, such rights are governed by the Privacy Act, which calls for the publication of Systems of Records Notices (SORNs) to notify the public of such uses. For more on this, see the Privacy Act Overview of 2010, accessed July 12, 2011, http://www.justice.gov/opcl/1974indrigacc.htm.
- 16.
This is not the classic understanding of a "search," which does not extend to searches of data that were already collected. However, newer theories reexamining the "search" terminology question such wisdom. Slobogin, for instance, believes the term should be used in the same way the public understands it; according to his empirical studies, that includes data mining. Mark Blitz, focusing on DNA samples, is also examining whether searches within data or other sources the government obtained lawfully could nonetheless be considered a "search."
- 17.
- 18.
For a discussion of this argument in the data mining context, see Cate (2008), who notes it as perhaps the most powerful one in this context. Strandburg makes a similar argument, while pointing out that in some contexts data mining might impinge on First Amendment rights under the US Constitution, such as freedom of speech and association. For a general discussion of privacy and autonomy, see Solove (2001).
- 19.
For a discussion and critique of this distinction, see Slobogin (2007).
- 20.
- 21.
For a full discussion of this issue in EU law (as well as the law of the various member states), see the excellent discussion in Korff (2011).
- 22.
For a discussion of errors in general and of this context in particular, see Ramasastry (2004).
- 23.
US "due process" doctrine does not apply for various reasons. In some contexts, EU law provides for a right to understand the processes' internal workings. For a discussion of this issue, see Steinbock (2005).
- 24.
It would mean, for instance, that all individuals would be required to arrive at the airport 30 minutes earlier to go through heightened security checks.
- 25.
For instance, discrimination on the basis of "sensitive information" such as race is illegal, even when such discrimination is statistically justified. For a partial critique of this outcome, see Schauer (2006).
- 26.
For instance, one might argue that encumbering the ability of all individuals to travel when striving to provide for security might limit their freedom of movement. I will refrain from developing this notion. For more on this point, see Slobogin (2007, 102).
- 27.
An option promoted by Harcourt (2007).
- 28.
When the chance of selection is very low, such enforcement loses its teeth, as the penalties inflicted cannot be disproportionate to the specific transgression. Similar dynamics occur in the music and film industries when they strive to enforce their rights online.
- 29.
Clearly, just selecting every tenth person or a similar strategy will allow easy gaming of the system by interested parties (all they have to do is travel in pairs and one of them will surely be beyond suspicion!).
- 30.
I thank Kathy Strandburg for making this point.
- 31.
This discussion intentionally avoids instances in which the actions resulting from the higher level of scrutiny constitute searches, or other actions that directly impinge upon the liberty of the subjects. I am doing so to sidestep the broader discussion about Terry stops and other such actions, where "reasonable cause" or other levels of scrutiny are mandated. For a mapping of these contexts, see Slobogin (2007, 23).
- 32.
For instance, if the officer focuses on someone with a gun, it is because he created a mental profile with the category "people with guns," and is focusing his attention on those within that category.
- 33.
As Schauer explains, such practices are widespread, and applied by customs as well as by the IRS; see Schauer (2006).
- 34.
For empirical findings showing this point, see Slobogin (2007, 195).
- 35.
This was the case in the Netflix/IMDb fiasco. Such multi-factored datasets are now at the disposal of many public and private entities.
- 36.
This option still holds substantial benefits, as it minimizes the risk of illegal abuse of the information by government officials (consider the many stories, occurring every year, of tax officials sharing or selling personal information about citizens). Note, however, that this problem could also be mitigated through disciplinary actions.
- 37.
If one form of discretion generates frequent errors, the entire process is compromised. However, let us assume that a reasonable threshold of errors would be attended to as a preliminary matter, and that if the level of errors were unacceptably high, the project would be set aside. Yet as I demonstrated in the text, even with an overall acceptable level of errors, problems can still prevail.
- 38.
This was exactly, according to Schauer, the case at O'Hare Airport, where it was revealed that the percentage of minorities made subject to intrusive cavity searches was very high. When such practices, which were no doubt motivated by racial animosity, were stopped, the success rate of such searches increased. See Schauer (2006).
- 39.
The authors explain that part of the role of the Fourth Amendment is to limit the discretion of law enforcement; see Harcourt and Meares (2010).
- 40.
I acknowledge that even when using a central system, some level of examination of the actions of the peripheral operators is needed as well. Yet this would be substantially less than the level required in the third alternative model.
- 41.
For a discussion of this matter in the Corporate Risk Management setting.
References
Ayres, Ian. 2007. Super crunchers. New York: Bantam Dell.
Baker, Stephen. 2008. The numerati. New York: HMH.
Bamberger, Kenneth A. 2010. Technologies of compliance: Risk and regulation in a digital age. Texas Law Review 88 (4): 669–739.
Blitz, Mark. 2011. Warranting a closer look: When should the government need probable cause to analyze information it has already acquired? PLSC 2011 Workshop. Draft, on file with author.
Cate, Fred H. 2008. Data mining: The need for a legal framework. Harvard Civil Rights-Civil Liberties Law Review 43 (2): 435–489.
DeRosa, Mary. 2004. Data mining and data analysis for counterterrorism. Center for Strategic and International Studies (CSIS) report, 14. http://csis.org/files/media/csis/pubs/040301_data_mining_report.pdf. Accessed 12 July 2011.
Harcourt, Bernard E. 2007. Against prediction. Chicago: University of Chicago Press.
Harcourt, Bernard E., and Tracey L. Meares. 2010. Randomization and the Fourth Amendment. University of Chicago Law & Economics, Olin Working Paper No. 530: 3–76.
IBM. 2010. Memphis Police Department reduces crime rates with IBM predictive analytics software. http://www-03.ibm.com/press/us/en/pressrelease/32169.wss. Accessed 12 July 2011.
Jonas, Jeff, and Jim Harper. 2006. Effective counterterrorism and the limited role of predictive data mining. Cato Institute, Policy Analysis 584: 1–12. www.thebreakingnews.com/files/articles/datamining-cato-report.pdf. Accessed 12 July 2011.
Korff, Douwe. 2011. Data protection laws in the EU: The difficulties in meeting the challenges posed by global social and technical developments. Working Paper No. 2, European Commission Directorate-General Justice, Freedom and Security (January 20, 2010), final [extended and re-edited] version. http://ec.europa.eu/justice/policies/privacy/docs/studies/new_privacy_challenges/final_report_working_paper_2_en.pdf. Accessed 12 July 2011.
Korobkin, Russell. 2003. Bounded rationality, standard form contracts, and unconscionability. University of Chicago Law Review 70: 1203–1295.
Markle Foundation. 2003. Creating a trusted network for homeland security (December 1, 2003). http://www.markle.org/publications/666-creating-trusted-network-homeland-security. Accessed 12 July 2011.
Nissenbaum, Helen. 2009. Privacy in context. Stanford: Stanford University Press.
Ohm, Paul. 2010. Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review 57: 1701–1777.
Ramasastry, Anita. 2004. Lost in translation? Data mining, national security and the adverse inference problem. Santa Clara Computer & High Technology Law Journal 22: 757–796.
Schauer, Frederick. 2006. Profiles, probabilities and stereotyping. Cambridge, MA: Harvard University Press.
Schneier, Bruce. 2006. Why data mining won't stop terror. Wired (March 9, 2006). http://www.wired.com/politics/security/commentary/securitymatters/2006/03/70357. Accessed 12 July 2011.
Schwartz, Paul M. 2008. Reviving telecommunications surveillance law. University of Chicago Law Review 75: 310–311.
Scism, Leslie, and Mark Maremont. 2011. Insurers test data profiles to identify risky clients. The Wall Street Journal. http://online.wsj.com/article/SB10001424052748704648604575620750998072986.html?mod=WSJ_hp_LEADNewsCollection. Accessed 12 July 2011.
Slobogin, Christopher. 2007. Privacy at risk: The new government surveillance and the Fourth Amendment. Chicago: The University of Chicago Press.
Slobogin, Christopher. 2008. Government data mining and the Fourth Amendment. The University of Chicago Law Review 75: 317–341.
Slobogin, Christopher. 2010. Is the Fourth Amendment relevant in a technological age? Governance Studies at Brookings (December 8, 2010). http://www.brookings.edu/~/media/Files/rc/papers/2010/1208_4th_amendment_slobogin/1208_4th_amendment_slobogin.pdf. Accessed 12 July 2011.
Solove, Daniel J. 2001. Privacy and power: Computer databases and metaphors for information privacy. Stanford Law Review 53: 1393–1462.
Solove, Daniel J. 2008. Data mining and the security-liberty debate. University of Chicago Law Review 74: 343–362.
Solove, Daniel J., and Paul M. Schwartz. 2006. Information privacy law. New York: Aspen.
Steinbock, Daniel J. 2005. Data matching, data mining, and due process. Georgia Law Review 40: 1–86.
Strandburg, Katherine J. 2008. Freedom of association in a networked world: First Amendment regulation of relational surveillance. Boston College Law Review 49: 741–822.
Taipale, Kim A. 2003. Data mining and domestic security: Connecting the dots to make sense of data. Columbia Science and Technology Law Review 5 (2): 1–83.
TAPAC. 2004. The report of the Technology and Privacy Advisory Committee, safeguarding privacy in the fight against terrorism. http://epic.org/privacy/profiling/tia/tapac_report.pdf (hereinafter TAPAC Report). Accessed 12 July 2011.
Tor, Avishalom. 2008. The methodology of the behavioral analysis of law. Haifa Law Review 4: 237–327.
U.S. General Accounting Office. 2004. Data mining: Federal efforts over a wide range of uses. Report to the ranking minority member, subcommittee on financial management, the budget, and international security, committee on governmental affairs, U.S. Senate, GAO-04-548. Washington: 9–54. http://www.gao.gov/new.items/d04548.pdf. Accessed 12 July 2011.
van der Veer, R.C.P., H.T. Roos, and A. van der Zanden. 2009. Data mining for intelligence led policing. Paper presented at the proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France (June 28–July 1, 2009). http://www.sentient.nl/docs/data_mining_for_intelligence_led_policing.pdf. Accessed 12 July 2011.
Westin, Alan. 1967. Privacy and freedom. New York: Atheneum.
Zarsky, Tal Z. 2002–2003. Mine your own business!: Making the case for the implications of the data mining of personal information in the forum of public opinion. Yale Journal of Law & Technology 5: 1–56.
Zarsky, Tal Z. 2012. Transparency in data mining: From theory to practice. In Discrimination and privacy in the information society. Springer (forthcoming).
Acknowledgment
This chapter is part of an NWO-funded research project, "Data Mining without Discrimination." I thank Kathy Strandburg, Richard Stewart, the participants of the NYU Law School Hauser Research Forum, the NYU Privacy Reading Group, and the DePaul Law School CIPLIT presentation for their comments. I also thank Talya Ponchek for her comments and research assistance. For an extended version of the ideas presented here, see Zarsky, Tal Z. 2012. Data mining and its alternatives. Penn State Law Review 116 (2): 101.
© 2012 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Zarsky, T.Z. (2012). The Data Mining Balancing Act. In: Gutwirth, S., Leenes, R., De Hert, P., Poullet, Y. (eds) European Data Protection: In Good Health?. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2903-2_5
DOI: https://doi.org/10.1007/978-94-007-2903-2_5
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-2902-5
Online ISBN: 978-94-007-2903-2
eBook Packages: Humanities, Social Sciences and Law; Law and Criminology (R0)