The Data Mining Balancing Act

Chapter in European Data Protection: In Good Health?

Abstract

Governments face new and serious risks when striving to protect their citizens. Among the various information technology tools discussed in the political and legal spheres, data mining applications for the analysis of personal information have probably generated the greatest interest. Data mining has captured the imagination as a tool which can potentially close the constantly deepening intelligence gap between governments and their targets. In the US, data mining initiatives are popping up everywhere. The reaction to the data mining of personal information by governmental entities has come to life in a flurry of reports, discussions, and academic papers. The general notion in these sources is one of fear and even awe. Striving to understand what lies behind this strong visceral response is difficult and complex. An important methodological step must be part of every one of the inquiries mentioned above: the adequate consideration of alternatives. This chapter is devoted to bringing this step to the attention of academics and policy makers.


Notes

  1. For a countering view, see Jonas and Harper (2006). See also the commentary in Schneier (2006).

  2. For a paper discussing these initiatives in the Netherlands, see van der Veer et al. (2009).

  3. For a discussion of the building blocks of data mining, see Zarsky (2002–2003).

  4. Such success has recently been detailed in several popular books; see Baker (2008).

  5. This outcome is interesting, as stories related to privacy in general have generated limited interest unless they involve an actual catastrophe: personal data about a judge blocks his nomination, information regarding the address of an actress leads to her murder, and many other examples. Yet the data mining stories addressed here focus on potential harms which have yet to materialize. This outcome tells an interesting story about the risks of data mining.

  6. In some instances, the services rendered are not essential, thus allowing for consumer choice, an option which requires rethinking many of the elements to be addressed below. Finally, the obligations and motivations of governmental entities are different from those of their commercial counterparts, thus altering the internal calculus leading to the final recommendations.

  7. Since the matters addressed here were drawn out elsewhere, the analysis is brief. For a more in-depth discussion, see DeRosa (2004) and Zarsky (2002–2003). See also Taipale (2003).

  8. For a discussion of the distinction between the two, see Cate (2008).

  9. For a discussion of how these data mining techniques are carried out, see Zarsky (2002–2003).

  10. This is done both in advance and after the fact, by "weeding out" results she might consider random, wrong, or insignificant.

  11. I was told by data mining experts that this is usually the case with face and image recognition software.

  12. However, "building" a theoretical justification for a statistical correlation is usually easy and merely requires some imagination. Thus, one can easily question the extent of protection from arbitrary results that a call for "causation" provides.

  13. Transparency is an additional category which requires scrutiny and discussion, yet it calls for a very different form of analysis. For more on this issue, see Zarsky (2012).

  14. For an empirical study pointing in this direction, see Slobogin (2007).

  15. In the United States, such rights are governed by the Privacy Act, which calls for the publication of SORNs (System of Records Notices) to notify the public of such uses. For more on this, see the Privacy Act Overview of 2010, accessed July 12, 2011, http://www.justice.gov/opcl/1974indrigacc.htm.

  16. This is not the classic understanding of a "search," which does not pertain to searches of data which were already collected. However, newer theories reexamining the "search" terminology question such wisdom. Slobogin, for instance, believes the term should be used in the same way the public understands it; according to his empirical studies, that includes data mining. Mark Blitz is likewise examining whether searches within data or other sources the government obtained lawfully could nonetheless be considered a "search," while focusing on DNA samples.

  17. The notion of "privacy as control" was set forth by Alan Westin and implemented in various elements of both the OECD Principles and the EU Data Protection Directives. See generally Westin (1967); on the EU Data Protection Directives in general, see Solove and Schwartz (2006).

  18. For a discussion of this argument in the data mining context, see Cate (2008), who notes it as perhaps the most powerful one in this context. Strandburg (2008) makes a similar argument, while pointing out that in some contexts data mining might impinge upon US constitutional First Amendment rights, such as freedom of speech and association. For a general discussion of privacy and autonomy, see Solove (2001).

  19. For a discussion and critique of this distinction, see Slobogin (2007).

  20. I intentionally emphasize the lack of laws in the governmental realm. In the commercial realm there is some reference to this issue in the Fair Credit Reporting Act. For a critique of this situation and a call for change, see Harcourt (2007). For a very different perspective, see Schauer (2006).

  21. For a full discussion of this issue in EU law (as well as the law of the various member states), see the excellent discussion in Korff (2011).

  22. For a discussion of errors in general, and in this context in particular, see Ramasastry (2004).

  23. US "due process" doctrine does not apply for various reasons. In some contexts, EU law provides for a right to understand the process's internal workings. For a discussion of this issue, see Steinbock (2005).

  24. It would mean, for instance, that all individuals would be required to arrive 30 minutes earlier at the airport to go through heightened security checks.

  25. For instance, discrimination on the basis of "sensitive information" such as race is illegal, even when such discrimination is statistically justified. For a partial critique of this outcome, see Schauer (2006).

  26. For instance, one might argue that encumbering the ability of all individuals to travel when striving to provide for security might limit their freedom of movement. I will refrain from developing this notion. For more on this point, see Slobogin (2007, 102).

  27. An option promoted by Harcourt (2007).

  28. When the chance of selection is very low, such enforcement loses its teeth, as the penalties inflicted cannot be disproportionate to the specific transgression. See the similar dynamics occurring in the music and film industries when striving to enforce their rights online.

  29. Clearly, just selecting every tenth person, or a similar strategy, will allow easy gaming of the system by interested parties (all they have to do is travel in pairs, and one of them will surely be beyond suspicion!).

  30. I thank Kathy Strandburg for making this point.

  31. The discussion intentionally avoids instances in which the actions resulting from the higher level of scrutiny constitute searches, or other actions which directly impede upon the liberty of the subjects. I do so to sidestep the broader discussion about Terry stops and other such actions, where "reasonable cause" or other levels of scrutiny are mandated. For a mapping of these contexts, see Slobogin (2007, 23).

  32. For instance, if the officer focuses on someone with a gun, it is because he has created a mental profile with the category "people with guns," and is focusing his attention on those within that category.

  33. As Schauer explains, such practices are widespread, and applied by customs as well as by the IRS; see Schauer (2006).

  34. For empirical findings demonstrating this point, see Slobogin (2007, 195).

  35. This was the case in the Netflix/IMDb fiasco. Such multi-factored datasets are now at the disposal of many public and private entities.

  36. This option still holds substantial benefits, as it minimizes the risk of illegal abuse of the information by government officials (such as the many stories occurring every year of tax officials sharing or selling personal information about citizens). Note, however, that this problem could also be mitigated through disciplinary actions.

  37. If one form of discretion generates frequent errors, the entire process is compromised. However, let us assume that a reasonable threshold of errors would be attended to as a preliminary matter, and that if the level of errors were unacceptably high, the project would be set aside. Yet as I demonstrated in the text, even with an overall acceptable level of errors, problems can still prevail.

  38. This was exactly, according to Schauer, the case at O'Hare airport, where it was revealed that the percentage of minorities made subject to intrusive cavity searches was very high. When such practices, which were no doubt motivated by racial animosity, were stopped, the success rate of such searches increased. See Schauer (2006).

  39. The authors explain that part of the role of the Fourth Amendment is to limit the discretion of law enforcement; see Harcourt and Meares (2010).

  40. I acknowledge that even when using a central system, some level of examination of the actions of the periphery operation is needed as well. Yet this would be substantially less than the level required in the third alternative model.

  41. For a discussion of this matter in the Corporate Risk Management setting.
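The gaming concern raised in note 29 can be made concrete with a minimal sketch. The function names, selection rate, and queue model below are illustrative assumptions, not anything specified in the chapter: a deterministic "every tenth traveler" rule is predictable from queue position and therefore gameable, whereas selecting each traveler independently at random leaves no position safe.

```python
import random

def every_kth(n, k=10):
    """Deterministic rule: flag travelers at queue positions 0, k, 2k, ...
    Because the rule is predictable, a pair occupying adjacent positions
    can never both be flagged; one of them is always beyond suspicion."""
    return [i % k == 0 for i in range(n)]

def independent_random(n, p=0.1, seed=0):
    """Randomized rule: flag each traveler independently with probability p.
    Queue position carries no information, so pairing up does not help."""
    rng = random.Random(seed)
    return [rng.random() < p for _ in range(n)]

flags = every_kth(100)
# Under the deterministic rule, no two adjacent travelers are ever both flagged.
adjacent_both = any(a and b for a, b in zip(flags, flags[1:]))
print(adjacent_both)  # False: traveling in pairs defeats the deterministic rule
```

Under the randomized rule, by contrast, two adjacent travelers are both flagged with probability p², so no strategy of positioning guarantees escape, which is the point of the randomization proposals discussed in the text.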

References

  • Ayres, Ian. 2007. Super crunchers. New York: Bantam Dell.

  • Baker, Stephen. 2008. The numerati. New York: Houghton Mifflin Harcourt.

  • Bamberger, Kenneth A. 2010. Technologies of compliance: Risk and regulation in a digital age. Texas Law Review 88 (4): 669–739.

  • Blitz, Mark. 2011. Warranting a closer look: When should the government need probable cause to analyze information it has already acquired? PLSC 2011 Workshop. Draft, on file with author.

  • Cate, Fred H. 2008. Data mining: The need for a legal framework. Harvard Civil Rights-Civil Liberties Law Review 43 (2): 435–489.

  • DeRosa, Mary. 2004. Data mining and data analysis for counterterrorism. Center for Strategic and International Studies (CSIS) report, 14. http://csis.org/files/media/csis/pubs/040301_data_mining_report.pdf. Accessed 12 July 2011.

  • Harcourt, Bernard E. 2007. Against prediction. Chicago: University of Chicago Press.

  • Harcourt, Bernard E., and Tracey L. Meares. 2010. Randomization and the Fourth Amendment. University of Chicago Law & Economics, Olin Working Paper No. 530: 3–76.

  • IBM. 2010. Memphis police department reduces crime rates with IBM predictive analytics software. http://www-03.ibm.com/press/us/en/pressrelease/32169.wss. Accessed 12 July 2011.

  • Jonas, Jeff, and Jim Harper. 2006. Effective counterterrorism and the limited role of predictive data mining. Cato Institute, Policy Analysis 584: 1–12. www.thebreakingnews.com/files/articles/datamining-cato-report.pdf. Accessed 12 July 2011.

  • Korff, Douwe. 2011. Data protection laws in the EU: The difficulties in meeting the challenges posed by global social and technical developments. Working Paper No. 2, European Commission Directorate-General Justice, Freedom and Security (January 20, 2010), final [extended and re-edited] version. http://ec.europa.eu/justice/policies/privacy/docs/studies/new_privacy_challenges/final_report_working_paper_2_en.pdf. Accessed 12 July 2011.

  • Korobkin, Russell. 2003. Bounded rationality, standard form contracts, and unconscionability. University of Chicago Law Review 70: 1203–1295.

  • Markle Foundation. 2003. Creating a trusted network for homeland security (December 1, 2003). http://www.markle.org/publications/666-creating-trusted-network-homeland-security. Accessed 12 July 2011.

  • Nissenbaum, Helen. 2009. Privacy in context. Stanford: Stanford University Press.

  • Ohm, Paul. 2010. Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review 57: 1701–1777.

  • Ramasastry, Anita. 2004. Lost in translation? Data mining, national security and the adverse inference problem. Santa Clara Computer & High Technology Law Journal 22: 757–796.

  • Schauer, Frederick. 2006. Profiles, probabilities and stereotyping. Cambridge: Harvard University Press.

  • Schneier, Bruce. 2006. Why data mining won't stop terror. Wired (March 9, 2006). http://www.wired.com/politics/security/commentary/securitymatters/2006/03/70357. Accessed 12 July 2011.

  • Schwartz, Paul M. 2008. Reviving telecommunications surveillance law. University of Chicago Law Review 75: 310–311.

  • Scism, Leslie, and Mark Maremont. 2011. Insurers test data profiles to identify risky clients. The Wall Street Journal. http://online.wsj.com/article/SB10001424052748704648604575620750998072986.html?mod=WSJ_hp_LEADNewsCollection. Accessed 12 July 2011.

  • Slobogin, Christopher. 2007. Privacy at risk: The new government surveillance and the Fourth Amendment. Chicago: University of Chicago Press.

  • Slobogin, Christopher. 2008. Government data mining and the Fourth Amendment. University of Chicago Law Review 75: 317–341.

  • Slobogin, Christopher. 2010. Is the Fourth Amendment relevant in a technological age? Governance Studies at Brookings (December 8, 2010). http://www.brookings.edu/~/media/Files/rc/papers/2010/1208_4th_amendment_slobogin/1208_4th_amendment_slobogin.pdf. Accessed 12 July 2011.

  • Solove, Daniel J. 2001. Privacy and power: Computer databases and metaphors for information privacy. Stanford Law Review 53: 1393–1462.

  • Solove, Daniel J. 2008. Data mining and the security-liberty debate. University of Chicago Law Review 75: 343–362.

  • Solove, Daniel J., and Paul M. Schwartz. 2006. Information privacy law. New York: Aspen.

  • Steinbock, Daniel J. 2005. Data matching, data mining, and due process. Georgia Law Review 40: 1–86.

  • Strandburg, Katherine J. 2008. Freedom of association in a networked world: First Amendment regulation of relational surveillance. Boston College Law Review 49: 741–822.

  • Taipale, Kim A. 2003. Data mining and domestic security: Connecting the dots to make sense of data. Columbia Science and Technology Law Review 5 (2): 1–83.

  • TAPAC. 2004. The report of the Technology and Privacy Advisory Committee: Safeguarding privacy in the fight against terrorism (hereinafter TAPAC Report). http://epic.org/privacy/profiling/tia/tapac_report.pdf. Accessed 12 July 2011.

  • Tor, Avishalom. 2008. The methodology of the behavioral analysis of law. Haifa Law Review 4: 237–327.

  • U.S. General Accounting Office. 2004. Data mining: Federal efforts cover a wide range of uses. Report to the ranking minority member, Subcommittee on Financial Management, the Budget, and International Security, Committee on Governmental Affairs, U.S. Senate, GAO-04-548. Washington: 9–54. http://www.gao.gov/new.items/d04548.pdf. Accessed 12 July 2011.

  • van der Veer, R.C.P., H.T. Roos, and A. van der Zanden. 2009. Data mining for intelligence led policing. Paper presented at the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France (June 28–July 1, 2009). http://www.sentient.nl/docs/data_mining_for_intelligence_led_policing.pdf. Accessed 12 July 2011.

  • Westin, Alan. 1967. Privacy and freedom. New York: Atheneum.

  • Zarsky, Tal Z. 2002–2003. "Mine your own business!": Making the case for the implications of the data mining of personal information in the forum of public opinion. Yale Journal of Law & Technology 5: 1–56.

  • Zarsky, Tal Z. 2012. Transparency in data mining: From theory to practice. In Discrimination and privacy in the information society. Dordrecht: Springer (forthcoming).


Acknowledgment

This chapter is part of an NWO-funded research project, "Data Mining without Discrimination." I thank Kathy Strandburg, Richard Stewart, the participants of the NYU Law School Hauser Research Forum, the NYU Privacy Reading Group, and the DePaul Law School CIPLIT presentation for their comments. I also thank Talya Ponchek for her comments and research assistance. For an extended version of the ideas presented here, see Zarsky, Tal Z. 2012. Data mining and its alternatives. Penn State Law Review 116 (2): 101.

Correspondence to Tal Z. Zarsky.

Copyright information

© 2012 Springer Science+Business Media B.V.

Cite this chapter

Zarsky, T.Z. (2012). The Data Mining Balancing Act. In: Gutwirth, S., Leenes, R., De Hert, P., Poullet, Y. (eds.) European Data Protection: In Good Health?. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2903-2_5