Abstract
Data mining is not an invasion of privacy because access to data is only by machines, not by people: this is the argument that is investigated here. The current importance of this problem is developed in a case study of data mining in the USA for counterterrorism and other surveillance purposes. After a clarification of the relevant nature of privacy, it is argued that access by machines cannot warrant the access to further information, since the analysis will have to be made either by humans or by machines that understand. The paper concludes that current data mining violates the right to privacy and should be subject to the standard legal constraints for access to private information by people.
Notes
“The law requires the NSA to not deliberately collect data on US citizens or on persons in the United States without a warrant based on foreign intelligence requirements.” (9/11 Commission 2004, 87) This avoidance of domestic data was considered a significant factor for the failure to prevent the 11 September 2001 attacks (ibid.).
Edwin Meese on MSNBC, in Chris Matthews’ program “Hardball”, 12 January 2006, 19:45; Fried in his article in the Boston Globe (Fried 2005). Fried was solicitor general in the second Reagan administration. Taipale argues that data mining is “…different than claiming that ‘everybody is being investigated’ through pattern-matching. In reality only the electronic footprints of transactions and activities are being scrutinized” (Taipale 2003, 66).
This is a common form of vehicle theft prevention worldwide. Germany now requires all trucks to carry GPS and logs their travel to collect motorway tolls. In 2005, Great Britain seriously considered monitoring the movement of all vehicles via GPS as a means to collect road tolls for the entire country, replacing road tax. Access to the center of London is already controlled by a system that logs license plates of all vehicles entering and leaving the designated zone. GPS has been used by US car rental companies to issue speeding tickets to their customers, a practice that has been legally challenged (O’Harrow 2005, 292).
Now spreading very widely: used by all suppliers of Wal-Mart, i.e. for practically all consumer products in the USA; required by law in the EU to be implanted in all domestic cats and dogs.
It is estimated, as an extreme case, that London has 2.5 million video cameras and that the average Londoner is filmed 300 times per day (US Congress 2002, 2). The whole USA had about 2 million cameras in 2002 (Bailey 2004, 75; Keenan 2005, 57). Some London boroughs connect their video data to face-recognition software (O’Harrow 2005, 165f).
The sequence of units, each 1024 times the previous one, is megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte, etc. The original report explains: “If digitized with full formatting, the seventeen million books in the Library of Congress contain about 136 terabytes of information” (Lyman and Varian 2003). Such numbers must be taken with great caution, particularly since the notion of “amount of data” is utterly meaningless if data is not taken to be digital (how much data is there on your desk?). Even digital data can be compressed and can come in many formats. Some sources say that “intelligence data sources grow at the rate of four petabytes a month” (O’Harrow 2005, 212).
The same provisions, excluding military operations and intelligence activities overseas or against non-US citizens, are in the “Department of Homeland Securities Appropriations Act”, sect. 8131 (b), as quoted in Taipale (2003, 10, n. 28).
For more information about the NSA, see National security archive (2005).
About the failure to identify 11 September: “To put this into perspective, throughout the summer of 2001 we had more than 30 warnings that something was imminent. We dutifully reported these, yet none of these subsequently correlated with terrorist attacks. The concept of ‘imminent’ to our adversaries is relative; it can mean soon or simply sometime in the future.” (Hayden 2002, 4) Hayden also stresses the difficulty of identifying and processing several languages and the crucial factor of processing in time.
Most references to TIA have been removed from the DARPA sites, but Director Poindexter’s outlook can be gathered from his slides (Poindexter 2002).
References
9/11 Commission (2004) The final report of the National Commission on terrorist attacks upon the United States. Norton, New York
Alexander RG (2006) Carnivore personal edition: exploring distributed data surveillance. AI Soc 20(4):483–492
Armour T (2002) Genoa II and total information awareness. 2002 DARPATECH symposium “Transforming fantasy” http://www.darpa.mil/darpatech2002/presentations/iao_pdf/slides/armouriao.pdf. Accessed 1 April 2007
Bailey D (2004) The open society paradox: why the twenty-first century calls for more openness, not less. Brasseys, Washington
Crane T (2003) The mechanical mind: a philosophical introduction to minds, machines and mental representation. 2nd edn. Routledge, London
DARPA (2003) fact file: a compendium of DARPA programs. Defense Advanced Research Projects Agency. http://www.darpa.mil/body/news/2003/final2003factfilerev1.pdf. Accessed 1 April 2007
DeRosa M (2004) Data mining and data analysis for counterterrorism. CSIS, Washington
Etzioni A (1999) The limits of privacy. Basic Books, New York
Fried C (2005) The case for surveillance. The Boston Globe. http://www.boston.com/news/globe/editorial_opinion/oped/articles/2005/12/30/the_case_for_surveillance?mode=PF. Accessed 1 April 2007
Hayden MV (2002) Statement for the record by Lieutenant General Michael V. Hayden, USAF, Director, National Security Agency/Chief, Central Security Service, before the joint inquiry of the Senate Select Committee on Intelligence and the House Permanent Select Committee on Intelligence, 17 October 2002. http://www.gwu.edu/~nsarchiv/NSAEBB/NSAEBB24/nsa27.pdf. Accessed 1 April 2007
HEW (1973) Records, computers, and the rights of citizens. US Department of Health, Education and Welfare, report of the secretary’s advisory committee on automated personal data systems. http://www.epic.org/privacy/hew1973report/. Accessed 1 April 2007
IAO (2003) Report to congress regarding the Terrorism Information Awareness Program. Information Awareness Office (Department of Defense). http://wyden.senate.gov/leg_issues/reports/darpa_tia_summary.pdf. Accessed 1 April 2007
IEEE (2003) Proceedings of the Institute of Electrical and Electronics Engineers (IEEE) conference on advanced video and signal based surveillance. AVSS 2003, 21–22 July 2003. http://ieeexplore.ieee.org. Accessed 1 April 2007
Kargupta H, Joshi A, Sivakumar K, Yesha Y (Eds) (2004) Data mining: next generation challenges and future directions. MIT, Boston
Keenan K (2005) Invasion of privacy: a reference handbook. ABC-CLIO, Santa Barbara
Lyman P, Varian HR (2003) How much information? http://www.sims.berkeley.edu/how-much-info-2003. Accessed 1 April 2007
Markle Foundation (2002) Protecting America’s freedom in the information age: a report of the Markle Foundation task force. http://www.markletaskforce.org. Accessed 1 April 2007
Markle Foundation (2003) Creating a trusted network for homeland security: a report of the Markle Foundation task force. http://www.markletaskforce.org. Accessed 1 April 2007
National security archive (2005). http://www.gwu.edu/~nsarchiv/NSAEBB/NSAEBB24/index.htm. Accessed 1 April 2007
O’Harrow R (2005) No place to hide: behind the scenes of our emerging surveillance society. Simon & Schuster, New York
Pincus W, Eggen D (2006, A01) 325,000 Names on terrorism list: rights groups say database may include innocent people. The Washington Post. http://www.washingtonpost.com/wp-dyn/content/article/2006/02/14/AR2006021402125.html. Accessed 1 April 2007
Poindexter J (2002) Information Awareness Office overview. Introductory statement to the 2002 DARPATECH symposium. http://www.darpa.mil/darpatech2002/presentations/iao_pdf/speeches/poindext.pdf. Accessed 1 April 2007
Preston J, Bishop M (Eds.) (2002) Views into the Chinese room: new essays on Searle and artificial intelligence. Oxford University Press, Oxford
Rosenberg RA (ed) (2004) The social impact of computers (3rd edn.). Elsevier, San Diego
Searle JR (1980) Minds, brains and programs. Behav Brain Sci 3:417–457
Taipale KA (2003) Data mining and domestic security: connecting the dots to make sense of data. Columbia Sci Technol Law Rev V 2003–2004:1–83
TAPAC (2004) Safeguarding privacy in the fight against terrorism. Report of the Technology and Privacy Advisory Committee to the Department of Defense, 1 March 2004. http://purl.access.gpo.gov/GPO/LPS52114. Accessed 1 April 2007
US Congress (2002) Privacy vs. security: electronic surveillance in the nation’s capital. Hearing before the House of Representatives, Subcommittee on the District of Columbia, Committee on Government Reform, Washington DC, 22 March 2002. http://purl.access.gpo.gov/GPO/LPS29636. Accessed 1 April 2007
Vaidya J, Clifton CW, Zhu YM (2005) Privacy preserving data mining. Springer, New York
Acknowledgments
The writing of this paper was carried out mainly during a Stanley J. Seeger Fellowship in Research at Princeton University. I am very grateful for this excellent opportunity. A first version of the paper, entitled “If You Had Nothing to Hide, Would You Still Mind Being Watched by Machines?” was presented at the workshop “Privacy: intercultural perspectives” at ZiF, Bielefeld University, in February 2006. I thank Karsten Weber for the invitation and all participants for the very stimulating discussions at that pleasant meeting. I also thank Gordana Dodig-Crnkovic for the very useful written comments.
Cite this article
Müller, V.C. Would you mind being watched by machines? Privacy concerns in data mining. AI & Soc 23, 529–544 (2009). https://doi.org/10.1007/s00146-007-0177-3
DOI: https://doi.org/10.1007/s00146-007-0177-3