Big Data and Information Quality

Part of the book series: Synthese Library (SYLI, volume 358)

Abstract

This paper is divided into two parts. In the first, I shall briefly analyse the phenomenon of “big data”, and argue that the real epistemological challenge posed by the zettabyte era is small patterns. The valuable undercurrents in the ocean of data that we are accumulating are invisible to the computationally-naked eye, so more and better technology will help. However, because the problem with big data is small patterns, ultimately, the game will be won by those who “know how to ask and answer questions” (Plato, Cratylus, 390c). This introduces the second part, concerning information quality (IQ): which data may be useful and relevant, and so worth collecting, curating, and querying, in order to exploit their valuable (small) patterns? I shall argue that the standard way of seeing IQ in terms of fit-for-purpose is correct but needs to be complemented by a methodology of abstraction, which allows IQ to be indexed to different purposes. This fundamental step can be taken by adopting a bi-categorical approach. This means distinguishing between purpose/s for which some information is produced (P-purpose) and purpose/s for which the same information is consumed (C-purpose). Such a bi-categorical approach in turn allows one to analyse a variety of so-called IQ dimensions, such as accuracy, completeness, consistency, and timeliness. I shall show that the bi-categorical approach lends itself to simple visualisations in terms of radar charts.

Notes

  1. Fracking (hydraulic fracturing) is a technique in which a liquid (usually water), mixed with sand and chemicals, is injected underground at high pressure in order to cause small fractures (typically less than 1 mm), along which fluids such as gas (especially shale gas), petroleum and brine water can surface.

  2. The body of literature on IQ is growing; see, for example, Olson (2003), Wang et al. (2005), Batini and Scannapieco (2006), Lee et al. (2006), Al-Hakim (2007), Herzog et al. (2007), Maydanchik (2007), McGilvray (2008), and Theys (2011).

  3. http://www.whitehouse.gov/omb/fedreg_reproducible

  4. See, more recently, United States. Congress. House. Committee on Government Reform. Subcommittee on Regulatory Affairs (2006).

  5. http://webarchive.nationalarchives.gov.uk/20090811143745/http://www.bristol-inquiry.org.uk

  6. http://webarchive.nationalarchives.gov.uk/+/www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/DH_4125508

  7. http://mitiq.mit.edu/ICIQ/2013/

  8. http://jdiq.acm.org/

  9. http://www.dataqualitysummit.com/

  10. http://mitiq.mit.edu/

  11. Borges, “The Analytical Language of John Wilkins”, originally published in 1952, English translation in Borges (1964).

  12. On the method of abstraction and LoA, see Floridi (2008) and Floridi (2011).

  13. http://www.ons.gov.uk/ons/guide-method/census/2011/how-our-census-works/how-we-took-the-2011-census/how-we-processed-the-information/data-quality-assurance/index.html

References

  • Al-Hakim, L. (2007). Information quality management: Theory and applications. Hershey: Idea Group Pub.

  • Batini, C., & Scannapieco, M. (2006). Data quality: Concepts, methodologies and techniques. Berlin/New York: Springer.

  • Borges, J. L. (1964). Other inquisitions, 1937–1952. Austin: University of Texas Press.

  • Cajochen, C., Altanay-Ekici, S., Münch, M., Frey, S., Knoblauch, V., & Wirz-Justice, A. (2013). Evidence that the lunar cycle influences human sleep. Current Biology, 23(15), 1485–1488.

  • Census. (2011). Census data quality assurance strategy. http://www.ons.gov.uk/ons/guide-method/census/2011/the-2011-census/processing-the-information/data-quality-assurance/2011-census---data-quality-assurance-strategy.pdf

  • English, L. (2009). Information quality applied: Best practices for improving business information, processes, and systems. Indianapolis: Wiley.

  • Floridi, L. (2008). The method of levels of abstraction. Minds and Machines, 18(3), 303–329.

  • Floridi, L. (2011). The philosophy of information. Oxford: Oxford University Press.

  • Herzog, T. N., Scheuren, F., & Winkler, W. E. (2007). Data quality and record linkage techniques. New York: Springer.

  • Lee, Y. W., Pipino, L. L., Funk, J. D., & Wang, R. Y. (2006). Journey to data quality. Cambridge, MA: MIT Press.

  • Luebke, D. M., & Milton, S. (1994). Locating the victim: An overview of census-taking, tabulation technology and persecution in Nazi Germany. IEEE Annals of the History of Computing, 16(3), 25–39.

  • Maydanchik, A. (2007). Data quality assessment. Bradley Beach: Technics Publications.

  • McGilvray, D. (2008). Executing data quality projects: Ten steps to quality data and trusted information. Amsterdam/Boston: Morgan Kaufmann/Elsevier.

  • Olson, J. E. (2003). Data quality: The accuracy dimension. San Francisco: Morgan Kaufmann Publishers.

  • Raper, J. F., Rhind, D., & Shepherd, J. F. (1992). Postcodes: The new geography. Harlow: Longman.

  • Redman, T. C. (1996). Data quality for the information age. Boston: Artech House.

  • Theys, P. P. (2011). Quest for quality data. Paris: Editions TECHNIP.

  • Tozer, G. V. (1994). Information quality management. Oxford: Blackwell.

  • United States Federal Trade Commission. (2010). Social security numbers and ID theft. New York: Nova.

  • United States. Congress. House. Committee on Government Reform. Subcommittee on Regulatory Affairs. (2006). Improving information quality in the federal government: Hearing before the Subcommittee on Regulatory Affairs of the Committee on Government Reform, House of Representatives, One Hundred Ninth Congress, first session, July 20, 2005. Washington, DC: U.S. G.P.O.

  • Wang, R. Y. (1998). A product perspective on total data quality management. Communications of the ACM, 41(2), 58–65.

  • Wang, Y. R., & Kon, H. B. (1992). Toward quality data: An attributes-based approach to data quality. Cambridge, MA: MIT Press.

  • Wang, R. Y., Pierce, E. M., Madnick, S. E., Zwass, V., & Fisher, C. W. (Eds.). (2005). Information quality. Armonk/London: M.E. Sharpe.

Author information

Correspondence to Luciano Floridi.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Floridi, L. (2014). Big Data and Information Quality. In: Floridi, L., Illari, P. (eds) The Philosophy of Information Quality. Synthese Library, vol 358. Springer, Cham. https://doi.org/10.1007/978-3-319-07121-3_15
