An X-Ray Exam of Your Data

Understand, Manage, and Prevent Algorithmic Bias

Abstract

In this chapter, we will dive into the question of how you can detect seeds for algorithmic biases in your data. As must have become clear from the previous chapters, we are chasing many different foes; therefore, we need to scan our data for many different types of potential issues, just as an annual health check might include a dozen procedures to check blood, urine, and various organs. With the recommendations in this chapter, my goal is to give you "a thousand eyes and a thousand ears" in six fairly easy and efficient steps. These analyses will create a set of maps, each of which attempts to shade specific areas of concern in bright red, just as an X-ray exam would reveal broken bones, ruptured organs, and swallowed cutlery. This will enable you to review all significant irregularities and (considering your context knowledge and what you have learned in this book, especially the previous chapter) decide whether there is reason for concern, and if so, how best to avoid an algorithmic bias.

Notes

  1. Of course, if you are an astrologist, you also might conclude that an IV of 0.3 is too low and indicative of some zodiac signs incorrectly coded in your data...

  2. If the dependent variable is continuous, you do not use IV. In this case, you can create an equivalent imputation table by calculating median outcomes for non-numerical categories (including missing). For prioritization, I use the larger of Pearson’s and Spearman’s correlation.

  3. Leonard Kaufman and Peter J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley-Interscience, 1990.

  4. The Mahalanobis distance is a scaled version of the Euclidean distance: it is normalized by the covariance matrix of the variables, which reduces to scaling each variable by its standard deviation when the variables are uncorrelated.
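The calculations mentioned in notes 2 and 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's own implementation: the function names, the `(missing)` label, and the use of pandas and NumPy are all hypothetical choices. Taking absolute values before comparing the two correlations is also an assumption, as is computing Spearman's correlation as Pearson's correlation of the ranks and assuming the inputs contain no missing values.

```python
import numpy as np
import pandas as pd


def median_outcome_table(df, category_col, outcome_col, missing_label="(missing)"):
    """Median outcome per category, treating missing values as their own category.

    Hypothetical helper for note 2; `missing_label` is an arbitrary placeholder.
    """
    # Cast to object first so the placeholder can be added to any column type.
    categories = df[category_col].astype(object).fillna(missing_label)
    return df[outcome_col].groupby(categories).median()


def correlation_priority(x, y):
    """Larger of |Pearson| and |Spearman| correlation between x and y.

    Assumptions: absolute values are compared, and x and y have no missing values.
    """
    x = pd.Series(x, dtype=float)
    y = pd.Series(y, dtype=float)
    pearson = x.corr(y)
    # Spearman's correlation is Pearson's correlation of the (average) ranks.
    spearman = x.rank().corr(y.rank())
    return max(abs(pearson), abs(spearman))


def mahalanobis_distance(point, data):
    """Mahalanobis distance of `point` from the mean of `data` (rows = observations).

    Uses the sample covariance matrix; with uncorrelated variables this equals the
    Euclidean distance after dividing each variable by its standard deviation.
    """
    data = np.asarray(data, dtype=float)
    diff = np.asarray(point, dtype=float) - data.mean(axis=0)
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    # Solve cov @ z = diff instead of inverting the matrix explicitly.
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))
```

For example, `median_outcome_table(df, "zodiac_sign", "outcome")` would return one median per zodiac sign plus one for records where the sign is missing, where `zodiac_sign` and `outcome` are hypothetical column names. Note that `mahalanobis_distance` fails if the covariance matrix is singular, for instance when two variables are perfectly collinear.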

Copyright information

© 2019 Tobias Baer

About this chapter

Cite this chapter

Baer, T. (2019). An X-Ray Exam of Your Data. In: Understand, Manage, and Prevent Algorithmic Bias. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4885-0_19