An X-Ray Exam of Your Data

Baer, Tobias

doi:10.1007/978-1-4842-4885-0_19

Abstract

In this chapter, we will dive into the question of how you can detect seeds of algorithmic bias in your data. As must have become clear from the previous chapters, we are chasing many different foes; therefore, we need to scan our data for many different types of potential issues, just as an annual health check might include a dozen procedures to check blood, urine, and various organs. With the recommendations in this chapter, my goal is to give you "a thousand eyes and a thousand ears" in six fairly easy and efficient steps. These analyses will create a set of maps, each of which attempts to shade in bright red specific areas of concern, much as an X-ray exam would reveal broken bones, ruptured organs, and swallowed cutlery. This will enable you to review all significant irregularities and, drawing on your contextual knowledge and what you have learned in this book (especially the previous chapter), decide whether there is reason for concern and, if so, how best to avoid an algorithmic bias.

Notes

1.
Of course, if you are an astrologer, you might also conclude that an IV of 0.3 is too low and indicative of some zodiac signs incorrectly coded in your data...
2.
If the dependent variable is continuous, you do not use IV. In this case, you can create an equivalent imputation table by calculating median outcomes for non-numerical categories (including missing). For prioritization, I use the larger of Pearson’s and Spearman’s correlation.
3.
Leonard Kaufman and Peter J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley-Interscience, 1990.
4.
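The Mahalanobis distance can be seen as a scaled version of the Euclidean distance: it normalizes each variable by its standard deviation and, in addition, accounts for correlations between variables through the inverse covariance matrix. If the variables are uncorrelated, it reduces to the Euclidean distance between the standardized variables.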
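
Notes 1 and 2 refer to IV. Assuming it carries its usual credit-scoring meaning of information value, IV summarizes how strongly a categorical predictor separates the two classes of a binary outcome. Below is a minimal pure-Python sketch using the common weight-of-evidence formulation IV = sum over bins of (p_non-event - p_event) * ln(p_non-event / p_event). The function name `information_value` and the additive `smoothing` constant are illustrative assumptions, not the chapter's method. The constant keeps the logarithm finite for bins with no events or no non-events.

```python
import math
from collections import Counter


def information_value(categories, target, smoothing=0.5):
    """Information value of a categorical predictor against a binary (0/1) target.

    Missing values (None) are kept as their own bin, like any other category.
    `smoothing` is an assumed additive constant per bin that keeps the
    weight of evidence finite when a bin has no events or no non-events.
    """
    events, non_events = Counter(), Counter()
    for cat, y in zip(categories, target):
        (events if y else non_events)[cat] += 1

    bins = set(events) | set(non_events)
    total_events = sum(events.values()) + smoothing * len(bins)
    total_non_events = sum(non_events.values()) + smoothing * len(bins)

    iv = 0.0
    for b in bins:
        p_event = (events[b] + smoothing) / total_events
        p_non_event = (non_events[b] + smoothing) / total_non_events
        woe = math.log(p_non_event / p_event)  # weight of evidence of this bin
        iv += (p_non_event - p_event) * woe
    return iv
```

In each term, the two factors share the same sign, so the sum is never negative. Swapping which class counts as the "event" also leaves the result unchanged. A predictor whose bins all have the same event rate scores zero.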
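
Note 4 relates the Mahalanobis distance to the Euclidean distance. The minimal pure-Python sketch below makes the relationship concrete. It computes the Mahalanobis distance, sqrt((x - mu)^T Sigma^-1 (x - mu)), by solving a linear system instead of inverting Sigma explicitly. It also computes the Euclidean distance after dividing each variable by its standard deviation. The function names are hypothetical. The two distances coincide when the covariance matrix Sigma is diagonal. When the variables are correlated, a point that runs against the correlation is much farther away in Mahalanobis terms than one that follows it.

```python
import math


def _solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]


def mahalanobis(x, mean, cov):
    """sqrt(d^T cov^-1 d) with d = x - mean."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = _solve(cov, diff)  # z = cov^-1 @ diff
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


def standardized_euclidean(x, mean, stds):
    """Euclidean distance after dividing each variable by its standard deviation."""
    return math.sqrt(sum(((xi - mi) / s) ** 2 for xi, mi, s in zip(x, mean, stds)))
```

With the diagonal covariance [[4, 0], [0, 9]], both functions return sqrt(2) for the point (2, 3) relative to the origin. With unit variances and correlation 0.9, the point (1, -1) has a squared Mahalanobis distance of 20, while (1, 1) has only about 1.05, even though their standardized Euclidean distances are equal.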

Author information

Authors and Affiliations

Kaufbeuren, Germany
Tobias Baer

About this chapter

Cite this chapter

Baer, T. (2019). An X-Ray Exam of Your Data. In: Understand, Manage, and Prevent Algorithmic Bias. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4885-0_19

DOI: https://doi.org/10.1007/978-1-4842-4885-0_19
Published: 08 June 2019
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-4884-3
Online ISBN: 978-1-4842-4885-0
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)