Artificial Intelligence and Law

Volume 18, Issue 1, pp 1–43

Integrating induction and deduction for finding evidence of discrimination

  • Salvatore Ruggieri
  • Dino Pedreschi
  • Franco Turini


We present a reference model for finding (prima facie) evidence of discrimination in datasets of historical decision records in socially sensitive tasks, including access to credit, mortgages, insurance, the labor market and other benefits. We formalize the process of direct and indirect discrimination discovery in a rule-based framework by modelling protected-by-law groups, such as minorities or disadvantaged segments, and the contexts where discrimination occurs. Classification rules extracted from the historical records unveil contexts of unlawful discrimination, where the degree of burden on protected-by-law groups is evaluated by formalizing existing norms and regulations in terms of quantitative measures. The measures are defined as functions of the contingency table of a classification rule, and their statistical significance is assessed by relying on a large body of statistical inference methods for proportions. Key legal concepts and lines of reasoning are then used to drive the analysis of the set of classification rules, with the aim of discovering patterns of discrimination, either direct or indirect. Analyses of affirmative action, favoritism and argumentation against discrimination allegations are also modelled in the proposed framework. Finally, we present an implementation, called LP2DD, of the overall reference model that integrates induction, through data mining classification rule extraction, and deduction, through a computational logic implementation of the analytical tools. The LP2DD system is put to work on the analysis of a dataset of credit decision records.
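To make the idea of a measure defined on a contingency table concrete, the following Python sketch computes two standard comparisons of proportions for a single context: the risk difference and the risk ratio of the denial rates of a protected group and of everyone else, together with a Wald confidence interval for the difference. This is an illustration only. The abstract does not name the measures or the inference methods used, LP2DD itself is described as a computational logic implementation rather than Python, and the class name, function names and counts below are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 counts for one context: rows are groups, columns are decisions.

    Hypothetical layout chosen for this sketch, not taken from the paper.
    """
    protected_denied: int
    protected_granted: int
    other_denied: int
    other_granted: int


def denial_rates(t: ContingencyTable) -> tuple[float, float, int, int]:
    """Return (protected denial rate, other denial rate, n_protected, n_other)."""
    n1 = t.protected_denied + t.protected_granted
    n2 = t.other_denied + t.other_granted
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must contain at least one record")
    return t.protected_denied / n1, t.other_denied / n2, n1, n2


def risk_difference(t: ContingencyTable) -> float:
    """Denial rate of the protected group minus that of the other group."""
    p1, p2, _, _ = denial_rates(t)
    return p1 - p2


def risk_ratio(t: ContingencyTable) -> float:
    """Denial rate of the protected group divided by that of the other group."""
    p1, p2, _, _ = denial_rates(t)
    if p2 == 0:
        raise ValueError("risk ratio is undefined when the other group has no denials")
    return p1 / p2


def risk_difference_wald_ci(t: ContingencyTable, z: float = 1.96) -> tuple[float, float]:
    """Wald interval for the risk difference (z = 1.96 gives roughly 95%)."""
    p1, p2, n1, n2 = denial_rates(t)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se


# Made-up counts: 30 of 100 protected applicants denied, 20 of 200 others denied.
table = ContingencyTable(protected_denied=30, protected_granted=70,
                         other_denied=20, other_granted=180)
rd = risk_difference(table)
rr = risk_ratio(table)
ci_low, ci_high = risk_difference_wald_ci(table)
print(f"risk difference = {rd:.3f}, risk ratio = {rr:.3f}")
print(f"Wald interval for the difference: [{ci_low:.3f}, {ci_high:.3f}]")
```

An interval for the difference that excludes zero suggests the gap in denial rates is unlikely to be due to chance alone. The Wald interval is the simplest choice and is known to behave poorly for small samples, so a more robust interval would be preferable in practice.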


Direct discrimination; Indirect discrimination; Affirmative actions; Classification rules; Data mining; Knowledge discovery; Logic programming



Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • Salvatore Ruggieri (1)
  • Dino Pedreschi (1)
  • Franco Turini (1)

  1. Dipartimento di Informatica, Università di Pisa, Pisa, Italy
