The following article deals with the topic of discrimination “by” a recommender system that is based on incomplete or biased data or algorithms. After a short introduction (Sect. 1), I will describe the main reasons why discrimination by such a recommender system can occur (Sect. 2). I will then describe the current legal frame (Sect. 3) and conclude with how the future legal frame could look and how the legal situation might be further improved (Sects. 4 and 5).

1 Introduction

A recommender system gives recommendations based on an algorithm, often a machine learning algorithm (Projektgruppe Wirtschaft, Arbeit, Green IT 2013, 20). A machine learning algorithm essentially takes a set of data and tries to find correlations between the features it contains. If it finds enough correlations, it may derive a rule from them. Based on that rule, the algorithm makes a prediction about how similar input should be handled in the future, and the recommendation is based on that prediction. For example, a machine learning algorithm that is supposed to classify cats is trained on a certain number of pictures of cats and other animals. The algorithm then finds correlations regarding the shape and size of the ears, the tail and the whiskers. When novel pictures are used as input, it checks for these features to conclude whether the picture shows a cat or not.
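To make the steps just described more concrete, the following minimal sketch (in Python, using the scikit-learn library; the feature names and values are invented for illustration and are not taken from the article) shows how a toy classifier “learns” a rule from labelled examples and then makes a prediction for new input:

```python
# Illustrative sketch only: a toy classifier that "learns" a rule
# from labelled examples and then predicts on new input.
# Feature names and values are invented for illustration.
from sklearn.tree import DecisionTreeClassifier

# Training data: [ear_pointedness, tail_length_cm, whisker_count]
X_train = [
    [0.9, 25, 24],  # cat
    [0.8, 30, 22],  # cat
    [0.2, 10,  0],  # rabbit
    [0.3, 60,  8],  # dog
]
y_train = ["cat", "cat", "not cat", "not cat"]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)          # find correlations, derive a rule

new_picture = [[0.85, 28, 23]]       # features extracted from a new image
print(model.predict(new_picture))    # prediction, e.g. ['cat']
```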

All these steps – the data gathering and the data set, the identification of the correlations and, consequently, of the rules and predictions – can contain biases. As a result, the recommendation can contain those biases as well, which might lead to a discriminatory recommendation, e.g. a recommendation that is more favorable towards men than towards women, or towards persons from a privileged social background than towards persons from another background (Alpaydın 2016, 16 ff.; Gerards and Xenidis 2021, 32 ff.; Kelleher 2019, 7 ff.; Kim and Routledge 2022, 75–102, 77 ff.; Vöneky 2020, 9–22, 21).

While some recommendations, e.g. the ranking of proposed items on a shopping website (Wachter 2020, 367–430, 369 ff.), can be of lesser relevance for fundamental rights (Speicher et al. 2018), some recommender systems can be extremely relevant for the well-being of a person. For example, a website can match employers and employees. If the website does not propose a possible employee for a job even though s/he would have been well suited (Lambrecht and Tucker 2020, 2966–2981), this is not only a question of a badly functioning algorithm but can affect the professional existence of the person left out (see, e.g., recital 71 of the General Data Protection Regulation (GDPR 2016, 1)). Similarly, rankings of professionals (doctors, lawyers etc.) for somebody looking for the relevant service are highly important, as only the first few candidates have a realistic chance of being chosen.Footnote 1

2 Reasons for Discriminating Recommendations

There are several reasons why a recommendation can be discriminatory. They can basically be divided into three categories: the data set on which the machine learning algorithm is trained and adjusted can lack the relevant diversity (Sect. 2.1), the training data can contain conscious or unconscious biases of the people creating the data (Sect. 2.2) and, finally, the underlying algorithm can be modelled in a way that enhances discrimination (Sect. 2.3) (von Ungern-Sternberg forthcoming).

2.1 Lack of Diversity in Training Data

The level of diversity in the training data is paramount for the outcome of the concrete recommendation. One famous example where a lack of diversity led to discrimination against women was the Amazon hiring tool (Gershgorn 2018): The hiring tool was supposed to make objective predictions of the quality and suitability of applying job candidates. The problem was that the algorithm was “fed” with application data from the previous decade – which included a significantly higher proportion of male (and probably white) candidates. The training data, therefore, lacked diversity regarding women. As a consequence, the hiring tool “concluded” that women were less qualified for the job, resulting in discriminatory recommendations (Gershgorn 2018). Similarly, whenever training data is only taken from reality and not created artificially, there is a high probability that it will lack diversity – especially in jobs that typically employ a higher number of men (such as the STEM areas: Science, Technology, Engineering, and Mathematics (Wikipedia 2022)) or women (such as care and social work), or that – so far – lack People of Colour (PoC) or candidates with an immigration, LGBTIAQ* or disability background, because in these jobs the representation of certain groups can be considerably higher or lower than that of others (Reiners 2021; Sheltzer and Smith 2014, 10107–10112). The effect of missing diversity in training data was also shown in face recognition software using machine learning algorithms: face recognition software that was trained mainly with photos of white and male people afterwards had greater difficulty identifying Black or female, and especially Black female, persons (Buolamwini and Gebru 2018).
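The mechanism can be illustrated with a deliberately simplified sketch (assumptions: invented applicant data, an explicit gender feature and a logistic regression model from scikit-learn; the real Amazon tool did not work with such an explicit feature): a model trained on historical data in which almost all successful applicants are male will assign a lower predicted “suitability” to an otherwise identical female candidate.

```python
# Illustrative sketch: a hiring model trained on historical data in which
# almost all "hired" examples are male. All feature values are invented.
from sklearn.linear_model import LogisticRegression

# Features: [is_female (0/1), years_of_experience]
X_train = [
    [0, 5], [0, 6], [0, 4], [0, 7],   # male applicants, all hired
    [1, 6],                           # one female applicant, not hired
]
y_train = [1, 1, 1, 1, 0]             # 1 = hired, 0 = rejected

model = LogisticRegression().fit(X_train, y_train)

# Two equally experienced candidates, differing only in the gender attribute:
print(model.predict_proba([[0, 6]])[0][1])  # predicted hiring probability, male candidate
print(model.predict_proba([[1, 6]])[0][1])  # predicted hiring probability, female candidate
```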

2.2 (Unconscious) Bias in Training Data

The second, very influential reason why recommender systems often show discriminatory results is the fact that the training data very often consists of data from real-life people and therefore also reflects their conscious or unconscious biases. For example, a study by the University of Bonn on “Gender Differences in Financial Advice” (Bucher-Koenen et al. 2021; Cavalluzzo and Cavalluzzo 1998, 771–792) analysed the recommendations financial advisors gave to different people seeking advice. The study shows that the recommendations women receive are usually more expensive than those given to male clients. There are several explanations, e.g. the fact that women are very often more risk averse, resulting in more expensive but also safer investments. Another possible reason is that men often look for advice to get a “second opinion”, while women do not consult other advisors and lack the information male clients may already have (Bucher-Koenen et al. 2021, 12 ff., 14 ff.). An algorithm that “learns” from this data might conclude that women should always receive the more expensive recommendation, without looking at the individual woman seeking advice. The whole problem can be enhanced by data labelling practices. Training data usually gets labelled as “correct” or “incorrect” (or “good” or “bad”) to enable the learning process of the algorithm. Whenever the decision whether a résumé or a person’s performance is “good” is not based solely on the hiring decision but a separate person additionally labels it as “good” or “bad”, the label can carry an additional (unconscious) bias of the labelling person (Calders and Žliobaitė 2013, 48 ff.). For example, there are algorithms that recommend professionals or professional services, often based on users’ ratings. Very often a ranking is made, with those receiving the highest ratings coming first (thus carrying the label “good”).Footnote 2 This can discriminate against, e.g., women or members of minority groups: research shows that people typically rate women or members of minority groups less favourably than a man not belonging to a minority group, even though the performance is the same. Research shows, e.g., that identical résumés bearing a male or a female name are evaluated differently, usually with the female one rated less favourably (by male and female evaluators alike) (Moss-Racusin et al. 2012, 16474–16479; Handley et al. 2015, 13201–13206). The same applies to teaching materials in law schools (Özgümüs et al. 2020, 1074).

If a machine learning algorithm thus ranks professionals or professional services based on these user evaluations, the probability is high that the (unconscious) biases that led to a less favourable rating in the first place will also lead to a lower ranking in the recommendation – with a negative influence, e.g., on the income and career of the professional concerned. For instance, a study of the company Uber, which based the ranking of its drivers on consumers’ ratings, shows these biases clearly (Rosenblat et al. 2017, 256–279).
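A simple sketch (with invented ratings) of how a ranking by average user rating reproduces a small systematic rating bias against one group, even though the underlying performance is identical:

```python
# Illustrative sketch: identical service quality, but the user ratings carry a
# small systematic bias against one group; a ranking by average rating then
# reproduces that bias. All names and numbers are invented.
professionals = {
    "Driver A (majority group)": [5, 5, 4, 5, 4],
    "Driver B (minority group)": [4, 5, 4, 4, 4],  # same service, slightly lower ratings
}

ranking = sorted(
    professionals.items(),
    key=lambda item: sum(item[1]) / len(item[1]),
    reverse=True,
)
for rank, (name, ratings) in enumerate(ranking, start=1):
    print(rank, name, round(sum(ratings) / len(ratings), 2))
```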

2.3 Modelling Algorithm

Finally, the algorithm can be modelled in a way that enhances biases already contained in the training data. One reason can, of course, be the selection of the features the algorithm uses – e.g., if a personalized ad algorithm filters ads only according to the gender of the user, the result might be that women always receive recommendations for sexy dresses and make-up while men always receive recommendations for adventure trips, barbecue equipment and home improvement tools (Ali et al. 2019, 1–30).
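A minimal sketch of this modelling problem (the ad categories and the filtering rule are invented): if the only feature used for the selection is the user’s gender attribute, the recommendation ignores the user’s actual interests entirely.

```python
# Illustrative sketch: an ad "recommender" that filters only on the user's
# gender attribute reproduces stereotypes regardless of actual interests.
# Ad names and the filtering rule are invented for illustration.
ADS = {
    "female": ["dress sale", "make-up set"],
    "male": ["adventure trip", "barbecue grill", "power tools"],
}

def recommend_ads(user):
    # The only feature used is the gender attribute -- interests are ignored.
    return ADS[user["gender"]]

print(recommend_ads({"gender": "female", "interests": ["hiking", "carpentry"]}))
```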

Flaws in the modelling can also have a tremendous impact, depending on the area in which the system is used. Another example of discriminatory results related to the modelling of the algorithm can be found in the famous COMPAS program used by several US states (Angwin et al. 2016; Flores et al. 2016, 38–46; Martini 2019, 2 ff.). This program was intended to provide recommendations regarding the probability that a criminal offender would re-offend. A high predicted risk of future crime would lead to less favourable treatment in detention – e.g. higher bail or the exclusion of the possibility of being released on bail. The program reflected the unconscious bias judges had towards African-American offenders, whose re-offence risk was assumed to be higher, and towards Caucasian offenders, whose re-offence risk was assumed to be lower. These two flaws from the real world could have been at least mitigated by calibrating the algorithm to allocate different error rates to different groups within the training data, making the algorithm “learn” to avoid the same bias. This is important whenever different groups have different base rates, i.e., different rates of positive or negative outcomes.

One problem in the modelling of the algorithm, therefore, was that the error rates used when analyzing the existing data were allocated equally to both groups, even though the allocation should have taken into account that a Caucasian person benefited from two biases in his or her favor (no assumption of a higher re-offence risk and the assumption of a lower re-offence risk) while an African-American person faced only one bias against him or her (the assumption of a higher re-offence risk) – thus, different base rates. The probability that the prediction regarding an African-American person would be false (and unfavourable) was therefore higher. This should have been reflected in the error rates.

An equal allocation of error rates, therefore, even enhanced the biases already contained in the training data (Chouldechova 2017, 153–163; Rahman 2020; Barocas et al. 2018, 23, 31, 68). On the other hand, it is problematic that different error rates assume that there are differences between the groups, thus drawing a distinction even though distinctions were supposed to be avoided (Barocas et al. 2018, 47 ff.).
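The tension described in this subsection can be illustrated with a small numerical sketch (all numbers are invented and do not stem from the COMPAS data): when two groups have different base rates in the – possibly biased – training data, a risk score can produce very different false positive rates for the two groups, i.e., different shares of people who are wrongly labelled “high risk”.

```python
# Illustrative sketch of the error-rate problem discussed above: two groups
# with different base rates of recorded re-offence. Treating both groups the
# same way can still produce very different false positive rates.
# All numbers are invented for illustration.

def false_positive_rate(predictions, outcomes):
    """Share of people who did NOT re-offend but were labelled 'high risk' (1)."""
    fp = sum(1 for p, o in zip(predictions, outcomes) if p == 1 and o == 0)
    negatives = sum(1 for o in outcomes if o == 0)
    return fp / negatives

# Group 1: higher base rate of recorded re-offence in the (possibly biased) data
group1_pred    = [1, 1, 1, 1, 0, 0, 1, 0, 1, 0]
group1_outcome = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]

# Group 2: lower base rate in the data
group2_pred    = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
group2_outcome = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print("False positive rate, group 1:", round(false_positive_rate(group1_pred, group1_outcome), 2))
print("False positive rate, group 2:", round(false_positive_rate(group2_pred, group2_outcome), 2))
```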

2.4 Interim Conclusion and Thoughts

Recommender systems can discriminate because they can reinforce and deepen stereotypes and biases already found in our society. Several problems can lead to or enhance those outcomes: First, data drawn from experience is always data from the past and thus reflects the biases and difficulties of the past. The person selecting the training data must therefore always keep in mind that the past offers no perfect data to reflect the diversity of our society. So-called “counterfactuals”, which have to be created artificially, can help to avoid this lack of diversity (Cofone 2019, 1389–1443; Mothilal et al. 2020; Oosterhuis and de Rijke 2020).Footnote 3 Counterfactuals are artificially created data sets that counterbalance the aforementioned lack of diversity in data stemming from reality – e.g., the lack of female résumés in the STEM areas can be counterbalanced by introducing artificially created female résumés, as sketched below.Footnote 4
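A minimal sketch of such counterfactual data augmentation (the field names and values are invented; real approaches, e.g. Mothilal et al. 2020, are considerably more sophisticated): for every real résumé, an artificial copy with the gender attribute flipped is added to the training set, so that the data no longer associate the job with one gender.

```python
# Illustrative sketch of "counterfactual" data augmentation as described above:
# each real résumé gets an artificial copy with the gender attribute flipped.
# Field names and values are invented for illustration.
real_resumes = [
    {"gender": "male", "degree": "computer science", "years_experience": 6, "hired": 1},
    {"gender": "male", "degree": "engineering",      "years_experience": 4, "hired": 1},
]

def make_counterfactual(resume):
    copy = dict(resume)
    copy["gender"] = "female" if resume["gender"] == "male" else "male"
    return copy

augmented_training_set = real_resumes + [make_counterfactual(r) for r in real_resumes]
for r in augmented_training_set:
    print(r)
```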

Second, while it is easy to avoid differentiation features that are obviously discriminatory, such as “race” or “gender”, the compilation of other data can have effects similar to such directly discriminating features (Ali et al. 2019, 1–30; Buolamwini and Gebru 2018, 12). For example, a person’s postal code is, in many countries, highly correlated with ethnicity or social background; thus, if an algorithm “learns” that résumés from a certain area are usually “bad”, this indirectly leads to discrimination based on social or ethnic background (Calders and Žliobaitė 2013, 4–49).
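A minimal sketch of this proxy effect (invented postal codes and invented historical decisions): even though the protected attribute itself is never used, a model trained on the postal code alone reproduces the pattern of the biased historical decisions.

```python
# Illustrative sketch of indirect (proxy) discrimination: the protected
# attribute is removed, but the postal code correlates with it, so a model
# trained only on the postal code reproduces the biased historical pattern.
# All data are invented for illustration.
from sklearn.tree import DecisionTreeClassifier

# Historical (biased) decisions: applicants from postal code 12345
# -- strongly correlated with a protected group -- were mostly rejected.
X_train = [[12345], [12345], [12345], [67890], [67890], [67890]]
y_train = [0, 0, 1, 1, 1, 1]          # 1 = invited to interview

model = DecisionTreeClassifier().fit(X_train, y_train)

print(model.predict([[12345]]))       # applicant from postal code 12345
print(model.predict([[67890]]))       # applicant from postal code 67890
```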

Third, because these effects of certain data compilations are difficult to predict ex ante, especially if the algorithm is self-learning, it is also difficult to predict under which circumstances discrimination will occur and for which reason. This unpredictability makes it necessary to monitor and adjust such algorithms on a regular basis.

3 Legal Frame

So far, there is no coherent legal frame to tackle discrimination by recommender systems. Nevertheless, certain approaches can be derived from the existing legal frame: existing solutions are based either on agreement (Sect. 3.1) or on information (Sect. 3.2), or on a combination of both approaches. In addition, the general rules of anti-discrimination law apply (Sect. 3.3).

3.1 Agreement – Data Protection Law

The first approach, which is based on user agreement, can be found in data protection law, especially in Article 22 para. 1 GDPR (2016, 1). According to that rule, “[t]he data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.”

Recital 71 specifies the Article in more detail and makes clear that the controller of the algorithm should “prevent, inter alia, discriminatory effects on natural persons on the basis of racial or ethnic origin, political opinion, religion or beliefs, trade union membership, genetic or health status or sexual orientation, or processing that results in measures having such an effect.”

While at first glance this rule sounds like a clear prohibition on creating recommendations based exclusively on algorithmic decisions, there are several problems in its application that make it questionable whether it is sufficient to resolve the problem. First, one can question in general whether data protection law is the proper venue for preventing discriminatory results. Data protection law is primarily intended to protect the personal data of natural persons and to give them control over how this data is used. It aims at the protection of the personality rights of such persons. Anti-discrimination law, on the other hand, tackles certain inequalities that exist in society and protects the individual from discrimination – independently of the data used or concerned. While discrimination can, of course, also lead to the infringement of a personality right, the protective function is a different one.

Furthermore, the literature disagrees about the circumstances under which there is a “decision” within the meaning of Article 22 GDPR regarding recommender systems. While recital 71 clarifies that such a “decision” exists in the case of a refusal, e.g., “of an online credit application or a recruiting practice without any human intervention”, the case becomes less clear when the algorithm only proposes a certain job opportunity (or not) (Lambrecht and Tucker 2020, 2966–2981) or a ranking that is afterwards subject to the decision of a person. While some voices regard such a preliminary recommendation as excluded from the scope of Article 22 GDPR (German government 2000, 37; EU Commission 2012, 26 et seq.; Martini 2019, 173; see also OLG Frankfurt/M. 2015, 137), others limit the notion of a decision to the exclusion of a person (e.g., from a ranking).Footnote 5

Nevertheless, even if we apply Article 22 to all recommendations, a justification is possible if the controller uses the algorithm, inter alia, “to ensure the security and reliability of a service provided by the controller, or necessary for the entering or performance of a contract between the data subject and a controller, or when the data subject has given his or her explicit consent” (Recital 71, also Article 22 para. 2). Para. 3 then introduces some procedural safeguards for the protection of the personality rights of the person concerned. Still, the basic rule is that whenever the data subject has given explicit consent to the processing of the data, the infringement within the meaning of Article 22 para. 1 is justified under Article 22 para. 2 GDPR (Vöneky 2020, 9–22, 13; Martini 2019, 171 ff.). This is problematic, as research shows that the majority of internet users are willing to give their consent in order to proceed on a website without really dealing with the content of the agreement (Carolan 2016, 462–473; in detail see also Machuletz and Böhme 2020, 481–498). If an agreement is so easily given without a conscious choice, Article 22 GDPR does not provide very stable protection against discriminatory results.

3.2 Information – Unfair Competition Law

The second approach can be called an information-centered approach. The main measure consists in giving the user information about the ranking parameters used and the reasons for the relative importance of certain parameters compared to others. We can see this approach on the Business-to-Business (B2B) level in Article 5 P2B Regulation (Regulation (EU) 2019/1150. 2019, 57 ff.) regarding online providers and the businesses using their platforms. A similar rule regarding the Business-to-Consumer (B2C) level has also been introduced into the UCP Directive (Directive 2005/29/EC 2005, 22) by its 2019 amendment (Article 3 Nr. 4 lit. b) (Directive (EU) 2019/2161 2019, 7 ff.). Article 7 para. 4a of the UCP Directive provides that whenever a consumer can search for products offered by different traders or by general consumers, “information […] on the main parameters determining the ranking of products presented to the consumer as a result of the search query and the relative importance of those parameters, as opposed to other parameters, shall be regarded as material,” meaning that this information has to be part of the general information obligations towards the consumer. The effectiveness of these measures in combating discriminatory recommendations, however, is doubtful.

First, both rules only contain information obligations, meaning that their effectiveness mainly depends on the attention of the user and his or her willingness to read the information, to understand what the “relative importance of certain parameters” means for his or her concrete use of the platform, and to act upon that knowledge. Even if a trader or intermediary indirectly gives the information that the recommendation can be discriminatory, in most cases the platform or search function will most probably still be used, as the majority of users will not notice it (Martini 2019, 188; Bergram et al. 2020). Furthermore, the information necessary to understand the logic of a discriminatory recommender system might not be covered by the information obligation at all. The limit will most probably lie in the protection of the trade secrets of the provider of the algorithm – covering the algorithm or at least some of its features. Discrimination caused by a certain algorithm model will therefore probably remain undetected despite the information obligation.

3.3 General Anti-discrimination Law

Specific rules regarding recommender systems or algorithms thus do not seem sufficient to tackle discriminatory recommendations. Nevertheless, they are not exhaustive in this area – the general anti-discrimination rules also apply and might sufficiently prevent discriminatory recommendations.

These rules, usually on the national or EU level, forbid, e.g., unjustified unequal treatment based on certain personal features such as gender, race, disability, sexual orientation, age, social origin, nationality, faith or political opinion (the list is not exhaustive and depends on the country or entity) (TFEU (EU) 2007, Art. 19; CFR (EU) 2012, Art. 21; Fundamental Law (Ger) 1949, Art. 3 para. 3; AGG (Ger) 2006, sec. 1). While many important features are therefore included, there is no general prohibition on treating people differently, e.g., because of the region they live in, the dialect they speak or the color of their hair (Martini 2019, 238; Wachter forthcoming). Of course, those features can add up to features protected by anti-discrimination law, e.g., the region and the dialect of a person can allow conclusions regarding his or her ethnic or social background (see above, Sect. 2.4). But the general rule remains that unequal treatment is allowed as long as an explicitly protected feature is not the reason for it.

Applying anti-discrimination rules to the relationship between the provider of a recommender system and a user raises some further issues. First, those anti-discrimination rules were primarily drafted to protect citizens against the State. If a public agency, for instance, uses a recommender system as a recruiting tool, anti-discrimination law applies directly.Footnote 6 The effect of these rules in private legal relationships, on the other hand, where the majority of recommender systems are used, is less easy to establish and highly disputed (Knebel 2018, 33 ff.; Perner 2013, 143 ff.; Schaaf 2021, 249; Neuner 2020, 1851–1855). Additionally, recommender systems are often used without the conclusion of a contract; thus, they operate in the pre-contractual area, where the parties’ responsibility is traditionally harder to establish. Nevertheless, a tendency can be observed for the prohibition of discrimination to slowly extend into private relationships, especially contract law and employment law, at least in the EU (AGG (Ger) 2006, sec. 2, 7 para. 2, 21 para. 4; Hellgardt 2018, 901; Perner 2013, 145 ff.). Several EU anti-discrimination directives (Directive 2000/43/EC 2000, 22; Directive 2000/78/EC 2000, 16; Directive 2002/73/EC 2002, 15; Directive 2004/113/EC 2004, 37) as well as a constant flow of case law from the CJEU have reinforced this process and extended it to the pre-contractual level as well (CJEU 1976 Defrenne/SABENA, para 39; CJEU 2011 Test-Achats; Perner 2013, 157 ff.; Grüneberger and Reinelt 2020, 19 ff.). However, whether and to whom the provider of a recommender system is liable if the recommender system is discriminatory remains unclear.Footnote 7

Furthermore, there is the problem of indirect discrimination. As mentioned above, discrimination is easy to detect if the algorithm uses a forbidden differentiation criterion. Nevertheless, a combination of other criteria that are not directly forbidden can lead to the same result (Ali et al. 2019, 1–30; Buolamwini and Gebru 2018, 12). Recruiting tools, for example, have often treated résumés with longer periods without gainful employment as a sign of weaker work performance. However, such periods can also be caused by breaks such as parental leave or additional care obligations, which typically involve more women than men. Differentiating on the basis of that criterion can consequently lead to discrimination against women.

Anti-discrimination law recognizes that indirect discrimination can be forbidden as well (see sec. 3 para. 2 AGG). The difference becomes relevant for the requirements for justifying unequal treatment. Unequal treatment can be justified if there are equally weighty values or interests on the other side that make up for the differentiation. This leads to a balancing of the interests and risks of the people involved. Usually, direct discrimination weighs more heavily and is almost impossible to justify compared to indirect discrimination (von Ungern-Sternberg forthcoming). Of course, the result also depends on the area of life in which the recommender system is used. Personalized ads, for instance, are not as risky and relevant for the person involved as, for example, a job proposal or the exclusion from a job proposal.

Finally, the chain of responsibility can be complicated. Often the recommender system is used by a platform but programmed by another business, while the contract in question is concluded between one user of the platform (e.g., an employer) and another user (e.g., the job seeker). Anti-discrimination law usually only has effects between the latter two, meaning that the prospective employer must afterwards seek compensation from the platform provider, who, in turn, can seek compensation from the programmer. Ensuring in this way that the person or business ultimately responsible for the discriminatory algorithm is really forced to compensate the other parties – and consequently has an incentive to change the algorithm – is difficult. Additionally, a justification might be possible if the functioning of the algorithm was not predictable for the responsible party, as a self-learning algorithm in particular is difficult to control with regard to the data input and the evolution of the algorithm (the “black box” problem).

3.4 Interim Conclusion

The legal frame deals only partly with discrimination by algorithms and is not sufficient to tackle it effectively. Furthermore, the existing anti-discrimination law entails several uncertainties for all parties involved.

4 Outlook

Based on these initial conclusions, the next question is what should be done.

4.1 Extreme Solutions

One extreme possibility would be to prohibit the use of machine learning algorithms in recommender systems altogether. This would, of course, stop discrimination by recommendations, but it would also impede any progress regarding the use of machine learning algorithms or the development of recommender systems.

The other extreme solution would be a hands-off approach, leaving it to market forces to regulate the use of recommender systems. This approach does not seem feasible either, as the past has shown that the mere play of market forces is unable to prevent discrimination.Footnote 8

4.2 Further Development of the Information Approach

One possible solution between those two extreme positions could be a further development of the already existing information approach (Martini 2019, 187). Providers of recommender systems should provide the information necessary for users to foresee and understand the risks of discrimination by a certain system, in combination with an opt-out or opt-in possibility: users should not only have the choice of whether to use the system at all, but also the choice between the system with its possible discrimination and alternatives to it. Furthermore, providers should be obliged to use a legal design that ensures that the people involved really read and understand the information (Martini 2019, 189; Kim and Routledge 2022, 97 ff.).Footnote 9

This approach has also been chosen by a recent EU regulation, the Digital Services Act (DSA 2022/2065 (EU)). Article 27 para. 1 DSA imposes an explicit obligation on “online platforms” that use recommender systems (not including “micro and small enterprises”, Art. 19 DSA) to “set out in their terms and conditions, in plain and intelligible language, the main parameters used in their recommender systems, as well as any options for the recipients of the service to modify or influence those main parameters”. Furthermore, according to Article 38 DSA, providers of very large online platforms that use recommender systems “shall provide at least one option for each of their recommender systems which is not based on profiling”. Moreover, another proposed EU Act, the Artificial Intelligence Act (AIA 2021 (EU)), foresees in Article 13 of the Commission Proposal that “AI” must be transparent and explainable for the user (Kalbhenn 2021, 668).

This approach is, in general, a step in the right direction. However, it has two flaws. First, Article 38 DSA only addresses “very large online platforms”, i.e., platforms with more than 45 million recipients each month that are designated as such by the Commission (Article 33 para. 1, 4 DSA). Recommender systems can, nevertheless, also be used in certain niche areas and be of high importance for the lives of the parties involved, e.g., in certain job sectors where highly specialized people are recruited or sought. The AIA does not have this restriction. Second, Article 38 only provides an “opt-out”, meaning that users must actively choose not to use the proposed algorithm. The AIA does not provide any comparable consequences. Studies show that most users do not read the information but simply continue to click in order to progress with the process they visited a certain platform for (Bergram et al. 2020; Martini 2019, 188). An opt-out possibility, therefore, is less effective than an opt-in and nudges users to just use what is already provided.

4.3 Monitoring and Audit Obligations

The DSA also provides another interesting instrument to control very large online platforms: an obligation to undergo regular audits (Article 37 DSA) to ensure that certain standards are met (Kalbhenn 2021, 671). Unfortunately, the audit obligation does not cover recommender systems and the possible discriminatory outcomes addressed in Articles 27 and 38 DSA. An audit obligation, however, could be extended to possible discrimination, especially in areas where such discrimination can have massive effects on the life of the person involved, e.g., in matters of employment or job evaluation (Buolamwini and Gebru 2018, 12).

It is therefore no coincidence that another proposed EU Act, the Artificial Intelligence Act (AIA), also establishes an audit obligation for AI used in “high-risk” areas, i.e., areas that bear a high risk for the subjects involved. Contrary to the DSA, it applies regardless of how many users a platform or provider has.

A similar approach can also be seen in other countries: the “Automated Employment Decision Tools” Bill by New York City (Law No. 2021/144 (Int 1894–2020) (NYC)) only allows the use of algorithms in employment decisions if the algorithm is subjected to a yearly audit. The advantage of such an audit is that the algorithm can be analyzed by specialists, which nudges businesses to improve it (Raji and Buolamwini 2019). On the other hand, businesses only have to hand over trade-sensitive information to the auditors; their trade secrets can thus be respected and protected as well.

4.4 Interim Conclusion and Thoughts

To conclude, the two (proposed) approaches of the DSA and the AIA – information/transparency and a regular audit obligation – should be combined for the use of recommender systems, at least in areas that are highly risky or sensitive for the person involved. An information obligation combined with an opt-in possibility (rather than the opt-out option provided in the DSA) and not limited to “very large online platforms” would be feasible in those areas. Furthermore, a regular audit should be obligatory to ensure that possible discrimination in recommender systems can be detected by the auditors and countered by them or others.

5 Conclusions

1. Recommender systems based on algorithms can cause discrimination.

2. The existing legal framework is not sufficient to combat such discrimination. It is limited to certain information obligations and general non-discrimination rules that cannot provide the necessary legal certainty.

3. Information about the consequences of using a certain recommender system should be available to the people involved and phrased in a way that users can understand. Also, similar to Articles 27 para. 1 and 38 DSA, at least an “opt-out” possibility should be provided, even though an opt-in possibility would be preferable.

4. A regular audit should be required, at least in areas that are highly sensitive to discrimination. Such an audit would allow experts to analyze the algorithm and find the reasons for discriminatory recommendations without endangering the trade secrets of the provider of the algorithm.