1 Introduction

The impact of data on our society is steadily growing, with data about people increasingly exploited as the basis for decisions that may affect their lives: we are in the data science era. Data-driven decisions are made, e.g., to determine whether to admit a student to a school, to hire an employee, to offer a loan to an applicant, and even to grant socially useful work to an arrested person. Consequently, the downsides and potential risks of a pervasive use of people-related data in decision processes are becoming increasingly evident: inequities are likely to increase and be amplified by data-driven automated decision systems (ADS), which rely on data to make guided decisions. Such systems should therefore evolve into data-informed ADS, which keep humans in the loop in all data processing steps, from acquisition to analysis (Stoyanovich et al. 2020). Indeed, to obtain insights, data from disparate sources first need to be organized into a clean, unified dataset, which then undergoes a data processing pipeline, i.e., a sequence of complex tasks that usually involves, besides learning tasks, many data preparation steps such as data cleaning, transformation, matching, and merging (Doan 2018).

Data-informed ADS should deal with data in a responsible way: besides ensuring transparency and interpretability, allowing people to understand the process and the decisions, they should guarantee nondiscrimination with respect to all the considered groups of individuals (Stoyanovich et al. 2020). More concretely, consider a population upon which a data processing task (either operational or analytical) is to be applied. Suppose that a subset of this population shares some characteristics that should not be used for discrimination (e.g., race, gender, disability status). It is important to guarantee that the result of the processing task does not discriminate with respect to the considered sensitive attributes. This may include ensuring a fair probability of selection, not giving undue relevance to specific groups of individuals sharing these properties, or other related constraints.

Nondiscrimination can be characterized in terms of different properties. In the social sciences, one key concept for guaranteeing nondiscrimination is equity, i.e., the absence of bias against minorities. As stated by Jagadish et al. (2021), equity promotes fairness by treating people differently depending on their endowments and needs (equality of outcome), whereas equality aims to achieve fairness through equal treatment regardless of need (equality of opportunity). Besides fairness, other issues like diversity, i.e., the degree to which different kinds of objects are represented in a dataset, are attracting increasing interest from the scientific community, due to their relevance in society.

Fairness and diversity are not new concepts. The relevance of fairness is well recognized by the machine learning and data mining communities (Mehrabi et al. 2021). Diversity, on the other hand, is one of the most relevant concepts in recommender systems (Kaminskas and Bridge 2017). This last mile of data analysis, i.e., the decision-making components, is indeed the most visible part of data science. More recently, the importance of a lifecycle view of data science has led to the realization that such results are not enough (Asudeh 2021). As first pointed out in Abiteboul et al. (2016), algorithmic fairness has to be tackled through a holistic treatment of nondiscrimination, tailored to incrementally enforcing nondiscrimination constraints along the pipeline underlying ADS, through individual independent choices, rather than as a constraint on the set of final results. Any automated task can indeed introduce technical bias by exacerbating pre-existing bias, which may lead to inequity in society. Such requirements are not only made desirable by the ethical need to take responsibility, but also mandatory by the recent General Data Protection Regulation (GDPR) of the European Union (Bonatti and Kirrane 2019). The GDPR requires that such guarantees be provided by design, i.e., intrinsically embedded in the mechanisms of the data processing workflow.

Starting from these considerations, the aim of this paper is to critically survey how nondiscrimination can be modeled and guaranteed in the context of the complex data science pipelines underlying data-informed ADS. We thus complement existing reviews on fairness in machine learning and AI, not further discussed in this paper, with a broader focus on all data processing tasks. In particular, this paper focuses on the following research questions:

  • RQ1: For which data processing tasks have nondiscrimination solutions been proposed?

  • RQ2: Which communities (defined in terms of author geographical location) have been most active in this field?

In the remainder of the paper, we first present the typical structure of data processing pipelines (Sect. 2) and we classify the main properties proposed for modeling nondiscrimination (Sect. 3). Then, we present the methodology used in our literature review and we briefly survey the main achievements for each data processing task (Sect. 4). A discussion on the review results concludes the paper (Sect. 5).

2 Data processing pipelines in data science

As pointed out in Jagadish et al. (2014), creating value from Big Data is a multi-step process: data acquisition, data cleaning, data integration, and analysis. The steps of a typical data processing pipeline in ADS are graphically depicted in Fig. 1 and discussed in what follows, using a college admission system as a running example. The figure also introduces the term back-end, commonly used in the data warehousing context to refer to all the extraction and transformation processes that data undergoes before feeding the centralized repository on which the front-end components, with which the decision-maker interacts, perform analyses.

Fig. 1 Steps in a data processing pipeline

Data acquisition. Data at the basis of data science is a record of some underlying activity of interest. It can be gathered from any interaction with, or observation of, the world around us: from applications relying on an operational database, to logs of user activity on a website or event logs of a software system, to physical sensors in Internet of Things systems. Much of this data can be filtered and aggregated without compromising our ability to reason about the underlying activity of interest. One challenge is to define these “on-line” filters in such a way that they do not discard useful information. Effective data-driven decisions can be enabled by acquiring data from multiple heterogeneous data sources.

Data cleaning. Collected data can be structured, semi-structured, or unstructured, and is rarely in a format ready for analysis. Even when restricting to structured data, most data sources are notoriously unreliable: data entry can be partial, sensors can be faulty, humans may provide biased opinions, remote websites might be stale, and so on. As a result, data may suffer from many data quality issues. In the context of a college admission system, for example, some grades or personal information may be missing, grades may refer to different scales or be out of scale, and the same student may be modeled by distinct records. We cannot leave the data in this form and still effectively analyze it. Rather, data cleaning techniques, addressing the possible sources of errors, are applied to the acquired data, removing or alleviating the data quality issues.
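As an illustration, the following minimal Python sketch (using pandas; the column names student_id and grade, the 0–30 target scale, and the mean-imputation policy are assumptions made for this example) removes duplicate student records, rescales out-of-scale grades, and imputes missing values.

```python
import pandas as pd

# Hypothetical applicant records with typical quality issues:
# a duplicate record, a grade on a 0-100 scale, and a missing grade.
df = pd.DataFrame({
    "student_id": [1, 1, 2, 3],
    "grade":      [28.0, 28.0, 95.0, None],
})

# Remove duplicate records for the same student.
df = df.drop_duplicates(subset="student_id")

# Rescale grades reported on a 0-100 scale to the assumed 0-30 scale.
out_of_scale = df["grade"] > 30
df.loc[out_of_scale, "grade"] = df.loc[out_of_scale, "grade"] * 30 / 100

# Impute missing grades with the column mean (one possible policy).
df["grade"] = df["grade"].fillna(df["grade"].mean())
print(df)
```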

Data integration. Effective large-scale analysis often requires the collection of heterogeneous data from multiple sources. For example, in the context of a college admission system, to obtain the best-informed decisions, it would be useful to have a 360-degree view of applicants (including scores and assessments, but also, e.g., the essay from which information about interests can be derived) and information about the academic careers of previously enrolled students. The issue here is to resolve heterogeneities in data structure and semantics, obtaining a reconciled, integrated dataset that is uniformly interpretable within a community and fits its analytical needs. This is achieved through a number of transformations, which can be supported by integration tools.

Since the cost of full integration is often formidable and analysis needs shift quickly, “pay-as-you-go” integration techniques (Madhavan et al. 2007) perform much of this work on-the-fly in support of ad hoc exploration. In this case, especially when the data sources are not databases with well-known schemas, the term data wrangling (Rattenbury et al. 2017) is used, referring to the activities for cleaning, integrating, and organizing data before it can be analyzed. The result of data wrangling can provide important metadata for further insights about the data.

Analysis. All the methods for querying and mining data, aimed at extracting valuable information for decision-making and strategic, actionable knowledge, fall within the front-end components. Here, we focus on analytical queries rather than on mining and learning algorithms. Specifically, we consider ranking and set selection techniques, as well as OLAP queries.

The aim of ranking is, given a set of items, to produce a rank for each item in the set. In score-based ranking, a given set of candidates is sorted on a score attribute, which may itself be computed on the fly, and returned in sorted order. We typically return the best-ranked k candidates, the top-k. As an example, taken from Zehlike et al. (2021), consider an admissions officer at a university who selects candidates from a large applicant pool. Each applicant submits several quantitative scores, all of which can be transformed to a discrete scale of 1 (worst) through 5 (best): the high school GPA (grade point average), the verbal portion of the SAT (Scholastic Assessment Test) score, and the mathematics portion of the SAT score. The score of each candidate is computed according to a formula provided by the admissions officer, and some number of highest-scoring applicants is then returned in ranked order. This scoring formula may, for example, specify the score as a linear combination of the applicant’s high-school GPA and the two components of the SAT score, each carrying an equal weight. The admissions officer will take into account the order in which the candidates appear in the ranking when deciding whom to consider more closely, interview, and admit.
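The linear scoring formula of this example can be sketched in a few lines of Python (a minimal illustration; the applicant values and the equal weights are assumptions):

```python
# Score-based top-k ranking: each applicant has a GPA and two SAT
# components, all already mapped to the 1 (worst) - 5 (best) scale.
applicants = [
    {"name": "A", "gpa": 5, "sat_verbal": 4, "sat_math": 3},
    {"name": "B", "gpa": 3, "sat_verbal": 5, "sat_math": 5},
    {"name": "C", "gpa": 4, "sat_verbal": 4, "sat_math": 4},
]

def score(a, weights=(1 / 3, 1 / 3, 1 / 3)):
    # Linear combination of GPA and the two SAT components, equal weights.
    w_gpa, w_verbal, w_math = weights
    return w_gpa * a["gpa"] + w_verbal * a["sat_verbal"] + w_math * a["sat_math"]

def top_k(items, k):
    # Sort on the (computed) score attribute and return the best-ranked k.
    return sorted(items, key=score, reverse=True)[:k]

print([a["name"] for a in top_k(applicants, k=2)])  # -> ['B', 'A']
```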

The main goal of set selection is to select a specific number of items from a given set of items, according to some specific conditions. More precisely, assume that we have a set of items, each with associated attributes. From this set, we wish to select k items to maximize a utility score, defined in terms of the attributes. In the university admission example, we could select the k candidates with the highest score computed as a function of the candidates’ attributes (GPA and SAT scores). The items in the set may be presented to us together or one at a time. Set selection is thus a special case of ranking that ignores the relative order among the top-k, returning them as a set.

Online analytical processing (OLAP) is an essential task of decision-support systems. OLAP queries are defined against a data cube: they specify a multidimensional range (an interval for each dimension of the data cube) and a SQL aggregate operator, and return as output the aggregate value computed over the cells of the data cube contained in that range. With reference to the university admission domain, examples of OLAP queries are those returning the average GPA of candidates that completed a public high school in Italy in 2020, or the average GPA of candidates admitted to a certain university in 2021, per gender and major.
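A minimal sketch of the second example query, expressed with pandas over a hypothetical admissions table (table and column names are assumptions):

```python
import pandas as pd

# Hypothetical admissions data cube (table and column names are assumptions).
admissions = pd.DataFrame({
    "year":       [2021, 2021, 2021, 2020],
    "university": ["U1", "U1", "U1", "U1"],
    "gender":     ["F", "M", "F", "M"],
    "major":      ["CS", "CS", "Math", "CS"],
    "gpa":        [4.0, 3.5, 4.5, 3.8],
})

# OLAP-style query: restrict the cube to a multidimensional range
# (university = U1, year = 2021) and aggregate GPA per gender and major.
in_range = (admissions["university"] == "U1") & (admissions["year"] == 2021)
result = admissions[in_range].groupby(["gender", "major"])["gpa"].mean()
print(result)
```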

3 Fairness & friends

To ensure responsible data processing, any automated task in a data processing pipeline should guarantee nondiscrimination with respect to minorities and protected groups of individuals, defined in terms of specific sensitive attributes. Nondiscrimination can be characterized in terms of different properties, briefly surveyed in what follows.

3.1 Fairness

Fairness is generally associated with the lack of discrimination; it can be broadly defined as the impartial treatment of individuals and of demographic groups. A multitude of interpretations and technical definitions have been provided, initially tailored to supervised learning tasks (Verma and Rubin 2018) and nowadays used for any processing activity. A first distinction is between individual fairness and group fairness. Individual fairness, first defined in Dwork et al. (2012), refers to the requirement that similar individuals, defined in terms of a given metric, are treated similarly; group fairness, also known as statistical parity or demographic parity, refers to the requirement that the demographics of those receiving a particular positive outcome are identical to the demographics of the population as a whole (Dwork et al. 2012). Demographics are defined in terms of a given sensitive attribute, like gender or race, and statistical parity requires that the distribution of values of the considered attribute in the result of the task be the same as its distribution in the input population. As an example, consider a set of 10 students (5 paying reduced fees and 5 paying standard fees) and suppose you should select 4 of them. The result of the selection satisfies group fairness only if the distributions of the two groups of students in the result and in the population coincide, so that both groups receive the same treatment. This means that 2 students paying reduced fees and 2 paying standard fees should be selected. Now suppose that, in the first group, there are 2 Italian and 3 non-Italian students paying reduced fees and, in the second group, there are 2 non-Italian and 3 Italian students paying standard fees. The result of the selection satisfies individual fairness if all individuals are treated similarly, thus, in this example, if the 2 selected students paying reduced fees are Italian and the 2 selected students paying standard fees are non-Italian.
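The group fairness part of this example can be checked with a few lines of Python comparing the group shares in the population and in the selection (a minimal sketch; the specific selection is an assumption):

```python
from collections import Counter

# 10 students: 5 paying reduced fees (R) and 5 paying standard fees (S).
population = ["R"] * 5 + ["S"] * 5
selection = ["R", "R", "S", "S"]  # a candidate selection of 4 students

def group_shares(items):
    counts = Counter(items)
    return {group: count / len(items) for group, count in counts.items()}

# Group fairness (statistical parity) holds when the shares coincide.
print(group_shares(population))  # {'R': 0.5, 'S': 0.5}
print(group_shares(selection))   # {'R': 0.5, 'S': 0.5} -> parity satisfied
```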

Many variations of group fairness have been proposed, collectively referred to as associational fairness (see, e.g., Salimi et al. 2019a) since they all focus on correlating the outcome of a classification task with the values of the considered sensitive attributes. Among them, we recall (Chouldechova 2017): conditional statistical parity, a variation of group fairness that controls for a set of admissible factors, e.g., requiring individuals to be from Italy (Corbett-Davies et al. 2017); equalized odds, which requires that both protected and privileged groups have the same false positive rate and the same false negative rate; predictive parity, which requires that both protected and unprotected groups have the same positive predictive value; intersectional fairness, introduced to extend group fairness to subgroups, defined by considering the intersection of several demographic variables (e.g., gender, race, age).
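For illustration, equalized odds and predictive parity can be checked from per-group error rates, as in the following sketch (the labels and predictions are hypothetical):

```python
def group_rates(y_true, y_pred):
    # Confusion-matrix entries for one demographic group.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,  # false positive rate
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,  # false negative rate
        "ppv": tp / (tp + fp) if (tp + fp) else 0.0,  # positive predictive value
    }

# Hypothetical outcomes and predictions for two groups.
protected = group_rates(y_true=[1, 0, 1, 0], y_pred=[1, 0, 0, 0])
unprotected = group_rates(y_true=[1, 0, 1, 0], y_pred=[1, 0, 1, 1])

# Equalized odds: equal fpr and fnr across groups.
# Predictive parity: equal ppv across groups.
print(protected)
print(unprotected)
```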

Unfortunately, associational fairness constraints are context-specific and might fail to distinguish the right correlations between a sensitive attribute and the outcome of a given algorithm (Dwork et al. 2012). Additionally, they can be mutually exclusive and thus lack universality; as an example, it has been shown that equalized odds and predictive parity are incompatible (Chouldechova 2017). This observation has motivated the definition of alternative causal fairness constraints, under which fairness holds when the sensitive attribute has no causal influence on the outcome of a given task, thus avoiding anomalous correlations. Even in this case, many variations have been proposed. For example, under counterfactual fairness, the sensitive attribute should not be the cause of the outcome at the individual level; it coincides with individual fairness only under some strong assumptions (Kusner et al. 2017) and cannot be estimated from data. To avoid this limitation, proxy fairness considers counterfactuals at the group level, but it does not capture group fairness as defined above (Gupta et al. 2018; Kilbertus et al. 2017). Other causal fairness notions rely on various properties of the causal graph with the aim of avoiding specific paths from the sensitive attribute to the outcome (see, e.g., Nabi and Shpitser 2018), but they often require very restrictive assumptions and are of limited use. In general, causal fairness constraints rely on background information regarding the underlying causal model, which might not be available in practice. An alternative causal fairness notion that does not need such knowledge, captures group-level fairness, and can easily be tested on data is interventional fairness (Salimi et al. 2019b): besides the sensitive attribute and the outcome variable, it relies on a set of admissible variables through which the sensitive attribute is allowed to influence the outcome.

There is currently no consensus as to which classes of fairness constraint, and which specific formulations, are appropriate for the various data processing tasks. Rather, a lot of ongoing work is devoted to understanding the relationships between the various definitions and their usage inside specific data processing tasks (see, e.g., Salimi et al. 2019a; Asudeh 2019; Zehlike et al. 2021).

3.2 Diversity

Diversity is a general term used to capture the quality of a collection of items with respect to the variety of its constituent elements. Like fairness, diversity is inherently a socio-technical concept that gives rise to a multitude of interpretations and has an important impact from an ethical point of view; indeed, a lack of diversity can lead to exclusion. Diversity has been extensively considered in the contexts of information retrieval and content recommendation, with the aim of designing models and algorithms enforcing diversity in the output of an algorithmic task (see Kaminskas and Bridge 2017 for a survey). Only recently has this property been considered in the context of other processing tasks.

Given a set of N elements, obtained as the output of a given processing task, a number k of elements to be selected, and a function quantifying diversity between elements, diversification aims at selecting the k elements out of the N that maximize such a function. Diversity can be defined in terms of different types of measures to be computed over a set, usually obtained as the result of a data processing task. According to Drosou et al. (2017), we can distinguish distance-based, novelty-based, and coverage-based diversity. Distance-based diversity relies on a pairwise distance or similarity measure between elements, and the diversity of a set is the aggregate (usually the average or the minimum) distance value among its pairs (Agrawal et al. 2009). The problem here is how to identify the right distance function, which also has an impact on the performance of diversity computation. When the diversity function is a metric, efficient approximation algorithms exist for the diversification problem. Coverage-based diversity relies on the existence of a predefined number of aspects and measures the extent to which the elements of the set cover the considered aspects (Agrawal et al. 2009; Clarke et al. 2008). Novelty-based diversity aims at reducing redundancy and is defined with respect to the elements seen in the past. The elements of the result set are selected one at a time, maximizing the distance-based diversity between the new element and those selected previously (Lathia et al. 2010). Popularity and serendipity are notions related to novelty: novel elements are defined as the most unpopular (and thus, probably not seen in the past) (Ziegler et al. 2005) and as the most unusual or surprising elements (Herlocker et al. 2004), respectively.
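As an illustration of distance-based diversity, the following Python sketch selects k elements with a greedy max-min heuristic (the items and the distance function are assumptions; actual diversification algorithms are more refined):

```python
def max_min_diverse(items, k, dist):
    # Greedy heuristic: pick k items maximizing the minimum pairwise distance.
    selected = [items[0]]  # seed with an arbitrary item
    while len(selected) < k:
        # Add the remaining item that is farthest from its closest selected item.
        candidate = max(
            (x for x in items if x not in selected),
            key=lambda x: min(dist(x, s) for s in selected),
        )
        selected.append(candidate)
    return selected

# Hypothetical one-dimensional items; distance is the absolute difference.
items = [1.0, 1.1, 1.2, 5.0, 9.0]
print(max_min_diverse(items, k=3, dist=lambda a, b: abs(a - b)))  # [1.0, 9.0, 5.0]
```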

3.3 Fairness and diversity: two definitions, one objective

While fairness is generally linked to the notion of lack of bias, diversity refers to the degree to which different kinds of objects are represented in a dataset. Even though they capture different kinds of nondiscrimination constraints, fairness and diversity can sometimes lead to similar results. For example, statistical parity is a fairness objective, but it can also be interpreted in terms of diversity since, like diversity, it is stated as a property of the value distribution inside a collection of items. Fairness and diversity are, however, slightly different. Consider, for example, a population including 10% Italian and 90% non-Italian individuals. While group fairness preserves the same proportions in the output of a processing task, diversity might require the same result to contain 50% Italian and 50% non-Italian individuals.

Coverage is another example of a nondiscrimination constraint that has recently received attention and that can be associated with both fairness and diversity (Asudeh et al. 2019b). It was initially introduced in the context of diversity with the aim of re-balancing the distribution of categories in a task outcome. However, it is also related to the concept of intersectional fairness (Chouldechova 2017): by specifying how many items of a given protected category should be present in the result of a data processing step, coverage constraints can be used to avoid the under-representation of protected categories of interest, defined in terms of one or more sensitive attributes, which could otherwise introduce bias in subsequent analyses. Lack of coverage in a dataset also opens the door to adversarial attacks (Biggio et al. 2013): poorly covered regions in the training dataset provide an adversary with opportunities to create examples that are misclassified by a trained model.
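Checking a coverage constraint amounts to counting the items of each (intersectional) protected category and comparing the counts against a threshold, as in the following sketch (data and threshold are hypothetical):

```python
from collections import Counter
from itertools import product

# Hypothetical dataset: each record carries two sensitive attributes.
records = [("F", "Italian"), ("F", "Italian"), ("M", "Italian"),
           ("M", "Italian"), ("M", "non-Italian"), ("M", "non-Italian")]

genders = ["F", "M"]
origins = ["Italian", "non-Italian"]
threshold = 2  # minimum number of records required per intersectional group

counts = Counter(records)
uncovered = [group for group in product(genders, origins)
             if counts.get(group, 0) < threshold]
print(uncovered)  # [('F', 'non-Italian')]: this group lacks coverage
```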

4 Contributions of data processing tasks to nondiscrimination

To investigate nondiscrimination issues in the context of the main data processing tasks, we conducted a literature review over Scopus, a widely used bibliographic search engine. To make the search effective, we combined conditions on the paper titles, abstracts, references, publication years, and subject areas as follows (see Table 1 for the precise query specification):

  • Paper titles include one general keyword related to nondiscrimination and one more specific keyword related to ADS pipelines. Additionally, papers containing specific combinations of those words, relevant for the considered field, are included. This made the search more effective since most papers dealing with nondiscrimination in machine learning and AI do not satisfy such conditions.

  • Paper references include a fairness-related keyword: this helps in excluding papers using the specified keywords under different semantics, unrelated to nondiscrimination.

  • Papers have been published from 2016 onward, the year in which the issue of nondiscrimination in data management was first identified (Abiteboul et al. 2016).

  • Only papers in the computer science area are considered, since this is the reference field of our research. The single exception concerns papers appearing in the proceedings of the Int. Conf. on Information and Knowledge Management (CIKM), a relevant computer science conference that is classified differently in Scopus.

Table 1 The Scopus search query

This initial search returned 335 papers. The papers were then carefully inspected, looking at their title, abstract, and context, with the aim of excluding those that either: (1) include the search keywords under a meaning unrelated to nondiscrimination; (2) do not refer to the tasks introduced in Sect. 2; or (3) propose solutions based on supervised approaches, more tailored to the machine learning context. After this step, we obtained 61 papers, further classified into: (1) seminal papers, general architectures, and surveys (G); (2) papers related to data acquisition (A); (3) papers related to data cleaning, integration, and wrangling (W); (4) papers related to analytical queries (Q); (5) papers related to techniques, demos, and systems for the analysis of (portions of) analytical pipelines (P).

The number of papers for each considered group and publication year is presented in Table 2. Each group will be discussed in the following.

Table 2 Paper distribution with respect to the data processing task and the publication year

4.1 Seminal papers, general architectures, and surveys

Among the retrieved general papers, 6 out of 17 deal with ethical problems in data science at a very high level. The other 11 address the issue from a more technical point of view and are briefly discussed in what follows.

One of the first attempts to draw the attention of the data management community to the various facets of responsibility was a tutorial proposed at EDBT 2016 (Stoyanovich et al. 2016). Right after, in Stoyanovich et al. (2017), fairness (but also accountability and transparency) properties were advocated as database system issues, since bias may be introduced at any processing step. The Fides platform was proposed with features to encourage (and, in some cases, enforce) best practices at all stages of the data science lifecycle. In the same year, the role of diversity in Big Data management ethics was discussed in Drosou et al. (2017).

In the following years, many special events at the main data management conferences were devoted to this issue (Stoyanovich et al. 2018a) and many further papers have been published (Stoyanovich 2019; Abiteboul and Stoyanovich 2019; Firmani et al. 2019a; Jagadish et al. 2021). The considered ethics-related properties can be interpreted as special social-minded dimensions of the more general data quality issue: this is the focus of Firmani et al. (2019b) and Pitoura (2020).

A very recent survey on machine learning and data management approaches for measuring and mitigating bias in data-driven decision support systems is presented in Balayn et al. (2021).

4.2 Data acquisition

Fairness can be considered during data acquisition to guarantee that processing starts from a dataset that does not lead to bias. A specific data management approach is repairing, i.e., modifying, the input dataset so that the new dataset satisfies the considered fairness constraints while the distance between the two datasets is minimized.

Causal fairness, and specifically interventional fairness, has been considered for repairing datasets to be used by classifiers in Salimi et al. (2019a, 2020), Getoor (2020). The repaired training dataset can be seen as a sample from a hypothetical fair world in which the effect of any discriminatory causal relationship between the sensitive attribute and the classifier outcome is removed.

Data repair solutions based on coverage constraints were first introduced in Asudeh et al. (2019b). Specifically, efficient techniques are proposed for determining the least amount of additional data to be collected to guarantee coverage with respect to multiple sensitive attributes. An efficient approach for coverage analysis, given a set of attributes across multiple tables, is presented in Lin et al. (2020). These proposals are limited to categorical attributes with low cardinality. In Asudeh et al. (2021), the coverage-based data repair problem is addressed by considering ordinal and continuous-valued attributes.
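In the simplest single-table, categorical setting, the minimum amount of additional data can be illustrated as the per-group shortfall with respect to a coverage threshold (a hedged sketch, not the algorithms of the cited papers; attribute values and threshold are assumptions):

```python
from collections import Counter
from itertools import product

# Hypothetical categorical sensitive attributes and coverage threshold.
genders = ["F", "M"]
origins = ["Italian", "non-Italian"]
threshold = 2

records = [("F", "Italian"), ("M", "Italian"), ("M", "Italian")]
counts = Counter(records)

# Shortfall: how many records should still be acquired for each group.
shortfall = {group: max(0, threshold - counts.get(group, 0))
             for group in product(genders, origins)}
print(shortfall)
# {('F', 'Italian'): 1, ('F', 'non-Italian'): 2,
#  ('M', 'Italian'): 0, ('M', 'non-Italian'): 2}
```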

An alternative approach to detect and correct biases and discrimination in datasets exploits the notion of functional dependency, a particular type of constraint on the data, to recognize cases where the value of a certain attribute (e.g., gender, ethnicity or religion) frequently determines the value of another one (such as the range of the proposed salary or the social state) (Azzalini et al. 2021a, b).
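The underlying intuition can be illustrated by an approximate dependency check that flags sensitive-attribute values which almost always determine the value of another attribute (a simplified sketch; the data, the 0.75 threshold, and the strength measure are assumptions, and the cited works rely on more refined notions):

```python
import pandas as pd

# Hypothetical records relating a sensitive attribute to a salary range.
df = pd.DataFrame({
    "gender":       ["F", "F", "F", "F", "M", "M", "M", "M"],
    "salary_range": ["low", "low", "low", "high", "high", "high", "high", "low"],
})

def dependency_strength(data, lhs, rhs):
    # For each lhs value, the share of records agreeing on its most common rhs value.
    return data.groupby(lhs)[rhs].apply(lambda s: s.value_counts(normalize=True).max())

strength = dependency_strength(df, "gender", "salary_range")
print(strength[strength >= 0.75])  # flag suspicious (near-)dependencies
```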

4.3 Data cleaning, integration, and wrangling

Fairness-enhancing data cleaning interventions have been considered in Tae et al. (2019), which mitigates unfairness during data sanitization, considering demographic parity as the reference nondiscrimination constraint.

Fairness has been considered in data wrangling, in the context of an approach for the automatic identification of ways of integrating data, in Mazilu et al. (2020, 2021). These works consider two potential sources of dataset bias: unequal representation of sensitive groups and hidden biases introduced through proxies for sensitive attributes. Both proposals analyze problems that may arise during data wrangling and lead to bias in downstream analyses, and respond to them within a system that automates the generation of data wrangling pipelines. Discriminatory bias has been considered in Yan and Howe (2021), where a learning approach generates integrated representations (EquiTensors) of heterogeneous datasets and adversarial learning is used to remove correlations with a sensitive attribute. The impact of widely adopted data preparation procedures and of sensitive attribute usage on the fairness of machine learning approaches is further considered in Valentim et al. (2019).

Coverage-based data transformations are considered in Accinelli et al. (2020, 2021b). Here, the focus is on back-end transformations defined in terms of a Select-Project-Join query whose result violates coverage constraints: the transformation is rewritten into the “closest” one satisfying those constraints. Coverage is also considered in Nargesian et al. (2021), which investigates how to acquire, in the most cost-effective manner, new data for integration when the desired distribution requirements are not satisfied by the dataset at hand.
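The idea of rewriting a transformation into the “closest” one satisfying coverage can be illustrated by naively relaxing a selection threshold until every group is sufficiently represented (a hedged sketch under simplifying assumptions, not the rewriting algorithms of the cited papers):

```python
import pandas as pd

# Hypothetical applicant data and a selection with a GPA threshold.
df = pd.DataFrame({
    "gpa":    [4.8, 4.5, 4.2, 3.9, 3.6, 3.2],
    "gender": ["M", "M", "M", "F", "F", "F"],
})

def query(threshold):
    # The original transformation: select applicants above the threshold.
    return df[df["gpa"] >= threshold]

def rewrite_for_coverage(threshold, group_col="gender", min_count=1, step=0.1):
    # Naively relax the threshold until every group reaches the required coverage.
    while True:
        result = query(threshold)
        counts = result[group_col].value_counts()
        if all(counts.get(g, 0) >= min_count for g in df[group_col].unique()):
            return threshold, result
        threshold = round(threshold - step, 2)

new_threshold, result = rewrite_for_coverage(4.0)
print(new_threshold)  # 3.9: the relaxed selection now covers both groups
```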

4.4 Analytical queries

Rankings are at the basis of many important decision processes and have a potentially enormous impact on the livelihood and well-being of individuals. Thus, most of the proposed analytical querying approaches taking nondiscrimination into account are ranking approaches. They have recently been surveyed in Pitoura et al. (2021a) and tutorials have been proposed in Pitoura et al. (2020, 2021b), demonstrating how active research in this area is. Non-discriminatory ranking approaches address many issues: (1) the design of ranking schemes (Yang and Stoyanovich 2017; Asudeh et al. 2019a; Yang et al. 2020; Kuhlman et al. 2019, 2021; Garcı́a-Soriano and Bonchi 2021); (2) the design of ranking schemes for specific domains (e.g., online job marketplaces) (Elbassuoni et al. 2019); (3) the design of approaches for intervening on the ranked outcome (Celis et al. 2018; Yang et al. 2019). Most non-discriminating ranking approaches consider group and associational fairness. Recently, there has also been interest in causally fair ranking schemes (Yang et al. 2020) and coverage-based diversity (Celis et al. 2018; Yang et al. 2019).

Ethic-based set selection guarantees that the selected set satisfies specific nondiscrimination constraints. Specifically, coverage-based diversity and group fairness constraints have been considered in Stoyanovich et al. (2018b), whereas Moumoulidou et al. (2021) focus on maximizing diversity in set selection while offering fairness guarantees.

In the context of OLAP queries, causal fairness has been considered for detecting bias in OLAP queries and limiting it through rewriting (Salimi et al. 2018b). Vázquez-Ingelmo et al. (2020) focus on the role of visual tools in assisting decision-making processes and raising awareness regarding potential data issues.

4.5 Pipelines and systems

One important issue in ethic-based data processing concerns the effective and efficient use of existing ethic-based approaches inside complex data processing pipelines, e.g., those provided by data processing environments like Pandas, scikit-learn, and Tableau.

In this respect, a framework for evaluating different types of fairness guarantees for pipelines is proposed in Dwork et al. (2020), while Biswas and Rajan (2021) investigate the impact of pre-processing stages in ML pipelines on fairness and, through composition, on the global fairness of the pipelines.

Many systems have also been developed for detecting nondiscrimination along the data processing pipeline. Among them, AI Fairness 360 (Bellamy et al. 2019), an open-source Python toolkit for algorithmic fairness, and FairTest (Tramer et al. 2017) support the user, with a debugging focus, in checking algorithmic fairness and associations between application outcomes (such as prices or premiums) and sensitive user attributes (such as race or gender). Other systems refer to specific data processing tasks, prototyping many of the techniques discussed in the previous sections:

  • Data acquisition: MithraLabel, providing a user with information, in the form of “nutritional labels”, helping in determining the fitness of the dataset for the task at hand (Sun et al. 2019); MithraCoverage, investigating population bias in terms of coverage over the intersection of multiple attributes (Jin et al. 2020).

  • Data cleaning, integration, and wrangling: FairPrep (Schelter et al. 2020), an environment for investigating the impact of fairness-enhancing interventions inside data processing pipelines, with a special reference to data cleaning; covRew (Accinelli et al. 2021a), a Python toolkit for pre-processing pipeline rewriting ensuring coverage constraint satisfaction.

  • Analytical queries: HypDB, detecting, explaining, and resolving bias in decision-support queries (Salimi et al. 2018a); FairSight, a visual analytic system designed to achieve different notions of fairness in ranking decisions (Ahn and Lin 2019); FairRank, an interactive system to explore fairness of ranking in online job marketplaces (Ghizzawi et al. 2019); MithraRanking, a system for interactive ranking design, analysis, and repair (Guan et al. 2019).

5 Discussion

To answer the research questions RQ1 and RQ2, pointed out in Sect. 1, we grouped the 61 considered papers with respect to the publication year (Table 2) and the geographical locations of the authors (Table 3).

Table 3 Paper distribution with respect to the author geographical location and data processing task

Table 2 shows that in the first years of the considered period (up to 2019), most contributions refer either to general papers (seminal papers, papers describing general architectures, and surveys), due to the need to position the research area within the data management community, or to analytical queries, due to their relationship with tasks already investigated in other areas (e.g., recommender systems). More recently, the number of papers proposing specific discrimination-aware technical solutions, related to single data processing tasks or to the whole pipeline, has increased.

The community analysis (RQ2) results in two main findings. First, only a few research communities are currently active in the considered research area. Indeed, as shown in Table 3, only 6 countries contributed at least 3 papers in the considered period. Second, most of these communities are located in the United States. As a consequence, the developed solutions often rely on US-specific laws and case studies (e.g., employment, specific steps of the US judicial system); fewer approaches have been designed starting from European case studies, even though the EU GDPR now calls for such proposals (Bonatti and Kirrane 2019).

In this respect, our group at the University of Genoa has recently started a project aiming at designing responsible data processing pipelines, with special reference to data wrangling, in the higher education domain, also relying on data generated from online learning activities. The education context is just one possible example but, to design effective responsible ADS approaches, additional reference domains and real-world scenarios are needed: we hope that more data management communities, especially in Europe, will soon invest resources in this relevant field.