1 Introduction

Artificial Intelligence (AI) applications in the higher education sector have been acknowledged to have great and diverse potential (Attaran et al., 2018; Daniel, 2015). For example, there are already AI applications that assign grades (Kotsiantis, 2012), predict dropouts or academic performance (Alyahyan & Düştegör, 2020; Alturki et al., 2022; Berens et al., 2019; Kemper et al., 2020; Armatas et al., 2022), or answer students' questions as chatbots (Nguyen et al., 2021; Pelletier et al., 2021; Vieira et al., 2022). Developers and users hope that Artificial Intelligence in Education (AIEd) will lead to more graduates and improved student performance, for example, by offering students better individual feedback and support that reduces dropout rates (Attaran et al., 2018; Daniel, 2015; Zawacki-Richter et al., 2019). However, debates about the social justice of such systems repeatedly come to the fore (Baker & Hawn, 2022; Fazelpour & Danks, 2021; Hsu et al., 2021; Keller et al., 2022; Marcinkowski et al., 2020; Muñoz et al., 2016; Slade & Prinsloo, 2013). An illustrative example is Parcoursup, an automated system for allocating study places in France (Wenzelburger & Hartmann, 2022). The system was quickly criticized for its lack of transparency and because of fears that certain student groups would be discriminated against in the allocation of study places, which threatened to exacerbate existing social inequalities (Orwat, 2020). Further concerns have been voiced where algorithms predict the likelihood of graduation even before students begin their studies, as such systems could discriminate against students from lower-income backgrounds (Muñoz et al., 2016).
Similar applications that predict academic performance based on, among other factors, past performance also point to the danger of different forms of bias, for example, in the input data (Fazelpour & Danks, 2021).

Accordingly, questions of algorithmic fairness are becoming pressing not only in the higher education sector but also in numerous other areas, such as criminal justice or hiring (Angwin et al., 2016; Kaibel et al., 2019). In response, research in fair machine learning (FAIR ML) offers numerous mathematical fairness notions intended to formalize equitable distribution (Verma & Rubin, 2018; Makhlouf et al., 2021; Friedler et al., 2021). However, there is as yet no consensus on which of these notions is most appropriate for preventing systematic discrimination. Instead, the choice of an adequate notion of fairness is highly context-dependent, and perceptions of fairness may vary accordingly (Lee et al., 2019; Starke et al., 2022; Wong, 2020). While the various fairness notions tend to be developed at the theoretical level, only scattered evidence on stakeholders' perceptions of them is available (Saxena et al., 2020; Srivastava et al., 2019). This evidence shows that the perceived fairness of an algorithm significantly affects trust in and acceptance of the AI system in question as well as the legitimacy of the ensuing decisions (Lünich & Kieslich, 2022; Shin, 2020, 2021; Shin et al., 2020; Sun & Tang, 2021). Against this background, the fairness perceptions of key stakeholders need to be included in the development of an AI system (Cheng et al., 2021; Keller et al., 2022; Webb et al., 2018) in order to achieve trustworthy AI (European Commission, 2019; Mäntymäki et al., 2022; Shneiderman, 2020).

To answer this call for research on the fairness perceptions of those most affected by AIEd, we investigate in this paper students' preferences for different understandings of justice in the context of distributing support measures based on an academic performance prediction (APP) system. We do not focus on specific mathematical fairness notions, however, but on the justice principles underlying students' views of the AI-based distribution of support measures. We draw on the four-dimensional concept of organizational justice (Cropanzano & Ambrose, 2015; Greenberg, 1993) and focus primarily on the dimension of distributive fairness. Based on a factorial survey of German students, we provide an exploratory analysis of the following three questions: First, we investigate how fair students perceive different norms of just distribution to be when evaluating the APP system. Second, we investigate whether fairness perceptions of AI-based APP systems (i.e., so-called algorithmic decision-making, ADM) differ from those of human decision-making (HDM). Third, we look at different fields of study and investigate whether students' field of study affects their fairness perception of the APP system. In answering these questions, this paper adds to the literature on AIEd, connects it to the literature on FAIR ML, and points to relevant concerns about distributive justice among students, arguably the most important stakeholders in this field.

2 A Question of Justice: Perils and Pitfalls of Academic Performance Prediction

Concerning AIEd, a systematic review by Zawacki-Richter et al. (2019) identifies four application areas: “profiling and prediction, intelligent tutoring systems, assessment and evaluation, and adaptive systems and personalisation” (p. 20). A so-called APP system is an example that falls within the first and third categories. As the name suggests, the system uses predictive analytics to forecast students' performance or, in some cases, to predict whether they will complete their studies (so-called dropout detection). Like other AIEd systems, APP is based on historical student performance data, sometimes supplemented by non-academic data (e.g., sociodemographics) (Abu Saa et al., 2019; Olaya et al., 2020). Universities, as the systems' users, hope to increase retention and graduation rates based on the predictions (Attaran et al., 2018; Mah, 2016). APP is already used at various higher education institutions (HEIs) worldwide, for example, in the USA (Arnold & Pistilli, 2012; Ornelas & Ordonez, 2017), Australia (Adams et al., 2017; Niall & Mullan, 2017), Germany (Berens et al., 2019; Kemper et al., 2020), and Bangladesh (Ahmed & Khan, 2019). Based on the algorithmic predictions, HEIs may offer individual feedback and support measures that are tailored explicitly to students' needs and automatically assigned to them with the help of APP (Ekowo et al., 2016; Muñoz et al., 2016; Pistilli & Arnold, 2010).
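The paragraph above describes APP only in general terms. As a purely hypothetical illustration, the sketch below fits a logistic regression (one common choice for binary dropout prediction, not necessarily the model used by any system cited here) by batch gradient descent on a handful of invented historical records, and then scores two new students. The features (share of target credits earned, share of exams passed), the data, and all function names are assumptions made for illustration only.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights and bias by batch gradient descent (no regularization)."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            # Error term: predicted probability minus observed label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def dropout_risk(w, b, x):
    """Predicted dropout probability for one student's feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical, invented historical records:
# (share of target credits earned, share of exams passed); label 1 = dropped out.
history_X = [(0.9, 1.0), (0.8, 0.9), (1.0, 1.0), (0.3, 0.4), (0.2, 0.5), (0.4, 0.3)]
history_y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(history_X, history_y)
risk_struggling = dropout_risk(w, b, (0.35, 0.4))
risk_on_track = dropout_risk(w, b, (0.95, 1.0))
print(f"struggling: {risk_struggling:.2f}, on track: {risk_on_track:.2f}")
```

A university could then flag students whose predicted risk exceeds some (hypothetical) threshold as candidates for support measures; how such flags translate into an allocation of scarce support is the distributive question this paper addresses.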

Universities use APP to allocate limited student support resources faster and more equitably and thereby to increase student success and graduation rates. Such support resources can comprise academic and non-academic interventions (Keller et al., 2022), mirroring Tinto's (1975) distinction between two types of integration that are important for student retention: academic integration and social integration. Interventions for academic integration aim to improve students' academic performance. This can be achieved, for example, through tutorials, advising sessions for study planning, or student coaching (Bettinger & Baker, 2014; Chiteng Kot, 2014), but also through individually tailored feedback (Brade et al., 2018; Tinto, 1975), which an APP system could greatly facilitate and improve. While these support interventions come from above, others, such as those for social integration, come from below (Morosanu et al., 2010). These latter interventions build a social support network and can be fostered especially through contact with experienced teachers and other students, for instance, through mentoring programs. In addition, the sense of belonging to the institution, that is, the university, also plays a decisive role (Hausmann et al., 2009; Neugebauer et al., 2019; Tinto, 1975). Given the limited organizational resources (financial and human) that a university can muster to support students, these interventions often cannot be made available to all students. Instead, they can only be deployed efficiently, for instance, if they are used (only) by distressed students in need or (should a university pursue a strategy of promoting only the best students) by promising and already successful students.

According to the 2021 EDUCAUSE Horizon Report, the COVID-19 pandemic has further accelerated the development and use of hybrid learning models and AIEd (Pelletier et al., 2021). However, the authors also state that “the arrival of AI in higher education has opened a Pandora’s box. Going forward, higher education will need to be a careful and ethical user of AI” (Pelletier et al., 2021, p. 15). In this sense, the implementation of APP must be thoroughly reflected upon and systematically evaluated. Almost all issues commonly discussed in connection with AI in general also apply to AIEd and, in particular, to APP. These concerns range from technological to methodological and societal challenges (Hagendorff & Wezel, 2020; Wirtz et al., 2019) and include, for example, questions of transparency (de Fine Licht, 2020; Ananny & Crawford, 2018), accountability (Busuioc, 2021; Diakopoulos, 2016), and fairness (Veale & Binns, 2017; Shin, 2019). Because the use of AIEd can significantly affect the prospects of young adults, it is crucial to weigh the opportunities and risks of any proposed system carefully.

For instance, an automated assessment system in the UK offers a negative example of a problematic AI system in the educational sector. Since no A-level examinations could be held during the first lockdown of the COVID-19 pandemic in 2020, an algorithm was used to predict students' A-level grades in the UK. This led to massive protests, however, because students claimed that the predicted grades fell far short of their expectations and actual performance, with many being downgraded (Edwards, 2021). The procedure was particularly problematic from a fairness perspective because the algorithm disadvantaged students from more deprived families compared to those from private schools, and the grades awarded affect further education and working life (Adams & McIntyre, 2020; Smith, 2020). While the British algorithmic grading system was abandoned after the protests, a similar algorithm in France, which predicted baccalaureate grades, was accepted. However, it resulted in higher pass rates than in prior years, leading to higher demand for university places (Smith, 2020).

These examples show that, without reflection and adjustment, APP systems may perpetuate social injustices and biases inherent in human social structures (Attaran et al., 2018; Fazelpour & Danks, 2021). Thus, when it comes to the use of APP, it is essential to ensure not only that the system provides an appropriate level of transparency and explainability (of both process and outcome), that it is clear who can be held accountable in the event of poor decisions, and that privacy concerns are addressed, but also that APP is free of bias and that no one is systematically disadvantaged (Keller et al., 2022). Faced with the threat of discrimination by AIEd, as evidenced by the UK grading algorithm, much of the academic literature addresses fair distribution, the translation of fairness into mathematical notions, and the reduction of algorithmic bias in AIEd (Jiang & Pardos, 2021; Martinez Neda et al., 2021). Beyond this, however, special attention must be paid to the perceptions of those affected. Accordingly, many authors stress the importance of stakeholder involvement in developing and introducing AI systems (Cheng et al., 2021; Howell et al., 2018; Keller et al., 2022; Knox et al., 2022; Lee et al., 2019; Webb et al., 2018). Otherwise, an APP system introduced with good intentions and presumably bias-mitigating execution may still fail in practice.

Consequently, in addition to properties of the system itself, such as the use of sensitive attributes (e.g., sociodemographics) (Nyarko et al., 2021), the consequences that follow from APP can be assumed to influence people's perceptions of fairness as well. Such individual perceptions, however, are often neglected in the literature (Starke et al., 2022). So, what does this cognitive process entail in concrete terms for the students affected? In addition to their self-reflective fairness assessment of the prediction assigned to them, students' perceptions may also relate to the resulting distribution of support measures based on the forecast. For example, students may perceive it as unfair if those measures are not, in principle, open to everyone. The prediction could then become problematic if it leads to unintended adverse perception effects that affect students' attitudes and behavior, for instance, if non-prioritized students also demand access to support measures. Moreover, both positive and negative performance predictions could hurt study motivation (Fazelpour & Danks, 2021; Legault et al., 2006; Mah, 2016) and potentially put students under additional pressure (Yilmaz et al., 2020). While high-performing students might rest on the positive feedback, lower-performing students might become frustrated given their previous study efforts. Furthermore, students might see the prediction as an incentive to adapt to the algorithm's logic and change their behavior accordingly (Dai et al., 2021; Fazelpour & Danks, 2021), which may have both positive and negative effects. Eventually, negative perceptions of APP prediction outcomes and of decisions based on such predictions may have detrimental effects on students' attitudes towards HEIs deploying AIEd and on the perceived legitimacy of AI systems (Lünich & Kieslich, 2022; Marcinkowski et al., 2020). Therefore, the question arises of how student support measures can be distributed fairly based on an APP system.

3 Justice, Fairness and the Distribution of Support Measures

Rawls (1999) lays the foundation for the question of fair distribution by describing fairness as the basis of justice. In his understanding, negotiating principles of justice requires a fair societal system that allows cooperation between equal and free individuals (Rawls, 1999; Sen, 2009). Murphy (2011) also highlights the link between the two constructs, stating: “If proposed principles are deemed unfair, they are unjust” (p. 337). Accordingly, fairness judgments arise from a perceived injustice for which someone must be held accountable (Folger & Cropanzano, 2001). A perceived (in-)justice usually refers to the distribution of goods but can also concern the distribution of rights and freedoms (Rawls, 1999). While the distribution of support measures in our case concerns goods, denied access could also be perceived as a restriction of student freedom. As HEIs may use APP as a basis for deciding how to support students and enable them to improve their academic performance, the ensuing distribution of these support measures raises a distributive problem, if only because it partly concerns limited goods and resources. For example, as mentioned before, limited resources may prevent tutoring and additional personal counseling from being offered to all students.

The perception of an equitable distribution of goods is also addressed by the organizational justice literature, which examines how intra-organizational decision processes should be designed to achieve maximum satisfaction and commitment among organizational members (Greenberg, 1987). In the university context, such satisfaction and commitment are necessary to prevent students from protesting or, at worst, leaving the university as a result of using or being dissatisfied with APP (Marcinkowski et al., 2020). Within the concept of organizational justice, four dimensions of fairness have emerged: in addition to distributive fairness, there are procedural, interactional, and informational fairness (Greenberg, 1993). In this paper, however, we focus exclusively on distributive fairness to capture the distributive justice norms underlying a fair distribution of support measures.

3.1 Distributive Justice Norms

In light of the question of how students perceive different norms of just distribution when evaluating the APP system, we focus on the dimension of distributive fairness, which concerns the validity of a decision or its outcome (Cropanzano et al., 2001). This dimension is based on equity theory (Adams, 1965), according to which employees compare the ratio of the outcomes they receive to the inputs they perceive themselves to contribute with the corresponding ratio of other employees. If the two ratios do not match, the employee with the greater share is considered unjustly overpaid, whereas the employee with the lower share is considered unjustly underpaid (Greenberg, 1990). Equity is thus oriented toward merit and rests on comparing one's own input with the output received (Cropanzano & Ambrose, 2015). However, such a meritocratic distribution is only one of three distributive justice norms, alongside equality and need (Deutsch, 1975). In the case of APP, a distribution according to the equity principle could mean that only students who have performed well so far are entitled to the support measures offered. The need principle offers an alternative: only students in particular need are provided with support interventions, so that only students with poor APP outcomes would benefit from support measures. Finally, a distribution according to the equality principle is also conceivable. In this case, no distinction is made and all students receive access to support measures, which could reach its limits in practice because of the institution's limited resources. Of course, in applied distribution rules, these ideal-typical norms may occur in any combination and with different gradations and overlaps.
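To make the three ideal-typical norms concrete, the following hypothetical Python sketch applies them to a small cohort with a fixed number of support places, and also encodes the outcome/input ratio comparison of equity theory. It assumes that each student has a single APP score (higher meaning better predicted performance) and, as a simplification, uses that score as the proxy for both past merit (equity) and need. All names, scores, and the capacity are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Student:
    name: str
    app_score: float  # hypothetical APP output; higher = better predicted performance


def equity_comparison(outcome_self, input_self, outcome_other, input_other):
    """Compare one's own outcome/input ratio with another person's, as in equity theory."""
    own = outcome_self / input_self
    other = outcome_other / input_other
    if math.isclose(own, other):
        return "equitable"
    return "overpaid" if own > other else "underpaid"


def allocate(students, capacity, norm):
    """Return the names granted support under one ideal-typical norm,
    plus a flag indicating whether the allocation exceeds the available capacity."""
    if norm == "equity":  # merit: best-performing students first
        chosen = sorted(students, key=lambda s: s.app_score, reverse=True)[:capacity]
    elif norm == "need":  # need: students with the poorest predictions first
        chosen = sorted(students, key=lambda s: s.app_score)[:capacity]
    elif norm == "equality":  # equality: everyone gets access, regardless of capacity
        chosen = list(students)
    else:
        raise ValueError(f"unknown norm: {norm}")
    names = [s.name for s in chosen]
    return names, len(names) > capacity


cohort = [Student("A", 0.9), Student("B", 0.2), Student("C", 0.6), Student("D", 0.4)]
for norm in ("equity", "need", "equality"):
    print(norm, allocate(cohort, capacity=2, norm=norm))
```

With two places for four students, the equity rule selects A and C, the need rule selects B and D, and the equality rule grants access to all four while exceeding capacity, which illustrates why equality can reach its limits under scarce resources. Mixed rules (for example, reserving some places for need and some for merit) would combine these branches.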

3.2 Fairness Perceptions of Distributive Justice Norms

Empirical research provides evidence that the choice of the right (i.e., publicly approved) distributive justice norm is highly context-dependent (Lee et al., 2017; Starke et al., 2022; Wong, 2020) and can also vary among stakeholders (Cheng et al., 2021; Lee et al., 2019). To date, however, the literature on FAIR ML has focused mainly on formulating mathematical fairness notions (Dwork et al., 2012; Kusner et al., 2017; Verma & Rubin, 2018), on their evaluation by those affected (Cheng et al., 2021; Debjani et al., 2020; Srivastava et al., 2019; Zhou et al., 2021), or on distributive fairness in general (Hannan et al., 2021; Marcinkowski et al., 2020; Wang et al., 2020). The underlying norms of just distribution, and preferences for particular norms, should be addressed as well.

Nevertheless, some authors conclude that an egalitarian distribution, as suggested by the equality principle, is preferred in many contexts (Cropanzano & Ambrose, 2015; van Hootegem et al., 2020). This seems to be especially the case when stakeholders see themselves as part of a (small) group and resources must be shared collectively (Allison et al., 1992); empathy, too, predicts a preference for equality as a distributive justice norm (Huppert et al., 2019). This preference has been explained with decision heuristics, which assume that all parties concerned tacitly accept the simplest solution, an equal distribution, as the correct decision because they expect it to yield a satisfactory outcome for all (Allison & Messick, 1990; Allison et al., 1992). This type of heuristic is therefore also called the equality heuristic or inequality aversion and has been shown to come into play in different contexts (Asaria et al., 2023; Morand et al., 2020).

Concerning ADM systems, Lee and Baykal (2017), for example, show in qualitative interviews that some stakeholders prefer altruistic distributions and are even willing to compromise and forgo unequal allocations. Some of them justified this preference by prioritizing the group's happiness. While the authors substantiated this result in a second study, an international comparison between India and the USA also revealed cultural differences in the evaluation of distributive justice norms: in India, need was preferred, whereas American respondents favored the equality or equity norm (Lee et al., 2017). In the case of a healthcare scheduling system, employees primarily preferred an equal distribution norm, although individual needs received equal consideration (Uhde et al., 2020). In a similar use case of a denied vacation request, however, Schlicker et al. (2021) find no evidence that equality explanations lead to higher perceived distributive fairness than equity explanations. Saxena et al. (2020), on the other hand, reach a different conclusion when asking about preferred loan decisions: their results indicate that respondents favor a proportional ‘ratio’ distribution over meritocratic or equality norms. Kasinidou et al. (2021) confirm this insight in a survey of computer science students, who prefer a proportional distribution to an equality norm. Given the contextual nature of fairness perceptions and these inconsistent empirical findings, however, students' potential perceptions of a distribution of support interventions based on an APP system can be inferred from this research only with qualification. It is therefore crucial to evaluate different fairness norms in the specific context of APP and the distribution of student support measures. Accordingly, our RQ1 asks: How fair do students perceive different norms of just distribution to be when evaluating an APP system?

3.3 Algorithmic vs. Human-Based Distribution of Support Measures

The main driver of the implementation of AI technologies, especially in public administration, is the expectation that AI will lead to cheaper, faster, more objective, and more reliable decisions than humans (König & Wenzelburger, 2022; Wirtz & Müller, 2019; Gomes de Sousa et al., 2019). While the superiority of contemporary AI applications may often be dubious, even if AI proves to make better decisions, this need not lead to positive evaluations by the persons affected. Analogous to the potential gap between factual and perceived fairness, preferences regarding the nature of the decision-maker may not be based solely on the quality of the outcome. Two contrasting strands of literature address whether people prefer AI systems or humans as decision-makers. On the one hand, there is algorithm aversion (Dietvorst et al., 2015; Dietvorst & Bharti, 2020; for an overview, see Burton et al., 2020); on the other hand, there is algorithmic appreciation (Araujo et al., 2020; Logg et al., 2019; Thurman et al., 2019). The former suggests a preference for HDM over ADM. This preference is especially pronounced when errors are observed, as people lose confidence in algorithms more quickly than in other humans (Dietvorst et al., 2015), and it is particularly evident in the case of unpredictable and risky decisions (Dietvorst & Bharti, 2020). Algorithmic appreciation, by contrast, sees ADM as superior and assumes that people are more likely to follow the recommendations of algorithms than those of humans (Logg et al., 2019), on the assumption that algorithms are less biased and decide more objectively (Araujo et al., 2020).

Closely connected to the aversion to or appreciation of ADM is the question of whether students, administrators, and other stakeholders in the educational sector want AIEd at all. Again, several studies indicate that the answer can vary across contexts and stakeholders (Starke et al., 2022; Wong, 2020; Marcinkowski et al., 2020). On the one hand, empirical research shows that ADM systems are perceived as fairer than HDM, for instance, in logistics, although these results do not explicitly focus on distributive fairness (Bai et al., 2022). Higher distributive fairness is nevertheless also attributed to ADM in university admission processes (Marcinkowski et al., 2020) and without reference to a specific use case (Helberger et al., 2020). On the other hand, in some use cases HDM is attributed higher distributive fairness than ADM, such as resource allocation in social division tasks (Lee & Baykal, 2017) or criminal justice (Harrison et al., 2020). Another example is skin cancer detection, where ADM is also perceived as less fair than HDM, which is likewise reflected in the level of trust in the decision. However, this holds only for people with low mistrust in human systems, whereas high mistrust leads to equally (un-)fair evaluations of ADM and HDM (Lee & Rich, 2021). In this context, fairness perceptions of ADM and HDM also depend on the nature of the decision. For example, it matters whether mechanical or human skills are required (Lee, 2018). Whether the decision's impact is high or low is equally important (Araujo et al., 2020).

However, there are also reports of ambivalent results, where neither the decision of an algorithmic system nor that of a human decision-maker is preferred, for example, in the already mentioned denial of a vacation request (Schlicker et al., 2021) or the allocation of COVID-19 vaccines (Lünich & Kieslich, 2022). All of these inconsistent findings underscore the importance of reflecting on and examining ADM fairness perceptions case by case, rather than deploying ADM systems simply because the data are accessible and it is technically possible (boyd & Crawford, 2012). In line with the idea of human-in-the-loop (Holzinger, 2016; Starke & Lünich, 2020), the decision whether to use ADM systems opens up a range of possible hybrid courses of action, for example, cooperation between humans and algorithms. Appropriate cooperation may thus also lead to higher perceived fairness, although people still prefer humans to retain control over the decision (de Cremer & McGuire, 2022). Therefore, our RQ2 asks: To what extent do fairness perceptions of APP systems using ADM differ from those of APP systems using HDM?

3.4 Differences in Perceived Fairness Based on Students’ Study Subject

Focusing on the specific context of AIEd, we also need to consider how perceptions of computer applications vary across study fields, and we therefore ask whether students' study subject affects their fairness perception of the APP system. Even though it can be assumed that today's students have all grown up with computers and are familiar with their use, digital literacy levels differ between study subjects, which may shape perceptions of computer applications (Gibson & Mourad, 2018). As Lee and Baykal (2017) show, fairness perceptions of ADM systems vary with respondents' computer programming knowledge. Contrary to their hypothesis, their findings indicate that individuals with lower programming skills perceive ADM systems as fairer than respondents with more knowledge. The authors suggest that this evaluation reflects a feeling of loss of control over the ADM, as students with more pronounced programming skills presumably know more about the algorithm's limitations. Logg et al. (2019) come to a similar conclusion, as their results indicate that experts relied less on algorithmic advice than laypeople. Jang et al. (2022) also point out that “students who had prior experience with AI education had a more sensitive attitude toward the fairness of AI compared to students without experience” (p. 20). In addition, as mentioned before, Kasinidou et al. (2021) point to differences in the evaluation of distributional norms, with computer science students preferring a ratio distribution to an equal distribution. However, they do not examine differences between fields of study. Instead, they find marginal differences between undergraduate and postgraduate students, with the former viewing a given loan allocation decision as more unfair than the latter when sociodemographic factors are considered.

To our knowledge, however, there is no evidence yet on how students from different fields of study perceive the fairness of AIEd, and of APP in particular. Against this background, it seems fruitful to compare students from so-called STEM (science, technology, engineering, and mathematics) and SHAPE (social sciences, humanities, the arts for people and the economy) fields of study (Jones et al., 2020). On the one hand, we reason that STEM students are more likely to be familiar with the mathematical and engineering foundations of computer systems, algorithms, and AI techniques such as machine learning, which may shape their perspective on APP. On the other hand, we assume that SHAPE students have greater familiarity with and sensitivity to social questions of human nature and society, which may be reflected in their perceptions of a sociotechnical system such as APP.

That this distinction between study subjects may have an important influence on fairness perceptions is supported by evidence of further trait differences between the respective student groups. These traits, and their consequences for how individuals approach the social questions raised by data and prediction systems, may lead to distinct evaluations of data-driven prediction systems. For instance, according to the literature, personality traits and cognitive skills predict study choice. While the former mainly refers to the Big Five personality traits (e.g., Humburg, 2017; Vedel, 2016), the latter focuses, for example, on mathematical skills, although expected future earnings also play a role (e.g., Arcidiacono et al., 2012). In this sense, Arcidiacono et al. (2012) show that higher expected future earnings, compared to current prospects, would also increase the probability of choosing SHAPE subjects. Regarding personality traits, compared to STEM students, students of SHAPE subjects are more often willing to experiment and show higher levels of emotionality but are less conscientious (Verbree et al., 2021). Higher levels of neuroticism have also been attributed to these students (Vedel, 2016), and they seem to show more flexibility than STEM students (Sherrick et al., 1971), who in turn are less extraverted and more emotionally stable (Coenen et al., 2021; Humburg, 2017; Pozzebon et al., 2014). Nevertheless, Coenen et al. (2021) find that openness to experience also positively influences the choice of a STEM subject. Most relevant to our case, SHAPE students show a stronger preference for altruistic views than students in other study fields (Pozzebon et al., 2014). Consistent with this, a study of German students shows that students who consider social engagement particularly important are underrepresented in STEM subjects (Stegmann, 1980). 
Accordingly, we ask RQ3: To what extent does the field of study of students have an impact on the fairness perception of the APP system?

4 Method

We conducted a cross-sectional factorial survey using a questionnaire with standardized response options to address the research questions. We performed the data analysis in R (version 4.0.3) using the packages lavaan (Rosseel, 2012) and semTools (Jorgensen et al., 2019). The data and code of our analyses are shared online via OSF (https://osf.io/djt39/). Due to the exploratory nature of the study, there was no pre-registration. Nor did we perform an a priori power analysis, even though we sought to identify small effect sizes that may hold consequential implications. First, we lacked sufficient prior information regarding the fit of our measurement models and the distribution of the underlying variables. Second, against this background, power analysis for covariance-based structural equation modeling (SEM) is rather intricate. We therefore used G*Power 3.1.9.7 for a post hoc sensitivity analysis targeting the effect size (Faul et al., 2009). The alpha error probability was set at 0.05 and the power (\(1-\beta\)) at a high 0.999; our sample size was 453 in the smallest group and 1378 for the whole sample. First, our latent means model compared means for three groups. With a numerator degrees of freedom (df) of 2 and a denominator df of 1384, the smallest effect we could have reliably detected with our sample size was \(f \approx 0.15\) (i.e., a small effect). Second, our regression model encompassed six predictors. For this model, the smallest effect we could have reliably detected with our sample size was \(f^2 \approx 0.08\) (i.e., a small effect). These parameters guide our inquiry and reflect our interest in detecting even subtle effects that may be overlooked in less rigorously designed analyses. Note that, because SEM relies on different statistical approaches, our results below may deviate from this sensitivity analysis.
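To relate the reported sensitivity thresholds to variance explained, the standard conversions \(\eta^2 = f^2/(1+f^2)\) for Cohen's f and \(R^2 = f^2/(1+f^2)\) for Cohen's \(f^2\) can be applied. The short Python sketch below (helper names are hypothetical) performs only this arithmetic for the values stated above; it does not reproduce the G*Power sensitivity analysis itself.

```python
def variance_explained_from_f(f: float) -> float:
    """Cohen's f -> eta squared, inverting f^2 = eta^2 / (1 - eta^2)."""
    return f ** 2 / (1 + f ** 2)


def variance_explained_from_f2(f2: float) -> float:
    """Cohen's f^2 -> R^2, inverting f^2 = R^2 / (1 - R^2)."""
    return f2 / (1 + f2)


eta_sq = variance_explained_from_f(0.15)  # three-group latent means comparison
r_sq = variance_explained_from_f2(0.08)   # regression model with six predictors
print(f"eta^2 = {eta_sq:.3f}, R^2 = {r_sq:.3f}")
```

That is, the detectable thresholds correspond to roughly 2.2% of variance explained in the group comparison and roughly 7.4% in the regression model.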

4.1 Procedure

For screening purposes, respondents had to indicate their occupational status to determine whether they were eligible for the student survey. Afterward, they were informed about the nature and aim of the study, the approximate duration of 15 min needed to answer the questionnaire, and the use and protection of their data. After giving informed consent, respondents started the questionnaire, which included the vignettes of the factorial survey design, questions on their fairness perception of the university’s APP procedure described in the vignette, questions regarding their self-assessed propensity for dropout and their expected probability that an ADM would assign to their dropout, and their sociodemographics (i.e., age, gender, study subject, number of semesters of study, intended degree, and the type of their current HEI).

Lastly, participants were thanked, debriefed, and redirected to the provider of the online access panel (OAP), where they received monetary compensation for participation. The average time to complete the questionnaire was 13.72 min (SD = 5.6).

4.1.1 Vignettes of the Factorial Survey Design

The factorial survey followed a 2 \(\times\) 3 design: each participant was randomly presented with one of six possible scenarios (see the wording of all vignettes in the appendix). In all vignettes, participants first received information that an HEI offers support measures for students but, due to limited resources, cannot offer them to every student. The institution deploys either 1) an ADM system or 2) university staff (HDM) to analyze student performance data, aiming to assess and predict students’ performance.

In the experimental condition in which AI was used for the distribution of support measures, participants were given a brief explanation of the term AI, in which ADM was explained as a specific form of AI. Next, the use case was introduced, in which an ADM system assessed and predicted student performance based on student performance data. In the condition with university staff, the vignette stated that human members of the university were in charge of assessing and predicting performance.

Finally, in each scenario, respondents learned how the university uses (or does not use) this information: it either 1) allows all students to sign up for support measures irrespective of their APP outcome (i.e., the equality norm), 2) allows only students with a good APP outcome to sign up for support measures (i.e., the equity norm), or 3) allows only students with a poor APP outcome to sign up for support measures (i.e., the need norm).
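The 2 \(\times\) 3 design described above can be sketched as the Cartesian product of the two factors, with each respondent drawing one of the six cells. This is a minimal illustration assuming simple (unrestricted) randomization with equal probabilities; the actual assignment mechanism of the survey software is not described here, and the names `DECISION_MAKER`, `NORM`, and `assign_vignette` are hypothetical.

```python
import itertools
import random

# The two factors of the 2 x 3 factorial survey.
DECISION_MAKER = ("ADM", "HDM")           # algorithmic vs. human decision-making
NORM = ("equality", "equity", "need")     # distributive justice norm

# All six vignette conditions (Cartesian product of the factor levels).
CONDITIONS = list(itertools.product(DECISION_MAKER, NORM))


def assign_vignette(rng: random.Random) -> tuple:
    """Draw one of the six conditions with equal probability (simple randomization)."""
    return rng.choice(CONDITIONS)


if __name__ == "__main__":
    rng = random.Random(2022)  # fixed seed so the illustration is reproducible
    counts = {condition: 0 for condition in CONDITIONS}
    for _ in range(1378):      # one draw per respondent in a sample of this size
        counts[assign_vignette(rng)] += 1
    for condition, n in counts.items():
        print(condition, n)
```

Under simple randomization the cell sizes vary by chance; a block-randomized scheme would instead keep the six cells nearly equal in size.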

4.2 Sample

Participants were recruited with the online access panel (OAP) of the market research institute INNOFACT AG. Survey field time was between January 27 and February 1, 2022.

All in all, 43776 respondents from the OAP were invited to participate in the survey. The questionnaire was accessed 3906 times, and 3680 persons started answering it. Of those respondents, 1806 were screened out because they did not belong to the investigated population of German students and were therefore not eligible for our survey. The survey’s non-completion rate was 6.1%, and dropouts were equally distributed across all questionnaire pages. Additionally, in the middle of the questionnaire, there was an attention check in which respondents were asked to click on a specific scale point (“Please click the ‘tend to disagree’ item here to confirm your attention.”). Altogether, 333 respondents did not answer correctly and were screened out. The final sample consists of 1378 participants.

The average age of respondents was 23.22 years (SD = 3.32). All in all, 1012 (73.4%) respondents identified as women, 358 (26%) as men, and 8 (0.6%) as non-binary.

4.3 Measurements

Perceived Distributive Fairness

To assess perceived distributive fairness, respondents rated four statements on a five-point Likert scale ranging from 1 = do not agree at all to 5 = totally agree.

The four statements, translated from German into English, are as follows:

  • “A fair balance of interests between different concerns has been achieved.”

  • “The distribution of support measures is fair.”

  • “No one is unduly disadvantaged by the distribution of support measures.”

  • “The distribution of support measures is just.”

The four indicators suggest good factorial validity (see Table 1).

Table 1 Reliability values of perceived fairness
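Table 1 itself is not reproduced in this text, and which reliability coefficients it reports is not stated here. As an illustration only, one coefficient commonly reported for a single-factor CFA is McDonald's omega, which can be computed from the loadings, residual variances, and factor variance. The sketch below assumes uncorrelated residuals; the function name and all input values are hypothetical and are not the estimates behind Table 1.

```python
def omega(loadings, residual_variances, factor_variance=1.0):
    """McDonald's omega for one factor with uncorrelated residuals:
    (sum of loadings)^2 * factor variance divided by the model-implied
    variance of the unit-weighted sum score."""
    if len(loadings) != len(residual_variances):
        raise ValueError("need one residual variance per loading")
    common = sum(loadings) ** 2 * factor_variance
    return common / (common + sum(residual_variances))


if __name__ == "__main__":
    # Hypothetical values for four indicators; NOT the study's estimates.
    lam = [0.9, 1.1, 1.0, 1.0]        # effects-coded: loadings average 1.0
    theta = [0.30, 0.25, 0.35, 0.40]  # residual (unique) variances
    print(round(omega(lam, theta, factor_variance=0.8), 3))
```

Omega approaches 1 as the residual variances shrink relative to the common variance of the four items.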

Study subject

To inquire about the study subject of the respondents, we followed the study subject classification of the German Statistical Office (Destatis, 2021). For brevity, respondents could choose from a list of 79 study subjects, i.e., up to the second level of the classification. On the first level of the classification, there were eight fields of study plus a residual category “other”. We counted four fields as STEM-related (i.e., mathematics and natural sciences; human medicine and health sciences; agricultural, forestry, and nutritional sciences, veterinary medicine; engineering) and the remaining four fields, together with the residual category, as SHAPE-related (i.e., humanities; sports; law, economics, and social sciences; arts; and “other”).

Finally, based on the indicated study subject, respondents were assigned to either a group that studies a STEM subject or a group that studies a SHAPE subject. This procedure results in a dummy variable coded as 0 ‘SHAPE’ and 1 ‘STEM’.
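The grouping described above amounts to a lookup from first-level field of study to a 0/1 dummy. The sketch below is a minimal illustration: the field labels are English paraphrases of the first-level categories listed in the text, not the official Destatis labels, the function name `stem_dummy` is hypothetical, and the mapping from the 79 second-level subjects to their first-level fields is omitted.

```python
# Hypothetical English labels for the first-level fields of study.
STEM_FIELDS = frozenset({
    "mathematics and natural sciences",
    "human medicine and health sciences",
    "agricultural, forestry, and nutritional sciences, veterinary medicine",
    "engineering",
})
SHAPE_FIELDS = frozenset({
    "humanities",
    "sports",
    "law, economics, and social sciences",
    "arts",
    "other",  # residual category, grouped with SHAPE
})


def stem_dummy(field: str) -> int:
    """Return 1 for a STEM field and 0 for a SHAPE field (including 'other')."""
    if field in STEM_FIELDS:
        return 1
    if field in SHAPE_FIELDS:
        return 0
    raise ValueError(f"unknown field of study: {field!r}")


if __name__ == "__main__":
    print([stem_dummy(f) for f in ["humanities", "engineering"]])  # [0, 1]
```

Raising an error for unknown labels, rather than silently defaulting to 0, keeps miscoded responses from being absorbed into the SHAPE group.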

Self-assessment of the individual expected probability that an ADM would assign to one’s own dropout

The self-assessed individual probability that an ADM would assign to one’s own dropout was used as a control variable for the analysis of RQ3. Respondents were asked the following question: “Thinking about your own current studies, what do you think is the likelihood that an AI would predict that you would drop out?” Respondents could indicate a percentage from 0% to 100% using a slider. The average expected probability that an ADM would assign to one’s dropout was 32.54% (SD = 29.33). For analysis and interpretation, the variable was divided by ten, so that an increase of one unit reflects an increase of ten percentage points in a respondent’s self-assessment.
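Dividing the slider value by ten only changes the unit of the predictor: the regression slope is multiplied by ten, so a coefficient now expresses the expected change in fairness per ten percentage points. The sketch below illustrates this with a simple least-squares slope on hypothetical data; the function names and values are assumptions for illustration, not the study's data or model.

```python
from statistics import mean


def rescale_percent(p: float) -> float:
    """Map a 0-100 slider value onto units of ten percentage points (0-10)."""
    return p / 10.0


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical slider responses (percent) and fairness scores, for illustration only.
    pct = [0, 20, 35, 50, 80, 100]
    fairness = [3.6, 3.4, 3.1, 3.0, 2.7, 2.5]
    b_raw = ols_slope(pct, fairness)                                   # per 1 percentage point
    b_scaled = ols_slope([rescale_percent(p) for p in pct], fairness)  # per 10 percentage points
    print(b_raw, b_scaled)  # b_scaled equals 10 * b_raw up to rounding
```

Standardized coefficients are unaffected by this linear rescaling; only the unstandardized B changes.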

5 Results

To address RQ1 and RQ2, we estimated a latent factor model of perceived fairness in each condition, and to address RQ3, we estimated a structural regression model with perceived fairness as the dependent variable. In all analyses, effect coding was used for factor scaling, a procedure that “constrains the set of indicator intercepts to sum to zero for each construct and the set of loadings for a given construct to average 1.0” (Little et al., 2006, p. 62). The estimated latent factor is thus scaled like the indicators, which facilitates interpretation.
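A short derivation shows why effects coding places the latent mean on the metric of the indicators. The notation is standard CFA notation introduced here for illustration (\(x_i\) indicators, \(\tau_i\) intercepts, \(\lambda_i\) loadings, \(\kappa\) latent mean, \(I\) number of indicators) and does not appear in the original text:

```latex
\begin{aligned}
\mathrm{E}(x_i) &= \tau_i + \lambda_i \kappa, \qquad i = 1, \dots, I \\
\sum_{i=1}^{I} \tau_i &= 0, \qquad \frac{1}{I}\sum_{i=1}^{I} \lambda_i = 1 \\
\frac{1}{I}\sum_{i=1}^{I} \mathrm{E}(x_i)
  &= \frac{1}{I}\sum_{i=1}^{I} \tau_i + \kappa \cdot \frac{1}{I}\sum_{i=1}^{I} \lambda_i
   = 0 + \kappa \cdot 1 = \kappa
\end{aligned}
```

Under these constraints, each group's latent mean equals the average of its indicator means and therefore lies on the 1-to-5 response scale, which is what allows latent means to be compared with the center of the scale.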

Before interpreting the results, it is essential to test for measurement invariance across our analysis groups. The invariance tests suggest strong factorial invariance (i.e., equal loadings and intercepts) of the latent factor across the conditions of the factorial survey (see Table 2).

Table 2 Measurement invariance of perceived fairness

5.1 Mean Differences of Perceived Fairness Between the Distribution Norms and Between ADM and HDM

Subsequently, the model was estimated across the six experimental groups of the factorial survey, yielding six latent means. The model shows good fit (\(\chi ^2\) (62) = 115.65, p \(< 0.001\); RMSEA = 0.06 CI [0.04, 0.08]; TLI = 0.99).
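As a rough check on how the reported fit indices relate, the RMSEA point estimate can be computed from χ², df, and N. The sketch below assumes the common formula with a multiplication by √G for G groups, a multigroup adjustment that some SEM software applies. Whether the reported value used a scaled χ² and N or N − 1 in the denominator is not stated, so both variants are exposed as parameters; the function name is hypothetical.

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int, groups: int = 1, use_n_minus_1: bool = True) -> float:
    """RMSEA point estimate, multiplied by sqrt(groups) for multigroup models
    (an assumed convention; software packages differ in the details)."""
    denom_n = n - 1 if use_n_minus_1 else n
    return sqrt(max(chi2 - df, 0.0) / (df * denom_n)) * sqrt(groups)


if __name__ == "__main__":
    # Values reported above: chi2(62) = 115.65, N = 1378, six experimental groups.
    print(round(rmsea(115.65, 62, 1378, groups=6), 3))
    print(round(rmsea(115.65, 62, 1378, groups=6, use_n_minus_1=False), 3))
```

Under these assumptions, both variants round to the reported RMSEA of 0.06.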

Perceived fairness of the distribution of support measures for students is highest in the equality condition, followed by the need condition (see Table 3). Perceived fairness is lowest when equity is the distributive justice norm of choice. Accordingly, the mean differences between equality and need and between equality and equity indicate significant, medium-sized effects, irrespective of whether APP was based on ADM or HDM. Concerning the mean difference between equity and need, the results suggest a significant difference in the ADM condition but not in the HDM condition.

Table 3 Latent means and mean differences for the perceived fairness

As shown on the right-hand side of Table 3, the results suggest no differences in perceived fairness between ADM and HDM for any of the distributive justice norms.
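The effect-size metric behind the “medium-sized” label is not spelled out in this section. One common way to express a latent mean difference is to standardize it by the pooled latent standard deviation, analogous to Cohen's d (conventionally about 0.2 small, 0.5 medium, 0.8 large). The sketch below shows that computation under this assumption; the function names and all inputs are hypothetical, not values from Table 3.

```python
from math import sqrt


def pooled_variance(var1: float, var2: float, n1: int, n2: int) -> float:
    """Sample-size-weighted pooled variance of two groups."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def latent_d(mean1: float, mean2: float, var1: float, var2: float, n1: int, n2: int) -> float:
    """Standardized latent mean difference: (kappa1 - kappa2) / sqrt(pooled latent variance)."""
    return (mean1 - mean2) / sqrt(pooled_variance(var1, var2, n1, n2))


if __name__ == "__main__":
    # Hypothetical latent means and variances for two conditions; NOT the study's estimates.
    print(round(latent_d(3.4, 2.8, 1.1, 0.9, 230, 230), 2))
```

Because effects coding puts the latent means on the response scale, the numerator is directly interpretable in scale points before standardization.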

5.2 Effect of Study Subject on Perceived Fairness

To address RQ3, we estimated a structural regression model in which perceived fairness, as the dependent variable, is regressed on a respondent’s study subject as the independent variable, together with the control variables age, gender, and the expected probability that an ADM would assign to their dropout (i.e., the self-assessed likelihood of an AI dropout prediction). To relate the effects more easily to the results for RQ1 and RQ2, the condition of whether ADM or HDM performed APP was also included as a dummy variable (0 = ‘ADM’ and 1 = ‘HDM’). Furthermore, an interaction term was included to assess whether the effect of study subject differed between the ADM and HDM conditions (i.e., the scaled and mean-centered product of the two dummy variables for study subject and decision-making condition). The estimated model shows a good fit (see Table 4).
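The phrase “scaled and mean-centered product” admits more than one reading. The sketch below assumes that the product of the two 0/1 dummies is formed first and then mean-centered and divided by its (population) standard deviation. An alternative is to center each dummy before multiplying. All names and data are hypothetical.

```python
from statistics import mean, pstdev


def centered_scaled_product(a, b):
    """Form the product of two 0/1 dummies, then mean-center it and scale it to unit SD."""
    if len(a) != len(b):
        raise ValueError("dummies must have the same length")
    product = [x * y for x, y in zip(a, b)]
    m = mean(product)
    s = pstdev(product)
    if s == 0:
        raise ValueError("product term has no variance")
    return [(p - m) / s for p in product]


if __name__ == "__main__":
    # Hypothetical respondents: stem (0 = SHAPE, 1 = STEM), hdm (0 = ADM, 1 = HDM).
    stem = [0, 1, 0, 1, 1, 0]
    hdm = [0, 0, 1, 1, 1, 0]
    print([round(z, 3) for z in centered_scaled_product(stem, hdm)])
```

Centering the product term reduces its correlation with the constituent dummies, which eases interpretation of the lower-order coefficients.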

Table 4 Structural regression model for the relationship between study subject and the perceived fairness of support measure distribution

The results suggest that study subject did not affect the perceived fairness of the distribution decision in the Equality condition (B = 0.02, SE = 0.08, p = 0.8, beta = 0.01) or the Equity condition (B = 0.07, SE = 0.09, p = 0.38, beta = 0.04). Only in the Need condition was there a small significant effect (B = 0.16, SE = 0.08, p = 0.04, beta = 0.10). The decision-making condition (ADM vs. HDM) did not moderate this effect (B = 0.02, SE = 0.04, p = 0.65, beta = 0.02). Accordingly, students studying a STEM subject judged the need scenario, in which students in need were eligible for support measures, as fairer than students studying a SHAPE subject did. It made no difference whether ADM or HDM served as the basis for decision-making.

6 Discussion

This study aimed to explore students’ perceptions of deploying APP as a basis for distributing student support measures, as part of HEIs’ attempts to improve student performance and retention using AIEd. The study adds to the body of empirical evidence on fairness perceptions of ADM, an important field of FAIR ML research. More specifically, the factorial survey addressed German students’ distributive fairness perceptions of different distributive justice norms (i.e., equality, equity, and need) applied on the basis of student performance assessments by either ADM or HDM.

The results of the factorial survey show that equality appears to be the favored distributive justice norm when it comes to the distribution of student support measures. Compared to the distributive justice norms of equity and need, students perceive unconditional access to resources as substantially fairer. This is in line with the literature suggesting that, in situations where collectives need to distribute limited resources, a low-threshold heuristic is to allocate an equal-sized share to each member (Allison & Messick, 1990; Allison et al., 1992; Morand et al., 2020). In addition, it can be assumed that students see themselves as part of a group and, in this sense, are concerned with social cohesion (Cropanzano & Ambrose, 2015). It may also be speculated that loss aversion contributes to students’ preference for equality as the distribution norm of choice (Smith et al., 2019; Tversky & Kahneman, 1991). After all, students prefer a situation in which everyone, at least in theory, has the same chance to receive support measures, even though support measures are a limited resource that not everyone can draw on. Thus, despite the additional information on individual performance provided by APP, German students stuck to this ‘fallback option’ of equal distribution. Accordingly, our results are consistent with some previous research demonstrating a preference for altruistic and equal distributions (Lee & Baykal, 2017; Uhde et al., 2020). On the other hand, they diverge from other studies that indicate either a preference for proportional distributions, for example, in the context of loan decisions (Saxena et al., 2020), or no preference at all (Schlicker et al., 2021).

As of now, even though decision-makers on the university side have high hopes for the use of AI systems and computer scientists go to great lengths to develop and improve them, the idea of APP using ADM (as well as HDM) does not fall on fertile ground when proposed to students in higher education. After all, one does not need APP to distribute support measures when the students affected by APP demand equality. The same distribution can be ensured by straightforward means, even if, in the end, not everyone can access and benefit from support measures because of limited resources, especially those students who might well deserve additional assistance according to the equity norm or those who could genuinely use it under the need norm. Such evidence points to a conflict of interest between administrators’ more or less well-intentioned aspirations to improve student performance and the affected students, who may not favor the solution ultimately proposed. It remains to be seen whether more detailed information on the goals and effects of APP or other additional communicative measures may improve preferences for distributions that differentiate between students based on their performance. Further research should thus investigate under what specific conditions students may judge a distribution logic based on performance assessment as more or less fair.

Altogether, students’ distributive fairness perceptions of the equity and need norms were rather critical. While the reported fairness perception in the equality condition was above the center of the scale, the results for these two conditions suggest a more negative fairness perception below the center of the scale. Accordingly, the equity and need norms were not only perceived as less fair than equality, but their overall assessment also suggests that they were deemed somewhat unfair. From this finding, one might conclude that students evaluate more negatively those situations in which some students, deemed worthy of support under a given distributive justice norm, benefit more than all others. However, it remains to be seen whether such fairness perceptions would change if, for example, universities implemented a combination of different norms focusing on the most effective distribution of support measures, or fine-tuned APP to suggest only those measures with the highest individual payoff in terms of increasing student performance.

Moreover, the results show that students’ fairness perceptions are similar for HDM and ADM. Whereas the literature on AIEd usually suggests a qualitative difference between humans and AI systems in assessing vast amounts of input data, our findings suggest that students do not draw this distinction when it comes to perceived fairness. This finding raises the question of why students’ assessments do not differ. In aggregate, do the individual strengths and weaknesses of HDM and ADM even out so that both are eventually perceived as equally fair or unfair? If so, what are the specific drivers of perceptions of (un)fairness concerning APP? Do students’ underlying preferences regarding systematic student performance assessment reflect opposition to the introduction of further means of bureaucratic management and organizational control?

While the sociodemographic control variables show inconclusive effects of individual characteristics on perceived fairness, the results for the presumed central predictor, students’ study subject, suggest that the field of study has hardly any effect on fairness perceptions. Only in the need condition did STEM students perceive the proposed distribution as fairer than students in SHAPE subjects did. However, this effect was not moderated by whether humans or algorithms were the basis of decision-making. While we can only speculate about why this effect is limited to the need norm, the absence of the other effects of interest suggests that there is no systematic difference across fields of study when it comes to APP in general.

The study adds to the literature in two important regards. First, it contributes empirically to the understanding of how students perceive different distributive justice norms, particularly focusing on the application of ADM and HDM in distributing student support measures. This addresses a gap in existing research, uncovering nuanced insights into the preference for equality over other norms like equity and need in the context of higher education. Second, the study challenges commonly held assumptions about the potential for technological solutions like APP to enhance fairness in educational support allocation. It emphasizes the complexity of human perception and the importance of social cohesion, altruism, and the potential conflict of interest between administrative aspirations and student preferences. These findings not only underline the need for careful consideration and transparent communication in implementing AI systems in education but also pave the way for future research exploring how to align technological advancements with societal values and expectations.

Limitations of our study include the choice of the empirical design and the sampling of students. First, using vignettes to illustrate a possible APP deployment scenario leaves limited room for depicting what is usually a detailed sociotechnical distribution process. The scenario may also be detached from the current study situation of the surveyed students at their respective universities. Moreover, only three distinct distributive justice norms were used for illustration. The systems and decision-making processes that HEIs deploy are more varied and elaborate than can easily be depicted and understood in a brief written paragraph within the factorial design. Further research should thus use an actual APP system and vividly display its inner procedural workings, especially when inviting students to evaluate ADM. In addition, non-standardized empirical approaches may prove helpful in assessing the nuances and points of reference of students’ fairness perceptions more closely than our standardized design and measure of fairness perceptions. Second, the sample was limited to German students and was skewed towards students who identified as female. While this sample bias is less of an issue for the experimental logic of a factorial survey design, we suggest that additional studies broaden the scope and include a more diverse set of students, with a special focus on cross-country comparisons that consider the diverse academic systems and institutions across nations. Such an approach is critical given that previous research on FAIR ML has focused almost exclusively on study samples from Western countries (Starke et al., 2022). As delineated by Lee et al. (2017), such cultural subtleties become apparent when contrasting the fairness perceptions of American and Indian respondents. 
Furthermore, the heterogeneous expectations of students, shaped by the differing academic systems and sociocultural contexts they hail from, warrant further exploration to ensure the comprehensive applicability of any model or application.

7 Conclusion and Implications

The deployment of AIEd is an ongoing process that aims to introduce sociotechnical AI systems such as ADM for APP within HEIs. Consequently, it may change the social fabric of educational organizations and the life trajectories of generations of students. While the intention to use APP to improve student performance and retention arguably appears understandable from the perspective of university administrators, we suggest also focusing on the people affected by the introduction of sociotechnical AI systems. Research on the social consequences of such systems and on how affected students evaluate them thus seems necessary, and our study answers the call for a deeper understanding of distributive fairness perceptions of distinct norms of distributive justice. Our findings show that German students prefer a distribution of support measures based on an APP system according to the equality norm. In this regard, fairness perceptions of APP do not differ between HDM and ADM and hardly differ by students’ study subject.

Overall, the results of our study link to the literature on AIEd and thus contribute essential findings to research on perceptions of distributive justice in this field. Consequently, these findings have implications for HEIs and administrators as well as for future research in this area. In light of evidence that students prefer a distribution according to the equality norm when support measures are distributed based on an algorithmic APP system, the question arises whether it makes sense to use an APP system at all, since support measures that are, in theory, intended to be open to all students do not require classification by an algorithm in the first place. Should individual universities nevertheless decide to use an APP system to benefit from its presumed potential (for example, to enable individual feedback or to increase graduation rates; Attaran et al., 2018; Mah, 2016), this use of AIEd could evoke students’ disapproval. In light of potential negative consequences for the university’s reputation among students and for student retention (Marcinkowski et al., 2020), the pros and cons of using APP must be carefully weighed. Should universities plan to introduce APP, we suggest that they conduct research investigating how their current and future student body evaluates APP. This evaluation may take the form of an ongoing consultation process with all stakeholders within the university to navigate potential fairness issues and their respective contextuality.

Further research in the field of AIEd, particularly regarding the deployment of APP within HEIs, calls for multidimensional exploration. Firstly, studies should delve deeper into understanding the various norms of distributive justice across diverse student populations. While our study found that German students prefer the equality norm, it remains to be seen whether such preferences resonate similarly across different cultural and educational contexts. Moreover, as suggested above, the practical ramifications of these findings for universities worldwide need empirical substantiation. A targeted examination of whether supposedly innovative APP systems align with the core ethos and strategic visions of HEIs could provide more clarity on the matter. Similarly, the potential for negative repercussions for universities, in terms of reputation and student retention, stemming from student perceptions of AIEd, necessitates a comprehensive risk assessment. Lastly, a co-creative approach, in which students and other key stakeholders are involved in the development and decision-making processes of APP deployment, might offer a blueprint for future AIEd endeavors. Investigations into such participatory methodologies, gauging their feasibility, efficiency, and efficacy, could shape the future discourse on AI in education.