Introduction

In recent years, data collection in educational settings has been increasing. While this is partially attributed to institutional audit culture and its practices of benchmarking and formalizing accountability (Shore and Wright 2003; Shore 2008), the rise of technology use in classrooms (Tondeur et al. 2017; Long et al. 2017), coupled with advances in artificial intelligence and machine learning algorithms that heavily rely on large quantities of data, has accelerated this trend. One purpose for the collection and use of this data is to create predictive models of learners, the targets of which range from academic performance to affect and engagement in class (Gardner and Brooks 2018). Applications of these models include early warning systems (Macfadyen and Dawson 2010), which are used to alert advisors, instructors, administrators, or students themselves if a student appears to be struggling, so that the student can be supported before falling significantly behind (Alhadad et al. 2015).

However, these systems often rely upon the collection of sensitive data such as demographics, grades, and interaction traces with online content (Pardo and Siemens 2014), data that students may be uncomfortable sharing for learning analytics depending on the stakeholder involved (Ifenthaler and Schumacher 2016). For instance, third parties, such as Learning Management System (LMS) vendors, have also turned to developing early warning systems and products that rely on educational data, even though such data sharing arrangements may be unclear to students (Polonetsky and Jerome 2014). The manner in which data collection is conducted thereby creates a tension between institutional goals of using predictive models to support students’ educational progress and retention, instructor goals of course-specific performance monitoring, and commitments to learners’ consent, agency, and privacy (Pardo and Siemens 2014; Prinsloo and Slade 2014b).

There have been numerous calls to provide students with more agency regarding how their data is used in learning analytics (Pardo and Siemens 2014; Drachsler and Greller 2016). Yet, students’ privacy concerns may deter them from consenting to the use of their data in learning analytics. Moreover, biases have been shown to exist in predictive models, partly due to non-representative samples acquired during data collection (Ocumpaugh et al. 2014). When the availability of data is restricted, machine-learned models may lose accuracy, which can lead to less effective interventions for some (or all) students (Li et al. 2019). This is particularly concerning since demographic gaps already exist in educational achievement (Bainbridge and Lasley 2002); this is especially true for underrepresented minorities (Bensimon 2005), those with a lower socioeconomic status (Duncan and Magnuson 2005), and between genders in certain contexts such as STEM programs (Matz et al. 2017). Not only are there outcome discrepancies, but it has also been shown that different demographics-based communities have different expectations of privacy and different concerns about how their data is to be used (Cho et al. 2009). If students in minority groups or from particular backgrounds are more reluctant to share data, their data will be absent, which may bias models in ways that are not representative of all students.

In this study, we investigate students’ propensity to consent to or opt out of having their data collected and used for learning analytics. We further connect consent propensity to students’ demographics, personality characteristics, privacy perceptions, as well as students’ perspectives and concerns regarding learning analytics in order to understand the factors motivating students’ expressed consent preferences. Linking participants’ responses to demographic characteristics enables us to analyze the differences between student subpopulations and how those might translate into differential consent rates. The research questions we address are as follows:

  1. RQ1: What are students’ perspectives on their educational data being used in learning analytics?

  2. RQ2: What are the population and participation characteristics of students who indicate a preference to allow or disallow their educational data to be used for learning analytics?

In “Methods”, we describe our study to answer these questions by first ascertaining students’ propensity to consent to or deny use of their educational data for learning analytics with an email-based, one-question preference elicitation prompt. Respondents were subsequently invited to complete an online survey that investigated the factors behind their consent indication in order to identify key determinants. The email prompt and online survey responses were then associated with students’ institutional demographic data in order to contextualize the relationship between students’ demographic characteristics and their propensity to participate in learning analytics. We sent our email prompt to a sample of 4,000 students at our institution stratified by ethnicity and gender; 272 students responded to the email prompt, of whom 116 further completed the survey.

In “Findings”, we report differences in response rate to the email prompt among genders and ethnicities. Female students were much more likely to respond than male students and, despite stratified recruitment, responses from White students were overrepresented while responses from Black students were underrepresented; there were no differences in consent behavior between genders or ethnicities. Among respondents, we identified three important factors that play a role in students’ consent expressions regarding learning analytics: a student’s trust in the educational institution, their level of concern regarding individual data collection, and their comfort with an instructor’s use of data for improving student engagement. Certain privacy attitudes are correlated with population subgroups: most notably, students identifying as Black generally express less trust in the institution, and female students tend to have greater apprehension about personal data collection while simultaneously being comfortable with instructor use of such data to improve student engagement.

Our findings suggest that instructors may have an important role in making students feel at ease when it comes to data sharing. We discuss in “Discussion” how this comfort may be bolstered by being more transparent regarding how data is used and who has access to it, thereby balancing broader institutional interests of effectively educating students with individual privacy safeguards and student agency. We also discuss limitations of the current study and routes to deepen our understanding of the rationale behind students’ consent decisions.

Background and Related Work

We discuss prior work on privacy and ethical concerns regarding learning analytics, equity and disparities in education, and sociocultural orientations in education.

Privacy and Ethical Issues in Learning Analytics

Learning analytics relies on the collection and use of student data that may include sensitive information and confidential records, which raises privacy concerns (Drachsler and Greller 2016; Ifenthaler and Schumacher 2016; Reidenberg and Schaub 2018). Meanwhile, broader changes in society emphasizing individuals’ rights in data processing are reflected in new privacy regulations such as Europe’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) and California Privacy Rights Act (CPRA). In this light, discussions around the implications of collecting, using, and analyzing student data in educational contexts are becoming more critical (Prinsloo and Slade 2017; Niall 2017). Existing research has pointed out emerging privacy and ethical issues around learning analytics, including student consent and agency over student data, and their trust in learning analytics systems (Pardo and Siemens 2014; Drachsler and Greller 2016; Rubel and Jones 2016).

Student consent is critical not only to demonstrate respect for students and their decisions, but also to support important values such as autonomy and freedom of choice (Sedenberg and Hoffmann 2016). Seeking student consent also acknowledges students’ rights and their voluntary collaboration in allowing the collection and use of their data by learning analytics to support student learning (Slade and Prinsloo 2013). It is an ethical approach for institutions to adopt codes of conduct that guide informed consent, data collection purposes, and transparency of data use in order to minimize potential harm and allegations of misuse (Land and Bayne 2005; Slade and Prinsloo 2013).

Prior studies have found several sociodemographic characteristics contributing to disparities among demographic groups when it comes to their consent to participation in research such as age (Jacobsen et al. 2004; Benfante et al. 1989), gender (Ramos et al. 2004; Pirzada et al. 2004), socioeconomic status (Boshuizen et al. 2006; Gordon et al. 1959), and ethnicity (Moorman et al. 1999, 2004). Li et al. (2019) found that student consent or opt-out decisions can affect the predictive power of learning analytics models for different student subpopulations. In our study, we quantified students’ participation and consent rates for learning analytics by demographic groups, which is important for contextualizing the differential effects identified by Li et al. We further investigate the underlying reasons as to why students choose to consent or opt out of learning analytics, and how these factors are linked to demographic characteristics and personality traits.

Meanwhile, consent is closely related to autonomy and agency (Alexander 1996). Student agency is characterized as students being able to hold themselves accountable to make decisions in learning processes, which is critical to students’ learning engagement and pursuit of learning goals (Deakin Crick and Goldspink 2014; Seifert 2004). To increase student agency and empowerment to participate in learning analytics, students should be viewed as collaborators in learning analytics rather than data producers or service receivers (Buchanan 2011; Kruse and Pongsajapan 2012), and Sun et al. (2019) found that students demand more agency in decisions regarding how data about them is used.

On the other hand, current consent practices face challenges and critiques, as consent is often treated as an operational act rather than being understood and assented to with moral legitimacy (Barocas and Nissenbaum 2014). Barocas and Nissenbaum (2009) identified that, in the online behavioral advertising context, consent often neither sufficiently captures users’ agreement to tracking and targeting nor conveys meaningful notice that could facilitate users’ choices, due to the disconnect between the privacy policies of different parties (e.g., data publishers, contracted third parties), the changing nature of privacy policies, and the lack of data flow transparency to users. As a result, compounded by asymmetrical power relationships with companies, people can feel powerless toward seemingly inevitable privacy violations, a social phenomenon Draper and Turow (2019) describe as digital resignation. Obfuscatory consent practices may confuse people and discourage them from demanding agency (Draper and Turow 2019; Ellison and Ellison 2009).

Furthermore, students’ trust in learning analytics systems plays a critical role in supporting an educational ecosystem that maximizes the experiences of different stakeholders such as learners and educators (Drachsler and Greller 2016), creates reciprocal relationships between the institution and the students that encourage students to share their data for learning benefits (Slade et al. 2019), and facilitates the establishment of reliable analytics systems (Petersen 2012). Prior work has also found several factors positively influencing students’ trust in learning analytics, such as protecting data from unauthorized access or distribution, proper storage of historic data, data de-identification, valuing student privacy, achieving consensus on data collection purposes, and transparency of data collection (Pardo and Siemens 2014; Clarke and Nelson 2013; Drachsler and Greller 2016; Slade and Prinsloo 2013; Beattie et al. 2014). Recent work shows that students inherently trust and expect their institution to properly and ethically use student data (Slade et al. 2019). Our study explores whether students’ trust in the institution affects their propensity to consent to learning analytics.

Equity and Disparities in Education

Equity has long been a fundamental concept in education. Simon et al. (2007) describe equity in education as twofold: fairness and inclusion. Equity as fairness illustrates that an individual’s socioeconomic status should not affect their chances to pursue education. Equity as inclusion acknowledges the basic need to complete compulsory education in order to acquire the skills needed in society.

Racial equality remains a controversial issue in education due to disparities in academic outcomes and limited access to opportunities and resources for students of color (Noguera 2016). Students from Hispanic/Latinx, African American, American Indian, and Pacific Islander groups are underrepresented at all levels of higher education from undergraduate majors to graduate program pursuits, particularly in STEM-related fields (Hanson 2008; Cook and Córdova 2007).

Even with successful graduation from higher education, individuals from minority groups are less likely to consider pursuing research careers (DePass and Chubin 2008).

In the late 1990s, Ladson-Billings (1998) stated that “the intersection of race and property [is] a central construct in understanding a critical race theoretical approach to education”. While the fundamental belief in critical race theory (CRT) is to “recognize the experiential knowledge of people of color” (Matsuda 2018), education scholars hold the same belief and have recognized further aspects of CRT that align with education equity goals (Ladson-Billings 1998; Dixson and Rousseau 2005), such as CRT’s grounding in historical and contextual analysis, its challenges to mainstream notions of neutrality, objectivity, color-blindness, and merit, and its valuing of the opinions of people of color (Crenshaw et al. 1995; Matsuda 2018).

Gender equity in education has also been a subject of national debate as shown when the American Association of University Women (AAUW) published The AAUW Report: How Schools Shortchange Girls (Bailey et al. 1992), which marks a series of efforts to support gender equity through the introduction of topics such as race and gender on campus and girls in science and technology (Corbett et al. 2008). Gender disparities have been shown to play a role in students’ academic performances, school experiences, education outcome, and barriers while achieving their educational goals (Buchmann et al. 2008; McWhirter 1997; Grossman and Grossman 1994).

As learning analytics aims to support teaching and learning for all students (Diaz and Brown 2012), discussions arise around the fair use of learning analytics (Prinsloo and Slade 2014a; Roberts et al. 2017) and predictive models (Dwork et al. 2018; Friedler et al. 2019; Liu et al. 2018; Gardner et al. 2019) due to the potential biases and lack of impartiality in such algorithms (Cofone 2018; Richardson et al. 2019); this could lead to inaccurate modeling for populations that are not well represented (Li et al. 2019; Ocumpaugh et al. 2014). When coupled with the fact that minorities are already less likely to consent in numerous contexts as we described in “Privacy and Ethical Issues in Learning Analytics”, it becomes crucial to understand how characteristics such as gender and race affect students’ consent propensity in the context of learning analytics to avoid developing models that inadvertently widen disparities.

Sociocultural Orientations in Education

Learning science research has established that students’ academic performance is related to factors such as their personality traits (Zhou 2015), cultural background (Niles 1995), and competitiveness and cooperativeness (Baumann and Harvey 2018). An individual’s competitiveness and cooperativeness is part of their social interdependence orientation (Johnson et al. 1998; Johnson and Norem-Hebeisen 1979), and such characteristics are associated with one’s gender, cognitive and social development (e.g., perception and response in group settings), attitudes toward the educational institution and relevant people in that environment (e.g., other students and teachers), and perspective-taking ability (Madsen 1967; Johnson and Engelhard 1992). More specifically, positive social interdependence (cooperation) is established when individuals in a group share common goals and their collective actions affect the group’s outcomes (Johnson and Johnson 1991; Deutsch 1949). In other words, people cooperate when they realize that they would not accomplish the goal without everyone working towards it (Johnson et al. 1998; Johnson and Johnson 1991). Relatedly, people’s gender (Ramos et al. 2004; Pirzada et al. 2004) and perceived contributions of their consent to research benefits (Kim et al. 2017) (as an example of perspective-taking) have been shown to affect consent. We therefore explore whether students’ competitiveness/cooperativeness, as a representation of their various underlying cognitive and social developments, is a factor influencing their willingness to consent, as well as their perspectives on data collection and use.

Furthermore, the cultural aspect of social orientations can reflect an individual’s decision-making considerations and motivation to succeed (Johnson and Engelhard 1992; Triandis 2018). Among the different dimensions of cultural measurement, individualism-collectivism (IND-COL) has been the most studied (Hofstede 1984; Cozma 2011). IND-COL can be a key characteristic of an individual’s racial identity (Nobles 2006), influences how people prioritize personal goals versus group goals (Schwartz 1990; Yamaguchi 1994), and can be used as a framework to analyze whether one feels connected to and responsible for the group they belong to (e.g., students’ perception of their roles and responsibilities as students) (Taylor and Moghaddam 1994; Triandis et al. 1988). Carson (2009) also identified that collectivism is reflected in students’ beliefs about the purpose of education and the way they evaluate academic success. As discussed in “Privacy and Ethical Issues in Learning Analytics”, consent is closely related to one’s agency, and student agency relies on students being considered collaborators in learning analytics. Thus, we investigate whether there is a relationship between students’ sense of responsibility to contribute their data (as a form of collectivism) and their consent practices.

Methods

Our study investigated two primary research questions: (RQ1) What are students’ perspectives on their educational data being used by learning analytics systems in the form of predictive models? and (RQ2) What are the population characteristics of students who indicate they would consent to or opt out of participating in such uses? In order to investigate these questions, we distributed an email-based preference elicitation prompt to students asking them whether or not they would hypothetically agree to have their data used in learning analytics systems. Upon selecting either yes or no to indicate their consent preference, students were redirected to an online survey that asked about the rationale behind their consent indication and their perspectives regarding their data being used for learning analytics in different contexts and by different stakeholders. We further elicited relevant personality characteristics and attitudes that might impact students’ propensity to consent. Responses were then linked with institutional demographic data to identify correlations with consent. This study design is summarized in Fig. 1.

Fig. 1
figure 1

Our study consisted of an email prompt sent to students that included links to the online survey, which consisted of multiple components. Email and survey responses were linked to institutional student records. The analysis methods are also shown with the corresponding data needed for each approach in order to address our research questions

The study team comes from an interdisciplinary background and has a variety of experiences with student data. Dr. Brooks, for instance, has been a part of the institutional stewardship chain for student data related to learning technologies, which is adjacent to the data we collected. In addition, Dr. Schaub has been involved in institutional processes related to privacy and learning analytics, and the whole study team has been involved in student modeling and educational data science research at the institution including qualitative and quantitative approaches in the past. Next, we explain each part of the study design in greater detail. Our study has been approved by our Institutional Review Board.

Measuring Privacy Perceptions, Personal Traits, and Decision to Consent

We sent a one-question email prompt to a stratified student sample to understand students’ consent decisions regarding their data being used by learning analytics. Li et al. (2019) found that the use of either “opt-out” or “opt-in” wording leads to different response rates from participants. Thus, we prepared two variants of the email prompt, shown in Fig. 2.

Fig. 2
figure 2

The email for the opt out condition is on the left and the email for the opt in condition is on the right. They are identical except for the last paragraph and consent options. Each student was sent only one of these two versions

Students only saw one framing and we conducted pilot testing to ensure that the wording did not lead to confusion. Once a student clicked either the yes or no link, the response was logged with an identifier to link it with their corresponding institutional demographic records. Identifiers were subsequently discarded before analysis. Regardless of response, respondents were then directed to a debrief that explained the purpose of the study, an informed consent form, and an invitation to participate in an optional online survey. Participants who completed the online survey were compensated $5.

The email prompt allowed us to ascertain propensities for students’ consent to learning analytics data use. Our online survey further explored why such decisions were made. Note that we intentionally used a broad consent message in order to study the factors and pre-conceived notions about learning analytics that influence students’ consent decision. We are not advocating for this prompt as an exemplar for broadly soliciting data consent decisions on live systems.

For the survey questions (see Appendix A for the full survey instrument), we iteratively refined the wording to minimize misinterpretation and pilot-tested the questions with a group of about 10 undergraduate and graduate students working on privacy-related and educational technology research. While the survey contains multiple scales, most questions were Likert scale items that did not require significant cognitive load to process. We also provided fair compensation based on the average completion time of 15 min, which we do not consider to be excessive, though it is plausible that some participants exited due to length. As shown in Table 1, of the 272 people who clicked one of the options in the email prompt, 150 consented and started the survey, of whom 116 completed it, i.e., a survey completion rate of 43%.

Table 1 Response rates per condition

At the beginning of the survey, participants were asked in three open-response questions to describe the important factors that affected their consent decision, perceived benefits of student data being used by learning analytics systems, and concerns with such data use. Next, we asked participants to rate their level of comfort on a seven-point scale with their educational data being used in five scenarios by different stakeholders for different purposes (e.g., “help instructors gain insights about students’ engagement”).

We further assessed students’ level of competitiveness and cooperativeness in the educational setting using the Social Interdependence Scale (Johnson and Norem-Hebeisen 1979) as such characteristics are associated with one’s attitudes toward the educational institution, the relevant people in that environment (e.g., other students, teachers), and perspective-taking ability (Madsen 1967; Johnson and Engelhard 1992). We aimed to explore if students’ competitiveness/cooperativeness as a representation of their various underlying attitudes would be a factor influencing their willingness to consent.

Given that students’ trust in the institution has been shown to be a fundamental factor influencing students’ learning experience (Van Maele et al. 2014), we wanted to understand whether students’ institutional trust might impact their consent propensity. We used Ghosh et al.’s trust scale (2001) that defines trust as students’ confidence in the institution’s ability to support students achieving learning and career goals.

Students in our institution come from diverse cultural backgrounds. We hypothesized that students’ sense of responsibility to contribute their data (as a form of collectivism) could be a potential factor affecting their consent practice. Thus, we used the Horizontal and Vertical Individualism and Collectivism measurement scale (Triandis and Gelfand 1998) to evaluate horizontal individualism, vertical individualism, horizontal collectivism, and vertical collectivism. Since prior work has found that students have privacy concerns regarding learning analytics (Pardo and Siemens 2014; Picciano 2012), we also included the Internet Users’ Information Privacy Concerns (IUIPC) scale (Malhotra et al. 2004).

Finally, we asked demographic questions, including gender, ethnicity, first-generation college student status, and year of study, in order to understand key factors in the decision to consent with regard to demographic characteristics. While we had access to institutional demographic data for participants, which we also used in our analysis, this self-reported demographic information allowed students to self-identify gender, including non-binary gender options. We further asked participants to specify their country of origin. However, because not all respondents to the email prompt completed the survey, we used institutional records for ethnicity and gender in our statistical analysis. For year of study, we used students’ self-reported class standing.

Recruitment & Participants

As the data is collected from a single institution, we briefly describe the University of Michigan (UM) to help contextualize the work for others seeking to apply our results. Demographic statistics are obtained from the most recent figures (University of Michigan AA 2020) published by the Office of Diversity, Equity, and Inclusion (DEI). The student body is skewed towards higher socioeconomic status; the gender composition is balanced, with 50.6% identifying as men, 48.3% identifying as women, and 1.1% as transgender or gender non-conforming. UM is a large four-year, primarily residential, majority undergraduate, full-time, more selective university with lower transfer-in rates and very high research activity (Carnegie Classification IHE 2017). Out of approximately 46,000 students, the mean age is 22.7, with 7.9% coming from backgrounds where neither parent nor guardian has attended college. 75.0% of students were born in the US, and the ethnic composition is as follows: 4.3% African-American or Black, 24.2% Asian-American or Asian, 6.3% Hispanic or Latinx, 1.7% Middle Eastern or North African, 0.1% Native American or Alaskan Native, 57.9% White, 1.0% Other; 4.6% specified one or more of the previous categories.

The institution has a history of advancing DEI, and its stance is that DEI is key to individual flourishing, educational excellence, and the advancement of knowledge (University of Michigan AA n.d.). In 2017, the university established the Learning Analytics Guiding Principles (University of Michigan AA 2017), which define learning analytics and set respect, transparency, accountability, empowerment, and continuous consideration as UM’s core tenets for research in this field. The Center for Academic Innovation (University of Michigan AA 2020) also develops projects to extend academic excellence and provide sustainable solutions to advance learning, facilitate problem solving, foster equity and inclusivity, and increase access and affordability.

Students were recruited based on specific demographic characteristics in the institutional database containing students’ academic records and demographic details. For each email variant (opt-in versus opt-out wording condition), we recruited 2,000 students, with each student receiving only one version of the email (4,000 emails sent in total). Each sample of 2,000 students was selected using a disproportionate sampling method in order to ensure a balanced data set. The population was first divided into 5 strata based upon the ethnic categories listed in the institutional data (White/Caucasian, Asian, Black/African, Hispanic/Latinx, and Other, which included those who indicated two or more ethnicities, Hawaiians, and Native Americans). Each stratum was also balanced with respect to gender.Footnote 1 This meant that each ethnicity-gender group had n = 552 participants, with the exception of Black/African students (n = 342 for both males and females) due to scarcity.
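To make the sampling procedure concrete, the following is a minimal Python sketch of disproportionate stratified sampling with pandas. The `roster` frame, its `ethnicity` and `gender` columns, and the per-stratum quotas are illustrative assumptions, not the study’s actual data structures.

```python
import pandas as pd

def stratified_sample(roster: pd.DataFrame, per_stratum: dict,
                      seed: int = 0) -> pd.DataFrame:
    """Draw a fixed number of students from each ethnicity-gender stratum."""
    samples = []
    for (ethnicity, gender), group in roster.groupby(["ethnicity", "gender"]):
        # Scarce strata (e.g., Black/African students) receive a smaller quota.
        n = min(per_stratum.get(ethnicity, 0), len(group))
        samples.append(group.sample(n=n, random_state=seed))
    return pd.concat(samples, ignore_index=True)
```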

Quantitative Analysis: Identifying Factors in Consent Decisions

We used logistic regression to control for the various factors outlined and to identify which considerations are most important in students’ decision to consent. For each scale used, we computed a composite score from the items within each of its subscales. This procedure compressed the number of survey scale features to 24. Note that the five comfort rating questions were used as-is (a discrete value from 1 through 7, inclusive). Full details of this analysis method along with the models are found in the accompanying computer code at https://osf.io/sg4rk/.
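As a rough illustration of this step, the sketch below averages Likert items into one composite per subscale (one common way to form composites) and fits the logit model with statsmodels. The subscale-to-item mapping and the `survey`/`consented` variables are hypothetical stand-ins; the released code at the OSF link is authoritative.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical item names; the actual instrument is in Appendix A.
subscales = {
    "institutional_trust": ["trust_1", "trust_2", "trust_3"],
    "iuipc_collection": ["collect_1", "collect_2", "collect_3", "collect_4"],
    # ... remaining subscales, yielding 24 composite features in total
}

def composite_scores(survey: pd.DataFrame) -> pd.DataFrame:
    """Average the Likert items within each subscale into one composite."""
    return pd.DataFrame({name: survey[items].mean(axis=1)
                         for name, items in subscales.items()})

X = sm.add_constant(composite_scores(survey))  # survey: item-level responses
logit = sm.Logit(consented, X).fit()           # consented: 1 = consent, 0 = deny
print(logit.summary())
```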

Ensuring Data Quality and Correctness

To minimize errors due to low-quality answers, we checked survey responses for speeding and straightlining. Manual review of particularly fast and slow responses revealed no anomalies. Therefore, we chose to keep all responses.

We identified outliers and influential points by plotting Studentized residuals and Cook’s distance for each observation, using an absolute value > 2 and 4/(N − k − 1) as thresholds respectively, where N is the total number of observations and k is the number of explanatory variables. Studentized residuals are the residuals divided by estimates of their standard deviation, while Cook’s distance summarizes the effect of removing an observation on the fitted response values. This resulted in 15 flagged points. Manual inspection made it evident that 2 participants had selected the wrong option, either by accident or due to a misunderstanding of the prompt. For instance, one explicitly stated that, “I misread the choices. As it said yes I assumed it meant to opt-in, not ‘yes, I would opt out.’ ”; such answers were corrected. The remaining flagged items did not reveal any other evidently concerning issues. Removing all of these points results in quasi-complete separation and a large shift in the coefficients. Thus, it may be the case that those who denied use of their data were considered “unusual” solely because the overwhelming majority of students consented to data use for learning analytics; 15 of only 25 respondents who did not consent are in this list. We chose to retain these points in the model as they represent important perspectives to consider.
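A sketch of these influence diagnostics, assuming the design matrix `X` and outcome `y` from above, might use statsmodels’ GLM influence measures (fitting the logit as a binomial GLM to access them):

```python
import statsmodels.api as sm

# Fit the logit as a binomial GLM so influence diagnostics are available.
glm = sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial()).fit()
influence = glm.get_influence()

N, k = X.shape
studentized = influence.resid_studentized   # flag |residual| > 2
cooks_d = influence.cooks_distance[0]       # flag D > 4 / (N - k - 1)

flagged = (abs(studentized) > 2) | (cooks_d > 4 / (N - k - 1))
print(X.index[flagged].tolist())            # observations for manual review
```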

Model Fit and Feature Selection

We fit a logit regression model using maximum likelihood estimation. The input variables include the 24 survey features, one for each subscale. The binary outcome variable was whether a student consented to or denied the use of their data. Because we are interested in understanding specific factors, a feature selection process was used to prune the list of inputs into a smaller subset. This helps ensure that the significance values used to make these determinations are reliable, that confidence intervals on regression coefficients are sufficiently narrow, and that violations of the linearity assumptions are addressed.

We used the variance inflation factor (VIF) as a gauge for multicollinearity and note that a number of features had a VIF above 5, indicating a problematic amount of collinearity. This is not necessarily surprising, given that it is plausible to expect that some of the measured concepts will be correlated with each other, particularly since we constructed composite scores based on subscales of an overarching latent trait. We alleviated this by conducting feature selection with recursive feature elimination (RFE) with 20-fold cross validation, which removes features iteratively based on feature importance, as well as backwards elimination (BE) with a threshold set at p < 0.05, which removes features in accordance with the highest p-values. We then chose the features common to both pruned models with p < 0.05.
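The following sketch shows one way to implement this pipeline with statsmodels and scikit-learn, again assuming `X` (the 24 composites) and `y` (consent) from earlier; exact estimator settings in the released code may differ.

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# 1) VIF screen: values above 5 indicate problematic collinearity.
exog = sm.add_constant(X)
vif = {col: variance_inflation_factor(exog.values, i)
       for i, col in enumerate(exog.columns) if col != "const"}

# 2) Recursive feature elimination with 20-fold cross validation.
rfe = RFECV(LogisticRegression(max_iter=1000), cv=20).fit(X, y)
kept_by_rfe = set(X.columns[rfe.support_])

# 3) Backwards elimination: repeatedly drop the highest-p feature
#    until every remaining coefficient satisfies p < 0.05.
kept_by_be = list(X.columns)
while True:
    fit = sm.Logit(y, sm.add_constant(X[kept_by_be])).fit(disp=0)
    pvals = fit.pvalues.drop("const")
    if pvals.max() < 0.05:
        break
    kept_by_be.remove(pvals.idxmax())

# Retain the features common to both pruned models.
selected = [f for f in kept_by_be if f in kept_by_rfe]
```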

Quantitative Analysis: Understanding Relationships Between Key Consent Factors and Demographics

All categorical demographic variables (year of study, gender, and ethnicity) were one-hot encoded into N − 1 dichotomous variables, where N is the number of categories. We chose the category with the greatest population within each variable as the reference to exclude from the linear regression model. Therefore, an input was created for each demographic listed in Table 4 with the exception of “White”, “Sophomore (2nd Year)”, “Not First Generation”, and “Male”, which together formed our comparison group, resulting in a total of 8 one-hot encoded columns. The input variables are these 8 columns, while the target variable is the composite score for each of the Nf key factors identified using the feature selection process described in “Model Fit and Feature Selection”. This results in a total of Nf separate ordinary least squares models, one for each target variable.
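A minimal sketch of this encoding and the per-factor OLS models follows; the frame names `demo` (institutional demographics) and `scores` (the key-factor composites), as well as the column labels, are hypothetical.

```python
import pandas as pd
import statsmodels.api as sm

# Reference (excluded) category per demographic variable.
reference = {"ethnicity": "White", "year": "Sophomore (2nd Year)",
             "first_gen": "Not First Generation", "gender": "Male"}

dummies = pd.get_dummies(demo[["ethnicity", "year", "first_gen", "gender"]])
dummies = dummies.drop(columns=[f"{var}_{ref}" for var, ref in reference.items()])

# One ordinary least squares model per key factor.
for factor in ["institutional_trust", "iuipc_collection", "instructor_comfort"]:
    ols = sm.OLS(scores[factor], sm.add_constant(dummies.astype(float))).fit()
    print(factor, ols.summary())
```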

Diagnostics were conducted for each of these models. There were no indications of collinearity. Plotting the residuals against fitted values did not suggest any egregious outliers, nonlinear behavior, or major concerns regarding heteroscedasticity, which may deflate p-values due to increased variance that is unaccounted for in the model. Thus, we are reasonably confident in our coefficients and statistical conclusions, described in “Findings”.

Qualitative Analysis

To analyze open responses to survey questions, we engaged in successive rounds of open coding, in which one researcher went through all responses and developed an initial codebook (Saldaña 2015), followed by iterative codebook refinement by two of the authors independently coding a subset of responses and then jointly reconciling disagreement. After four iterations, high inter-rater reliability (Cohen’s κ = .77) was achieved. One researcher then used the final codebook to recode all responses. The final codebook consisted of 15 themes with 29 unique codes, see Appendix B.
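For reference, the reported inter-rater agreement corresponds to a computation along these lines, where `coder_a` and `coder_b` are hypothetical vectors of the codes each rater assigned to the same subset of responses:

```python
from sklearn.metrics import cohen_kappa_score

# Codes assigned independently by the two authors to the same responses.
kappa = cohen_kappa_score(coder_a, coder_b)  # the study converged at kappa = .77
```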

Findings

Beginning with the quantitative analysis, we present statistics about response rates, then show the key factors from the scale items underlying willingness to consent and how these correlate with demographics according to our regression models. The qualitative analysis is then presented based on students’ answers to the open-ended survey questions, laying out self-reported factors, benefits, and concerns regarding students’ views on data sharing, organized by subpopulations of students making similar statements.

Quantitative Analysis Findings

We break down our discussion of the quantitative analysis into three parts: statistics regarding participation rates during our initial email engagement with students, results regarding our logistic regression model used to identify primary factors underlying students’ decisions to share data (RQ1), and results from the linear regression models, which explain demographic correlations with each identified factor of importance (RQ2). We find both a gender gap and an ethnicity gap in response rates. The key factors identified behind the decisions of students who did respond were trust in the institution, level of general concern regarding individual data collection, and comfort with instructor use of data for classroom engagement. Institutional trust was generally higher for female students and lower for students who identify as Black, while data collection concerns and comfort with instructor data use were higher for females when compared to males.

Response to Email Prompt

Table 1 describes response rates split by email wording condition. Despite the low overall response rate of 6.8%, we find that, generally speaking, most people (72.4%) consent to data usage when they do respond. We do not find any effect on participation rates (link clicks) between the opt-in and opt-out conditions. While the consent rate is somewhat lower for the opt-out condition, a two-tailed test for proportions shows no statistically significant difference between the conditions (p = 0.39) when considering only those who made a selection. Therefore, for the remaining analysis, we combine the opt-in and opt-out conditions and look only at the aggregate data, given that the differences are negligible. This confirms the result of Li et al. (2019) that wording has no effect on participation rate, but contrasts with their finding of a difference in consent rate between conditions.
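The two-tailed test for proportions referenced above can be reproduced along these lines; the counts below are placeholders standing in for the actual values in Table 1.

```python
from statsmodels.stats.proportion import proportions_ztest

consent_counts = [100, 97]  # hypothetical consents: opt-in, opt-out
respondents = [136, 136]    # hypothetical respondents per condition
stat, p_value = proportions_ztest(consent_counts, respondents,
                                  alternative="two-sided")
```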

We also decompose click rates to analyze participation by subpopulation, such as ethnicity and gender (see Table 2). There is a significant difference in the number of clicks, or engagement, between male (106) and female (166) participants, despite gender-based stratification in recruitment. Among those who did respond, however, the consent rates do not deviate from expectation. A chi-squared test indicates that gender is independent of the consent rate, but this is not the case for the number of clicks (χ2 = 13.24, p = 0.0003, Cramer’s V = 0.22); there is a moderate association.
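A sketch of the chi-squared test and Cramér’s V for the gender comparison is below. The 106/166 click counts come from the text; the 2,000-per-gender denominators follow from the balanced recruitment and are an assumption of this sketch, so the resulting statistics may differ slightly from those reported.

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[106, 2000 - 106],   # male: clicked, did not click
                  [166, 2000 - 166]])  # female: clicked, did not click
chi2, p, dof, expected = chi2_contingency(table)
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
```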

Table 2 The number of link clicks and the number of people who consented to data use

A similar case holds for ethnicity: engagement differs quite drastically between subpopulations, especially when compared with the expected number of link clicks. While Asian and Hispanic respondents’ answers align with expectation (percent deviations of −4% and −1%, respectively), there is a notable overrepresentation of responses from those identifying as White (by +31%) and an underrepresentation of responses from those identifying as Black (by −40%). Once again, we find that ethnicity is not independent of the click rate (χ2 = 14.32, p = 0.002, Cramer’s V = 0.13), a medium effect size, whereas there is no such relationship with consent.

Given the aforementioned discrepancy, we ran a more specific test to see whether there are true differences in the proportion of those who click on an answer within these subpopulations. Namely, we divide the sample into those who identify as Black and non-Black (case 1), and those who identify as White and non-White (case 2). The sample statistic in case 1 is −0.03, with a 95% confidence interval (CI) of [−0.053, −0.012], corresponding to p = 0.002. For case 2, the sample statistic is 0.028, with a 95% CI of [0.011, 0.046] and p = 0.0015. Therefore, Black students participate less than those who are not Black, and White students participate more than non-White students.
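These intervals are consistent with a standard normal-approximation (Wald) interval for a difference in proportions, sketched below; the subgroup counts themselves are not reproduced here.

```python
import numpy as np

def diff_prop_ci(x1, n1, x2, n2, z=1.96):
    """Wald 95% CI for the difference in proportions p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, (diff - z * se, diff + z * se)
```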

Identifying Primary Factors in Participation

We explore the reasons behind the differential engagement by subgroup and address RQ1 by identifying the key factors that led to students’ decision to consent or deny use of their data, for those who did respond. We fit a logit model where the input variables are the factors impacting students’ willingness to consent and the binary outcome variable is whether a student consented to or denied the use of their data. Since the goal is to identify critical factors, our focus is not on achieving the highest predictive accuracy, and we note that this logit model is not and should not be used to generate predictions of consent without further error analysis, as minority subgroup classifications may be unreliable and skewed towards the majority class distribution for imbalanced datasets. We further note that the feature selection process described below is based on coefficient significance values that are decoupled from predictions or measures of fit.

Our final model was obtained by conducting feature selection using two techniques: recursive feature elimination (RFE) with 20-fold cross validation and backwards elimination (BE) with a threshold set at p < 0.05, as described in “Model Fit and Feature Selection”. As BE yielded a subset of the features retained by RFE, we fit our model with the set from the more stringent standard and focus our discussion on the results of BE. Specifically, BE indicates a subset of three factors that have an effect on the response variable, and we consider the following to be impactful in students’ decision to consent: trust in the institution, concern about the amount of personal data collected, and comfort with instructor use of data for instructional purposes. Table 3 shows summary statistics for the final model.

Table 3 Summary of our logit model with three factors: institutional trust, the “Collection” subscale from IUIPC, and the self-developed stakeholder question regarding comfort with instructor use of data

The odds ratios for each of these key factors may be interpreted as a percent change in the odds of consenting to data usage given a one-point change in the corresponding subscale. Since all of the items in these three subscales are based on a 7-point Likert scale, a one-point increase in institutional trust means that a student’s odds of consenting increase by 132%. A one-point increase in comfort with instructor data use leads to a 212% jump, while a one-point increase in concern regarding data collection drops the odds of consenting by 78%.
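For readers translating between the regression output in Table 3 and the percentages above, the conversion from a logit coefficient β to an odds ratio and a percent change in odds is the standard one:

\[
\mathrm{OR} = e^{\beta}, \qquad \Delta\% = \left(e^{\beta} - 1\right) \times 100
\]

The reported +132%, +212%, and −78% changes thus correspond to odds ratios of approximately 2.32, 3.12, and 0.22 (coefficients of roughly 0.84, 1.14, and −1.51).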

Demographic Correlations with Key Factors in Consent

For each of the three key factors identified—institutional trust, data collection concern, and comfort with instructor data use—we then analyze correlations with demographic characteristics. First, we provide summary demographic statistics for those who completed the survey in Table 4. Note that, similar to the email engagement findings, demographic discrepancies are also present in the survey completion rates: students who identified as White or female are overrepresented, while students who identified as Black are underrepresented.

Table 4 Demographic counts for the number of students within each category for ethnicity, year of study, gender, and self-reported first-generation college student status

To understand the potential reasons behind these differences, we address RQ2 and identify correlations between demographics and influences impacting willingness to consent by running one linear regression model per key factor. The summary statistics for each model, where the outcome variables are the corresponding subscale scores, are tabulated in Table 5. For institutional trust, we find that identifying as female corresponds to higher levels of trust relative to males. The opposite is true for certain ethnicities: Black students are less trusting of the institution when compared to White students. Female students also correlate with having more data collection concerns, as measured by the IUIPC collection subscale, while having greater comfort with instructor use of data for course purposes. Finally, the collection score has an inverse relationship with those identifying as Asian (that is, students who identify as Asian may not be as concerned with personal data being collected), although this is a weaker claim given the larger p-value (p = 0.15).

Table 5 Summary statistics for each of three linear regression models with different target variables: institutional trust, IUIPC Collection subscale, and the stakeholder question for comfort regarding instructor data use

Qualitative Analysis Findings

Our analysis of students’ responses to the open-ended survey questions revealed more nuanced student perspectives on data collection and the factors influencing their willingness to consent. First, we found that students recognized that allowing their data to be used in learning analytics systems can contribute to improving education, support new research, and positively impact other students, while also expressing concerns about data privacy, data collection, and ambiguity around data usage. Second, based on the patterns identified through our statistical analysis, we report corresponding qualitative findings that provide insight into students’ rationales behind those patterns. Namely, students who commented on trusting the institution all consented, while those who said they distrust the institution or the researchers denied consent. Students had varying views on data collection depending on the context of use and the stakeholders involved. We further probed students’ privacy perceptions and found diverse connections. Finally, students’ responses revealed that they considered instructors to be key stakeholders and users of student data.

Reported Important Factors in Consent Decision

Among the 116 responses regarding important factors that affected students’ consent decision, 92 came from students who consented and 24 from students who denied consent. For the 92 students who consented, we identifiedFootnote 2 19 factors that affected their decision to consent (see Table 6). 30% of students who consented valued that their data would contribute to the improvement of education, learning, and teaching. A fifth of the students stated their support for allowing student data to be used for advancing understanding and research insights on student learning behaviors, teaching methods, etc., and 20% expressed willingness to contribute their student data if it could help other students or future generations learn. Some students (17.4%) pointed out the importance of supporting research, indicating research was a factor that led them to consent. Around 16% mentioned some level of privacy consideration when deciding to consent, such as valuing privacy in general, assuming that student data privacy is guaranteed by default, or indicating privacy concerns while still consenting. Two further factors, each valued by 14.1% of students, were that student data improves learning analytics systems and that allowing student data to be used is a purposeful and meaningful act. For instance, one student said that “thinking about the greater good influenced my decision. If my student data can help improve quality of education overall, I would support its use.” Roughly 14% had a neutral response to the use of student data, for example: “I have nothing to lose when giving my data,” while 12% did not identify any concerns. 12% of students believed that contributing to student data use is important to ensure data completeness and accuracy.

Table 6 Decision factors reported by students who consented

Of the 24 students who denied consent (see Table 7), 63% expressed privacy concerns (e.g., data breach, uncomfortable sharing student data), and 20% expressed concern over the lack of transparency regarding how data is collected and used, and by whom. 16.7% mentioned the lack of proper compensation for the use of their student data. 12.5% were concerned about potential negative impacts on themselves, such as “I would be worried about my academic data being used in a way that negatively affects me.” A few students denied consent due to their distrust in the institution or the researchers. One student noted that use of student data could harm marginalized groups, and one student noted a lack of agency and control regarding student data use.

Table 7 Distribution of factors from students denied consent

Perceived Benefits of Data Use in Learning Analytics

Students also identified benefits of using student data for learning analytics regardless of their consent practice (see Table 8). The top three benefits mentioned by all students were the value of contributing data to improve education (52%), supporting research (38%), and positive impact on other and future students (35%). 21.7% mentioned wanting to ensure data completeness and accurate analysis. 14% thought they might be positively affected, for instance: “I think it’s important that my data be used to better improve and optimize our learning environments, which will benefit not only me, but the students that will be coming after me.” Eight students (7%) supported using student data to improve learning analytics systems. Students also recognized that student data use could positively impact the university (8.7%), faculty and staff (7%), and “others” without specifying who (5.2%).

Table 8 Distributions of benefits in using student data

Perceived Concerns Regarding Data Use in Learning Analytics

In terms of concerns (see Table 9), over 60% mentioned privacy and security concerns, such as “I’m a tad concerned that my data could be leaked to the general public. I like some privacy, so I don’t really want everyone to have unfettered access to all my student data.” 21% worried about inaccurate data interpretation. Data leakage or theft concerned 15 students (13%). 11% worried about a lack of transparency. For instance, “I am concerned about my privacy, who has access to my student data, and the real purposes for which it is being used (i.e. more than just for optimizing learning for the future).” While 11% wrote “no concerns,” another 10% pointed out negative impacts for them, such as “I feel it would violate my privacy or that it might be used against me in some way.” Some worried about data confidentiality (6.8%) or insufficient student agency and control (5.1%), and one student did not trust the institution to use the data responsibly.

Table 9 Distributions of concerns in using student data

Students’ Trust & Distrust in Institutions

Our quantitative analysis revealed institutional trust to be a significant predictor of students’ consent. Among students who consented, four (3 females: 1 Asian, 1 White, 1 prefer not to disclose; 1 male: Black) explicitly expressed trust in the institution, mentioning its reputation, accountability, and research methods (e.g., to use data properly). Another student trusted that researchers would be “handling my data appropriately and not abuse it.” In contrast, two students who denied consent (1 male, 1 non-binary; both White) expressed distrust in the institution. One of them did not “trust the university to use this data in a way that won’t hurt marginalized students.” The second did not trust that “data wouldn’t be used for commercial purposes.” Additionally, two Black students who denied consent (1 male, 1 female) distrusted researchers because it was unclear how the researchers might use student data, and because of the related privacy risks and potential harms.

Our quantitative analysis further showed that Black students generally tended to trust the institution less. Of the nine Black students who completed the online survey (6 females, 3 males), 7 consented and 2 denied (1 male, 1 female). One student who denied consent expressed distrust: “I’m not sure what they’re using the data for and I don’t trust it.” In contrast, most of the 7 consenting students focused on benefits of student data use such as “potentially help another student in the future,” “be more informative and beneficial on teaching/learning methods and tools than self-report,” and “the university and other students can improve from analyzing my student data.”

Student Perspectives on Data Collection

As our quantitative analysis further showed that students’ propensity to consent to learning analytics data use is negatively correlated with their concerns about data collection, we analyzed all open-ended responses that explicitly mentioned data collection. Nine female students and one male student, from a range of ethnic backgrounds, commented on data collection (comments did not differ based on gender or ethnicity). Two female students who did not consent (1 Asian, 1 White) cited discomfort with sharing personal data as the reason. One of them also expressed privacy concerns: “I am not sure which information about me is being collected and analyzed and how will that information be applied to optimize learning...who would get access to my information and to what extent.” The other 8 students, who all consented, expressed mixed attitudes towards data collection. The majority of them supported data collection for better understanding students’ performance, more accurate and representative results, research, and improving student learning. This suggests that students who emphasize benefits of data collection are more inclined to consent.

We further looked at students’ responses mentioning privacy to shed further light on their attitudes toward data collection. 65 students (41 consented, 24 denied) mentioned privacy concerns in 92 individual responses, 72 of which expressed data collection concerns. Notably, there were no distinctive differences between answers from students who consented and those who did not. 33 of these students (20 consented: 16 females, 4 males; 13 denied: 8 females, 1 non-binary, 4 males), with diverse ethnic backgrounds (consented: 2 Black, 5 Hispanic, 6 Asian, 6 White, 1 American Indian or Alaska Native and Black; denied: 2 Black, 1 Hispanic, 3 Asian, 6 White, 1 Middle Eastern or North African), stated concerns and uncertainty about potential data misuse and the lack of transparency regarding data collection, data access, and data sharing. These students further expressed concerns about possible abuse or misuse of student data, such as by researchers, “the system”, for marketing, or to sell information. For instance, one student stated: “what would happen after my data is used for its primary purpose? Does it just sit in a database, available to anyone for other use without my knowing or consent? Does it get deleted?”

Privacy concerns of 10 female and 5 male students (10 consented, 5 denied) focused on data security, leaks, and improper exposure. The 5 students who denied consent (2 White males, 1 Hispanic/Latinx female, 1 Asian male, 1 Middle Eastern or North African female) mainly worried about student data being compromised if the system is not secure enough; others noted how general privacy and security issues factored into their decision not to consent: “It feels like my data isn’t safe with anyone...how can I trust any group when every day there are news stories about major platforms/companies failing their users, intentionally or not?” The 10 consenting students also expressed concerns about potential data leakage, exposure, or hacking, and how extracted data could be “used against me” or “affect my future job opportunities.” Relatedly, 11 students (9 males and 3 females; 9 consented, 2 denied) noted risks of being identified. The students who denied (2 White males) stated that “the data won’t stay anonymous” or “more people would know my information,” which is similar to what those who consented said. However, we observed that students who consented tended to comment with a more trusting tone, assuming that collected student data would be aggregated and anonymized. This suggests that whether data is de-identified and handled properly affects students’ consent propensity.

We noted earlier that all students who distrusted the institution denied consent. Trust also came up in relation to privacy concerns. Six students (3 males: 2 White, 1 Black; 3 females: 2 Hispanic/Latinx, 1 White), of whom 2 consented and 4 denied, expressed distrust or discomfort regarding data collection, use, and access. The four denying students, 3 White and 1 Black, were uncomfortable with their information being gathered and known by different parties: “I would prefer to not have my information (i.e., classes, my learning tools) being gathered. I feel a little uncomfortable” and “I wouldn’t trust other people looking at my personal information.” Thus, students’ distrust and discomfort may also explain their reluctant attitude toward data collection, which is a driving factor influencing students’ consent propensity.

Views on Instructor Use of Student Data

Our quantitative analysis further revealed that students are more likely to consent to “help instructors gain insights about students’ engagement.” 9 students, of whom 7 consented (2 White females, 2 Black females, 1 Asian female, 1 Asian male, and 1 male who did not disclose ethnicity) and 2 denied (1 White male, 1 White female), noted benefits of instructors using student data. Some believed instructors can use student data to improve teaching methods and optimize the learning process, while others felt that it helps instructors better understand different types of students and provide more personalized support. It seems that students view instructors as key stakeholders and users of student data and are relatively more comfortable with such data use, which is positively related to their consent propensity.

Discussion

We identified three main factors that influence a student’s willingness to consent to learning analytics data use: degree of trust in the institution, concern regarding personal data collection, and comfort with instructors using data to gain insights on student engagement. We now discuss our findings’ contributions to the knowledge regarding students’ privacy perspectives and behaviors in an educational setting. First, we acknowledge limitations in extrapolating results due to our survey instrument. We then highlight how varying engagement rates may suggest that some students are not being well represented and what this implies for building AI systems and soliciting consent decisions. Next, we explore key factors and demographic trends in our findings, paying particular attention to the importance of instructors and discrepancies in trust between subpopulations. We end by discussing how institutional contexts factor into consent.

Survey Instrument Limitations

We acknowledge that an online survey is limited in its ability to identify the reasons underlying consent, especially for those who are already wary of sharing data online. However, we believe this data collection is a reasonably realistic (though by no means ideal) approximation of how asking for consent may be conducted at a university: namely, in an online fashion, likely via email or through a prompt in a particular learning platform. In our study, the consent message was not specific about the data collection purpose, so as to avoid priming participants. Ideally, purposes should be clearly specified and consent should be solicited for each specific purpose.

Our study’s ecological validity may be affected in that the views expressed by certain subgroups in this survey may not apply to all students in those subgroups. A similar argument may be made for the feature selection process: there may be other considerations that students find critical to their decision which are not reflected here. This is why we explicitly analyze relationships with ethnicity only after pruning the scope of potential consent-related factors. Still, these findings can help narrow the scope of future work to further pinpoint how specific interactions, such as instructor trust (discussed in The Role of Instructors in Key Factors for Consent), relate to engagement by subgroup. We also emphasize that while the survey used to identify the factors involved in students’ decision to consent was one component of our study (RQ1), a key aim for RQ2 was to empirically capture a consent process that could reasonably be undertaken, as described previously; not responding or merely clicking through is itself an important participation characteristic that the email prompt measures.

Nonetheless, since students answered their consent decision hypothetically, their responses might not align with their actual behavior and decisions in a real-world data sharing context. Mentioning a specific use case (e.g., an early-warning system used to assist student learning) or incorporating deception may influence students’ consent propensity or increase the response rate due to greater perceived relevance and urgency. That said, this must be weighed against ethical considerations and protocols, such as including a debrief immediately afterwards, and against the risk of diminishing some students’ trust in learning analytics research, even if the risks are deemed minimal.

The Difficulty Posed by Varying Engagement Rates

As shown in “Response to Email Prompt”, there is a difference in response rate by subpopulation. Namely, Black students responded to the email request at a significantly smaller percentage than expected, while responses from those identifying as White were overrepresented, even when we account for differences in the number of emails sent to each group. That such underrepresentation exists supports the well-established theories described in “Background and Related Work” and is not a surprise. However, we reiterate that institutions seeking input from students regarding data use may be receiving a biased sample – not only because there are minorities in the population, but also because those who are underrepresented are even less likely to respond to a survey.

With that said, it is possible that students chose to ignore the question due to a number of factors, such as the framing of the email, the perception that their decision does not matter, or other reasons that we cannot ascertain. However, the click rate still provides key additional context, since a non-participant is still someone on whose behalf the stakeholder would be making a decision: the data is either used or withheld. In other words, we make no claims that intention can be derived from non-responses or the click rate, but this metric can still be linked to demographics in order to bound the range of data that may be used if we were to treat those non-responses as decisions of consent or non-consent, for instance when training a machine-learned model.
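As a minimal sketch of this bounding idea (ours, for illustration only; the records and subgroup labels are hypothetical, not the study’s data), one could treat all non-responses first as denials and then as consents to obtain a per-subgroup range of usable data:

```python
# Illustrative sketch: bound the share of usable data per subgroup when
# non-responses are treated as all-deny (lower) vs. all-consent (upper).
from collections import Counter

# Hypothetical records: (subgroup, decision); None marks a non-response.
records = [
    ("White", "consent"), ("White", None), ("Black", None),
    ("Black", "deny"), ("Asian", "consent"), ("Asian", None),
]

totals = Counter(group for group, _ in records)
consented = Counter(group for group, decision in records if decision == "consent")
no_response = Counter(group for group, decision in records if decision is None)

for group in totals:
    lower = consented[group] / totals[group]
    upper = (consented[group] + no_response[group]) / totals[group]
    print(f"{group}: usable data in [{lower:.0%}, {upper:.0%}]")
```

The width of each interval then reflects how much a subgroup’s non-response rate, rather than its expressed preferences, drives uncertainty about data availability.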

Thus, to avoid potential erosion of trust and tension around data usage, it may be beneficial to further explore the addition of nudging indicators to move away from broad consent practices and lessen the gap in response rates in student elicitation surveys. For instance, a prompt might mention the specific data sources and stakeholders involved. This could help students understand the implications of their decisions, but may backfire if the details are overly complex and are thereby skipped or not comprehended. We could include a short paragraph explaining that it is invaluable for underrepresented students to provide their input to avoid biases in predictive models and to improve educational quality for all students; social framing has been shown to impact privacy decision-making (Coventry et al. 2016). This may be presented to a random sample of students, or shown only to minority students. Another option is varying levels of compensation based on the subpopulations most lacking in data (paying for data that is scarce), a more costly option that raises the broader issue of data ownership and value. It is important to note that the focus here is simply on getting people to indicate their consent preference either way, and not necessarily to nudge them to choose consent, though it is possible such changes could impact both response and consent rates (Utz et al. 2019). Consequently, depending on the intent of the actor, nudging may advance an institution’s goals while shifting the responsibility for bias onto students and countering their self-interests, so it is important to design prompts carefully.

The Role of Instructors in Key Factors for Consent

All three key factors we identified seem plausible and have intuitive explanations. It stands to reason that students who trust the institution and are more comfortable with instructors using data to improve classroom instruction would be more likely to share data, and that concerns regarding personal data collection would decrease the likelihood of doing so. However, of all the subscale measures and various stakeholders, instructor considerations are the most significant. This suggests that instructors may have significant influence on students’ willingness to share their data, perhaps even overshadowing broader concerns about general data collection or institutional practices. It is especially plausible that institutional trust and comfort with instructors are key consent factors that influence each other, as prior studies have shown that a sense of belonging to the university affects retention and engagement (Zepke and Leach 2010), among other factors, and that teacher-student relationships contribute to these feelings of rapport (Hagenauer and Volet 2014). Students likely have more opportunities to form close relationships with instructors and to interact with them more frequently, whereas “the institution” may be associated with administrators and other officials whose roles and direct impact are less easily ascertained.

Researchers are often required to provide consent opportunities to research subjects, but neither instructors nor institutions face such requirements or expectations. Learning analytics are increasingly embedded in the tools adopted by institutional actors such as instructors and advisors, and privacy-related decisions are made on behalf of students by institutions’ information officers and legal counsel. However, such norms may not reflect where institutions or society are heading. Trends in recent years, such as the GDPR and CCPA/CPRA, demonstrate rising interest in issues of privacy and consent. Questions regarding consent in the coming decade, such as the ones raised in this paper, are thus important to highlight. At UM, students regularly have to consent to data sharing when using certain external or third-party tools (e.g., Learning Tools Interoperability or LTI); it is not unfathomable that instructors may play a more direct role in consent practices in the near future.

Consequently, some suggestions for tangible interventions include having instructors provide more transparency regarding student data use in the classroom. For example, telling students what educational and demographic data is being collected, the purpose for its use, and who has access may shift students’ comfort with instructors’ use of data, thereby changing the likelihood of student consent. Similarly, having an instructor send an email prompt asking students to consent to or deny use of their data may elicit more agreement to share data than what may be seen as a unilateral institutional action. Future research on the relationship between specific instructors and student trust, and on whether there are correlations across degree programs and departments on campus, may yield context to better target such interventions.

Ethnicity and Gender Trust Gaps and The Role of Institutions

While it is more difficult to draw concrete conclusions connecting demographics with the key factors in willingness to consent reported in the online survey, due to lower sample sizes and overall power, some of the qualitative comments provide evidence supporting our quantitative results, such as the anticorrelation between institutional trust and identifying as Black. Yet, a fair number of Black students did indicate some level of trust in the institution in their open-ended responses. Discrepancies between genders displayed fairly strong signals for all three key factors. Specifically, we find that those who identify as female tend to trust the institution and instructors more, despite greater concerns regarding data collection as a whole. This may seem contradictory, though perhaps those who identify as female are hesitant in general but make an exception when data use is situated in an educational context, due to trust in the institution and/or the instructor. Therefore, conducting a confirmatory analysis to identify the deeper rationale behind these cost-benefit considerations would be beneficial.

Whether certain factors outweigh others may be identified through follow-up surveys and semi-structured interviews with specific groups of learners corresponding to the student characteristics found to correlate with key factors, such as Black students, female students, or students at the intersection of these identities. Questions about students’ personal experiences at their university, their relationships with instructors and other stakeholders, and their personal beliefs and attitudes around data collection specifically would provide insight into what agency students desire with respect to learning analytics. It may also uncover whether these decisions are based on firmly ingrained biases or actionable concerns, and help contribute to a more realistic model of student choices and their effects on predictive modeling in learning contexts.

Lastly, we want to differentiate between the restrictions of institutional ethics boards that oversee research study procedures and institutional data consent and collection practices. In the US, institutional review boards (IRBs) approve and monitor research involving human subjects, which includes this study. Title 45 of the Code of Federal Regulations (i.e., the Common Rule) contains provisions regarding how informed consent is obtained and additional protections for certain vulnerable populations. Even then, §46.104 and §46.116 list various exceptions where consent may be waived. Regardless, such regulations do not necessarily restrict what an institution can do; institutions wishing to engage students to gain consent for learning analytics tools or technologies do not need to do so under the banner of scholarly research.

There are many reasons why data may be collected or processed without the data subject’s explicit consent. The GDPR, for instance, recognizes six legal bases for data processing, of which consent is but one (Article 6). The regulation also provides derogations for processing for scientific research or statistical purposes (Article 89), thereby leaving situations that may be up to an institution’s discretion.

The question we explore with this study is what the effects would be if institutions did engage students for their consent to use their data. While good consent statements are transparent and meant to inform users of particular data-use practices, we made a deliberate decision to simplify the consent form, inspired by the form of today’s consent dialogues, in order to study the general act of consenting; this is similar to a click-through End-User License Agreement (EULA). There are strong arguments for requiring explicit consent or opt-in, especially when data processing is likely to be unexpected or surprising for the data subject (Rao et al. 2016; Schaub and Cranor 2020), such as a use of data that is not readily apparent from the transaction context (e.g., an LMS used to determine participation grades or which assignments to offer), but how such policies might influence trust and consent behaviors is unclear and left for future experiments.

Context Dependency and its Importance in Learning Analytics

Even with extensions to uncover detailed patterns of reasoning behind consent decisions, it is important to keep in mind that the dataset used in this study consists solely of records from a single major university in the United States with many initiatives to promote ethical innovation in learning analytics. Data collected at other institutions may have different underlying distributions and lead to distinct results with different conclusions; this is why we have provided contextual information in “Recruitment & Participants”. Conducting similar analyses at community colleges or in other educational settings may help generalize the results of this paper, especially as Li et al. (2019) demonstrated a need to reevaluate predictive models when their training sets are altered. A cross-cultural survey administered at institutions around the world would allow privacy expectations to be better understood.

With that said, it is not always possible to share demographic information across institutions due to data privacy concerns, so we might want to ensure the reliability of self-reported demographics. Among participants who did not omit their response, there was perfect agreement between self-reports and university records for ethnicity, and 98% agreement for gender; we expected the gender discrepancy since we offered a non-binary option in our survey, while gender in our institution’s records is still reported dichotomously. These high levels of agreement suggest that it may be possible to obtain accurate demographic information by only asking students to self-report demographics, without a significant loss of data. Not only does this give students more agency over what they choose to share, it also suggests that similar studies requiring demographic information could be conducted at other institutions, even if those details are not available to researchers or centrally stored.
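For concreteness, a minimal sketch of such a reliability check follows (our illustration, with hypothetical identifiers and values; it is not the study’s analysis code). It computes the agreement rate between self-reports and institutional records, skipping omitted self-reports:

```python
# Illustrative sketch: agreement rate between self-reported demographics
# and institutional records, ignoring omitted self-reports.
def agreement_rate(self_reports, records):
    """Both args map student id -> value; None marks an omitted self-report."""
    pairs = [(self_reports[sid], records[sid])
             for sid in self_reports
             if self_reports[sid] is not None]
    matches = sum(1 for reported, recorded in pairs if reported == recorded)
    return matches / len(pairs) if pairs else float("nan")

# Example: a non-binary self-report cannot match a dichotomous record,
# producing the kind of gender discrepancy described above.
gender_self = {"s1": "female", "s2": "non-binary", "s3": "male"}
gender_record = {"s1": "female", "s2": "female", "s3": "male"}
print(f"gender agreement: {agreement_rate(gender_self, gender_record):.0%}")  # 67%
```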

Eventually, inter-institutional datasets compiled from various types of educational institutions around the world may be joined to enrich our understanding of consent factors as they relate to diverse sociocultural backgrounds. On the algorithmic front, one may obtain more specific estimates of data sharing ranges, and demographics can be tied to students’ responses to calculate various opt-out ranges, since different subgroups have different privacy perspectives and rates of participation, as we have shown. The performance of predictive models may therefore incur greater differential effects, necessitating further research to balance trust and personal privacy with advancing education and ensuring fairness across diverse student populations. We may also gain a greater understanding of which aspects of institutional structure and policy are most effective at fostering a culture that encourages broad participation and minimizes paternalistic data collection policies. For instance, many institutions are hierarchical and rely on audit culture, so different measures of success, as well as flatter administrative structures with shared duties and codified values, may have an impact on student and staff behavior as well as on baseline participation rates.

Conclusion

In this study, we have addressed two critical questions regarding the use of students’ educational data in learning analytics. RQ1 asked about students’ perspectives on their educational data being used for learning analytics systems, and RQ2 sought to find the population and participation characteristics of students who indicated a preference to allow or deny such usage.

For RQ1, we identified three primary factors that influence a student’s willingness to consent to their data being used in learning analytics: trust in the institution, concern regarding personal data collection, and comfort with instructors using data to gain insights on student engagement. Higher levels of institutional trust and greater comfort with the idea of instructors using educational data for instructional improvement correlate with much greater probabilities that a student will allow their data to be used for learning analytics. By contrast, apprehension toward personal data collection leads to a lower chance of consent.

Given these factors, we then explored RQ2 and found that students who identified as female were more likely to trust the institution and instructor data use than male students, but were also more generally concerned about data collection practices. Meanwhile, Black students indicated lower levels of institutional trust. We also note that female students had a higher response rate and that White students were overrepresented while Black students were underrepresented among people who made a consent decision.

While there are limitations that come with survey instruments and the fact that this study was conducted at a single university, our findings surface important implications for institutions to consider when collecting data for learning analytics, and we lay out additional routes for confirming and generalizing the results presented here. We demonstrate that varying engagement rates reflect existing educational disparities affecting minority students and that instructors can influence students’ consent decisions.

The findings for RQ1 and RQ2 illustrate that it is insufficient to only provide students with consent prompts and expect unbiased data without a more concerted effort to involve instructors and institutional decision-makers. We support student agency and agree that collecting or using data without student consent violates that agency, though we also emphasize that consent is only one part of the puzzle in allowing for greater informational self-determination; it may take a combination of many other forms of agency, such as transparency, data access, or opportunities to object. Balancing student agency is therefore not a straightforward matter and requires careful consideration and design.

The differential response rates we identify show that perspectives from those who are underrepresented are still not properly accounted for, even when stratifying equally across groups. Moreover, the difference lies not only in consent rates but also in the unique underlying perspectives that guide the actions of each subpopulation. Relying solely on such an approach for data collection is therefore likely to continue producing biased predictions from biased data, undermining the efficacy of educational technology and its potential to treat students fairly. To ensure the ethical use of AI in education, the method of data collection is important, but it is imperative not to fixate solely on this aspect. Students’ major concerns should be addressed, and trust in their institutions’ many stakeholders strengthened, through tangible actions such as implementing transparent data practices; allowing inequity to continue will only increase mistrust, thereby lowering engagement and hampering institutions’ ability to support all their students. By understanding the key factors that influence consent and their relation to students’ personal backgrounds, institutions will be better equipped to enable technology-supported education while maintaining ethical data use and public trust.