Education and Big Data
Keywords: Big data; Analytics; Ethics; Education
Big data is a significant concern for many academics, largely because it is complex, unmanageable, and open to misuse. While there is a tendency to believe that “big data” might be bad and possibly dangerous, many types and uses of it exist. The challenge of big data for higher education is that, until fairly recently, it has been portrayed as straightforward, clear, and easily delineated, when in fact it is none of these, and there is still relatively little consensus about how it might be defined. This entry explores how big data is defined, described, and utilized in different contexts. It examines different notions of analytics and suggests how these are having an impact on higher education. The entry then considers the claims being made about the objectivity of big data and sets these claims in the broader context of what can and cannot be claimed. In the context of such claims, ideas about what is plausible, possible, and honest in the use of big data are examined, and suggestions are offered about useful and realistic uses of big data. The final section explores possible futures for the research and use of big data in higher education and offers some suggestions as to ways forward.
Definitions of Big Data
Big data is commonly described as having several distinctive features:
- Sources are usually available in real time.
- The scale of the data makes analysis more powerful and potentially more accurate.
- Data often involve human behaviors that have previously been difficult to observe.

Big data has also been characterized as being:
- Huge in volume, consisting of terabytes or petabytes of data
- High in velocity, being created in or near real time
- Diverse in variety, being structured and unstructured in nature
- Exhaustive in scope, striving to capture entire populations or systems (n = all)
- Fine grained in resolution and uniquely indexical in identification
- Relational in nature, containing common fields that enable the conjoining of different data sets
- Flexible, holding the traits of extensionality (new fields can easily be added) and scalability (the data can expand in size)
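The relational characteristic can be made concrete with a minimal sketch. It assumes two small, entirely hypothetical institutional data sets, a virtual learning environment (VLE) log and a student records table, that happen to share a common student_id field. The field names, values, and the inner_join helper are illustrative assumptions, not drawn from any real system.

```python
from collections import defaultdict

# Hypothetical data set 1: VLE login counts, keyed by a shared student_id field.
vle_logins = [
    {"student_id": "s1", "logins": 42},
    {"student_id": "s2", "logins": 3},
    {"student_id": "s3", "logins": 17},
]

# Hypothetical data set 2: student records held in a separate system.
records = [
    {"student_id": "s1", "programme": "History", "year": 1},
    {"student_id": "s2", "programme": "Physics", "year": 2},
    {"student_id": "s4", "programme": "Law", "year": 1},
]


def inner_join(left, right, key):
    """Conjoin two lists of dicts on a common field, keeping only matching rows."""
    # Index the right-hand rows by the shared field for fast lookup.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for row in left:
        # Rows with no partner in the other data set are silently dropped.
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


combined = inner_join(vle_logins, records, "student_id")
for row in combined:
    print(row)
```

Neither source alone links login activity to programme of study; the shared field is what makes that link possible, and it is the same mechanism that allows personal data to be cross-referenced across systems. Records without a match (s3 and s4 here) are dropped, which is one small way in which conjoined data sets can quietly become less complete than their sources.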
Big data in different disciplines: understanding and use of big data
- Economics (Taylor et al.): Specific terminology seen as fairly recent; some were working with what is now termed “big data” a decade ago and believe it has not gained much traction within academic economics. Seen as a class of data that was particular in terms of its size and complexity, although there were several different points of view as to which features rendered it genuinely new.
- Digital humanities (Manovich 2012): The use of data analytics to analyze and interpret cultural and social behaviors. Complex overview of data, visual representations of images and videos, exploration of patterns of representation.
- Education (Sclater 2014): Use of data to analyze student retention, student engagement, and identification of risk and to examine student progress. Data seen as useful for gaining information and for tracking possible problems by students, tutors, and senior management.
- Business (Brynjolfsson and McAfee 2012): Use of data to make predictions and management decisions. Information from social networks, images, sensors, the Web, or other unstructured sources is used for decision making in business.
- Journalism (Lewis 2015): Journalism that incorporates computation and quantification in diverse ways, for example, computer-assisted reporting. The implementation of mathematical skills in news work as well as the critique of such computational tools.
- Maths and statistics (Housley et al. 2014): Creating mathematical tools for understanding and managing high-dimensional data. Tools, algorithms, and inference systems seen as vital for analysis of data within maths and statistics but also in other disciplines using big data.
- Computer science (Rudin et al. 2014): Use of methods for statistical inference, prediction, quantification of uncertainty, and experimental design. Use of multidisciplinary teams with statistical, computational, mathematical, and scientific domain expertise, with a focus on turning data into knowledge.
- Medicine and health (Lee and Yang 2015): Predicting and modeling health trends; locating health patterns; understanding the prevalence and spread of disease.
- Psychology (Moat et al. 2014): Predicting and modeling trends and the use of data sets to examine behaviors, influence, and use of language. Analysis of trends, behaviors, judgment, and decision making as well as spheres of influence.
There are currently many different types of analytics in higher education, but only relatively recently has this work been termed learning analytics. However, learning analytics is in fact rooted in longer traditions such as educational data mining and academic analytics. At the time of writing (2015), learning analytics in education and educational research focuses on the process of learning (the measurement, collection, analysis, and reporting of data about learners and their contexts), while academic analytics reflects the role of data analysis at an institutional level. For many researchers in higher education, learning analytics and data analytics are fields that draw on research, methods, and techniques from numerous disciplines ranging from the learning sciences to psychology.
Form of analytics
- Analysis of student engagement, predictive modeling, patterns of success and failure
- Analysis of learner profiles, performance of academics, knowledge flow, research achievements, ranking
- (Big) data analytics: commercial contexts and data warehousing; development of data mining algorithms and statistical analyses
- Information retrieval and computational linguistics; discovering the main themes in data, such as in news analysis, opinion analysis, and biomedical applications
- Commercial and academic contexts and cloud computing; integration of data across platforms for social research and/or commercial gain
- Academic contexts, such as mathematics, sociology, and computer science; examination of scientific impact and knowledge diffusion, for example, the h-index
- Commercial contexts but also, increasingly, areas such as disaster management and health-care support; to reach many users but also to increase productivity and efficiency in a workforce
- Knowledge analytics (this term tends to be used with learning analytics but is generally defined): commercial settings and, to some degree, academic settings; to manage knowledge within an organization and to use organizational knowledge to best effect
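The h-index cited above as an example of examining scientific impact rests on a simple definition: a researcher has index h if h of their papers have each been cited at least h times. A minimal Python sketch of that definition follows, using made-up citation counts. The function name h_index is a hypothetical helper, and real bibliographic databases may report different values for the same researcher depending on which publications and citations they index.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            # Every later paper has no more citations than this one, so stop.
            break
    return h


# Hypothetical citation counts for one researcher's papers.
print(h_index([10, 8, 5, 4, 3]))
```

Here four papers have at least four citations each, but there are not five papers with at least five, so the sketch reports 4. The single number hides a great deal of context, which is part of why such indices attract the same questions of objectivity raised elsewhere in this entry.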
Objectivity and Context
Claim 1: Big data speaks for itself. This is clearly not the case, since analysis and mapping are researcher driven. There is a need to ask not just what might be done with big data but why (and whether) it should be used in particular ways, as well as how big and small data might be used together.
Claim 2: There are many good exemplars of big data use. This is not the case, particularly in the social sciences and education, where the landscape is complex and varied. For example, in 2013 Snowden disclosed that the US National Security Agency was monitoring domestic “metadata.” The archive released by Snowden indicated that the e-mails, phone calls, text messages, and social media activity of millions of people around the world had been collected, stored, and then shared and sold without consent (Rodriguez 2013). Although the disclosures brought to light a number of other monitoring and surveillance practices, the US Government argued that what it held was only “metadata.” The Snowden example raises questions for those who work in higher education about how the data they collect, and the data collected about them and their students, are used in covert ways. It would seem that government agencies are increasingly using big data in ways that focus on economic outcomes, resulting in unhelpful social, political, and cultural bias in educational activities. Such a stance would seem to indicate that a neoliberal agenda is increasingly shaping higher education, with a growing belief in competitive individualism and the maximization of the market.
Claim 3: There is integration and understanding across the disciplines. While some universities have shared forums for big data, much big data remains in disciplinary silos. There is a need for greater interdisciplinarity and for large teams that work together coherently.
Claim 4: There is a coherent view about how learning and academic analytics should be used. This is far from the case: institutions already find themselves having to balance students’ expectations, privacy laws, tutors’ perspectives about learning, and the institution’s expectations about retention and attainment.
These four claims exemplify the need to consider issues of plausibility and honesties in big data research. What is often missing from claims and debates is how power is used, created, or ignored in the management and representation of big data, or where and whose voices are heard or ignored, privileged, or taken for granted.
Plausibility and Honesties
Ethical issues connected with using big data are complex and muddled. It is often assumed that just because data are public, ethical concerns can be ignored. The open, accessible, and online society has resulted in various uses of big data, one of which is tracking. For example, many people inadvertently leave location tracking enabled on their mobile phones or leave Wi-Fi on overnight. As a result, people do not realize they are being tracked, while feeds from social networks are analyzed and visualized, personal movements tracked, and shopping behaviors noted. There is thus a serious lack of privacy, which arises through the aggregation of users’ online activities. Companies can track and aggregate people’s data in ways previously impossible, since in the past people’s data were held in paper-based systems or company silos. Now, personal data can be mined, cross-referenced, sold, and reused, so that people are classified by others on the basis of the data they themselves share. This use of big data is often disregarded, but it is relatively easy to discover most things about most people, and blue chip companies can use such large data sets to secure market advantage. In day-to-day life, this open but hidden knowledge is already both accepted and ignored. There have been discussions about the need for better formal regulation and for changes to the way social media are designed. Yet, almost a decade after these concerns were first raised, the suggested changes are unlikely to occur, and it is difficult to decide how security might be maintained in a post-security world. Increasingly, most people are encountering various forms of liquid, participatory, and lateral surveillance (Savin-Baden 2015).
There are still questions about what it is possible to “know” from big data analyses, and ethical challenges concerning what is done or not done with such analyses and findings. In higher education, interest in big data has led many researchers to rebadge their work as “big data research” when in fact it is not. Particularly in the humanities, this has resulted in criticism of big data research. Some years later, the pertinent criticisms and concerns of many higher education researchers still seem to resonate. Big data has changed how data are seen, how they are used, and how they are defined, and these shifts are changing how knowledge is seen and managed in higher education. At the cusp of higher education and commerce, big data tend to be positioned as neutral, objective, and reliable. This, in turn, obscures the ways in which big data are covertly managed and used and the ways in which people become constructed by and through big data.
Big data, as noted earlier, can be linked to neoliberal capitalism, and its engagement with current performative enterprise practices has shifted the focus in higher education increasingly toward consumerism, the marketization of values, and the oppression of freedom. Thus, criticality and questioning are being submerged in the quest for fast money and solid learning. In areas of higher education that reject neoliberal capitalism, there is a tendency to move away from the idea of big data as a resource to be consumed and a force to be controlled, and instead to ask the questions summarized below.
How Accurate or Objective Is Big Data?
Because of the way big data are constructed and used to make policy decisions, it is vital to recognize that these data can easily fall victim to distortion, bias, and misinterpretation. Driscoll and Walker (2014) illustrate how data access and technological infrastructure can affect research results. For example, they demonstrate how differences in timing or network connectivity can produce different results for the same experiment.
Is Big Data Better Data?
While it would be easy to suggest a binary relationship here, it is important to note that big data is not always representative, does not necessarily present a complete picture of the issues, and may not meet sufficiently high standards of rigor and quality. Administrators, faculty, and managers in higher education may find the promise of big data alluring. Assumptions that big data is objective, with clear outcomes that will improve retention, increase student numbers, and ensure there is more money in the university coffers, make this promise highly seductive. Yet these data are not necessarily reliable, and using them for monetary ends in a sociopolitical system such as higher education carries high risk.
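The point that big data is not always representative can be illustrated with simple arithmetic. The sketch below assumes a hypothetical cohort of 10,000 students, 30% of whom are at risk, and assumes that at-risk students are less likely to generate platform activity data (40% versus 90%). All figures are invented for illustration and do not describe any real institution.

```python
# Hypothetical cohort: these numbers are illustrative assumptions, not data.
population = {"at_risk": 3000, "not_at_risk": 7000}

# Assumed share of each group that generates platform activity data.
capture_rate = {"at_risk": 0.4, "not_at_risk": 0.9}

# Number of students in each group who actually appear in the captured data.
captured = {group: round(n * capture_rate[group]) for group, n in population.items()}

total_population = sum(population.values())
total_captured = sum(captured.values())

coverage = total_captured / total_population
true_share = population["at_risk"] / total_population
observed_share = captured["at_risk"] / total_captured

print(f"coverage of cohort: {coverage:.0%}")
print(f"true at-risk share: {true_share:.0%}")
print(f"at-risk share in captured data: {observed_share:.0%}")
```

With these assumed figures the captured data cover 75% of the cohort, yet at-risk students make up only 16% of the captured records rather than 30%. The size of a data set does not by itself correct for who is missing from it.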
How Are Issues of Context to Be Dealt With in Big Data Research?
Big data sets need to be located in context, and there is a need to understand how big data are being used and understood and what is being claimed for them. More robust studies and examples across higher education are needed to examine issues of context by defining and critiquing how big data, and definitions of it, have changed over time.
What Are the Ethical Issues Associated with Big Data?
The ethical questions relate not only to how data are obtained, as in the Snowden affair (Rodriguez 2013), but also to who and what is subject to analysis. Ethical considerations also extend to how and where data are reported. For example, Eubanks (2014) examined the use of electronic benefit transfer cards and food stamps in the United States and suggested that those in poverty are already “in the surveillance future.” The result is that the poor and marginalized, who are more easily tracked, are already being judged, and assumptions are being made about them that may or may not be just.
To What Extent Is Big Data Creating Digital Divides?
It seems that the expense of gaining access to big data has restricted access to it, with higher education necessarily being marginalized as a sector. Yet the study by Eubanks (2014) mentioned above, as well as other studies concerning surveillance, points not merely to digital divides but also to surveillance divides.
The following suggestions are offered as ways forward for the research and use of big data in higher education:
- See big data as part of a repertoire of data collection and analytical options.
- Use big data as a means of locating areas that can or need to be explored on a smaller scale.
- Use big data in multi-methodological ways so that the research undertaken is both wide and deep.
- Recognize the advantages, disadvantages, challenges, and power issues of working with large data sets alongside small, fine-grained data.
- Acknowledge that large real-time data sets, such as those produced by social networks, often do not provide a clear or representative picture of realities.
- Recognize that full documentation of how big data were collected will probably be unavailable, and therefore the validity of such data is likely to be unpredictable and tenuous.
There is a prominent expectation that big data can and will deliver more than is really possible and that its outcomes, whose clarity is questionable, will necessarily make a difference to the complexity of human life and experience. The contrasting view questions whether big data can offer anything particularly new or innovative and is concerned about how big data are managed and how they are being used in persuasive and pernicious ways. What appears to be a consistent message is that big data is difficult to manage, analyze, and evaluate. It is therefore uncertain how robust the findings, assumptions, and lessons drawn from big data might in fact be. It is vital to recognize that big data is neither good nor bad but can be useful in different ways if collated and presented with honesty and plausibility at its core.
- Brynjolfsson, E., & McAfee, A. (2012). Big data: The management revolution. Harvard Business Review, 90(10), 60–68.
- Driscoll, K., & Walker, S. (2014). Working within a black box: Transparency in the collection and production of big Twitter data. International Journal of Communication, 8, 1745–1764.
- Einav, L., & Levin, J. D. (2014). The data revolution and economic analysis. In J. Lerner & S. Stern (Eds.), Innovation policy and the economy (Vol. 14, pp. 1–24). Chicago: University of Chicago Press.
- Eubanks, V. (2014). Want to predict the future of surveillance? Ask poor communities. The American Prospect. Retrieved from http://prospect.org/article/want-predict-future-surveillance-ask-poor-communities.
- Manovich, L. (2012). How to follow software users? (Digital humanities, software studies, big data). Retrieved from http://lab.softwarestudies.com/2012/04/new-article-lev-manovich-how-to-follow.html. Accessed 22 July 2014.
- Rodriguez, G. (2013). Edward Snowden interview transcript full text: Read the Guardian’s entire interview with the man who leaked PRISM. PolicyMic. Retrieved from http://www.policymic.com/articles/47355/edward-snowden-interview-transcript-full-text-read-the-guardian-s-entire-interview-with-the-man-who-leaked-prism. Accessed 12 July 2015.
- Rudin, C., Dunson, D., Irizarry, R., Ji, H., Laber, E., Leek, J., McCormick, T., Rose, S., Schafer, C., van der Laan, M., Wasserman, L., & Xue, L. (2014). Discovery with data: Leveraging statistics with computer science to transform science and society. A Working Group of the American Statistical Association.
- Savin-Baden, M. (2015). Rethinking learning in an age of digital fluency: Is being digitally tethered a new learning nexus? London: Routledge.
- Sclater, N. (2014). Learning analytics: The current state of play in UK higher and further education. Bristol: JISC.