9.1 Data Science

Most CCI and learning technology studies are conducted on small groups of participants, often from a homogeneous context (e.g., the same school or a similar background). With the emergence of online education and learning-at-scale technologies (e.g., MOOCs, LMSs, ITSs, open courseware, and community tutorial systems such as Stack Overflow), millions of participants in different parts of the world and from different backgrounds can engage with CCI and learning technology systems. New forms of data require new methodologies. As we have described in this book, a classical approach in a CCI and learning technology study would involve a few dozen end users participating in each condition and would apply hypothesis-testing analysis (e.g., t-tests or ANOVAs). Since the datasets (and the respective data points) would be small, only large effects would be detectable, and so significance would imply relevance. Conversely, if the number of students is large, even a very small effect can reach statistical significance, so we could easily end up rejecting the null hypothesis and detecting an effect that is irrelevant in practice (Kidzinski et al., 2016).
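The gap between significance and relevance can be illustrated with a minimal sketch. It uses a two-sided z-test (a normal approximation, standing in for the t-tests mentioned above) and entirely hypothetical numbers: two conditions whose mean scores differ by 0.3 points on a scale with a standard deviation of 15. The function name and all values are assumptions made for illustration only.

```python
import math


def two_sample_z_test(mean_a, mean_b, sd, n_per_group):
    """Two-sided z-test for a difference in means.

    Assumes a common, known standard deviation and equal group sizes.
    Returns the z statistic, the p-value, and Cohen's d (effect size).
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = (mean_b - mean_a) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    cohens_d = (mean_b - mean_a) / sd
    return z, p_value, cohens_d


# Hypothetical scenario: the same tiny difference, tested at two scales.
for n in (30, 1_000_000):
    z, p, d = two_sample_z_test(70.0, 70.3, 15.0, n)
    print(f"n per group = {n:>9}: z = {z:6.2f}, p = {p:.3g}, Cohen's d = {d:.3f}")
```

The effect size is identical in both runs; only the sample size changes. With a few dozen participants per condition the difference is far from significant, whereas at MOOC scale the same difference yields a vanishingly small p-value, which is why effect sizes should be reported alongside p-values in at-scale studies.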

As mentioned above, in CCI and learning technology research, typical data analysis techniques (e.g., analysis of variance, correlations, and regressions) are usually employed to explore the RQs and test the hypotheses, where the formulation of the RQs and the hypothesis formation are guided by previous work and/or theories. However, when dealing with massive amounts of data (e.g., from MOOCs or LMSs) or rich multimodal data (e.g., video, eye-tracking, or other sensor data), different statistical analysis techniques need to be employed (including predictions and classifications). Given that learning/educational scientists and designers are often unfamiliar with contemporary modeling techniques, this has prompted an increasing number of computer scientists, statisticians, and data scientists to engage with CCI and learning technology research. In many cases, because of the nature of the problem and the data (e.g., online learning), contextual knowledge (e.g., how someone is using YouTube or Stack Overflow in their learning) is either not relevant or cannot be captured (e.g., in a MOOC). In such cases, we see research initiatives in CCI and learning technology that seek to address problems in the absence of contextual knowledge.

This type of decontextualized and large-scale experimentation in CCI and learning technology research lies outside the scope of this book. However, we would like to emphasize that exploratory data analysis techniques (Tukey, 1977) can be useful, particularly for finding an adequate data transformation and for outlier detection. Explorations of this type can bring new insights and hypotheses and eventually close the cycle (see Fig. 9.1). For those interested in how to employ advanced data science and machine learning (ML) techniques in the context of learning, we provide elsewhere a mini-tutorial on methodologies for forming and testing hypotheses in large educational datasets (Kidzinski et al., 2016). That tutorial also presents practical guidance for building data-driven predictive models with state-of-the-art ML methods, using R and the caret package because of their simplicity and the ease of access to the most recent ML methods.
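As one concrete instance of exploratory outlier detection, the sketch below applies the common 1.5 × IQR rule (often called Tukey's fences) to a small set of time-on-task values. The data are hypothetical, the function names are ours, and the quartiles are computed with Python's `statistics.quantiles`, which may differ slightly from the hinges of a hand-drawn box plot.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical time-on-task measurements in seconds; the last learner
# probably left the browser tab open.
time_on_task = [42, 55, 48, 61, 50, 47, 53, 58, 45, 900]

print(tukey_fences(time_on_task))
print(flag_outliers(time_on_task))
```

Flagged values are candidates for inspection, not for automatic removal: whether a 900-second session is noise or a meaningful behavior is exactly the kind of question that exploration is meant to raise.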

Fig. 9.1
A flowchart relating research problems, pattern recognition, data collection, and information retrieval.

Data-driven CCI and learning technology in at-scale contexts. (Adapted from Kidzinski et al., 2016)

9.2 Artificial Intelligence

Artificial intelligence (AI) in CCI and learning technology research is traditionally represented by the AI in education (AIED), intelligent user interfaces (IUI), and ITS communities, and involves a wide spectrum of technologies and approaches. In recent years, we have seen AI technologies and approaches employed in almost every CCI and learning technology community. Since the 1980s, researchers have been interested in the association between learning and AI, although initially this mainly meant a focus on knowledge representation, reasoning, and learning (Self, 2015, p. 5). Russell and Norvig (2021) have described AI as a technology that includes problem solving, representation, reasoning on the basis of certain/uncertain knowledge, ML, and communicating, perceiving, and acting techniques for designing and developing intelligent agents. More recently, we have seen various developments in sensing technologies, analytics, and visualization, as well as cognitive technologies and architectures that have boosted the use of AI to support teaching and learning. The International Journal of Artificial Intelligence in Education (IJAIED; see footnote 1) describes the focus of the AIED field as the development and design of AI-powered computer-based learning systems, including agent-based learning environments, Bayesian and other statistical methods, cognitive tools for learning, intelligent agents on the Internet, natural language interfaces for instructional systems, and real-world applications of AIED systems.

The topic of AI and advanced data science techniques in education is not central to this book; nevertheless, given recent advances in data science, this book would not be complete if we did not introduce the reader to these advancements. Drawing from a recent literature review on AIED (Chen et al., 2020), we see that contemporary AI learning systems incorporate various techniques and technologies, such as recommendations, knowledge understanding and ML, data mining, and knowledge models (Avella et al., 2016). There are three main components of an AI-powered learning system: the educational data collected from learners’ and teachers’ activities, the techniques or modeling employed (e.g., knowledge inference or ML), and the system’s intelligence as expressed through different intelligent technologies (Kim et al., 2018). Figure 9.2 shows how these three components work together to enable AI functionalities in the learning system.

Fig. 9.2
A chart representing AI-powered educational systems on the basis of three types of data: student, teacher, and course.

Representation of AI-powered educational systems, which consist of the data collection layer (e.g., educational and interaction data), the modeling techniques that develop different kinds of intelligence based on the collected data, and the system’s intelligence layer, which provides the technologies needed to offer that intelligence as a service to the user
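To make the three components more tangible, the following toy sketch wires them together in a few lines. It is not the architecture of any particular system: the `Interaction` record, the proportion-correct "model", the recommendation rule, and the 0.8 threshold are all hypothetical simplifications chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Interaction:
    """Data collection layer: one logged answer by a learner."""
    learner: str
    skill: str
    correct: bool


def estimate_mastery(interactions):
    """Modeling layer: proportion of correct answers per (learner, skill)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [correct, attempts]
    for event in interactions:
        counts = totals[(event.learner, event.skill)]
        counts[0] += int(event.correct)
        counts[1] += 1
    return {key: correct / attempts for key, (correct, attempts) in totals.items()}


def recommend_skill(mastery, learner, threshold=0.8):
    """Intelligence layer: suggest the learner's weakest skill below the
    threshold, or None if every practiced skill is above it."""
    weak = {
        skill: score
        for (name, skill), score in mastery.items()
        if name == learner and score < threshold
    }
    if not weak:
        return None
    return min(weak, key=weak.get)


# Hypothetical interaction log.
log = [
    Interaction("ana", "fractions", True),
    Interaction("ana", "fractions", False),
    Interaction("ana", "decimals", True),
    Interaction("ana", "decimals", True),
    Interaction("ben", "fractions", True),
]

mastery = estimate_mastery(log)
print(recommend_skill(mastery, "ana"))
print(recommend_skill(mastery, "ben"))
```

In a real system the modeling layer would typically be a knowledge-inference or ML model rather than a simple proportion, but the dependency is the same: the recommendation can only be as good as the logged interactions that feed it.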

As Fig. 9.2 makes clear, the quality of data collection is of paramount importance if an AI learning system is to operate efficiently. In the context of CCI, we see children’s toys evolving through advances in embedded electronics, digital capabilities, and wireless connectivity that combine different capabilities such as networking, processing, and intelligent reasoning. As we see from a recent IJCCI special issue on AI and CCI (see footnote 2), the increasing use of such interactive objects in CCI and the rise of AI techniques through data-driven methods reinforce intelligent features and adaptivity, but they also raise significant privacy issues and ethical concerns.

In summary, AI technologies can amplify different areas of human abilities, including physical abilities, memory, perception, cognition, and learning (Shneiderman, 2020). Examples of technologies that leverage AI to amplify human abilities are information representation, awareness, and reflection technologies (e.g., dashboards), in-situ human-computer interaction technologies (e.g., augmented reality and ubiquitous displays), and technologies with implicit and adaptive control (e.g., gaze tracking). In contrast to autonomous AI systems that focus on replacing human decision making, those AI technologies employ the notion of “intelligence augmentation” (IA), which attempts to support human abilities (e.g., decision making, cognition) rather than replace them. Contemporary learning systems employ different information representation and IA techniques via powerful interfaces and communication modalities (e.g., dashboards, adaptive navigation). Those interfaces and communication modalities combine various log data and provide explicit, easy-to-understand, and concise ways of presenting valuable information to support human abilities.

9.3 Sensor Data and Multimodal Learning Analytics

The use of sensors to support research on human-factors IT-related fields (especially in the context of learning) is not new. To some extent, the use of sensors (e.g., via cameras) has been central to LS research for several decades, as the popularity of qualitative video analysis indicates. However, in recent years, a proliferation of wearable and remote devices has made sensing widely available and affordable in the context of education, and a growing number of related studies have been published (Sharma & Giannakos, 2020). In addition, new methods, models, and algorithms have been developed (Blikstein & Worsley, 2016) that enable the continuous, unobtrusive, automated, and useful application of sensors during learning. Thanks to these devices and techniques, it is possible to monitor indices that are argued to be significant for learning but have often been ignored because of the difficulties of measuring and interpreting them dynamically (Giannakos et al., 2020). Despite the challenges of using sensor data, previous studies have advocated the use of sensor technologies to capture the complex interactions between learners/children and the interactive systems they engage with (Giannakos et al., 2022). Work in the quantified-self movement has shown the potential of sensor data to support human decision making (e.g., in relation to diet, fitness, and lifestyle), self-monitoring, self-awareness, and self-reflection (Qi et al., 2018), as well as its potential in learning technology (Giannakos et al., 2020) and CCI research (Lee-Cultura et al., 2020).

Research on collecting, pre-processing (e.g., data “cleaning”), synchronizing, and analyzing sensor data streams can be found in neighboring fields such as HCI and ubiquitous computing, with applications dating from the 1980s onward (Weiser et al., 1999). Sensor data have also been at the center of several learning technology and HCI communities, such as ITS (D’Mello et al., 2010), educational data mining (EDM) (Romero et al., 2010), and user modeling, adaptation, and personalization (UMAP) (Desmarais & Baker, 2012). The typical steps when using sensors include data collection, pre-processing, engineering, mining/analysis, validation, contextualization, and making sense of the results. These steps differ somewhat depending on whether the approach is data-driven or theory/hypothesis-driven, on the research design employed (e.g., qualitative or quantitative), and on the epistemic stance of the researchers (e.g., positivist or post-positivist) (Giannakos et al., 2022). In the last decade, there has been much discussion around the use of sensors in learning technology and CCI (Giannakos et al., 2022; Markopoulos et al., 2021), with different communities using different nomenclature to describe various facets of sensor data (e.g., sensor data in education, sensing, physiological analytics, ubiquitous data in education, and multimodal learning analytics).
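The pre-processing and synchronization steps can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of any specific toolkit: each stream is a time-sorted list of `(timestamp_in_seconds, value)` pairs, missing readings are marked with `None`, and two streams are aligned by nearest-timestamp matching within a tolerance. All names and sample values are hypothetical.

```python
from bisect import bisect_left


def clean(stream):
    """Pre-processing: drop samples whose value is missing (None)."""
    return [(t, v) for t, v in stream if v is not None]


def nearest_value(stream, t, tolerance):
    """Return the value in a time-sorted stream closest to time t,
    or None if the closest sample is farther away than the tolerance."""
    times = [ts for ts, _ in stream]
    i = bisect_left(times, t)
    # Only the samples just before and just after t can be the nearest.
    candidates = stream[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    ts, value = min(candidates, key=lambda sample: abs(sample[0] - t))
    return value if abs(ts - t) <= tolerance else None


def synchronize(reference, other, tolerance=0.25):
    """Align `other` onto the timestamps of `reference`."""
    return [(t, v, nearest_value(other, t, tolerance)) for t, v in reference]


# Hypothetical streams: heart rate at about 1 Hz and irregular pupil samples.
heart_rate = [(0.0, 80), (1.0, 82), (2.0, None), (3.0, 85)]
pupil = [(0.1, 3.1), (0.9, 3.3), (2.0, 3.6), (3.4, 3.2)]

for row in synchronize(clean(heart_rate), clean(pupil)):
    print(row)
```

The tolerance encodes a design decision that should be reported with the analysis: a sample with no sufficiently close counterpart is left unmatched rather than silently paired with a distant reading.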

In a recent chapter focusing on the use of sensor data in education (Giannakos et al., 2022), the authors described the advantages and qualities of sensor data in terms of three pillars. First and foremost, whereas computer logs enable us to capture learners’ actions in binary fashion, sensors go further in terms of richness, allowing us to capture information about learners regardless of whether they have completed an action (e.g., while watching a video but not interacting with it, or interacting with a nondigital object). Second, sensors provide temporality by being sensitive to temporal changes and giving us direct access to indices that are relevant to cognitive and affective processes. Third, instead of reductive representation of the user and learner experience, sensor data provide granularity, allowing us to capture very low-level insights and focus our analysis on different aspects. Those qualities of sensor data, combined with advances in data science and AI, can provide powerful learning capabilities. For instance, they can provide access to indices relevant to cognitive and affective processes (see Fig. 9.3, left), or they can incorporate sensor data into a learning system’s functionality (e.g., embodiment) or intelligence (e.g., affective support) via appropriate technological architectures (see Fig. 9.3, right).

Fig. 9.3
Three frequency graphs with sensory data. A chart relates interaction, sensation, sensemaking, and enhancement.

Meaningful sensor data from a child interacting with a learning technology. (From Giannakos et al., 2021; with permission from IEEE). Left: The vertical lines show the child’s response correctness (green for correct, red for incorrect), the solid red curves show the child’s indices, and the dashed green curves show the average for the whole class. Right: The logic of a system that leverages sensor data
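The temporality and granularity of sensor data, and the comparison of one child's index against a class average, can be illustrated with a small sketch. The index values, the non-overlapping windowing scheme, and the function names are hypothetical; real indices (e.g., derived from physiological signals) require considerably more signal processing than shown here.

```python
from statistics import mean


def window_means(samples, window_size):
    """Average consecutive, non-overlapping windows of a sampled index.
    A trailing partial window is dropped."""
    full = len(samples) - len(samples) % window_size
    return [mean(samples[i:i + window_size]) for i in range(0, full, window_size)]


def deviation_from_class(child, class_streams, window_size):
    """Per-window difference between one child's index and the class average.
    Assumes all streams are sampled at the same rate and start together."""
    child_windows = window_means(child, window_size)
    per_learner = [window_means(stream, window_size) for stream in class_streams]
    class_average = [mean(column) for column in zip(*per_learner)]
    return [c - avg for c, avg in zip(child_windows, class_average)]


# Hypothetical index samples (arbitrary units) for one child and two classmates.
child = [2, 4, 6, 8, 5, 7]
classmates = [[1, 3, 5, 7, 4, 6], [3, 5, 5, 5, 6, 8]]

print(window_means(child, 2))
print(deviation_from_class(child, classmates, 2))
```

Changing `window_size` changes the granularity of the analysis: small windows expose moment-to-moment fluctuations, while large windows summarize a whole activity.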

To summarize this chapter, sensor data have several qualities that support interaction with the technology. Many of those qualities are beneficial for learning systems and can help us to improve the effectiveness of those systems. At the same time, sensor data introduce challenges that need to be tackled to allow contemporary learning technology research and practice to realize the potential benefits. Contemporary research on sensor data and advanced computational analyses has introduced the term “multimodal learning analytics” (MMLA) and led to the formation of a special interest group in the context of the Society for Learning Analytics Research (SoLAR; see footnote 3).