1 Introduction

Video is a central tool in mathematics education, especially for studying classroom practice. Schoenfeld (2017) has reflected on how video data makes it possible, for example, to see what the participants are attending to and how their representations evolve over time; video is thus irreplaceable for his research on mathematical problem solving and teacher thinking over multiple decades. Clarke and Chan (2019) discussed video research through the metaphors of window, lens, and mirror. Video can be seen as a window to look into classroom practices as they are, or it can be seen as a lens through which the researcher focuses on a phenomenon of interest. As a mirror, the video can be used in professional development for teachers to reflect on their own practices. Clarke and Chan (2019) also warned that researchers may use the video as a distorting mirror, providing an image that reflects researchers’ pre-conceptions.

Video itself is in many ways raw data. By this we mean that video as data is relatively independent of the research question and, as such, is primarily a window. Of course, the positioning of cameras and microphones is influenced by the research questions, zooming in to what the researchers find relevant. Yet, it is possible to revisit the recorded episode from multiple perspectives, applying different theoretical frameworks to analyze the same data in order to address different research questions (see, e.g., Chan & Clarke, 2019b). Schoenfeld (2017) wrote about reviewing videos multiple times until he gained intuitions for what was happening when students struggled with problem solving, before he had developed the measurements for documenting the findings. Ferguson et al. (2019) compared working with rich video data to the ethnographers’ work, where the videos serve as the field to which they can return in order to deepen their ethnographic understandings.

Some characteristics of video make it especially suitable for specific research questions. First, video is audiovisual material, capturing both sounds and visual features of the situation, thus making it good material for multimodal analysis focusing not only on language, but also on material resources and bodily actions (Davidsen & Vanderlinde, 2014; Ferreira, 2021; Sinclair & de Freitas, 2019). Second, video captures processes. Learning in classrooms is essentially a social process, and it has been theorized in mathematics education, for example, as socio-constructivism (Cobb & Yackel, 1996), social interactionism (Bauersfeld, 1980), and the theory of didactic situations (Brousseau, 1997). There is extensive research on classroom dialogue, showing that it is mostly the teacher who talks in the class, and that the dialogue is usually structured as initiation–response–feedback (e.g., Howe & Abedin, 2013). However, more recent research has also addressed non-verbal interaction, such as gestures (Lim, 2021), and looking at what others are doing (Pruner & Liljedahl, 2021). Third, especially by using eye tracking, it is possible to reveal mental representations and assess subconscious aspects of mathematical thinking (Strohmaier et al., 2020).

Socio-emotional processes are essential in collaborative learning (Näykki et al., 2014). Socially shared regulation of learning consists of the joint regulation of cognition, metacognition, motivation, emotion, and behavior (Panadero & Järvelä, 2015). To study social interaction, we need to study it in the actual classroom, as the classroom is a specific social setting with its own social, mathematical, and sociomathematical norms (Cobb & Yackel, 1996). Moreover, it is important to include gestures, glances, body movement, and prosody in the analysis of fine-grained multimodal human communication, as they are essential aspects of the communication that takes place in mathematics classes (Radford, 2008). In mathematical communication, mathematical signs appearing in speech, gestures, drawing, and written sentences (Steinbring, 2006) also have a specific role. Yet, multimodal communication between mathematics teachers and their students has been studied only relatively recently (e.g., Arzarello et al., 2009; Radford, 2008).

Current mathematics education research focusing on multimodality and body in collaborative learning processes has identified research gaps and needs to develop methods. For example, Ferreira (2021, p. 1468) has pointed out that collaborative learning “is not confined to students’ heads and dependent only on verbal dialogues”, highlighting that learning is dependent on both social structures and the bodies involved, concluding that such complexities of learning are “yet unexplored.” Moreover, Sinclair and de Freitas (2019), in their review of studies about body in mathematics, emphasize “the need to develop research methodologies that are adequate to the complex socio-material practices” (p. 228), specifying that the “methodological challenges of studying different scales of bodily mathematical activity—brain, gut, gesture, eye, gender, race, geolocation—is felt across different research approaches” (p. 229).

Advancement of video technology has provided new opportunities for video research methods, allowing researchers to address many of the issues raised above. Typical designs have moved from single camera to multiple cameras, improved optics allow higher quality data and zooming to see more in detail, wearable cameras allow researchers to take the participants’ point of view, and multiple microphones allow high quality audio from multiple discussants. All this allows us to study interactions from new perspectives and to follow multiple, simultaneously happening interactions rather than focusing only on the teacher or some selected students. Such advancements in research technology pose several practical and methodological questions.

This is a reflective paper, focusing on experiences with methodological affordances and challenges when working with video data. Our reflection is conducted in light of extant literature, which we illustrate with examples from the Social Unit of Learning (SUL) data from the Science of Learning Research Classroom (SLRC) facility at the University of Melbourne (see Chan & Clarke, 2019b), and from our own MathTrack research project at the University of Helsinki (see Haataja et al., 2019, 2021). We have compared and contrasted the two research settings and their use of advanced video tools. We have focused on aspects that are not specific to only one technical solution but appear relevant for using complex video technology in natural settings in general. Consequently, we have focused on issues around data collection, analysis and management posing the following research question: What affordances and challenges does video research on multimodal social interaction pose in terms of methods, data management, and ethics?

2 Context: two video research settings

The Science of Learning Research Classroom (SLRC; https://pursuit.unimelb.edu.au/articles/high-tech-classroom-sheds-light-on-how-students-learn) at the University of Melbourne was specifically designed for studying what happens in classrooms. It looks like a typical school classroom, but it is fitted with several high definition microphones and video cameras making it possible to follow every student in the classroom. Behind a one-way window, researchers and technicians can control the cameras, send audio messages to the teacher’s earbud, and prepare stimulus videos for interviewing the teacher and students immediately after the lesson (Chan & Clarke, 2019b; Chan et al., 2019; Ferguson et al., 2019).

The Social Unit of Learning (SUL) was a research project investigating social aspects of interactive problem solving in Australia and China. The aim of the project was to chart how the social interaction that is fundamental to collaborative problem solving can be optimized. The Australian video data were collected in the SLRC, with eleven classes of 7th grade students working on mathematical problems individually, in pairs, and in groups of four. The teachers scaffolded the students’ work during sessions. In addition to the video data, student questionnaires about experiences of the session and teacher interviews were collected (Chan & Clarke, 2019b, 2020; Chan et al., 2018).

MathTrack is different from SUL in that the researchers went to seven in-field 9th-grade mathematics classrooms to collect data. In the MathTrack research setting, the special focus was on a small group of four student volunteers and the teachers all wearing eye-tracking glasses. Three stationary video cameras and several microphones recorded the actions and conversations of the students and the teachers. To record the students’ moment-to-moment working processes we used smartpens that recorded students’ writing and screen capture videos that recorded students’ work on computers. Most importantly, we had five sets of mobile (i.e. wearable) eye-tracking devices recording the eye movements of the teacher and the four focus students. The eye-tracking device consisted of two eye cameras, a scene camera, and simple electronics attached to 3D-printed eyeglass frames, and corresponding software (Toivanen et al., 2017). The four focus students and the teacher also wore backpacks to carry the laptop computers that processed the data. This enabled them to move freely in the classroom.

From the beginning, the MathTrack study was designed to allow comparison with data from SUL. Hannula and Clarke discussed the study design at several stages of the project planning. As a result, MathTrack used the same instructional design as SUL: the students worked on the task first individually, then in pairs, in groups of four, and finally the students presented their solutions for a whole class discussion. During this process, the teacher was instructed to engage in activating guidance, using questions and not revealing the key idea of the problem (Hähkiöniemi & Leppäaho, 2012; Laine et al., 2018). The main difference from SUL was that in MathTrack, the students worked with one task, while in SUL there was a new problem for each phase (alone, pair, group).

3 Complexities of advanced video research

In this section, we compare and contrast the two types of research settings using advanced video tools. We identify issues about methods, data management, and ethics, which are not specific to only one contextual or technical solution but appear relevant for using complex video technology in general.

3.1 Collecting and analyzing video data on social interaction

We begin the reflective comparison with a focus on issues around methods. Both research settings aim to capture the multimodal social interaction in an ecologically valid setting. We discuss the solutions regarding the setting, data collection, and data analysis.

Research on social interaction has only recently acknowledged the importance of multiple sensory modalities and material resources (see, e.g., Mondada, 2019). The two research settings, SUL and MathTrack, were both designed to study multimodal social interaction at an advanced level. Both settings enable the investigation of the moment-to-moment interaction that constructs the classroom relationships (Pennings et al., 2018) and influences students’ learning, engagement, and outcomes (McCluskey et al., 2017).

To study social aspects of learning, it is important to acknowledge the socio-historical nature of classroom interactions as students bring their social histories to the class, and classrooms generate their own shared understandings and discourse (Hannula, 2012; Sherin, 2002). One key design element for both SUL and MathTrack was to aim for high ecological validity, i.e., to study learning behavior in settings that correspond to ordinary classrooms. In both settings the social learning context is familiar to the students and teachers, as they participate with their authentic classmates.

Yet, both settings compromise the naturalness of classrooms. In SUL, the students were temporarily removed from their school and placed in a research classroom in a university facility. Therefore, they were not seated as usual and did not have access to the materials that are normally present in mathematics classes, such as mathematics textbooks. The design of SLRC allowed all but one of the researchers and technicians to stay away from the class, behind a one-way transparent window (Chan & Clarke, 2019b). The cameras and microphones were strategically positioned to capture everything that happened in the class.

In MathTrack, the students stayed in their ordinary classroom, but the researchers came there with plenty of highly visible equipment (Fig. 1). Visual markers were placed on boards and attached to eye trackers.

Fig. 1 MathTrack research setting

Despite such obvious visual distractors, previous research showed that the reactivity to the presence of eye-tracking equipment does not harm the reliability of the data significantly. The teachers tend to forget about the presence of the eye trackers and researchers quite quickly after the lesson has started (Praetorius et al., 2017) and do not pay attention to them anymore (Haataja et al., 2019). In MathTrack, we asked the students to reflect on how they experienced the data collection lesson, and none of them reported that the equipment or the presence of the researchers affected their behavior or learning significantly. Some compared the experience to watching a 3D-movie: as soon as the action began, they forgot the goggles.

Advancing research on classroom interaction faces two conflicting challenges. On the one hand, the classroom is rich in interaction and it is a daunting task to capture every communicative act in the classroom, including the back-row whispers and subtle gestures. On the other hand, there are unending possibilities of capturing more details of each individual in the class. In addition to their utterances, we might also record and analyze their gestures, body positions, facial expressions, prosody, and gaze direction in minute detail. Both SUL and MathTrack allow the studying of interaction processes in detail, utilizing data for deep and comprehensive analyses on momentary interactions during mathematical problem solving (e.g., Chan & Sfard, 2020; Haataja et al., 2019; Salminen-Saari et al., 2021). The SUL data are distinctive in the quality and amount of data on verbal and nonverbal peer interactions, allowing examination of peer interactions between individuals and groups. The multiple video cameras and microphones enable not just cross-individual but also cross-group reflection in the analysis. These peer interactions have been studied from many perspectives, such as the use of materials (Moate et al., 2021), student positioning (Zhang et al., 2019), and shared cognition (Clarke & Chan, 2020). However, in the data collected in the SUL project, the teacher’s role was limited, and therefore the data may not be optimal for analyzing teacher-student interactions. In contrast, while the MathTrack data do not cover all students in the class, they provide opportunities to investigate teacher and student behavior in more detail. Specific attention was paid to the role of students’ written work in classroom interactions and the coordination of attention in collaborative work. One of the three stationary video cameras followed the teacher throughout the lessons, which enabled exploration of the verbal and nonverbal teacher-student interactions as a part of teachers’ pedagogy (Haataja, 2021). In both settings, the teachers were able to ask the researchers for instructions during the lessons.

Another critical issue in mathematics education research is the choice of tasks that generate interesting learning behavior. Tasks are the main vehicle in mathematics instruction (Shimizu et al., 2010). For example, engaging students with open-ended tasks may promote mathematical creativity (Molad et al., 2020). In both settings, the mathematical tasks used were selected by the researchers instead of the teachers and may not have represented a task type typical for each teacher’s pedagogy. Using collaborative problem solving and open-ended problems in mathematics education may have been new to the teachers as well as the students. However, controlling the learning content and materials enabled the comparison of the learning processes of the participant groups (e.g., Chan & Clarke, 2019b).

Video research also often utilizes data other than video files (Otrel-Cass & Antonsen, 2018). For instance, drawings have been utilized to gain insight into the development of mental models (Katz et al., 2011). These data can complement video data in valuable ways, but may require additional planning in order for researchers to fully exploit the opportunities that the mixing of data provides. In MathTrack, it was important that the problem-solving process produced drawings aligned with the students’ visual attention beyond mere verbal discussion. We used smartpens or screen capture to record the students’ processes of writing and drawing, while in SUL the outcome was documented. In SUL, the problems selected prioritized the possibilities for the students to express their thinking verbally, graphically, and textually (Chan & Clarke, 2020). Both projects showed that the selection of the task is central for successful video research: the acuity and positioning of the cameras affect the design of the task whenever the students’ verbal, nonverbal, or written expressions of the learning processes are to be investigated.

Video research can be ecologically highly valid. However, its applicability or value in other contexts may not materialize automatically. Much video data are collected with the intention of producing ethnographic accounts (Otrel-Cass & Antonsen, 2018). These are often small-scale and descriptive in nature (Peters et al., 2021). At the same time, quantitative results of video research are often difficult to communicate meaningfully for a broader audience (Jacobs et al., 1999). Derry and colleagues (2010) advocate seeking a

… balance at all stages in the research process between strong theory, the need for advanced planning, and formal systems of sampling and hypothesis testing on the one side, and the need to remain open and flexible to serendipitous learning, discovery, challenging of current ideas, and progressive and iterative refinement of hypotheses on the other. (Derry et al., 2010, p. 41)

For both SUL and MathTrack, the first analyses were qualitative case studies (e.g., Garcia Moreno-Esteva & Hannula, 2015; Moate et al., 2021). This was a way to start making sense of the data. While research on SUL data has continued primarily in the qualitative dimension, MathTrack has recently applied mixed methods including quantitative analysis (e.g., Haataja et al., 2021). Even so, these were accompanied by qualitative analyses to enable valid interpretations of pedagogical and interactional aspects that are simultaneously deep-probing and detailed (e.g., Haataja, 2021; see also Beach & McConnel, 2019). The nature of the video data tends to require qualitative (pre)analyses even when the research questions are quantitative. For example, the researchers can first categorize the data qualitatively, and then conduct statistical analyses with the categories (e.g., Haataja et al., 2019; Juuti et al., 2020). Quantitative analyses allow researchers to capture the patterns of gaze behavior in the classroom, whereas the qualitative analyses allow the comparison of these patterns with situational interactions that take place in the learning process.
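The two-step workflow of qualitative categorization followed by statistical analysis can be illustrated with a minimal sketch. It is not the analysis used in either project: the category labels, the toy counts, and the function names `cross_tabulate` and `chi_square_statistic` are hypothetical and invented purely for illustration. The sketch cross-tabulates qualitatively coded episodes and computes a Pearson chi-square statistic for the resulting table.

```python
from collections import Counter
from typing import Dict, List, Tuple

Code = Tuple[str, str]  # (work phase, coded gaze target); labels are hypothetical


def cross_tabulate(codes: List[Code]) -> Dict[Code, int]:
    """Count how often each (phase, target) combination was coded."""
    return dict(Counter(codes))


def chi_square_statistic(table: Dict[Code, int]) -> float:
    """Pearson chi-square statistic for a two-way table of counts."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    total = sum(table.values())
    row_sums = {r: sum(table.get((r, c), 0) for c in cols) for r in rows}
    col_sums = {c: sum(table.get((r, c), 0) for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_sums[r] * col_sums[c] / total
            observed = table.get((r, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


# Toy codes, invented for illustration only (not data from SUL or MathTrack).
codes = (
    [("individual", "own_paper")] * 8
    + [("individual", "peer")] * 2
    + [("group", "own_paper")] * 4
    + [("group", "peer")] * 6
)
table = cross_tabulate(codes)
statistic = chi_square_statistic(table)
```

In practice, the qualitative coding step determines what such a statistic can mean, which is why the categories need to be defined and validated before any counting takes place.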

In particular, both research groups developed new methods for analyzing the complex video and audio data. Such development was not only related to technology and data analysis; fundamental methodological questions were also addressed. The researchers in SUL, for example, developed the concept of unit of analysis to better serve multi-theoretical research. Individual research questions with distinct background theories may require different units of analysis, some of them more fixed than others, and it is the shared data setting that makes possible the commensurability and comparability of studies (Chan & Clarke, 2019a). Moreover, because of the abundant amount of data, it was important to define a focus for analysis (see also, e.g., Ferguson et al., 2019, and Schoenfeld, 2017). In SUL data, the key selection was made among the multiple modalities and phenomena that could be zoomed into (Chan et al., 2019), whereas in MathTrack, the time-consuming nature of coding gaze targets required careful selection of the time periods that were included in the analyses (e.g., Haataja et al., 2021).

An important difference between the projects is in the level of detail for analyzing teacher and student gaze behavior. While students’ gaze direction has also been analyzed from the SUL video data (e.g., Chan et al., 2020), in MathTrack, the eye-movement data provided a detailed insight into individuals’ learning processes and momentary interaction. Specifically, the eye-movement data included exact numerical information enabling statistical analyses. By using continuous coding, the researchers can create a profound picture of peer and teacher-student interaction processes. However, the use of only five eye-tracking devices (one for the teacher and four for the focus students) restricts the possibilities for examining the sharing of the thinking process across the whole class, which can easily be done with the numerous recordings in the SUL data. Providing eye-tracking glasses for every student is technically and economically challenging and therefore may not be a viable option.

Important advances have also been made in data-analysis methods. In both settings, the amount of data collected during just one lesson is so rich that it enables the use of statistical analyses and a close focus on the micro-level processes of collaborative learning. However, this has required developing methods for automated pre-processing of data. The SUL lab pioneered the use of multimodal learning analytics (MMLA) to extract features such as student gaze direction, student posture, teacher position, student talk, and teacher talk from the video (Chan et al., 2020). This has improved the quality of the coding in terms of reliability and consistency, and enabled new types of analysis thanks to the features extracted with MMLA. Moreover, these features were combined to automatically detect high-level constructs such as attention to teacher speech, teacher attention, student concentration during individual task, and engagement during pair and group work. The power of such analysis is highlighted by the fact that some of these constructs were previously inaccessible or hardly accessible. More generally, case studies on classroom video data enable reflections on the interplay of the micro-level (interactional utterances) and macro-level (whole lessons or interpersonal relationships) phenomena in mathematics classroom discourse (Haataja, 2021; Pennings et al., 2018).
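To make the idea of combining low-level features into a higher-level construct concrete, consider the following minimal sketch. It is not the MMLA pipeline reported by Chan et al. (2020): the rule, the class `FrameFeatures`, and the function `attention_to_teacher_speech` are hypothetical, and the sketch simply assumes that two binary features have already been extracted for each video frame.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameFeatures:
    """Per-frame features, assumed to be already extracted from the video."""
    teacher_talking: bool
    student_gaze_on_teacher: bool


def attention_to_teacher_speech(frames: List[FrameFeatures]) -> float:
    """Fraction of teacher-talk frames in which the student looks at the teacher.

    This rule is an assumption made for illustration; a real system would
    likely combine more features and model uncertainty in each of them.
    """
    talk_frames = [f for f in frames if f.teacher_talking]
    if not talk_frames:
        return 0.0  # the construct is undefined without teacher talk
    attending = sum(f.student_gaze_on_teacher for f in talk_frames)
    return attending / len(talk_frames)


# Four toy frames, invented for illustration.
frames = [
    FrameFeatures(teacher_talking=True, student_gaze_on_teacher=True),
    FrameFeatures(teacher_talking=True, student_gaze_on_teacher=False),
    FrameFeatures(teacher_talking=False, student_gaze_on_teacher=True),
    FrameFeatures(teacher_talking=True, student_gaze_on_teacher=True),
]
score = attention_to_teacher_speech(frames)
```

Even in this simplified form, the sketch shows why such constructs depend on the reliability of every underlying feature: an error in detecting teacher talk changes which frames count at all.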

In MathTrack, similarly rich and vast amounts of data had to be coded as well. To facilitate the process, we developed techniques for the automatic coding of the gaze videos, such as the use of visual markers in the learning environment to identify gaze targets automatically (Hannula et al., 2019). Further analytical tools such as scanning signatures (Garcia Moreno-Esteva et al., 2020) were developed to synthesize the information from hundreds of gaze fixations on different targets. Moreover, gaze synchrony was developed as a measure to explore the amount of joint attention as an indicator of collaboration (Salminen-Saari et al., 2021).
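One simple way to quantify joint attention from coded gaze targets can be sketched as follows. This is an illustrative simplification and not the gaze synchrony measure defined by Salminen-Saari et al. (2021): the function name `gaze_synchrony`, the target labels, and the assumption that both gaze streams are coded on the same fixed time grid are all hypothetical.

```python
from typing import Optional, Sequence


def gaze_synchrony(
    targets_a: Sequence[Optional[str]],
    targets_b: Sequence[Optional[str]],
) -> float:
    """Share of valid time samples in which two participants look at the same target.

    Both sequences are assumed to be sampled on the same time grid, with
    None marking samples where no gaze target could be coded.
    """
    if len(targets_a) != len(targets_b):
        raise ValueError("gaze streams must have the same number of samples")

    valid = 0
    shared = 0
    for a, b in zip(targets_a, targets_b):
        if a is None or b is None:
            continue  # skip samples with missing data for either participant
        valid += 1
        if a == b:
            shared += 1
    return shared / valid if valid else 0.0


# Toy gaze streams, invented for illustration.
student_1 = ["task_sheet", "task_sheet", "board", None, "teacher", "task_sheet"]
student_2 = ["task_sheet", "board", "board", "task_sheet", "teacher", "laptop"]
synchrony = gaze_synchrony(student_1, student_2)
```

A sample-by-sample comparison like this ignores short lags between participants, so a measure intended for research use would probably need a tolerance window around each sample.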

Together, the MathTrack and SUL researchers further developed the use of continuous coding on student agency to address the variations in the group-level peer interactions (Haataja et al., manuscript submitted for publication). Both research settings enable and benefit from international collaboration of researchers using multiple theories and requiring a wide range of analytical methods. Simultaneously, these projects can build understanding of the social aspects of learning that can only be reached with a creative combination of traditional and pioneering approaches.

3.2 Management of complex data

The rich data posed challenges to data management. Data management is necessary in all research, and many institutions and funders require that researchers create data management plans. Data management involves description of the nature and type of data, including using standard ways of description (metadata), identifying software used or developed, ways of ensuring data integrity and identification of associated risks, storage of data, open data and control of access, and the responsibilities in data management among the parties involved. In MathTrack, the technology was still in the development phase, which means that the data processing practices were developing together with hardware and software development, thus making it difficult to provide a detailed data management plan in advance.

For both settings, data management is an issue. The amount of data generated for each session is large. In MathTrack, we stored for each lesson altogether 28 video files, including stimulated recall videos, screen captures, and the raw data from ten cameras recording eye movements. Because of the large number of video files, such projects need more data storage space than usual in educational sciences. Furthermore, as both projects collected multiple channels of data, it was important to make sure that the videos were synchronized. In SLRC, this issue was resolved in the process of designing and building the special classroom. For MathTrack, we had to synchronize the camera clocks before each recording. In both settings, physical signals (e.g., clapping hands) were used at the beginning of the recording to enable checking that all videos were synchronized. Furthermore, in both settings, the data included students’ written work and stimulated recall interviews, requiring the synchronization of, for instance, the students’ processes of drawing solutions with the video recordings.
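The use of a shared physical signal for checking synchronization can be sketched in a few lines. The sketch assumes that the moment of the clap has been located, manually or otherwise, on each recording's own timeline; the recording names, the timestamps, and the functions `offsets_from_clap` and `to_common_time` are hypothetical and do not describe the tools used in either project.

```python
from typing import Dict


def offsets_from_clap(clap_times: Dict[str, float], reference: str) -> Dict[str, float]:
    """Seconds to add to each recording's local time to match the reference.

    clap_times maps a recording name to the local time (in seconds) at
    which the shared clap signal occurs in that recording.
    """
    ref_time = clap_times[reference]
    return {name: ref_time - local for name, local in clap_times.items()}


def to_common_time(local_time: float, offset: float) -> float:
    """Map a local timestamp onto the reference recording's timeline."""
    return local_time + offset


# Toy clap positions, invented for illustration.
clap_times = {
    "camera_front": 12.5,
    "camera_teacher": 10.25,
    "eye_tracker_student_1": 4.0,
}
offsets = offsets_from_clap(clap_times, reference="camera_front")

# An event 30 s into the teacher camera recording, on the common timeline.
event_time = to_common_time(30.0, offsets["camera_teacher"])
```

A single signal at the start only corrects a constant offset. If device clocks drift during a lesson, a second signal at the end of the recording would be needed to estimate and correct the drift.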

The complexity of a natural classroom environment brings challenges in supervising the functionality of the technologies. For example, noticing the malfunction of an individual camera is a challenge in an environment with several technological tools, numerous people, and complex interactional settings. In SLRC, the technicians were able to supervise the data collection from a monitoring room outside the classroom, whereas in MathTrack the researchers were in the classroom throughout the data collection, monitoring the stationary cameras. However, when an eye-tracking device was malfunctioning, researchers noticed it only after the lesson, leading to some incomplete data. This makes evident that data management is necessary throughout the collection, rather than being something that takes place only after data collection.
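A basic automated check could help to notice such malfunctions sooner. The following sketch assumes that each device writes a timestamp for every recorded frame, which may not hold for all equipment; the function `find_gaps` and the threshold value are hypothetical. It reports every interval in which no frame arrived for longer than a chosen limit.

```python
from typing import List, Sequence, Tuple


def find_gaps(timestamps: Sequence[float], max_gap: float) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where consecutive frames are too far apart.

    timestamps are assumed to be in seconds and sorted in ascending order.
    """
    gaps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        if current - previous > max_gap:
            gaps.append((previous, current))
    return gaps


# Toy frame timestamps with one dropout, invented for illustration.
frame_times = [0.0, 0.5, 1.0, 4.0, 4.5]
gaps = find_gaps(frame_times, max_gap=1.0)
```

Running such a check on the incoming data during the lesson, rather than afterwards, would be the point of the exercise; how feasible that is depends on whether the devices expose their data while recording.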

Having a novel, highly technical setting requires specialized technical staff. In SLRC, there were two experienced technicians backstage all the time, controlling the audio and cameras according to the researchers’ wishes. They were able to provide video for stimulated recall interviews immediately after the session. MathTrack also required specialized technical expertise. This was provided by Miika Toivanen, who had been one of the developers of the mobile eye-tracking glasses and the related software under the auspices of a prior employer before working in the MathTrack project. Collecting and post-processing the eye-tracking data was his special expertise, which he taught to other researchers in the project. The necessity for highly specialized technical expertise can make research vulnerable, and requires plans for how to ensure continuity in terms of competence and knowledge development.

Data management converges with ethical issues as it involves consideration of sensitive data, that is, data that, if disclosed, could induce harm. Furthermore, data management also includes taking note of legal compliance. In the European context, the General Data Protection Regulation (GDPR; see European Parliament & Council, 2016) came into effect in 2018 and required researchers to pay increased attention to what personal data are to be collected, how research participants are informed about their rights regarding data, and how personal data are stored.

Personal data comprise all information based on which an individual is identifiable directly or indirectly. Of direct identifiers, picture and voice are available in video data. Of indirect identifiers, gender and approximate age are available. Sensitive personal information, also called ‘special categories of personal data’, includes data revealing racial or ethnic origin or religious beliefs (Finnish Social Sciences Data Archive, n.d.). Such information can be available through video data. In addition, behavior, which provides sensitive data, is readily accessible through video research. Anonymization of multimodal data without compromising the quality of the data can be challenging (Siegert et al., 2020). Full anonymization is not even possible if personal physical identifiers, such as close-up images of the iris, constitute essential data. This was the case in MathTrack, where each eye was recorded on video for computing gaze direction and fixation durations.

Researchers should avoid collecting unnecessary data in order not to burden research participants beyond what is necessary, and to reduce anonymization needs. Later use and sharing of video data may require anonymization, which can involve a substantial effort, and therefore it is wise to consider how much and exactly which data are needed. The GDPR prohibits collecting unnecessary personal data. However, the principle of minimization of data poses a challenge for multimodal research (Siegert et al., 2020). It is in the nature of video data that one cannot predict exactly what information participants will end up disclosing. Also, the data may not align with predefined categories of information outlined in a privacy policy description. It may also be difficult to estimate the point of saturation of the data (see Creswell & Miller, 2000; Hennink et al., 2019) in advance, and analyses may not keep up with data collection to indicate when it might be a good point to stop collecting further data. In SLRC, it is possible to record anything said or done in the classroom. In MathTrack, only the teacher and the focus students were followed. Yet, with eye tracking, it was possible to record and later view what the participants had looked at. Inevitably, there will be an inclination to collect ‘more’ rather than ‘less’ data. These features of video research can make the principle of minimization challenging to abide by.

While transfer of data among parties is possible, transfer outside the European Union requires planning, as GDPR compliance requires that research participants are informed of the transfer in a privacy policy description. Furthermore, making video data openly available for all researchers entails a number of questions at the crossroads of data management, legal compliance, and research ethics, often leading researchers not to open their data for others to explore (Siegert et al., 2020). While it may be possible and nowadays facilitated by dedicated journals, in the Finnish and Australian contexts opening video data requires extremely careful planning, informing, and consent procedures, and opening may still not be feasible due to potential risk or harm to participants (see Rutanen et al., 2018).

3.3 The ethics of unpredictability, exposure, and bodily experience

The methodological aspects pertaining to technology, data collection, and the nature of data in video research also entail ethical questions. Methods such as video research and eye tracking did not exist at the time when ethical principles were initially taking shape as a result of the Nuremberg trials. While the principles provide a compass, researchers nowadays must consider how to embody ethical principles when using the methods, tools, and environments available today. Implementation of novel technological solutions always requires thorough ethical scrutiny. For a recent review of ethical guidelines for video research we refer the reader to Everri et al. (2020). The authors concluded that while general ethical guidelines converge on participant protection, there is still surprisingly little guidance directly related to video research (for a similar conclusion related to the discrepancy between the advances of technology and the state of guidelines, see Legewie & Nassauer, 2018). Consequently, individual video researchers may benefit from equipping themselves with their own (emphasis original) explicated guidelines around informed consent, participant rights, and establishing rapport and reflexivity (Everri et al., 2020). These perspectives materialized in the research conducted in SLRC and MathTrack, the latter adding an element of bodily experience through the eye-tracking device.

The methods used in SLRC and MathTrack generate multifaceted data and open up unprecedented questions for researchers who delve into the data (see also Cope & Kalantzis, 2016). Accurately describing the intended research to participants, their guardians, and ethics review boards, while acknowledging that the data potentially afford as yet unenvisioned paths for the project, is a challenge that materializes in both eye-tracking and video research (see also Everri et al., 2020). While participation in both settings is based on informed consent, it is worthwhile to question to what extent participants can ever be fully informed. This applies especially to video research in naturalistic settings (see also Legewie & Nassauer, 2018). Furthermore, researchers may ‘see’ more in the data than participants anticipate. The object of a participant’s gaze, when exposed to researchers through eye-tracking technology, could be disconcerting for the research participant. This has consequences for the reporting of the data, and emphasizes the importance of considering carefully how and whether to analyze and report, for instance, inappropriate gazing or unsuitable behavior caught on recordings. For this reason, researcher reflexivity is essential in video research; it involves analyzing the fieldwork relationship and scrutinizing the researcher’s own relationship with the produced material (Pink, 2004) while maintaining a respectful approach towards participants (Derry et al., 2010).

While the ethical principles guiding research are more similar than different in the two settings (see Australian Research Council Statement on Ethical Conduct in Human Research, 2007/2018; Finnish National Board on Research Integrity, 2019), the requirements regarding ethics review differ in the two contexts. Research involving researcher interactions with human beings or collection of personal data requires an ethics review in the Australian context (MGSE HEAG, 2016). This is not, by default, the case in Finland, where an ethics review is required when certain issues materialize, such as intervention in the physical integrity of research participants, deviation from informed consent or parental consent, exposure to exceptionally strong stimuli, mental harm, or security risk (Finnish National Board on Research Integrity, 2009/2019). The Australian procedures distinguish between different levels of risk involved, and differentiate the ethics review process accordingly. In the Finnish context, studies involving risk must be reviewed, but the process does not differentiate levels of risk, so all reviewed research is subject to the same procedure. The option of a ‘lighter’ procedure for less risky designs is not available. The thinking in Finland emanates from the idea that the ethics review is in place to support research involving the ethically most challenging designs, whereas the Australian practice emphasizes the necessity of ensuring that all research adheres to ethical standards. The protection of individuals is, in the end, a key concern in both contexts, but the view on how institutions best support researchers in this important and sometimes difficult task differs.

In MathTrack, the question materialized as to whether or not the eye-tracking device, along with its wearable backpack, was to be considered an intervention in the physical integrity of the research participant. In Finland, this question is fundamental, as it determines whether an ethics review is required. The study was subjected to ethics review; however, the ethics review board concluded that the set of devices used in MathTrack did not constitute an intervention in the physical integrity of research participants. MathTrack became a precedent facilitating the re-definition of ‘intervention in physical integrity’ in the context of new technology. In the absence of a more precise definition, the board interpreted that intervention in the physical integrity of participants takes place if participants cannot free themselves from the devices within a reasonable time. In 2019, when the Finnish National Board on Research Integrity revised the national guidelines for ethics review for non-medical research involving human participants, this definition, originating from the questions raised through the ethics review of MathTrack, was added. With the ever-developing technology increasingly utilized by researchers in social and behavioral sciences, it had become inadequate to talk simply about intervention in the physical integrity of participants without considering the nature of the ‘intervention’ with technology, some of which may be more invasive in terms of privacy or bodily experience than others (e.g., Duru, 2018).

The definition originating from the MathTrack ethics review does not per se take a stand on the technology itself, but rather on the extent to which it restricts the participants’ autonomy in a physical sense. As such, it is a useful definition helping researchers and ethics review boards to consider the ethical aspects of participant experience. In a laboratory setting without the use of wearable technology, restrictions of physical freedom tend not to materialize. Nevertheless, the definition raises the question of to what extent a lab space, such as SLRC, may restrict participants from moving freely in and out of the space and of the research.

A difference between the two settings, bearing ethical implications, is the presence of the researcher. A more visible presence, as in the case of MathTrack, may interfere more with the students’ activities than a lab setting in which most researchers observe behind a non-transparent screen, as in SLRC. However, students may disclose more undesired behaviors, or behaviors of which they are unaware, when not reminded of the research participation by the presence of researchers. In addition, in MathTrack, while the use of technology was not considered physically intrusive, and students did not report the experience of wearing the equipment as obtrusive, asking participants to wear a mounted device still calls for ethical reflection on the bodily experience to which participants become exposed, and which may vary from one individual to another (see Duru, 2018).

4 Conclusions

Clarke and Chan (2019) discussed video research through the metaphors of window, lens, and mirror. Both SLRC and MathTrack can be seen as windows into classroom practices. Yet, it might be more appropriate to consider them as different lenses, with MathTrack zooming in on the micro-interaction in the scope of seconds and SLRC being the fish-eye lens capturing 360 degrees of what is happening in the class. To add yet another metaphor, SLRC can be likened to the modern ‘panopticon’, where the researcher can see and hear the teacher and every student, who do not know exactly when they are being observed. Such complete observation enables observing and recording each act of communication in the classroom, be it verbal or non-verbal. In metaphorical terms, MathTrack, on the other hand, is akin to ‘mind-reading’, where the researchers can analyze eye movements to reveal even those attentional tendencies that are non-conscious. While there is much overlap in what the two research settings can capture, there is also complementarity, in that each of the settings can access some aspects of human interaction better than the other. Together, SLRC and MathTrack allow methodological and theoretical triangulation that can lead to deeper understanding of the social aspects of learning.

Video data is irreplaceable for research on learning behaviors (Schoenfeld, 2017). Multimodal analysis requires moving beyond just verbal interactions, to see how the learners interact with each other and with their learning environment through their different senses (Ferreira, 2021; Sinclair & de Freitas, 2019). In a recent study, Pruner and Liljedahl (2021) used video recordings to study problem solving in a collaborative learning setting, also addressing student attention to groups other than their own. They suggested that future studies might

capture the problem solving of an entire class by recording each group’s work in video and producing a gaze-dialogue transcript for a full class. Instead of noticing that a group or individual is attending to a different group’s work, it would be interesting to be able to document whose work is being attended too, and to what degree are resources being distributed. (p. 768)

SLRC and MathTrack provide data that are suited to studying some of these new questions, and in doing so they have pushed the envelope of what video research in education means.

The examples of MathTrack and SLRC show that building an advanced video research setting is both a technical and an organizational challenge. The research setting requires specific technical expertise, and the continuation of the work depends on continued funding in order to retain that expertise. The technical infrastructure also requires solutions for data storage and maintenance of devices.

While video is already well established in research, the new multi-camera research settings pose new opportunities and challenges for data analysis. Detailed video data provide a foundation for analyzing multimodal classroom interaction in detail. Yet, the amount of data and the required amount of manual labor can be paralyzing for human researchers. Machine learning and other existing software make it possible to preprocess raw video data algorithmically, reducing the amount of manual work. Such advanced tools are necessary for analyzing video data in amounts large enough to be considered representative in any sense. Even with a rather limited number of cases, MathTrack has been able to make meaningful statistical analyses focusing on within-person comparisons (e.g., Haataja et al., 2021). When we focus on events that are sufficiently frequent, it is possible to analyze statistically how a person acts or reacts across similar situations in comparison to a different set of situations.

Both settings offer possibilities for professional development by making visible something teachers are not able to see while teaching, serving as a mirror (Clarke & Chan, 2019) for reflecting on their practice. SLRC can provide very detailed videos of pupils’ behavior during the lesson, while MathTrack can give teachers a possibility to look at both students’ and teachers’ visual attention. Such video recordings give the teachers opportunities to reflect on interactional aspects, analyze the connection between pupils’ actions and the teacher’s behavior, and therefore understand teaching and learning more profoundly. Ideally, the teachers could reflect on their own lessons and see how their students act during the lesson. Alternatively, analyzing different types of interactional situations could enhance teachers’ understanding of how their pedagogical behavior and intentions are related to student responses to them. Furthermore, there is much potential in using data generated through video research in teacher education.

Ethical considerations in video and eye-tracking research are multifaceted and call for revisiting the ethical premises from time to time. The potential and unpredictability of data, be it panopticon or mind-reading, may cause a tension between providing sufficient information about the research and obtaining fully informed consent from research participants. Sensitivity towards participants in reporting results is crucial. Furthermore, different emphases in national and institutional ethical guidelines and procedures may challenge researchers working across contexts. However, exposure to different practices always provides a fruitful moment for reflecting on, and gaining a deeper understanding of, not only the ‘other’ system but also one’s own.