1 Introduction

Collaborative learning affects achievement, attitudes, and perceptions positively (Kyndt et al., 2013). Joint attention, that is, the ability to focus attention on the same thing simultaneously with other people and to acknowledge it, develops as early as infancy (Corkum & Moore, 1995). This ability allows us, for example, to engage in meaningful interactions and to learn from others (Tomasello, 1995). The recent development of mobile gaze-tracking devices has made it possible to study nonverbal interaction more precisely in an authentic context. Mobile gaze-tracking allows us to view what the participant sees and focuses on. With mobile gaze-tracking it is easier to determine, for example, which gestures are relevant and catch the attention of the participant. It also gives exact information on the direction of gaze during interaction. Mobile gaze tracking has been used in studies of campus walks (Foulsham et al., 2011), science learning center visits (Magnussen et al., 2017), teacher attention in classrooms (Haataja et al., 2021; McIntyre et al., 2017; Prieto et al., 2017), and collaboration in student dyads (Schneider et al., 2018).

Even though many have studied the collaborative problem-solving process over the years (e.g., Artzt & Armour-Thomas, 1992, 1997; Roschelle & Teasley, 1995), there is still uncharted territory left to investigate. The quality of interaction is important for successful collaborative problem solving (Barron, 2003). Gaze is an important part of non-verbal interaction, both for observing what the others are doing and for communicating one’s own intentions (Cañigueral & Hamilton, 2019). Joint attention is one part of fruitful collaboration, and we need to learn more about how and when it happens. Mobile gaze tracking lets us analyze in detail the eye movements of participants during mathematical problem-solving processes. This allows us to pinpoint the phases of collaborative problem-solving during which joint attention arises. Joint attention, in turn, allows us to assess the importance of different phases of the collaborative problem-solving process: during which phases do students share joint attention, e.g., during which phases do they learn from each other and engage in scaffolding together?

Educational researchers have studied joint attention mostly in the context of visual attention. Joint visual attention requires a shared target of visual attention. However, mathematics is full of abstract ideas and representations. In fact, Kaput (1987) claimed that mathematics can be seen as the discipline that studies representation of one structure with another one. As such, representations are also an integral part of mathematical problem-solving. Because representations inevitably are also part of the interaction during collaborative mathematical problem solving, we might need not only to focus on visual attention but also to acknowledge the role of multiple representations when studying joint attention in the context of mathematical problem solving. In this paper, we examine the nature of joint attention in interactions that evolved in doing mathematics. We then examine which phases of collaborative problem solving occurred during and right after joint attention.

1.1 Phases of mathematical problem-solving processes

For problem-solving, we use the definition commonly used in mathematics education: problem-solving happens when the solver does not know how to carry out a task with familiar or routine procedures (Schoenfeld, 1983). In collaborative problem solving, the collaborators need to build a shared space of understanding in a joint problem-solving space (Roschelle & Teasley, 1995). A joint problem-solving space includes socially negotiated sets of knowledge elements, such as goals, problem-state descriptions, and problem-solving actions (Roschelle & Teasley, 1995). Roschelle (1992) argued that one of the key factors in the creation of a joint problem-solving space is the presence of repeated cycles of displaying, confirming, and repairing understandings.

Whereas the individual problem-solving process can be described as cyclic (e.g., Carlson & Bloom, 2005; Pólya, 1945; Schoenfeld, 1985), the collaborative problem-solving process is more unpredictable (Artzt & Armour-Thomas, 1992). In collaborative problem-solving, individuals bring ideas into a collaborative space. The group constructs knowledge via interaction (Roschelle, 1992). The interaction helps the individual through ideas, structuring the problem, and verifying the correctness of the plausible solutions. However, it also disrupts the cyclic process of the individual (Artzt & Armour-Thomas, 1992).

Various researchers have identified somewhat similar, yet different phases in a problem-solving process. Table 1 shows these phases according to various authors, together with our version, and their relation to each other.

Table 1 Phases of problem solving

The earlier model for studying collaborative problem solving by Artzt and Armour-Thomas (1992) was based on Schoenfeld (1985)’s framework, which in turn was founded on Pólya’s (1945) model. Based primarily on Artzt and Armour-Thomas’s (1992) framework, but also adopting ideas from other frameworks, we built a framework that is better suited for analyzing the phases appearing during engagement with a geometric mathematical problem-solving task in a small group. In what follows, we describe the framework.

Orienting The first stage in Pólya’s (1945) framework was understanding the problem, which he later (Pólya, 1973) divided into getting acquainted and working for better understanding. Other researchers call the first stage orienting to the problem (Carlson & Bloom, 2005) or reading the problem (Artzt & Armour-Thomas, 1992; Schoenfeld, 1985). As the problem used in this study includes hardly any textual information, we decided to call this opening stage orienting. During this phase, the problem solver gets acquainted with the problem. Thus, it does not entail any collaboration and is unlikely to lead to joint attention.

Understanding the problem As did Pólya (1973) and Artzt and Armour-Thomas (1992), we also separated the phase working for better understanding of the problem from the phase getting acquainted with the problem. Whereas orienting can happen only when the problem-solver sees or hears the problem for the very first time, the phase understanding the problem does not have a fixed position in the problem-solving process timeline. This phase occurs when the problem-solver considers linguistic, semantic, and schematic attributes of the problem in his or her own words, and represents the problem in a different form (Artzt & Armour-Thomas, 1992).

Planning and exploring Whereas Pólya called the planning phase devising a plan, we call this phase planning and exploring, combining the terminology used by Schoenfeld (1985), Carlson and Bloom (2005), and Artzt and Armour-Thomas (1992). Artzt and Armour-Thomas (1992) considered calculations and diagrams as reference points to separate the phases of analyzing, planning, and exploring. We expected the problem-solvers mainly to discuss and produce drawings. Due to the nature of our data, it was impossible to separate these phases similarly to Artzt and Armour-Thomas (1992).

Implementing Pólya (1945) called this phase carrying out a plan. We call it implementing, similarly to Schoenfeld (1985) and Artzt and Armour-Thomas (1992). During this phase, the student carries out the plan and comes up with a possible solution, a drawing.

Verifying Even though Pólya (1973) did not have separate phases for verifying and watching and listening, they are present in the description of the phase looking back. He defined this phase as reviewing the solution and possibly discussing it. We have divided this phase into two phases: verifying (defined as reviewing) and watching and listening (defined as following a discussion about the problem or solution), as was done by Artzt and Armour-Thomas (1992). During the verifying phase, the problem-solver checks to see if the solution satisfies the problem’s conditions or explains to others how he or she obtained the solution.

Watching and listening The phase watching and listening happens when the problem-solver is attending to others’ ideas and work (Artzt & Armour-Thomas, 1992). During this phase, the problem-solver is attentive towards a fellow collaborative problem-solver actively participating in the collaborative problem-solving process. The observed problem-solver is actively trying to communicate his or her thoughts to the group.

1.2 Interaction during mathematical collaborative problem-solving

Interaction is necessary for the collaborative problem-solving process (Barron, 2003). Without successful interaction, it is nearly impossible to construct a shared space of understanding in a joint problem space (Roschelle & Teasley, 1995).

During interaction, people observe others' non-verbal signals, using them to estimate their level of attention (Clark & Schaefer, 1987). People use non-verbal signals to communicate understanding and they use gestures especially when explaining abstract ideas (McNeill, 1992). We define gestures following McNeill (1992): gestures are the movements of the hands or arms in space or on objects.

Non-verbal interaction, such as nods, smiles, and gestures, can enhance learning during problem-solving (Rouinfar et al., 2014). McNeill (1992) identified four types of hand gestures: beat, deictic, iconic, and metaphoric gestures. Kita and Davies (2009) combined the categories iconic and deictic, calling all those gestures representational. They observed that representational gestures could change their forms flexibly depending on the communicative and linguistic context. They also found that the representational gestures' occurrence rate was higher the more challenging the mathematical representation (the diagram) was to describe.

Representations have a central role in collaborative mathematical problem solving. Mathematical ideas are abstract in nature (Radford, 2008), and to communicate an abstract idea requires a reference to a representation. External representations are the external embodiments of participants’ internal conceptualizations (Lesh et al., 1987) that other people can observe. Understanding the role of representations and representation systems in mathematical problem solving has interested many researchers over the years (e.g., Goldin, 1998; Lesh et al., 1987). Later research has highlighted that also non-verbal signs, such as gestures, are important forms of representation.

In learning interaction, the students interpret the learning of content meanings by following the teacher’s verbal instructions, gaze cues, and gestures simultaneously (Jarodzka et al., 2013; Shvarts, 2018). Tracking the participants’ gaze allows us to follow gaze cues and gestures of the participants accurately.

1.3 Joint attention

Joint attention is a social phenomenon. When two or more individuals know that they are attending to something in common, they experience joint attention (Tomasello, 1995). In group work, joint attention supports interaction (Barron, 2003; Mercier et al., 2017). As joint attention entails the capacity to coordinate attention with a social partner, it is fundamental for learning (Mundy & Newell, 2007). However, we have not found any studies on how joint attention affects the collaborative mathematical problem-solving process, nor on what the characteristics of joint attention are in the context of interaction evolved around mathematics.

Joint visual attention is understood traditionally as attending to the same target and acknowledging this shared perception (Emery, 2000). For example, the participants are looking at an apple and discussing it (Fig. 1). In mathematics, instead of an apple, the target could be a diagram, a problem, or a solution in a particular place, e.g., on the board or in a notebook.

Fig. 1 Joint visual attention (based on Emery, 2000)

Joint attention need not be only joint visual attention. Moore (2013) has written about how the onset of symbolic linguistic representation during the second year of life enables interactions around absent or even nonexistent objects. Such interaction is possible when the participants understand the non-visible object roughly the same way, e.g., they have a similar image of it in their minds, for example, an apple. He refers to this mental image as representation. During this kind of interaction, the participants do not share joint visual attention. However, their attention is on the same thing around the same time, and they acknowledge it. Moore (2013) refers to this kind of interaction around non-visible objects as joint representational attention. We suggest that joint representational attention can be found also in the interaction focusing on mathematics.

We think about Tomasello’s (1995) “something in common” through representations and limit ourselves to a collaborative mathematical problem-solving situation where students work with different diagrams, some of which may represent the same mathematical idea. We hypothesize four possible joint attention situations where the participants do not look at the same diagram:

  1. the participants are discussing a solution and looking at the representation of this solution, each in their own notebook (Fig. 2);

  2. the participants are discussing a possible solution that each can visualize in their minds in similar ways (Fig. 3);

  3. one participant explains a solution through representational gestures to others, for example, by drawing a diagram in the air, and they are visualizing the solution in their minds (Fig. 3);

  4. one participant looks at a diagram and another is talking about the diagram.

In this study we were interested in finding out if these situations are part of the joint representational attention phenomena described by Moore (2013).

Fig. 2 Attention to the same representation and acknowledging it

Fig. 3 Attention to the same representation through discussion or representational gesture

1.4 Observing joint attention

Visual attention is often intentional (Tomasello, 1995). Therefore, joint visual attention offers information also about the intentions of the participants. Gullberg and Holmqvist (1999) emphasized that unless a visual target falls in our field of foveal vision, which is the small area of acute visual perception (Campbell & Green, 1965), we are not able to read symbols or to detect facial expressions or gestures. Gaze patterns are not the only indicator of cognitive attention, but as such, they are necessary (Gullberg & Holmqvist, 1999).

Emerging verbal and kinetic practices affect the direction of the gaze. For example, speech can direct the observer’s visual attention towards the speaker, after which gestures and other body movements become detectable. They can further direct the observer’s gaze. When the observer is already fine-tuned to the subject, speech is not necessary for directing the observer’s gaze (Stukenbrock, 2018). Student gaze direction is an essential indicator of the target of their attention when they are silent (Hannula & Williams, 2016).

Methods for studying joint visual attention in small groups are still under development. However, some methods have already been developed for dyadic joint visual attention. Jermann et al. (2011) introduced a method that utilizes what they call cross recurrence graphs for tracing joint attention. This method works best when there are only two participants, and therefore, it is not suitable for groups with more than two people. Schneider and Pea (2014) developed this method further and introduced network representations to study gaze in collaborative learning. This method is also optimal for only two participants. We have developed a new method to study joint visual attention with more than two people, which we describe in the methods section, and further in the electronic supplementary material.

1.5 Research aim

While most of the previous research has usually examined collaborative problem solving through the discourse in the group and questionnaires (Greiff et al., 2013), we see that non-verbal interaction is also relevant. Therefore, we investigate both verbal and non-verbal interaction during collaborative problem solving in a regular classroom. To make joint attention visible for our investigation, we gave students a drawing problem.

We see the importance of investigating the idea of joint representational attention in mathematical problem-solving in order to better understand joint attention in the context of mathematics. As previous research on the topic is limited, our study is explorative. Our research aim is to understand the nature of joint attention in mathematical problem solving and its effect on the collaborative problem-solving process:

RQ1

What characterizes joint attention in the context of mathematical problem solving?

RQ2

Which phases of the collaborative problem-solving process occur during and right after joint attention?

2 Method

2.1 Participants

The data were collected during a grade nine mathematics lesson in a Finnish comprehensive school from a class of 22 students. The participants were four 15–16-year-old students. The rest of their class (18 students) was present and participated actively in the lesson but did not wear the gaze tracking devices. The four participants, three boys and one girl, were selected among volunteers.

2.2 Apparatus

We recorded the lesson using audio recording and three video cameras in the classroom. Two of the video cameras were pointed towards the students, and one camera followed the teacher. The teacher wore a mobile gaze-tracking device and a personal mobile microphone. Ambient microphones placed in the classroom recorded student voices. The four participants used a smartpen that recorded their drawings as a video and served as a personal microphone. We recorded students’ eye movements with mobile gaze tracking devices.

The gaze tracking devices, the algorithms, and the software were developed at the Finnish Institute of Occupational Health (Toivanen et al., 2017) and manufactured in a lab at the University of Helsinki. The accuracy of the device is approximately 1.5 degrees of the visual angle.

The device consists of a glasses-like frame equipped with electronics and three mini-cameras connected to a computer that was carried in a backpack (see Fig. 4), allowing the participants to move. The software in the computer records the video frames and produces a video of the scene camera, superimposed with a gaze point.

Fig. 4 Mobile gaze tracking gear and research setting. The light around the eyes in the picture is infrared light and invisible to the naked eye

The frame rate of the video camera varies according to the amount of light; optimally, it is 30 fps. The video frames of each device were recorded with synchronized time stamps.

2.3 Procedure for data collection

Before the lesson started, the teacher was instructed not to provide hints about the optimal solution, but instead, to encourage students to keep trying and to ask questions that could help students articulate their ideas.

The students were asked to find the shortest possible way to connect four cities located at the vertices of a square (Fig. 5), first on their own, then with a partner, and finally in groups of four. However, our target group started spontaneously collaborating already during the instructions, skipping the stages when they work alone and when they work with a partner.

Fig. 5 The illustration shown on the whiteboard to pose the problem

This problem is the four-point version of the Steiner tree problem. For this study, we wanted a mathematics task that (1) would work as a collaborative task for Grade 9 students, i.e., the task should be accessible and provide meaningful solution alternatives; (2) would be challenging enough to generate opportunities for novel insights during the process, and (3) would generate interesting visual representations as potential targets for visual attention. Our piloting of the task indicated that people trying to solve this task generate alternative solution versions (see examples in the Figs. 6, 7, 8) quite easily, but that the optimal solution is challenging to find (Fig. 9). This kind of problem task provided opportunities for Aha! experiences and opportunities to collaborate.

Fig. 6 The solution ‘X’

Fig. 7 The solution ‘Z’

Fig. 8 The solution ‘H’

Fig. 9 The optimal solution
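
To make the difference between the candidate solutions concrete, their total lengths can be compared numerically. The following is a minimal sketch, not part of the study's materials. It assumes a unit square and assumes that ‘X’, ‘Z’, and ‘H’ are the networks their letter names suggest (two diagonals; two opposite sides joined by a diagonal; two opposite sides joined by a bar between their midpoints); the figures remain the authoritative description. For the optimal network it uses the standard Steiner tree construction for a square, with two junction points at which three edges meet at 120 degrees. All variable names are illustrative.

```python
from math import dist, sqrt

# Corners of a unit square (assumption: side length 1).
A, B, C, D = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)


def network_length(edges):
    """Total length of a network given as a list of (point, point) edges."""
    return sum(dist(p, q) for p, q in edges)


# Assumed shapes, inferred from the letter names only.
center = (0.5, 0.5)
x_shape = [(A, center), (B, center), (C, center), (D, center)]
z_shape = [(D, C), (C, A), (A, B)]
h_shape = [(A, D), (B, C), ((0.0, 0.5), (1.0, 0.5))]

# Two junction points on the horizontal midline, placed so that the
# three edges at each junction meet at 120 degrees.
offset = 1 / (2 * sqrt(3))
s1, s2 = (offset, 0.5), (1 - offset, 0.5)
optimal = [(A, s1), (D, s1), (s1, s2), (s2, B), (s2, C)]

lengths = {
    "X": network_length(x_shape),
    "Z": network_length(z_shape),
    "H": network_length(h_shape),
    "optimal": network_length(optimal),
}

for name, value in lengths.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the optimal network has length 1 + √3 (about 2.73 side lengths), only slightly shorter than the two diagonals of ‘X’ (2√2, about 2.83), which is consistent with the observation that the optimal solution is hard to find.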

2.4 Analysis procedure

To study the similarity of two or more students’ attentional behavior, we developed a gaze synchrony measure. It is a numerical measure that identifies moments when a gaze target overlaps among two or more subjects (see details in the electronic supplementary material). We developed the gaze synchrony measure further, creating an improved and less error prone measure, the Garcia Moreno–Esteva–Salminen-Saari measure of joint attention (GMESS; see details in the electronic supplementary material) to identify joint representational attention. Both methods rely on annotated targets for fixations. Fixations stabilize the retina over a stationary gaze target (Duchowski, 2007).
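
The core idea of detecting overlapping gaze targets can be illustrated with a short sketch. This is not the GMESS measure, whose details are given in the electronic supplementary material; it only shows, under simplifying assumptions, how annotated fixations could be scanned for time intervals in which two or more students fixate on the same annotated target. The `Fixation` record, the target labels, and the function name are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Fixation(NamedTuple):
    student: str
    target: str   # annotated gaze target (hypothetical label)
    start: float  # seconds
    end: float    # seconds


def shared_target_intervals(fixations, min_students=2):
    """Return (target, start, end, students) for every interval in which
    at least `min_students` students fixate on the same annotated target."""
    by_target = defaultdict(list)
    for f in fixations:
        by_target[f.target].append(f)

    results = []
    for target, group in by_target.items():
        # Every fixation start or end is a possible interval boundary.
        times = sorted({t for f in group for t in (f.start, f.end)})
        for t0, t1 in zip(times, times[1:]):
            present = frozenset(
                f.student for f in group if f.start <= t0 and f.end >= t1
            )
            if len(present) < min_students:
                continue
            last = results[-1] if results else None
            if last and last[0] == target and last[2] == t0 and last[3] == present:
                # Extend the previous interval when the same students continue.
                results[-1] = (target, last[1], t1, present)
            else:
                results.append((target, t0, t1, present))
    return results


fixations = [
    Fixation("S1", "S2_notebook", 0.0, 2.0),
    Fixation("S2", "S2_notebook", 1.0, 3.0),
    Fixation("S3", "whiteboard", 0.5, 2.5),
    Fixation("S3", "S2_notebook", 2.5, 4.0),
]
candidates = shared_target_intervals(fixations)
```

Such intervals are only candidates for joint visual attention. Whether the students also acknowledge the shared perception, and whether different targets represent the same mathematical idea, cannot be decided from overlap alone.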

Collaboration can be detected from the gaze, gestures, and speech of the collaborators (Gullberg & Holmqvist, 1999; Radford, 2008). We identified the periods of collaboration, which we call task-focused sections, based on whether students engaged in discussion about the problem, viewed each other’s diagrams or calculations, or produced diagrams or calculations. These sections were identified from the moment the teacher introduces the task until the moment when the collaborative problem solving in groups of four was over. In these task-focused sections, we annotated the gaze target for all the occurring fixations for each participant, fixations having been detected to within an accuracy of 30 ms. For recording and analyzing annotations, we used the software package ELAN (2019, September). We used the annotated gaze videos to determine the possible occurrences of joint visual attention and joint representational attention during these task-focused sections using GMESS.

We used the gaze-tracking videos and smartpen recordings to determine the phases of collaborative mathematical problem solving for each individual. Our coding scheme follows the model presented in Table 2. The multimodal data provided us with detailed and accurate access to the microanalysis of the collaborative problem-solving process. The stationary video cameras informed us about the general learning process of the group and the actions of the teacher. We could follow the student’s gaze in detail through gaze-tracking videos. Additionally, the audio recordings from the personal microphones offered us information about the thinking processes of the students through their verbal communication and argumentation. From the smartpen recordings, we were able to track when the solutions were drawn.

Table 2 Phases of problem solving and how to observe them

In joint visual attention, the participants are looking at the same gaze target. In joint representational attention, the participants are looking at representations of the object or discussing them. In each type of joint attention, it is necessary to map out in time first the gaze patterns of the participants, and then to determine from those the possible occurrences of joint attention. We studied qualitatively the possible occurrences of joint attention. With the external videos, we checked if the participants in the possible joint attention occurrence truly acknowledged the shared perception. Moreover, we used student gestures and speech as indicators for their acknowledgement of the shared perception (see McNeill, 1992).

Using the framework justified in Sect. 1.1, we identified 180 phase changes in the data. We then determined the lengths of the collaborative problem-solving phases for each student using the gaze-tracking videos and smartpen recordings. The gaze tracking videos give detailed information about the students’ gaze. As such, they reveal more about the students’ thinking than the external videos of the situation do. Gaze patterns do indicate cognitive attention, even though they cannot solely be used as an indicator of cognitive attention (Gullberg & Holmqvist, 1999). ELAN makes it possible to mark the phase changes without a need to categorize them immediately. The first author determined the positions of the phases first without categorizing them. They were interpreted as starting from the moment there was evidence of a particular phase and as ending when there was evidence of another phase or the student disengaged from the problem-solving process. The first author and the third author each categorized the phases independently after discussing the criteria (see Table 2). They then discussed the phases and their lengths until they reached a consensus. In the end, we combined consecutive phases that had been assigned to the same category and occurred without a gap between them, which left us with 166 phases.
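
The final combining step can be sketched as follows. This is an illustration under assumptions rather than the procedure actually used with the ELAN annotations: the `Phase` record, the function name, and the `tolerance` parameter are hypothetical, and a tolerance of zero corresponds to phases that follow each other without a gap.

```python
from typing import NamedTuple


class Phase(NamedTuple):
    student: str
    category: str
    start: float  # seconds
    end: float    # seconds


def merge_adjacent(phases, tolerance=0.0):
    """Combine consecutive phases of the same student and category when
    the gap between them is at most `tolerance` seconds."""
    merged = []
    for p in sorted(phases, key=lambda p: (p.student, p.start)):
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.student == p.student
            and last.category == p.category
            and p.start - last.end <= tolerance
        ):
            # Same category with no gap: extend the previous phase.
            merged[-1] = last._replace(end=max(last.end, p.end))
        else:
            merged.append(p)
    return merged


annotated = [
    Phase("S1", "verifying", 0.0, 5.0),
    Phase("S1", "verifying", 5.0, 9.0),               # continues without a gap
    Phase("S1", "watching and listening", 9.0, 12.0),
    Phase("S1", "verifying", 14.0, 16.0),             # separated by a gap
]
combined = merge_adjacent(annotated)
```

In this toy input, four annotated segments reduce to three phases, in the same way that the annotated phase segments in the data reduced to 166 phases.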

During the phase orienting, the problem solver gets acquainted with the problem and looks at the assignment right after it is presented, before engaging in conversation. The student may look at the problem again also in the later stages, but this is interpreted as understanding the problem or planning, depending on the verbal context around the situation. Due to the solitary nature of this phase, we did not include this phase in the joint attention analysis.

If one of the students asks a question concerning the nature of the problem, looking at the assignment is interpreted as understanding the problem. If there are no questions concerning the nature of the problem, and the student draws a possible solution in his or her notebook shortly after looking at the assignment, the act of looking at the assignment is interpreted as planning and exploring. Note that in these phases, understanding the problem and planning and exploring, the student might be staring at the assignment in his or her notebook. In other words, the student has drawn the four points that are given at the onset of the assignment in his or her notebook. For the phase understanding the problem, the verbal context provides evidence of working towards understanding. Even though understanding can happen also without the verbal context, it is impossible to observe this externally. Only from the comments of the students can we make this interpretation with certainty (Artzt & Armour-Thomas, 1992).

The phase planning and exploring is not dependent on the verbal context. During this phase, the student may be silent and look at what others in the group have tried out already, or a student might look at his or her notebook seeking answers. This phase should not be confused with the phase verifying. For the phase verifying, it is necessary that either the student has drawn a possible solution immediately before discussing or comparing it to other solutions, thus shifting to the phase verifying from the phase implementing, or that another student draws attention to his or her own suggestion for a solution. During implementation, the student makes an idea visible by drawing a solution suggestion. For this phase, it is not relevant how correct the solution suggestion is. During verifying, the student explains these ideas to others or silently compares the solution to other solutions.

An important component of interaction and therefore also collaboration is the ability to watch and listen to others. We did not interpret a student as being in the phase watching and listening unless they simultaneously watched another student speaking. If the students did not fulfill any of these criteria (Table 2), they were not placed in any of these phases.

3 Results

Our focus group started working as a group of four from the onset, even though the teacher instructed them to first work alone. Our analysis ends at the time when the teacher made a call for attention and the class started to ponder different solution options collectively. From this timeline, we identified seven sections when the students were focused on the task. We analyzed these task-focused sections in detail. The total length of these sections was 8 min and 30 s and they occurred within 22.5 min. The distribution and the length of each task-focused section are illustrated in Fig. 10.

Fig. 10 Timeline of analyzed task-focused sections 1–7 starting from the first task-focused section

We identified 3903 fixations from the task-focused sections. With GMESS, we identified 77 joint attention occurrences. Of all the identified occurrences of joint attention, 49 (64% of all) were joint visual attention and 28 (36% of all) joint representational attention. The high frequency of joint representational attention suggests that joint representational attention is a common phenomenon during collaborative mathematical problem solving.

Joint visual attention and joint representational attention occurrences had very similar mean lengths, suggesting their similar nature. Even though the mean length of a joint representational attention occurrence (M = 5.91 s, SD = 3.51) was slightly longer than that of a joint visual attention occurrence (M = 5.68 s, SD = 4.63), the joint visual attention occurrences varied more in length (range 1.32–31.67 s) than the joint representational attention occurrences (range 2.32–17.67 s).

We identified 166 problem-solving phases in the task-focused sections. The distribution, the mean length, and the standard deviation of the phase categories for each student are presented in Table 3. It was interesting that the phase implementing was clearly shorter than the other phases. Its mean length was only 4.34 s (SD = 3.19), which was a little over half of the length of the second shortest phase, planning and exploring (M = 8.05 s, SD = 5.57). The phase implementing was also the least common (f = 14) in the dataset. The phase verifying was clearly the longest-lasting phase (M = 18.89 s, SD = 28.19). This phase was about two thirds longer than the second longest and most common (f = 63) phase, watching and listening (M = 11.42 s, SD = 15.58).

Table 3 Frequencies of phases of problem-solving for each student

We noticed that the durations of the different kinds of phases (see Table 3) were mainly longer on average than joint attention occurrences. The number of occurrences of the different phases did not vary much among the students. All the students spent the most time in the phases verifying and watching and listening, and the least in implementing.

Joint attention occurrences were mainly shorter than the phases. Therefore, it is likely for the student to stay in the same phase before, during, and after participating in joint attention. This happened in our data 147 times. Nevertheless, it is not uncommon for a student to change the phase they are in during a joint attention occurrence. This happened in our data 73 times. (See an interactive illustration of the phases and the joint visual and representational attention occurrences during the first task-focused section in https://www.geogebra.org/m/ymyscvqj.) In the illustration, the timeline is on the y-axis and proceeds from bottom to top. As the illustration in the link shows, joint visual attention and joint representational attention can exist simultaneously, and a student may participate in each of them at the same time. For example, let us imagine a situation where three students, A, B and C, are discussing a solution. A and B look at a diagram in B’s notebook and the third student, C, looks at the same diagram in her own notebook. In this situation A and B are participating in joint visual attention together, but also in joint representational attention with C. In other words, student A is included in joint visual attention with student B and at the same time in joint representational attention with students B and C.

If a student participated in both types of joint attention at the same time, we counted that student under both types. Therefore, the total number of phases is greater in Tables 4 and 5 than in Table 3. A student could also stay in the same problem-solving phase during and after one joint attention occurrence and still be in that phase during the next occurrence (see Fig. 11, student 1). When joint attention occurred twice during the same continuous phase, that phase was counted twice in Table 4.
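The counting rule described above can be illustrated with a minimal Python sketch. This is not the authors' procedure or tooling: the function names, the tuple layout, the type labels JVA and JRA used as dictionary keys, and all time values are hypothetical and chosen only for illustration. The sketch assumes that a phase is counted once for every joint attention occurrence it overlaps in time.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two half-open time intervals share any time."""
    return a_start < b_end and b_start < a_end


def count_phases_during_ja(phases, ja_occurrences):
    """Count (joint attention type, phase) pairs for one student.

    phases: list of (start_s, end_s, phase_name) tuples
    ja_occurrences: list of (start_s, end_s, ja_type) tuples
    Each phase is counted once per joint attention occurrence it overlaps.
    """
    counts = Counter()
    for ja_start, ja_end, ja_type in ja_occurrences:
        for ph_start, ph_end, phase in phases:
            if overlaps(ph_start, ph_end, ja_start, ja_end):
                counts[(ja_type, phase)] += 1
    return counts


# Hypothetical data for one student (times in seconds, invented for illustration).
phases = [
    (0.0, 20.0, "verifying"),
    (20.0, 30.0, "watching and listening"),
]
ja_occurrences = [
    (2.0, 4.0, "JVA"),    # first occurrence inside the verifying phase
    (10.0, 12.0, "JVA"),  # second occurrence inside the same continuous phase
    (10.5, 11.5, "JRA"),  # simultaneous occurrence of the other type
    (19.0, 21.0, "JVA"),  # occurrence that spans a phase change
]

counts = count_phases_during_ja(phases, ja_occurrences)
for key, value in sorted(counts.items()):
    print(key, value)
```

In this invented example the student has only two phases, yet five phase counts are recorded, which shows why the totals during joint attention can exceed the number of phases themselves.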

Table 4 The phases of problem-solving during joint attention
Table 5 The phases of problem-solving right after an occurrence of joint attention
Fig. 11

The beginning of the first task-focused section. Time progresses from bottom to top on the y-axis. Each column shows the phase changes of one student during that time and when that student participated in joint visual attention

Figure 11 shows the progression of the phases of problem solving and the joint visual attention occurrences at the beginning of the first task-focused section. (See the full version of the first task-focused section as an interactive graph in color at https://www.geogebra.org/m/ymyscvqj.)

Table 4 shows which phases the researchers observed during each type of joint attention. There were also students participating in joint attention who were in none of the problem-solving phases. In these cases, they were not focused at first, but during joint attention they either rejoined the problem-solving process, or they gradually lost focus, or joint attention was requested successfully by a student outside of the focus group. These occurrences are shown in Table 4 in the None-column.

As can be seen from Table 4, the two phases occurring most often during joint attention were verifying and watching and listening. The phase implementing was more common during joint visual attention (f = 5) than during joint representational attention (f = 1) as was planning and exploring (JVA f = 11, JRA f = 3). Overall, the problem-solving phases occurred with similar frequency during each type of joint attention. For example, implementing occurred the least during both joint visual and representational attention whereas verifying was the most common phase in both joint visual and representational attention. Figure 12 is an illustration of phase shifts during joint visual attention, joint representational attention, and during other times. The phase shifts occurred similarly during joint visual and joint representational attention, also suggesting their similar nature.

Fig. 12

Phase shifts during joint visual attention, joint representational attention, and during other times. The weight of the arrows indicates how many times that phase change was observed in the data. This graph is also available as an interactive illustration in https://www.geogebra.org/m/byvz3cgx

In Fig. 12, the weight of the arrows indicates how many times that phase shift was observed in the data. The most frequent phase shifts were between the phases watching and listening and verifying. These shifts were observed more often towards the end of the problem-solving process, and they were frequent during joint visual attention, during joint representational attention, and during other times. This suggests that when the students enter the phase verifying, they soon after seek approval from the group. It also implies that many discussions in the group revolve around verifying a solution. The phase shift from planning and exploring to implementing occurred often at times other than during joint attention. This suggests that the shift to implementing is more solitary in nature than a direct product of group interaction.
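The arrow weights described above amount to counting how often each ordered pair of phases occurs consecutively. The following minimal Python sketch shows one way to obtain such counts. The function name and the example sequence are hypothetical and are not taken from the study's data or tooling.

```python
from collections import Counter


def count_phase_shifts(phase_sequence):
    """Count ordered (from_phase, to_phase) shifts in one student's sequence.

    Consecutive identical entries are not counted as shifts.
    """
    counts = Counter()
    for previous, current in zip(phase_sequence, phase_sequence[1:]):
        if previous != current:
            counts[(previous, current)] += 1
    return counts


# Hypothetical phase sequence for one student, invented for illustration.
sequence = [
    "planning and exploring",
    "implementing",
    "verifying",
    "watching and listening",
    "verifying",
    "watching and listening",
]

shift_counts = count_phase_shifts(sequence)
for (source, target), weight in shift_counts.most_common():
    print(f"{source} -> {target}: {weight}")
```

Computing the counts separately for the time spans inside joint visual attention, inside joint representational attention, and outside joint attention would give one set of arrow weights per condition, in the spirit of the three parts of Fig. 12.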

Table 5 shows which phases occurred right after each type of joint attention. As with Table 4, a student can simultaneously participate in joint visual attention with one student and in joint representational attention with other students. Hence, the total number of phases is greater in Table 5 than in Table 3. Unlike Table 4, which shows all the phases experienced during a joint attention occurrence, Table 5 shows only the phase the student was in, or entered, right after the joint attention occurrence ended.

Table 5 shows that the phases verifying (46.70% of the total) and watching and listening (33.92% of the total) occurred most often right after joint attention. The phase implementing (0.44% of the total) was the rarest of the phases right after a joint attention occurrence. Some of the students turned to doing other things, for example, doodling. In these cases, the students were not categorized in any of the problem-solving phases. Table 5 also shows that the problem-solving phases occur somewhat similarly after each type of joint attention.

It was also interesting to notice that the participants in a joint attention occurrence were not necessarily in the same phase of problem solving as each other, either during or after joint attention.

4 Discussion and conclusions

We identified 77 joint attention occurrences, and 28 of these were joint representational attention occurrences. The sheer number of joint representational attention occurrences already implies that, in the context of collaborative mathematical problem solving, the definition of joint attention should also include attention to separate representations of the same idea. The results also showed that the phases of problem solving occurred in similar proportions during joint visual and joint representational attention.

There were phase shifts during joint attention (73 times in our data) but more often there was no phase change (146 times). In addition, the students often continued to be in the same phase after joint attention as they were in during joint attention. During both joint visual and representational attention, the two most often occurring phases were verifying and watching and listening. Our results are in line with those of Roschelle (1992), who argued that one of the key characteristic features in the creation of joint problem-solving spaces is repeating cycles involving the display, confirmation, and repairing of understandings. The phase shifts during joint attention in our case study were mainly between verifying, watching and listening, and understanding the problem, which are the ones that correspond to displaying, confirming, and repairing understandings. Both Roschelle (1992) and Barron (2003) also observed that increasingly, the students expected to get evidence from each other that they understood one another as the conversation progressed. In our data the phase shifts between verifying and watching and listening increased clearly towards the end. Right after joint attention, the phases occurring most often were verifying and watching and listening.

This study showed that a student did not necessarily need to be in any of the phases during, or after entering joint attention. In such a case, the student was not focused on the problem at first but got back into the problem-solving process during joint attention. Alternatively, they either gradually lost their focus or joint attention was requested successfully by a student outside of the focus group. Additionally, the phases of the problem-solvers attending to joint attention were not synchronized during or after joint attention.

Carlson and Bloom (2005) showed that individual problem-solving processes progress in cycles of planning, executing, and checking. We showed that this cycle (planning and exploring, implementing, and verifying) is also present in collaborative problem solving, but it is less likely to appear during joint attention than at other times.

Reflecting on the methodological implications and limitations of this study, we consider mobile gaze tracking a useful way to complete a problem-solving phase analysis. For example, Artzt and Armour-Thomas (1992) used videotapes in determining the phases of the students. They and their research assistant observed a videotape together in one-minute intervals. They then recorded the observations and noticed that the participants exhibited several behaviors during the one-minute interval. Gaze tracking allows us to view what the participant sees, and the current technology allows one researcher to observe several synchronized videos simultaneously. Visual attention is often intentional (Tomasello, 1995). Hence, being able to view exactly what the students see makes it easier to understand their problem-solving strategies and to pinpoint more precisely when a phase begins and ends. Nevertheless, the phases are still characterized by a third person rather than by the actual problem solver.

Relying on gaze target annotations has its difficulties. There are moments when the annotator cannot see clearly what the participant is looking at. The drawings can be small and relatively far away, and the lights may cause reflections. In addition, a gaze targeted on a pen pointing at a drawing could be interpreted as focusing on the pen, the hand, or the gesture. The new method GMESS for tracing joint attention proved to be of value in those kinds of situations as well. It made it easy to detect not only joint visual attention occurrences but also joint representational attention occurrences. Before checking the audio (which is necessary for identifying joint attention), we had used GMESS to identify seven possible joint attention occurrences that turned out not to have the verbal context required for joint attention. We discarded two additional occurrences of joint attention from the analysis because attention was not on the task. Yet, we found 77 occurrences of joint attention, which we consider to be a lot for 8 min and 30 s of task-focused time. With GMESS, even those difficult-to-detect occurrences become detectable. Therefore, we recommend the use of GMESS, especially in the context of mathematics education.

Even though we analyzed only one group solving one problem, the data had many fixations (n = 3903) and occurrences of joint attention (n = 77). The high frequency of joint attention occurrences can be seen as a sign of successful interaction (Barron, 2003). Coding the gaze targets for fixations is extremely time consuming (approximately an hour per minute of data). For this reason, only the first author annotated the fixations. Use of the GMESS method also made some errors in the coding visible, thus allowing us to correct them. The analysis of problem-solving phases, on the other hand, was much quicker to do. The first and the third author annotated the phases first separately and then discussed their coding where it differed, until they reached a full consensus. Whereas we could detect joint attention within an accuracy of 30 ms, the time when a student transitioned from one phase to another could not be determined as exactly. This did not cause problems for our analysis, as the problem-solving phases were relatively long. We were not interested in when exactly during a phase joint attention occurred. Instead, we focused on finding out during which phases joint attention occurred.

Joint attention as such is fundamental for collaborative learning (Mundy & Newell, 2007). Our research introduces the concept of joint representational attention to mathematical problem-solving research. Including joint representational attention in the analysis of joint attention in collaborative problem solving gives more holistic information about the process. It also emphasizes the fact that interaction and collaboration do not necessarily require a concrete joint visual target. To establish successful interaction and to initiate joint attention, it is enough that the collaborators share an understanding of the concept through representations. Adding the concept of joint representational attention to the repertoire of mathematical problem-solving research also opens new opportunities for research. A more in-depth analysis of the fixations is needed to understand gaze behavior during joint attention. One important future direction is to study why and when joint attention is initiated during collaborative problem solving, and what determines the success of a request for joint attention. Also, a more qualitative analysis is needed to understand why a student, during collaborative problem solving, skips a phase that would normally occur in the individual problem-solving process.