A central dilemma exists in video research, particularly in relation to education. It concerns the trend towards interpreting what is captured on video as an evidence-base from which to draw all-too-certain conclusions about its meaning for all. The claims that arise from such data are often presented without speculation or scrutiny, further aided by ethical barriers that preclude public access to the footage from which they originate. Combined with a push for methodological instrumentalism (Mills & Ratcliffe, 2012, p. 152), the opportunities video research offers the field are denied their fullest potential and, as a consequence, all too often fail to deliver on their promise as legitimised data sources or, for that matter, in representing “what passes for experience and reality” (Sandywell & Heywood, 2012, p. 6).

This is not a new dilemma in visual studies and the associated field of visual culture (Davis, 2011). Indeed, its origins can be traced back through the use of the image in photography and art (DeBord, 1994), originating in Plato’s (1952) early critique of the illusionary nature of the eye and culminating in modern rhetoric which suggests that the visual can persuade and coerce certain types of seeing through tactics such as ethos, pathos, logos and kairos:

Vision is by no means an automatic function of our psychological apparatus. There is much evidence that vision is a mode of thinking. When we see, we interpret the world around us and orient ourselves in it. Sharpening our awareness, heightening our sensibility, disciplining our vision, it will increase our power to understand the world, appreciate its richness and cope with its problems (Kepes, 1944, p. 17, cited in Brannon, 2013, p. 280).

In spite of this cautionary philosophical and semiotic heritage, video is now characteristically and increasingly promoted in educational research with young children as evidence for broad pedagogical claims in various sociocultural contexts (Fleer & Ridgeway, 2014; Johansson & White, 2011; White, 2015). Jay (1993, cited in Peters, 2010) describes this phenomenon in the broader field as a “hegemony of vision” that fails to recognise its own authority as a kind of public pedagogy with the power to define what forms of learning and teaching matter, and for whom (see also Tavin, 2015). In this location, video becomes an ocular-centric means of determining what constitutes teaching and learning by defining what ‘is’ or what ‘should be’ according to a series of moving images as a means of demonstration or exemplification. As Sandywell & Heywood (2012) suggest, this approach casts video as the “paradigmatic aesthetic machine of the nineteenth century” (p. 18), failing to recognise its wider potential as a “grenade of meaning” (p. 37) and a source of multimodal speculation.

The problem with aesthetics

In itself, the deployment of moving image as a means of fuller understanding in context-specific settings is not the problem. Indeed, its utility in developing a broader picture of educational experience is indisputable. Were this not the case the Video Journal of Education and Pedagogy would not exist. However, where the meaning ascribed to video by a researcher (or any other individual for that matter) is either absent from its representation or represented as reality for all, an ethical dilemma arises. Burri (2012) suggests that this literal emphasis on the image fails to examine how the image shapes cultural meaning and how practices are constructed through video, and fails to understand what gives rise to their status as social ‘reality’. Treated as an aesthetic object of validity and reliability in a traditional sense, the use of video to make certain claims concerning its meaning supposes that there is only one way the footage might be interpreted. Moreover, it assumes a certain methodological holism to what can be seen, asserting an ‘eye of God’ reality which has the potential to pervert as much as represent (Baudrillard, 1967). As Mihailovic (1997) asserts, since any product of aesthetic activity (e.g. a video) is immutable, it cannot represent a living reality. This view lends support to the contention of Pink & Leder Mackley (2012), who argue that video in research is primarily methodological rather than empirical, posing pedagogical questions that ask not only how or what we are learning and teaching, but also ‘why’. Such questions shift the field of inquiry to the ideological basis for understanding what matters and, by association, who decides what is ‘seen’ as legitimate learning.

Although moving image was virtually unheard of at the time, a similar dilemma was posed by Mikhail Bakhtin almost a hundred years earlier in his philosophical discussions of aesthetic activity:

Aesthetic activity … is powerless to take possession of that moment of Being which is constituted by the transitiveness and open event-ness of Being. And the product of aesthetic activity is not, with respect to its meaning, actual being in the process of becoming, and, with respect to its being, it enters into communion with Being through a historical act of effective aesthetic intuiting (Bakhtin, 1993, p. 1).

For Bakhtin, uncritical forms of aesthetic activity which are unproblematically ‘received’ as significant or otherwise according to universal principles represent an epistemological and ethical crisis. This is because they lack any consideration of the living, evolving, shifting and located (ideological) nature of meaning in the event itself, as well as its aftermath. They are therefore static, unchanging and received truth – istina. An alternative type of meaning-making is offered instead through an act Bakhtin describes as ‘intuiting’ or lived truth – pravda. In this Bakhtin invokes the capacity for humans to draw upon their intuitive responses in the interpretive event here-and-now, as opposed to relying on a set of universally sanctioned and received definitions or categories that can be overlaid on an already known process. Seeing, according to this interpretation, is a pedagogical engagement requiring the see-er to understand what can be seen as an event of becoming – for themselves as much as for the learner. This idea is enshrined in Bakhtin’s (1990) notion of answerability as a means of accountability, reflexivity and an encounter with one’s own morality in the life of others.

The route to intuitive engagement of this nature is established by Bakhtin (1993) through his notion of visual surplus – a concept which emphasizes the situatedness of interpretation. More specifically, visual surplus accepts that insights are derived from the evaluator’s unique place in the world, from which they ‘see’ (and interpret) accordingly. Located outside of the individual (that is, the ‘seen’), the visual surplus of another holds the capacity to contribute fresh ways of seeing. However, this conception also asserts that any evaluation is ideologically located, and that the see-er is implicated for the contributions they make. A corresponding work (or ‘effort’) of the eye therefore represents the highest interpretive authority and summons an ethical imperative to interpretation. Since visual surplus holds as much potential for damage as it does for enhancement of meanings, Bakhtin grants attention to the ‘I’ – for other, for oneself and for a mutually animated ‘we’ that resides in the dialogic space. It is here that this research methodology finds its home.

The ‘work of the eye’ in video research

Contemplating the eye as a lived encounter of visual surplus as well as an ethical responsibility calls for a more complex methodological encounter with videoed events:

Here, the eye is re-cast as a visual encounter with others. As such, what can be ‘seen’ is viewed as an authorial gift that draws on the insights of another’s visual field because they offer additional opportunities for understanding and because the eye alone (with the ‘I’ or the insight of another) cannot see to its fuller extent (White 2016a, 2016b).

This axiologic engagement with the footage, in tandem with a consideration of its many, perhaps even oppositional, interpreted meanings, holds potential for researchers to ‘see’ beyond the limits of their own eye/I and, in doing so, represents a much richer approach to analysis. The premise for such an approach is further expounded by Bakhtin (1984), through Dostoevsky’s novelistic inspiration, in the notion of polyphony whereby “subjects co-exist as autonomous worlds within the world of the author and contend with him for the readers’ attention” (Krasnov, 1980, p. 5). In this sense, polyphony addresses the problem of seeing as an isolated or discrete activity by relocating what can be seen as a mutually animating event. As such, traditional binaries of subject versus object in research are collapsed in order to contemplate interpretation from both the perspective of the see-er and the seen.

Coupled with the Bakhtinian notion of visual surplus, polyphony provides a revisioned way of approaching video research. Now, “emphasis is placed on the author’s ability to allow multiple voices (and voices-within-voices) to remain in play and characters to speak for themselves through the multiple genres employed” (White, 2010, p. 87). Attention is therefore granted to the unspoken as well as the spoken: whispers, sideways glances and gestures alike play an integral role in the interpretations that are shared. Significance is given to the different contexts in which the event takes place, and the intended audience(s) to whom language is oriented. Just as the polyphonic novel draws on the words and actions of those whose narrative story is told, so does video research seek to invite the embodied as well as the articulated interpretations of those who are being videoed as a means of revealing a series of research narratives and their relationships to one another. More importantly, those interpretations are not mere window dressing for the researcher’s authoritative gaze, but play a vital role in influencing what might be seen and how it could be interpreted otherwise.

A polyphonic approach

A polyphonic approach to video research as a central form of visual surplus lies at the heart of this interpretation of dialogic methodology, as presented in the remainder of this paper. I should state from the outset that such an approach is not for the faint-hearted. If certainty is desired, as is so often the case in educational research, then a polyphonic approach will fail to satisfy. If, on the other hand, there is a genuine desire to see richly and to be informed otherwise, polyphony offers such an encounter. In some fields of education where interpretation is less prescribed or certain this approach is, perhaps, easier to contemplate. Early childhood education (ECE) is one such domain, but there are many others that invite similar contemplation. Indeed, as I have tried to argue elsewhere (White 2011a, 2011b), those so-called ‘certain’ domains also benefit richly from suspending authoritative thresholds in order to contemplate what might be accessed in polyphonic chorus with other ways of seeing. Pedagogical research of a multi-perspectival nature is already evident in early years studies with preschool-aged children across diverse cultures, where different teachers discuss and compare their pedagogical insights based on classroom video (see, for example, Hayashi & Tobin 2015).

Given my own interest in ECE it is hardly surprising that it is to the youngest learner that I orient by way of demonstrating a polyphonic approach to video research. It is worth pausing for a moment to explain why. As explained elsewhere (White 2016a, 2016b), infants in ECE are often misrepresented in educational research due to a lack of understanding or, quite simply, placed in the ‘too hard basket’ and ignored altogether in pedagogical discussions. There are several reasons for this – many of which lie beyond the scope of this paper (for a fuller discussion see Dalli & White, 2016). Suffice it to say that their recent entry into the educational realm has not been marked by a flurry of educational research activity beyond a legacy of developmental science and psychology that claims certain realities for infant experience and capability, largely based on laboratory tests with infants and their mothers. As a consequence, the experience of infants in educational settings is largely speculative. This is especially true for infants under the age of one year, who are increasingly spending significant hours of their day in ECE settings with non-familial adults who have to work very hard to interpret their language cues (White et al. 2015b).

Taking up this challenge, I set out to action Bakhtin’s visual entreaty by trying to earnestly interpret the ECE experience of infants in polyphonic dialogue with others. Including the adults who work with infants (teachers and parents) as a means of understanding is not new to educational research (Lang et al. 2016) and is easily achieved by interviewing them about their interpretations in a similar way to David Clarke and his associates (Clarke et al. 2006). As a kind of ‘surplus’ this form of data generation goes some way to contributing to an enhanced understanding, since adults who live and work with infants are able to offer important and additional insights to the research. However, in isolation from other interpretative approaches that foreground the infant experience from their own visual perspective and that see the same event through different eyes, interviewing adults alone implies that they fully know the infant and can speak on their behalf. Moreover, it denies the infant an opportunity to have their unique perspectives heard or alternative insights considered.

As such, research that speaks on behalf of the infant – as if their perspective were fully known – represents a form of ‘ventriloquisation’ (Tannen, 2010) that fails to recognise the infant’s experience beyond the interpretation of another. Bakhtin has a great deal to say about this from an ethical standpoint, suggesting that an exclusive and intimate approach to evaluative activity alone may lead to a complete consummation of another in the absence of an outsider point-of-view. The same is true for approaches that are exclusively distant from the infant, and make assertions based on monologic claims that homogenise infants as developmentally ‘known’ (Cheeseman et al. 2015). Infant research is characterised by both extremes (Dalli & White, 2016).

Notwithstanding the obvious linguistic, developmental and ethical limitations in providing opportunities for infants to contribute to the research (Elwick et al. 2014), a polyphonic approach deliberately sets out to view the experience through their eyes, in tandem with others. No ventriloquised assertions are made concerning infant interpretations of the event; instead, the approach builds on what can be seen as a source of insight for all. Emphasis is placed on the language forms and their interpreted meanings in events, and the way participants give form to these through dialogue. This is a dialogic process which summons ‘the work of the eye’ – and the subjective ‘I’ of the researcher – to its fullest extent (White 2016a). As Deborah Hicks (2000) explains: “Rich seeing requires that the contemplator immerses him or herself in the ‘heaviness’ of the social relationship” (p. 232).

Operationalising a polyphonic approach to video data generation therefore entailed a revised form of richly seeing which encountered the visual field of the infant him or herself. Earliest attempts at this approach had revealed insights far in excess of previously held assertions concerning very young children, including their capacity to disorient adults in their understandings (White, 2011b). In order to access this lens I utilised four cameras which simultaneously shot film from lenses worn on the infants’ heads and the teacher’s head, and from my own hand-held device. The role of the teachers is important to note here, as the ECE setting operated with a key teacher-buddy system which meant that each infant had a special adult who held primary care responsibility, and who was supported by a back-up – a buddy – when they were occupied elsewhere.

Figure 1 provides a view of the four visual fields, including four-month-old Harrison, ten-month-old Lola, Harrison’s key teacher (1) and Lola’s key teacher (2):

Fig. 1 Screen shot of polyphonic footage

In the top left screen teacher 2 and Harrison are in the visual field of teacher 1. In the bottom left screen teacher 2 is (close up) in the visual field of infant 1 – Harrison. In the bottom right screen a different scene is evident in the visual field of infant 2 – Lola – who is in the same room. The top right screen shows the researcher’s visual field taken from a distance. Although all screens are shot in the same place and time, what they reveal is often very different, dependent on the direction of each participant’s head. While this technology cannot claim to track explicit eye movements, and thus cannot account for sideways glances (which are also important in dialogic research according to Sullivan 2013), it does provide a general overview of the visual orientation of each person.

Time synchronised, these visual fields taken over two hours were offered to the teachers for pedagogical interpretation (in an earlier study the family were also invited to offer their perspectives on polyphonic video events – see White 2009a, b). This meant that teachers were invited to select specific events from the polyphonic footage which they considered held pedagogical significance. These insights were shared in a subsequent interview which, in tandem with in-depth analysis by the researchers themselves, provided a rich source of visual surplus (White et al. 2015b). By tracing the field of vision, in tandem with the evaluative eye of researchers and teachers, a means of fuller appreciation of the pedagogical experience for these infants was established. This was seen as particularly important at the time of this study, when infant teachers were being accused of pedagogical incompetence in the absence of an articulated pedagogy that would satisfy the requirements of wider educational discourse (Education Review Office, 2015).

By way of demonstration

The video excerpt that follows offers one small example of insight from the polyphonic video as a means of demonstration. Those watching this footage in a future-oriented world of technology where more sophisticated cameras make visual work much simpler (or perhaps more complex) will probably find the filming most primitive. Indeed, for those who seek production quality the footage may be unpalatable. However, at the time of filming, in late 2013, and given the subjects involved, the (nano-pod) cameras were the best option available. Emphasis is placed on the footage as a means of dialogic engagement rather than a product or outcome as is so often the case in video-based research. As such, participants are not given instructions on what to view or how to view the split-screens, since what they choose to look at and how they approach their viewing is yet another potential means of insight and challenge (Footnote 1).

This scene takes place in a New Zealand early childhood education (ECE) setting catering for infants and toddlers during the early afternoon. Both Lola and Harrison have recently been fed and are playing on the floor of the ECE setting. In this case teacher 2 is Lola’s key teacher while Harrison’s key teacher (1) is occupied in another part of the setting.

Lola, Harrison & Rachel flattened movie. The video is available to download on request from editorial@videoeducationjournal.com.

Among a myriad of other insights, what this event highlights is the significance of the three-way relationships that take place not only between the infants and their teacher, but also between the infants themselves. It is difficult to separate the three in this dialogic context. The teachers highlighted this event because they noticed Lola imitating her teacher’s ‘tickley’ act (both in terms of sound and action), but in their dialogues they emphasised a great deal more – traversing their own discoveries as well as articulating their pedagogical practice using language that might otherwise have been overlooked by the researchers:

Teacher 2: I provided the provocation for her at the beginning and then invited an extension on that….She [Lola] takes the blocks, I just love that, straight away.

Teacher 1: And Harrison is watching again.

Teacher 2: And at this moment I thought it was important to talk about how we have no time restrictions, so I’m not like “right, Ok, I’ve got 20 nappies to do. I can’t sit here for this length of time.” You know, like I can be in the moment again and I’m not hurried by anything. You know apart from recording nappies and stuff we’re barely looking at our watches…And then I notice that I zone out here and Lola moves away with her freedom to move, rolling. I think because we’re in an environment like this where we’ve got no restrictions, like no baby swings and bouncers and stuff, I actually think that makes you engage more with the children because in an environment where we had say Harrison in a bouncer over there and Lola in a swing over here I don’t think there would be the same amount of engagement as what happens.

Teacher 1: That’s a really good point, because you would be thinking, what should I do now?

Teacher 2: Yeah, like give the swing a little push – because these children are lying on the floor we’re engaging with them…I mean we have all these provocations but if you watch most of the time it’s that engagement. You know that position I’m in here and that L was in with hers. We’re just so awesome [laughs]

Teacher 1: We’re figuring it out all the time. We’d never say we know because it’s always different. A lot of the most important stuff goes on with the children – there in the moment, just figuring it out.

[Teacher interview]

For these teachers, as much as for the researchers, the insights polyphonic footage provided not only exceeded their own independent visual fields and associated insights, but also revealed their own deeply held pedagogical beliefs concerning these infants and their pedagogical choices accordingly. Taken together, these highlight the embodied work of the early years teacher (Hayashi & Tobin, 2015) as well as the intuitive nature of engagement that calls for moment-by-moment responsivity rather than received categories. This unknown nature of their engagement represents what Shotter (2012) describes as ‘poised resourcefulness’ and represents some of the complexity teachers face when working in a manner that suspends certainty in favour of events of ‘being’ and ‘becoming’ (White 2016a, 2016b).

When the researcher asked whether the footage held any surprises for them, the teachers’ replies highlighted the importance of having access to infant visual fields as a tremendous source of insight:

Teacher 1: I really didn’t know how much of Harrison’s time is spent watching absolutely everything going on around him and how obviously how important that is for his learning. That is so significant in this environment with regard to his social, learning about being a social person and just the way he so engages you verbally and the children.

Teacher 2: And with the key teacher it’s interesting how there’s different interactions that happen between key teachers and the buddies.

Teacher 1: I think that’s quite good because it shows, it was a concern for me earlier that I wouldn’t be giving a true picture of Lola’s relationship if we didn’t have [her key teacher 2] involved and yet she is perfectly happy for me to do things for her. The thing that has become apparent to me is that we are doing so much more than we think we are doing. You know, like, it’s obvious – you’re not only giving a baby a bottle, you’re interacting with another child, you’re scanning the environment, you’re thinking about who is going to need what in the near future and it all looks like you are just feeding the baby. Yeah, so it’s very involved what’s going on.

Teacher 2: And how we view the children as capable and confident. There was a part where Lola got stuck under the shelving. I didn’t jump in straight away because I view her as capable and confident. I wanted her to engage with her dispositions and get herself out of there because then she will feel like she is empowered to do that. To make her own decisions about how to do that. I think that’s how we view our children.

Teacher 1: Yeah I think that’s so evident in that whole part where Lola was sitting there for such a long time. It is viewing the child as being actively engaged and able to do so. When I was talking to [other teacher] about it, she was like “well imagine if you’d been sitting there passing her things” – you know – which is quite a normal thing for a teacher to do…and she was not only learning about the objects but she’s learning about herself as a learner. You know, like she was looking at some beautiful objects like the paua shell and that’s huge – understanding that the environment is the third teacher.

[Teacher interview]

Similarly, the researchers had access to a great deal more understanding of pedagogical events of significance through engaging with teacher dialogues and their own independent analysis of the polyphonic screens. A full depiction of these discoveries is beyond the scope of this paper (Footnote 2); suffice it to say that a systematic qualitative and quantitative analysis of the different language forms (including the use of the body and eye movement, a feature of communication the teachers highlighted to the researchers), their sequenced usage in dialogues with teachers and the infants’ proximity to teachers during different events provided further visual surplus to the videoed events. Taken together, these approaches respond to a dialogic interpretation of utterance as “social phenomenon” (Voloshinov, p. 82) whereby meanings are encountered, negotiated, disputed and refuted (dialogised) rather than received as truth.

Analysing utterance

Utterance as a central unit for analysis offered a way of cross-examining the data and understanding genres in infant dialogues with teachers and/or peers. The figure below provides a screen shot capturing some of the multi-layered complexity in the social events on film, as seen through the different visual fields and associated insights, which laid the groundwork for a four-tiered analytic process:

Tier 1: Identification of speech genres

In the first instance, events were coded against the language forms used by the infant on film and their articulated meaning(s), based on the insights offered in dialogue between researcher and teachers as well as the visual cues offered on the video itself. This approach responds to Bakhtin’s explanation of various genres and their employment in certain social settings that are characterised by preferred combinations of language form and content (or meaning). As Mabin explains:

The notion of a genre emerging from social activity switches the focus from a more static tableau-like notion of setting (for example a classroom) to the various different social activities, involving different kinds of speech genres, which may be going on within it.

Identifying speech genres as combinations of form plus content provided a means of understanding the complex ways that various language forms might be dialogised by infants and adults alike in the ECE setting. Our initial analysis focused on teacher-infant dialogues (White et al. 2015b). That a large number of these forms were non-verbal and very subtle, requiring replay after replay to reveal further layers of meaning, further legitimated both the importance of video itself and, specifically, the visual fields of the infants.

Tier 2: Visual field analysis of alteric and intersubjective events

Participants (including researchers) were then invited to focus on the infant visual field – as shown through their camera lens – as an additional source of provocation. It is here that alteric insights are generated – that is, insights whereby the infant’s visual field offers fresh perspectives on the event and its meaning for adults. These are contemplated alongside the attention given to intersecting visual fields – between participants – where intersubjectivity and shared meaning were more often emphasized in the analysis. Analysis therefore sought to understand the nature of both alteric and intersubjective events: their duration, content and influence on subsequent events. This was possible due to the access Studiocode offered to time sequences, and the opportunity to discuss events retrospectively and over several episodes. These forms of analysis generated a rich qualitative platform for further quantitative inquiry.

Tier 3: Quantitative analysis of genre

Applying a quantitative approach to analysis provided a means of converting single language events and their meanings into frequencies over time (Fig. 2). Through such means it became possible to ‘see’ patterns in dialogues and, as a result, to begin to draw conclusions about a variety of features in the learning environment that influenced these. For example, the proximity of the key teacher consistently played a vital role in the kind of communication that took place for infants, even when they were interacting with others (White & Redder, 2015). An important finding concerning eye movement revealed the importance of a lingering gaze, as opposed to a glance or a watch, in the dialogic exchanges that took place for infants with their teachers; this featured as a pedagogical priority in keeping with the assertions of the teachers themselves (White et al. 2015a). With the aid of polyphonic footage, which was returned to at regular intervals, it became possible to recognise nuanced moments as dialogic events and their significance to others. Importantly, these occurred between a variety of people, places and things in the ECE setting, rather than merely in dyadic relationships between adults and infants alone (as traditional research for infants might suggest). It was therefore possible to ‘see’ how events were also influenced by time-space and axiologic coordinates, or what Bakhtin describes as “an intersection of axes and fusions” that make up the chronotopes in which language is located.

Fig. 2 Screen shot of analysis frame
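By way of illustration only, the conversion of single coded events into frequencies over time might be sketched as follows. The sketch is written in Python rather than in the video-analysis software used in the study, and it makes no claim to reproduce the study’s procedure. The event list, its timings, the two-minute window and the function name are all hypothetical; only the labels ‘glance’, ‘watch’ and ‘lingering gaze’ echo forms named in the text.

```python
from collections import Counter

# Hypothetical coded events: (seconds from start of footage, language form).
# The timings are invented for illustration and are not data from the study.
coded_events = [
    (12, "glance"),
    (47, "lingering gaze"),
    (95, "watch"),
    (130, "lingering gaze"),
    (171, "lingering gaze"),
    (260, "glance"),
]

WINDOW_SECONDS = 120  # assumed window length; any interval could be used


def frequencies_by_window(events, window_seconds=WINDOW_SECONDS):
    """Count how often each language form occurs in each time window."""
    counts = Counter()
    for seconds, form in events:
        window_index = seconds // window_seconds
        counts[(window_index, form)] += 1
    return counts


counts = frequencies_by_window(coded_events)

# Print one line per (window, form) pair, in time order.
for (window_index, form), n in sorted(counts.items()):
    start = window_index * WINDOW_SECONDS
    print(f"{start}-{start + WINDOW_SECONDS}s  {form}: {n}")
```

A tally of this kind only makes patterns visible; deciding what counts as an event, and what it means, remains the interpretive work of the earlier tiers.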

Tier 4: Beyond the adult visual field

Subsequent analysis highlighted the peer relationships that took place (Redder, 2014) – often outside of the adult visual field – and which set the scene for a dialogic encounter far beyond what might otherwise be accessible to research. On many occasions our discoveries drew from the direct visual lens of the infant; while on others it was these visual images that sparked important discussions concerning what was valued, responded to and recorded (in assessment documentation, for example – see White 2009a, b) and, perhaps even more importantly, what was not. These, and many other, insights would not have been possible without the visual surplus of the teachers, researchers and infants in polyphonic chorus. Together they respond to Bakhtin’s call to “give way to the work of the eye as performance and creativity in a particular place at a particular time” (Bakhtin, 1986, p. 38). Further, it is my contention that these approaches represent a re-visioned methodology for video work by methodologically living out the realities of richly seeing in ethical and subjectively honest ways. In educational research, this methodology provides a practical and theoretical means of understanding the complexity of pedagogical events in the lives of learners and teachers alike. Specifically, for infant research, visual surplus through polyphonic means offers potential for understanding our youngest educational partners as an effort of trying. In both literally ‘seeing’ through infant eyes and figuratively confronting the ‘I’ of the intuiting other, the teacher is morally implicated for their pedagogical expressions and associated actions. This is most certainly a pedagogical imperative also.

Conclusion

While this paper began with a critique of video as a means of ‘knowing’ what is learnt and its pedagogical premise, it ends with the Bakhtinian proposition that it is the effort of knowing that maps out a revised methodological orientation for the field. In so doing, it proposes an approach that upholds the primacy of the optic as an aesthetic source of insight, whilst paying attention to the subjective ‘I’ of those who seek to understand learners and themselves. Bakhtin’s polyphonic imperative offers a great deal to the field of video research in education in this regard – calling researchers to account for the interpretations they make and the associated claims that are made concerning others. Keenly attuned to its ethical and moral purposes, video-based methodology of this nature therefore calls for greater transparency concerning what might be said, and the visual fields through which the narratives might be told. As Bakhtin (1984) reminds us: “Never use for objectifying or finalizing another’s consciousness anything that might be inaccessible to that consciousness, that might lie outside its field of vision” (p. 278). In this view, the multiplicity of what can be seen, and of how it might be interpreted by others, becomes a central source of dialogic provocation and wonder.

The insights generated out of the different forms of visual surplus provided through polyphonic footage that have been alluded to throughout this paper further highlight the importance of seeing video as a way of understanding pedagogy and, in so doing, deepening an appreciation of the complexity within all events as learning. Paying attention to the nuanced detail of what might be seen, and by whom, sets the scene for a sophisticated engagement with learning as a series of relational, dialogic and deeply ethical encounters with people, places and things. This exceeds any one interpretation but holds great potential for enhancing evaluations when granted a legitimate place in the research. Whether or not it sets out learning agendas for others is, perhaps, of less relevance than supporting those who are ‘in the moment’ to understand the possible impact of their own acts on the lives of others. This impact is not only an ethical imperative for teachers working with infants, but also for researchers who set out to ‘capture’ their lives on film.

As a polyphonic event in a dialogic space that is largely unknown to educational research, the ECE context provides a revised platform for richly seeing. But it is by no means the only educational setting that warrants the work of the eye or polyphonic access to the different perspectives of those who reside in such spaces. There are, as such, implications that arise from this methodology for the study of all pedagogical relations, especially in populations where the language of participants is not necessarily shared (indeed, from a Bakhtinian stance language rarely is) or is difficult to access. This applies to the broadest fields of educational experience – all of which call for intuitive and ethical approaches to understanding and engagement. Thus the methodology posed in this paper is not merely a provocation for video research for teachers and learners in classrooms where learning takes place. Here, video is not merely a source of knowing. It is also a source of speculation and not knowing, as well as a source of intuitive insight and creativity that, in my view, begins to painstakingly operationalise Bakhtin’s notion of visual surplus in contemplation of 21st century pedagogies.