Background

Dynamic attention in film

People spend much of their leisure time watching films, television and other forms of edited dynamic media, and dynamic visual media have become an integral part of everyday life. Film is a particularly suitable vehicle for studying dynamic media because it is a mature product of art, digital technology and the market. Dynamic attending theory (DAT) holds that attention is the dynamic anticipation and processing of events that vary in time. According to DAT, attention detects and characterizes the temporal structure of external event streams, allocates attentional resources in time accordingly, and optimizes the processing and prediction of external events by readjusting that allocation along the time dimension (Large and Jones 1999).

In computational terms, visual attention is the ability to rapidly detect the interesting parts of a given scene, on which higher-level vision tasks can then focus. Human vision relies extensively on such a mechanism. Because only a small subset of the sensory information is selected for further processing, the mechanism partially explains the rapidity of human visual behavior. Most computational models of visual attention rely on the feature integration theory presented by Treisman and Gelade (1980). Processing dynamic information is more complex than processing static information. A computational model of dynamic visual attention has been proposed to detect the most salient locations in dynamic scenes: static and dynamic saliency maps are computed and integrated, in a competitive manner, into a final saliency map, to which a Winner-Take-All algorithm is applied to select the most visually salient parts of the scene (Ouerhani and Hü 2003). This model helps us understand how audiences process static and dynamic images in film.
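
The integration and selection steps of such a saliency model can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited model: the rescaling to [0, 1], the competition weight (global maximum minus mean, squared) and all function names are hypothetical choices made here only to show how two maps can compete before a Winner-Take-All step picks one location.

```python
from typing import List, Tuple

Map = List[List[float]]


def normalize(m: Map) -> Map:
    """Rescale a map to the range [0, 1]; a flat map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def competition_weight(m: Map) -> float:
    """Assumed weighting: a map with one isolated peak gets a larger weight."""
    flat = [v for row in m for v in row]
    return (max(flat) - sum(flat) / len(flat)) ** 2


def integrate(static_map: Map, dynamic_map: Map) -> Map:
    """Combine static and dynamic saliency into one final saliency map."""
    s, d = normalize(static_map), normalize(dynamic_map)
    ws, wd = competition_weight(s), competition_weight(d)
    return [
        [ws * sv + wd * dv for sv, dv in zip(s_row, d_row)]
        for s_row, d_row in zip(s, d)
    ]


def winner_take_all(saliency: Map) -> Tuple[int, int]:
    """Return (row, col) of the most salient location."""
    best = max(
        ((v, r, c) for r, row in enumerate(saliency) for c, v in enumerate(row)),
        key=lambda t: t[0],
    )
    return best[1], best[2]


# Hypothetical 3 x 3 maps: a moderate static peak in the centre and a
# single strong motion peak in the top-right corner.
static_map = [[0.1, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.1]]
dynamic_map = [[0.0, 0.0, 0.9], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
focus = winner_take_all(integrate(static_map, dynamic_map))
```

In this toy example the isolated motion peak receives the larger weight, so the selected location is the top-right cell rather than the static centre.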

Emotional experience in film

From silent to sound, from black-and-white to color, from two-dimensional to three-dimensional images, and from framed screen films to frameless cinematic virtual reality (CVR), the history of film is closely tied to the development of media technology. All these changes reflect filmmakers’ increasing desire to attract audiences’ attention and to improve their experience.

Another important function of film is to provide emotional experience. Films can make us laugh or cry, and can leave us angry, fearful or calm. Film has proven to be one of the most effective and ethically sound ways of priming mood, so it is often used as a tool for inducing emotion in psychological studies (Gross and Levenson 1995; Payne et al. 1997). Researchers have also built a large film database for studying emotion, in which 70 film clips were selected by 50 film experts and then rated by 364 participants on multiple dimensions (Schaefer et al. 2010). Such studies pay close attention to the emotional experience during film viewing and to the variety of emotions elicited, and the emotional state before viewing is normally analyzed as the baseline. Attention is known to be influenced by many psychological and social factors, including cognitive capacity, emotion, personality, gender and age (Banerjee et al. 2008). Affect-as-information theory suggests that people in a sad mood process information more deliberately and search for specific information before making a judgment, whereas people in a happy mood use a more automatic or heuristic processing style and base their judgments on an overall impression (Clore et al. 2001). Positive and negative moods may therefore have different effects on attention.

Psychological methods applied in film studies

A hundred years ago, early psychologists were already trying to understand how we perceive edited moving images. Hugo Münsterberg published one of the earliest essays, “The photoplay: a psychological study”, in 1916, but it received little attention until 1970 (Münsterberg and Griffith 2004; Wicclair 1978). For a long time, the question of how people perceive and experience film attracted little interest. With the development of cognitive psychology and neuroscience, new methodologies (e.g., eye tracking, EEG and fMRI) have made it possible to use more naturalistic stimuli in the study of visual cognition (Smith et al. 2010, 2012). Through users’ subjective reports, we can study their attitudes, interests, emotional experience and motivation toward media (films, TV, video, advertisements and so on). In addition, stable characteristics such as personality, intelligence and temperament can help predict users’ preferences. Psychological experiments can also be designed to record precise reaction times, accuracy and choices while people are watching videos, reading books or listening to music.

It is often stated that the visual system processes about 70% of a person’s total sensory input. Eye tracking is among the most direct methods for exploring visual information processing. It has been widely used to analyze reaction patterns in reading research (McDonald and Shillcock 2003) and in consumer choice behavior (Reutskaja et al. 2011), and it is a tool for studying selective attention during film viewing. Eye trackers can record fixations, saccades and pursuit movements, which provide a window into what viewers are thinking. Just and Carpenter (1976) proposed the eye-mind theory, according to which eye movements are directly linked to internal psychological processing. The method is applied in many fields, such as reading, media viewing, advertisement design, video game design, human–computer interfaces for people with disabilities, autism research, and so on. Analyzing eye-movement data is a popular means of gaining insight into attentional and cognitive processes. In the real world, dynamic objects, dynamic media and moving people are everywhere, and people are more easily attracted by dynamic information than by static information. In recent years, researchers have begun to explore the cognitive processing of dynamic media (Mital et al. 2011). Tools such as DynAOI and GazeAlyze have also been designed to analyze dynamic eye movement data.

In behavioral economics, emotions have been found to profoundly affect individual behavior and decision-making. Since emotion is a key component of user experience, reading, understanding and regulating emotion is essential to the study of media effects.

Experiment design

Our understanding of how people react to and process rich information in the VR environment is still very limited. Because emotion has a great impact on attention, we hypothesized that mood before viewing (pre-mood) influences attention during film viewing. Using eye tracking, we explored the correlations among pre-mood, post-mood and attention during film viewing. Eye tracking and a standardized mood questionnaire were integrated and applied in the film analysis. The correlation between objective visual attention and subjective psychological experience provides a broader perspective for understanding film cognition and film experience.

Film material

The Disney film “The Jungle Book” was selected because, at the time of the study, it was the latest film combining live action and computer-generated imagery (CGI). It tells the story of a little boy, Mowgli, who lives with animals in a forest. All the animals and settings in the film were created using CGI techniques. A clip lasting 8 min 43 s was selected, in which the boy goes into the deep forest and is attacked by a snake. As the participants were Chinese speakers, we presented the Chinese version of the film scenes. CGI is one of the key technologies in digital media and has been widely applied in films because it can make fictional characters and backgrounds seem real (Fig. 1).

Fig. 1 Film “The Jungle Book”

Participants

Thirty-four healthy Chinese college students were recruited from universities in Beijing, China. Two of them were excluded from the analysis because their eye tracking ratio was lower than 80%, leaving 32 participants for analysis (mean age = 23 years, standard deviation = 2.57; 25 females and 7 males). All participants reported normal or corrected-to-normal vision and were paid 30 Yuan. Before watching the film clip, the participants filled in the PANAS to measure their pre-mood. They then viewed the film clip without a task. Afterwards, they filled in the PANAS again to measure their post-mood (Fig. 2).

Fig. 2 Procedure of the experiment

Methods

First, eye tracking was used to analyze how the audience's attention was distributed while watching the film clip. To record eye movements, we used an SMI RED250/500 eye tracker (Teltow, Germany). The film was shown on an integrated 22" monitor. The system allowed free head movement within 40 cm × 20 cm at a distance of 70 cm, with an operating distance of 60–80 cm. The sampling rate was 250 Hz. A nine-point calibration mode was used in our study. The average tracking ratio of the 32 participants was 95.9%. Eye movements are classified into two basic components: saccades and fixations. A saccade is a rapid movement of the eye from one position to another, and a fixation is the period between saccades when the eye remains in a single position.
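
How raw gaze samples are separated into fixations and saccades can be illustrated with a dispersion-threshold procedure. The paper does not state which event-detection algorithm the tracker software used, so the sketch below swaps in the widely used dispersion-threshold idea as an assumption; the two thresholds, the pixel coordinates and the function names are hypothetical. Only the 250 Hz sampling rate comes from the text.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # gaze position (x, y) in pixels

SAMPLE_RATE_HZ = 250  # sampling rate reported for the eye tracker


def dispersion(window: List[Sample]) -> float:
    """Dispersion of a window: (max x - min x) + (max y - min y)."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(
    samples: List[Sample],
    max_dispersion_px: float = 50.0,  # assumed threshold
    min_duration_ms: float = 80.0,    # assumed threshold
) -> List[Tuple[int, int]]:
    """Return fixations as (start_index, end_index) pairs, end exclusive."""
    min_len = int(min_duration_ms * SAMPLE_RATE_HZ / 1000)
    fixations = []
    start = 0
    while start + min_len <= len(samples):
        end = start + min_len
        if dispersion(samples[start:end]) <= max_dispersion_px:
            # Grow the window while the gaze stays within the threshold.
            while (end < len(samples)
                   and dispersion(samples[start:end + 1]) <= max_dispersion_px):
                end += 1
            fixations.append((start, end))
            start = end
        else:
            # The first sample is treated as part of a saccade.
            start += 1
    return fixations


def duration_ms(fixation: Tuple[int, int]) -> float:
    """Convert a fixation's sample count to milliseconds."""
    return (fixation[1] - fixation[0]) * 1000 / SAMPLE_RATE_HZ


# Hypothetical recording: a steady gaze, a fast jump, then a second steady gaze.
samples = (
    [(100.0, 100.0)] * 30
    + [(200.0, 100.0), (300.0, 100.0), (400.0, 100.0),
       (500.0, 100.0), (600.0, 100.0)]
    + [(700.0, 300.0)] * 25
)
fixations = detect_fixations(samples)
```

At 250 Hz each sample covers 4 ms, so the two steady segments in this made-up recording correspond to fixations of 120 ms and 100 ms, and the five samples between them are left unassigned as a saccade.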

Second, a standardized questionnaire was used to study the influence of emotional state on eye movements during film viewing, since emotional experience is an important part of watching a film. The Positive and Negative Affect Schedule (PANAS) was used to measure changes in emotion before and after watching the film clip (Watson et al. 1988). The scale is a reliable and valid tool for measuring positive and negative emotional states. It contains 20 adjectives describing positive and negative feelings at the present moment. The 10 items for positive affect (PA) are: attentive, interested, alert, excited, enthusiastic, inspired, proud, determined, strong and active. The 10 items for negative affect (NA) are: distressed, upset, hostile, irritable, scared, afraid, ashamed, guilty, nervous and jittery. Participants rated each item on a 5-point scale (1 = very slightly or not at all, 5 = extremely). The total scores of the 10 positive items and the 10 negative items were analyzed. Watson, Clark and Tellegen report that the normal population has a mean positive affect score of 29.7 (SD = 7.9) and a mean negative affect score of 14.8 (SD = 5.4). The Chinese revised version of the PANAS has good reliability and validity in the Chinese population.
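
The scoring rule described above (two totals, each the sum of ten items rated 1 to 5) can be written down directly. The item lists are taken from the text; the function name, the validation behavior and the example participant are illustrative assumptions.

```python
from typing import Dict, Tuple

PA_ITEMS = ["attentive", "interested", "alert", "excited", "enthusiastic",
            "inspired", "proud", "determined", "strong", "active"]
NA_ITEMS = ["distressed", "upset", "hostile", "irritable", "scared",
            "afraid", "ashamed", "guilty", "nervous", "jittery"]


def score_panas(ratings: Dict[str, int]) -> Tuple[int, int]:
    """Return (positive affect total, negative affect total), each 10 to 50."""
    for item in PA_ITEMS + NA_ITEMS:
        if item not in ratings:
            raise ValueError(f"missing rating for item: {item}")
        if not 1 <= ratings[item] <= 5:
            raise ValueError(f"rating for {item} must be between 1 and 5")
    pa = sum(ratings[item] for item in PA_ITEMS)
    na = sum(ratings[item] for item in NA_ITEMS)
    return pa, na


# Hypothetical participant: every PA item rated 3, every NA item rated 1,
# except "scared", which is rated 4.
example = {item: 3 for item in PA_ITEMS}
example.update({item: 1 for item in NA_ITEMS})
example["scared"] = 4
pa_total, na_total = score_panas(example)
```

Scoring the questionnaire once before and once after the clip gives the pre-mood and post-mood totals that the later comparisons rely on.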

Results and discussion

Eye tracking results and visual attention in film

Eye movement data tell us how much attention people pay to a scene. We analyzed the data in three steps. First, we analyzed an overview of eye movements for the whole clip. We divided the clip into 9 trials of about 60 s each. Fixation and saccade measures varied across the clip. We compared the means of these indicators and found that fixation duration and saccade count differed most between trials. Fixation duration and fixation dispersion were significantly negatively correlated (r = −0.885, p < 0.01). Based on this result, we further analyzed these two indicators in key scenes (Fig. 3).
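
The per-trial overview and the correlation between two indicators can be sketched as below. The clip length and the number of trials come from the text; equal-length segmentation is an assumption (the paper only says each trial lasts about 60 s), and the per-trial values are invented purely to exercise the function, not the study's data.

```python
import math
from typing import Sequence

CLIP_SECONDS = 8 * 60 + 43  # clip length reported in the paper (523 s)
N_TRIALS = 9


def trial_index(time_s: float) -> int:
    """Map a time stamp in the clip to one of N_TRIALS segments (0-based)."""
    if not 0 <= time_s <= CLIP_SECONDS:
        raise ValueError("time stamp lies outside the clip")
    width = CLIP_SECONDS / N_TRIALS  # assumed equal-length trials
    return min(int(time_s // width), N_TRIALS - 1)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-trial means (one value per trial), for illustration only.
fixation_duration_ms = [310, 295, 280, 330, 260, 300, 275, 320, 290]
fixation_dispersion_px = [40, 44, 49, 36, 55, 42, 50, 38, 46]
r = pearson_r(fixation_duration_ms, fixation_dispersion_px)
```

A negative r of this kind means that trials with longer fixations tend to have gaze points that are more tightly clustered.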

Fig. 3 Overview of eye movement (mean)

Eye movement data also tell us where people look in a scene and for how long. We further analyzed areas of interest (AOIs) in specific scenes. Two scenes were selected from the clip: scene A, from 04:22:338 to 04:29:728, and scene B, from 08:25:593 to 08:33:220. We defined the boy and the snake as the AOIs because these two characters are the key components of the film clip (Fig. 4).
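
An AOI fixation-time measure for a scene window can be sketched as follows. The scene boundaries are the ones given in the text, read here as minutes:seconds:milliseconds, which is an assumption about the notation. The rectangular, static AOI is a simplification (characters in a film move, so real AOIs would be redrawn over time), and the fixation records and coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List


def parse_timestamp(stamp: str) -> int:
    """Convert 'mm:ss:ms' (assumed format of the scene boundaries) to ms."""
    minutes, seconds, millis = (int(part) for part in stamp.split(":"))
    return (minutes * 60 + seconds) * 1000 + millis


@dataclass
class Fixation:
    start_ms: int
    duration_ms: int
    x: float
    y: float


@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def fixation_time_in_aoi(fixations: List[Fixation], aoi: Rect,
                         start_ms: int, end_ms: int) -> int:
    """Sum durations of fixations that start in the window and land in the AOI."""
    return sum(
        f.duration_ms
        for f in fixations
        if start_ms <= f.start_ms < end_ms and aoi.contains(f.x, f.y)
    )


SCENE_A = (parse_timestamp("04:22:338"), parse_timestamp("04:29:728"))
SCENE_B = (parse_timestamp("08:25:593"), parse_timestamp("08:33:220"))

# Hypothetical AOI and fixations for one viewer.
aoi_boy = Rect(left=100, top=100, right=300, bottom=400)
viewer = [
    Fixation(start_ms=262500, duration_ms=400, x=150, y=200),  # on the boy
    Fixation(start_ms=263000, duration_ms=250, x=800, y=300),  # elsewhere
    Fixation(start_ms=270000, duration_ms=300, x=150, y=200),  # after scene A
]
boy_time_ms = fixation_time_in_aoi(viewer, aoi_boy, *SCENE_A)
```

A viewer whose fixation time for an AOI is zero in a scene can then be counted as not having looked at that character.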

Fig. 4 AOI analysis of film scenes

In scene A, part of the snake’s body appears on the tree, and the boy hears its voice but does not find it. The eye movement results showed that 43.75% of participants (n = 14) did not notice the snake in scene A. The fixation time on the boy was significantly longer than the fixation time on the snake (t = 4.293, df = 13, p < 0.01) (Fig. 5, Table 1).
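
The kind of comparison reported here can be illustrated with a paired t statistic, in which each viewer contributes one fixation time per AOI. The paper does not name the test that was run, so treating it as a paired comparison is an assumption; the function name and the sample values are hypothetical. The percentage follows from the reported counts (14 of 32 participants).

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired series of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd == 0:
        raise ValueError("t is undefined when all differences are equal")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Share of participants who did not fixate the snake in scene A.
share_missed = 14 / 32 * 100

# Hypothetical fixation times (seconds) for four viewers, illustration only.
boy_time = [5.0, 6.0, 7.0, 8.0]
snake_time = [1.0, 3.0, 2.0, 4.0]
t_value, df = paired_t(boy_time, snake_time)
```

The degrees of freedom equal the number of pairs minus one, and a p-value would then be read from the t distribution with that many degrees of freedom.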

Fig. 5 Eye tracking results of AOI

Table 1 Area of interest (AOI) analysis (mean)

In scene B, the snake approaches the boy and tries to eat him. The fixation times for the AOIs in scene B also differed significantly (t = 5.067, df = 22, p < 0.01).

A film consists of a series of still images, yet we perceive scenes and events as continuous. Continuity-editing rules are widely applied by filmmakers. The current eye tracking study found that audience members’ fixations were synchronized during film viewing. Attentional synchrony is an important cognitive phenomenon mentioned by the film editor Walter Murch (2001). Synchrony between viewers during film viewing was also confirmed by Nakano’s study (2009), which showed that spontaneous blinks were highly synchronized between and within subjects when they viewed the same short video stories. Synchronized blinks occurred during scenes that required less attention, such as at the conclusion of an action, during the absence of the main character, during a long shot and during repeated presentations of a similar scene.

The eye tracking results also showed that the boy’s face or the jungle attracted most of the visual attention in many scenes. The faces of characters, especially human faces, consistently attracted the audience’s attention. Facial expression is an important component of emotional processing, alongside subjective experience and neural response (Izard 2009). Facial expressions transmit emotional information and provide emotional feedback, and they are considered universal across humans (Ekman 1979).

PANAS result and emotional experience in film

The PANAS pre-test and post-test scores were analyzed. The PANAS includes two factors: positive affect and negative affect. We compared the positive affect score, the negative affect score and the fear score (scared and afraid) between the pre-test and the post-test. The negative affect score was significantly lower in the pre-test than in the post-test (t = −2.486, df = 13, p < 0.05). The film clip significantly elicited nine specific emotions, including scared, alert, nervous, jittery, afraid, active, enthusiastic and proud (Fig. 6, Table 2).

Fig. 6 Results of PANAS

Table 2 Statistical analysis of PANAS

The film clip in the present study successfully elicited some of the anticipated emotions, especially negative emotions. As a reliable medium, film can elicit emotional responses such as amusement, anger, disgust, sadness or surprise. Films have the desirable properties of being readily standardized, involving no deception, and being dynamic rather than static. They also have a relatively high degree of ecological validity, insofar as emotions are often evoked by dynamic visual and auditory stimuli that are external to the individual (Gross and Levenson 1995).

The impact of mood on visual attention in film

To explore individual differences in attention, we analyzed the impact of pre-mood on attention during the film. The results showed that pre-negative affect and pre-positive affect together had a significant effect on the average fixation on AOI_boy in scene A (F = 214.279, df = 2, p < 0.01). There was no significant relationship between pre-mood and fixation on the other three AOIs. Furthermore, we defined different groups according to the fixation count of the AOIs.

Audience members differed when more than one character was shown in a scene. Only 56% of participants paid attention to the snake in scene A. The statistical analysis showed that negative and positive affect before viewing had a significant effect on attention to the boy in scene A, which is consistent with our hypothesis: pre-mood significantly influenced AOI attention during film viewing. We also analyzed the correlation between eye movements and the strength of emotional experience. The results showed that audience members in different emotional states noticed different objects, which is direct evidence that emotion influences dynamic attention during film viewing. Previous studies have demonstrated that mood is a powerful psychological force that may affect attitudes, cognitions and behaviors among recreationists. The pleasure-arousal-dominance (PAD) theory is a typical framework for examining the effects of mood on attitude and behavior (Berlyne 1960; Russel and Snodgrass 1987). It is a multi-dimensional model of mood consisting of pleasure, arousal and dominance dimensions. The present study revealed the effects of the pleasure dimension on attention.

Previous studies have demonstrated that factors such as edited cuts, color and visual continuity influence human attention and memory. It has been found that memory co-determines attention: attention is directed toward a scene faster when visual continuity is higher, and color contributes to the guidance of attention after cuts, depending on how well it discriminates relevant content (Valuch 2015).

Emotional processes not only serve to record the value of sensory events, but also to elicit adaptive responses and modify perception. Neural research revealed that the amygdala plays a crucial role in providing both direct and indirect top-down signals on sensory pathways, which can influence the representation of emotional events, especially when related to threat. These modulatory effects implement specialized mechanisms of ‘emotional attention’ that might supplement but also compete with other sources of top-down control on perception. This work should help to elucidate the neural processes and temporal dynamics governing the integration of cognitive and affective influences in attention and behavior (Vuilleumier 2005).

The impact of mood on attention is also found in a previous study (Wadlinger 2006). The result showed that individuals induced into a positive mood fixated more on peripheral stimuli than did control participants. Participants under induced positive mood also made more frequent saccades for slides of neutral and positive valence. Selective attentional broadening to positive stimuli may act both to facilitate later building of resources as well as to maintain current positive affective states. Fredrickson’s broaden-and-build model explained the impact of positive emotions on visual attention (Fredrickson 1998). It suggests that positive emotions, such as joy, interest, contentment, elation, or love, temporarily broaden an individual's thought-action repertoire, thereby promoting the expansion of attention or interest in the environment and encouraging play and exploration.

Individuals may demonstrate preferences in what aspects of their environment they attend to as a function of their affective state. Anxious individuals, for instance, selectively attend to threatening stimuli in eye-tracking experiments (Hermans 1999; Bradley 2000). Negative or threat stimuli often attract more attention than positive stimuli in highly anxious participants (Mogg 2000).

Summary

In summary, the current eye tracking study found that audience fixations were synchronized during film viewing. The film clip successfully elicited some of the anticipated emotions, especially negative emotions. Furthermore, pre-negative affect and pre-positive affect significantly influenced visual attention during film viewing. In the field of digital media, combining eye tracking with a standardized questionnaire is a reliable method for investigating dynamic media. The interaction between digital media and human beings needs more research and discussion.