Few published works present new approaches to methods for studying understanding, and few are devoted to the relationship between understanding and perception, despite their explicit connection (Clark, 1977; Bodalev, 2012; Matyushkina, 1965; Brudnyy et al., 1991; Doblayev, 1982; Zinchenko, 1997). At the same time, some researchers have shown the influence of perception on understanding (Znakov, 1991; Znakov, 2002; Belobrykina, 2016; Znakov, 2013; Kvasyuk, 2010; Kostyuk, 1988). Levkova (2009), for example, considers perception and understanding to be different sides of the same phenomenon: perception reflects the process, and understanding is the result. This conclusion was formulated in the context of studying the process of reading. It was shown that reading a text cannot take place without perceiving and understanding what has been read, and that this requires the reader to engage in mental activity (Levkova, 2009).

Dunker defines the term “understanding” through the concept of gestalt, which refers to the visuospatial form of perceived objects and is key in studying the processes of perception and the structuring of human sensations: “Understanding something means acquiring a gestalt or seeing its functional place in the gestalt” (p. 33) (Brudnyy, 2002; Leontyev, 2005). Gelmgolts & Maklakov (2002) noted that “in acts of perception the unequivocal understanding is carried out. . . . Perceptual activity itself is one of the forms of cognitive activity, one of the forms of understanding” (p. 58).

Being a unique phenomenon, understanding is a holistic, integrated process in which the differentiation of individual cognitive functions is highly conditional: it is difficult to consider them separately from each other (where, for example, does perception end and thinking, speech, or imagination begin?), and it is almost impossible to isolate them by psychological methods (Zashchirinskaia et al., 2015a; Solso, 2006; Keltner & Haidt, 1999; Pelc et al., 2006; Bransford & McCarroll, 1974; Portnova, 2016; Shinysherova et al., 2018; Rehbein & Saalmann, 2009). Instrumental methods of measurement, however, make it possible to record with a high degree of accuracy an objective picture of the features of a participant’s perception, which is significant for revealing the features of understanding. The mutual determination of understanding and perception noted by individual scientists determined the direction of our research, the purpose of which was to identify and characterize the interrelationships of perception and understanding, using the example of images and texts with varying degrees of blurring of the perceived stimulus material.

Method

Subjects

The study involved 25 students, aged 19 to 25 years, from the Faculty of Psychology of Saint-Petersburg State University (years 1–4 of study). Of these, six were male and 19 were female. The research yielded descriptive characteristics that serve as conditional standards for the understanding of images and texts with different degrees of blurring of the perceived stimulus material; that is, conditional standards for the relationship between perception and understanding were defined for this sample of students.

Methodology

To identify patterns of interconnection between visual perception and understanding of texts and plot pictures, the following methods were used.

The survey method

We used a questionnaire that aimed to identify the specifics of respondents’ understanding of images and texts. The questionnaire contained multiple-choice questions (where participants had to choose one of the suggested answers) and open-ended questions (which required a free-form response). Participants were asked to write a brief summary of what they saw (for each stimulus) and to describe the characters, their actions, and the plot. Each participant also rated the degree of his or her understanding of the texts and comics on a 6-point scale (0 = I did not understand anything; 6 = I understood everything). In addition, the subjects described the elements in the pictures and texts that attracted the most attention, which allowed us to determine their areas of interest and to compare how those areas coincided with the zones of fixed attention identified by recording their eye movements.

Eye tracking

Method of remote eye-movement recording

We conducted the study using the EyeTribe eye tracker with Ogama software. We used heatmaps for illustrations and calculated statistics for selected areas of interest.
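As an illustration of what an area-of-interest statistic involves, the following minimal Python sketch counts the fixations that fall inside a rectangular area and sums their durations. It does not reproduce the Ogama implementation; the `Fixation` and `AreaOfInterest` types, their fields, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # horizontal gaze position, pixels
    y: float            # vertical gaze position, pixels
    duration_ms: float  # fixation duration, milliseconds


@dataclass
class AreaOfInterest:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fixation):
        """True if the fixation lies inside the rectangle (borders included)."""
        return (self.left <= fixation.x <= self.right
                and self.top <= fixation.y <= self.bottom)


def aoi_statistics(fixations, aoi):
    """Fixation count, total duration, and mean duration inside one area."""
    inside = [f for f in fixations if aoi.contains(f)]
    total = sum(f.duration_ms for f in inside)
    mean = total / len(inside) if inside else 0.0
    return {"count": len(inside), "total_ms": total, "mean_ms": mean}


# Hypothetical fixations and a hypothetical area around the main character.
fixations = [Fixation(100, 120, 250), Fixation(900, 500, 180), Fixation(130, 140, 310)]
hero = AreaOfInterest("main character", left=50, top=80, right=300, bottom=400)
stats = aoi_statistics(fixations, hero)
```

Comparing such statistics across areas shows where attention was concentrated, which is the kind of information that can be set against the areas of interest reported in the questionnaire.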

Method of wavelet filtering of texts and images

Verbal and nonverbal stimulus material was presented to participants on two Asus notebooks with a screen resolution of 1,680 × 1,050. The verbal stimuli were images of text (eight texts in the preliminary study, three in the study of high school students) consisting of fragments of Russian folk tales in black letters on a gray background (see Fig. 1a). Fragments contained 700 characters on average, including spaces and punctuation marks. The nonverbal stimuli were comics by Herluf Bidstrup, the Danish caricaturist (seven “elements” of the plot were used in the preliminary study and two in the high school students study; data on schoolchildren are not presented here; see Fig. 1b).

Fig. 1 Examples of stimuli. a Verbal stimulus (Russian folk tale “Death of the Mean”). b Nonverbal stimulus (Bidstrup comic “Jump in Height”). A visual filter with its frequency-response maximum at 5 cycles/deg was applied to both stimuli during the test

The images and texts were processed by wavelet filtration. The term “wavelet” first appeared in the work of Grossmann and Morlet (1984, pp. 723–736) on the analysis and digital processing of seismic and acoustic signals. The wavelet transform of a one-dimensional signal allows the signal to be analyzed in two domains at once: it produces a two-dimensional representation in which frequency and position act as independent variables.

With the help of wavelets localized in time or frequency, signals and functions are represented as superpositions of wavelets at different levels of signal decomposition. Currently, wavelets are used not only for processing and analyzing various signals but also for data filtering and for image compression and processing (Bilynsky et al., 2017). All images were first subjected to wavelet filtration by convolution with a difference of Gaussians (DoG) function, a wavelet that is the difference of two two-dimensional Gaussian functions with different half-widths:

$$ f\left(x,y,\sigma \right)=\frac{1}{2\pi {\sigma}^2}\exp \left(-\left({x}^2+{y}^2\right)/\left(2{\sigma}^2\right)\right)-\frac{1}{2\pi {N}^2{\sigma}^2}\exp \left(-\left({x}^2+{y}^2\right)/\left(2{N}^2{\sigma}^2\right)\right) \tag{1} $$

where σ is the standard deviation and N is the scale factor—in our case, equal to 2.
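Equation 1 can be sampled directly on a pixel grid to obtain a convolution kernel. The sketch below is a minimal illustration in Python, not the filtering code used in the study: the helper name `dog_kernel`, the choice to express σ in pixels, and the rule for the kernel radius are assumptions made for the example.

```python
import math


def dog(x, y, sigma, n=2.0):
    """Equation 1: a Gaussian of width sigma minus a Gaussian of width n * sigma."""
    r2 = x * x + y * y
    narrow = math.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
    wide = math.exp(-r2 / (2.0 * n ** 2 * sigma ** 2)) / (2.0 * math.pi * n ** 2 * sigma ** 2)
    return narrow - wide


def dog_kernel(sigma, n=2.0, radius=None):
    """Sample Equation 1 on a square grid of integer pixel offsets.

    Assumption: sigma is given in pixels, and the default radius of
    4 * n * sigma is chosen so that the wider Gaussian is almost fully covered.
    """
    if radius is None:
        radius = int(math.ceil(4.0 * n * sigma))
    offsets = range(-radius, radius + 1)
    return [[dog(x, y, sigma, n) for x in offsets] for y in offsets]


kernel = dog_kernel(sigma=2.0)
center = kernel[len(kernel) // 2][len(kernel) // 2]
total = sum(sum(row) for row in kernel)
```

Because both Gaussians in Equation 1 are normalized to unit volume, the kernel values sum to approximately zero: the filter removes the mean brightness and passes only a band of spatial frequencies. The central value is positive, and the surround is negative.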

According to Marr (1982), the best model of early visual processing is filtering of the image by a set of ∇²G filters, where ∇² is the Laplace operator (∂²/∂x² + ∂²/∂y²) and G is the Gaussian function. Processing the image with a ∇²G filter picks out the sharp brightness changes that correspond to the boundaries of objects in the image, and by changing the half-width of the Gaussian it is possible to extract boundaries at different scales. Marr proposed approximating ∇²G filters by the difference of two Gaussians, that is, by the DoG function. At present, this technique is widely used in machine vision (Buyakas, 2009).
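The link between the two filters can be written out explicitly. The following short derivation is added for illustration and assumes the normalized Gaussian that appears in Equation 1, G(x, y, σ) = exp(−(x² + y²)/(2σ²))/(2πσ²). Applying the Laplace operator gives

$$ {\nabla}^2G\left(x,y,\sigma \right)=\frac{1}{\pi {\sigma}^4}\left(\frac{x^2+{y}^2}{2{\sigma}^2}-1\right)\exp \left(-\frac{x^2+{y}^2}{2{\sigma}^2}\right) $$

Since this Gaussian satisfies ∂G/∂σ = σ∇²G, a first-order expansion in the width gives

$$ G\left(x,y,\sigma \right)-G\left(x,y,N\sigma \right)\approx -\left(N-1\right){\sigma}^2{\nabla}^2G\left(x,y,\sigma \right) $$

so the DoG function of Equation 1 has the shape of ∇²G with the opposite sign and a scale factor. The approximation is closest when N is near 1; for N = 2 it should be read as a qualitative correspondence rather than an exact identity.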

The analogue of DoG functions in the visual analyzer is the receptive field with circular symmetry, which is typical of the retina and the lateral geniculate body. The optimal stimulus for such cells, that is, an image to which only a group of neurons with receptive fields of similar frequency response reacts, is an image whose spatial-frequency spectrum lies within the frequency band to which the given group of cells responds. These requirements are met either by the DoG functions themselves or by images digitally filtered with them. By changing the value of the standard deviation σ in Equation 1, it is possible to shift the position of the maximum of the filter’s frequency response, which allows different spatial-frequency ranges to be isolated. In the present study, filters with the following positions of the frequency-response maximum were used to process the comics: 5.0, 3.75, 2.81, 2.11, 1.58, 1.19, 0.89, and 0.667 cycles/deg; the ratio of adjacent frequencies is 0.75. To process the texts, another set of filters was used: 5.0, 4.0, 3.2, 2.56, 2.05, 1.64, and 1.31 cycles/deg, with the ratio of adjacent frequencies equal to 0.8. It is important to emphasize that the step between adjacent frequencies therefore differed for texts and comics. Nevertheless, in both cases the first filter gave an image in which all the details are visible, and the last gave a completely blurred image in which it is impossible to recognize any objects (see Fig. 2).
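Both filter sets are geometric series that start at 5.0 cycles/deg, and the dependence of the frequency-response maximum on σ can be made explicit. The Python sketch below is an illustration under stated assumptions: σ is expressed in degrees of visual angle, and the spectrum is taken as the continuous Fourier transform of Equation 1, exp(−2π²σ²f²) − exp(−2π²N²σ²f²), whose maximum lies at f = √(ln N/(N² − 1))/(πσ). The function names are hypothetical.

```python
import math


def filter_peaks(start, ratio, count):
    """Geometric series of peak frequencies (cycles/deg)."""
    return [start * ratio ** k for k in range(count)]


def dog_spectrum(f, sigma, n=2.0):
    """Fourier transform of Equation 1 at radial frequency f (cycles/deg)."""
    a = 2.0 * math.pi ** 2 * sigma ** 2
    return math.exp(-a * f ** 2) - math.exp(-a * n ** 2 * f ** 2)


def sigma_for_peak(f_peak, n=2.0):
    """Sigma (deg) that places the maximum of the DoG spectrum at f_peak."""
    return math.sqrt(math.log(n) / (n ** 2 - 1.0)) / (math.pi * f_peak)


comic_peaks = filter_peaks(5.0, 0.75, 8)  # eight degrees of blurring for comics
text_peaks = filter_peaks(5.0, 0.8, 7)    # seven degrees of blurring for texts
comic_sigmas = [sigma_for_peak(f) for f in comic_peaks]
```

Lowering the peak frequency by a constant ratio widens σ by the inverse of that ratio, so each successive stimulus is blurred by the same relative amount.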

Fig. 2 Examples of filtering nonverbal stimuli with the specified positions of the frequency-response maximum

The method of content analysis

This is a method of qualitative and quantitative analysis of the content of the text, consisting in the systematic fixation and quantification (calculation) of certain units of the content of the text under investigation (Makarov, 2001; Lamminpiya et al., 2015; Denham, 2006; Rumelhart, 1977).

Content analysis allows us to convert qualitative information into quantitative indicators and to distribute them according to predefined categories. The result of a quantitative analysis of the content of the data can be parameters such as the frequency of occurrence of certain semantic concepts in the text and the degree of expression of their interrelationships. With this method, it is possible to define categories that combine words according to a certain feature. The reliability of content analysis is ensured by expert review, in which several people who are unaware of the hypothesis being tested assign each concept to a category. After all experts have independently classified the units of analysis, a joint discussion is held on the adequacy of each designated category, and the categories are harmonized.

This method was used to identify features of the subjects’ understanding of, and interest in, the presented texts and comics. On the basis of the experts’ analysis of seven participants’ answers, specific evaluation categories were formulated (through generalization). Using the resulting classification, each response was characterized in terms of the indicators and semantic units it contained, which allowed the statement to be assigned to a particular evaluation category. At the final stage, the experts calculated the percentage of participants whose responses contained the semantic units of each category.
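The final counting step can be illustrated with a short Python sketch. It is a simplification: in the study the assignment to categories was made by experts, whereas here a response is assigned to a category whenever it contains one of the category’s marker words. The category names, marker words, and responses are hypothetical.

```python
def category_percentages(responses, categories):
    """Percentage of responses that contain at least one marker of each category.

    responses:  one free-text answer per participant.
    categories: mapping from category name to a list of lowercase marker words
                (a hypothetical coding scheme).
    """
    percentages = {}
    for name, markers in categories.items():
        hits = sum(
            any(marker in response.lower() for marker in markers)
            for response in responses
        )
        percentages[name] = 100.0 * hits / len(responses)
    return percentages


# Hypothetical answers and coding scheme.
responses = [
    "The boy failed the jump",
    "A robber in the bank",
    "Boy and adult",
    "Nothing was clear",
]
categories = {
    "main character": ["boy", "robber"],
    "action": ["jump", "robbery"],
}
result = category_percentages(responses, categories)
```

Each participant is counted at most once per category, so the values are percentages of participants rather than word frequencies.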

Mathematical and statistical methods of data processing

Statistical processing of the data was carried out using the IBM SPSS Statistics program. To study the relationships between individual parameters at varying degrees of blurring, the Spearman rank correlation coefficient was used. To compare the indices of understanding and oculomotor reactions for texts and comics, the Wilcoxon signed-rank test was used within each group. To identify statistically significant differences between groups, Scheffé’s multiple comparison method (based on analysis of variance) was used for the given indicators. When the frequency of occurrence of respondents’ answers was analyzed, the Pearson chi-square test was used.
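For readers unfamiliar with the first of these statistics, the Spearman coefficient is the Pearson correlation computed on ranks, with tied values sharing the mean of their ranks. The Python sketch below illustrates the calculation only; it is not the SPSS procedure used in the study, and the blurring degrees and ratings in it are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: degree of blurring and one participant's ratings.
blur_degree = [1, 2, 3, 4, 5, 6, 7]
understanding = [5, 6, 6, 5, 4, 2, 1]
rho = spearman(blur_degree, understanding)
```

A negative coefficient in such data would indicate that rated understanding tends to fall as the degree of blurring rises.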

Procedure

The procedure was as follows: verbal and nonverbal stimuli, in the form of texts and pictures, were shown to participants one after another on the monitor screen, in order from the clearest to the most blurred. The study used 15 stimuli (seven texts and eight images). The distance from the subject’s eyes to the monitor on which the test images were presented was 60 cm. Each participant was tested individually. After viewing each stimulus, the participants were asked to fill out a short questionnaire aimed at revealing the features of their understanding of the texts and comics presented. In cases where respondents could not (for certain reasons) fill out the questionnaire independently, the questions were put orally, which made it possible to obtain the most comprehensive and meaningful answers by means of clarifying questions.
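The viewing distance determines how spatial frequencies in cycles/deg map onto screen pixels. The Python sketch below shows the conversion under stated assumptions: the 60 cm distance and the 1,680-pixel screen width are taken from the text, while the physical screen width of 33.6 cm is a hypothetical value chosen for the example, since the actual width is not reported.

```python
import math


def pixels_per_degree(distance_cm, screen_width_cm, screen_width_px):
    """Number of pixels spanned by one degree of visual angle at screen center."""
    cm_per_degree = 2.0 * distance_cm * math.tan(math.radians(0.5))
    return cm_per_degree * screen_width_px / screen_width_cm


def cycles_per_pixel(cycles_per_degree, ppd):
    """Convert a spatial frequency from cycles/deg to cycles/pixel."""
    return cycles_per_degree / ppd


# 33.6 cm is a hypothetical screen width, not a value from the study.
ppd = pixels_per_degree(distance_cm=60.0, screen_width_cm=33.6, screen_width_px=1680)
peak_in_pixels = cycles_per_pixel(5.0, ppd)
```

With a different physical screen width the number of pixels per degree changes proportionally, so the same filter in cycles/deg corresponds to a different filter in pixel units.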

In the course of the study, the participants were given the following instruction: “Now comics will be presented on the monitor screen (there will be two), and then texts (three). Your task is to try to understand the meaning of the images and texts. All stimuli are blurred and will appear in order from clear to less clear, until they become difficult to recognize. You are not limited in time; just tell me when you are ready. While you view the pictures or read the texts, your eye movements will be recorded with this device (pointing to the eye tracker). It is therefore very important to look at the screen at all times and not be distracted; otherwise the device will not be able to track your gaze. After viewing each comic or text, you will need to fill out a short questionnaire, and then we will proceed to the next stimulus. Please look at the questions. Do you understand everything?”

Results

In the course of the pilot study, it was possible to identify the level of blurring of the stimulus image at which participants began to have difficulty recognizing the materials, and to compare participants’ understanding of the texts and comics at different degrees of visual complexity. The mean values of understanding for the sample were calculated. It was found that participants had significant difficulties in understanding texts after the fifth stimulus, and in understanding images after the third (see Fig. 3).

Fig. 3 Distribution of data on the parameter of understanding of texts (left) and comics (right) by the participants

In the pilot study, the participants were offered images (comics) that had eight degrees of blurring for viewing. Each comic contained eight elements of the plot.

According to the pilot-study data, when the comic images are clearly visible, the duration of fixations is related to recognition of the person depicted. It was important for participants to understand immediately who the main character is and which of his actions lead to the final result. It should be noted that the participants focused attention first on figures that are meaningful for understanding, which in turn can indicate the selectivity of perception. It was found that they deliberately ignore the intermediate results when viewing the images (see Fig. 4a).

Fig. 4 Images with different degrees of blur. a Comics “Jump in Height” (first degree blur). b Comics “Banking Operations” (second degree blur). c Comics “Winner” (third degree blur). d Comics “Caricature” (fourth degree blur). e Comics “Cupid Miscalculated” (fifth degree blur). f Comics “Fishing” (sixth degree blur). g Comics “Jazzbandist’s Wife” (seventh degree blur). h Comics “Instant Photography” (eighth degree blur)

With the second degree of blurring, participants tend to examine more carefully not only the figure but also the specific details that are important for understanding the plot. The dynamics of the entire plot and the final result are connected with these details (see Fig. 4b).

With an increasing degree of blurring (third degree), the participants immediately concentrate their attention and determine the main subject whose behavior affects the dynamics of the entire plot in the pictures. In this case, a complete image of a person is perceived, and not individual details of the image. When assessing the result, and for the final conclusion in the analysis of the image, the participants have longer fixations on those characters that contributed to the end of the situation. Their actions in the analysis of the incident become a priority in the visual perception of the image (see Fig. 4c).

With a further increase in the degree of blurring (fourth degree), the indicated features of the picture are preserved. This increases the length of fixations on the details of the image of people, which are identified as being the most significant for a common understanding of the series of images (see Fig. 4d).

When comics with the fifth degree of blurring are viewed, the duration of fixations increases as a whole, and the number of fixations on each individual picture also increases (see Fig. 4e).

When comics of the sixth and seventh degrees of blurring are viewed, the previously revealed tendency is also preserved: to understand the image as a whole, the participants rely on the last (eighth) picture, which reflects the final result of the characters’ actions in the presented plot (see Fig. 4f–g).

With the eighth degree of blurring, when it is almost impossible to determine what is depicted in the pictures, the participants begin by focusing on the first fragments (details) of the image, but their recognition efforts gradually diminish. The trend found at previous degrees of blurring, when participants tried to focus on the last picture and draw a final conclusion about the situation, breaks down. The cognitive strategy of the participants in this case is marked by an attempt to view the image slowly and piece by piece and to understand its individual details (see Fig. 4h).

An analysis of the empirical data of reading the texts (7) with varying degrees of blurring showed that the participants are also characterized by an emphasis on the main characters, and, to a greater extent, at higher levels of blurring of the stimulus material (see Fig. 5d).

Fig. 5 Examples of blurred texts. a First degree of blurring. b Second degree of blurring. c Third degree of blurring. d Fourth degree of blurring

At the first three levels of text blurring, participants tend to look at the ending of the text in order to understand the meaning of the whole (Fig. 5a–c). Starting with the fourth level of blurring, this trend disappears. At the sixth level of blurring, participants still make volitional efforts to read: they fix attention on certain words, trying to read something, and the number of fixations and the text viewing time increase. However, with the last text, which cannot be read because of the strong blurring of the stimulus, participants’ efforts to recognize the text diminish, which is reflected in a reduction in the number of fixations and in the time spent viewing the stimulus.

According to the statistical data processing, the average rating of participants’ understanding of texts falls from 5.64 to 4 points (or this is the average number of stimuli). When comics are viewed, the average values of the degree of understanding fall after the third stimulus (from 5.56 to 4.44 points).

Figure 6 shows the mean values of understanding by the participants of each stimulus.

Fig. 6 Mean values of understanding scores. Distribution of the mean subjective ratings of participants’ understanding of texts and comics at different degrees of visual complexity

It should be noted that, despite the low degree of blurring, the average understanding scores for the first text and the first comic (4.72 and 5.28, respectively) are lower than for more blurred stimuli. Such results might be caused by the novelty of the materials presented and the individual characteristics of the participants, including the specificity of their perception (Vygotskiy & Davydov, 1982; Zhinkin, 1964; Zashchirinskaia et al., 2015b). Seeing the text or comic for the first time, and having no previous experience in solving such tasks, participants might become confused, which could raise their level of anxiety and, in turn, affect their evaluation of understanding. Subsequently, participants adapt to similar tasks, and their subjective assessments of their understanding of what they see increase (Zinchenko, 1998; Kostyuk, 1959; Leontyev, 2005; Lidin, 2004). Hence, in our opinion, a learning effect reveals itself, in which a change in action occurs as a result of the acquisition of a certain experience (Torndayk & Uotson, 1998; Menchinskaya, 2004; Nalchadzhyan, 2009; Rishar, 1998; Rubinshteyn, 1958). However, this effect is manifested only as long as the degree of blurring does not prevent the subject from making out the text or comic. At the same time, participants find it more difficult to understand the stimuli that lie in the area of maximum blurring.

Discussion

In the course of the experiment, after completing the review of each image and text, the participant was asked to fill out a short questionnaire aimed at identifying the features of understanding. In particular, the participants had to assess the understanding of each stimulus on a 6-point scale, briefly state the meaning of what they saw, and also assess how interesting the texts and comics were.

Participants’ answers reflecting their interest in texts and comics were processed using Pearson’s nonparametric chi-square test. The participants had to answer yes or no to the question of whether they found something interesting in the texts and comics. After looking at comics with first-degree blurring, 84% of the participants answered in the affirmative. Comics with third-degree blurring were noted as being interesting by only 56% of participants. Text with the first degree of blurring was interesting to 76% of participants, with the third degree to 48%, and with the fifth to 56%.
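To show how the chi-square statistic is obtained from such percentages, the Python sketch below compares the answers for comics at the first and third degrees of blurring. With 25 participants, 84% and 56% correspond to 21 and 14 affirmative answers. The choice of this particular comparison, and the treatment of the two sets of answers as independent samples, are assumptions made for the example; the text does not state which comparisons were run.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Rows: comics at first- and third-degree blurring; columns: "yes", "no".
table = [[21, 4], [14, 11]]
chi2 = chi_square_2x2(table)
p = p_value_df1(chi2)
```

Because the same participants answered at every degree of blurring, a test for paired answers might suit such data better; the sketch is meant only to show the arithmetic of the statistic.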

According to the conditions of the experiment, the participants needed not only to assess how interesting the comics and texts were but also to briefly justify their position, which enabled us to determine informative categories on the basis of the content analysis of the answers (Myasishchev, 1995; Mey, 2001; Epshteyn, 2007; Bamberg, 2011). The results confirm the tendency of participants to single out the key figures, details, and actions of the main characters that are significant for understanding the presented plots (Belobrykina & Vorobyova, 2015; Gruschka, 2011; Bahtin, 1986; Matsumoto, 2003; Saarni, 1999).

This is confirmed by the statements of participants whose content contained words directly related to the main characters and their actions, for example: “boy” (56% of the surveyed), “adult” (64%), “child” (32%), “failed leap” (84%), “robber” (100%), “bank robbery” (92%), “depositing stolen money in another bank” (88%), “girl on the bench” (88%), “attempts to get acquainted” (64%), “fight” (96%), “artist” (80%), “first/second man” (80%), and so forth.

Tables 1 and 2 show that participants’ greatest interest was aroused by a particular moment, character, event, or action presented in the text or comic. Most participants also paid attention to the features of the narrative in the text (the way the author’s thoughts are expressed, certain words). In addition, they were interested in expressing their own emotions when analyzing the comics. However, a significant percentage of those surveyed found the texts and comics boring, because the plots were neither novel nor original. Moreover, when viewing or reading comics and texts with high degrees of blurring, some participants admitted that these stimuli did not interest them because they could not make anything out (category “influence of external factors on perception”).

Table 1 Percentage distribution of data on the parameter of the participants’ interest in texts
Table 2 Percentage distribution of data on the parameter of the participants’ interest in comics

Interrelations between individual parameters of perception of stimuli (with different degrees of blurring) with understanding were determined using the Spearman rank correlation coefficient (Brudnyy, 1975; Agafonov, 2006; Znakov, 2007).

Conclusion

In general, the following features of the subjects’ perception of visual stimuli with different degrees of blurring can be noted:

  1) When participants analyze texts and comics, it is common for them to fixate on what is important for understanding the details and characters of the presented plot.

  2) When comprehending the comic books observed, the subjects are characterized by the selectivity of perception, which manifests itself in ignoring the intermediate results and transition to the final actions of the characters, which are key to understanding the entire plot of the image as a whole.

  3) Participants differ in the level of volitional regulation of behavior: even when reading a blurry, difficult to comprehend text, they try to identify the main characters and use them to recreate the meaning of the text.