As shown in one of the epigraphs above, images were around long before text emerged (Shlain, 1998). Moreover, we currently live in a world where we are inundated with visual images, along with texts, in every sector of our lives. Images express our surface and inner thoughts as the physical manifestations of streams of consciousness and unconsciousness. With stimuli so abundantly available around us, we seek the most efficient way of acquiring and learning new information. Within the visual modality, images and texts are the most dominant means of information transmission. Although we have achieved the automaticity of reading from years of literacy experience, it takes longer for us to read a text than to perceive an image. We co-use images and words in written communication more than ever before through various forms of emoticons and emoji. Images extend texts, tend to be more immediately appealing, and add features that texts cannot provide, such as color, shade, shape, hue gradation, and orientation.

Text on screen and digitally mediated texts, such as interactive text or hypertext with hyperlinks that readers can immediately access, are likely to contain more images than traditional text, as digital text tends to embed image-based visual aids and advertisements. In this context, this chapter briefly compares the image to the text, contrasts the functions of the right and left hemispheres of the brain, and then reviews how images are processed compared to texts. Finally discussed are the implications of image and text processing for script relativity.

1 Images: How They Are Different from Words

Is there truth to the old saying “A picture is worth a thousand words”? If so, what exactly is the distinction between the image and the text? At least four differences between images and words can be found at the surface level. First, images and pictures convey meaning by simulating the appearance of the world, as images are a display of the mental reproduction of the physical world. In contrast, written words convey meaning by using arbitrary symbols (though Chinese characters are much less arbitrary). Second, images are concrete mainly because they approximate reality, while written words are abstract, especially in an alphabet, because an alphabet generally consists of fewer than thirty graphs that represent not images but the sounds of a spoken language. Third, images are perceived in a holistic and simultaneous manner (Shlain, 1998; Smith, 1988), while words are decoded one at a time as the eye moves linearly, particularly in Western alphabetic orthographies. Logan (2004) claims that the alphabet is processed in a linear, sequential, and abstract manner. This is consistent with one of the models of reading that explain orthographic processing in visual word recognition: the SERIOL model of letter-position encoding posits that the letters within a word fire sequentially upon reading (Whitney, 2001).
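The core idea of serial letter-position encoding can be illustrated with a small sketch. This is not the published SERIOL implementation; it simply shows, under simplified assumptions (the `max_gap` and `decay` parameters are hypothetical), how serially firing letters can yield ordered letter-pair codes whose strength weakens as the letters fire farther apart.

```python
def open_bigrams(word, max_gap=2, decay=0.5):
    """Return {(first, second): activation} for ordered letter pairs.

    Letters are assumed to fire in left-to-right serial order, so each
    ordered pair of letters (up to max_gap positions apart) receives an
    activation that decays with the gap between them.
    """
    bigrams = {}
    for i, first in enumerate(word):
        for gap in range(1, max_gap + 1):
            j = i + gap
            if j < len(word):
                pair = (first, word[j])
                # A larger gap means later relative firing, hence weaker pairing.
                act = decay ** (gap - 1)
                bigrams[pair] = max(bigrams.get(pair, 0.0), act)
    return bigrams

print(open_bigrams("cat"))
# {('c', 'a'): 1.0, ('c', 't'): 0.5, ('a', 't'): 1.0}
```

Note how the encoding preserves letter order ("cat" and "act" produce different pair sets), which is the property a serial-firing scheme is meant to capture.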

Nature provides raw materials, and the brain performs the inner workings needed to comprehend them. In order to perceive the world through images, the brain relies on the processes of wholeness, simultaneity, and synthesis. The brain perceives the whole by integrating all parts holistically into a gestalt entirety (Shlain, 1998). Last, explicit training on how to “read” images is unnecessary. As we are endowed with the ability to process images, we make direct and automatic connections between images and reality. In contrast, reading needs to be explicitly taught. Automaticity is never gained without years of continuous reading.

Although photographs can be used in ways other than aesthetic consumerism, as one of the epigraphs indicates, beautifully shot photographs and well-drawn or well-painted images not only evoke strong emotional or intellectual responses instantaneously in the viewer’s mind, but also help engage and educate the viewer. They also add depth and context to the description of objects and scenes. Hence, images indirectly contribute to the storytelling process, and their impact is vast.

Reading words requires a different process from reading images. Words in alphabetic writing systems are composed of multiple graphs arranged in a linear sequence, as in the Roman alphabet, or in a block, as in Korean Hangul. The eye scans a series of graphs to ferret out meaning. An analysis of letter chunks (i.e., words) occurs instantaneously based on graphotactic rules that dictate the plausible combinations and collocational occurrences of graphs within the word, because meaning is anchored to the plausible sequence of graphs within the word. To extract the meaning of a word, the brain relies on sequential, analytic, and abstract processes to discern the orthographic and phonological components of the word.

2 Right Brain versus Left Brain

All vertebrates have bi-lobed brains with mirror-image hemispheres that perform the same types of tasks (Shlain, 1998). However, the two lobes of the human brain function differently, with different strengths in each hemisphere. Ornstein (1997) asserts that the left hemisphere perceives the world in a bottom-up process, while the right hemisphere assesses the world in a top-down manner. The corpus callosum, a bridge of neuronal fibers, connects and integrates the two cortical hemispheres. Shlain (1998) asserts that, in utero, the right hemisphere develops first, before the left hemisphere begins to mature. The right hemisphere is more sensitive to biological needs and integrates feelings, recognizes images, and appreciates music. It synthesizes multiple converging determinants to help the mind process the sense organs’ input all at once. It is the right brain that can listen to the sounds of a seventy-piece orchestra holistically and appreciate the harmony (Shlain, 1998). It is also the right brain that perceives objects concretely. It processes nonverbal information to the extent that a facial expression can be “read” without any attempt to translate it into words. The right brain is also attuned to animal modes of communication. It generates sensational feelings, including love, humor, and aesthetic appreciation, which are distant from logic and the rules of conventional reasoning. As they do not progress in a linear fashion, feelings are experienced in an all-at-once gestalt manner, in a flash like lightning (Shlain, 1998). In short, the right brain cognizes images by simultaneously integrating the componential parts in the visual field, gauging dimensions and distances, and synchronizing seemingly unrelated elements instantly (Brincat & Connor, 2006; Dehaene, 2009; Shlain, 1998).
The right brain is good at perceiving space and making aesthetic distinctions in terms of balance, harmony, and the composition of the object in a swift and instantaneous manner. The right brain is associated with “being, images, holism, and music” (Shlain, 1998, p. 21; emphasis in original).

The left hemisphere functions differently from its right counterpart and harmonizes with the right lobe. The left lobe is largely associated with speech and action. Since words are tools for the abstraction, discrimination, and analysis of objects and categories, as well as the implements of thought, the left brain tends to engage in linear processing. Unlike the right brain, the left brain relies on the duality between me-in-here and the-world-out-there (Shlain, 1998). This dualism promotes objective thinking and enhances reasoning skills, which eventually lead to logic. Logic takes a linear progression instead of holistic gestalt processing. In essence, the left brain involves doing, speech, abstraction, and numeracy, which all take a linear progression, unlike the right brain’s primary association with being, images, holism, and music (Shlain, 1998). Shlain (1998) further notes that “the left hemisphere is actually a new sense organ designed by evolution to perceive time” (p. 23). If this remark can be extended to reading, which primarily takes place in the left hemisphere, it would be reasonable to connect it to Dehaene’s (2009) neuronal recycling hypothesis. This hypothesis postulates that reading is a cultural invention in that the brain utilizes and recycles existing brain networks and circuits in order to be able to read, because the neural pathways are not prewired or programmed for reading (Dehaene, 2009; Szwed, Cohen, Qiao, & Dehaene, 2009). Due to neuroplasticity, which allows the brain’s cortical architecture to reorganize and reconfigure for reading, the neurons and the cortex can adapt to the novel function of reading through accommodation (Perfetti & Liu, 2005).

Although the two hemispheres work in tandem, each hemisphere of the brain controls the muscles on the opposite side of the body. Hemispheric specialization, or lateralization, is asymmetrical. Although brain lateralization varies across individuals, the general symptoms of brain dysfunction manifest the stark difference in the functions of the two hemispheres. Shlain (1998) summarizes the difference, based on his own medical practice, as follows: “If a right-handed person has a major stroke in the controlling left hemisphere, with few exceptions, a catastrophic deficit of speech, right-sided muscle paralysis and/or dysfunction in abstract thinking will occur. Conversely, damage to the right brain will impair the afflicted person’s ability to solve spatial problems, recognize faces, appreciate music, besides paralyzing the left side of the body” (Shlain, 1998, p. 18; emphasis in original). This description is consistent with findings from patients with impaired face and word recognition (Behrmann & Plaut, 2014). Data from patients and individuals with particular deficits provide valuable information about the function of a given hemisphere because they allow for comparisons between those with and without particular skills. These data attest to the different roles of each lobe of the brain.

3 How Images Are Processed Compared to Words

Seeing is automatic. According to gestalt psychology, objects and scenes are observed as a whole, which is the simplest form of perception (Smith, 1988). The whole of an object or scene is more important than the sum of its individual parts because observing the whole helps us find order and unity among seemingly unrelated parts and pieces of information. For example, perceiving multiple flashing lights as a moving image is a result of the brain’s holistic information processing, which fills in missing pieces to form a whole configuration. Gestalt psychologists postulated that visual information is processed automatically and that this automatic visual perception organizes the whole scene or object (Smith, 1988). The automatic processing of images is a sophisticated system that not only selects and processes relevant information at a given moment, but also allows the attention mechanism to work efficiently. The ability to discriminate or identify a specific object, image, or word is subject to individual differences. Images promote our aesthetic appreciation beyond what text can provide.

How the brain sees, perceives, and recognizes objects is one of the most intriguing topics in neuroscience. When we look at numbers, letters, or other shapes, neurons in the brain’s visual center instantly respond to the different characteristics and components of the stimuli’s shapes to create the image that we see and understand (Brincat & Connor, 2006). The brain perceives an object in its entirety. This process is complex but swift. Unlike reading, people from different cultures process images in the same way: not only are our brains biologically hard-wired in the same way with regard to image processing, but the brain architecture is also similar among all human beings, to such a degree that images are automatically processed in the right hemisphere without specific training (Dehaene, 2009; Shlain, 1998). Brain region V1, the brain’s earliest visual processing center, located at the central posterior of the brain, identifies the simplest forms of images, such as lines and edges of contrasting intensities (Brincat & Connor, 2006). The downstream visual areas (i.e., V2, V3, and V4) work together to process basic visual forms in a goal-directed or stimulus-driven way, depending on the viewer’s intention and the task at hand.

Special neuronal pathways in the brain’s visual areas integrate an object’s parts into a whole within a fraction of a second of seeing one part of an object. According to Brincat and Connor (2006), visual processing does not happen in the eye but at multiple stages in the brain, engaging higher-level stages throughout object-image processing. Once a visual stimulus is presented, neurons in the higher-level visual cortex respond indiscriminately at first, signaling all the individual features within the object. Within milliseconds, the brain begins a rough categorization by putting the slices of information together to construct a whole picture and by responding exclusively to combinations of object-image fragments rather than to individual fragments. Brincat and Connor (2006) found no conflict between component perception and pattern perception, suggesting that persistent component regulation occurs in the cells of the posterior inferotemporal cortex. Responses to shape patterns seem to support the global perception of components. However, persistent responses to parts or simpler components could serve to make local structural information available throughout the pertinent process.

The cortex and subcortex consist of many different structures that deal with complex cognitive demands, such as memory, language, and spatial awareness. Since the brain makes sense of images rapidly, the visual system functions delicately to extract conceptual information from visual input in a fraction of a second. It seems that less than 20 milliseconds is enough to identify and discriminate complex visual input. Potter, Wyble, Hagmann, and McCourt (2014) measured the minimum viewing time required for visual comprehension using rapid serial visual presentation (RSVP). They presented a set of six or twelve pictures for 80, 53, 40, 27, or 13 milliseconds per picture without an inter-stimulus interval. Results showed that the detection of pictures (e.g., a smiling couple, a picnic) improved as the exposure duration increased, but participants could detect the stimulus above chance level even at 13 milliseconds. These results suggest that the conscious detection of rapidly presented complex images occurs very quickly and is faster than word recognition. This also suggests that reading a word requires a different process from “reading” an image and that more steps are involved in reading text.
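The RSVP procedure described above can be sketched in a few lines. This is only an illustration of the timing scheme, not the authors’ experimental code: pictures are shown back to back with no inter-stimulus interval, so each picture’s onset is a multiple of the per-picture duration.

```python
def rsvp_schedule(n_pictures, duration_ms):
    """Return (onset_ms, offset_ms) for each picture in an RSVP stream.

    With no inter-stimulus interval, picture i appears at i * duration_ms
    and is replaced immediately by the next picture.
    """
    return [(i * duration_ms, (i + 1) * duration_ms) for i in range(n_pictures)]

# A six-picture stream at the fastest rate tested (13 ms per picture):
for onset, offset in rsvp_schedule(6, 13):
    print(f"picture shown {onset}-{offset} ms")
```

At this rate the entire six-picture stream lasts only 78 milliseconds, which underscores how little exposure sufficed for above-chance detection.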

Multiple strategies seem to be involved in object processing. Qiu and von der Heydt (2005) examined figure and ground processing and found that figure-ground organization, the process by which the visual system distinguishes the foreground of an image from its background, is encoded using two computational strategies. One strategy exploits local information, while the other uses the global configuration of contours. Brain region V2 seems to combine microscopic cues for local information with gestalt factors for the global configuration of contours, which influences the response. These two encoding strategies are combined in single neurons, so that area V2 processes two-dimensional figures as if the objects were presented in a three-dimensional context.

Object recognition relies on visual features such as the junctures of two lines meeting at vertices (e.g., T, L). Since written language is a relatively recent invention compared to spoken language, and has not been around long enough to exert evolutionary pressure on our brains, visual word recognition, according to Dehaene (2009), makes use of pre-existing mechanisms commonly used for the visual recognition of objects and scenes. Szwed, Cohen, Qiao, and Dehaene (2009) examined the visual recognition of objects and words using invariant visual features to identify whether the visual characteristics of letters contribute to the reader’s or viewer’s swift recognition. Szwed et al. (2009) employed a naming task that presented partial pictures of objects and printed words in which either the vertices or the mid-segments of lines were retained while the other parts were missing. There was no significant difference in the pattern of recognition between objects and words. However, participants were more efficient when vertices were preserved, making fewer errors and responding faster than when the other parts of the stimulus were preserved. Overall, the results suggest that vertex invariants are more important for object recognition and that the evolutionarily ancient mechanism that is hard-wired to process objects is recycled for reading.

Inquiries into the whole versus its parts have been addressed by comparing the visual recognition of faces and words. Martelli, Majaj, and Pelli (2005) examined whether objects are identified as wholes or by their parts and, further, whether faces are processed like words, given that words differ from faces qualitatively and faces differ from words parametrically. As opposed to previous research suggesting that faces are processed as a whole while words are recognized by parts sequentially, Martelli et al.’s (2005) findings show that both words and faces tend to be recognized by parts. It seems that faces are recognized differently from objects due to the delicacy of individual facial features. One way to disentangle the intricacy of visual recognition is to employ visual noise or visual alteration in stimuli. Albonico, Furubacke, Barton, and Oruc (2018) examined perceptual efficiency and the inversion effect (i.e., the difference in the recognition of stimuli between upright and inverted orientations) for faces, words, and houses, given that an inversion effect has been considered an index or marker of expert processing. The orientation manipulation yielded different effects across faces and words: the recognition of inverted faces was significantly disrupted, while the recognition of inverted words and houses was minimally affected. Recognizing individual faces seems to take longer than recognizing objects because the brain needs to construct an internal representation of a face based on emerging signals for combinations of face fragments.

Although the brain is built upon a genetic blueprint, the impact of literacy on face recognition has also been noted. Ventura (2014) argues, based on a plethora of previous studies, that different neuronal specificities are involved for words and faces and that reading acquisition changes face processing because reading competes with the cortical representation and neuronal coding of faces. Similarly, Dehaene et al. (2010) show that, as literacy skills increase, cortical responses to faces decrease slightly in the left fusiform area but increase significantly in the right fusiform area. The literate and illiterate brains seem to differ, given the increased lateralization for faces in the right hemisphere among literate individuals. A greater left lateralization for reading and a stronger right lateralization for faces are also found in 9-year-old typical readers and children with dyslexia (see Ventura, 2014, for a review). Ventura et al. (2013) examined the relationship between literacy acquisition and the processing of faces and houses to explain the pattern of brain reorganization. Using a face composite task, they found that literate individuals are less holistic than their illiterate counterparts in processing faces and houses. They indicate that, due to the brain reorganization resulting from literacy, literates tend to use analytic visual strategies in face processing in tasks that require selective attention to the parts of an object, while illiterates are consistently more holistic in processing faces and houses.

Li and colleagues (2013) used ERPs to examine the effect of literacy on early neural development for word processing and its collateral effects on the neural development of face processing among preschool children. Their findings point toward a significant role of reading experience in the neural specialization for the processing of words and faces, beyond the effect of children’s typical maturation. The neural development of visual word processing competes with that of face processing, to the extent that the neural specialization for word processing delays the neural development of face processing before the neuronal circuitry is specialized (Li et al., 2013; Shlain, 1998).

More research has been conducted along these lines. Behrmann and Plaut (2014) investigated the hemispheric processing of words and faces in prosopagnosia (impaired face recognition resulting from right-hemisphere ventral lesions) and face impairments in pure alexia (impaired word recognition resulting from left-hemisphere ventral lesions). Prosopagnosic patients show mild but reliable word recognition deficits, while alexic patients reveal mild but reliable face recognition deficits. The mechanisms of face and word processing seem to be a consequence of interactive learning, the result of optimizing a procedure for specific computational principles and constraints on the processing of faces or words.

Dehaene et al. (2010) monitored brain responses to spoken language, written words or sentences, visual faces, houses, tools, and checkerboards in illiterate individuals, adults who became literate in adulthood, and adults who became literate in childhood. Regardless of when literacy was acquired (childhood or adulthood), similar brain organization was found among literate adult participants. As literacy skills advance, the left fusiform area is engaged in reading; it showed a small competition with faces, and activation extended to the occipital cortex and area V1. Interestingly, significantly reduced activation was observed for checkerboards and faces in the visual word form area. This suggests that words and images are processed in different regions of the brain.

In adults’ brains, faces and words elicit divergent activation in the ventral temporal cortex, with faces selectively activating the mid-fusiform gyrus and words activating the lateral mid-fusiform/inferior temporal gyrus (Cantlon, Pinel, Dehaene, & Pelphrey, 2011). Based on adults’ category-based specializations manifested in the visual regions of the brain (e.g., the fusiform gyrus), Cantlon et al. (2011) investigated cortical representations in children to identify whether these specializations are driven by “building up or pruning back representations” (p. 191). Four-year-old children were tested on four categories of stimuli, faces, letters, numbers, and shoes, using fMRI. The researchers found that the specialization of visual categories in the brain varies depending on the characteristics of the stimulus. Specifically, faces and symbols are doubly dissociated in the fusiform gyrus before children learn to read. In addition, young children’s category-specific visual specialization is sensitive to the degree to which knowledge of preferred categories increases while knowledge of non-preferred categories decreases. This study also indicates that the specializations for different categories, such as faces and symbols, take shape around the age of four, when children typically begin to learn to read. Dehaene (2009) summarizes imaging studies showing that high-amplitude waveforms appear in the left hemisphere for word processing and in the right hemisphere for face processing. He notes that “[w]hen the data from multiple [epileptic] patients are placed in a standard anatomical space, faces appear to preferentially engage the right hemisphere, while word responses predominate in the left” (p. 81).

4 (Indirect) Support for Script Relativity

Technical advances in neuroscience over the last two decades have allowed us to unravel the brain’s networks and circuits. The right and left hemispheres of the brain regulate our perception and understanding through complementary cooperation (Shlain, 1998). When we see an object or scene, both hemispheres engage, but the right hemisphere is largely responsible for processing the image. People with different cultural backgrounds process images largely in the same way because our brains are hard-wired to process images automatically, without any particular training (Dehaene, 2009; Shlain, 1998; Wolf, 2007). The uniformity of image processing in the right hemisphere across individuals with different cultural backgrounds indicates that all human beings share commonalities in perceiving objects and scenes. This may have to do with the notions that the right hemisphere develops before the left lobe begins to develop in utero, that image processing is an innate competence, and that images are processed in a top-down manner, bypassing delicate bottom-up analysis (Shlain, 1998).

In contrast, when we read, the left hemisphere is primarily responsible for the recognition of words. The relative level of involvement depends on the importance and relevance of the stimulus to the viewer or reader. These processes are universal across both individuals and cultures. Research shows that spoken language and reading are predominantly processed in the province of the left hemisphere (Dehaene, 2009; Ventura, 2014; Wolf, 2007), as reviewed in Chapter 9. Interestingly, the subtleties that make language rich are “painted” by the right hemisphere through metaphors. According to Shlain (1998), metaphors are “the right brain’s unique contribution to the left brain’s language capability” and “the synergy between the right brain’s concrete images and the left brain’s abstract words” (p. 20).

Dehaene et al. (2010) also claim that words and images are processed in different areas of the brain, with the left hemisphere involved in word recognition and the right associated with image perception. Dehaene (2009) asserts that “[o]n the cortical surface, places and faces occupy extended and well-separated areas, but both are very far from the letterbox area in the left hemisphere. The place area, present in both hemispheres, lies close to the brain’s midline, while the face area is principally found in the right hemisphere” (p. 184). Importantly, Dehaene et al. (2010) also note that literacy acquisition changes the neuronal pathways, or circuits, for processing images. As a consequence of brain reorganization due to literacy, literate individuals use analytic visual strategies for face recognition, because it requires selective attention to the parts of a face, whereas illiterates are likely to use holistic processing for both faces and other objects. It appears that faces are processed more like words than like images (Martelli, Majaj, & Pelli, 2005). In fact, the human face is among the most complex of images, given the infinite variety of faces and the flexibility of facial expressions.

In particular, the findings on literate and illiterate brain function, as well as on brain lesions, collectively lend credence to script relativity. According to Dehaene (2009), due to the genetically constrained circuits of the brain, our learning is intensely constrained by brain networks whose mechanisms are strictly specified by our genes. Dehaene’s (2009) neuronal recycling hypothesis can explain the parameters of the brain’s constraints and its reconfiguration. It postulates that the brain is organized into neural maps that are biologically hard-wired to respond to the outer world and that the brain circuits for cultural tools, such as reading and writing, are not present at birth; as a result, neuronal circuits reorient themselves to accommodate the demands of the evolved cultural activities of reading and writing. Furthermore, the original organization of the brain constrains what can be learned, to such a degree that cultural variability is limited by neural constraints. This is consistent with Perfetti’s (2003) universal grammar of reading, which explains universality as well as script specificity. It is also related to the system accommodation hypothesis (Perfetti & Liu, 2005). We are able to decode words regardless of their size, shape, or case (i.e., uppercase or lowercase) as a result of the rewiring (i.e., recycling) of cortical architecture whose original functions were honed for object recognition.

The aforementioned studies were not carried out to directly test script relativity against object processing, because script relativity is a new hypothesis that I propose in this book. Experimental research along the lines of comparing image processing and word processing, as well as literate and illiterate individuals, will shed light on script relativity. One potential challenge for testing this hypothesis comes from the current trend of digital text that is coalesced with images, especially on social media filled with emoticons and emoji. This trend adds hurdles to teasing apart true text effects from effects intertwined with images. With this in mind, the next chapter reviews how digital texts are read and processed.