1 Introduction

Photography as a social practice has changed with digitalisation as technological development induces new photographic conventions which include the widespread use of image filters and other types of image manipulation (Johannessen & Boeriis, 2021; Boeriis, 2021). Inexpensive software for digital photo manipulation allows any layperson with basic digital competencies to alter the visual content of a photograph. For instance, it has become less complex to remove unwanted persons or to change the appearance of depicted elements (such as body shape or skin details). As smartphone photography becomes an augmented part of the human sensorimotor apparatus (Blaagaard, 2013; Frosh, 2015; Han et al., 2017), and as digital photographs are distributed in a fast and vast digital social environment, the communicative practice of photography takes on an almost dialogical form where filters and other photo manipulations are part of the visual vocabulary (Boeriis, 2021). The indexical understandings of photography as documenting evidence still resonate in media discourses around photo manipulation (as can be seen in governmental initiatives for the compulsory explicit labelling of photo manipulation in Norway and Denmark), but the widespread quotidian use of image manipulation has consequences for the conventions of photographic meaning-making in general (Johannessen & Boeriis, 2021) and for the understanding of photographic truth and trustworthiness in particular. This article makes a first move towards elucidating the grammatical implications of digital photo manipulation in order to get a better understanding of the meaning potential affected by the digital manipulations. Thus, the filtered visual dialogue of contemporary photography is examined by investigating different editing options in digital photo manipulation software from a visual grammatical perspective.

The theoretical point of departure is in multimodal social semiotics (Hodge & Kress, 1988; Kress & van Leeuwen, 2001), and Kress and van Leeuwen’s visual grammar (2020) serves as the conceptual framework for the analyses and discussions of a number of options for photo manipulation in digital photo editing software. Kress and van Leeuwen’s grammar takes a contemporary Western cultural perspective on visual grammar (2020), and consequently the insights presented in this article apply only to this context.

From a social semiotic point of view, a photograph can be perceived as a semiotic artefact (text) consisting of multiple meaning-making choices (signs) instantiated from an overall repertory of possible choices (grammar) (van Leeuwen, 2005; Kress, 2010) that utilise semiotic technologies (media) (see Zhao et al., 2014; Poulsen, 2018). In this article the visual meaning-potential of photo manipulation is examined by combined perspectives on both technological editing options in photo editing software and the grammatical repertory of visual communication.

This article defines photographic trustworthiness from a social semiotic point of view as a dynamic phenomenon which relies on a semiotic truth agreement established between a text producer and a text receiver in relation to cultural conventions for photographic truth. Semiotic truth agreements are created in the way photographs relate to sociocultural conventions of claiming trustworthiness in connection to photographic genre, context and communicative aim. In each photo-communicative event, the text producer implicitly establishes a claim to adhere to certain kinds of truth by choices in the design of the photograph, and the degree of trustworthiness can be evaluated against this claim. If a text producer claims to adhere to a particular truth convention but uses photo manipulation techniques that violate this agreement, the text producer is untrustworthy. Therefore, semiotic truth agreements function as an account of how the content is to be received—which type of trueness the photographer vouches for. Different types of photographs have different kinds of conventions (in different contexts) for claiming trustworthiness. For instance, there are different genre traits between a documentary photo and a fictional photo or a work of art.

Manipulated photographs are not deceiving per se in social interaction, and therefore it is important to differentiate between different kinds of semiotic truth agreements and relate these to the various options for photo manipulation provided by digital editing software. This is achieved in the following by a close examination of the relation between technical resources for digital photo manipulation and grammatical resources for photographic meaning-making, explicating the ways in which different visual grammatical systems can be affected by digital post-production—and with what consequences for meaning-making in relation to trustworthiness. In other words, the area of concern in this article is digital post-production, and it explores how different visual meaning-making resources are involved in claiming visual trustworthiness in digital photographs in order to provide a more nuanced understanding of photo manipulation which can inform future critical citizens of a digital society.

2 Theoretical Background

In order to elucidate digital photo manipulation as a communicative phenomenon, and employing a social semiotic approach, this article takes a theoretical point of departure in the way visual meaning potential is affected by digital photo manipulation. Social semiotics is centred around a descriptive ambition of accounting for the resources available for meaning-making in semiotic modes (van Leeuwen, 2005). The article follows this ambition by focusing on both the practical technical resources for digital photo manipulation and the meaning potential which can be instantiated through these manipulations.

Social semiotics originates in Halliday’s Systemic Functional Linguistics (see Halliday, 1978; Halliday & Matthiessen, 2004), which initially described the resources of language. In the late 1980s, scholars in social semiotics began to take an interest in theorising and describing semiotic modes other than language, taking inspiration from Halliday’s general view on communication and semiotics (Hodge & Kress, 1988; O’Toole, 1994; Kress & van Leeuwen, 1996). Multimodal Social Semiotics has since then developed into a diverse, influential paradigm which focuses on meaning-making in potentially any semiotic mode—and on the interplay of different semiotic modes (Kress & van Leeuwen, 2001; Baldry & Thibault, 2006; Kress & van Leeuwen, 2020).

The social semiotic approach describes the practices of meaning-making in context, and the core idea is that members of a given community share knowledge about the available semiotic resources and their corresponding meaning potential (Hodge & Kress, 1988). These resources are conventionalised ways of instantiating semiotic resources which aim to aptly convey specific content (Kress, 2010). The available repertoire for meaning-making is described as grammars of semiotic modes within a community (e.g. language regions) (Hodge & Kress, 1988; Kress & van Leeuwen, 2001). However, crucial to the social semiotic approach is that grammar is not viewed as a set of rules but rather as a description of conventionalised practices for meaning-making in a particular cultural context (Feng & O’Halloran, 2013). Kress and van Leeuwen’s visual grammar takes a distinct common Western point of departure for describing the resources for visual meaning-making (2020).

Individual members of a society develop knowledge of and competencies within the overall semiotic systems of different modes (ontogenesis) through being a semiotically active member of the society (Halliday, 1978). At another timescale, each individual semiotic action (logogenesis) contributes to either sustaining or gradually changing the semiotic systems of the modes (phylogenesis) (ibid.) as the grammars of modes evolve along with cultural changes in the communities (Kress, 2010). Summing up, the societal conventions for photographic communication are on the phylogenetic scale; the individual photographic competencies are on the ontogenetic scale; and the processes of creating photographic texts are on the logogenetic scale (Johannessen & Boeriis, 2021). The social semiotic understanding of semiotic resources and meaning potentials as dynamic phenomena enables analysis of the accelerated phylogenetic development caused by digitalisation and social media (ibid.), and therefore the social semiotic approach is particularly apt for investigating digital photo manipulation as acts of meaning-making in relation to claiming trustworthiness.

Although the social semiotic approach speaks of meaning potential rather than fixed meanings (Kress, 2010), social semiotics has an ambition of being as descriptively precise and exhaustive as possible to provide comprehensive overviews of practices of semiotic modes. Detailed descriptions provide insights into particular semiotic modes and allow for detailed systematic analysis of texts using these semiotic modes. In this article, the intention is to present an inventory of a grammar of digital photo manipulation in order to get a deeper understanding of the meaning-making involved (ibid.).

When describing the resources available for instantiation in particular texts, Halliday has described three general domains of meaning, or three metafunctions, which are realised by different subsystems in the overall grammar (Halliday, 1978). These metafunctions have been carried over to theorising other semiotic modes than language from a social semiotic approach, and this includes photography (Kress & van Leeuwen, 2020). The three metafunctions of language and other modes are: the ideational metafunction, which involves the representation of the world in the text for the reader/viewer; the interpersonal metafunction, which is the enactment in the text of the relation between the producer and the consumer of the text; and the textual metafunction, which is meaning conveyed by the structuring of the text as a coherent whole for the reader/viewer (Halliday, 1978; Hodge & Kress, 1988).

In order to provide a systematic examination of the social semiotic meaning potential of digital photo manipulation, this article is structured along the logics of the semiotic metafunctions, which will be elaborated in separate sections below. However, first the article outlines how different software for digital photo manipulation can modify or transmute photographs.

3 Software

The industry standard photo editing software is Adobe Photoshop, which has dominated the photo editing industry for the past 30 years to the degree that the verb photoshopping has become a commonly used term for image manipulation. Other professional photo editing software includes, for instance, Adobe Lightroom, DxO PhotoLab, GIMP, Capture One Pro, Corel Paint Shop Pro and Nik Collection by DxO. In the first two decades after the introduction of Photoshop, digital manipulation of photographs was performed by a relatively small number of experts because it required specialised skills and the software was expensive (Boeriis, 2021, 3). However, in recent years the use of photo manipulation apps for smartphone devices has become a very common practice in everyday amateur photography (Blaagaard, 2013), whether via the in-built proprietary phone system software, common social media apps such as Instagram, or dedicated manipulation apps such as Betterme, Facetune, Canva, Pixlr and PortraitPro. These types of software are typically inexpensive, either free through social media apps or sold at a relatively low price in app stores. Recent editions of photo-editing apps are becoming increasingly advanced, and the gap between what can be done with professional software and phone apps has narrowed.

Digital photo manipulation calculates adjustments on a bitmap basis, which means that, at the base level, adjustments are alterations in the colour and brightness values of individual pixels. For the analysis of photo manipulation, this article proposes a fourfold basic subdivision of photo manipulation alterations at pixel level (see Table 13.1):

  1. Global modifications are adjustments made at a global image scale level, where all pixels in the photo are affected. These include common filtering effects such as image colour tinting or overall exposure adjustments.

  2. Local modifications are adjustments to pixels in selected areas of the image. The affected areas can be selected by many different in-built software criteria, ranging from simple marked areas to selected colours or to areas with high contrast.

  3. Global transmutations are the repositioning of pixels over the entire image, and the adjustments can be compared to turning, squeezing or stretching a flexible canvas. At pixel level, the colour and brightness values of one pixel are moved to another pixel position in the picture frame, and this happens across the entire picture frame, which can distort the proportions of elements in a photograph and alter its perceived perspective. Rotation effects are global transmutations in which pixels are moved relative to a selected rotation point in the picture, for instance, in order to balance horizon lines. Photoshop’s perspective adjustments use squeezing or stretching to remedy different optical distortions such as perspectival warps (converging building lines) and lens distortions (barrel or pincushion effects).

  4. Local transmutations are radical adjustments in pixel values that can potentially change the content of the picture by overwriting the pixel values with completely different values and adding or deleting elements in the picture. A Photoshop effect that uses transmutation is the Clone Stamp tool, which can sample pixels in one area and paste them into another. This can be used to remedy skin blemishes by copy-pasting skin from a non-affected area onto the problem area. Another Photoshop tool that makes use of local transmutation is the Liquify tool, which can be used for pushing pixels in a certain direction to alter the shape of elements in the picture, for instance, pushing the waistline inwards to make a person seem slimmer or moving the position of the eyes to make a person’s face more symmetrical.

Table 13.1 Fourfold subdivision of photo alterations at pixel-level
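
The fourfold subdivision can be illustrated with a minimal sketch in Python. It assumes a tiny greyscale bitmap stored as nested lists of brightness values (0 to 255); the function names and the sample image are hypothetical and chosen only for illustration, and the sketch makes no claim about how Photoshop or any other software implements these operations.

```python
from typing import Callable

# A tiny greyscale "photo": rows of brightness values in the range 0..255.
Image = list[list[int]]


def clamp(value: float) -> int:
    """Keep a computed brightness inside the valid 0..255 range."""
    return max(0, min(255, round(value)))


def global_modification(img: Image, gain: float) -> Image:
    """Change the value of every pixel (an overall exposure adjustment)."""
    return [[clamp(p * gain) for p in row] for row in img]


def local_modification(img: Image, gain: float,
                       selected: Callable[[int, int], bool]) -> Image:
    """Change pixel values only where selected(row, col) is true."""
    return [[clamp(p * gain) if selected(r, c) else p
             for c, p in enumerate(row)]
            for r, row in enumerate(img)]


def global_transmutation(img: Image) -> Image:
    """Rotate 90 degrees clockwise: values are kept but repositioned."""
    return [list(col) for col in zip(*img[::-1])]


def local_transmutation(img: Image, src: tuple[int, int],
                        dst: tuple[int, int], size: int) -> Image:
    """Clone-stamp style: overwrite a size x size block at dst with the block at src."""
    out = [row[:] for row in img]
    (sr, sc), (dr, dc) = src, dst
    for r in range(size):
        for c in range(size):
            out[dr + r][dc + c] = img[sr + r][sc + c]
    return out


photo = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]

brighter = global_modification(photo, 2)
top_row_darker = local_modification(photo, 0.5, lambda r, c: r == 0)
rotated = global_transmutation(photo)
cloned = local_transmutation(photo, src=(0, 0), dst=(2, 2), size=1)
```

In this sketch the two modifications recalculate values in place, whereas the two transmutations move or overwrite values at other positions, which is the distinction the subdivision draws.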

Over the past decade, photo manipulation software has become increasingly based on content-aware functionalities, both in professional software and in smartphone photography software. Artificial intelligence and image content recognition have become central to photo manipulation software, not least in the functionalities related to local modifications and local transmutations. An early implementation of Content-Aware Fill was introduced in Photoshop in 2010 (CS5), enabling computer-generated content in local transmutations. Since then, artificial intelligence has improved to the point where the 2020 update of Photoshop implemented a machine-learning-based object recognition functionality used for selecting objects in the workflow, and the artificial intelligence is continuously being optimised by gathering data about Photoshop users’ editing practices. Artificial intelligence and object recognition also play an important role in many automated functionalities in photo apps, in which the software is able to recognise faces or body shapes and apply pre-set effects to them. This is used in humorous ways in Snapchat, where funny masks that can follow facial expressions are superimposed onto the image of a person, for instance, giving the person the head and face of a kitten. Facial recognition can also be used to apply different beautifying filters, so that the image of a person is automatically enhanced in their selfies (see Boeriis, 2021, 8).

In the following, the theoretical framework provides a cornerstone to understanding the grammar of manipulated photographs. The next sections provide a systematic understanding of the social consequences of the manipulation of photographs by taking a point of departure in the meaning-making conventions of visual communication.

4 Manipulating Interpersonal Meaning Potential

The interpersonal metafunction in social semiotics is the intra-text enactment of the relation between the communicating parties (Halliday & Matthiessen, 2004, 106). In other words, the interpersonal metafunction is concerned with how the text expresses “social relations… between the sign-maker and the sign-interpreter and the people, places and things represented” (Kress & van Leeuwen, 2020, 17). Two general systems express interpersonal meaning in images, namely point of view (choices of perspective) and validity (degrees of expressed realism) (Boeriis, 2009). The article turns to validity first.

4.1 Validity

The grammatical system validity (Kress & van Leeuwen, 2020) focuses on the conveyed and perceived reality of the content (Hodge & Tripp, 1986; Hodge & Kress, 1988), which entails an understanding of “as how true” the represented content is to be taken (Kress & van Leeuwen, 2020, 149). Validity as a multimodal concept is derived from Halliday’s linguistic concept modality, which deals with “the region of uncertainty that lies between ‘yes’ and ‘no’” (Halliday & Matthiessen, 2004, 147), and thus concerns how the truth value or credibility of verbal statements can be modified by the use of auxiliary verbs, adjectives, adverbs and intonation patterns. The multimodal validity system describes the conventions for (figuratively and literally) colouring the multimodal content (O’Toole, 1994, 9) in ways that express a subjective tone towards the content. The validity scale runs from highest validity, which is a seemingly objective representation of the content, to lowest validity, which is an overtly subjective representation of the content (Boeriis, 2021, 28). Validity is measured by a number of parameters called validity markers, and according to Kress and van Leeuwen these include colour saturation, colour differentiation, colour modulation, contextualisation, representation of detail, depth, illumination and brightness (2020, 156–158). Kress and van Leeuwen’s validity system is predominantly focused on validity at a global level, that is, on values across whole photographs, but overall validity can also be affected by local values of validity markers in subsections of a photograph (see below).

The values of the individual validity markers combine into an overall validity profile, which is evaluated against certain standard validity profiles, called coding orientations, that are typically utilised for conveying different kinds of reality (ibid., 159–65). The more the validity profile of a picture deviates from a standard profile, the lower the validity. High validity conveys that the content is presented with the highest degree of objectivity and no subjective slant, whereas low validity conveys a more subjective tone because the content is presented in a marked non-neutral manner. Lowered validity means more meta-focus on the subjective stance towards the content, which subsequently can have different consequences for trustworthiness depending on the coding orientation of the photo. Kress and van Leeuwen (2020, 164) propose four different coding orientations with individual typical validity profiles relating to different communicative purposes.

The common-sense naturalistic coding orientation is based on what Kress and van Leeuwen have termed a “35 mm photorealistic representation” (ibid.) that is recognised by all members of a culture as being neutral naturalistic photography. Applying filters to a photograph lowers the naturalistic validity because the validity markers are altered away from the validity profile of a naturalistic coding orientation. Heavy filtering of colour, light and detail in post-production can lower the validity to the point where the image is no longer construed as depicting naturalistic reality. High naturalistic validity conveys an implicit assertion that the content is to be taken as a neutral representation, which implies ‘representational trustworthiness’, whereas lower validity conveys a more subjective tone which points towards the other types of trustworthiness associated with the sensory, technological, abstract or emotive coding orientations, which are discussed below.

The sensory coding orientation is based on the principles of sensation and pleasure, and is often used in contexts such as art and advertising (Ravelli & van Leeuwen, 2018). The goal of the sensory representation is not high fidelity to a photo-realistic depiction, but rather fidelity to a sensory reality, expressing the aesthetic value of the depicted motif to convey pleasure (or displeasure) (ibid.). The standard profile of sensory validity includes strongly saturated colours, (often) bright light and a lower amount of detail compared to the naturalistic coding orientation. High validity in the sensory coding orientation is related to claiming ‘aesthetic trustworthiness’.

The technological coding orientation is based on a more pragmatic criterion in which validity is related to the usefulness of the visual representations, as can be seen, for instance, in assembly manuals, maps or blueprints in more technical contexts (Kress & van Leeuwen, 2020). If a parameter does not provide information which optimises the use of the image, it does not have a scientific or technological purpose in the image, and consequently it will lower the technological validity. In this context, a naturalistic rendition of a piece of furniture or a building will not be of help in the process of assembling the parts into the final construction. Technological images often utilise low detail, low colour differentiation, no contextualisation and the isometric perspective (ibid., 164). High validity in the technological coding orientation is related to claiming ‘functional trustworthiness’.

The abstract coding orientation is based on a conceptual criterion in which the validity of representations becomes relatively higher the more an image reduces the individual motif to a general conceptual reference, as the focus is then more on the generic and essential qualities than on the concrete (ibid.). The abstract coding orientation is often used in science and high art and is associated with the domain of conceptual knowledge and abstract meaning. Therefore, the ability to produce and/or read texts grounded in this coding orientation is a mark of social distinction, of being an educated person or a serious artist (Ravelli & van Leeuwen, 2018). In the abstract coding orientation, the image is not about the concrete motif per se, but rather the motif functions as a reference to a more generic abstract idea or concept that is under scrutiny or discussion in the communicational context of the image. Thus, high abstract validity is related to claiming ‘conceptual trustworthiness’.

A fifth coding orientation could be called the emotive coding orientation (Boeriis, 2021). This coding orientation was developed to encompass the practices that have evolved in everyday smartphone social media photography in which the fidelity of the photograph is related to how well it conveys the emotions of the photographer at the time of taking the shot. Rather than a naturalisation, or a sensualisation, it is an emotivisation of the motif (ibid.). Moreover, the manipulated emotive smartphone photograph has higher validity if the filtering truly expresses the emotions of the photographer. Consequently, the high validity of the emotive coding orientation is conveyed through non-naturalistic choices in the validity markers. Pre-made filters, for instance in Instagram, can be highly saturated or desaturated, and they can have low colour differentiation, rendering the shot highly tinted, or a contrastive, dark and gloomy expression. Other Instagram users recognise the filter and have a keen understanding of the evolving conventions in the semiotic practice of photo manipulation in social media. When users post filtered photographs on social media, they are expressing feelings, regardless of whether these feelings are exaggerated or understated. Therefore, the emotive photograph can be a tool for communicating subtle emotional states, such as adding a melancholic feel to otherwise positive ideational content, which can convey nuances of the photographer’s feelings about the content (ibid., 35). In the emotive coding orientation, high validity is related to claiming ‘emotional trustworthiness’.

Most photo editing software has options for global modifications of colour, light and detail, which can be used, for instance, to make a photo darker or brighter, to raise or lower contrast, to introduce or remove a colour tint or to make a photo sharper or softer. Professional software and many phone apps offer functionalities for similar local modifications that can alter the values of validity markers in selected areas, which impacts the overall validity profile in ways not accounted for by Kress and van Leeuwen’s validity system. For instance, a very common ready-made effect for local modification is vignetting, which was originally a flaw caused by optical imperfections in the camera’s lens design, resulting in darker areas in the corners of the image, but which is now viewed as having an aesthetically pleasing effect. Another type of aesthetic effect commonly found in photo editing software emulates optical phenomena caused by the chromatic distortions of analogue vintage cameras, darkroom techniques and, not least, the decay of old film stock over the years (see, for instance, RetroCam, Snapseed or VNTG). These filters often involve split toning, which alters the colour differentiation differently in bright and dark tones through intricate local modifications, such as tinting the shadows blue and the highlights orange. This references the optical decay of vintage film while also having an aesthetically pleasing retro effect. Other examples of technical flaws that have made their way into photo manipulation as aesthetic options are grains, scratches, flares and light leaks, which are added as local modifications and transmutations in photo apps and advanced post-production software, such as DxO’s Analog Effects Pro.
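
As a rough illustration of split toning, the following Python sketch tints dark pixels towards blue and bright pixels towards orange. It is a simplified assumption of how such a filter might work, not the algorithm of any named app: the tint colours, the default strength of 0.3, the mid-point threshold of 0.5 and all function names are hypothetical, and the brightness estimate uses the common Rec. 601 luma weights.

```python
Pixel = tuple[int, int, int]  # (red, green, blue), each in 0..255


def luminance(pixel: Pixel) -> float:
    """Approximate perceived brightness using the Rec. 601 luma weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def blend(pixel: Pixel, tint: Pixel, amount: float) -> Pixel:
    """Mix a pixel with a tint colour; amount 0 keeps the pixel, 1 gives the tint."""
    r, g, b = (round((1 - amount) * channel + amount * t)
               for channel, t in zip(pixel, tint))
    return (r, g, b)


def split_tone(img: list[list[Pixel]],
               shadow_tint: Pixel = (0, 0, 255),       # blue (assumed value)
               highlight_tint: Pixel = (255, 165, 0),  # orange (assumed value)
               strength: float = 0.3) -> list[list[Pixel]]:
    """Tint shadows and highlights differently, leaving mid-tones nearly untouched."""
    out = []
    for row in img:
        new_row = []
        for pixel in row:
            lum = luminance(pixel) / 255  # 0.0 = black, 1.0 = white
            if lum < 0.5:
                # The darker the pixel, the stronger the shadow tint.
                new_row.append(blend(pixel, shadow_tint, strength * (1 - 2 * lum)))
            else:
                # The brighter the pixel, the stronger the highlight tint.
                new_row.append(blend(pixel, highlight_tint, strength * (2 * lum - 1)))
        out.append(new_row)
    return out


toned = split_tone([[(20, 20, 20), (128, 128, 128), (235, 235, 235)]])
```

Although the function visits every pixel, the amount of tinting depends on each pixel's brightness, which is why such a filter behaves like a selection by tonal range and not like a uniform global tint.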

In sum, the validity profile, and manipulations thereof, sets the scene for how individual photographs are to be perceived in relation to a given cultural and situational context. The manipulations discussed in the following parts of the article all operate on the grounds set by the chosen validity profiles and the ways in which these can convey degrees of representational, aesthetic, conceptual, functional or emotional trustworthiness.

4.2 Offering a Point of View

When composing a photograph, the photographer chooses the perspective by which the motif is to be viewed, which means offering the viewer a particular point of view, or position, from which to observe the motif: from below or above, frontally or from the side, from a close distance or from far away, with or without eye contact. The photographer can move the camera to a different position, for instance, closer, to the side or to a higher vantage point, or instruct a model to look into the camera, to turn in a particular direction or to move closer to the camera. The choices of points of view are closely related to the physical placement of the camera in relation to the motif and, consequently, points of view cannot easily be manipulated in photo editing software.

Nevertheless, some options are available for making perspective adjustments in Photoshop or specialised software like DxO Viewpoint, which are designed to repair distorted or converging lines caused by perspective and optical flaws from the lens design (typically used on photographs of buildings or other man-made constructions). By utilising global transmutation in perspectival corrections, it is possible to make small adjustments to the perspective and consequently manipulate the viewpoints, albeit to a limited extent.

However, neither the vertical point of view nor the horizontal point of view is easily manipulated in post-production, as the optical characteristics of the camera lens create a distinct perspective in the image. Attempting to manipulate these points of view entails the use of advanced global transmutation functionalities, such as perspective adjustment in Photoshop, but it is difficult to achieve convincing results with the currently available tools.

Distance can be manipulated in post-production by cropping the image, which makes the shot appear closer, but it is more complicated to digitally make a shot appear to have been taken from farther away because this requires complex compositing and often involves using secondary photographic material to fill out the blank space revealed by the new framing. Newer functionalities in Photoshop, such as Content-Aware Fill, which are based on artificial intelligence, can be utilised for these local and global transmutations, but it is still not an easy task to create naturalistic representations with artificial content.

It is also complicated to manipulate eye contact (Boeriis, 2009) in digital post-production, as changing the direction of the gaze involves complex compositing to move the pupil and iris to a different position in the eye-opening.

Global transmutation of the lateral point of view (rotate) (ibid.) often ranks highly in the adjustment hierarchy in software design. This tool is used to level the horizon line in the shot, and such adjustment functionalities are often available in the in-built photo apps of smartphones. Some software also offers aids for levelling the lateral point of view, such as Lightroom’s Crop and Straighten tool, with which it is possible to mark a line on the image at the desired level, based on which the software rotates (and crops) the picture automatically.
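
The geometry behind levelling from a marked line can be sketched as follows. This is only an assumed illustration of the principle and not Lightroom's actual implementation: the function names are hypothetical, points are given as (x, y) pixel coordinates, and the sketch rotates individual coordinates instead of resampling a whole image.

```python
import math

Point = tuple[float, float]  # (x, y) position in the picture frame


def levelling_angle(p1: Point, p2: Point) -> float:
    """Angle in degrees that rotates the marked line p1 -> p2 to horizontal."""
    (x1, y1), (x2, y2) = p1, p2
    return -math.degrees(math.atan2(y2 - y1, x2 - x1))


def rotate_point(point: Point, pivot: Point, degrees: float) -> Point:
    """Move one pixel position around a chosen rotation point."""
    theta = math.radians(degrees)
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    return (
        pivot[0] + dx * math.cos(theta) - dy * math.sin(theta),
        pivot[1] + dx * math.sin(theta) + dy * math.cos(theta),
    )


# A tilted horizon marked by the user with two points (hypothetical values).
left_end, right_end = (0.0, 0.0), (10.0, 10.0)
angle = levelling_angle(left_end, right_end)
levelled_right_end = rotate_point(right_end, left_end, angle)
```

After the rotation, both ends of the marked line share the same vertical coordinate. Applying the same repositioning to every pixel position in the frame is what makes the operation a global transmutation, and the subsequent crop removes the empty corners that the rotation leaves behind.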

5 Manipulating Ideational Meaning Potential

Drawing on Halliday’s linguistic transitivity systems (Halliday & Matthiessen, 2004), Kress and van Leeuwen (2021) describe the conventions for ideational representation in a visual text as an array of processes (what can happen in an image) and their associated participants (elements involved in what happens in an image). In visual texts, the participants are, typically, depicted elements that are represented as whole entities (Boeriis, 2012, 141), such as persons, things or geometrical shapes. Furthermore, groups of elements can function as one participant (Boeriis & Holsanova, 2012, 266).

The following sections investigate how the meaning potential of visual processes and the elements involved as participants can be digitally manipulated in post-production.

5.1 Representing Existence

Halliday’s existential processes (Halliday & Matthiessen, 2004) can extend Kress and van Leeuwen’s visual transitivity system (Boeriis, 2009, 186) with a process type that describes the meaning potential of choosing what is depicted and what is omitted, which entails choices of representation of gender, race, stereotype and so on. In the analysis of existential processes in images, it is typically the case that simply by being there, existents are represented as having importance in the visual text. Omission and inclusion can be the result of simple everyday actions, such as choosing who is invited to the shoot and which way to point the camera.

Post-production manipulation has been performed throughout the history of photography, and perhaps the best-known examples of existential local transmutations are the infamous retouched Soviet photographs in which political opponents were simply paint-brushed out of group portraits (see Boeriis, 2021). Today, however, such existential local transmutations can be done relatively easily in digital post-processing.

There are several specialised functionalities for manipulating the representation of existence in digital post-production of photographs that mostly operate at the local levels. Cropping is one of the simplest ways to alter existence: it cuts off the part of the frame where an unwanted element is placed. Other common examples of existential local modification include hiding elements, for instance, in shadows (darkening), in blown-out, bright areas (brightening) or in unfocused areas (blurring). Such alterations also have consequences for the salience hierarchy (see below about foregrounding). Local modifications require software that enables controlled selections of parts of photographs, which is a speciality of, for instance, Photoshop and Lightroom, whereas not all photo apps provide these selection options for individual adjustments. A number of manipulation effects are specialised for making existential local transmutations. Cloning is designed to copy pixels from one area and blend them into another area, which makes it possible both to remove elements from the photo and to place elements which were not in the original shot. Removing elements can be done by painting in new pixels that replace the element with a credible background. Different types of complex compositing can be applied to achieve similar results. For instance, placing an element in a new layer on top of the original can make it appear that the element was part of the original scenario.
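
The cloning operation described above can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows of 0-255 integers; the function name `clone_patch` and its parameters are hypothetical and do not reproduce the implementation of any particular software, where cloning is brush-based and feathered.

```python
def clone_patch(img, src, dst, size, opacity=1.0):
    """Copy a size x size patch from src onto dst; both are (row, col) corners.

    img is a grayscale image stored as a list of rows of 0-255 integers.
    opacity blends the copied pixels with the pixels already at dst.
    """
    out = [row[:] for row in img]  # work on a copy, keep the original intact
    (sr, sc), (dr, dc) = src, dst
    for r in range(size):
        for c in range(size):
            source = img[sr + r][sc + c]
            target = img[dr + r][dc + c]
            out[dr + r][dc + c] = round(opacity * source + (1 - opacity) * target)
    return out


# A plain background (value 10) with one unwanted bright element (value 200).
photo = [
    [10, 10, 10, 10],
    [10, 10, 200, 10],
    [10, 10, 10, 10],
]

# Removal: paint background pixels over the element.
removed = clone_patch(photo, src=(1, 0), dst=(1, 2), size=1)

# Addition: paint a copy of the element where there was none.
added = clone_patch(photo, src=(1, 2), dst=(0, 0), size=1)
```

The same operation thus serves both directions of existential transmutation: which one it is depends only on whether the source area contains background or an element.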

5.2 Representing Attribution

While the existential processes typically relate to whole entities, attributive processes relate to the properties of (parts of) these entities. Boeriis (2009) describes three major grammatical categories of attribution in images which can all be manipulated in digital photo editing techniques, namely intensive attribution (properties of an element in itself), possessive attribution (properties provided by other elements added to, but not part of, an element) and circumstantial attribution (properties of an element caused by its surroundings).

The grammatical category intensive attribution describes the visual properties of depicted elements in themselves, that is, the look and characteristics of an element’s own components, such as size, surface structures, colour or shape (Boeriis, 2009, 192). In digital photo editing, there are numerous effects that are specifically designed for the purpose of retouching attributes in various ways through local modifications. Intensive attributes can, for instance, be manipulated by adjusting colour or brightness, which alters the way the elements look, for instance, brighter skin tone, more pronounced facial features on a person or a more saturated finish on a car’s bodywork. Similarly, cloning can be used to add or remove the intensive attributes of an element at the level of local transmutation. Typically, cloning is used to remove skin blemishes or imperfections in an object’s surface. There are numerous other techniques for making skin appear flawless, and some are implemented as automatic functionalities in phone apps, which adjust the general contrast and softness of skin tones. In Lightroom, there is a ready-made brush called Soften Skin, which is designed for painting in softness to the skin. The marzipan-soft appearance of fashion models’ skin is often achieved through a complicated process in Photoshop called frequency separation, which entails separating the colours in one layer and the structures in another and then retouching each layer individually. The result is that colour blemishes are evened out and structures like wrinkles and pimples are cloned out, while the minor skin structures and pores are still visible. This effect can make the models’ skin look completely perfect.

Intensive attributes such as shape and size can also be manipulated by using different kinds of warp tools, such as the Photoshop tool Liquify, by which local transmutations are the result of pixels being pushed in a direction in the photograph, which can make a person seem slimmer, more muscular, more long-limbed and so on. Shape warping is also applied to faces to adjust facial features such as eye size and placement, chin shape, nose size and shape and so on. Photoshop and many photo apps use automated facial and body recognition for aiding the process of selecting the area to transmute.
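
The principle behind the frequency separation mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than a description of the Photoshop workflow: the image is a small grayscale grid, a simple box blur stands in for the blur filter a retoucher would choose, and the names `box_blur`, `separate` and `recombine` are hypothetical. The blurred copy holds the broad tonal variation (the colour layer), and the difference between the original and the blur holds the fine structure.

```python
def box_blur(img, radius):
    """Blur a grayscale image (list of rows) by averaging each neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            row.append(total / count)
        out.append(row)
    return out


def separate(img, radius=1):
    """Split img into a low-frequency (tone) and a high-frequency (structure) layer."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(prow, lrow)] for prow, lrow in zip(img, low)]
    return low, high


def recombine(low, high):
    """Add the two layers back together."""
    return [[l + h for l, h in zip(lrow, hrow)] for lrow, hrow in zip(low, high)]


skin = [
    [100, 104, 100],
    [104, 140, 104],
    [100, 104, 100],
]
low, high = separate(skin, radius=1)
restored = recombine(low, high)  # matches the original up to rounding error

# Retouching only the tone layer: even it out, then put the structure back.
mean_tone = sum(map(sum, low)) / (len(low) * len(low[0]))
even_low = [[mean_tone] * len(row) for row in low]
retouched = recombine(even_low, high)
```

Because the two layers sum back to the original, each can be edited separately: evening out the tone layer leaves the fine structure untouched, which is why pores can remain visible after the colour has been smoothed.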

In possessive attribution, elements in a visual text are given additional meaning by elements that are not a part of the elements themselves, such as clothes, glasses or objects carried in the hand of a depicted person (Boeriis, 2009, 192). Cloning can be used for adding or removing possessive attributes. For instance, if a politician is holding a bottle of alcohol in a photograph, the alcohol could be removed and replaced with a bottle of water (or vice versa). Some apps have the option of automatically adding naturalistic makeup to a face or superimposing funny faces onto a person. Such humorous functionalities have not (yet) been implemented in professional software like Photoshop.

Circumstantial attribution describes how elements can be attributed meaning by the surroundings in which they appear (Boeriis, 2009, 193). For example, a person behind the counter in a shop can be construed as a salesperson, and a person behind the steering wheel in a taxi as a taxi driver. Similarly, furniture placed in the dining room is dining room furniture. Through local transmutation of circumstantial attributes, persons can be placed in completely new settings where they may never have actually been. In Photoshop, this entails a precise masking of the person and compositing them onto a different image, which then serves as the background. Typically, local modifications of colour and brightness are needed in order to make a convincing composite. Phone apps such as FaceApp contain specialised functionalities with which comparable results can be achieved using just a few clicks (see Boeriis, 2021).

Attribution is related to three other grammatical categories in which attribution may pave the way for the viewer to either recognise a particular person or place (identification), a certain type or category (typification) or a particular symbolic meaning (symbolisation) (Boeriis, 2009).

Deepfakes (Poulsen, 2021) make radical use of face recognition and identification of facial attributes and expressions to superimpose the face of one person onto the body of another as (inter-photo) local transmutations. In a well-made deepfake, it is impossible to detect the manipulation, which can represent a person as saying or doing something that they have never said or done. A number of phone apps are available that offer deepfake functionalities, such as Snapchat, Face Swap Live or Reface, by which a person’s face can be superimposed onto a famous actor in a short sequence from a famous film scene in GIF format. These functionalities will be developed further in the future as artificial intelligence becomes better at facial recognition and face swapping (ibid.).

5.3 Representing Physical Actions

In visual representation, the equivalents of Halliday’s verbal material processes (Halliday & Matthiessen, 2004) are called action processes (Kress & van Leeuwen, 2020). Action processes can be expressed graphically by means of vectors, such as concrete arrows and indirect pointing lines (ibid., 58), as well as by other indications of energy being expended to make a material impact in the world of the photograph, for instance, depictions of movement, pushing, lifting or holding (Boeriis, 2009; Boeriis & van Leeuwen, 2017).

In post-production, it can be a complex process to add material action to a photograph where there is none because the cues for material processes often include many parameters in various delicate local transmutations, such as the expressed muscular tensions and physical positions of limbs, as well as the body’s interaction with the surrounding setting.

Action processes of movement can also be expressed through certain types of blurring in photographs, which originates from the fact that longer exposure times render moving elements blurry because they move through the frame during the time in which the shutter is open. This technically induced effect has become a convention for expressing (rapid) movement.

Photoshop provides a specialised effect called Motion Blur which offers the option of defining the amount of blur, the direction in which the blur should be added (the direction of the movement that is to be expressed) and how long the blur trail is—the longer the blur trail, the faster the movement. This effect can be used as a local transmutation to add an expression of movement to a particular object, whereas global transmutation is created by adding motion blur to the entire photograph, which emulates camera movement.
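
The relation between the length of the blur trail and the expressed movement can be illustrated with a minimal sketch. It is not Photoshop’s Motion Blur filter: it assumes a grayscale image stored as a list of rows, integer step directions, and a hypothetical function `motion_blur` whose optional `mask` stands in for a selection.

```python
def motion_blur(img, length, dx=1, dy=0, mask=None):
    """Smear pixels along the direction (dx, dy) over `length` steps.

    Each output pixel is the average of the pixels that lie behind it along
    the direction of movement. If mask is given, only pixels where the mask
    is True are blurred (local); without a mask the whole frame is (global).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if mask is not None and not mask[y][x]:
                continue  # outside the selection: keep the original pixel
            total, count = 0.0, 0
            for k in range(length):
                xx, yy = x - k * dx, y - k * dy
                if 0 <= xx < w and 0 <= yy < h:
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


# One bright point on a dark row, moving to the right.
row = [0, 0, 255, 0, 0, 0, 0]
short_trail = motion_blur([row], length=3)[0]  # trail covers three pixels
long_trail = motion_blur([row], length=4)[0]   # trail covers four pixels
```

Leaving `mask` empty corresponds to the global case, which emulates camera movement, whereas passing a selection restricts the trail to a particular object.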

5.4 Representing Emotions, Thoughts and Expressions

In photographs, the emotions of depicted figures are read through the way their facial expressions and body language are conveyed in the image, and the visual expressions of emotional states are called affective processes (Boeriis, 2009, 189). Emotions are read by looking at the way that the components of the face (and body) are composed (see also Martinec, 2001; Feng & O’Halloran, 2013; Forceville, 2005). For instance, a frown is indicated by the eyebrows being pushed down and lips pressed together, a grin by exposed teeth and slightly squinting eyes and a surprised expression by lifted eyebrows, wide open eyes and perhaps an open mouth. The photographic representation of emotions can be achieved by instructing models or by capturing a shot at the exact moment when a person looks like they are expressing a particular emotional state.

The emotions of depicted human beings can be manipulated in post-production, but it is not an easy task as it involves recomposing the facial features and body language. The processes of manipulating emotions, for instance, involve local transmutations of warping the position and shape of facial structures. In the Photoshop Liquify tool, the specialised face recognition workspace makes it possible to add subtle adjustments to mouth and eyes, which can create changes in the appearance of emotional states in depicted persons. It is also possible to make more extreme alterations, but these will often appear exaggerated and with surreal or comic effects. Some smartphone apps have dedicated filters for changing the emotions of depicted people; FaceApp, for instance, has a functionality where the facial expression can be altered to either Upset or a number of different types of Smile, ranging from Tight to Wide. These effects involve local transmutations typically including warping the areas around the mouth, eyes and cheeks, and some also include existential local transmutation where the mouth is rendered as open with artificially added, but natural-looking, teeth.

Speech can be expressed visually by means of speech bubbles (Kress & van Leeuwen, 2020, 63); it can also be deduced from facial expressions even when the verbiage is not represented (Boeriis, 2009, 190; Boeriis & van Leeuwen, 2017). The shaping of the mouth can indicate whether a person is speaking, and in fact any visual mode of expression can be expressed in a photograph: gesture, body language, writing, drawing, dancing, etc. (Boeriis, 2009, 190). Similarly, mental processes can be expressed through thought bubbles as well as through the expression on a person’s face, indicating mental activity of different kinds. Thinking is typically instantiated through combinations of lowered eyebrows, focused gazes and perhaps a hand on the chin (see Boeriis, 2009 for further discussions of sensory, cognitive and affective processes).

Similar to the digital manipulation of emotional expression discussed above, using digital post-production tools such as Photoshop Liquify, it is possible to warp facial expressions in local transmutations to make a person look thoughtful, but it is a complicated process. This functionality is typically not available in the ready-made effects in smartphone photo apps, which indicates that there is less of a demand for this functionality in everyday photographic practice. Many photo apps for smartphones do, however, have ready-made functionalities for superimposing speech bubbles and thought bubbles onto photographs as existential local transmutations (for instance, Instagram, Snapchat and Camera+). It is also possible to open or close the mouth of a depicted person via photo editing, but this demands more specialised photo apps (such as Facetune or FaceApp) or complicated work with warping and cloning in Photoshop.

6 Manipulating Structural Meaning Potential

The textual metafunction is concerned with how the visual text is structured into a coherent whole and as such becomes a text. The weaving together of elements on the visual surface and the relative weighting and placing of elements help lead the viewer through the visual text and provide an understanding of how the elements are to be read in relation to each other (Boeriis, 2009). Manipulation of how photographs are to be perceived as texts includes the internal structural prioritisation and organisation of elements in the frame. Salience and information value are two important systems describing the conventions of structuring of elements in photographs in a Western context (Kress & van Leeuwen, 2020).

6.1 Foregrounding

The textual system salience provides a hierarchy of importance among the elements in a photograph by foregrounding some at the expense of others (Kress & van Leeuwen, 2020, 210). This is achieved by means of contrasting elements relative to the rest of the image by implementing contrast forms such as colour contrast, brightness contrast, focus contrast, framing contrast and size contrast (Boeriis, 2009, 237). By highlighting certain elements or areas, these elements are presented as more important and noteworthy (Kress & van Leeuwen, 2020, 210), and this provides a structuring of the text that gives an overall understanding of the internal structural logics of the layout, which leads the reader’s gaze or directs the attention towards some elements and away from others.

Choices in contrast forms instantiate a strategic salience hierarchy which, for instance, makes it possible to downplay controversial parts—in other words divert the attention by foregrounding some elements and backgrounding others. In practice, this can be achieved by a number of productional choices, such as placing elements in the foreground during the shoot, focusing on selected elements or by placing the motif in a brightly illuminated area of the setting.

Options for manipulation of the salience hierarchy in software are manifold. Quite literally, highlighting areas of importance by making them brighter (dodging) or darkening the surroundings (burning) are both local modifications; making some areas focused and others blurry (sharpening and blur) are also local transmutations. Popular methods of retouching images also include local modifications such as dimming bright areas that demand the viewer’s attention (burning) and removing distracting objects (cloning). Foregrounding can also be achieved by placing elements in front of others in a separate layer of foreground (compositing).
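
How dodging and burning alter brightness contrast can be shown with a minimal sketch. It assumes a grayscale image as a list of rows of 0-255 integers and a selection given as a grid of booleans; the names `adjust_brightness` and `invert` are hypothetical, and real dodge and burn tools work with soft brushes rather than a hard-edged selection.

```python
def adjust_brightness(img, mask, factor):
    """Scale the brightness of selected pixels and clamp the result to 0-255.

    factor > 1 brightens the selection (dodging); factor < 1 darkens it (burning).
    """
    return [
        [min(255, max(0, round(p * factor))) if selected else p
         for p, selected in zip(pixel_row, mask_row)]
        for pixel_row, mask_row in zip(img, mask)
    ]


def invert(mask):
    """Select everything that was not selected."""
    return [[not m for m in row] for row in mask]


# A subject (value 120) that barely stands out from its surroundings (value 100).
scene = [
    [100, 100, 100],
    [100, 120, 100],
    [100, 100, 100],
]
subject = [
    [False, False, False],
    [False, True, False],
    [False, False, False],
]

dodged = adjust_brightness(scene, subject, 1.5)          # brighten the subject
burned = adjust_brightness(scene, invert(subject), 0.5)  # darken the surroundings
```

Both routes increase the brightness difference between the subject and its surroundings, from 20 in the original to 80 after dodging and 70 after burning, which is what raises the subject in the salience hierarchy.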

6.2 Placement

Under the term information value, Kress and van Leeuwen (2020) describe how the placement of elements in the (global) pictorial frame gives the elements different structural meaning depending on the area of the frame in which they are placed. In a Western context, structural meaning is related to the picture’s horizontal axis of left-right, which represents relations of before and after (given versus new); the vertical axis, expressing top-bottom relations of generality or specificity (ideal versus real); or the radial axis, expressing relations between centre and periphery (centre versus margin) (ibid.).

The framing of the shot has an important influence on the information structure in the photo. For instance, the photographer can choose to put the main motif to the left side of the frame, the given, signalling that it is a known phenomenon and a structural point-of-departure for the shot, or vice versa place it to the right, the new, signalling that in the structure it is presented as something new and perhaps more contestable (Kress & van Leeuwen, 2020).

In digital photo manipulation, the horizontal ordering of elements can be reversed by simply inverting the picture if there are no clear indications of left and right in the shot (for instance, writing), and the inversion of left and right in the mirrored photo will be an unnoticeable global transmutation. Changing the vertical order of elements in the frame is more complex, as a simple vertical inversion would result in the photograph being upside down. A convincing change of the vertical order entails a motif that lends itself well to local transmutations through a process of complex compositing in Photoshop.

Post-productional cropping is a common functionality in photo editing software, which can change the information structure of the original shot. For instance, cropping off the left side of the image will place elements as given in the new framing. Cropping can also change the aspect ratio of the photograph, which may impact on the perceived information structure.
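
The effect of mirroring and cropping on horizontal placement can be illustrated with a small sketch. It rests on a strong simplification that is an assumption of this example and not part of Kress and van Leeuwen’s account: the frame is simply split into a left and a right half, whereas given and new are matters of interpretation rather than of a fixed threshold. All function names are hypothetical.

```python
def mirror(img):
    """Flip the image horizontally (left becomes right)."""
    return [row[::-1] for row in img]


def crop(img, left, right, top=0, bottom=None):
    """Keep columns left..right-1 and rows top..bottom-1."""
    bottom = len(img) if bottom is None else bottom
    return [row[left:right] for row in img[top:bottom]]


def aspect_ratio(img):
    """Width divided by height of the frame."""
    return len(img[0]) / len(img)


def horizontal_zone(img, marker):
    """Return 'left' or 'right' for the half of the frame that holds marker."""
    width = len(img[0])
    for row in img:
        if marker in row:
            centre = row.index(marker) + 0.5
            return "left" if centre < width / 2 else "right"
    return None


# An 8 x 2 frame with one element (value 9) placed right of centre.
frame = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 9, 0, 0],
]

original_zone = horizontal_zone(frame, 9)            # "right"
mirrored_zone = horizontal_zone(mirror(frame), 9)    # "left"
cropped = crop(frame, left=4, right=8)               # cut off the left half
cropped_zone = horizontal_zone(cropped, 9)           # "left"
```

In this toy frame the element moves from the right half to the left half either by mirroring or by cropping off the left side, and the crop additionally changes the aspect ratio from 4.0 to 2.0.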

7 Discussion

The trustworthiness of a given photograph can be assessed by correlating a photograph’s claimed adherence to a particular semiotic truth convention to the applied photo manipulation. The analysis demonstrates how all aspects of the visual meaning potential described in Kress and van Leeuwen’s visual grammar (2020) which are related to claiming trustworthiness can be manipulated in digital post-production of photographs. Many of these manipulations can be achieved through simple functionalities in the editing software, some of which are even partly automated, while a select few remain more difficult to achieve.

Based on the analysis, this article proposes that photographic claims of trustworthiness can be discussed in terms of the interpersonal validity system because it involves the degree to which the content is to be taken as true, factual and neutral or whether it is to be taken as more commented on, constructed and subjective. Global or local modification of colour, brightness and detail can alter the validity profile of a photograph closer to, or away from, the typical values of a given coding orientation, and in that sense increase or decrease the validity of the photograph. In all coding orientations high validity can be construed as representing the content as real (in different ways) and uncommented, whereas low validity makes the content less real and more commented, thus conveying more explicit subjectivity. The relative impact on trustworthiness by these choices depends on the particular coding orientation.

In the naturalistic coding orientation, the use of overt manipulations will typically lower the validity, whereby images do not claim representational trustworthiness. Nevertheless, overt manipulations do not necessarily equal low validity in all coding orientations. In the sensory coding orientation, overt manipulations can signal aestheticisation, and in the emotive coding orientation overt manipulation can signal a kind of ‘honest subjectivity’. In the abstract and technological coding orientations, heavier manipulation of colour and detail is often expected, as the photographs attempt to draw the appearance of the content away from the factual towards either more generic abstract concepts or to present the motif in a less complex way.

Even if specialised software offers (minor) fine-tuning of perspectives, the systems related to choices of point of view (interpersonal metafunction) are not easily manipulated in photo editing software because they are restricted by the characteristics of the camera lens and the perspective from which the photo was taken. Distorted perspectives create low validity in the naturalistic coding orientation but can be seen as higher validity, an aesthetic choice, in the sensory coding orientation, as well as in the emotive coding orientation, in which such manipulation conveys an unreal emotional state. In the abstract and technological coding orientations, a clear warping of image perspectives will typically result in a lowering of validity.

Manipulations of the physical actions of animate elements are neither easily created nor easily concealed in photo manipulation. This type of manipulation will most often be overt, thus lowering the validity and, consequently, photographs manipulated in this way will make less of a claim on trustworthiness. Overt digital manipulation that adds actional processes, for instance, by means of arrows or motion blur will in most cases also lower validity in the naturalistic coding orientation. Blurring as global transmutations can be construed as a representation of the photographer’s physical motion and therefore be seen as higher validity in the sensory coding orientation. Local blur transmutations can be seen as expressions of feelings in the emotive coding orientation. Arrows and local blur can cause higher validity because they can be construed as expressions of more abstract material processes (abstract coding orientation) or concrete indicators of actions to be made by the viewer (technological coding orientation).

The digital manipulation of existence and attribution (ideational metafunction) changes what is represented in the shot as well as the attributes of represented elements. As shown above, there are several specialised tools for manipulating existence and attribution in different software, and many of these are part of an easy-access workflow in the software designs. Making changes to what elements in the photograph do, feel, think or express is typically a more complicated process that is not implemented as a ready-made functionality in the software, but it is possible to manipulate them all and thereby change the content of the photograph. When these changes are done overtly (exaggerated) in the naturalistic and sensory coding orientations, this lowers the validity and makes less of a claim of trustworthiness. However, in the emotive, abstract and technological coding orientations, the validity may be higher, as such changes can be used to convey exaggerated feelings, general themes or enhance important details.

The structural systems (textual metafunction) can also be manipulated in digital photo editing. Manipulations of foregrounding and the downplaying of elements (salience) influence the hierarchy of importance, which can be used to divert attention to selected elements and away from others. This can be achieved with ease in all software that enables local modifications. The reorganisation of elements in the photo (information structure) can be a more complex process in editing, which can alter the text-internal logics of a photograph.

Photographic trustworthiness (whether subjective or objective) is obtained as a function of the relation between grammatical choices and the context in which they are instantiated. The meaning potential of validity and coding orientation is related to contextual and co-textual factors that set the scene for how the degree of realness and subjectivity is to be assessed. These co-textual and contextual factors include general cultural conventions, genre conventions, situational agreements and text-specific agreements. For instance, if a photograph is part of a documentary text or otherwise claims to be documenting, there are different expectations about the degree and type of manipulation than there are in fictional photographs or art photography. If a photograph is inserted into an overall text in which the surrounding verbal text proclaims that the visual content is to be taken as factual truth, this sets up certain expectations about the photographic content, whereas accompanying verbal texts, for instance, indicating reconstruction or model photo will set up other expectations about the photograph. The insights provided by this article can be useful for further investigation of this interplay between contextual factors and grammatical choices of digital photo manipulation.

8 Conclusion

This article demonstrates that taking a multimodal social semiotic approach is productive in paving the way for much-needed investigations into the meaning-making of photo manipulation practices in contemporary photography in relation to trust, truth claims and truthful representation. In this way, it was possible to discuss very concrete semiotic meaning-making resources that are instantiated in images when there are claims of trustworthy communication. Although these insights are limited to a Western context, they can provide inspiration for exploring other cultural contexts. They provide the first steps towards a systematic understanding of how the technological resources involved in photo manipulation influence photographic communication. In other words, this article provides a step towards a description of a grammar of digital photo manipulation. Based on the proposed analytical framework, future research in this area will be able to examine and criticise the conventions for claiming trustworthiness in photographic practices, including the conventions that have been naturalised in(to) software designs, and thus contribute to educating new generations in critical citizenship in a socially sustainable digital society.