Introduction

Portraits of Christ as a holy person are predominantly en face. This was famously commented on by Albrecht Dürer in a self-portrait often called “Myself as Christ”. In contrast, profane portraits, as well as portraits of the suffering Christ (“The Man of Sorrows”), were predominantly painted in profile. Why did medieval artists prefer to paint Christ en face? In such portraits, both face and gaze are intently directed at the beholder, whereas profane faces were noticeably more often painted in different degrees of profile. Is this a result of theological and historical conventions, or are there deeper biological and psychological reasons? Can face orientation and gaze direction influence how we judge positive and negative social attributes? Specifically, are nearly symmetrical portraits with a direct gaze more associated with a positive attitude toward the portrait, as indicated by modern-day attribution of positive and negative adjectives to selected portraits?

Today, it is fairly easy to conduct internet surveys (cf. Folgerø et al., 2016). What such surveys lose in accuracy and in control over the environment and the subjects may be more than compensated for by the large number of data points they yield. One popular tool is SurveyXact. If we ask a large sample of people to rate, on a scale from 0 to 10, how well selected adjectives describe a set of portraits, we can ascertain whether face and gaze orientation have significant effects. In art history, frontality is the way to represent Christ as God. This iconographic way of representing his divinity is called the Holy Face, as opposed to the profile, in which he is the Man of Sorrows. Frontally oriented faces of Christ in Western art are almost symmetric. Previous studies have demonstrated that in the fifteenth and sixteenth centuries, almost all profane portraits (in contrast to depictions of Christ) were painted in different degrees of profile, with the gaze either directed toward the beholder or averted. It is rare to see a secular portrait in frontal view (Hodne, 2013). Why did these artists prefer to paint Christ with his face directed toward the beholder, while profane faces were represented in profile? Is it convention, or can there be other explanations? Conventions may well mirror biological and psychological tendencies, and we can test whether such tendencies are measurable today. One caveat is that we might have learned and internalized the conventions. An interesting follow-up study could examine whether people from cultures with minimal exposure to Christianity react similarly to portraits defined by face and gaze direction.

There is a strong tradition in the West of copying the veil of Veronica as a template for the face of Christ. The blood and sweat on the relic were believed to have been imprinted on the veil directly by the face of Christ, when the blood was wiped from his face on his way to Golgotha. According to tradition, the intensity of Christ’s gaze in the veil made it necessary to cover the relic with a piece of cloth.

The symmetrical face with a strong direct gaze became the standard way to represent the Holy Face in Western art of the Renaissance, both north and south of the Alps (Morgan, 2012: 55–62). In the East, there is a corresponding story of King Abgar (6th c.; Edessa, Syria; today Urfa, Turkey) receiving a cloth bearing the face of Christ, the Mandylion, which radiated such power that its image was even imprinted on the tile on which it was placed; this imprint is regarded as an icon not made by human hands (acheiropoieton). This is the origin of the Mandylion, which was frequently painted and is still found in countless churches in the East. The Mandylion is strictly symmetric.

Interestingly, in the East, Christ the All Ruler, or Pantocrator, is often highly asymmetrical, as in the famous Deesis mosaic in the gallery of Hagia Sophia in today’s Istanbul (around 1280), where the left half of Christ’s face is almost in profile (with the nose close to the golden mean), while the right side is en face. This follows the prescription in the Painter’s Manual written by the thirteenth-century painter Manuel Panselinos (Torp, 1984).

We may ask whether there could be deeper reasons than pure convention for the strong preference for full-frontal portraits with a direct gaze. Such reasons could point to factors deep in human emotional responses to faces.

Neurobiological and Evolutionary Aspects of Preferences for Face and Gaze Direction

Fast detection of head and gaze direction is a significant factor in detecting another person’s intentions (Emery, 2000). There has thus been a selective pressure during evolution to detect any signs of hidden intentions, positive or negative. Accurate detection of gaze direction depends on the strong contrast between the dark iris and the bright sclera, which is found only in humans (Kobayashi & Kohshima, 1997). This contrast can thus be thought of as an evolutionary trait that made shared attention, and the learning that may accompany it, possible. In language acquisition, the detection of shared or joint attention, at the lowest level through the detection of gaze direction, “may be important for language learning in human infants” (Emery, 2000: 588; see also Dunham et al., 1993; Mundy & Gomes, 1998; Tomasello & Farrar, 1986). Gaze direction was important in human evolution, with precursors in other primates (see Emery, 2000, particularly pp. 584–587).

A direct look at a counterpart may signal aggression or superiority, as it does in other primates. People may want to monitor the other’s actions, but in doing so they may also express the wish to communicate with or care for their counterpart. Direct gaze also activates the so-called Theory of Mind (ToM) network in the brain, the network through which people analyze another person’s intentions (e.g., Baron-Cohen, 1997; Conty et al., 2007; Perrett & Emery, 1994; Yang et al., 2015).

A survey can explore how subjects react to different face and gaze directions, with the direction of face and gaze manipulated experimentally. We have used three face databases:

  1. The Dutch Radboud Faces Database (RaFD; Langner et al., 2010)

  2. The Brazilian FEI face database

  3. Holy Face (non-manipulated) and secular portraits from the databases

When combining historical material with modern photographs, we must consider how close the modern material is to the historical material, for example whether details such as the degree of gaze aversion are similar across the material. Since we are working with original historical art, we require that the originals are not manipulated. Hence, in the study of the Holy Face, all faces derive from non-manipulated original paintings, because we do not want to interfere with the artist’s so-called “design stance” (cf. Bullot & Reber, 2013).

The advantage of using modern portraits lies in experimental control. In photographs of both frontal and half-profile views, gaze direction can be manipulated, resulting in four conditions: frontal or profile view, with direct or averted gaze (cf. Table 1).

Table 1 These sketches illustrate the experimental manipulations in each study (1, 2 & 3)

Full-frontal portraits with averted gaze are rare in the relevant period, so this condition cannot easily be balanced for the paintings. Half of the portraits depict the face of Christ, while the other half are restricted to profane faces. As paintings of Christ in frontal view with averted gaze are not available to us (in Western artworks), and since we accept originals only, the design will remain incomplete, and face orientation is confounded with holiness. However, a follow-up study could use manipulated images in order to test the hypothesis. It also makes sense to test the hypothesis on photographs only, and to see whether full-frontal, direct-gaze photographs differ from the Renaissance paintings.

The survey asks the participants how much they agree with adjectives describing the face and gaze physiognomy of the target image. For example, the participant rates how authoritarian a portrait is on a scale from 0 to 10. The adjectives all denote mental traits that are hard to observe directly and objectively. At the same time, they are common, familiar, and often used to describe a perceived persona. The adjectives belong to two groups. The positive adjectives could be “harmonious”, “trustworthy”, “caring”, “inclusive” and “respectable”, and the negative ones “authoritarian”, “monitoring”, “evasive”, “intimidating” and “dominant”, but other adjectives can be used. It makes sense to check that all the adjectives are approximately equally frequent and have a similar number of syllables, as both frequency of use and syllabic complexity may affect how familiar the adjectives are.

Each study must be evaluated formally by statistical hypothesis testing.

How many respondents do you need? The easy answer is: probably more than you think. It depends on the statistical test you use and on the number of items you ask them to judge. One common rule of thumb is that you need about a thousand (independent) data points. Since it is straightforward to get large samples using web surveys, we could aim for a high number of respondents, for example 200. However, we will also have to consider the number of images they will judge and how representative these are, as well as the time constraints on the subjects. Most subjects might be willing to spend 10–15 minutes on a web survey, especially if some reward is given. If we assume that each image takes 10 seconds to judge, that allows for 6 images per minute, which gives a hypothetical limit of 60–90 items. If we go for the upper end, many participants may drop out. In the design described above, we have four fixed conditions in a two-by-two design: two levels of face (profile, full) and two levels of gaze (direct, averted). The number of items in a fully balanced set would thus be a multiple of four, supplemented by some training examples and some unrelated fillers.
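The time budget above can be turned into a small calculation. The following is a minimal Python sketch, assuming 10 seconds per item as in the text; the numbers of training items (4) and fillers (8) are hypothetical placeholders, not values from the study.

```python
def item_budget(minutes: float, seconds_per_item: float) -> int:
    """Maximum number of items a participant can judge in the available time."""
    return int(minutes * 60 // seconds_per_item)


def balanced_items(budget: int, n_conditions: int = 4,
                   n_training: int = 0, n_fillers: int = 0) -> int:
    """Largest multiple of n_conditions that still fits in the budget once
    training examples and fillers have been set aside."""
    available = budget - n_training - n_fillers
    return max(0, available - available % n_conditions)


N_RESPONDENTS = 200           # example sample size from the text
N_TRAINING, N_FILLERS = 4, 8  # hypothetical placeholders

for minutes in (10, 15):
    budget = item_budget(minutes, seconds_per_item=10)
    experimental = balanced_items(budget, n_training=N_TRAINING, n_fillers=N_FILLERS)
    data_points = N_RESPONDENTS * experimental
    print(f"{minutes} min: {budget} items, {experimental} balanced, {data_points} data points")
```

With these assumptions, even the lower end gives 200 × 48 = 9,600 ratings, well above the rule-of-thumb thousand. Ratings from the same participant are not independent, however, which is one reason for using mixed-effects models.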

Participants are required to give their active informed consent by providing their email address, to which a hyperlink to the online questionnaire is sent; the questionnaire is hosted by a survey service (for example, SurveyXact). The invitations could be posted on social media or sent through email lists. Each participant must be informed of the principle of voluntary participation, including the right to withdraw from the study at any time without having to justify the withdrawal. Participants should be informed about the purpose of the study, for example research, and consent to the publication of the data they give. At the same time, they should not be nudged in any direction in their answers, and they should be encouraged to take the survey seriously. Ideally, there should be a debriefing after the survey, where participants can report on their experience of taking the survey. Moreover, they should be informed that all data are kept anonymous, which typically means that the link between the answers and the email address is removed. Anonymity could also be handled by setting up anonymous accounts for the users, which makes it possible to link the answers to an anonymous individual. All research must be in accordance with the ethical rules of the university. For a typical survey, answering the questions is not likely to affect the participants negatively, but it is your job as a researcher to make sure that this is so, and that the participants are informed.
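One way to sever the link between answers and email addresses is to replace each address with a random pseudonym before the data are stored for analysis. The following is a minimal sketch using Python's standard `secrets` module. The data layout (an email address mapped to a list of answers) is a hypothetical assumption, and a survey service may well handle this step itself.

```python
import secrets
from typing import Dict, List


def pseudonymize(responses: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Replace each participant's email address with a random pseudonymous ID.

    `responses` maps email -> list of answers (hypothetical structure).
    The email-to-ID mapping is never returned or stored, so once this
    function returns, the answers can no longer be traced to an address.
    """
    anonymous: Dict[str, List[int]] = {}
    for _email, answers in responses.items():
        pid = secrets.token_hex(8)  # 16 random hexadecimal characters
        while pid in anonymous:     # guard against an (unlikely) collision
            pid = secrets.token_hex(8)
        anonymous[pid] = list(answers)
    return anonymous


raw = {"a@example.org": [7, 3, 9], "b@example.org": [2, 8, 5]}
anon = pseudonymize(raw)
print(sorted(anon.values()))
```

If answers must later be linked to the same anonymous individual, as with the anonymous accounts mentioned above, the pseudonym would be generated once per participant and kept. The email address itself would still be discarded.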

The images may be taken from available databases; check that you have permission to use them for research. One such database is the Dutch Radboud Faces Database (RaFD). Another collection could be Brazilian faces from the FEI face database. The different databases can be used as factors in the analysis. It is also possible to analyze the faces and provide control variables, for example face symmetry, eye and mouth width, and eye and mouth symmetry. The databases provide many angles, and it may be useful to compute these control variables from the full-frontal, direct-gaze version of each face. The rationale is that the proportions of the face may give additional cues to how we perceive it.
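Such control variables could, for instance, be computed from facial landmarks. The following is a minimal sketch. It assumes that (x, y) pixel coordinates for the eye and mouth corners are available from manual annotation or a landmark detector. The landmark names are hypothetical, and the databases are not assumed to provide them. Asymmetry is measured by mirroring each left landmark about an estimated vertical midline and taking its distance to the right partner. All measures are scaled by the distance between the outer eye corners, so that faces photographed at different sizes are comparable. The sketch also assumes an upright, full-frontal face; head tilt would have to be corrected first.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in pixels

# Hypothetical landmark names: (left, right) pairs that should mirror each other.
PAIRS = [
    ("eye_outer_l", "eye_outer_r"),
    ("eye_inner_l", "eye_inner_r"),
    ("mouth_l", "mouth_r"),
]


def _scale(lm: Dict[str, Point]) -> float:
    """Distance between the outer eye corners, used to normalize all measures."""
    return math.dist(lm["eye_outer_l"], lm["eye_outer_r"])


def asymmetry(lm: Dict[str, Point]) -> float:
    """Mean distance between each right landmark and its mirrored left partner,
    relative to the outer eye distance (0.0 means perfectly symmetric)."""
    # Vertical symmetry axis estimated as the mean x of the pair midpoints.
    axis = sum((lm[left][0] + lm[right][0]) / 2 for left, right in PAIRS) / len(PAIRS)
    total = 0.0
    for left, right in PAIRS:
        x, y = lm[left]
        total += math.dist((2 * axis - x, y), lm[right])
    return total / (len(PAIRS) * _scale(lm))


def widths(lm: Dict[str, Point]) -> Dict[str, float]:
    """Eye and mouth widths relative to the outer eye distance."""
    s = _scale(lm)
    return {
        "eye_l": math.dist(lm["eye_outer_l"], lm["eye_inner_l"]) / s,
        "eye_r": math.dist(lm["eye_outer_r"], lm["eye_inner_r"]) / s,
        "mouth": math.dist(lm["mouth_l"], lm["mouth_r"]) / s,
    }


# Invented coordinates for a perfectly symmetric face.
face = {
    "eye_outer_l": (30.0, 40.0), "eye_inner_l": (45.0, 40.0),
    "eye_inner_r": (55.0, 40.0), "eye_outer_r": (70.0, 40.0),
    "mouth_l": (38.0, 75.0), "mouth_r": (62.0, 75.0),
}
print(asymmetry(face), widths(face))
```

An eye-symmetry variable could then be taken as the difference between `eye_l` and `eye_r`.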

We may also sample Renaissance paintings (Fig. 1). The full-frontal view with averted gaze is rare, so it is difficult to represent fairly without manipulating the images. New possibilities may arise from modern AI technology for generating portraits (sometimes called “deep fakes”).

Fig. 1
Three paintings: (1) a portrayal of Christ, labeled “Holy face, frontal view, direct gaze”; (2) a man in a collared gown, labeled “Secular, half profile, averted gaze”; (3) a close-up of a man gazing toward his left, labeled “Secular, half profile, direct gaze”.

Renaissance paintings

We measure how 10 adjectives associated with personality traits are rated for the photographs and paintings. Each combination of adjective and image is presented exactly once, in a randomized order for each participant. The images can be classified into conditions based on a Frontal (F) or Profile (P) view with a Direct (D) or Averted (A) gaze, abbreviated FA, FD, PA and PD. In the investigations of the Holy Face, there are only three categories: Holy, (Profane) Direct and (Profane) Averted. One useful technique is the Cohen-Friendly association plot (assoc), implemented in the vcd package in R (Cohen, 1980; Friendly, 1992). The plot shows the deviations from statistical independence (Pearson residuals) between rows (conditions) and columns (adjectives). The association plot intuitively marks which adjectives are positively (blue) or negatively (red) associated with each image type. For example, a frontal face with a direct gaze is negatively associated with “evasive”, and a frontal face with an averted gaze is positively associated with “evasive”. A sneaky direct gaze from a face in profile is associated with negative traits. Figure 2 gives the extended association plot with color-coded Pearson residuals (cf. Meyer et al., 2003).
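The quantity shown in the association plot is simple to compute. The following is a minimal Python sketch of the Pearson residuals, (observed − expected) / sqrt(expected). Under independence, the expected value of each cell is its row total times its column total, divided by the grand total. The toy table is invented for illustration only. How the ratings are aggregated into cell values (for example, summed ratings per condition and adjective) is also an assumption. In R, the vcd function `assoc` computes and plots these residuals directly.

```python
import math
from typing import List


def pearson_residuals(table: List[List[float]]) -> List[List[float]]:
    """(observed - expected) / sqrt(expected) for every cell of a two-way table,
    with expected = row total * column total / grand total (independence model)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            out_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(out_row)
    return residuals


# Toy data (invented): rows are conditions, columns are adjectives.
conditions = ["FD", "FA"]
adjectives = ["trus", "evas"]
table = [[30, 10],
         [10, 30]]

for name, row in zip(conditions, pearson_residuals(table)):
    # Positive residuals would be drawn in blue, negative ones in red.
    print(name, [round(r, 2) for r in row])
```

The sum of the squared residuals is Pearson's chi-square statistic for the table, here 20.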

Fig. 2
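Association plot with the conditions Frontal Direct, F. Averted, Profile Direct and P. Averted as rows on the left, and adjectives such as harmonious, trustworthy, inclusive, respectable, authoritarian and dominant along the x-axis. Each Pearson residual is shown as a bar whose signed height is proportional to the residual (so that its area is proportional to the difference between observed and expected values); therefore, no y-axis is plotted.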

An extended association plot with color-coded Pearson Residuals (Meyer et al., 2003). Dutch Radboud Faces Database (RaFD). Abbreviations FD = Frontal Direct, FA = F. Averted, PD = Profile Direct, PA = P. Averted; harm = harmonious, trus = trustworthy, cari = caring, incl = inclusive, resp = respectable, auth = authoritarian, moni = monitoring, evas = evasive, inti = intimidating, domi = dominant

As we have indicated, it is possible to build a more advanced generalized linear model with more control variables. Assuming that we have found adjectives that reliably associate with positive and negative value assignments, we can make this assignment explicit by multiplying the ratings for the negative adjectives by −1. The association analysis serves to confirm this assumption. If positive and negative adjectives are assigned at random, we expect the signed values to sum to near zero, i.e., a neutral evaluation on average. We should use a binomial distribution, as the response variable is not continuous.
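The sign flip and the null expectation can be illustrated directly. The following is a minimal Python sketch using the ten adjectives listed earlier. The simulation of uniformly random ratings is our own sanity check, not part of the study design.

```python
import random
from statistics import mean

POSITIVE = {"harmonious", "trustworthy", "caring", "inclusive", "respectable"}
NEGATIVE = {"authoritarian", "monitoring", "evasive", "intimidating", "dominant"}


def signed(adjective: str, rating: int) -> int:
    """Rating on the 0-10 scale, multiplied by -1 for negative adjectives."""
    if adjective in POSITIVE:
        return rating
    if adjective in NEGATIVE:
        return -rating
    raise ValueError(f"unknown adjective: {adjective}")


def valence(ratings: dict) -> float:
    """Mean signed rating over the adjectives rated for one image."""
    return mean(signed(adj, r) for adj, r in ratings.items())


# Null expectation: if all ten adjectives are rated uniformly at random,
# the mean signed score over many simulated images should be close to zero.
rng = random.Random(1)
adjectives = sorted(POSITIVE | NEGATIVE)
sims = [valence({a: rng.randint(0, 10) for a in adjectives}) for _ in range(10_000)]
print(round(mean(sims), 2))
```

After the sign flip, scores range from −10 to 10. A binomial model with 10 trials needs counts between 0 and 10 instead. One possible alternative would be to reverse-code negative adjectives as 10 minus the rating.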

An analysis of all experiments can be done using a mixed-effects model implemented in the lmerTest package (Kuznetsova et al., 2015; cf. Schaalje et al., 2002) in the R statistical software (R Core Team, 2015). It is possible to build up specific models. One problem is that we cannot necessarily model specific variance stemming from subjects and from adjectives or images at the same time. The following formulas are just some suggestions. The “~” can be read as “is modelled by” or “is estimated by”, a “*” indicates “interaction effects and main effects”, and the terms within parentheses are the “random effects”, which we might think of as the sources of variance. The “|” can be read as “for each”: “(1 | adjective)” is “a different intercept (starting point) for each adjective”, and “(type | adjective)” can be read as “a different intercept, as well as a different slope, for each adjective, with the slopes expressed relative to the (first) baseline type”. The notation is very efficient for specifying a model, but we need to be aware that to estimate a slope, the factor to the left of the “|” sign must take at least two levels within each level of the grouping factor to its right. For example, we cannot have a slope per participant for “personal gender”, as each subject has only one gender within the experiment; similarly, if an adjective is not (fairly) associated with each type, we cannot estimate slopes for type per adjective. (Type indicates the combination of face and gaze as a four-level factor.)

  • score ~ type + (type | subject) + (1 | adjective)

  • score ~ type + (1 | subject) + (type | adjective)
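The rule that a slope factor must vary within every level of its grouping factor can be checked on the data before a model is fitted. The following is a minimal Python sketch in which the row layout and the toy data are hypothetical. Passing the check is necessary but not sufficient, since a model can still fail to converge or be singular when there are too few observations per level.

```python
from collections import defaultdict
from typing import Dict, Iterable


def slope_estimable(rows: Iterable[Dict[str, str]], slope_factor: str,
                    group_factor: str) -> bool:
    """True if `slope_factor` takes at least two distinct levels within every
    level of `group_factor`, as required for a (slope_factor | group_factor) term."""
    levels = defaultdict(set)
    for row in rows:
        levels[row[group_factor]].add(row[slope_factor])
    return all(len(seen) >= 2 for seen in levels.values())


# Hypothetical long-format data: one row per rating.
data = [
    {"subject": "s1", "gender": "f", "type": "FD", "adjective": "caring"},
    {"subject": "s1", "gender": "f", "type": "PA", "adjective": "evasive"},
    {"subject": "s2", "gender": "m", "type": "FA", "adjective": "caring"},
    {"subject": "s2", "gender": "m", "type": "PD", "adjective": "evasive"},
]
print(slope_estimable(data, "type", "subject"))    # (type | subject)
print(slope_estimable(data, "gender", "subject"))  # (gender | subject)
```

Here, a type slope per subject passes, while a gender slope per subject fails because each subject has only one gender.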

We may also include gaze and face direction as two separate fixed factors, with slopes for all combinations of gaze and face direction for either the subject or the adjective.

  • score ~ face * gaze + ((face * gaze) | subject) + (1 | adjective)

  • score ~ face * gaze + (1 | subject) + ((face * gaze) | adjective)
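To see what the “*” expands to, the following sketch lists the fixed-effect columns that `face * gaze` implies under treatment (dummy) coding: an intercept, one main effect per factor, and their product. The choice of frontal and direct as baseline levels is our assumption for readability. R by default takes the alphabetically first level as the baseline (here “averted” for gaze) unless the factor is releveled. The column names imitate R's naming style, but this is an illustration, not R output.

```python
from itertools import product

FACE_LEVELS = ("frontal", "profile")  # first level = baseline (assumed)
GAZE_LEVELS = ("direct", "averted")   # first level = baseline (assumed)


def design_row(face: str, gaze: str) -> dict:
    """Fixed-effect columns implied by `face * gaze` under treatment coding:
    intercept, face main effect, gaze main effect and their interaction."""
    f = int(face != FACE_LEVELS[0])
    g = int(gaze != GAZE_LEVELS[0])
    return {"(Intercept)": 1, "faceprofile": f, "gazeaverted": g,
            "faceprofile:gazeaverted": f * g}


# One row per cell of the 2 x 2 design (FD, FA, PD, PA).
for face, gaze in product(FACE_LEVELS, GAZE_LEVELS):
    print(face, gaze, list(design_row(face, gaze).values()))
```

Each cell mean is then a sum of coefficients. For example, the profile-averted cell is intercept + face + gaze + interaction, so the interaction coefficient captures how much that cell departs from the sum of the two main effects.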

We suggest including a different starting point (intercept) for each image as well. Such a model might not include different effects (slopes) for the combinations of face and gaze.

  • score ~ face * gaze + (1 | subject) + (1 | item) + (1 | image)

The analysis will have to take into account that the dependent variable (score) is not continuous, and not necessarily on an interval scale either. This demands that we use a different family of distributions to evaluate the model. Recent developments in statistical testing favor model evaluation closer to machine-learning techniques, focusing on how well the model explains the variance in the data. In such an analysis, if the model is good, the residuals should be randomly distributed and closely follow a normal distribution. This is only hinted at in this presentation, as the field is developing very fast and we do not want to mislead our readers into thinking that there is an easy recipe for building the best possible model. We would simply like to take the opportunity to point out some possibilities, which may be implemented in several ways, and to hint at the work needed to check whether the model fits the sampled data reasonably well. We also recommend sampling as much data as possible, with both a high number of participants and a high number of items, and using control variables. One useful control variable would be the presentation order of the images. Participants are expected to become increasingly comfortable with their decisions during the experiment, but we currently cannot reliably measure how long the decisions took in an internet survey, as the computer equipment varies and the presentation rate may be affected by internet quality and bandwidth.
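Randomizing the order per participant and recording the position of each trial makes presentation order available as a control variable. The following is a minimal Python sketch in which the image identifiers and the seeding scheme are hypothetical. Every image-adjective combination appears exactly once per participant, as described for the survey.

```python
import random
from itertools import product
from typing import Dict, List, Sequence


def trial_list(participant: str, images: Sequence[str], adjectives: Sequence[str],
               seed: int = 0) -> List[Dict[str, object]]:
    """All image-adjective combinations exactly once, shuffled separately for
    each participant, with the 1-based presentation position recorded."""
    trials = list(product(images, adjectives))
    # A per-participant string seed makes each order reproducible for later analysis.
    rng = random.Random(f"{seed}:{participant}")
    rng.shuffle(trials)
    return [
        {"participant": participant, "position": pos, "image": img, "adjective": adj}
        for pos, (img, adj) in enumerate(trials, start=1)
    ]


# Hypothetical image identifiers and a subset of the adjectives.
for row in trial_list("p001", ["img01", "img02"], ["caring", "evasive"], seed=2024)[:2]:
    print(row)
```

The `position` column can then enter the mixed model as a covariate, for example to test whether ratings drift as participants grow more comfortable with the task.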

The main benefit of surveys is that the number of participants can be very large. This makes it possible to test much more material, as each participant might judge different items. In order to control the dropout rate and fatigue effects, we need to limit the size of the experiment for each subject. The drawbacks of surveys are that it is difficult to know who the subjects are, in what environment they are solving the task, what equipment they are using and how fast their internet connection is, as well as other sources of uncertainty. The data quality may also limit what is possible in the analysis. Judgements are categorical data, and that limits our statistical tests. However, many of the drawbacks of surveys can be argued to be compensated for by access to much more data: more subjects, more items, and thus more data points. It could even be argued that the potential complexity of the (uncontrolled) environments could make the experiments more ecologically valid, as the participants are solving the task “in the wild”, although the situation is still limited by the use of computer equipment, whether that computer is in a coffee bar, a student room, or the back seat of a car.