The impact of attractiveness on job opportunities in Italy: a gender field experiment

This paper assesses the impact of being attractive and of not being native on the gender gap in the opportunity of obtaining a job in Italy. To do so, we conduct a field experiment that consists of sending 9680 fictitious curricula vitae to real firms looking for employees. We estimate a Heckit model in order to account for the different responses from firms and then to calculate the probability of receiving a callback. We show that the gender gap in the opportunity of receiving a callback is a very important issue and that this gap is affected by the interaction with both attractiveness and not being an Italian native, especially for the most qualified jobs.


Introduction
In this paper we aim to shed light on the profile of discrimination in the Italian labour market. In particular, we deal with gender discrimination, focusing on how it is shaped by not being Italian or by being a physically attractive candidate. For this purpose, we carry out an empirical analysis using a database created ad hoc by sending fictitious curricula vitae (hereafter CVs) to real job openings. Specifically, we sent the same resume with the same skills several times to all the job postings displayed online between September 2011 and August 2012, changing the attached photo or attaching no photo at all; we also sent CVs of Italian candidates with a photo in order to calculate the impact of attractiveness for Italian candidates. In addition, our sample includes CVs of Italian and immigrant candidates without photos. This feature allows us to determine the impact of being an immigrant compared to being Italian, regardless of the impact of attractiveness. It is worth noting that the scale of our experiment is larger than that of similar analyses.
The focus of our research is twofold: on the one hand, we consider the differences in the skills of the individuals who are employed and, on the other hand, we analyse the reasons why job recruiters perceive the various candidates differently. Consequently, we will try to answer the following questions: do employers discriminate against women and men more if they are not attractive or if they are foreigners? How does gender-based discrimination interact with that based on attractiveness or nationality in different kinds of jobs?
In comparison to previous papers on discrimination in the labour market based on field experiments (for a complete review, see Rich 2014; Baert 2017; Neumark 2018), we build a large database which is unique in several respects. Indeed, we analyse the marginal impact of discrimination on the basis of the joint effect of gender and attractiveness and also that of gender and not being a native Italian. Moreover, while other articles basically analyse only the main characteristics of the job (e.g. hard work or front office), we also collected data on the types of work (managers, technicians, sales, etc.). In this way, we can study the impact of discriminatory variables such as gender, physical appearance and nationality on each type of work. This kind of investigation is uncommon in Italy (Patacchini et al. 2015). Indeed, several scholars study gender inequality and gender discrimination in Italy by analysing the gender pay gap with non-experimental tools (see, for example, Naticchioni and Ricci 2012; Mussida and Lucarelli 2014), while the literature on this topic generally examines attractiveness and racial discrimination in terms of wage differentials (Campos-Soria and Ropero-Garcia 2016). In our paper, we identify the gender gap with the difference in callback rates, since we sent fictitious CVs to companies looking for workers. This allows us to study the gender gap in opportunities and not in wages.
One criticism is that several analyses of the influence of attractiveness and nativeness on the creation of a gender gap during the hiring process rely on small samples of students who answered hypothetical questions about hiring decisions. Instead, our analysis is based on a much larger sample of real job openings posted by actual employers. The underlying idea is thereby to evaluate whether attractiveness and nativeness interact with the gender gap according to the different kinds of jobs.
A second criticism revealed by most of the experimental studies concerns the impossibility for researchers to control for employee qualifications and skills. Conversely, the design of our experiment gives us complete control and observability over candidate backgrounds: since all the applications should fulfill employer requirements, for each job offer we sent CVs of applicants who are identical in terms of education, work experience, language and computer skills, but who are associated with different names, nationalities, sexes and pictures (or lack thereof).
Our experiment has never been conducted before in Italy. We chose this country mainly because it is considered one of the main fashion countries in the world, where physical appearance has always been considered very important. Moreover, Schwab (2019) in the Global Competitiveness Report 2019 mentions that Italy is ranked 62nd out of 141 countries in terms of "Female participation in the labour force". This represents an improvement with respect to 2015 (93rd out of 144), but the issue of gender discrimination in Italy still exists. Therefore our investigation may be helpful in explaining the low participation of women in the Italian labour market. Some contributions confirm that, historically, Italy is a country in which beauty and attractiveness have always played a relevant role, as underlined by Gundle (1997, 1999): "Feminine beauty has been more discussed, appreciated, represented in art and associated with national, cultural identity in Italy than in any other country". Finally, in Italy immigration is a relatively recent phenomenon which turned the country from an emigration nation into a new immigration nation (Bauer et al. 2000). According to ISTAT (2017), in 2016 immigrants numbered more than five million, reaching 8.3% of the Italian resident population. This makes Italy a perfect country to also study the impact of not being an Italian native (from now on we will use the word "native") on the gender gap. In fact, our results show a huge gender gap in the Italian labour market that increases when gender interacts with not being native or with a lack of attractiveness, especially when the job requires a high qualification.
The paper is organized as follows: Sect. 2 reviews the main literature on the topic, while in Sect. 3 we describe the data used in the empirical analysis. Section 4 focuses on the statistical methodology we applied and Sect. 5 presents the main empirical results. Finally, Sect. 6 concludes.

Conceptual framework
Our paper is inspired by several contributions about occupational segregation and the gender gap. According to Bergmann (1981), "Occupational segregation is the distribution of workers across and within occupations, based upon demographic characteristics, most often gender". Dolado et al. (2004) and Meulders et al. (2010) claim that occupational segregation in the USA seems lower than in Europe. Within the European context, Dolado et al. (2004) and Campos-Soria and Ropero-Garcia (2016) find that northern countries have almost always been characterized by higher levels of occupational segregation than southern ones. During the past decade northern countries have undergone a desegregating process, whereas southern countries, especially Italy and Spain, have experienced an increase in segregation (Bettio and Verashchagina 2009). According to Simon (2012) and Campos-Soria and Ropero-Garcia (2016), segregation seems to explain a larger share of the gender wage gap, which appears to be substantial in the Italian labour market.
In our paper we follow the approach of gathering experimental data on gender differences in promotion opportunities proposed by Baert et al. (2016): "the research question determine the data to be obtained instead of the data determining the questions that can be asked". Baert (2018) underlined that studies based on self-reported information from employers suffer from at least two methodological limits. First of all, employers' stated attitudes and behaviour may not reflect their actual hiring beliefs (see also Pager and Quillian 2005). Second, they could adapt their answers to the perception of what is considered socially desirable (Azmat and Petrongolo 2014). Baert et al. (2016) overcome the methodological problems of laboratory and field experiments by using fictitious job applications to investigate the gender gap in the opportunity to receive a callback, rather than the wage gap. Bygren et al. (2017) and Brandén et al. (2018) use the same methodology: the former contribution tests gender discrimination in the Swedish labour market and investigates whether a fatherhood premium or a motherhood penalty exists, while the latter studies the trailing spouse phenomenon. The use of fictitious CVs does not allow researchers to investigate actual wage offers, but only hiring opportunities through callback rates. Consequently, we set the following hypothesis.
Hypothesis 1 Gender gap in opportunities depends on the characteristics of workers, but also on characteristics of the industries.
The second part of the hypothesis is common to studies that do not use field experiments and investigate wage differences. For example, Meyersson et al. (2001), Bayard et al. (2003) and Korkeämaki and Kyyrä (2006) suggest that a gender gap could arise even if there is no strong wage difference within each job, because female dominated jobs are in lower paying industries than male dominated ones (see also Campos-Soria and Ropero-Garcia 2016).
Analysing the gender gap without regard to the impact of the most male and female dominated industries would produce an underestimation of gender inequalities in the labour market. For instance, economic theory on family migration for job reasons suggests that couples move for the sake of the man rather than the woman because investing in the man's career produces benefits for the couple, or the family, as a whole (Mincer 1978). This leads to a greater level of segregation for jobs that imply a higher level of migration, and it depends on the characteristics of the job (Brandén et al. 2018). For the same reason fathers are usually advantaged over mothers in the labour market (see Charles 2011; Bygren et al. 2017). Moreover, Baert et al. (2016) claim that women receive fewer callbacks for job interviews if they apply for higher positions, due to the general distaste of employers, coworkers and customers for collaborating with them. Åslund and Skans (2012) show that removing names from the applications reduces the difference between men and women in terms of callback rates and in terms of discrimination. This is the reason why our analysis of the gender gap in the Italian labour market includes the different categories of jobs for which candidates can apply.
Following the pioneering paper of Dion et al. (1972), physically attractive people are perceived to be more sensitive, kind, modest and outgoing. According to Feingold (1992), a robust association for both men and women exists between physical attractiveness and numerous personality traits, such as social skills, mental health and intelligence. His main idea is that companies prefer attractive rather than unattractive people because attractive people are considered to be more competent. Hamermesh (2013) discussed the advantages of beautiful people in labour, loans and marriage markets, in sales and in happiness.
Several laboratory experiments have investigated the role of beauty in the labour market (see, for example, Cann et al. 1981; Hamermesh and Biddle 1994): men and attractive candidates are significantly preferred over women and their non-attractive counterparts, even after evaluating specific skills. Moreover, the penalty for not being attractive is greater for women than for men, and it is robust across occupations. In addition, the advantages of attractiveness can be gender specific depending on job type (Heilman and Saruwatari 1979; Parrett 2015) and on years of career (Biddle and Hamermesh 1998). Such a preference persists even if physically attractive workers do not appear to be more skillful than less attractive ones (Mobius and Rosenblat 2006).
Field experiments confirm preferences for physical attractiveness. Busetta, Campolo and Panarello (2020) manipulate CV photos digitally in order to have both normal weight and obese applicants, finding evidence of discrimination against the latter group in the Italian labour market. A similar approach was taken previously by Rooth (2009). The analysis starts as an experiment on obesity but it ends up as an experiment on attractiveness and obesity. The results show that both men and women receive significantly lower callback rates in the event of obesity, but also that the results tend to be driven by obesity for women and by attractiveness for men. Ruffle and Shtudiner (2015) respond to advertised job openings in Israel, finding that CVs of women with no picture have a significantly higher callback rate than those of attractive or plain-looking women. Lopez Bóo et al. (2013) find that attractive applicants are called more often than unattractive ones but, unlike Ruffle and Shtudiner (2015), they find stronger effects among male candidates. An Italian field experiment using fictitious CVs is conducted by Patacchini et al. (2015). They study the interaction between homosexuality and physical appearance and find a strong penalty for homosexuals and a beauty premium for females only, but this premium is much lower when the "pretty" woman is skilled. Baert (2018) investigated the impact on hiring choices of the public information about job candidates conveyed by their Facebook profile picture. He found a strong effect of attractiveness, which becomes even stronger when applicants are highly educated and recruiters are female. From the above literature we can formulate the following hypotheses.
Hypothesis 2 Attractiveness matters: there is a beauty premium and a penalty for non-attractive employees.
Hypothesis 3 Gender gap and attractiveness interact. The interaction differs across job types.

Hypothesis 4 The beauty premium and the differences between genders are not related to individual skills.
As we already mentioned, the discrimination against non-native applicants is crucial. One of the most important factors defining the success of immigrants' integration is their participation in the labour market. In this regard, the situation in Italy does not seem to be favourable. Indeed, recent immigrants to Italy seem to struggle in the labour market, dealing with significantly lower wages (Venturini and Villosio 2008) and higher unemployment rates (ISTAT 2013). Moreover, they are usually forced to find employment in lower skilled occupations, not suited to their education (Riva and Zanfrini 2013). Their income is about 20% lower than Italians' income, skills and jobs being equal, and this gap doubles to about 40% in terms of family disposable income (Saraceno et al. 2013) when we consider female candidates, which points to even stronger discrimination.
Many papers on racial discrimination are similar to our analysis in terms of experimental design. Bertrand and Mullainathan (2004), for instance, use different names distinctly associated with Whites and African Americans and find large racial differences in callback rates. Applicants with White sounding names need to send about 10 CVs to get one callback, whereas those with African American names must send approximately 15 resumes to receive one callback. This 50% gap in callback rates is statistically significant, and a White name yields as many more callbacks as an additional eight years of experience. Since applicants' names are randomly assigned, this gap can only be attributed to name manipulation. Carlsson and Rooth (2007) adopt a similar methodology in studying the situation of Arabs in Sweden. They find large differences across occupations, with the gap in callback rates varying from 10% for computer professionals to over 100% in the case of shop sales and cleaning. Drydakis and Vlassis (2010) analyse the labour market opportunities of Albanians in Greece and conclude that Albanians face a 43% smaller chance of access to occupations, and also a significantly lower level of insurance coverage. Wood et al. (2009) conduct a correspondence test in Britain, finding that there are considerable gaps in callbacks between whites and several different ethnic groups. Oreopoulos (2011) analyses responses to online job postings in Toronto to investigate why immigrants struggle in the labour market. He finds substantial discrimination against applicants with Indian, Pakistani, Chinese, and Greek names compared with English names. Busetta et al. (2018) build a database of fictitious CVs of first- and second-generation immigrants and find that ethnic and gender discrimination in the Italian labour market is significant.
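The 50% figure quoted from Bertrand and Mullainathan (2004) follows directly from the two "CVs per callback" numbers given above. A short worked check, as a sketch in which the variable names are ours and the inputs are the rounded values reported in the text:

```python
# Rounded figures reported in the text (not the original study's raw data).
cvs_per_callback_white = 10  # CVs sent per callback, White sounding names
cvs_per_callback_black = 15  # CVs sent per callback, African American names

# A callback rate is the reciprocal of the number of CVs needed per callback.
rate_white = 1 / cvs_per_callback_white
rate_black = 1 / cvs_per_callback_black

# Relative gap: how much higher the first rate is compared with the second.
relative_gap = rate_white / rate_black - 1

print(f"callback rates: {rate_white:.1%} vs {rate_black:.1%}")
print(f"relative gap: {relative_gap:.0%}")
```

The gap is relative, not absolute: the difference between the two rates is only a few percentage points, while their ratio is 1.5.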
Hypothesis 5 Racial differences matter in terms of labour discrimination. Discrimination is stronger in highly specialized jobs than in less specialized ones.
It is worth noting that previous papers find changes in the gender gap connected to its interaction either with attractiveness or with nativeness, but none of them investigates the impact of both attractiveness and nativeness on the gender gap at the same time, as we do.

Experimental design
In the literature, conventional labour force and household surveys collect data which cannot easily be used to measure discrimination or to analyse its mechanisms. This is because they do not contain all the characteristics that employers observe when hiring, promoting or setting wages. The difficulty in using conventional data has led many scholars to use one of the two main field experiment techniques, namely audit testing (situation testing in the UK) and correspondence testing. In audit testing, one member of the minority group and one of the majority group apply in person for the same jobs in order to check for discrimination. This is different from correspondence testing (see Baert 2017 for an overview of these experiments), which does not involve individual testers. In this second case, pairs of written applications are sent to job openings, with the applications made similar in all relevant aspects except the one to be tested (Bursell 2007).
The first strand of the literature, such as Goldin and Rouse (2000), who examine the effect of blind auditioning on the hiring process for selecting musicians for symphony orchestras, measures the amount of gender discrimination. Many other contributions carry out an audit technique (see, among others, Bertrand and Mullainathan 2004; Carlsson and Rooth 2007; Drydakis and Vlassis 2010; Wood et al. 2009; Rooth 2009; Ruffle and Shtudiner 2015; Lopez Bóo et al. 2013). The weakness of this method lies in three main aspects. First, even if attempts are made to match auditors on several characteristics and to train them for several days, it is not always possible to eliminate all the differences between auditors (see Heckman and Siegelman 1993; Heckman 1998). Second, such methods are not double-blind: as auditors know the purpose of the study, they could behave in such a way as to influence the data either in favour of or against the existence of discrimination (Turner et al. 1991). Third, audit methods are extremely expensive, so it is difficult to generate samples large enough to avoid significant differences in results between pairs. Given these weaknesses, we assess the impact of gender, attractiveness and not being native on job opportunities, following Baert et al. (2016). Hence, we carry out a field experiment based on the correspondence technique by sending 9680 European format CVs to 1210 firms looking for employees.
The Italian labour market represents an adequate context for our research, especially regarding the impact of physical attributes. Since the Italian habit is to send CVs without photos, the lack of a photo should not penalize applicants because it does not necessarily indicate that they are bad-looking. However, many websites have lately suggested including a photo, a request motivated by the use of a professional dress code. Thus, even in the presence of a picture there should not be any kind of penalty.
We sent all the resumes in the period between September 2011 and August 2012. At that time, the use of social networks was not yet widespread, so the probability that false profiles would be discovered by potential employers was very low. We regularly scrutinized job postings on all the main online job service websites offering positions in Italy. We chose only the websites that require no registration in order to prevent firms from detecting that the CVs in question were fictitious.

CVs sent to employers
As our goal is to obtain as many responses as possible from employers, we included in the CVs all the characteristics required by the advertised job postings, so that the applicants would not be perceived as overqualified. Using this procedure, we sent eight CVs to each advertised job posting, 5 identical in every respect except name, surname, nationality and photo (or lack thereof). In this way, we intended to focus the results on the effect of attractiveness, nativeness and gender, regardless of other differences in the candidates' profiles. Thus, all applicants for the same vacancy have the same characteristics in terms of age (28 years old), education, 6 and amount of work experience. The only other difference in the CVs was font and font size, as in Rooth (2009).
Consequently, each employer received 8 CVs from 4 females and 4 males, 7 each 28 years old, living in Rome, and with exactly the characteristics required for the job. We randomly matched first names, surnames and photos. Four CVs contain photos of applicants representing attractive and unattractive Italian women, and attractive and unattractive Italian men, respectively. The remaining four CVs have no photos attached and concern Italian and North African women and men. These CVs do not include photos because we are not interested in evaluating the impact of attractiveness on immigrants' opportunities to be called for a job interview; in other words, we aim to use them as control variables.
5 As a precautionary measure, in order not to let employers realize that they were receiving identical CVs, we staggered the dispatch of the CVs to the same firms over a few days. As each firm receives thousands of CVs, we are convinced that receiving eight CVs over a few days should neither make it suspicious nor cost it much time in relative terms. For the same reason, we used different names and addresses. All the addresses belong to the city of Rome in order not to make the scrutinizers perceive the candidates as different because of where they lived. Finally, we randomly chose the order of CVs sent to the same firm.
6 To prevent the scrutinizers from being influenced by the prestige of the school or the university in which the applicant had studied, we used institutions considered comparable. For the CVs with a university degree, we used "La Sapienza" University, the largest university in Italy, located in Rome.
7 In this context, the best experimental design would be to send applications with identical information to the same employer, except for the photo. As pointed out by Oreopoulos (2011), such a strategy would be impossible to implement without employers becoming suspicious. Therefore, we decided to associate a different name and address with each different photo (or no photo included).
The design of our experiment gives us complete control over candidates' backgrounds, in line with Ruffle and Shtudiner (2015) and Lopez Bóo et al. (2013). Indeed, our applicants are identical in every respect for each kind of job offer, since the differences only concern gender, nativeness, photo and attractiveness. This methodology ensures that perceived productivity characteristics on the supply side are held constant.
The degree of differential treatment can be measured by the number of callbacks for a job interview. Differences in response rates between candidates can only be due to the manipulated attributes (gender, nationality, and the picture or lack thereof). Responses are considered callbacks if the employers invite an applicant to a job interview. In order to minimise the inconvenience to employers, we promptly declined any invitation via email, giving as the reason the prior acceptance of another position.
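The eight-CV design of this subsection (four women and four men; per gender, one attractive photo, one unattractive photo, one Italian without photo and one North African without photo; random sending order, staggered over a few days) can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary fields and the `max_days=5` window are our assumptions, not details reported in the paper.

```python
import random
from itertools import product

GENDERS = ("female", "male")
# Four profile types per gender, as described in the text.
PROFILE_TYPES = (
    ("Italian", "attractive photo"),
    ("Italian", "unattractive photo"),
    ("Italian", "no photo"),
    ("North African", "no photo"),
)


def build_applicants():
    """Return the eight fictitious applicants (gender x profile type)."""
    return [
        {"gender": gender, "nationality": nationality, "photo": photo}
        for gender, (nationality, photo) in product(GENDERS, PROFILE_TYPES)
    ]


def dispatch_plan(applicants, rng, max_days=5):
    """Randomise the sending order and spread the CVs over a few days.

    max_days is a hypothetical window; the paper only says "a few days".
    Returns a list of (day_offset, applicant) pairs in sending order.
    """
    order = list(applicants)
    rng.shuffle(order)
    days = sorted(rng.randrange(max_days) for _ in order)
    return list(zip(days, order))


applicants = build_applicants()
plan = dispatch_plan(applicants, random.Random(2011))
for day, cv in plan:
    print(day, cv["gender"], cv["nationality"], cv["photo"])
```

Passing an explicit `random.Random` instance keeps the plan reproducible, which helps when the same posting has to be re-examined later.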

Ranking photos by attractiveness
In choosing the pictures to be included in the resumes, we selected 20 male and 20 female photographs from the internet, modified in order to make them unrecognizable. All faces used in the photos are smiling Caucasian individuals, thus eliminating racial preferences. One hundred students (50 women and 50 men) from the University of Messina were invited to evaluate the CV photos using a score between 1 and 10. We then calculated the total score obtained by each photo, and identified unattractive/attractive males and females as those obtaining the lowest/highest scores (see Table 1). In order to test the robustness of the ranking, we calculated how many students assigned the maximum to the most attractive male and female photos. The inverse procedure was followed for the least attractive photos. Overall, the most attractive woman received a score of 858 and the least attractive 247, while the most attractive man received 926 points and the least attractive 223. Since a large majority of students (over 85%) agreed on who is the most and least attractive among women and men, we assume that our rankings are not influenced by any subjectivity.
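The ranking step described above (sum the 1 to 10 scores per photo, take the highest and lowest totals, then check how many raters agree) can be sketched as below. The ratings are toy values invented for illustration, and the agreement measure is one possible reading of "assigned the maximum": the share of raters whose own highest score went to the photo in question.

```python
def rank_photos(scores):
    """scores maps a photo id to a list of ratings, one per rater (same order).

    Returns the total score per photo plus the ids with the highest and
    lowest totals.
    """
    totals = {photo: sum(ratings) for photo, ratings in scores.items()}
    most = max(totals, key=totals.get)
    least = min(totals, key=totals.get)
    return totals, most, least


def share_rating_highest(scores, photo):
    """Share of raters whose own top score (ties included) went to `photo`."""
    n_raters = len(scores[photo])
    hits = sum(
        1
        for j in range(n_raters)
        if scores[photo][j] == max(ratings[j] for ratings in scores.values())
    )
    return hits / n_raters


# Toy data: three hypothetical photos rated by four hypothetical raters.
toy_scores = {
    "photo_a": [9, 8, 10, 9],
    "photo_b": [5, 6, 4, 7],
    "photo_c": [2, 3, 1, 2],
}
totals, most, least = rank_photos(toy_scores)
agreement = share_rating_highest(toy_scores, most)
print(totals, most, least, agreement)
```

With 100 raters and a 1 to 10 scale, totals range from 100 to 1000, which is the scale on which the scores reported in the text (for example 858 and 247) are expressed.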

Assigning attributes to candidates
Unlike the procedure applied by Rooth (2009), once we made the association between names, surnames, address and photos, we kept it for the entire experiment.
In practice, we maintain consistency across applications, since we cannot exclude that the same vacancy may be present in different job offers on the web. Accordingly, whatever distortion is produced, it will affect each job offer in the same way. In order to minimize the effects of differences in names influencing our results, we chose the most common first names and surnames in Rome, home to the eight fictitious applicants (see Table 2).
As regards names, surnames and addresses of the fictitious foreign candidates, we apply the same procedure. Since we aim to assess the impact of nationality on the probability of obtaining a job interview, we assign to foreign candidates the most common non-white nationality of immigrants in Italy, which is Moroccan. We made this choice also because Moroccans are recognizable as not being Italian, which makes them potentially subject to higher levels of discrimination based on physical appearance or on religious grounds. The foreign applicants have the most common surnames in Morocco, namely Elalawe and Benkeran; the chosen names are Mohammed for males and Fatima for females. We created eight email accounts. Basically, we customize the CVs sent in response to each job posting. In practice, we merge each association among name, surname, address and photo with the personal characteristics in terms of education, work experience, language and computer abilities which completely fulfill the skills required by the job. This design strategy should prevent any matching problems by making each candidate comparable with the other applicants.
Table 3 summarizes the characteristics of job openings in our dataset. The variable names in parentheses are those that will be used in our analysis. The target population is equally divided into CVs which include photos of an attractive (variable A) and an unattractive (U) person, CVs including no photo of an Italian and a foreigner (F), and CVs for men (M) and for women (W).

The data
Sending CVs with no photos allows us to consider as a benchmark the Italian individuals with no information on their attractiveness and to control for discrimination based on not being Italian. Moreover, our design strategy of sending fictitious CVs that exactly meet the firms' requirements should exclude any difference in the rate of response, due to matching problems among candidates.
Being aware that beauty might be relevant and contribute to worker productivity in some of the fields, we divide job positions into front and back office tasks. Thus, we classify all job openings according to whether the position involves face-to-face contact with the public. In particular, we define as front office jobs (FO) those which either explicitly state that the job requires face-to-face contact with people, or where such contact could be unequivocally inferred from the job advertisement. Otherwise, the job is classified as back office. For instance, we include jobs belonging to fields like sales and customer service in the first category. By contrast, in the back office category we put jobs like accounts management, budgeting, industrial engineering, and computer programming. As for front and back office jobs, we define as hard work (HW) jobs for which physical strength is explicitly required, or those for which it may be unequivocally inferred. Otherwise, they are classified as jobs which do not imply hard work.
In our analysis we also consider that the job offers may require a high school diploma (High), a university degree (Grad) or may not require any qualification. In terms of functions offered, we follow the International Standard Classification of Occupations (ISCO) and the target population of job offers is divided into the following types of work: managers, professionals, technical jobs, clerical jobs, commercials, skilled and elementary occupations.
Table 3 also reports the distribution of callback rates. Focusing on attractiveness, in our sample the attractive Italian people have a callback rate of almost 50%, while unattractive applicants and Italians with no photo reach 13.512% and 37.975% respectively. Discrimination against non-Italians also appears to be significant, if we consider the markedly lower callback rate of 10.620% associated with foreign candidates. Furthermore, men get 28.926% of callbacks, while women get 27.087%. The callback rate for back office jobs is about 29% and that for front office jobs is 26.644%; 26.385% is the callback rate for jobs involving hard work, and 38.420% for those not entailing hard work. In terms of qualifications required, we get the highest callback rate of 33.596% for jobs that do not require any qualification, while jobs for graduate candidates and jobs for high school diploma candidates reach 27.705% and about 23% respectively (see Table 3). In terms of the ISCO classification, we have approximately 47% for services, about 37% for elementary occupations, 26.492% for wire workers, about 34% for craftsmen and skilled workers, and definitely lower callback rates for managers, and scientific and intellectual professions (around 16% and 26%).
Table 4 shows the correlations between job characteristics and classification. The matrix highlights that graduate jobs are negatively correlated with sales, front office and hard work jobs, while they are strongly positively correlated with executive and specialized jobs. Obviously, vacancies requiring high school qualifications are strongly negatively correlated with those requiring university degrees. Front office jobs are highly positively correlated with sales staff, and finally hard work is strongly positively correlated with service work, workmen and unskilled work, and negatively correlated with technical jobs.
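The group callback rates reported above can be cross-checked against each other. The sketch below assumes, as the design implies, that each posting received equally many male and female CVs and equally many CVs of each of the four profile types; under that assumption the overall rate is a simple average, and the rate for attractive applicants (reported only as "almost 50%") can be backed out from the other three.

```python
# Callback rates (%) as reported in the text.
men, women = 28.926, 27.087
unattractive, italian_no_photo, foreign = 13.512, 37.975, 10.620

# Equal group sizes (4 male and 4 female CVs per posting) make the overall
# rate the plain average of the two gender-specific rates.
overall = (men + women) / 2

# The four profile types are also equally sized (2 CVs each per posting), so
# the overall rate is the average of their four rates. Solve for the one
# that the text gives only approximately.
implied_attractive = 4 * overall - (unattractive + italian_no_photo + foreign)

print(f"overall callback rate: {overall:.3f}%")
print(f"implied rate for attractive applicants: {implied_attractive:.3f}%")
```

The implied value lands just below 50%, consistent with the wording of the text, and the overall rate is close to 28%.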

The model
As we mentioned in Sect. 3, we sent 9680 CVs to 1210 advertised job openings. Since we received 2711 callbacks, corresponding to a rate of approximately 28%, our analysis aims to identify the principal attributes that affect the probability of obtaining a job interview. The dependent variable is the dichotomous variable RESP, which represents the employers' responses; it is equal to 1 if the employer emailed the applicant to invite him/her for an interview, and 0 if the email was not sent. Since 312 firms did not reply to any CV, we must also consider a distinction between responding and non-responding employers in our analysis. From the statistical point of view, the possibility that firms do not reply implies a problem of selection bias which can arise from censoring. In order to address this problem, we use the so-called Heckit model (Heckman 1979); this method was introduced to correct the selection bias occurring in nonrandomly selected samples and provides consistent estimates which eliminate the specification error in the case of censored data.
Analytically, our Heckit model is

\[
\begin{cases}
s_i^* = z_i' \gamma + u_i \\
RESP_i = x_i' \beta + \delta \lambda_i + \varepsilon_i
\end{cases}
\qquad i = 1, 2, \ldots, N, \tag{1}
\]

where N is the sample size. The first equation is the "selection equation", where $s_i^*$ is an unobservable latent variable that is positive if the employer replies to the applicant, $z_i'$ is the $1 \times m$ vector containing all the characteristics that determine whether the reply is made or not, while $\gamma$ is a vector of unknown parameters. The second equation is the "principal equation" that consists of the linear model of interest, where $RESP_i$ is the dependent variable, $x_i$ is a k-dimensional vector of exogenous variables, and $\beta$ is a vector of unknown parameters. The potential sample selection bias in this equation is corrected by inserting the inverse Mills ratio $\lambda_i = \phi(z_i'\gamma)/\Phi(z_i'\gamma)$ as an additional regressor with coefficient $\delta$, where $\phi(\cdot)$ and $\Phi(\cdot)$ are respectively the density and the cumulative distribution function of the standardised Gaussian. The explanatory variables in $x_i$ could also be included in $z_i$ and vice versa. 12 Moreover, we assume that the random disturbances $u_i$ and $\varepsilon_i$ are i.i.d. and jointly distributed as a multinormal random variable with zero mean and a full covariance matrix. When the covariance between disturbances is non-zero, OLS estimation yields biased and inconsistent estimates of $\beta$ (see Heckman 1979). In our proposed model, selection bias arises because the callback $RESP_i$ is observed only when the ith CV is sent to an employer who replies at least once. In this case we observe the variable $REPLY_i = 1$ as the approximation for the latent variable $s_i^*$; the model is estimated by maximum likelihood (ML) with robust standard errors, which jointly returns consistent and asymptotically normal estimates of both equations in system (1). Table 5 contains some descriptive statistics on the number of non-replies with respect to the ISCO classification and the Italian regions.
There is no regularity among regions, while the percentage of non-replies is significantly higher when firms are looking to fill a managerial position, probably because it is a more competitive sector. Therefore, our analysis consists of estimating three different Heckit models based on three specifications for the vector $x_i$. For all models the benchmark applicant is Italian and applies in Lazio (whose capital is Rome), sending a CV with no photo attached. Moreover, the selection equation for the probability of receiving an email with the invitation for an interview is always estimated via a probit model where the row vector of the explanatory variables is partitioned as

\[ z_i' = [\, d_i' \;\; c_i' \;\; a_i' \;\; r_i' \,] \tag{2} \]

where

1. $d_i'$ contains the job sectors based on the ISCO classification. The reference sector is represented by the variable $Technicians_i$, which is excluded to avoid exact collinearity,
2. $c_i' = [\, FO_i \;\; HW_i \;\; Grad_i \;\; High_i \,]$ is composed of the job characteristics (variables $FO_i$ for front office and $HW_i$ for hard work) and the education level (variables $Grad_i$ for graduation and $High_i$ for high school diploma) dummies,
3. $a_i'$ and $r_i'$ are sets of controls that include the job advertisement dummies (see, for instance, Riach and Rich 2002) and the regional dummies; the variables in the first partition take the value 1 if the vacancy is posted by a given job search engine and 0 otherwise, while the variables included in the second partition take the value 1 if the job vacancy comes from a given Italian region and 0 otherwise.

Some preliminary estimations led us to exclude the job advertisement dummies in $a_i$ because they are collinear with the explanatory variables contained in vectors $d_i$ and $c_i$.

12 It is well known that vectors $x_i$ and $z_i$ cannot have many variables in common, since $\lambda_i$ may show an approximately linear dependency with respect to both of them, so the model will be under-identified for all practical purposes. Puhani (2000) shows that in this case the Heckit model suffers from some problems of collinearity. We adopt the most general solution, which consists of dropping from the equation of $RESP_i$ those variables that show good predicting power in the selection equation.
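The correction term used in the principal equation, the inverse Mills ratio, is the standard normal density divided by the standard normal cumulative distribution function, both evaluated at the selection index. A minimal Python sketch of that ratio follows; it relies only on the standard library, the function names are illustrative, and the index values in the loop are hypothetical rather than estimates from the paper.

```python
import math


def normal_pdf(z: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def inverse_mills(index: float) -> float:
    """Inverse Mills ratio phi(index) / Phi(index)."""
    return normal_pdf(index) / normal_cdf(index)


# Hypothetical values of the selection index (not taken from the paper).
for index in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"index = {index:+.1f}  inverse Mills ratio = {inverse_mills(index):.4f}")
```

The ratio decreases as the index grows: the more likely an employer is to reply, the smaller the correction added to the principal equation.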

Preliminary analysis on the gender effect
In Table 3, the callback rates for females and males are quite similar. In order to assess whether men and women have different chances of being hired or discriminated against, we estimate the probability of being called back as an exclusive function of gender. We call this basic specification "Model 0"; it represents the starting point of our empirical analysis. Its formal definition is

\[ RESP_i = \beta_0 + \beta_W W_i + \delta \lambda_i + \varepsilon_i \tag{3} \]

where the explanatory variable $W_i$ is a scalar dummy variable for women, $\lambda_i$ is the inverse Mills ratio with coefficient $\delta$, and $\varepsilon_i$ is the disturbance. The complementary dummy $M_i$ is omitted to prevent exact collinearity.
The estimation results in Table 6 represent a raw evaluation of the gender effect because they do not take into account other available regressors. Focusing on the principal equation, at first glance the negative and significant coefficient of $W_i$ can be interpreted as a disadvantage on the job market due to being a woman rather than a man. The use of the Heckit approach is justified by the statistical significance of the estimated coefficient of the inverse Mills ratio $\lambda_i$.

Models on attractiveness and nationality with gender effect
Given the task of estimating the impact of attractiveness and nativeness together with the gender effect on the probability of being called back, we propose three different Heckit specifications which test our five hypotheses. The first specification is the model that estimates the general effect of beauty on the probability of being called back. Its equation is

\[ RESP_i = x_i' \beta + \delta \lambda_i + \varepsilon_i \tag{4} \]

where $x_i' = [\, 1 \;\; A_i \;\; U_i \;\; F_i \,]$ includes the constant and the scalar variables $A_i$, $U_i$ and $F_i$, which are dummy variables for the attributes already defined in Sect. 3. The corresponding parameters included in $\beta = [\, \beta_0 \;\; \beta_A \;\; \beta_U \;\; \beta_F \,]'$ are crucial because they measure the difference between an applicant who is attractive, unattractive or foreign and the reference applicant. As stated above, $\delta$ is the coefficient associated with the inverse Mills ratio $\lambda_i$, which indicates whether some sample selection bias is operating.
The second specification aims to analyse the interactions between the applicant attributes (being attractive, unattractive or not Italian) and the job characteristics. The regressors include the partition $x_i' \otimes c_i'$, which contains all the interactions between the attributes included in vector $x_i$ and the components of the job characteristics vector $c_i$. The symbol "$\otimes$" denotes the Kronecker product.
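To illustrate how a Kronecker product generates interaction regressors, the sketch below builds the cross products for a single hypothetical applicant. The ordering of the attribute vector (constant, attractive, unattractive, foreign) and of the job characteristics (front office, hard work, graduate, high school) is an assumption made for this example, as are the function and label names; the paper's actual regressor layout is not reproduced here.

```python
def kron(a, b):
    """Kronecker product of two row vectors given as lists."""
    return [a_k * b_l for a_k in a for b_l in b]


# Assumed labels and ordering (illustrative only).
attribute_labels = ["const", "A", "U", "F"]
job_labels = ["FO", "HW", "Grad", "High"]

# Hypothetical case: an attractive Italian applicant (A = 1)
# applying for a front office job that requires a degree.
attributes = [1, 1, 0, 0]
job_characteristics = [1, 0, 1, 0]

interactions = kron(attributes, job_characteristics)
labels = [f"{a}*{c}" for a in attribute_labels for c in job_labels]

# Show only the interaction dummies that are switched on.
active = [name for name, value in zip(labels, interactions) if value == 1]
print(active)
```

With four attributes and four job characteristics the product has sixteen columns; the terms multiplied by the constant are the job characteristics themselves, which refer to the benchmark applicant.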
Finally, the third specification is based on the cross products $x_i' \otimes \tilde{d}_i'$, where $\tilde{d}_i$ collects the job sector dummies. This vector differs from $d_i$ already defined in Eq. (2) because it contains the variable $SaleServ_i = Sales_i + Services_i$, which has been built because using the two addends separately leads to exact collinearity. For the same reason, the vector $\tilde{d}_i$ is not a partition of $z_i$. The model in system (1) does not take into account potential gender discrimination; therefore we split the regressors by defining $x_{w,i} = W_i x_i$ and $x_{m,i} = M_i x_i$, where $W_i$ is a scalar dummy variable for women and $M_i$ is a scalar dummy variable for men. The resulting model is the one we call the "genderified model", whose principal equation is

\[ RESP_i = x_{w,i}' \beta_w + x_{m,i}' \beta_m + \delta \lambda_i + \varepsilon_i. \tag{7} \]

This specification allows us to carry out a battery of Wald tests on the null hypothesis of no gender discrimination given by the element-by-element equality

\[ H_0: \beta_{w,j} = \beta_{m,j} \tag{8} \]

where $j = 1, 2, \ldots, k$ and k is the dimension of vectors $\beta_w$ and $\beta_m$. It is worth noting that the "genderified model" corresponds to separate estimates for women and for men, so we can consider the complete covariance matrix to perform the test. Therefore, in our analysis we estimate the Heckit twice, without and with the genderification, for each of the three models. The genderified version includes the Wald test on the estimated parameters according to the null hypothesis in Eq. (8). This allows us to introduce the concepts of female/male "premium" and "penalty" in being attractive or native.
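A Wald test of the equality between one female coefficient and the corresponding male coefficient can be sketched as follows. This is a minimal illustration of the textbook single-restriction statistic, not the authors' code: the function name is hypothetical, and the coefficient, variance and covariance values in the example are invented for illustration. The p-value uses the fact that a chi-square variable with one degree of freedom is a squared standard normal.

```python
import math


def wald_equality(b_w, b_m, var_w, var_m, cov_wm=0.0):
    """Wald test of H0: b_w = b_m for a single pair of coefficients.

    Returns the chi-square(1) statistic and its p-value.
    """
    diff = b_w - b_m
    # Variance of the difference, using the full covariance matrix.
    var_diff = var_w + var_m - 2.0 * cov_wm
    if var_diff <= 0.0:
        raise ValueError("variance of the difference must be positive")
    stat = diff * diff / var_diff
    # For one degree of freedom: P(chi2 > stat) = P(|Z| > sqrt(stat)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical estimates (not taken from the paper's tables).
stat, p_value = wald_equality(b_w=0.5, b_m=0.2, var_w=0.01, var_m=0.01)
print(f"Wald statistic = {stat:.3f}, p-value = {p_value:.4f}")
```

A small p-value would lead to rejecting the equality of the two coefficients, which is how a female or male "premium" or "penalty" is read from the test.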

Results
The analysis we performed consists of three Heckit models for the dependent variable RESP. Following Eq. (1), for each of Models 1, 2 and 3 we estimate the Heckit twice, without and with the genderification. All estimation results are accompanied by some measures to evaluate the goodness of each model specification. In particular, since the coefficient of the inverse Mills ratio satisfies $\delta = \rho\sigma$, we estimate the correlation between the two errors ($\rho$) and the scale factor given by the standard deviation of $\varepsilon_i$ ($\sigma$). Moreover, we provide the condition number, in order to control the degree of collinearity among a large number of regressors, and the Akaike, Bayesian and Hannan-Quinn information criteria (AIC, BIC and HQC) to determine which specification best fits the data. We also carry out a likelihood ratio (LR) test on the null hypothesis of no effects generated by the regional control variables. Moreover, we perform a conditional moment (CM) test for normality via the outer product of the gradient (OPG) regression suggested by Davidson and MacKinnon (1993), since the Heckit estimator could suffer from inconsistency when the normality assumption fails (see Pagan and Vella 1989 for details).
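For reference, the three information criteria mentioned above can be computed from the maximised log-likelihood, the number of estimated parameters and the sample size. The sketch below uses the textbook definitions on the "smaller is better" scale; some software reports rescaled versions, so the exact scaling used in the paper's tables is not assumed here, and the input values are hypothetical.

```python
import math


def information_criteria(loglik: float, n_params: int, n_obs: int) -> dict:
    """AIC, BIC and HQC computed from a maximised log-likelihood."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    hqc = -2.0 * loglik + 2.0 * n_params * math.log(math.log(n_obs))
    return {"AIC": aic, "BIC": bic, "HQC": hqc}


# Hypothetical inputs (not taken from the paper's tables).
criteria = information_criteria(loglik=-100.0, n_params=5, n_obs=1000)
for name, value in criteria.items():
    print(f"{name} = {value:.3f}")
```

Among competing specifications estimated on the same sample, the one with the lowest value of a given criterion is preferred.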
All the results are reported in Tables 7, 8 and 9 for the principal equation, while Table 10 contains the selection equation estimates. A probit model with the same dependent variable and regressors as the selection equation is also provided in order to calculate the McFadden R², the percentages of correct predictions used to assess the models' goodness of fit, and a normality test on the model residuals. In our estimations some of the explanatory variables among those presented in Table 3 are dropped to avoid exact collinearity, while other variables are excluded to reach the maximum possible reduction of parameters without losing any relevant information.
From the statistical perspective, the estimation results seem robust across the proposed models. The estimated coefficient of the inverse Mills ratio, $\hat{\delta}$, lies around the value of 0.54 and all the related t-statistics highlight that a sample selection mechanism is indeed operating. This result, already obtained for Model 0 (see Table 6), suggests that the Heckit model is always superior to OLS. In general, all the regression statistics indicate that there are no substantial differences among models. Independently of the model specification and of genderification, the estimated correlation is $\hat{\rho} = 0.998$. Moreover, the condition number is always less than the critical value of 30, indicating that our estimates $\hat{\beta}_j$, for all j = 1, 2, …, k, do not suffer from any problems of collinearity 13 and the values of the log-likelihood and all the information criteria are about the same across models. The CM test rejects the null hypothesis only for the non-genderified Model 2 and Model 3, making us confident that, at least in all the other cases, the selection problem was satisfactorily dealt with. Finally, the LR tests lead to the conclusion that the regional variables are suitable. When we consider only the impact of gender on the callback rate of a candidate sending CVs with no photo, no gender gap in opportunity appears. On the contrary, when we consider the impact of attractiveness and of not being Italian, such a gap emerges. Attractiveness and not being native matter, but the difference between genders is also significant (Hypotheses 1, 2 and 5).
From the estimated parameter $\hat{\beta}_A$ in Model 1 we observe that attractive people have a higher chance of being contacted by employers. These results are in line with those obtained by Garner-Moyer (2010) for the French labour market, which show a major difference in callback rates between attractive (42%) and unattractive (16%) candidates during the first stages of the hiring process. As expected, our estimates suggest that there is a female premium in being attractive, because the probability of attractive women being called back is about double that of attractive men. This is in line with Hypothesis 2. In general, attractiveness and nativeness influence the gender gap. Since $\hat{\beta}_U < 0$ and $\hat{\beta}_F < 0$, there is a penalty for unattractive and non-native people. Indeed, the Wald tests confirm that the lack of attractiveness produces an evident penalty for women, while being non-native reduces the probability of receiving a callback for men. These results confirm our Hypotheses 2 and 5. Once these results were obtained, we proceeded to further disentangle such gender differences, considering both job characteristics and job types (testing Hypothesis 1).
The estimates of Model 2 show that sending a CV without a photo increases the probability of being called back for men. As expected, in the case of front office jobs, attractiveness raises this probability and employers show a preference for women. A remarkable female premium in being non-native can also be noticed here. In the case of hard work, attractiveness is not a relevant attribute, and the gender gap always operates in favour of men.
Surprisingly, in our estimates all the coefficients of graduation are negative. Moreover, a relevant female penalty of being unattractive is observed. When higher levels of education are considered, our estimates reveal a strong gender difference only in the case of attractive applicants. Having a high school diploma or a degree produces opposite effects on the probability of being called back. Such effect is positive for attractive women, but it is negative for attractive men.
Hence, our analysis is substantially in line with Hypothesis 4: when two CVs contain the same skills, gender differences remain and attractiveness explains such differences.
Overall, our results support the presence of racial discrimination in the Italian labour market. Except in the case of hard work, where no specialisation is required, being a foreigner generally reduces the callback probability. This is consistent with Hypothesis 5.
Finally, Model 3 focuses on the impact of beauty and nativeness on different types of job.
Regarding the technicians (benchmark), the estimated coefficient of U i highlights only a female penalty in being unattractive, while for attractive and foreign candidates substantial preferences between males and females do not emerge. In this context, positive estimated coefficients are associated only to beauty.
Attractiveness is desirable for all applicants for technical and sales/services job positions, while it reduces the probability of receiving a callback in the case of professional, skilled and elementary activities. The Wald tests on gender differences highlight a female premium in being attractive for clerical jobs, and a female penalty in the case of elementary activities.
In general, being unattractive mostly increases the callback rate for female applicants, and the same feature is disadvantageous for clerical jobs. The lack of attractiveness is likely to produce a gender gap in favour of women when they apply for managerial, professional and elementary activities. All these results confirm Hypothesis 3, according to which the gender gap mainly depends on physical attributes. Not being native plays a relevant role in our results, but relevant differences between genders do not emerge. This suggests that nativeness is a cause of discrimination, but it does not impact the gender gap, consistent with Hypothesis 5.

Concluding remarks
In our analysis we carried out a field experiment based on real online job openings in Italy to ascertain whether gender, attractiveness and nativeness have an impact at the early stage of job search. The sample that we analysed consists of 9680 CVs sent to firms looking for employees in response to advertised job postings displayed online in the period between September 2011 and August 2012. Comparing the response rates for different categories, we obtained the following results: attractive applicants are those who receive the highest number of positive answers; both unattractive and non-native candidates obtained lower callback rates. Attractiveness is quantitatively more important for women than for men. Attractive Italian women have higher callback rates than unattractive or non-native women. This discrepancy is greater for women than for men. Generally, there is a female penalty in being unattractive, but this does not apply to all the examined kinds of jobs. It is worth noting that the lack of attractiveness increases the gender gap in favour of men when the interaction between gender and attractiveness is significant. On the other hand, attractiveness reduces this disparity, and sometimes an attractive woman has more opportunity to be called back than an attractive man. Most responses to unattractive subjects involve low-skilled jobs, which is a clear sign of occupational segregation. Unsurprisingly, attractiveness appears to be essential for front office and executive jobs, while the lack of attractiveness is likely to strongly reduce chances for clerical jobs too. This applies even more to women.
Our estimates suggest that attractive candidates should attach a photograph to their CVs when they have the opportunity to do so, because an image increases the likelihood of being called for an interview. The opposite is true for unattractive candidates. In other words, it seems that a woman aiming to find a good job in Italy has to be attractive, while an unattractive woman, even if she is highly qualified, has little chance of getting a highly skilled job, at least if she applies online and attaches a photo. These results lead us to conclude that attractiveness increases the gender gap in the labour market, especially in terms of occupational segregation.
We also found that racial discrimination appears to be substantial. However, in the case of women, racial discrimination is less prominent than discrimination based on physical features. Instead, our results highlight a male penalty in being non-native because the estimated percentages of callbacks are generally very low, especially for "soft" or highly qualified jobs. Conversely, unattractive or non-native men have a higher probability of receiving a callback if they apply for a hard, low-qualification position. In general, non-native women seem to be more likely to get a job interview in Italy than non-native men; the exceptions are some job categories such as services, elementary and hard work, for which men are preferred. On the other hand, non-native men are taken into consideration only in the cases of hard or low-skilled work.
In future research it would be interesting to repeat our analysis using fictitious CVs with photos sent by candidates of different ethnicities. In this way it would be possible to establish whether nationalities interact with attractiveness in reducing or increasing gender discrimination, considering also the type of job the candidate is applying for.

Probit regression statistics
McFadden R