1 Introduction

Language skills are an important determinant of labor market performance. Previous studies have focused on the effect of language proficiency on earnings of male immigrants. Recent examples are Bleakley and Chin (2004), Miranda and Zhu (2013a, b), Di Paolo and Raymond (2012) and Yao and van Ours (2015). However, it is not only language proficiency that affects labor market performance. Speech patterns may also be important, i.e. it may matter whether a worker speaks a standard language or a dialect. Though there is no common definition of dialects among linguists, a dialect usually refers to a variety of a language used by a particular group. A dialect may be associated with social class, as is apparent from the lyrics of the song “Why can’t the English?” from “My Fair Lady”: “Look at her, a prisoner of the gutter, condemned by every syllable she utters (...). An Englishman’s way of speaking absolutely classifies him. The moment he talks he makes some other Englishman despise him.”

A few studies have investigated how speech patterns affect labor market outcomes. Gao and Smyth (2011) find a significant wage premium associated with fluency in standard Mandarin for dialect-speaking migrant workers in China. Carlson and McHenry (2006) present the results of a small experiment on how speaking dialect affects employment probability. Using an experimental set-up, Bendick Jr. et al. (2010) study the effects of a (mostly) French accent for white job applicants to New York City restaurants. These accents were considered “charming” and they increased the probability of being hired as a waiter or waitress. To study the effects of speech patterns, Grogger (2011) uses NLSY data in combination with audio information about how individuals speak. In the US labor market, black workers with a distinctly black speech pattern earn less than white workers, whereas black workers who do not sound distinctly black earn the same as white workers. Grogger (2018) finds that lower wages of black workers are related to their speech pattern. These speech-related wage differences might be related to occupational sorting, such that black workers whose speech is similar to that of white workers sort themselves into occupations that involve intensive interpersonal interactions. An alternative explanation mentioned by Grogger (2018) is employer discrimination, i.e. employers have a taste-based prejudice against certain dialects. Grogger (2018) also finds that lower wages of US Southerners are not related to their speech pattern but are largely explained by family background and residential location. According to Das (2013), language and accents provide information about an individual’s social status. The spoken language may be a source of discrimination affecting earnings and promotion. In other words, a speech pattern may be a signal of unobserved productivity. It could also be that non-standard speech patterns reduce productivity at the workplace, for example because differences in speech patterns between workers increase production costs (Lang 1986).

Language skills and speech patterns are acquired at early ages up to the teenage years. After this period it is hard to pick up a different speech pattern. Rickford et al. (2015) present an analysis of audio recordings of participants in the Moving to Opportunity (MTO) project. In the MTO project, participants were randomly assigned to receive housing vouchers providing an opportunity to move to lower-poverty areas. Exploiting this randomization, Rickford et al. (2015) find that moving to lower-poverty neighborhoods at a young age changes speech patterns while moving at a higher age does not. Also using MTO data, Chetty et al. (2016) find that moving when young (before age 13) to lower-poverty neighborhoods is beneficial in terms of college attendance and earnings. Grogger (2018) suggests that the combination of the two findings is consistent with the notion that speech patterns affect earnings.

A dialect is a variety of the standard language that is used in a limited region and differs mainly in pronunciation, and sometimes in vocabulary and grammar. Dialects can be acquired without training and play a role in informal communication, while the standard language is the medium of instruction at schools. Speaking with a local dialect accent may signal lower language ability, limited education and a lack of experience in communicating with people from other regions. Moreover, a similar speech pattern can signal cultural affinity. People are more likely to trust, and to trade with, those who speak the same dialect (Falck et al. 2012). All this implies that speaking the major language gives an advantage in economic activities. Although dialect can be viewed as a separate skill in dialect-speaking areas, the return to a dialect is limited in other areas of the country. Therefore, it is of interest to explore how dialect speech patterns affect labor market performance and whether they carry a premium or a penalty in the labor market.

Our paper studies the relationship between dialect-speaking and hourly wages. We study the Netherlands as an example of a country with a lot of commuting, such that spatial segregation is limited. The Netherlands is a suitable case not only because various dialects are spoken, but also because Dutch natives are more homogeneous in terms of culture, physical characteristics and economic wealth than natives of larger countries. Moreover, by comparing native dialect speakers with native Standard Dutch speakers, we obtain purer effects of speech patterns than we would by comparing immigrants with natives. We focus on native Dutch individuals who indicated not having problems with reading or speaking Dutch. We perform simple regressions, starting with daily dialect-speaking as the only right-hand side variable and gradually introducing personal characteristics, family characteristics and geographical differences. The difference in wages between dialect speakers and non-dialect speakers becomes smaller as more explanatory variables are introduced. We find that the unconditional difference in hourly wages between Standard Dutch and dialect speakers is about 10% for males and 8% for females. If we take into account personal characteristics and province fixed effects, male dialect speakers earn 4% less while there is no significant penalty for female dialect speakers. Our findings suggest a significant wage penalty of daily dialect-speaking for males. Moreover, in provinces with a higher share of dialect speakers, the wage penalty is less severe. Nevertheless, although our results are robust to various sensitivity tests, our analysis does not go beyond conditional correlations.

Our paper is set up as follows. In Sect. 2 we describe our data and the linguistic background of Dutch. Section 3 presents our parameter estimates. Section 4 discusses possible explanations for our main findings. Section 5 concludes.

2 Data and background

2.1 Linguistic background

The formal language of the Netherlands is Standard Dutch. It is spoken in its purest form in Haarlem, a city close to the capital Amsterdam. Standard Dutch is spoken throughout the Netherlands but there are also many regional languages and dialects. Frisian, mostly spoken in the province of Friesland, is recognized as a separate language and promoted by the local government. In Friesland, both Standard Dutch and Frisian are considered official languages and instruction media at school. More than 80% of the adult inhabitants understand verbal Frisian, but only a small minority can write the language (Gorter 2005). In our paper for simplicity we refer to Frisian as a dialect. Other official regional languages include Limburgish, spoken in the province of Limburg by about 75% of the inhabitants. Low Saxon dialects are spoken in the provinces of Groningen, Drenthe, Overijssel and Gelderland by approximately 60% of the inhabitants. Other provinces have their own dialects such as Brabantish, spoken in Noord-Brabant or Zeelandic in Zeeland [see an overview in Driessen (2005) and Cheshire et al. (1989)].

Distances between languages depend on characteristics such as vocabulary, pronunciation, syntax and grammar. To quantify distances between languages various methods are used. Levenshtein (1966) proposed an algorithm based on the minimum number of steps to change a particular word from one language to the same word in a different language. The overall distance between two languages is based on the average difference for a list of words for which often but not always the 100 words from Swadesh (1952) are used. Levenshtein’s method can be based on written words but can also be based on phonetic similarities. This is especially helpful when comparing dialects as often these are spoken but not used in writing. Van Bezooijen and Heeringa (2006) use two samples of Dutch dialects and apply the Levenshtein distance measure to calculate the average linguistic distances between provincial dialects and Standard Dutch. In our paper, we use their distances, which are based on the New Dialect Sample. These distance measures are calculated from 100 words. The first column of Table 1 provides information about the linguistic distance of dialects to Standard Dutch averaged at the level of provinces. As shown, the linguistic distance to Standard Dutch of the dialect spoken in a particular province is the largest in Friesland and the smallest in Flevoland, Noord-Holland and Zuid-Holland.
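The distance measure described above can be illustrated with a minimal sketch. It computes the Levenshtein distance between two spellings and averages it over a word list. The normalization by the length of the longer word and the toy word pairs are assumptions made for illustration only; they are not the transcriptions or the exact procedure used by Van Bezooijen and Heeringa (2006).

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def mean_normalized_distance(pairs):
    """Average Levenshtein distance over word pairs, each divided by the
    length of the longer word (an assumed normalization)."""
    total = 0.0
    for a, b in pairs:
        total += levenshtein(a, b) / max(len(a), len(b), 1)
    return total / len(pairs)


# Toy word pairs (standard form, hypothetical dialect form); illustrative only.
toy_pairs = [("huis", "hoes"), ("water", "water")]
toy_distance = mean_normalized_distance(toy_pairs)
```

In practice the strings would be phonetic transcriptions rather than spellings, since dialects are often spoken but not written.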

2.2 Data

Our dataset is from the Longitudinal Internet Studies for the Social sciences (LISS) panel survey, which is representative of the Dutch-speaking population permanently residing in the Netherlands. In the LISS panel, background demographic variables are collected monthly while data on specific topics are collected annually (for details, see www.lissdata.nl). We use seven waves of panel data from 2008 to 2014. As is usual in studies on language effects, we rely on self-reported information about speech patterns. Respondents indicate their intensity of dialect-speaking by answering the question “Do you ever speak dialect?” with possible answers “Yes, daily”, “Yes, regularly”, “Yes, once in a while” and “No, never”. The indicator for dialect-speaking we use in our analysis is whether or not an individual speaks a dialect daily. To focus on the effects of dialect speech patterns and exclude the effect of language deficiency, we remove from the sample immigrants and native individuals who indicate having problems in reading or speaking Dutch. We merged the dataset with a variable measuring the geographic distance from the municipality of residence to Haarlem.

Table 1 Dialect-speaking by province.

The second column of Table 1 shows that the geographical distance to Haarlem of the individuals in our sample is on average the largest in Limburg and Groningen, and the smallest in Noord-Holland, where Haarlem is located. Dialect-speaking is most prevalent in Limburg, where 68% of the individuals in our sample speak dialect daily, followed by Friesland with 48% and Drenthe with 34%. In Flevoland, Noord-Holland, Utrecht and Zuid-Holland only a tiny proportion of the sample speaks dialect daily. In addition, the so-called “Randstad” provinces have the highest share of individuals who never speak a dialect.

Figure 1 provides a graphical representation of the relationship between dialect characteristics and the geographical distance to Haarlem at the provincial level. Figure 1a shows that linguistic distance and geographical distance are highly correlated, with Friesland and Drenthe as outliers. In Drenthe, the linguistic distance to Standard Dutch is smaller than in other provinces with the same geographical distance to Haarlem, while in Friesland the linguistic distance to Standard Dutch is larger than it is in comparable provinces. Figure 1b shows that there is also a strong correlation between the share of (daily) dialect speakers and the distance to Haarlem.

Fig. 1 Linguistic distance, percentage daily dialect speakers and geographical distance to Haarlem by province. Source: see Table 1

In 2008, the minimum hourly wage in the Netherlands was about 7.8 Euro while in 2014 it was about 8.5 Euro. In the original data, some of the hourly wages are far below the minimum wage. To avoid a bias in the parameter estimates we removed from the sample all observations with an hourly wage below 7.5 Euro (2.6% of the observations). Our empirical analysis is based on 5721 observations of 1761 males and 5597 observations of 1756 females. The densities and cumulative distributions of hourly wages by dialect status are presented in Figure 2. The differences between males and females are not large, but individuals who speak dialect daily on average have lower hourly wages. The median wage of Standard Dutch speaking males is 18.75 Euro while for Standard Dutch speaking females it is 15.86 Euro. Among dialect speakers the median hourly wage is 16.76 Euro for males and 14.79 Euro for females. So, at the median, male dialect speakers earn 10.6% less than male non-dialect speakers. For females the difference is 6.7%. Of course, these differences need not be related to dialect-speaking itself but may be explained by other personal characteristics that correlate with dialect-speaking.
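The percentage differences quoted above follow directly from the reported medians. A short sketch of the arithmetic, where the function name `relative_gap` is a hypothetical helper introduced for illustration:

```python
def relative_gap(reference, comparison):
    """Percentage by which `comparison` falls short of `reference`."""
    return 100.0 * (reference - comparison) / reference


# Median hourly wages in Euro, as reported in the text.
male_gap = relative_gap(18.75, 16.76)    # Standard Dutch vs. dialect, males
female_gap = relative_gap(15.86, 14.79)  # Standard Dutch vs. dialect, females

print(round(male_gap, 1), round(female_gap, 1))
```

The gaps are expressed relative to the median wage of Standard Dutch speakers; using the dialect speakers' median as the base would give slightly larger numbers.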

Fig. 2 Kernel densities and cumulative distribution of hourly wages; by gender and daily dialect-speaking

3 Wage effects of dialect-speaking

3.1 Baseline results

Although we have panel data, from 2009 to 2014 only 5–6% of individuals changed dialect-speaking status. So, dialect-speaking behavior hardly changes over time. Furthermore, since speech patterns are unlikely to change quickly, a change in self-reported dialect-speaking status probably does not reflect a big change in speech pattern. Therefore, it is not possible to use an individual fixed-effects approach to account for unobserved differences between individuals. Instead, we use a pooled cross-section set-up relating daily dialect-speaking to log hourly wages:

$$\begin{aligned} Y_{it} = \alpha _t + \gamma D_{it} + \beta _1 X_{it} + u_{it} \end{aligned} \tag{1}$$

where \(Y_{it}\) represents log hourly wages, the \(\alpha _t\) indicate calendar year fixed effects, \(D_{it}\) is a dummy variable for daily dialect-speaking, \(X_{it}\) is a vector of individual, family and province characteristics, \(\beta _1\) represents a vector of parameters and \(u_{it}\) is an error term. The parameter of main interest is \(\gamma\). The personal characteristics are age, educational attainment, having a religion and having a partner. Family characteristics are the number of children and the urbanization level of the residence. Province characteristics are the log of per capita GDP, the log of population, the area of main roads (km\(^2\)) and the log of employment. The appendix provides details of all variables used in the analysis.

Table 2 presents OLS parameter estimates where the pooled cross-section nature of the data is taken into account by clustering the standard errors at the level of the individual. In column (1) only calendar year fixed effects are taken into account. In column (2) individual characteristics are included. Column (3) further adds family characteristics. Column (4) further includes province characteristics, while column (5) replaces the province characteristics with province fixed effects.
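To make the idea of clustering at the level of the individual concrete, the following is a minimal sketch of OLS with a cluster-robust (sandwich) standard error for the simplest case, a constant and the dialect dummy only. It omits year fixed effects, covariates and any small-sample correction, so it illustrates the technique rather than reproducing the estimation behind Table 2. The function name and the toy data are assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def ols_dummy_clustered(y, d, cluster):
    """OLS of y on a constant and a 0/1 regressor d.

    Returns (gamma, se), where se is a cluster-robust sandwich standard
    error without small-sample correction.
    """
    n = float(len(y))
    sum_d = float(sum(d))
    sum_dd = float(sum(di * di for di in d))

    # Inverse of X'X for X = [1, d], using the 2x2 closed form.
    det = n * sum_dd - sum_d * sum_d
    inv = [[sum_dd / det, -sum_d / det], [-sum_d / det, n / det]]

    # OLS coefficients: (X'X)^{-1} X'y.
    xy = [sum(y), sum(di * yi for di, yi in zip(d, y))]
    const = inv[0][0] * xy[0] + inv[0][1] * xy[1]
    gamma = inv[1][0] * xy[0] + inv[1][1] * xy[1]

    # Sum the scores x_it * u_it within each cluster (individual).
    scores = defaultdict(lambda: [0.0, 0.0])
    for yi, di, g in zip(y, d, cluster):
        u = yi - const - gamma * di
        scores[g][0] += u
        scores[g][1] += u * di

    # Middle part of the sandwich: sum over clusters of s_g s_g'.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for a in range(2):
            for b in range(2):
                meat[a][b] += s[a] * s[b]

    # Variance of gamma: element [1][1] of inv * meat * inv.
    row = [sum(inv[1][k] * meat[k][b] for k in range(2)) for b in range(2)]
    var_gamma = sum(row[b] * inv[b][1] for b in range(2))
    return gamma, sqrt(var_gamma)


# Made-up log wages: four individuals observed twice; persons 3 and 4 speak dialect.
log_wage = [3.0, 3.1, 2.9, 3.0, 2.8, 2.9, 2.7, 2.8]
dialect = [0, 0, 0, 0, 1, 1, 1, 1]
person = [1, 1, 2, 2, 3, 3, 4, 4]
gamma_hat, se_hat = ols_dummy_clustered(log_wage, dialect, person)
```

Because repeated observations of the same person are summed before squaring, wage shocks that persist within an individual are not treated as independent information.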

Table 2 Baseline parameter estimates of the effect of daily dialect-speaking on log hourly wages; pooled cross-section

The first column of Table 2 shows that on average daily dialect speakers have lower hourly wages by 8% (females) to 10% (males). The second and third columns show that this wage penalty is reduced once individual and family characteristics are introduced. The R\(^2\) of the regression increases substantially once individual characteristics are introduced. Individual characteristics absorb about 4 percentage points of the wage penalty, and family characteristics absorb another percentage point. For females the wage penalty becomes insignificant. If province characteristics or province fixed effects are also included, the wage penalty falls further to about 4% for males and about 3% for females, where for females the wage penalty is not different from zero at conventional levels of significance.
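As a reading aid, and assuming the penalties are read off directly from the coefficient \(\gamma\) in Eq. (1), a coefficient on a dummy variable in a log-wage regression is only approximately a percentage difference. The exact implied percentage difference in wages is

$$\begin{aligned} 100 \times \left( e^{\gamma } - 1 \right) \end{aligned}$$

For an illustrative value of \(\gamma = -0.04\) this gives about \(-3.9\%\), so for penalties of the size discussed here the approximation is close.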

Table 3 Daily dialect-speaking and log hourly wages; pooled OLS estimates

Table 3 reports the OLS parameter estimates of the fourth column of Table 2 in detail. For both males and females, age is positively related to hourly wages. Workers aged 55 and over earn substantially more than workers aged 45–54, who in turn earn more than workers aged 35–44, et cetera. For males the age effect is stronger than for females. Educational attainment also has a positive effect on hourly wages, although again this effect is stronger for males than for females. The main difference between males and females is the effect of having a partner. While males with a partner earn more than males without a partner, for females it is the other way around. Of course, this is not necessarily a causal effect, i.e. it could even reflect reverse causality, with women with higher earnings being less likely to have a partner or men with higher earnings being more likely to have a partner. Religion, urbanization level of the municipality of residence and provincial characteristics do not have significant effects on hourly wages.

3.2 Additional parameter estimates

To investigate the sensitivity of our baseline estimates we performed a range of sensitivity analyses, the results of which are presented in panels b to e of Table 4. For comparison, panel a reports the main baseline estimate of Table 2, column (4).

In panel b, we investigate provincial heterogeneity of the wage penalty. We examine dialect-speaking separately for the “Randstad” area and the remaining eight provinces. In the Randstad provinces the share of daily dialect-speakers is only 1–3%, substantially below average. As shown in panel b1, the wage of male dialect speakers in the Randstad is about 16% lower than the wage of male non-dialect speakers. As shown in panel b2, in the non-Randstad provinces, for males the wage penalty of dialect-speaking is about 4%. It is striking that for males the wage penalty is lower in the provinces where the linguistic distance to Standard Dutch is larger. Apparently, for males the wage penalty of dialect-speaking is not so much related to linguistic distance but instead to the share of people speaking dialect. If more people speak a dialect the wage penalty of dialect-speaking is smaller.

Because Frisian is recognized as a co-official language in Friesland, we examine whether the wage penalty also exists there. As shown in panel b3, we do not find a significant wage penalty of dialect-speaking in Friesland, and for males there is an insignificant wage premium. Panel b4 shows that our results are robust to excluding individuals from Friesland. For females, dialect-speaking has no significant wage effect in any of the provincial subsamples.

In panel c we use as indicator of dialect-speaking, whether an individual ever spoke dialect during his or her lifetime. We find the parameter estimates are somewhat smaller (in absolute terms) but not very different from the baseline estimates. In panel d, we add dummies for each category of dialect-speaking, ranging from daily dialect-speaking to dialect-speaking once in a while. This is to measure whether the wage penalty increases with the frequency of dialect usage. Not surprisingly, for males daily dialect-speaking has a significant negative effect of 5.3%, while speaking dialects regularly and once in a while have smaller and non-significant effects.

Panel e shows the heterogeneity of the wage penalty according to educational attainment. We include an interaction term of daily dialect-speaking with a dummy for having a higher vocational or university degree. We find that for both genders there is a larger wage penalty of dialect-speaking for the high-educated, while dialect-speaking is less of a concern for low-educated people. This may be because the high-educated tend to aim for occupations that require communicating with people from all over the country. Moreover, dialect speakers with a higher education may be perceived by employers to have a lower unobserved ability, whereas for low-educated workers dialect-speaking is less of a problem.
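One way to write the specification of panel e is as an extension of Eq. (1). The symbols \(H_{it}\) (a dummy for having a higher vocational or university degree) and \(\delta\) (the coefficient on the interaction term) are labels introduced here for illustration and do not appear in the original:

$$\begin{aligned} Y_{it} = \alpha _t + \gamma D_{it} + \delta \left( D_{it} \times H_{it} \right) + \beta _1 X_{it} + u_{it} \end{aligned}$$

Under this notation, with educational attainment itself included in \(X_{it}\), the wage difference associated with daily dialect-speaking is \(\gamma\) for low-educated workers and \(\gamma + \delta\) for high-educated workers, so a negative \(\delta\) corresponds to a larger penalty for the high-educated.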

In Table 5 we show how dialect-speaking relates to other labor market outcomes, i.e. employment, working hours and type of job. An individual is considered to be employed if he or she has any type of paid work, including work in a family business and self-employment. Table 5 shows that neither for males nor for females does daily dialect-speaking affect the probability of having a job. Conditional on having a job, daily dialect-speaking does not have a significant effect on working hours either. However, conditional on having a job, daily dialect speakers are less likely to have a non-manual job. Moreover, daily dialect speakers are more likely to have a professionally lower ranked job.

Table 4 Sensitivity analysis parameter estimates effects of dialect-speaking on log hourly wages

4 How to explain our findings?

Our major finding is that dialect-speaking is related to a wage penalty as well as occupation sorting. Dialect-speakers are sorted to occupations with less income, or they are paid less than their counterparts in the same occupation. There are several potential mechanisms accounting for this disadvantage.

The first mechanism responsible for this is the inability to speak Standard Dutch. Yao et al. (2016) for example investigate the relationship between dialect speaking and academic performance of 5–6 year old children in the Netherlands. They find that dialect speaking has modest negative effects on language skills of young boys while young girls are not affected. However, it is not clear that this effect persists over time till adulthood. Giesbers et al. (1988) reporting the results of an investigation at primary schools in two dialect areas in the southern part of the Netherlands conclude that dialect-speaking is not necessarily an educational disadvantage. However, they do find that dialect-speaking children are disadvantaged in terms of school performance and their choice of secondary education. One of the surprising findings is that school teachers give lower grades to dialect-speaking children under their supervision. However, essays of dialect-speaking children were graded similarly as essays by non-dialect-speaking children by external graders, unaware of the speech patterns of the children.

Table 5 Pooled cross-section parameter estimates of daily-dialect speaking; other labor market outcomes

A second possible mechanism is perception of lower productivity. Even if workers do not have difficulty in speaking Standard Dutch at work, colleagues and employers may still recognize a dialect accent. Whether dialect-speakers actually have a lower productivity is hard to tell. From Table 4, the high educated workers face a larger wage penalty than their counterparts. We consider this fact as an indication that perceived productivity is not in line with actual productivity. The fact that a wage penalty of daily dialect-speaking is only present for males but not for females may also be related to differences in location of the job. Females are more likely to work closer to home than males. In our sample, the average distance for females between work and home was 10.5 km with a commuting time of 20.6 min. For males the average distance to work was 16.3 km and 27.5 min.

Our main finding of a wage penalty for dialect-speaking is seemingly at odds with the existing literature on the effect of being bilingual, which suggests that there is a wage premium for being proficient in a second language (Christofides and Swidinsky 2010; Di Paolo and Raymond 2012; Chen et al. 2014; Cappellari and Di Paolo 2018). However, there is a clear distinction between speaking an additional language and speaking a dialect. Speaking a second language signals social identity, increases search efficiency and decreases transaction costs in the local labor market. Our context is different from bilingual systems in the sense that a dialect cannot be viewed as a separate language. A dialect speech pattern affects how people speak the standard language and is associated with social status. Moreover, although speaking a dialect increases search efficiency in the local market, it may not compensate for the disadvantage in national labor markets unless the local area is more developed than the country average, as is the case for example in Catalonia in Spain and Shanghai in China. This is in line with our finding of a bigger wage penalty for dialect-speaking for individuals living in a Randstad province.

5 Conclusions

We investigate whether daily dialect-speaking affects earnings. Using data from the Netherlands we conclude that male workers who speak a dialect daily have significantly lower hourly wages. For females we also find a negative effect but this is not significantly different from zero. High-educated male workers face a larger wage penalty of dialect-speaking than their low-educated counterparts. Finally, we find effects that are heterogeneous across provinces, such that in areas where more people speak dialect there is a smaller wage penalty. Our data do not allow us to make a distinction between the various mechanisms that may lead to a wage penalty for male workers. We can only speculate that the wage penalty is partly related to occupational sorting and partly to conscious or unconscious discrimination by employers or co-workers.

Our findings are all the more surprising since there are clear limitations to our empirical analysis. Although our results are robust to various sensitivity tests, our analysis does not go beyond conditional correlations. Furthermore, our indicator of dialect-speaking is self-reported and related to daily use. So, it is not clear which dialect is spoken, nor is it clear how the speech pattern is perceived by listeners. Similarly, if an individual indicates speaking Standard Dutch it is not clear whether this is also the perception of listeners. It could be that a self-reported Standard Dutch speaker is perceived to speak in a way that is typical of a particular area in the Netherlands, i.e. a speech pattern can reveal the area of residence. Also, it is not clear whether an individual speaks dialect at work. It may be that a daily dialect speaker speaks Standard Dutch at work with a flavor of dialect. Even if the pronunciation makes the meaning of words perfectly clear, it may be slightly different from area to area.