1 Introduction

Nowadays, online employer reviews are available in large volumes and provide new research opportunities by complementing traditionally collected data (e.g., manually conducted employee surveys) in organizational sciences (Stamolampros et al. 2020). Online employer reviews are employer evaluations written by current or former employees on dedicated reviewing websites and typically reflect the accumulated experiences of employees (Höllig 2021). Thus, these reviews provide rich information about employers and have already served as foundations for numerous empirical studies, for example, in the context of employer branding (Dabirian et al. 2017), organizational culture (Das Swain et al. 2020) as well as corporate performance (Luo et al. 2016).

Fig. 1

Studying online employer reviews through the lens of Herzberg’s Two-Factor Theory. The figure illustrates how we apply Herzberg’s Two-Factor Theory to online employer reviews on kununu. On the left we depict an excerpt of a positive employer review and on the right we depict examples for hygiene as well as motivation factors. In particular, we operationalize the overall ratings of reviews (displayed in the top left corner of the excerpt and highlighted in blue) as an expression of employee satisfaction and individual review aspects as individual feedback on either hygiene factors (example highlighted in red) or motivation factors (example highlighted in green). Note that the exemplary terms for hygiene and motivation factors stem from words that are frequently related to them in existing research (see Appendix A.3)

One prominent field of study in organizational sciences is employee satisfaction, as it is closely linked to employee motivation and their resulting performance and, as such, is crucial to the economic success of businesses (Dobre 2013; Kumar and Pansari 2015). In previous research on employee satisfaction, online employer reviews have been utilized to study the factors that influence it. For example, existing research focused on the positive effect of family-led businesses (Huang et al. 2015) or of demographically diverse leader boards (Creek et al. 2019) on employee satisfaction, or on how state-level minimum wages positively affected the satisfaction of beginners but negatively affected the satisfaction of seniors (Storer and Reich 2021). Other studies exploiting employer reviews also focused on similarly concrete problems, for example, employee satisfaction among IT workers (Moro et al. 2020) or within the tourism and hospitality industry (Stamolampros et al. 2019).

Traditionally and in an offline context, measuring employee satisfaction is based on both a solid theoretical framework identifying important aspects contributing to satisfaction and empirical analyses of data manually collected through interviews and surveys in businesses (e.g., Nohria et al. 2008). However, such an overarching theoretical framework is still largely missing in the previous studies of employee satisfaction based on online employer reviews. Rather, each of these studies applied its own interpretation of employee satisfaction and, thus, their results are difficult to compare, especially when linking them to research conducted on traditionally collected data.

With this paper, we set out to close this gap by applying a solid and established theory of employee satisfaction—the Two-Factor Theory (Herzberg et al. 1959)—to online employer reviews. This theoretical framework has been utilized by many scholars (e.g., Alfayad and Arif 2017; Lundberg et al. 2009; Kotni and Karumuri 2018) to monitor employee satisfaction via traditionally collected data. In a nutshell, this theory identifies two types of factors: (i) hygiene factors (e.g., company culture, salary and working conditions) related to employee dissatisfaction and (ii) motivation factors (e.g., responsibility, advancement and recognition) related to employee satisfaction. Hygiene factors are essential to prevent dissatisfaction, whereas motivation factors are crucial for the actual satisfaction of employees (Herzberg 1966).

Table 1 Hypotheses

We believe that analyzing employee satisfaction expressed in online employer reviews through the Two-Factor Theory yields several advantages for both the analysis of online reviews and the study of employee satisfaction. Firstly, online reviews are publicly available, circumventing the time-consuming and cost-intensive process of manually collecting employee feedback. Secondly, online employer reviews represent a continuous data stream, allowing us to precisely monitor hygiene and motivation factors over time. Thirdly, considering online employer reviews through the lens of the Two-Factor Theory explains review ratings and allows us to compare results to findings from a plethora of existing studies conducted in offline settings (i.e., through traditional survey data). Lastly, the theory’s application to these new data sources can help us learn more about the theory itself and whether and how its postulates have evolved in modern times.

Hence, in our paper, we analyze a bilingual dataset containing more than two million online employer reviews extracted from the employer review platform kununu. Reviews contain feedback on various aspects (e.g., support from management or teamwork) in the form of ratings ranging between one and five stars as well as (optional) review text. We link these review aspects to either hygiene or motivation factors and interpret review ratings as an expression of employee satisfaction. In Fig. 1, we provide an exemplary illustration of this process. To see how the concepts of the Two-Factor Theory apply to reviews, we derive and test multiple hypotheses (see Table 1) focusing on the attention devoted to aspects as well as the textual content of reviews. Finally, we illustrate the practical utility of our findings through a prediction experiment.

In our study of online employer reviews we find that, in accordance with the theory, dissatisfied employees devote more attention to review aspects related to hygiene factors and frequently include terms related to these factors in their reviews. However, our results reveal that hygiene factors go beyond mere dissatisfaction, suggesting a critical importance of hygiene factors in online employer reviews, as they not only prevent dissatisfaction but frequently also foster satisfaction. We also find some evidence for the generalization of our results across industrial, cultural and employment status differences in the context of online employer reviews. Finally, we leverage our empirical results to accurately predict employee satisfaction of individual companies, achieving a maximum balanced accuracy score of 0.87.

Overall, the contribution of our work is three-fold: First, we demonstrate how online employer reviews can be linked to the traditional Two-Factor Theory, allowing us to study influential factors for employee satisfaction through such reviews in a systematic way. Second, we add fruitful input to the discussion of this popular theory by applying it to a novel dataset. This bears potential implications for the factor assignment, as we revisit hygiene and motivation factors for the digital age. Finally, we utilize the combination of online reviews and hygiene and motivation factors to construct models predictive of employee satisfaction.

2 Research background

2.1 Employee satisfaction and the Two-Factor Theory

Employee satisfaction, also referred to as job satisfaction, generally describes how employees feel about their work (Janssen 2001). Researchers seeking to better understand employee satisfaction had varying conceptions about how it forms and how it can be explained. An early concept of employee satisfaction was introduced by Hoppock (1935), who described it as a combination of psychological, physiological as well as environmental conditions that cause individuals to be truthfully satisfied with their jobs. While the author mentioned the influence of external factors, he was convinced that employee satisfaction mainly depends on internal factors, such as personal traits, of employees. Similarly, Blood (1969) found evidence of employee satisfaction to be solely depending on the values and expectations one brings to work by interviewing more than 400 army personnel. These assumptions are different to the ideas of Vroom (1964), who described employee satisfaction to be dependent on the roles of employees at the workplace, suggesting that the actual content of work is most decisive for it. Schneider and Schmitt (1976) understood the satisfaction of employees very similarly and even conceptualized it to be completely depending on organizational conditions and not on predispositions of employees. Locke (1976) saw employee satisfaction to be the result of a mixture of both the conditions at work as well as the personal qualities of employees themselves. He also stated that the engagement of employees is closely related to employee satisfaction. In a more recent study, Choo and Bowley (2007) outlined employee satisfaction to be the outcome of job performance, for example, by achieving goals or by perceiving the general success of a company.

Another well-known theory that elucidates employee satisfaction is the Two-Factor Theory (Herzberg et al. 1959) introduced by Frederick Herzberg in 1959. Herzberg defined the Two-Factor Theory by collecting feedback from 203 accountants and engineers, asking them in which situations they felt either good or bad about their work. Leveraging the gathered feedback, he defined two different sets of needs that both contribute to employee satisfaction, namely (i) hygiene factors and (ii) motivation factors. Hygiene factors cover basic needs that are not directly related to the content of a job but rather represent the surroundings of it, such as the compensation for an employee’s work, the company culture or the interpersonal communication. On the contrary, motivation factors relate to the self-actualization needs of employees, focusing, for example, on responsibilities, achievement and the actual content of the work itself. Motivation factors follow the idea that humans always strive to improve themselves (Herzberg 1966, 2017), a need that can only be satisfied by altering the content of work (Tietjen and Myers 1998). According to Herzberg’s theory, the satisfaction of hygiene factors can prevent dissatisfaction and poor performance of employees, but only the satisfaction of motivation factors encourages high employee satisfaction and, as such, high productivity. Note that the absence of motivation factors does not necessarily lead to dissatisfaction among employees.

Ever since the introduction of the Two-Factor Theory, several empirical analyses have tested it in different industries or showcased its general applicability. For example, Lundberg et al. (2009) investigated work motivation of seasonal workers in hospitality and tourism, finding support for the Two-Factor Theory, but also uncovering discrepancies in the needs of seasonal workers. In a related context, Balmer and Baum (1993) demonstrated the general applicability of the theory by using it to investigate guest motivation in hospitality. DeShields et al. (2005) used the theory to study the motivation and satisfaction of business students, translating hygiene factors to capture performance of advising staff and motivation factors to capture performance of classes and faculties. Again, the researchers found support for Herzberg’s Two-Factor Theory.

More recent works studied, for example, the influence of employee voice (i.e., employees communicate their views to employers) on employee satisfaction by applying the Two-Factor Theory to feedback from 300 non-managerial employees (Alfayad and Arif 2017). Here, researchers found that acknowledgment of employee voice pushes motivation and therefore increases employee satisfaction. Holmberg et al. (2018) investigated reasons for shortages of nursing personnel in Swedish mental health care using the Two-Factor Theory. They based their analysis on interviews with 25 nursing personnel demonstrating the usefulness of Herzberg’s theory and identifying the lack of career advancements as a partial reason for these shortages. In another study, Hur (2018) reported differences between public and private sectors in how hygiene factors and motivation factors affect employee satisfaction. Kotni and Karumuri (2018) applied Herzberg’s Two-Factor Theory on data from 150 salesmen of the retail sector and found that they are more satisfied with hygiene factors as compared to motivation factors, suggesting discrepancies from the Two-Factor Theory.

With our work we aim to complement these studies on the impact and importance of hygiene and motivation factors, but instead of basing our study on manually collected survey data, we demonstrate how to apply Herzberg’s Two-Factor Theory to employee feedback collected from the Web.

2.2 Employee satisfaction and online reviews

Most works investigating online reviews of employers focused on the website Glassdoor (Footnote 1). For example, a previous study by Marinescu et al. (2018) described a selection bias in online employer reviews, where people with extreme opinions are more motivated to share their experiences as compared to people with moderate opinions. To counteract this problem, the authors suggested providing incentives for reviews, which mitigates the motivational deficit of people who hold moderate opinions. Dabirian et al. (2017) extracted 38,000 reviews of the highest and lowest ranked employers on Glassdoor in order to identify what employees care about and made suggestions to employers on how to become a great place to work. Chandra (2012) used Glassdoor reviews to uncover different perspectives of work-life balance in eastern and western cultures. In another study, Luo et al. (2016) analyzed multi-aspect employer reviews on Glassdoor and reported a positive correlation between overall employee satisfaction and business performance. Notably, the authors also discovered a negative correlation for some review aspects, including safety, communication and integrity. More recently, Green et al. (2019) analyzed reviews from Glassdoor and their influence on stock returns. Their results indicate that companies for which reviews become more positive over time significantly outperform companies for which reviews become more negative over time.

In contrast to the aforementioned works, we focus on online employer reviews found on kununu. We opted for this platform as we find it to be a currently underrepresented data source for research conducted in organizational sciences. Further, the various review aspects of kununu are best suited for the present study. One existing work, which utilized 25,827 reviews collected from the German version of kununu (Footnote 2), was conducted by Könsgen et al. (2018), who studied how review discrepancy affects job seekers. The authors found that high levels of discrepancies lead to increased intentions to avoid submitting applications to respective employers. In a preliminary work (Koncar and Helic 2020), we studied the interaction of employee benefits, employee positions as well as employment status with employee satisfaction expressed in employer reviews collected from kununu. We found that our results are mostly consistent with findings gained from studies conducted with traditional and manually collected survey data.

Apart from online employer reviews, other online content as well as social networks have been studied to benefit employees. For example, existing studies focused on why and how employees use social networking at work (DiMicco et al. 2008) or how employee engagement spreads in organizational social media (Mitra et al. 2017). Similarly, Shami et al. (2014) analyzed texts of internal and external social media platforms to extract emotions and opinions of employee chatter. De Choudhury and Counts (2013) investigated emotional patterns during times of high and low productivity. In another work, Guy et al. (2016) studied how users use the “like button” in an organizational context and how it may relate to organizational commitment. More recently, Saha et al. (2019) used data from LinkedIn (Footnote 3) to study role ambiguity (i.e., unclear responsibilities and degree of authority of employees) and its effects on employee wellbeing. Their proposed method can help to identify role ambiguity in organizations and demonstrates the potential of analyzing data from the Web and using gained insights to improve life at work.

Jointly, the presented studies highlight the added value of online employer reviews to complement traditional qualitative studies and to answer diverse research questions in organizational sciences. In the following sections, we demonstrate how to link such reviews to a traditional theory in order to study employee satisfaction and, based on our findings, we propose a predictive model that may allow employers to assess satisfaction levels detached from reviewing platforms.

3 Dataset and methods

3.1 Dataset

Kununu (Footnote 4) is a platform that allows employees to anonymously review their employers; it has operated in Austria, Germany and Switzerland since 2007 and in the USA since 2013. Hence, kununu is bilingual (German and English), but reviews can be composed in any language (Footnote 5). Each review on kununu consists of an overall rating ranging between 1 and 5, where 1 represents “very bad” and 5 represents “very good” experiences (as described by kununu). The overall score aggregates a variety of individual review aspects, each also ranging from 1 to 5 and grouped into four sections: (i) company culture, (ii) diversity, (iii) work environment, and (iv) career. In Table 3, we list the 13 individual aspects together with the descriptions provided by kununu, which state what each aspect is about. Note that there are five additional review aspects only available for the USA version of kununu (comprising Inclusive / Diverse, Handicapped Accessibility, Workplace Safety, Job Security, Challenging Work). For better comparability among countries, we exclude these five aspects from our analysis.

Fig. 2

Characteristics of our dataset. The figure depicts selected key characteristics of our dataset, including the number of reviews over time as well as kernel density estimations of overall ratings and review length, respectively for each of the four countries contained in our dataset. In Fig. 2a, we present the number of reviews over time, depicting a steady increase throughout the years, especially after 2014, for the German and the USA based version of kununu. The two smaller countries, Austria and Switzerland, show similar behavior, but exhibit much smaller numbers of reviews. In Fig. 2b, we illustrate the kernel density estimation (KDE) for the overall rating. We observe higher probabilities for positive ratings as compared to negative ones for all four countries. Reviewers on the USA based version of kununu seem to be slightly more controversial, as indicated by the higher probability for one-star reviews in comparison to German-speaking countries. Regarding length of reviews (in words) consisting of optional free-form text, we observe long-tailed distributions for all four countries, suggesting our dataset includes many reviews with no or only short review texts, while only a limited number of reviews have longer review texts (see Fig. 2c; distribution truncated at 300 words, which still lies above the 95th percentile).

In addition to ratings, reviewers must specify a headline (maximum of 120 characters), whether they are a former or a current employee of the reviewed company, as well as whether or not they would recommend the employer to friends (answer not shown in reviews and only used internally by kununu). Reviewers can optionally state their position (i.e., either “employee”, “management”, “temporary”, “freelancer”, “co-op”, “apprentice” or “other”), as well as include suggestions for improvements, what they like and dislike about the company, and comments on any of the aspect ratings (aspect reviews) in free-form text. Finally, kununu provides employer profile pages stating the country and industry they operate in.

Table 2 Dataset statistics

Data acquisition & preprocessing.

For our analysis, we automatically extracted (see Appendix A.1 for a detailed description of this process) all reviews present on kununu (comprising Austrian, German, Swiss and USA versions) up to the end of September 2019. Extracted reviews include the name and industry of the reviewed employer, overall and aspect ratings, all free-form texts, the review date and the employment state (either current or former employee) of the reviewer. As kununu is bilingual, we link and normalize German names of review aspects (e.g., Kollegenzusammenhalt \(\Rightarrow\) Teamwork) and industries (e.g., Dienstleistung \(\Rightarrow\) Service and Support) to English ones, allowing us to compare German and English reviews. In Table 2, we list descriptive statistics of our preprocessed dataset, comprising 2,240,276 reviews of 385,736 employers written over a time span of twelve years.

Preliminary descriptive analysis.

In Fig. 2, we depict selected characteristics of our dataset. In 2007, kununu had a comparatively small number of reviews where each of the three German-speaking countries had no more than 550 reviews. However, kununu grew rapidly over the years and aggregated more than 445,000 reviews in 2019 alone (up to the end of September; see Fig. 2a). We report a slight increase in the mean and variance of overall ratings over time for each of the four countries contained in our dataset, indicating that reviews became more positive and reviewers more divided with time. We observe slightly different behavior for the reviews in the USA version of kununu, for which the mean overall rating decreases continuously in the first four years only to catch up with German versions in 2017, after which the variance of overall ratings starts to decrease. We interpret this as a first indicator for cultural differences reflected in our dataset.

Regarding ratings, we observe a large number of reviews with positive overall ratings and a lower number of reviews with negative overall ratings in all four countries (see Fig. 2b). Reviews in the USA differ slightly from those of the remaining countries, with overall ratings being more controversial, which again points to cultural differences among reviewers.

For the text length (measured in number of words) of reviews with optional review texts (Footnote 6), we observe long-tailed distributions for each country (see Fig. 2c), indicating that the majority of reviews contain only a few words, whereas only a small number of reviews contain substantially longer free-form texts (the 95th percentile is 278 words). After manually inspecting reviews, we report that the majority of reviewers specifically address a selection of aspects, suggesting that they devote attention only to those aspects that are relevant to them.

3.2 Methodology

We investigate online employer reviews through the lens of Herzberg’s Two-Factor Theory. For that, we first explain how review aspects relate to Herzberg’s hygiene and motivation factors as well as how we leverage review ratings to measure employee satisfaction. We then interpret online employer reviews in the context of the Two-Factor Theory by defining a set of 13 hypotheses derived from it. Finally, we illustrate how to use our findings to accurately predict employee satisfaction on a company level.

Table 3 Overview of individual review aspects on kununu

3.2.1 Assigning review aspects with the Two-Factor Theory

Following the definition of Herzberg’s Two-Factor Theory and providing the description of aspects as stated by kununu (see Table 3), we let three independent annotators assign the 13 review aspects to either hygiene or motivation factors. Note that we provide a detailed description of the annotation process in the Appendix section. We assess the inter-rater agreement between annotators by computing Fleiss’ kappa (Fleiss 1971) resulting in a value of 0.597, suggesting a moderate to substantial agreement among the three annotators. Annotators determined that company culture, internal communication, teamwork, work-life balance, office and work environment, as well as environmental friendliness relate to hygiene factors as they all address the surroundings of work done. On the other hand, all annotators consider freedom to work independently and career development as clear motivation factors as they are related to the content of the work done. Remaining factors comprising support from management, gender equality and attitude towards older colleagues are, according to our annotators, not clearly assignable to either motivation or hygiene factors. Note that our annotators are not the first to encounter such issues as the exact distinction between hygiene and motivation factors has been criticised in existing research (Parsons and Broadbridge 2006; Li 2018). After further discussing the respective assignments, the annotators agreed that the three aspects relate to both hygiene and motivation factors based on the following explanations: Regarding support from management, our annotators find that its description (see Table 3) includes the style of leadership, which can be interpreted as hygiene factor, but also addresses involvement in decision making, which can be interpreted as a motivation factor. 
Similarly, in case of gender equality and attitude towards older colleagues the description addresses equal treatment among colleagues (hygiene factor), as well as equal career opportunities (motivation factor). Hence, annotators linked these three factors to both hygiene and motivation factors. We list the resulting assignments, respectively for each of the 13 aspects, in Table 3.
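The agreement statistic reported above can be computed from a table that records, for each review aspect, how many annotators chose each category. The following is a minimal sketch of Fleiss’ kappa in plain Python; the function name `fleiss_kappa` and the toy counts are illustrative assumptions and not the annotation data of this study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j.

    Every item must be rated by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Share of all assignments that fall into each category.
    p_category = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of agreeing annotator pairs.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_category)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: 4 aspects, 3 annotators,
# columns = (hygiene, motivation).
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
kappa = fleiss_kappa(toy_counts)
```

In this toy table two aspects are rated unanimously and two are split, which yields a kappa well below 1; a table with only unanimous rows spread over both categories yields exactly 1.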

3.2.2 Definition of employee satisfaction and dissatisfaction

We define reviewers’ satisfaction or dissatisfaction based on the overall rating of their reviews. In particular, we consider reviews with an overall rating less than or equal to the first quartile (overall rating \(= 2.42\)) to be created by dissatisfied employees and refer to them as negative reviews. Further, we consider reviews with an overall rating equal to or greater than the third quartile (overall rating \(= 4.54\)) to be created by satisfied employees and refer to them as positive reviews. This leaves us with 561,515 negative reviews and 594,409 positive reviews. The remaining 1,084,352 reviews are considered as reviews with neutral employee satisfaction and are therefore neglected for the remainder of the study. Note that slight changes to these thresholds do not qualitatively impact our results.
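The labeling rule described above can be sketched as follows. The two default thresholds are the quartile values reported in the text; the function names and the choice of the `"inclusive"` quantile method are assumptions made for illustration, since the text does not state how the quartiles were computed.

```python
import statistics

Q1_THRESHOLD = 2.42  # first quartile of overall ratings, as reported in the text
Q3_THRESHOLD = 4.54  # third quartile of overall ratings, as reported in the text


def label_review(overall_rating, q1=Q1_THRESHOLD, q3=Q3_THRESHOLD):
    """Map an overall rating to 'negative', 'positive' or 'neutral'."""
    if overall_rating <= q1:
        return "negative"   # treated as written by a dissatisfied employee
    if overall_rating >= q3:
        return "positive"   # treated as written by a satisfied employee
    return "neutral"        # excluded from the remaining analysis


def quartile_thresholds(ratings):
    """Derive (Q1, Q3) from a list of ratings.

    The interpolation method is an assumption of this sketch.
    """
    q1, _median, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    return q1, q3
```

Passing thresholds from `quartile_thresholds` into `label_review` makes it easy to check how sensitive the split is to small changes in the cut-off values.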

3.2.3 Hypotheses

We advance 13 different hypotheses (see Table 1 for an overview of all hypotheses), each focusing on distinct characteristics of employer reviews and derived from Herzberg’s Two-Factor Theory.

H1 (Attention).

As motivation factors positively influence employee satisfaction, we expect satisfied employees to write more in reviews of aspects assigned with motivation factors as compared to hygiene factors which, when fulfilled, are taken for granted and do not draw much attention (Herzberg et al. 1959; Alshmemri et al. 2017; Gawel 1996). On the contrary, if hygiene factors are not fulfilled, dissatisfaction among employees increases and, thus, we expect that dissatisfied employees devote more attention towards hygiene factors and complain about their absence (Herzberg et al. 1959; Alshmemri et al. 2017; Gawel 1996). Specifically, we operationalize attention in two different ways and define the following four hypotheses:

H1.1: Satisfied employees write more reviews on motivation factors.

H1.2: Dissatisfied employees write more reviews on hygiene factors.

Here, we compute and report ratios of the number of aspect reviews that contain optional review text in positive and negative reviews, respectively for each country.

H1.3: Satisfied employees write longer reviews for motivation factors.

H1.4: Dissatisfied employees write longer reviews for hygiene factors.

For these two hypotheses, we define attention as aspect review lengths (i.e., how much attention was devoted to individual aspects by reviewers) and compare differences in medians between distributions of positive and negative reviews.
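As an illustration of the length-based operationalization, the sketch below computes the difference in median aspect review length (in words) between positive and negative reviews. The function names and the whitespace tokenization are assumptions; the sketch only computes the raw difference and leaves out any significance test.

```python
import statistics


def word_count(text):
    """Length of an aspect review in whitespace-separated words."""
    return len(text.split())


def median_length_difference(positive_texts, negative_texts):
    """Median length of positive aspect reviews minus that of negative ones.

    A positive value means satisfied employees wrote longer texts for the
    aspect in question; a negative value means dissatisfied employees did.
    """
    positive_lengths = [word_count(t) for t in positive_texts]
    negative_lengths = [word_count(t) for t in negative_texts]
    return statistics.median(positive_lengths) - statistics.median(negative_lengths)
```

Applied per aspect, the sign of the result indicates which group devoted more text to that aspect.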

H2 (Sentiment).

As previous studies suggest that unfavorable experiences result in negative emotions (Bougie et al. 2003; Mattsson et al. 2004; Westbrook 1987), we expect that reviews from dissatisfied employees about hygiene factors (as their absence should lead to dissatisfaction) convey a negative sentiment. In contrast, we expect that reviews from satisfied employees about motivation factors (as their fulfillment should lead to satisfaction) convey a positive sentiment. Specifically, we investigate:

H2.1: Satisfied employees write more positively about motivation factors.

H2.2: Dissatisfied employees write more negatively about hygiene factors.

To test for these hypotheses, we investigate the sentiment conveyed in reviews and fall back on existing translated German and English sentiment dictionaries (Chen and Skiena 2014), which are comparable to each other since they originate from the same dictionary. Specifically, we compute the sentiment s of an aspect review by \(s = (W_p - W_n)/(W_p + W_n)\), where \(W_p\) is the number of positive words and \(W_n\) is the number of negative words in a review. Thus, s ranges from \(-1\) to \(+1\), where positive (and respectively negative) values represent a positive (resp. negative) sentiment and values close to zero indicate a neutral sentiment. We investigate the sentiment for positive and negative reviews and again compare differences in medians between the two distributions. Note that for this analysis we only consider German and English reviews (identified through automatic language detection) with at least one hundred words comprising at least one word of our sentiment dictionaries. We chose this minimum word count as shorter texts contain fewer sentiment signals and dictionary-based approaches may have limited capabilities to accurately infer sentiment scores otherwise (e.g., see Heitmann et al. 2020).
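The sentiment score defined above can be sketched in a few lines. The function name and the tiny word sets are hypothetical stand-ins for the sentiment dictionaries; returning `None` for texts without any dictionary word mirrors the filtering described in the text.

```python
def sentiment_score(tokens, positive_words, negative_words):
    """Compute s = (W_p - W_n) / (W_p + W_n) for a tokenized aspect review.

    Returns None if the text contains no dictionary word, because the
    score is undefined in that case.
    """
    w_p = sum(1 for token in tokens if token in positive_words)
    w_n = sum(1 for token in tokens if token in negative_words)
    if w_p + w_n == 0:
        return None
    return (w_p - w_n) / (w_p + w_n)


# Hypothetical mini-dictionaries, for illustration only.
toy_positive = {"great", "supportive"}
toy_negative = {"bad", "stressful"}

tokens = "great team but bad pay and great colleagues".split()
score = sentiment_score(tokens, toy_positive, toy_negative)
```

Here two positive words and one negative word are matched, so the score is positive but well below \(+1\).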

H3 (Readability).

Existing research suggests that complaining is a behavioral response to dissatisfaction (Maute and Forrester 1993; Singh 1988; Zeelenberg and Pieters 2004), providing a way to cope with emotions by venting one’s dissatisfaction (Anderson 1998). Further, previous findings suggest that negative reviews are harder to read as reviewers address a wider range of issues when describing their bad experiences (Zhao et al. 2019). Similar findings report, for example, a positive correlation between satisfaction and better readability (i.e., negative reviews are harder to read) of reviews when assessing the helpfulness of online reviews (Korfiatis et al. 2008, 2012).

Following these previous observations, we expect similar behavior for online employer reviews and investigate the readability of positive and negative reviews in terms of the Two-Factor Theory. In particular, we expect:

H3.1: Satisfied employees write more readable reviews about motivation factors.

H3.2: Dissatisfied employees write less readable reviews about hygiene factors.

We test for these hypotheses by computing the Flesch reading ease (Flesch 1948), providing us with a score ranging between 0 and 100, where texts with values closer to 0 are considered to be harder to read and texts with values closer to 100 are considered to be easier to read. Note that we tried other readability formulas (Footnote 7) as well, but results are very similar as these formulas are known to have high inter-correlation (DuBay 2004). Similar to our analysis on sentiment, we compute readability for positive and negative reviews, compare differences in medians and only consider German and English reviews with at least 100 words, as scores might result in non-interpretable values otherwise.

H4 (Content).

Following the Two-Factor Theory (Herzberg et al. 1959; Alshmemri et al. 2017; Gawel 1996), we expect, independent of the reviewed aspect, satisfied employees to focus on content related to motivation factors and dissatisfied employees to focus on content related to hygiene factors. In particular, we expect that satisfied employees are the only ones able to experience and write about motivation factors while they take hygiene factors for granted and neglect them. Dissatisfied employees, on the other hand, should have no experience with motivation factors to write about and only focus on unfulfilled hygiene factors. Thus, we investigate the following two hypotheses:

H4.1::

Satisfied employees use more words related to motivation factors.

H4.2::

Dissatisfied employees use more words related to hygiene factors.

To test these, we adopt the method from Hofland and Johansson (1982), which is based on contingency tables and chi-squared (\(\chi ^2\)) tests to assess which words are characteristic for either of two corpora. More precisely, for each aspect we look at the sets of the top 100 nouns (after removing stop words and lemmatization) included in positive and negative reviews and build the union of those two sets. Next, for each aspect and each word from the union we build a \(2\times 2\) contingency table, which keeps the count of a given word, as well as the total count of all other words in both positive and negative reviews. The null hypothesis of the \(\chi ^2\) test (which we perform with Yates' correction (Yates 1934) to counteract the fact that \(2\times 2\) contingency tables are not continuous) states that the occurrence of a given word is independent of the polarity (positive or negative) of the review. Hence, words for which we can reject this null hypothesis are used distinctively in either positive or negative reviews. For this analysis, we consider the top twenty (Footnote 8) significant words in positive and negative reviews with regard to their \(\chi ^2\) values and their relative frequencies in reviews in order to decide in which set of reviews their usage is significantly higher, respectively for German and English reviews. To evaluate if top words reflect hygiene and motivation factors, we compute overlaps with words that, according to the literature and the theory, are mentioned frequently and are of high relevance in connection with both sets of factors. More precisely, we let three independent annotators read a selection of existing English studies investigating the Two-Factor Theory (Oladotun and Öztüren 2013; Malik and Naeem 2013; Smerek and Peterson 2007; ul Islam and Ali 2013; Alshmemri et al. 2017; Gawel 1996; Bassett-Jones and Lloyd 2005) with the aim of selecting important words related to both hygiene and motivation factors, leaving us with two sets of words. 
Annotators independently found 50 distinct words for hygiene factors, of which we keep 22 words that were found by at least two of them. For motivation factors, annotators identified 35 distinct words, of which we consider 11 words found by at least two of them. We then translate words to German and extend the sets of either language by adding synonyms manually selected using Wiktionary.org and Thesaurus.com. We refer to Table 5 in the Appendix section for a complete list of extracted German and English words related to hygiene and motivation factors. To evaluate the overlap of words from our automatic subgroup discovery with words extracted by annotators, we compute the Jaccard Index (Footnote 9) respectively for positive and negative words, each aspect and language.
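The word-level test described above can be sketched for a single word as follows. This is an illustrative implementation of a Yates-corrected chi-squared test on a 2×2 table, not the authors' code; the function name and the example counts are hypothetical. The p-value uses the fact that, for one degree of freedom, the chi-squared survival function equals erfc(sqrt(x / 2)).

```python
import math


def yates_chi_squared(count_pos, total_pos, count_neg, total_neg):
    """Yates-corrected chi-squared statistic and p-value for one word.

    count_pos / count_neg: occurrences of the word in positive / negative reviews.
    total_pos / total_neg: total number of words in positive / negative reviews.
    """
    a, b = count_pos, count_neg              # the word itself
    c, d = total_pos - a, total_neg - b      # all other words
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0                      # degenerate table, nothing to test
    # Continuity correction, clamped so it never overshoots zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function for 1 degree of freedom
    return chi2, p_value


# Hypothetical word: 30 of 1,000 words in positive reviews, 10 of 1,000 in negative reviews.
chi2, p_value = yates_chi_squared(30, 1000, 10, 1000)
```

Comparing the relative frequencies `count_pos / total_pos` and `count_neg / total_neg` then indicates in which corpus a significant word is over-represented.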

To check our results for robustness, we repeat this experiment and compare manually extracted words with the top words from the subgroup discovery by considering their word embeddings, allowing us to incorporate semantics in this analysis. Specifically, we use pre-trained German and English vectors (Grave et al. 2018) from the fastText library (Footnote 10). We average vectors over words contained in a respective group (i.e., manually extracted German words, manually extracted English words, top positive German words, top negative German words, top positive English words and top negative English words) and compute cosine similarities between mean vectors from top words and manually extracted words.
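The embedding-based comparison amounts to averaging word vectors per group and taking the cosine similarity of the two means. The sketch below uses made-up three-dimensional vectors in place of the pre-trained fastText vectors; loading the actual embeddings is deliberately omitted, and the helper names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equally sized vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Toy embeddings (hypothetical): two top words and one manually extracted word.
top_word_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
manual_word_vectors = [[1.0, 1.0, 0.0]]

similarity = cosine_similarity(mean_vector(top_word_vectors), mean_vector(manual_word_vectors))
```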

H5 (Generalization).

There is previous work providing evidence both for (Lodahl 1964; Cummings 1975) as well as against (Behling et al. 1968; Furnham et al. 1999) the generalization of the Two-Factor Theory. To shed light on how our previous findings generalize, we study the following three hypotheses:

H5.1::

Results are independent of cultural context.

H5.2::

Results are independent of industry.

H5.3::

Results are independent of employment status (i.e., current or former).

To test these hypotheses, we conduct the same analyses as for previous hypotheses respectively for each country and industry contained in our dataset as well as for reviews from current and former employees. We quantify results by counting the number of times the two motivation factors are among the top (H1 and H2) five aspects and the bottom (H3) five aspects according to differences in median between positive and negative reviews respectively. This leaves us with 8 cases for cultural differences (2 motivation aspects times 4 countries), 86 cases for industrial differences (2 motivation aspects times 43 industries) and 4 cases for employment status differences (2 motivation aspects times 2 statuses).

3.2.4 Prediction

We now investigate the applicability of our findings from the hypothesis tests by conducting a prediction experiment. For that, we leverage the features computed for our previous analyses, allowing us to investigate the predictiveness of review aspects for employee satisfaction on a company level. More precisely, we want to predict whether or not a company has high or low employee satisfaction by exploiting the content as well as stylistic characteristics of reviews. Our results uncover that both content and style of aspect reviews are predictive of employee satisfaction.

Experimental setup.

We first split our dataset according to the language of reviews, leaving us with a German and English subset of reviews. Next, we aggregate review texts over individual companies and remove all companies with an aggregated review length of less than 10,000 characters (to ensure meaningful values for textual features), respectively for the German and English subset. This leaves us with 7,148 companies located in Austria, Germany and Switzerland as well as 904 companies located in the USA. We frame our prediction task as a binary classification problem, predicting either a high or low employee satisfaction for a company. We define this low and high employee satisfaction through the reviews individual companies had received. In particular, we consider the number of positive and negative reviews (according to Section 3.2.2) per company and employ a majority vote to decide whether a company has a high or low employee satisfaction level. As such, companies with a higher ratio of positive reviews are considered to have a high employee satisfaction, whereas companies with a higher ratio of negative reviews are considered to have low employee satisfaction. In cases of equal numbers of positive and negative reviews, we exclude companies from our prediction task, arguing that these companies have neutral employee satisfaction. After removing undecided cases and labelling companies, we are left with 2,955 companies having a high and 4,067 companies having a low employee satisfaction based on German reviews (minimum # of reviews: 1, maximum # of reviews: 7,090, mean # of reviews: 50.84 over Austrian, German and Swiss companies), as well as 450 and 430 companies, respectively for English reviews (minimum # of reviews: 1, maximum # of reviews: 1,148, mean # of reviews: 100.46 over companies located in the USA). 
To assess the predictive power of factors, we utilize the top positive and negative words extracted for individual aspects during the testing of H4 (Content) and count their occurrences in aggregated review texts to create numeric features for each company.
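The majority-vote labelling described in this setup can be sketched in a few lines. Company identifiers and review counts below are invented for illustration, and the function name is hypothetical.

```python
def label_company(n_positive, n_negative):
    """Majority vote over a company's positive and negative reviews."""
    if n_positive > n_negative:
        return "high"
    if n_negative > n_positive:
        return "low"
    return None  # tie: treated as neutral and excluded from the task


# Hypothetical (positive, negative) review counts per company.
review_counts = {"company_a": (12, 3), "company_b": (4, 9), "company_c": (5, 5)}

labels = {}
for company, (n_pos, n_neg) in review_counts.items():
    label = label_company(n_pos, n_neg)
    if label is not None:  # drop undecided companies
        labels[company] = label
```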

Table 4 Ratios of reviews with optional review text

Feature spaces.

Since we are interested in the differences of significance between aspects assigned to hygiene, motivation and both factors, we accordingly separate factors into three different feature spaces. In particular, we consider the total count of the top twenty positive and the top twenty negative words for each aspect and group these counts according to aspects and their assigned factor. This leaves us with three feature spaces comprising counts of top words respectively for aspects assigned to hygiene, motivation and both factors. We complement these three feature spaces by including: textual features comprising the mean sentiment, the mean Flesch reading ease and the mean number of words over positive and negative reviews of a respective company, and the combination of all of the above feature spaces. Finally, we consider TF-IDF (minimum document frequency: \(10\%\); maximum document frequency: \(80\%\); maximum number of words: 5,000; stop words removed) vectors of reviews combined with textual features, representing the upper limit and an approach dissociated from the Two-Factor Theory. This allows us to compare the predictive power of words related to hygiene and motivation factors with general bags-of-words.
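One way to organize the three aspect-based feature spaces is sketched below. The aspect-to-factor mapping lists only aspects whose assignment is stated in the text; the tokenization, the data layout (one aggregated text per aspect and company) and the example words are assumptions for illustration.

```python
import re
from collections import Counter

# Partial mapping; the remaining aspects would be added analogously.
ASPECT_FACTOR = {
    "career development": "motivation",
    "freedom to work independently": "motivation",
    "environmental friendliness": "hygiene",
    "support from management": "both",
}


def count_top_words(text, top_words):
    """Total number of occurrences of the given top words in a text."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    return sum(counts[word] for word in top_words)


def build_feature_spaces(aspect_texts, aspect_top_words):
    """Group (positive count, negative count) features by the factor of each aspect."""
    spaces = {"hygiene": {}, "motivation": {}, "both": {}}
    for aspect, text in aspect_texts.items():
        words = aspect_top_words[aspect]
        spaces[ASPECT_FACTOR[aspect]][aspect] = (
            count_top_words(text, words["positive"]),
            count_top_words(text, words["negative"]),
        )
    return spaces


# Hypothetical aggregated texts and top words for one company.
aspect_texts = {
    "career development": "good training and promotion but no promotion plan",
    "support from management": "boss ignores feedback",
}
aspect_top_words = {
    "career development": {"positive": {"training", "promotion"}, "negative": {"plan"}},
    "support from management": {"positive": {"support"}, "negative": {"boss", "feedback"}},
}
spaces = build_feature_spaces(aspect_texts, aspect_top_words)
```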

Addressing imbalanced factor assignment.

To compensate for the imbalanced factor assignment (8 aspects assigned to hygiene factors, 2 aspects assigned to motivation factors and 3 aspects assigned to both factors), we further introduce a feature space with subsampled hygiene factors. More precisely, we randomly select two aspects assigned to hygiene factors 1,000 times, allowing us to make a fair comparison of hygiene and motivation factors.
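The subsampling step can be sketched as repeated draws of two hygiene aspects without replacement. The aspect names are placeholders and the fixed seed is an assumption added for reproducibility.

```python
import random


def subsample_hygiene_aspects(hygiene_aspects, k=2, n_runs=1000, seed=0):
    """Draw k distinct hygiene aspects per run, repeated n_runs times."""
    rng = random.Random(seed)  # local generator keeps the draws reproducible
    return [tuple(sorted(rng.sample(hygiene_aspects, k))) for _ in range(n_runs)]


# Placeholder names for the 8 aspects assigned to hygiene factors.
hygiene_aspects = [f"hygiene_aspect_{i}" for i in range(1, 9)]
draws = subsample_hygiene_aspects(hygiene_aspects)
```

Each draw then defines one reduced hygiene feature space of the same size as the motivation feature space.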

Evaluation.

We conduct our prediction task using logistic regression with \(\ell _2\) regularization (to prevent overfitting the training data) as implemented in scikit-learn (Footnote 11). To evaluate our models, we split data into train and held-out test sets (80 to 20 percent ratio) multiple times by performing stratified random sampling over 20 random runs, respectively for each feature space. We report mean balanced accuracy (defined as the average of recall obtained for both classes and suitable for imbalanced datasets) over random runs for each feature space. In case of our subsampled hygiene factors, we report mean values over the 1,000 random runs.
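To make the evaluation metric concrete, balanced accuracy as defined above (the mean of per-class recall) can be written out directly. This is a plain-Python sketch for illustration; the labels are invented.

```python
def balanced_accuracy(y_true, y_pred):
    """Average of the recall obtained for each class present in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, y in enumerate(y_true) if y == label]
        correct = sum(y_pred[i] == label for i in indices)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical test split: recall is 0.5 for "high" and 1.0 for "low".
y_true = ["high", "high", "low", "low", "low", "low"]
y_pred = ["high", "low", "low", "low", "low", "low"]
score = balanced_accuracy(y_true, y_pred)  # 0.75, whereas plain accuracy would be 5/6
```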

We compare results with an improved baseline determining the employee satisfaction of a company based on that of other companies operating in the same country and industry. For example, the employee satisfaction of a marketing company operating in Austria would be positive if the majority of all other companies operating in marketing and Austria would be positive, or negative otherwise. For that, we again conduct the 20 random train and test set splits and report the mean balanced accuracy over these random runs.
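A minimal sketch of such a group-majority baseline is given below. It assumes that the majority is determined on the training split and that groups unseen in training default to the negative class, following the "negative otherwise" rule; both points are assumptions, as are the function names and example rows.

```python
from collections import Counter, defaultdict


def fit_group_majority(train_rows):
    """Count labels per (country, industry) group in the training split."""
    votes = defaultdict(Counter)
    for country, industry, label in train_rows:
        votes[(country, industry)][label] += 1
    return votes


def predict_baseline(votes, country, industry):
    """Predict 'high' only if it is the strict majority label of the group."""
    counts = votes.get((country, industry))
    if not counts:
        return "low"  # assumption: unseen groups fall back to the negative class
    return "high" if counts["high"] > counts["low"] else "low"


# Hypothetical training rows: (country, industry, label).
train_rows = [
    ("Austria", "marketing", "high"),
    ("Austria", "marketing", "high"),
    ("Austria", "marketing", "low"),
    ("Germany", "IT", "low"),
]
votes = fit_group_majority(train_rows)
prediction = predict_baseline(votes, "Austria", "marketing")
```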

Fig. 3

Results for H1 (Attention). The figure depicts the results for our two hypotheses (H1.3 and H1.4) focusing on the attention devoted by satisfied and dissatisfied employees towards review aspects. We expect that motivation factors receive more attention from satisfied employees and hygiene factors more from dissatisfied employees. With the box plot we illustrate the distributions of the number of characters in reviews and list aspects in descending order (top to bottom) by the difference in medians between positive (green color) and negative (red color) reviews. Vertical black lines indicate the median and the first and third quartile. Whiskers indicate minimum and maximum values still within 1.5 interquartile ranges. Stars indicate the significance of differences between positive and negative reviews based on two-sided Mann-Whitney-Wilcoxon tests (*: p-value \(\le 0.05\), **: p-value \(\le 0.01\), ***: p-value \(\le 0.001\), ****: p-value \(\le 0.0001\)). We find motivation factors to be more relevant in positive reviews. On the contrary, for negative reviews, we observe that hygiene factors are more relevant than motivation factors, supporting both H1.3 and H1.4

4 Empirical Results

4.1 Hypotheses

We now describe the results for individual hypotheses and provide an overview of whether or not we find support for them in Table 1.

H1 (Attention).

We list the ratios of aspect reviews having optional review text in Table 4 and report that ratios are higher for negative reviews across all aspects. This indicates that employees with negative experiences are more inclined to write review text than satisfied employees, independent of aspects and their assignments to hygiene and motivation factors. We argue that this observation could be due to the well-known negativity bias (Baumeister et al. 2001; Hilbig 2009; Rozin and Royzman 2001), which suggests a general tendency of people to focus on negative experiences.

Regarding H1.1, we observe that satisfied employees do not write more reviews for aspects related to motivation factors as compared to other aspects, suggesting a rejection of our hypothesis. However, in case of H1.2, we see that dissatisfied employees tend to write more reviews for aspects related to hygiene factors, thus, indicating support for this hypothesis.

Fig. 4

Results for H2 (Sentiment). The figure illustrates the results for our two hypotheses (H2.1 and H2.2) which focus on the sentiment conveyed in aspect reviews written by satisfied and dissatisfied employees. We expect that reviews from satisfied employees on motivation factors are positive and reviews from dissatisfied employees on hygiene factors are negative. With the box plot we illustrate the distributions of sentiment conveyed in reviews. In general, we find that, as expected, satisfied employees express a more positive sentiment as compared to dissatisfied employees. While we find a more negative sentiment from dissatisfied employees towards hygiene factors as expected, we also find more positive sentiment from satisfied employees for hygiene factors instead of motivation factors. Thus, these findings provide support against H2.1 and for H2.2 and even suggest higher relevance of hygiene factors for satisfied employees than what we expected based on the Two-Factor Theory

In Fig. 3, we illustrate the review length (in characters) distributions for positive and negative employer reviews, respectively for each of the 13 aspects. We verify the significance of differences between positive and negative distributions for each aspect by computing two-sided (Footnote 12) Mann-Whitney-Wilcoxon tests at the Bonferroni-corrected significance level of \(\alpha =0.004\) (corresponding to 13 aspect comparisons at \(\alpha =0.05\)). We report p-values smaller than our significance level for 11 out of the 13 aspects, meaning that their difference in text length between positive and negative reviews is significant. The two aspects with non-significant differences are company image (p-value \(=0.24\)) and attitude towards older colleagues (p-value \(=0.06\)).
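For reference, the corrected significance level follows from dividing the nominal level by the number of comparisons, where the symbol \(m\) for the number of aspect comparisons is introduced here only for illustration:

\[ \alpha _{\text {corrected}} = \frac{\alpha }{m} = \frac{0.05}{13} \approx 0.0038 \approx 0.004 \]

An aspect is therefore counted as significant only if its p-value falls below this threshold, which is why a p-value of 0.06 or 0.24 is reported as non-significant.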

Specifically, we observe the largest positive differences in medians for the motivation factors career development and freedom to work independently as well as the hygiene factor environmental friendliness. This suggests that satisfied employees devote significantly more attention towards motivation factors as compared to dissatisfied employees and supports H1.3. We discuss the case of environmental friendliness in detail in the Discussion section of the paper. For the majority of hygiene factors, we report that they are more relevant in negative reviews which supports H1.4. Notably, the aspect support from management (assigned to both hygiene and motivation factors) has the largest negative difference in medians, suggesting that dissatisfied employees write more about issues related to management than satisfied employees.

Overall, we confirm H1.2, H1.3 and H1.4, suggesting that motivation factors may be more relevant to satisfied employees while hygiene factors may be more relevant to dissatisfied employees.

Fig. 5

Results for H3 (Readability). The figure depicts the results for the two hypotheses (H3.1 and H3.2) focusing on the readability of reviews from satisfied and dissatisfied employees. We expect reviews from satisfied employees on motivation factors to be easier to read and reviews from dissatisfied employees on hygiene factors to be harder to read. With the box plot, we illustrate the distributions of the Flesch reading ease (higher values mean easier to read) of reviews. We find that reviews on motivation factors are harder to read as compared to hygiene factors. These results refute our two hypotheses and are contrary to findings in previous research which showed that reviews from dissatisfied people are harder to read

H2 (Sentiment).

We depict sentiment distributions for positive and negative reviews respectively for each aspect in Fig. 4. To assess the significance of differences between medians, we again compute two-sided Mann-Whitney-Wilcoxon tests with our corrected significance level \(\alpha =0.004\). We report significant differences for 12 out of 13 cases, with differences for attitude towards older colleagues (median of positive reviews: \(-0.09\); median of negative reviews: \(-0.10\)) being non-significant at a p-value of 0.36, suggesting that ageism is generally perceived more negatively.

Overall, we find that, as expected, the sentiment conveyed by reviews from satisfied employees is more positive as compared to the sentiment of reviews from dissatisfied employees (all differences in medians are positive). Further, we report that, in accordance with our hypothesis H2.2, aspects assigned to hygiene factors are more negatively perceived in reviews from dissatisfied employees (medians for aspects assigned to hygiene factors ranging from \(-0.26\) to \(-0.14\)) as compared to those from satisfied employees (medians for aspects assigned to hygiene factors ranging from 0 to 0.13), which are rather neutral. However, we observe that our results regarding H2.1 are inconclusive. For freedom to work independently and career development we report a neutral sentiment in positive reviews with medians of 0.08 and 0, respectively. This indicates that satisfied employees, in contrast to our expectations, do not write more positively about aspects assigned to motivation factors. In fact, we observe that hygiene factors are perceived more positively as compared to motivation factors in reviews from satisfied employees, suggesting that the former may not only prevent the occurrence of dissatisfaction but also foster satisfaction. As such, hygiene factors may be even more important than originally envisioned by Herzberg. When considering sentiment of both positive and negative aspect reviews combined, we report that reviews on aspects assigned to motivation factors are overall perceived more positively as compared to the majority of aspects assigned to hygiene factors, as dissatisfied employees are more neutral towards motivation factors.

Overall, we reject H2.1 and find strong support for H2.2 as well as a higher relevance of hygiene factors than initially expected based on the Two-Factor Theory.

Fig. 6

Results for H4 (Content). The figure depicts the results for our two hypotheses (H4.1 and H4.2) focusing on the content of aspect reviews. We expect that the content of negative reviews reflects hygiene factors whereas the content of positive reviews reflects motivation factors. To analyse this hypothesis, we extract top words that are distinctively used in positive and negative reviews and compute the Jaccard Index to infer their overlap with manually extracted words (see Table 5 in Appendix A.3) related to hygiene and motivation factors, respectively for positive and negative reviews as well as each aspect. In Fig. 6, we illustrate results for German reviews and find larger overlaps for words related to motivation factors with words from positive reviews (strong support for H4.1). On the contrary, we report larger overlaps for words related to hygiene factors and negative top words (strong support for H4.2). We find similar results for English top words (see Fig. 6). Note that words related to motivation factors were also used by satisfied employees for reviewing aspects related to hygiene factors, indicating their high relevance in online employer reviews

H3 (Readability).

In Fig. 5, we illustrate the Flesch reading ease distributions for positive and negative reviews, respectively for each aspect. We again check for the significance of differences in a similar fashion to the analysis of H1 (Attention) and H2 (Sentiment). Here, we find 7 out of 13 cases to be non-significant (support from management, gender equality, attitude towards older colleagues, environmental friendliness, teamwork, company image and overall compensation for your work).

Contrary to our expectations, reviews from dissatisfied employees are in general easier to read than reviews from satisfied employees (with the exception of support from management), providing different results compared to previous studies (Zhao et al. 2019; Korfiatis et al. 2008, 2012). Most notably, reviews from satisfied employees on aspects assigned to motivation factors are among the top three of hard to read aspect reviews with medians ranging between 43.68 and 43.85. This suggests a rejection of our hypotheses H3.1 as well as H3.2 and indicates substantial differences between the behavior of reviewing products and reviewing employers.

Connecting these results with the ones from H2 (Sentiment), we again observe that hygiene factors may be more important than initially thought as satisfied employees write more complex reviews for both hygiene and motivation factors when compared to dissatisfied employees. Following the results from H1.1 and H1.2, which indicate that dissatisfied employees would rather write optional review text, we observe that hygiene factors may be fundamentally important for both preventing dissatisfaction and fostering satisfaction and that motivation factors become relevant only when employees have higher ambitions to develop their careers. A potential explanation for the more complex reviews on aspects assigned to motivation factors may be that the subjects related to them are more complicated to describe or that they are only relevant to formally educated employees who may engage in more critical thinking. Further corroborating this idea, we observe the largest negative median differences for aspects assigned to motivation factors, suggesting that satisfied employees write particularly complex reviews about motivation factors.

H4 (Content).

We depict the results for our fourth hypothesis in Fig. 6. Overall, we observe, as expected, only minimal overlaps with a mean Jaccard index of 0.02 for German top words and a mean Jaccard index of 0.03 for English top words. When considering the differences across hygiene and motivation factors as well as positive and negative top words, we find support for both H4.1 and H4.2. In particular, we report a higher mean Jaccard index for negative top words and words related to hygiene factors in both German and English cases (German & positive: 0.01; German & negative: 0.04; English & positive: 0.02; English & negative: 0.04). On the contrary, we observe opposite behavior for words related to motivation factors, for which positive top words have a higher mean Jaccard index for both languages (German & positive: 0.02; German & negative: 0.01; English & positive: 0.05; English & negative: 0.00). Similar to the results of H2 (Sentiment) and H3 (Readability), we observe that words related to hygiene factors are also relevant in reviews from satisfied employees, though less prominent as compared to reviews from dissatisfied employees. Further strengthening these findings, we report that words related to motivation factors are also used in reviews for aspects assigned to hygiene factors, indicating a high relevance of hygiene factors in online employer reviews.
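The Jaccard index behind these overlap values is the size of the intersection of two word sets divided by the size of their union. The sketch below uses invented word sets rather than the extracted top words or the annotated lists in Table 5.

```python
def jaccard_index(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(set_a), set(set_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Hypothetical word sets: one shared word out of seven distinct words.
top_negative_words = {"salary", "overtime", "boss", "chaos"}
hygiene_words = {"salary", "policy", "supervision", "security"}
overlap = jaccard_index(top_negative_words, hygiene_words)
```

Because the union grows with every non-shared word, comparing twenty top words against a few dozen theory-related words naturally yields small values, which fits the minimal overlaps reported above.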

Regarding the results of word similarities based on word embeddings, we find trends which are similar to our results based on the Jaccard Index (see Fig. 9 in Appendix A.3). For German, we report higher similarities between words related to hygiene factors and negative top words for 9 of 13 aspects and higher similarities between words related to motivation factors and positive top words for 7 aspects. For English, we find words related to hygiene factors to be more similar to negative top words for 8 aspects and words related to motivation factors to be more similar to positive top words for 10 aspects. This suggests that our results are robust for semantics and further strengthens the support for H4.1 and H4.2.

Fig. 7

Results for H5 (Generalization). The figure depicts the results for our hypotheses (H5.1, H5.2 and H5.3) focusing on the generalization of our previous findings (H1 to H3). We expect that the two aspects assigned to motivation factors freedom to work independently (orange color and \({\times }\) marker) and career development (blue color and \(\bullet\) marker) receive more attention (H1) and a more positive sentiment (H2) while being harder to read (H3) in positive reviews as compared to remaining aspects (gray color) and negative reviews. We find support for H1 (Attention) across cultural (a), industrial (b) and employment status (c) differences. Results for H2 (Sentiment) and H3 (Readability) are inconclusive among the three comparisons

H5 (Generalization).

We depict results for our hypothesis on the generalization of previous findings for H1 (Attention) in Fig. 7. Starting with cultural differences (H5.1; see Fig. 7), we report more attention devoted towards aspects assigned to motivation factors in positive reviews for Germany and Switzerland. In case of Austria, only career development receives more attention in positive reviews, while employees in the USA devote more attention towards all aspects in negative reviews. However, for all eight cases the median differences of aspects assigned to motivation factors are among the top five, providing similar results as observed for H1 (Attention). Regarding H2 (Sentiment), we observe that positive reviews convey a more positive sentiment for all aspects and countries. Exceptions are Attitude towards older colleagues for Austria (median difference \(=-0.52\)) and Internal Communication for Switzerland (median difference \(=-0.03\)) as these aspects received more attention in negative reviews. However, quantifying our results for H2 (Sentiment) and H3 (Readability), we find inconclusive results (3 of 8 and 5 of 8 cases respectively), indicating differences across countries when considering the sentiment conveyed by and the readability of reviews.

In Fig. 7, we report results for H5.2 and selected industries (based on largest positive and negative median differences for both aspects assigned to motivation factors). We observe longer positive reviews for both aspects assigned to motivation factors for industries media, construction and legal services, while for the public sector we observe longer negative reviews for these two aspects. Results for H2 (Sentiment) and H3 (Readability) are inconclusive again. Overall, we find support in 80 out of 86 cases for H1 (Attention), 41 out of 86 cases for H2 (Sentiment) and 34 out of 86 cases for H3 (Readability).

Finally, for attention differences between current and former employees (H5.3; see Fig. 7), we report more attention devoted towards aspects related to motivation factors in positive reviews from current and former employees (H1; 4 out of 4 cases). For H2 (Sentiment), we observe (in general) more negative reviews from former employees, suggesting that they may air their frustrations after termination. Distinguishing between hygiene and motivation factors is, similar to other comparisons, inconclusive (1 out of 4 cases). We find positive reviews on aspects assigned to motivation factors harder to read for both current and former employees (H3; 4 out of 4 cases), suggesting no difference between them.

Fig. 8

Prediction task results. The figure illustrates the results from our prediction task aiming to predict employee satisfaction on a company level. Balanced accuracy is measured over twenty random train and test splits, respectively for each feature space and German (left-hand side) and English (right-hand side) reviews. The vertical dashed black lines indicate results for our improved baseline and error bars indicate bootstrapped \(95\%\) confidence intervals. Overall, prediction performance is good, with variations across feature spaces for both languages. Regarding German reviews (a), we observe that features from aspects assigned to hygiene factors achieve the best performance, which is slightly outperformed by the combination of features from all aspects. In case of English reviews (b), we report similar results, except that the combination of all aspect feature spaces does not yield better prediction performance as compared to the model with aspects linked to hygiene factors only. Even though textual features (i.e., text length, readability and sentiment) perform worse compared to features from aspects, the combination of all aspect and textual features achieves the best performance, respectively for German and English reviews. Note that for both languages models with words related to the Two-Factor Theory perform almost as well as general bag-of-words models (TF-IDF + Textual Features) comprising many times more words and, hence, more information. This further highlights the predictive strengths of theory related words

Summary of hypotheses findings.

We find that satisfied employees devote more attention to motivation factors (strong support for H1.3), whereas dissatisfied employees devote more attention to hygiene factors (strong support for H1.4), which reflects the original definition of Herzberg’s Two-Factor Theory. Regarding sentiment conveyed in reviews, we find that dissatisfied employees write more negatively about hygiene factors (support for H2.2) and satisfied employees, contrary to our expectations, write more positively about hygiene factors instead of motivation factors (rejection of H2.1). Our results for readability of reviews refute our initial expectations of harder to read reviews from dissatisfied employees on hygiene factors and reflect the exact opposite behavior with harder to read reviews from satisfied employees on hygiene and motivation factors (rejection of H3.1 and H3.2). Further, we report that satisfied reviewers tend to specifically mention words related to motivation factors in reviews for aspects assigned to both hygiene and motivation factors (support for H4.1), whereas dissatisfied reviewers mostly mention words related to hygiene factors (support for H4.2). When investigating the generalization of previous hypotheses, we observe that some of our findings generalize across cultural, industrial and employment status differences (weak support for H5.1, H5.2 and H5.3). Overall, we find hygiene factors to be more relevant and important than motivation factors in the context of online employer reviews.

4.2 Prediction

We depict performance results for each feature space in Fig. 8. In general, we report accurate prediction performance with a mean balanced accuracy of at least 0.84 and 0.85 over all models, respectively for German and English reviews. As such, we outperform our improved baseline for German (0.65) by at least 0.19 and for English (0.68) by at least 0.17.

Inspecting the predictive power of review aspects linked to either hygiene, motivation or both factors, we report highest performances for our hygiene factors feature space with a mean balanced accuracy of 0.86 for German and English reviews. We observe slightly worse performance for models using features from aspects assigned to motivation or both factors in case of both languages. However, this seems to be an artifact from the limited number of aspects assigned to these factors as our models using the subsampled hygiene factors performed insignificantly better than these two. When combining the different features from all aspects (i.e., aspects assigned to hygiene and motivation factors), we do not see any further improvements compared to the performance of hygiene factors, signaling yet again the importance of hygiene factors in online employer reviews.

When we consider textual features for prediction models, we report lower performances with a mean balanced accuracy of 0.82 for German and 0.80 for English. However, when combining all features from all aspects as well as textual features, we could improve the mean balanced accuracy to 0.88 for German and 0.86 for English. In particular, for German reviews, note that the performance of the “All Combined” feature space is significantly better than the “All Aspects” one (bootstrapped \(p < 0.001\)), suggesting that capturing textual content and sentiment contained in reviews results in the best performance when predicting employee satisfaction for a company. Comparing the models based on the Two-Factor Theory to our dissociated TF-IDF approach including textual features, we observe only small deficits of 0.2 for German and 0.01 for English. This indicates that a small number of words related to hygiene and motivation factors can describe employee satisfaction in online reviews almost as well as all words.
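To illustrate how a bootstrapped comparison of two feature spaces can be carried out, the following Python sketch implements one common variant, a paired bootstrap over per-run score differences. It is a simplified illustration under stated assumptions and not necessarily the exact procedure used in our experiments; the function name, the number of resamples and the scores are hypothetical.

```python
import random


def paired_bootstrap_p(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Estimate a one-sided p-value for the claim that A is better than B.

    scores_a and scores_b are paired scores (e.g., one pair per evaluation
    run). We resample the paired differences with replacement and count how
    often the resampled mean difference is not positive.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    not_better = 0
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n <= 0:
            not_better += 1
    # Add-one smoothing avoids reporting a p-value of exactly zero.
    return (not_better + 1) / (n_resamples + 1)


# Hypothetical balanced accuracies per evaluation run (not results from the study).
all_combined = [0.88, 0.87, 0.89, 0.88, 0.90]
all_aspects = [0.86, 0.86, 0.87, 0.85, 0.88]
p_value = paired_bootstrap_p(all_combined, all_aspects)
```

Because the comparison is paired, only the differences between the two feature spaces on the same evaluation run enter the test, which removes variation that both models share.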

Overall, we note that review content of aspects related to hygiene factors has high predictive power (equal to considering content of all review aspects) for employee satisfaction, further suggesting their high relevance in online employer reviews.

5 Discussion

We now discuss our findings and briefly address possible ethical implications as well as limitations of our work.

Online employer reviews through the lens of Herzberg’s Two-Factor Theory.

Overall, considering the results for H1 to H5 and keeping in mind the vast number of different employers and industries contained in our dataset as well as the limitations discussed in this section, we observe that hygiene factors are more relevant to reviewers than motivation factors. Thus, when analyzing such reviews in future work, we suggest focusing on hygiene factors, as motivation factors seem to be only of incidental relevance to reviewers. In particular, our analysis revealed that, in line with the original theory, hygiene factors attract more attention from dissatisfied employees while motivation factors attract more attention from satisfied employees (see Fig. 3). This observation reflects the connection of the Two-Factor Theory to Maslow’s hierarchy of needs (Gawel 1996), suggesting that fundamental needs have to be satisfied before employees become motivated to strive for greater things. However, we also found that hygiene factors have the potential to increase satisfaction although, according to the original theory, they should only prevent dissatisfaction. Most notably, we showed that satisfied employees perceive hygiene factors more positively than motivation factors (see Fig. 4) and that they use terms related to motivation factors in reviews of aspects assigned to hygiene factors (see Fig. 6). To us, online employer reviews on kununu suggest that employers can satisfy the majority of their employees by fulfilling hygiene factors, whereas motivation factors may need to be fulfilled only for the minority of employees who want to climb the career ladder. Another possible explanation could be that some hygiene factors became more important over time and transitioned into motivation factors, as current working conditions differ considerably from those of 1959.
In any case, our results indicate that hygiene factors are more important and relevant to online reviewers of employers and that motivation factors are considered incidental by them or are only relevant to a minority of reviewers. This may be a phenomenon specific to online employer reviews, or it may point to a new (modern) interpretation of the theory. We suggest further (offline) studies to determine more precisely how our observations should be interpreted.

Regarding the three review aspects that our annotators assigned to both hygiene and motivation factors, our results suggest that they are more similar to aspects assigned to hygiene factors than to aspects assigned to motivation factors, further supporting a higher relevance of hygiene factors. For example, by manually inspecting selected aspect reviews of support from management, we see that reviewers mostly use this rating aspect to comment negatively on their supervisors or bosses and that they do not specifically address the points mentioned in the description provided by kununu. As such, reviewers neglect the part concerning involvement in decision-making processes, potentially explaining why this aspect is not related to motivation factors.

We now briefly discuss the aspect Environmental Friendliness, which receives the second most attention in positive reviews (see Fig. 3). Here, we argue that this may be a reflection of high environmental standards and awareness in European countries. Since more than \(67\%\) of reviews in our dataset originate from these countries, this may explain the general relevance of this aspect in the positive reviews contained in our dataset. Further, a recent increase in media coverage of climate change may also contribute to this observation. To test this assumption, we investigate the ratio of reviews with optional text for Environmental Friendliness over the years. We find that, while in 2015 only \(6\%\) of reviews included dedicated texts for this aspect, this share had already risen to \(17\%\) in 2019, strengthening our explanation of increased environmental awareness among reviewers and indicating the importance for employers to take action. Thus, this example highlights the presence of longitudinal effects in shaping individual aspects related to factors that influence employee satisfaction.
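The yearly share described above can be computed with a simple aggregation. The following Python sketch shows one possible way to do so; the record layout (a year paired with an optional aspect text) and the example reviews are hypothetical and do not reflect the actual structure of our dataset.

```python
from collections import Counter


def share_with_text_by_year(reviews):
    """Return, per year, the share of reviews with a non-empty aspect text.

    reviews is an iterable of (year, aspect_text) pairs, where aspect_text is
    None or a string; whitespace-only strings count as missing.
    """
    totals = Counter()
    with_text = Counter()
    for year, text in reviews:
        totals[year] += 1
        if text and text.strip():
            with_text[year] += 1
    return {year: with_text[year] / totals[year] for year in sorted(totals)}


# Hypothetical records for the aspect Environmental Friendliness.
reviews = [
    (2015, None),
    (2015, "Good recycling policy"),
    (2015, ""),
    (2015, None),
    (2019, "Solar panels on the roof"),
    (2019, "   "),
]
shares = share_with_text_by_year(reviews)
```

Treating empty and whitespace-only texts as missing is a design choice of this sketch; a different convention would shift the resulting shares.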

Finally, we discuss discrepancies regarding readability between previous studies on product reviews (Zhao et al. 2019; Korfiatis et al. 2008, 2012), in which dissatisfied reviewers write harder to read reviews, and our results, in which satisfied reviewers write harder to read reviews. Here, we argue that the harder to read positive reviews on aspects assigned to motivation factors could be due to the fact that such factors are more likely to be granted to higher employee positions or only in specific industries that have to deal with more complex matters. To test this assumption, we assess the mean Flesch reading ease of reviews for aspects related to motivation factors, separately for each employee position and each industry. We find reviews from co-ops (i.e., employees who simultaneously study and work part-time) and managers to be hardest to read and reviews from apprentices and temporaries to be easiest to read. In the case of industries, we find harder to read reviews for sectors that may require formal education, such as tax consulting and auditing or software engineering. Further, we observe easier to read reviews for health, wellness & fitness or food production & farming, which may have fewer formal requirements. These results indicate that harder to read reviews from satisfied employees may indeed be due to more advanced critical thinking and language skills acquired through formal education. An in-depth study of this observation might be a promising research avenue for future work.
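As background for interpreting these comparisons, the Flesch reading ease score in its standard English formulation is commonly given as follows, where higher scores indicate text that is easier to read. We state the formula only for reference; language-specific adaptations (e.g., for German) use different constants, and the formula below should not be read as the exact variant applied to every language in our analysis.

\[
\mathrm{FRE} = 206.835 - 1.015 \cdot \frac{\#\,\text{words}}{\#\,\text{sentences}} - 84.6 \cdot \frac{\#\,\text{syllables}}{\#\,\text{words}}
\]

Both longer sentences and longer words lower the score, which is consistent with the interpretation that reviews on more complex matters are harder to read.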

Predictiveness of employee satisfaction.

We demonstrated the predictiveness of employee satisfaction based on a logistic regression model achieving a maximum mean balanced accuracy score of 0.87. By creating different feature spaces, we uncover that review content of aspects linked to either hygiene or motivation factors is equally predictive of employee satisfaction and that only half of the aspects already yield the best performance. Only by integrating textual features could we further increase prediction performance, suggesting that not only the content but also stylistic devices should be considered in prediction models, further corroborating similar findings by Siering et al. (2018). Since our proposed model solely relies on textual characteristics and indicates that we can accurately predict employee satisfaction based on them, we see real-world use cases for both employers and platform providers. The predictive model allows for assessing employee satisfaction expressed in arbitrary texts, for example, stemming from social media platforms, such as Facebook or Twitter. If, for example, an employee references an employer in a Facebook post, said employer could potentially evaluate satisfaction levels through our model. Further, the presented model could help reviewing platforms to circumvent cold start problems by inferring the overall employee satisfaction of employers for which the number of reviews is insufficient. This could be the case for new employers or for employers with few employees, for which the probability of receiving reviews is low. Reviewing platforms could then fall back on other texts found on the Web to take countermeasures and provide more accurate overall ratings for employers.

Ethical implications.

While our work solely intends to learn about online employer reviews in order to benefit employees as well as employers, performing such analyses may still put both of them at risk. For example, employers may attempt to identify reviewers (despite the fact that reviews on kununu are anonymous), as demonstrated in existing works, such as those of Almishari and Tsudik (2012) or Goga et al. (2013), who correlated texts of users to those posted in a non-anonymous context on other online social platforms. This has the potential to negatively impact the careers of both current (e.g., disciplinary transfer by an offended supervisor) and former (e.g., negative reference letters from a former employer) employees. Further, employers may misinterpret the general mood of their employees by relying too strongly on their reviews (e.g., because not all employees are aware of such platforms) and, in doing so, adjust their managerial decisions in a way that may create dissatisfaction among their employees.

Analyses of employer reviews may be used for company valuations, as recent research suggests that online employer reviews may relate to stock returns (Green et al. 2019). Thus, employees may negatively influence valuations by intentionally writing bad reviews (“review bombing” through trolls or bots) or, conversely, employers may trick future investors by whitewashing themselves through faked positive reviews. The manipulation of online reviews to harm or embellish reviewed entities is already a subject of research (Hu et al. 2011; Mayzlin et al. 2014). Finally, existing research highlights the importance of employer branding for job seekers (Cable and Yu 2006; Melián-González and Bulchand-Gidumal 2016). When building recommender systems based on analyses like ours, one must account for biased reviews to prevent discrimination against employers, as such biases may lead to reduced opportunities for employers to recruit well-educated and talented employees.

Limitations.

In our work, we explore reviews found on kununu, one platform among a variety of others providing the possibility to review employers on the Web. While we strongly believe that the amount of data and its variety (i.e., different countries and languages, multiple industries and hundreds of thousands of companies) is appropriate for an analysis like this, we still acknowledge a potential sample and selection bias in the type of people who write reviews on kununu. This bias includes different interpretations of review aspects, for example, across countries. Grasping the full extent of cross-cultural differences calls for further qualitative and quantitative research. As such, the inclusion of other platforms, such as glassdoor.com, may help to generalize our study.

Further, the definition of employee satisfaction (i.e., positive and negative reviews) is based on a threshold and, thus, results may change according to it. However, slightly adjusting this threshold or using an alternative definition based on rating stars (i.e., reviews with two or fewer stars represent negative reviews and reviews with four or more stars represent positive reviews) did not noticeably alter our results. Similarly, the input from our annotators depends on their individual opinions, and results may differ when consulting other experts.
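The alternative star-based definition can be expressed as a small labeling rule. The following Python sketch is an illustration only: the function and parameter names are hypothetical, and leaving reviews between the two cut-offs unlabeled is an assumption of this sketch.

```python
def label_review(stars, negative_max=2.0, positive_min=4.0):
    """Label a review by its overall star rating.

    Ratings at or below negative_max are negative and ratings at or above
    positive_min are positive. Ratings in between return None, which is an
    assumption of this sketch about how such reviews are handled.
    """
    if stars <= negative_max:
        return "negative"
    if stars >= positive_min:
        return "positive"
    return None


# Hypothetical ratings used to show the rule at and around the cut-offs.
labels = [label_review(s) for s in (1.5, 2.0, 3.0, 4.0, 4.7)]
```

Exposing the two cut-offs as parameters makes it straightforward to repeat an analysis with slightly shifted thresholds and to check whether the results remain stable.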

Finally, we test our hypotheses through a selection of methods that quantify various textual characteristics (e.g., dictionary-based sentiment analysis or text readability based on the Flesch reading ease). We acknowledge that this selection of methods is a limitation, as other approaches may yield different results. Thus, trying other text analysis methods may be worth exploring in future work.

Table 5 Extracted words related to hygiene and motivation factors

6 Conclusion

Summary.

In this work, we demonstrated how to apply the Two-Factor Theory to online employer reviews and investigated different characteristics of reviews from satisfied and dissatisfied employees. Overall, we reported that hygiene factors are more relevant to reviewers and that motivation factors are considered incidental or only relevant to a minority of reviewers. While we expected and found that dissatisfied employees devote more attention to hygiene factors and satisfied employees devote more attention to motivation factors, other experiments suggested a higher importance of hygiene factors, contrary to our expectations based on the theory. For example, satisfied employees write more positively about hygiene factors than about motivation factors, which contradicts the definition of the theory. Finally, we inspected the generalization of our findings across cultural, industrial and employment status differences and demonstrated their applicability for predicting employee satisfaction on a company level.

Implications.

Based on our work, scholars could conduct similar analyses and extend the research of employee satisfaction through the Two-Factor Theory to other online employer reviewing platforms. This could contribute to our understanding of the theory and potentially benefit management and organizational sciences as a whole. The results of our work highlight the importance of hygiene factors in online employer reviews. These observations indicate potentially necessary adjustments of the theory’s factor assignments due to temporal changes since the introduction of the theory. Further, our analysis revealed deficiencies in some countries and industries with regard to employee satisfaction, highlighting an opportunity to take appropriate countermeasures based on our results and, thus, create better working conditions for employees. Our prediction experiment uncovers the predictive power of textual review content for employee satisfaction, demonstrating how employers could use such models to complement other feedback channels from their employees.

Future work.

A more detailed investigation of certain aspects, including gender equality and handicapped accessibility, might help to achieve fairer conditions at work. Another promising idea is an in-depth analysis of the temporal component of reviews, including the potential to develop tools that help in better understanding other longitudinal trends in employee satisfaction and the Herzberg theory. For example, one could precisely monitor how the importance of review aspects developed throughout the time span contained in the dataset (2007-2019) and, thus, better understand how hygiene and motivation factors change and shift over time. Further, our analysis provides insights into the needs of individual employees as well as what is offered by industries and companies, opening up possibilities to support individuals’ career choices, similarly to the work of Kern et al. (2019). To further increase the benefits of such an analysis for individual employees, one could consider additional information contained in reviews, such as the employee position of reviewers. In doing so, one could reveal the different needs of, for example, managers or freelancers. Finally, a more precise focus on differences across industries could further increase the impact of our work.

Fig. 9

Word embedding similarity. The figure depicts supplementary results for our fourth hypothesis focusing on the content of aspect reviews. Similarly to Fig. 6, we see higher similarities between words related to hygiene factors and words extracted from negative reviews as well as higher similarities between words related to motivation factors and words extracted from positive reviews, respectively for German and English reviews