Introduction

Although constructive leadership concepts such as transformational leadership (Bass 1985) still dominate leadership research, the dark side of leadership, such as destructive leadership (Einarsen et al. 2007), narcissistic leadership (Rosenthal and Pittinsky 2006) or abusive leadership (Liu et al. 2012), has received heightened attention in recent years. This is in line with our collective memory of organizational misbehavior in recent years, ranging from Enron to the VW emission scandal; however, the field has become increasingly fragmented. Therefore, a deep understanding of the diverging field and its consequences for a number of key organizational outcomes is missing (e.g., Spain et al. 2014). This heterogeneity is reflected in the different levels of analysis, ranging from individual levels such as Chief Executive Officers (Chatterjee and Hambrick 2007) and middle managers (Zhang et al. 2017) to organizational levels (Brown 1997), and in different cultural settings such as Western and Eastern nations (Zhang et al. 2017). In addition, a plethora of outcome or mediating variables have been examined to better understand the effects of destructive leadership, such as organizational resource depletion (Buyl et al. 2017), the adoption rate of technologies (Gerstner et al. 2013) or the organizational likelihood of being involved in lawsuits (O’Reilly et al. 2017). Most of these studies point to the double-edged sword of the field, whereby dark sides of leadership can have negative as well as positive consequences for organizational outcomes under certain boundary conditions (Rosenthal and Pittinsky 2006; Grijalva et al. 2015; Schyns and Schilling 2013).

Although previous research has produced important narrative reviews (Rosenthal and Pittinsky 2006), meta-analytical reviews of a single construct (e.g., narcissism, Grijalva et al. 2015) or social psychological reviews (Paulhus and Williams 2002; Dinić and Jevremov 2019), these approaches may lack the systematic, retrievable nature as well as the breadth needed to fully understand the development of the destructive leadership construct. Moreover, classical reviews may be limited in scope, cover a short time period, analyze a small number of papers or be prone to selection biases (Zupic and Čater 2015), thereby inhibiting our chance to make sense of these heterogeneous literature streams. Furthermore, these studies may be particularly affected by unclear or unspecified definitions of constructs or differentiation of sub-dimensions, a problem with unobservable leadership traits (Antonakis et al. 2016; Tepper 2007). Therefore, in line with previous conceptualizations of the role of bibliometric studies in management (Zupic and Čater 2015), our aim is not to substitute but to complement previous approaches (e.g., narrative reviews).

To address these possible shortcomings, we propose a bibliometric approach and follow an established body of research (Di Stefano et al. 2012; Nerur et al. 2008; Ramos-Rodríguez and Ruíz-Navarro 2004) to provide a comprehensive overview of the intellectual origins and contributors of the destructive leadership literature by quantifying landmark papers (and authors) and their interrelationships. We complement this analysis with a visual clustering of key topics and their shifts over time, a decisive aspect of bibliometric studies (White and McCain 1998). In this way, we hope to better understand the complex and non-linear consequences of destructive leadership.

To the best of our knowledge, this approach has not yet been employed in the field of destructive leadership and organizational science. The first studies employing bibliometric approaches in the subfield treat leadership more broadly (Tal and Gordon 2016) or employ different methodologies (co-citation of authors) without relying on inductive and deductive search strings for their analysis (Dinić and Jevremov 2019). Therefore, our bibliometric study is a valuable state-of-the-art overview for empirical researchers, practitioners, and newcomers to the field. The paper further contributes to a recent stream of research reflecting on the intellectual antecedents and consequences of destructive leadership (Schyns and Schilling 2013) and the question of how to synthesize elective perspectives (Nerur et al. 2016). Finally, the article provides hints on how bibliometric studies may benefit from combining insights from experts and predefined word lists in order to make these studies more robust against biases.

The remainder of the paper is organized as follows. The section "The dark side of leadership" provides a brief conceptualization of dark leadership. The section "Overview of research methodology" describes the methodological background, including the bibliometric methods and the subsequent coding. In the section "Analysis", we briefly describe our research process. The section "Results" provides the results of the analysis and discusses its potential drawbacks.

The dark side of leadership

The dark side of leadership has been subject to scholarly scrutiny for at least the past decade. Researchers have attached terms to the topic such as destructive leadership (Einarsen et al. 2007), narcissistic leadership (Rosenthal and Pittinsky 2006) or toxic leadership (Reed 2004), to name just a few. From these studies, it becomes evident that the dark side can be conceptualized on at least two dimensions, which guide our bibliometric search strategy. First, dark traits can be conceptualized as personality characteristics of top leaders that have implications at the firm level (Conger 1990; Hambrick and Mason 1984). Second, dark traits can be conceptualized as interpersonal characteristics, for instance, as behavior expressed by leaders that affects outcomes at the individual level (i.e., employee outcomes). Tepper (2000, p. 178), using the term abusive supervision, conceptualizes this as the “sustained display of hostile, verbal and nonverbal behaviors, excluding physical contact.”

However, Higgs (2009) describes in his narrative review that there are no currently accepted definitions and delineations of the term destructive leadership. He argues (Higgs 2009, p. 168) that the key words currently used in relation to the umbrella term destructive leadership are “leadership derailment”, “toxic leadership”, “Dark-side leadership”, “abusive leadership” or “destructive leadership”. Similarly, Tepper (2007, p. 262) argues that the literature is fragmented, poorly integrated and uses different methodologies.

In his taxonomy of dark personalities, Paulhus (2014) argues that a dark tetrad of personalities exists, consisting of Narcissism, Machiavellianism, Psychopathy and Sadism. He argues that they can be classified based on the categories Callousness, Impulsivity, Manipulation, Criminality, Grandiosity, and Enjoyment of Cruelty. As a consequence, all constructs overlap on these dimensions to varying degrees but remain distinct constructs. Paulhus and Williams (2002) argue that Narcissism, Machiavellianism and Psychopathy share high impulsivity and thrill-seeking along with low empathy and anxiety. This classification overlaps with that of other authors (e.g., Spain et al. 2014) who study dark sides of leadership and their effects on organizational outcomes. The conceptualizations above provide first ex-ante social psychological guidance on how to structure the field.

Overview of research methodology

Bibliometrics is the statistical analysis of physical units of publication such as books, articles, or other publications (OECD 2002). Bibliometric network analyses complement traditional network analyses by introducing citation networks and collaboration structures, and they try to systematically visualize clusters of authors, methods, journals, disciplines, institutions or topics over longer periods of time. Bibliometric network analysis is a long-established method in academia for quantifying networks of academic literature (see Newman 2010 for an introduction), particularly in strategic management and organizational theory as well as in adjacent disciplines. For instance, Ramos-Rodríguez and Ruíz-Navarro (2004) study the intellectual development of the strategic management literature in one flagship journal. Podsakoff et al. (2008) study the influence of management scholars in a field and its dependence on university affiliation. Di Stefano et al. (2012) study two diverging literature streams (push versus pull innovation) via bibliometrics to structure the field. Chabowski et al. (2013) study the global branding literature via bibliometrics and consequently propose a research agenda. The advantages demonstrated by these studies can be summarized as follows: (1) bibliometric methods are quantifiable and objective and can therefore substitute or complement what experts have intuitively inferred; (2) as the management field consists of various research traditions and theories (Nag et al. 2007), bibliometric studies are able to uncover fragmented and hidden intellectual developments; and (3) bibliometric studies cover extended periods of time and large samples to pinpoint the most influential ideas and schools of thought (Nerur et al. 2008).

Therefore, by employing a bibliometric analysis, we aim to (1) delineate the subfields that constitute the intellectual structure of the field; (2) determine the relationships between subfields, if any; (3) identify authors who play a pivotal role in bridging two or more conceptual domains; and (4) visualize the intellectual structure in two-dimensional space in order to depict spatial distances between intellectual themes. To uncover interrelationships among and across literature, at least five approaches exist: citation analysis, co-citation analysis, bibliographic coupling, co-author analysis, and co-word analysis (Zupic and Čater 2015). Unlike citation analyses, which rely solely on the sum of overall citations (“who cites whom”), document co-citation analysis (DCA) measures the relationship between two or more documents by counting how often they are cited together in the reference lists of other documents (Small 1973). Thus, a high co-citation count indicates intellectual proximity. Vogel (2012) argues that the inclusion or exclusion of references in the bibliography, whether conscious or unconscious, reflects the relative importance authors ascribe to related topics. Thus, the analysis is not bound to linguistic artefacts and does not rely on authors’ consensus (White and Griffith 1981; Vogel 2012). Its dynamic nature makes it well suited to detecting shifts in paradigms over long periods of time (Zupic and Čater 2015). With large-scale data input, frequently co-cited papers can be used in multivariate analyses such as cluster analysis, factor analysis or multidimensional scaling (McCain 1989; White and McCain 1998). In this paper, we mainly focus on DCA. This is complemented by citation burst detection, an algorithm by Kleinberg (2003) and Chen (2006). It identifies sudden shifts from few to many citations in the reference lists and indicates the finite period of time over which they occur. Burst detection is established in scientometrics (Takahashi et al. 2012; Cobo et al. 2011). To further complement the results, we conduct explorative co-authorship, co-word, journal, institutional and key word analyses, which can be obtained upon request.
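
To make the counting step behind DCA concrete, the following minimal Python sketch derives co-citation counts from reference lists: two references are co-cited once for every citing paper whose reference list contains both. The three citing papers, the reference labels and the function name `cocitation_counts` are hypothetical toy material, and the sketch illustrates the general idea only, not the implementation of any particular software.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited together.

    reference_lists maps a citing paper to the references it cites.
    """
    counts = Counter()
    for references in reference_lists.values():
        # Every unordered pair within one reference list is co-cited once.
        for pair in combinations(sorted(set(references)), 2):
            counts[pair] += 1
    return counts


# Toy data (hypothetical): three citing papers and their reference lists.
toy_reference_lists = {
    "paper_1": ["ref_A", "ref_B", "ref_C"],
    "paper_2": ["ref_A", "ref_B"],
    "paper_3": ["ref_B", "ref_C"],
}

counts = cocitation_counts(toy_reference_lists)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

In this toy example, `ref_A` and `ref_B` appear together in two reference lists, so their co-citation count is 2, whereas `ref_A` and `ref_C` are co-cited only once.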

Our DCA is based on Freeman’s (1978) betweenness centrality metric in order to depict pivotal points in the development of research. In addition, we provide a pathfinder analysis. According to Freeman (1977) and Mascolo (2013), betweenness centrality can be formalized in the following way:

$$C_{B} \left( i \right) = \mathop \sum \limits_{j \ne k} \frac{{g_{jk} \left( i \right)}}{{g_{jk} }}$$

Here, \(g_{jk}\) denotes the total number of shortest paths between nodes \(j\) and \(k\), and \(g_{jk} \left( i \right)\) denotes the number of those shortest paths that pass through \(i\). The measure is usually normalized by dividing by the number of node pairs. Consequently, betweenness is large if the unit of observation (paper, author, etc.) connects two distant parts of the network, indicating, for instance, that other authors must pass through this author in order to reach any other node by the shortest path. Betweenness centrality is a commonly used metric in strategy research (e.g., Baum et al. 2014) and is of decisive importance in the composition of networks (Freeman 1977, 1978). In order to provide graphical clustering, co-citation is measured as a cosine coefficient, which normalizes the co-citation count by the geometric mean of the citation counts (Leydesdorff 2008). According to Chen et al. (2010), this can be formalized as \(w_{ij} = \frac{|A \cap B|}{\sqrt{|A| \cdot |B|}}\), where \(A\) is the set of papers that cite \(i\), \(B\) is the set of papers that cite \(j\), and \(|A \cap B|\) is the co-citation count, that is, the number of times \(i\) and \(j\) are cited together. Although other measures exist to visualize the vector space, Leydesdorff (2008) argues that the cosine is best suited as it is defined in geometric terms.
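
The two measures can be illustrated with a short, self-contained Python sketch. It computes unnormalized betweenness for a small undirected, unweighted toy network and the cosine weight for two sets of citing papers. The breadth-first accumulation scheme (commonly attributed to Brandes) is a choice made for this illustration and is not taken from the study; the toy network, the paper labels and the function names `betweenness` and `cosine_weight` are likewise hypothetical.

```python
from collections import deque
from math import sqrt


def betweenness(graph):
    """Unnormalized betweenness C_B(i) for an undirected, unweighted graph.

    graph maps each node to the set of its neighbours. Each unordered
    pair {j, k} is counted once.
    """
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search: count shortest paths from the source.
        order = []
        predecessors = {node: [] for node in graph}
        n_paths = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Back-propagate the path fractions g_jk(i) / g_jk.
        dependency = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each pair was visited from both endpoints, so halve the totals.
    return {node: value / 2 for node, value in centrality.items()}


def cosine_weight(citing_i, citing_j):
    """Cosine-normalized co-citation weight w_ij for two sets of citing papers."""
    if not citing_i or not citing_j:
        return 0.0
    return len(citing_i & citing_j) / sqrt(len(citing_i) * len(citing_j))


# Toy network (hypothetical): two triangles joined through node "c".
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}
scores = betweenness(toy_graph)

# Toy citing sets (hypothetical): four citing papers each, two in common.
weight = cosine_weight({"p1", "p2", "p3", "p4"}, {"p3", "p4", "p5", "p6"})
```

In the toy network, only node `c` lies on shortest paths between the two triangles, so it is the only node with non-zero betweenness; this mirrors the bridging role described above. The two citing sets share two of four papers each, giving a cosine weight of 2 / sqrt(4 · 4) = 0.5.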

Finally, we conducted a pathfinder analysis, a commonly used method in bibliometric studies in management (Di Stefano et al. 2012; Nerur et al. 2008) as well as in other disciplines (Housner et al. 1993; McDougall et al. 2001). It determines via a triangle inequality test whether a single link (the degree of relatedness) between two nodes (as represented here by references) should be preserved or eliminated and thus highlights particularly important connections in a network (Chen 2006; Schvaneveldt 1990). It is based on the geodesic and not the Euclidean distance between nodes (in this study research documents, authors, journals, co-authors, and co-words) (Nerur et al. 2008). It thus aims to provide the least-cost paths between nodes and eliminates direct links that are more expensive (i.e., longer) than an alternative path between the same nodes (Nerur et al. 2008). While other methods such as multi-dimensional scaling (Di Stefano et al. 2012; Shafique 2013) exist to determine core links between nodes, Chen and Morris (2003) argue that pathfinder analyses yield more predictable and more interpretable results.
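
The pruning logic can be sketched in a few lines of Python. The sketch assumes that links carry dissimilarities (smaller means more closely related) and uses the Pathfinder parameters r = infinity and q = n - 1, under which the length of a path equals its longest link; both the parameter choice and the function name `pathfinder_links` are assumptions made for illustration, since the parameters are not reported here.

```python
from itertools import product


def pathfinder_links(distances):
    """Return the links kept by a Pathfinder network with r = infinity, q = n - 1.

    distances maps frozenset({u, v}) to a positive dissimilarity.
    """
    nodes = sorted({node for pair in distances for node in pair})
    inf = float("inf")
    # Direct distances; missing links count as infinitely long.
    best = {
        (u, v): 0.0 if u == v else distances.get(frozenset((u, v)), inf)
        for u, v in product(nodes, repeat=2)
    }
    # Floyd-Warshall variant: a path is as long as its longest link.
    for k, u, v in product(nodes, repeat=3):
        via_k = max(best[(u, k)], best[(k, v)])
        if via_k < best[(u, v)]:
            best[(u, v)] = via_k
    # Keep a link only if no indirect path is shorter than the link itself.
    return {
        pair for pair, weight in distances.items()
        if weight <= best[tuple(sorted(pair))]
    }


# Toy data (hypothetical): A-C is a weak direct link bypassed via B.
toy_distances = {
    frozenset(("A", "B")): 1.0,
    frozenset(("B", "C")): 1.0,
    frozenset(("A", "C")): 3.0,
}
kept = pathfinder_links(toy_distances)
```

In the toy example, the direct link between `A` and `C` violates the triangle inequality because the path through `B` is shorter, so it is eliminated and only the links A-B and B-C survive.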

After having finished the bibliometric analysis, we aim to provide a more structured analysis by categorizing the articles into four categories: conceptual, experimental, quantitative, and qualitative.

Analysis

In this section, we present details of each stage of the analysis that follows the recommendations of Zupic and Čater (2015).

Input word selection

Selecting appropriate key words is of heightened importance for bibliometric studies because their breadth and depth determine the boundaries of the network (Zupic and Čater 2015). In line with recommendations from content analytical methods and bibliometric approaches (Zupic and Čater 2015; Gamache et al. 2015), we chose a combined deductive and inductive approach, relying on predefined word lists from the literature supplemented by word lists from important academic experts. We derive seed words from Higgs (2009) for the initial literature. These seed words include terms such as “narcissistic leadership”, “destructive leadership” or “toxic leadership” and thus closely reflect previous approaches (Schyns and Schilling 2013). An initial selection of about 300 key words, which we built from a snowball search on Elsevier Scopus using the registered “key words” that authors provide, can be found in Table 5 in the Appendix. From this literature search, we also identify 20 key authors and ask them to supply at least three key words. These authors represent a narrow yet highly influential group of scholars in the field. After providing a preselected initial word list consisting of “dark side”, “dark triad”, “dark leadership”, “leadership”, “management”, “CEO”, “narcissistic”, “psychopathy” and “Machiavellianism”, we ask the following question: “In your opinion, which three keywords (regardless of which ones have already been mentioned) are important for this topic?”

In addition, we identify 20 university professors in Germany in the field of Organizational Science and Leadership and survey them with the same question. Thus, a total of n = 40 experts were asked to participate. An overview of the surveyed experts can be obtained upon request, but we chose not to disclose the identities of the experts due to privacy concerns.

A final list of 18 key words was derived that we included with truncation and the Boolean operator OR: “dark triad”, “abusive leadership”, “dark side”, narciss*, “abusive supervision”, “dark leadership”, “dark personality”, “dark tetrad”, “derailed leadership”, “destructive leadership”, egoism*, machiavellian*, “organizational neurosis”, “organizational psychosis”, “psychopath”, sadis*, “toxic leaders” and “tyrannical leadership”.

Selection of journals and databases

Bibliometric studies must carefully balance the breadth and depth of the cited articles because this balance affects the size and interrelationships of the network. Journals represent “state of the art” knowledge and are used to accumulate knowledge in a field. We chose to limit our network to the 50 most important journals based on the Financial Times 50 Journal List, a commonly used research ranking based on academic peers. This list limits the size of the network but enables more in-depth analyses of relationships, similar to predefined journals in traditional literature reviews (e.g., Nielsen et al. 2010). Of the 50 journals, 49 were available in Web of Science. The journal “Operations Research” (ISSN: 1526-5463) was not available at the time of inquiry.

The list is sufficiently broad to ensure coverage of subdisciplines by including differently oriented journals such as Organization Science, Organizational Behavior and Human Decision Processes, Human Relations, Journal of Applied Psychology, Journal of Management Studies or Journal of Management. Although other databases such as Google Scholar or Elsevier Scopus exist, we chose the database Web of Science because of its deep and long coverage of articles from 1945. Web of Science includes the Science Citation Index Expanded (SCI-Expanded), the Social Sciences Citation Index (SSCI), and the Arts and Humanities Citation Index (A&HCI). We use the AND IS (IS = ISSN) function to search the Web of Science database based on the ISSNs of the Financial Times 50 journals. Of the 50 journals, 42 were used for the subsequent analysis because they contained entries for the key words listed above and matched the time scope of the analysis (2004 onward).
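
As an illustration of how such a search string can be assembled, the following Python sketch joins the 18 key words with OR and appends an ISSN filter with AND. The exact string submitted to Web of Science is not reported in the text, so the use of the topic field tag `TS=`, the helper name `build_query` and the placeholder ISSN `0000-0000` are assumptions.

```python
# The 18 final key words, with quotation marks and truncation as listed.
KEYWORDS = [
    '"dark triad"', '"abusive leadership"', '"dark side"', 'narciss*',
    '"abusive supervision"', '"dark leadership"', '"dark personality"',
    '"dark tetrad"', '"derailed leadership"', '"destructive leadership"',
    'egoism*', 'machiavellian*', '"organizational neurosis"',
    '"organizational psychosis"', '"psychopath"', 'sadis*',
    '"toxic leaders"', '"tyrannical leadership"',
]


def build_query(keywords, issns=()):
    """Join key words with OR and optionally restrict the search to ISSNs.

    The TS= (topic) field tag is an assumption; IS= is the ISSN field.
    """
    query = "TS=(" + " OR ".join(keywords) + ")"
    if issns:
        query += " AND IS=(" + " OR ".join(issns) + ")"
    return query


# Placeholder ISSN (hypothetical), standing in for one journal of the list.
query = build_query(KEYWORDS, issns=["0000-0000"])
print(query)
```

Running one query per journal ISSN, or one query with all ISSNs joined by OR, would both restrict the key word search to the journal list; which of the two was used is not stated.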

A number of software tools for examining bibliometric relationships exist, with various advantages and disadvantages (Cobo et al. 2011). We chose the tool CiteSpace (Chen 2006) because it is freely available and non-commercial, supports the above-mentioned analytical steps (Cobo et al. 2011) and has been employed in previous research (e.g., Gurzki and Woisetschläger 2016). The program uses the title, descriptors, identifiers, abstract, cited references, times cited, and the year of publication to conduct the analyses.

Results

Co-citation results

Given the thresholds described above, we identify 569 articles based on 26,715 references. A descriptive inspection shows that 92% of the publications were published after 2004, indicating a heightened accumulation of publications after that year. For the in-depth analysis (i.e., co-citation etc.), based on the inductive results, we therefore focus on the time after 2004. This focus yields 515 publications (Fig. 1).

Fig. 1 Overview of the scope of analysis

Within these articles, most contributions stem from the Journal of Business Ethics (25%), the Journal of Applied Psychology (14%), and the Journal of Management (8%), indicating that three outlets account for nearly half of the field. Most of the contributions stem from nine institutions in the USA and one in Singapore. These ten institutions contribute 41% of all articles. Figure 2 provides an overview of the geographical distribution of the results. Figure 3 provides an overview of the journal contributions.

Fig. 2 Geographic distribution of results

Fig. 3 Journal contribution in percent

Table 1 provides an overview of the 20 leading references sorted by citation frequency in the network. A larger network can be obtained upon request. Citation frequency is simply the number of citations (Chen 2006). The analysis indicates that Tepper’s (2007) literature review and the study by Tepper et al. (2011) are among the most important references and can be considered foundational work in this field. The latter study received the highest centrality score, indicating a central position in the network, for instance by connecting clusters 1 and 2. The former received the highest burst score of all articles, indicating an attention-generating publication. The study by Mayer et al. (2012), which draws on social learning and moral identity of top management teams and on ethical leadership, received the second highest centrality score of 0.09. Thau and Mitchell (2010), who use three field studies to test two competing explanations (self-gain view versus self-regulation view) of subordinates’ reactions to abusive supervision, received the third highest centrality score of 0.08. Other central papers in the network are Mitchell and Ambrose (2007), a study surveying employees to examine the relationship between abusive supervision and employee deviance, and Mawritz et al. (2012), a study using survey data from 288 working individuals to examine the “trickle-down” model (i.e., the effect of abusive supervision from high to low hierarchies).

Table 1 20 leading references sorted by citation count including centrality, burst and cluster number

Pathfinder results

Table 2 contains the pathfinder analysis based on the strongest citation burst. Interestingly, appearing in the table of most frequently cited articles does not necessarily imply a strategic position in terms of centrality. The network-scaling algorithm reveals that some papers such as Hoobler and Brass (2006) become more important under this specification, while papers such as Tepper et al. (2006) remain relevant regardless of the specification.

Table 2 Top-15 references after Pathfinder analysis with strongest citation burst based on 2004–2019

Citation burst

Table 3 contains the citation burst analysis. Burst detection makes it possible to identify publications regardless of how many times their host articles are cited, thereby reducing the impact of time (Chen 2006). Higher scores signal a greater increase in the interest a reference is generating; a score of 3.0 or above is considered a burst reference (Chen et al. 2008). The analysis indicates that Tepper (2007) and Mitchell and Ambrose (2007) received the most attention. While all studies receive attention after 2005 and tend to receive short attention of about two to three years, it is noteworthy that Tepper (2007) received attention across five years. Other high-burst citations include Tepper et al. (2006), who employ a moderated mediation model of supervisor depression with 334 supervisor-subordinate dyads. Most of the studies tend to analyze the construct of “abusive supervision” and its influence in specific contexts (e.g., family-work conflicts, Courtright et al. 2016), indicating that other organizational outcomes remain highly under-researched. We find little conceptual acknowledgment of constructs such as Sadism or Machiavellianism within the “abusive supervision” framework. The family-work conflict (FWC) theory of Courtright et al. (2016), an alternative theoretical mechanism for understanding abusive behavior toward subordinates, can be seen as the “newest” trend.

Table 3 Top-15 references with strongest citation burst based on 2004–2019

Cluster analysis

The cluster analysis yields 10 clusters based on a modularity value of 0.61, indicating a rather fuzzy network with unclear boundaries. The silhouette score of 0.7 indicates a good quality of each cluster (Chen et al. 2010). Table 4 provides an overview of the clusters. Interestingly, the analysis indicates that “supervision abuse” is a dominant theoretical lens but with distinct publication trends over time (mean year cluster 1: 2012, mean year cluster 2: 2004). In addition, we find that cluster 2 contains most of the publications with the highest burst scores, further confirming our assertion that this cluster, or the topic “supervision abuse”, is a major trend or theoretical lens.

Table 4 Overview about clusters

Figure 4 visualizes the clusters and their interrelationships. Larger nodes indicate higher citation rates. An inspection of the results indicates that narcissism is implicitly or explicitly a dominant facet of dark leadership research. The theory can be found in clusters 2, 3, 4, 5 and 10. Based on the illustration, Chatterjee and Hambrick’s (2007, 2011) papers on CEO narcissism seem to be boundary-spanning publications that spill over to other clusters such as CEO wrongdoing (Rijsenbilt and Commandeur 2013). Descriptive analyses reveal that Machiavellianism can be found only in cluster 4 and Psychopathy only in cluster 8, while Sadism cannot be found in any cluster.

Fig. 4 Visualization of the network

Cluster 1 is the largest cluster with 105 publications, with Tepper et al. (2011) having the highest citation frequency. Cluster 2, with 74 papers, shows a large thematic overlap with cluster 1 but distinct temporal patterns. The work by Tepper (2007) has received the highest citation frequency in this cluster, which precedes cluster 1 in time. Cluster 3, “moralized leadership”, comprises 73 articles, with Kish-Gephart et al. (2010) as the most cited source. In their meta-analytical review, the authors classify the antecedents of unethical behavior. This is followed by Gino et al. (2011), who experimentally study the link between self-control and cheating: individuals with low self-control are more likely to engage in impulsive cheating. Cluster 4, “CEO narcissism”, with 67 articles, includes Chatterjee and Hambrick (2007, 2011), who studied whether narcissistic CEOs affect organizational performance and whether narcissistic CEOs embrace certain technologies based on external media appraisal. The high burst values of Chatterjee and Hambrick (2007) further indicate that this is a decisive publication. Rosenthal and Pittinsky (2006), who link narcissism theoretically to a number of outcome variables and discuss advantages and disadvantages of these types of leaders, are also part of the cluster. Cluster 5, with 65 articles, is “counterproductive work behavior research” and includes Berry et al. (2012), who compare self-reports and third-party reports of counterproductive work behavior. Cluster 6, “work unit engagement”, comprises 63 articles. Within the cluster, Tepper et al. (2008) is the most important paper with 23 citations and a burst value of 5.09. The authors analyze the relationship between abusive supervision and organization deviance. Cluster 7, “role expectation”, contains 33 articles with a mean publication year of 2007. Cluster 8, “corporate psychopathy”, contains 14 articles. The most cited study, by Boddy (2011), argues that corporate psychopaths have had a major part in causing the financial crisis in 2007. Cluster 9, “boundary conditions”, contains 11 publications. For instance, pay satisfaction or years of employment may affect subordinates’ perception of abusive supervision. Cluster 10, “CEO wrongdoing”, contains 5 articles. The content overlaps with cluster 4, “CEO narcissism”, as Rijsenbilt and Commandeur (2013) is the most cited study. The authors study all S&P 500 CEOs from 1992 to 2008 and examine the propensity to engage in managerial fraud. The cluster indicates that “wrongdoing” is a more objective and externally accessible category (e.g., fraud, manipulation) compared to performance variables.

In Fig. 5, we provide a graphical representation of the 10 clusters. It becomes evident that clusters 1 and 2 became research priorities at different times (2012 and 2004, respectively). Interestingly, clusters 1, 3, 4, 5, 6, 8 and 10 were research priorities in the same period (2009–2012), indicating a “hype” period for this kind of research.

Fig. 5 Timeline representation of the clusters based on mean citations

Results coding

In order to further classify the articles and to enhance the theoretical contribution of the bibliometric results, we chose, similar to previous research (Gurzki and Woisetschläger 2016), to analyze the top 30% of articles by dividing them into conceptual, qualitative, quantitative, and experimental approaches. These four approaches serve distinct purposes in the research process (Bryman and Bell 2015). Conceptual articles are publications based on overview articles and mathematical models in order to establish and develop theoretical concepts (Bryman and Bell 2015). Qualitative articles are publications based on interviews, focus groups, or ethnographic studies that contribute primarily to theory building (Bryman and Bell 2015). Quantitative articles are publications that rely on primary and particularly secondary data for explanatory purposes and to test different theories and constructs (Bryman and Bell 2015). Experimental studies are publications that aim to test theories and causal explanations by providing clean laboratory settings or quasi-experimental settings (Bryman and Bell 2015). Although this classification may leave more latitude to researchers and therefore room for biases, in line with previous research (e.g., Gurzki and Woisetschläger 2016), this analysis is primarily intended to structure the field and to provide illustrative evidence.

Figure 6 shows the results of this coding. Overall, the top 30% of articles (n = 152) were classified. The analysis indicates that more than half of the publications are quantitative articles (54%), followed by conceptual studies (27%), qualitative studies (16%) and experimental studies (3%). The results indicate that a shift from conceptual articles (e.g., 2004–2005) toward quantitative articles took place, but this is in line with the overall higher number of publications. A slight increase in qualitative articles can be found from 2005. The dominance of quantitative articles may indicate that the field has become more mature, with a theoretical foundation solid enough to engage in hypothesis testing. The minority of conceptual articles and the lack of experimental studies may point to research opportunities, in particular to test causal relationships. The first experimental studies appeared between 2010 and 2013. On the other hand, this pattern may also reflect the general decline of conceptual and non-empirical articles in the profession (e.g., Yadav 2010).

Fig. 6 Overview of the nature of articles

Discussion and conclusion

The objective of the paper was to trace the intellectual antecedents of the dark side literature by applying a bibliometric analysis. In order to accomplish this task, we apply a co-citation analysis of documents, a pathfinder analysis, a citation burst analysis and a cluster analysis. Finally, we code the main articles based on the structure of these articles. Our analysis indicates that research on the dark side of leadership is mainly concentrated in a relatively small set of researchers from a small number of institutions. In addition, the most influential papers can be traced to around 2010, suggesting that the field is becoming more mature. The Journal of Business Ethics and the Journal of Applied Psychology published the most research on the topic, suggesting that scholars interested in the field should closely monitor publications from these outlets. We also find that authors such as Tepper, Mitchell, Aryee, Mawritz, and Hoobler are among the key actors in the field. We further find a heightened emphasis on narcissism, while the constructs of Machiavellianism and Sadism have received less attention, as well as a dominant emphasis on the CEO level of analysis. These results raise the questions of whether sufficient attention has been paid to related yet distinct constructs on the CEO level other than narcissism (e.g., Machiavellianism) and whether other levels of analysis have received sufficient attention. Based on the results, we see a more detailed and nuanced analysis of related yet distinct constructs (e.g., Machiavellianism or Sadism) as a major opportunity to move the field forward. This is important because social psychological research also points to the overlapping yet distinct nature of these constructs (Paulhus and Williams 2002). The dominant emphasis on quantitative analysis also raises the question of whether alternative research designs (e.g., experimental) have been employed sufficiently. These aspects point to important avenues for future research that also reflect broader concerns among management scholars (Antonakis 2017).

As with any research design, this approach is not without limitations. In line with previous research (Gurzki and Woisetschläger 2016), we chose to include only one source (WoS). Future studies may compare different sources of literature (e.g., Scopus). In addition, future studies may engage in an in-depth coding of different aspects of the retrieved articles (see Podsakoff and Dalton 1987). Deciding on a specific set of key words is challenging. Yet we believe that the integration of a predefined word list supplemented with different expert evaluations provides a fairly objective and representative picture of the field. Future studies may increase the number of surveyed experts, although we are aware of the difficulties when approaching experts with time-consuming tasks. However, we believe that incorporating different data sources (triangulation) from different authors (e.g., actual authors, academic experts, industry experts) and therefore blending inductive and deductive approaches is a distinguishing feature of this study. Hence, we encourage future research to make use of these opportunities to make search strings and bibliometric studies more robust against biases.

The results of the bibliometric analysis provide some evidence that research streams have detectable characteristics. We believe that this preliminary evidence on the characteristics of the field can be of great importance for practitioners and researchers alike.