1 Introduction

Structural Equation Modelling (SEM) based on Partial Least Squares (PLS) has become one of the most widely used approaches for analysing data and estimating complex models in various research contexts. In recent years, the development of software (e.g., SmartPLS, Adanco, and WarpPLS) and of R packages (SEMinR, cSEM, and semPLS) has fostered the diffusion and use of Partial Least Squares Structural Equation Modelling (PLS-SEM).

The rise of the so-called variance-based approach (also known as the component-based, composite-based, or nonparametric approach) is related not only to the diffusion of software but mainly to the growing number of methodological papers (see, for instance, the Confirmatory Composite Analysis (CCA) method (Hair et al., 2020; Hubona et al., 2021), the HTMT approach (Henseler et al., 2015; Roemer et al., 2021), the cross-validated predictive ability test (Liengaard et al., 2021; Sharma et al., 2022), or conditional mediation analysis (Cheah et al., 2021a, b)). These papers highlight the main characteristics of the approach, often in contrast to its covariance-based counterpart, and show the advantages of using PLS under a range of conditions.

Theoretical papers have been complemented by other types of scientific documents. In particular, the diffusion of PLS-SEM has been driven by its empirical application in fields such as management (Memon et al., 2019), customer satisfaction (Ciavolino et al., 2015), marketing (Hair et al., 2012b; Sarstedt et al., 2022b), tourism research (Cheah et al., 2019b), knowledge management (Cepeda-Carrion et al., 2019), and information systems (Cheah et al., 2019a; Hair et al., 2017a).

However, the most significant influence comes from documents that support scholars in using the method: guidelines (for instance, Benitez et al., 2020; Henseler, 2021), step-by-step case study analyses, and textbooks and handbooks (Hair et al., 2014a, 2018; Vinzi et al., 2010a) that are continuously updated to inform scholars of the latest developments in the literature.

This growing emphasis has led to quantitative reviews. In particular, Khan et al. (2019) applied social network analysis to a WoS sample of 84 papers to analyse the structure of authors, institutions, and countries, and used co-citation analysis to disclose the trending topics in the field. Subsequently, Hwang et al. (2020) extended the database used by Khan et al. (2019) to 108 papers (1979–2017), also including Generalised Structured Component Analysis (GSCA) among the search keywords, to identify, via lexical co-occurrence, the dominant topics and links in the domain of PLS-SEM and GSCA.

This article builds on both Khan et al. (2019) and Hwang et al. (2020) but focuses only on the evolution of PLS-SEM. It follows a science-mapping workflow, as described by Börner et al. (2003), Cobo et al. (2011), and Zupic and Čater (2015), and extends the Web of Science (Clarivate database) sample to 3,854 documents. The workflow was implemented using the R package Bibliometrix (Aria & Cuccurullo, 2017). The database (provided as supplementary material) ensures a systematic and reproducible bibliometric review of the literature produced so far.

Accordingly, the aim of this article is to offer a systematic, transparent, and reproducible bibliometric review (Donthu et al., 2021) of PLS-SEM research worldwide, providing a snapshot of scientific activity and highlighting the trend in annual scientific production, the seminal documents that boosted new theories and applications, the authors’ production, the scientific collaboration between scholars and countries, the source dynamics, and a historiographic overview.

The remainder of the paper is organised as follows: Sect. 2 introduces the selection criteria and the methods adopted in this citation analysis; Sect. 3 presents the results, divided into general data description (Sect. 3.1), citation analysis and reference publication year spectroscopy (Sect. 3.2), authors (Sect. 3.3), country and institution collaborations (Sect. 3.4), sources (Sect. 3.5), and historiographic analysis (Sect. 3.6). The last section presents the discussion and the envisaged future developments.

2 Materials and Methods

2.1 Selection Criteria and Strategy

The selection strategy followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009), as illustrated by the PRISMA flow diagram in Fig. 1. The bibliometric analysis covers the period from January 1985 to December 2020 and was performed with the open-source R package Bibliometrix (Aria & Cuccurullo, 2017) on documents retrieved from the Web of Science (WoS) database maintained by Clarivate Analytics.

Fig. 1

PRISMA flow diagram of the process of identification and screening of the documents included

The selection of publications focused on the Partial Least Squares Structural Equation Modelling (PLS-SEM) framework, also known in the literature as PLS Path Modelling (PLS-PM). Publications were selected using a query covering the different names and acronyms adopted over time in different theoretical and applied contexts.

As defined in the identification and screening steps reported in Fig. 1, the query that describes the selection is the following: “Database: Web of Science Core Collection. TOPIC: (“Partial Least Squares Structural Equation Model*”) OR TOPIC: (“Partial Least Squares SEM”) OR TOPIC: (“PLSSEM”) OR TOPIC: (“SEM-PLS”) OR TOPIC: (“PLS Structural Equation Model*”) OR TOPIC: (“Partial Least Squares Path Model*”) OR TOPIC: (“PLS Path Model*”) OR TOPIC: (“PLS-PM”) OR TOPIC: (“Structural Equation Model* PLS”) Refined by: DOCUMENT TYPES: (ARTICLE OR BOOK CHAPTER OR REVIEW OR BOOK REVIEW). Languages = English. Indexes = SCI-EXPANDED, SSCI, ESCI. Time = 1985–2020”.

The period considered is 1985–2020 and includes only English-language documents (articles, book chapters, reviews, and book reviews) in the following Web of Science Core Collections: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), and Emerging Sources Citation Index (ESCI). The period starts in 1985 because it is the first year with consistent document metadata in the WoS database.

A qualitative selection of documents complemented the collection obtained from the WoS query in order to include relevant papers that do not match the selection criteria because they carry more general metadata, such as “Partial Least Squares” or “Variance based approach”. Such general terms were not added to the query because they would return a considerable number of documents on the PLS estimator in other fields, such as regression approaches or data reduction methods.

2.2 Reference Publication Year Spectroscopy (RPYS)

Reference Publication Year Spectroscopy (RPYS) was proposed by Marx et al. (2014) and is used here to build an out-of-collection temporal profile that identifies the most influential documents in the context of PLS-SEM. In brief, RPYS identifies the seminal documents of the collection from the metadata of the cited references, which also include books, handbooks, and theses not indexed in WoS. These seminal documents can be considered the crucial push towards new theories and applications. RPYS plots usually report two lines:

  • The black line represents the number of cited references (NCRs) per year, i.e., the number of references cited by the documents in the collection that were published in that year. Years with many cited references point to publications that might be interpreted as the historical roots (or landmarks) of the field analysed.

  • The red line is the deviation of the NCR in year t from the 5-year median computed over year t and the previous four years (t − 1, t − 2, t − 3, and t − 4). Positive deviations highlight peaks, i.e., publication years with markedly more cited references than the preceding years.
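A minimal sketch of how the two RPYS series could be computed, assuming the publication years of all cited references have already been extracted from the collection. The deviation follows the definition given above (NCR in year t minus the median NCR over years t − 4 to t); the function name `rpys` and the toy data are illustrative and do not reproduce the Bibliometrix implementation.

```python
from collections import Counter
from statistics import median


def rpys(cited_years):
    """Return {year: (ncr, deviation)} for a list of cited-reference publication years.

    ncr       -- number of cited references published in that year (black line)
    deviation -- ncr(t) minus the median ncr over years t-4..t (red line);
                 the first four years have no full window and get 0
    """
    counts = Counter(cited_years)
    # Include years with no cited references so that every window spans five calendar years.
    years = list(range(min(counts), max(counts) + 1))
    ncr = [counts.get(y, 0) for y in years]
    result = {}
    for i, year in enumerate(years):
        dev = ncr[i] - median(ncr[i - 4:i + 1]) if i >= 4 else 0
        result[year] = (ncr[i], dev)
    return result


# Toy data: a burst of references to 1985 publications stands out as a peak.
years = [1981, 1982, 1983, 1983, 1984] + [1985] * 10
print(rpys(years)[1985])  # (10, 9)
```

Years with a positive deviation are the candidate peaks, e.g. `[y for y, (n, d) in rpys(years).items() if d > 0]`.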

2.3 Network Analysis for Scientific Collaborations

Author and country collaborations are analysed through a network built from a Documents × Attributes matrix A, where the attributes are, for example, authors, countries, or affiliations.

The matrix A is a rectangular binary matrix whose i-th row corresponds to a document and whose j-th column corresponds to an attribute (e.g., an author); its generic element $a_{ij}$ equals 1 when the j-th author contributes to the i-th document and 0 otherwise.

A collaboration network is a network whose nodes are the authors (or countries) and whose links are the collaborations between them (Glänzel & Schubert, 2004). The network is built from the adjacency matrix $B_{coll} = A^{\top}A$, whose diagonal element $b_{ii}$ is the number of documents published by author i (or by all authors in country i), while the off-diagonal element $b_{ij}$ is the number of collaborations between authors i and j.
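As an illustration, the adjacency matrix can be obtained directly from the binary matrix A. The sketch below uses plain Python lists and a hypothetical function name; it accumulates $A^{\top}A$ (the transpose of A multiplied by A) document by document, which gives the same result as the matrix product when A is binary.

```python
def collaboration_matrix(A):
    """Return B = A^T A for a binary documents x attributes matrix A (list of rows).

    B[i][i]: number of documents involving attribute i (e.g., author i).
    B[i][j], i != j: number of documents co-authored by attributes i and j.
    """
    n_attr = len(A[0]) if A else 0
    B = [[0] * n_attr for _ in range(n_attr)]
    for row in A:
        # Attributes (authors or countries) present in this document.
        present = [j for j, a in enumerate(row) if a]
        for i in present:
            for j in present:
                B[i][j] += 1
    return B


# Three documents (rows) and three authors (columns).
A = [[1, 1, 0],
     [1, 0, 1],
     [1, 1, 0]]
print(collaboration_matrix(A))  # [[3, 2, 1], [2, 2, 0], [1, 0, 1]]
```

Iterating over the attributes present in each document avoids multiplying the many zeros of a sparse A, which matters for a collection with thousands of authors.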

Collaboration patterns among authors (or countries) are identified by the two-step multilevel algorithm proposed by Blondel et al. (2008), which maximises the score Q (Newman, 2003):

$$ Q = \frac{1}{2m}\sum\limits_{ij} {\left[ {b_{ij} - \frac{{\delta_{i} \delta_{j} }}{2m}} \right]} \;s_{i} s_{j} $$
(1)

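where $m = \frac{1}{2}\sum_{ij} b_{ij}$ is the number of edges, $b_{ij}$ is the generic element of the adjacency matrix B defined above, $\delta_i = \sum_{j} b_{ij}$ is the degree of node i, and $s_i$ is the community membership of node i, so that the product $s_i s_j$ equals 1 when nodes i and j belong to the same community and 0 otherwise.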

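The Q score measures how much more densely the nodes within each community are connected than would be expected by chance: Q = 0 indicates no more within-community connections than in a random network with the same degrees, whereas values close to 1 indicate a strong community structure. The multilevel algorithm iterates a two-step procedure (Blondel et al., 2008): first, each node is moved to the neighbouring community that yields the largest gain in Q; second, the communities found are aggregated into the nodes of a new network. The two steps are repeated until Q no longer increases.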
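To make Eq. (1) concrete, the following sketch evaluates Q for a given partition of a symmetric, weighted adjacency matrix. It assumes that self-loops (the diagonal of B, i.e., each author's own document count) have been set to zero beforehand, and it reads $s_i s_j$ as 1 when nodes i and j share a community label and 0 otherwise. The multilevel algorithm searches over partitions to maximise this score; the sketch only scores a partition it is given.

```python
def modularity(B, communities):
    """Newman's modularity Q, Eq. (1), for a symmetric weighted adjacency matrix B.

    communities[i] is the community label of node i; s_i s_j in Eq. (1) is
    taken as 1 when nodes i and j share a label and 0 otherwise.
    Assumes the diagonal of B (self-loops) has already been set to zero.
    """
    n = len(B)
    degree = [sum(row) for row in B]   # delta_i = sum_j b_ij
    two_m = sum(degree)                # 2m = sum_ij b_ij
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += B[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Two separate co-author pairs: 0-1 and 2-3.
B = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
print(modularity(B, [0, 0, 1, 1]))  # 0.5
print(modularity(B, [0, 0, 0, 0]))  # 0.0
```

Grouping each pair in its own community gives Q = 0.5, while putting all four authors in one community gives Q = 0. In practice a library implementation of the multilevel (Louvain) search would be used, for example `louvain_communities` in recent versions of NetworkX.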

2.4 Historiographic

The historiographic analysis is based on a chronological citation network map defined by the most relevant direct citations in the extracted bibliographic collection (Aria & Cuccurullo, 2017; Garfield, 2004). The map is generated from a direct citation matrix of the documents in the collection and shows, in chronological order, the connections among the top 20 documents (the default setting). The historiographic map represents the following entities:

  • Each node is a document cited by other documents.

  • Each edge represents a direct citation.

  • On the horizontal axis, the publication years are reported.
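The construction of the map can be sketched as follows, assuming that each document is stored with its publication year and the identifiers of the references it cites, some of which match other documents in the collection. The sketch ranks documents by local citations, keeps the top 20 by default, and returns the direct-citation edges among them with the nodes in chronological order. The data layout and function name are hypothetical, and the sketch does not reproduce Bibliometrix's exact selection rules.

```python
def historiograph(docs, top_n=20):
    """Direct-citation network among the top_n most locally cited documents.

    docs maps a document id to {"year": int, "refs": [ids of cited documents]};
    references to documents outside the collection are ignored.
    Returns (nodes sorted by year, list of (citing, cited) edges).
    """
    local_citations = {d: 0 for d in docs}
    for info in docs.values():
        for ref in set(info["refs"]):
            if ref in docs:
                local_citations[ref] += 1
    # Most locally cited documents first; ties broken by id for determinism.
    top = sorted(docs, key=lambda d: (-local_citations[d], d))[:top_n]
    top_set = set(top)
    edges = [(citing, cited)
             for citing in top
             for cited in sorted(set(docs[citing]["refs"]))
             if cited in top_set]
    # Chronological order, as on the horizontal axis of the map.
    nodes = sorted(top, key=lambda d: (docs[d]["year"], d))
    return nodes, edges


docs = {
    "A2003": {"year": 2003, "refs": []},
    "B2005": {"year": 2005, "refs": ["A2003"]},
    "C2009": {"year": 2009, "refs": ["A2003", "B2005", "OUTSIDE"]},
    "D2012": {"year": 2012, "refs": ["C2009", "A2003"]},
}
nodes, edges = historiograph(docs, top_n=3)
print(nodes)  # ['A2003', 'B2005', 'C2009']
print(edges)  # [('B2005', 'A2003'), ('C2009', 'A2003'), ('C2009', 'B2005')]
```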

3 Results

3.1 Data Description

The initial query identified 5,289 documents; after the Screening Step and the Qualitative Selection, the final collection includes 3,854 documents, as reported in Table 1. The documents have an average of 19.62 citations per document over the period considered (1995–2020) and were published by 8,972 authors in 1,172 sources (journals, books, etc.). Although the year range in the query starts in 1985, the oldest document returned is dated 1995, so the collection effectively begins in 1995.

Table 1 Main statistics about the SEM with PLS collection

The annual growth rate of publications is 29.08%, which indicates strong exponential growth. Fig. 2 shows that annual production remained roughly constant until 2010/11, whereas the last ten years show a sharp increase in new papers that is still growing exponentially.
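Assuming the annual growth rate is a compound annual growth rate between the first and the last year of the collection (our reading of the statistic; the paper does not state the formula), it can be sketched as follows. The counts in the example are hypothetical and are not the counts behind the 29.08% figure.

```python
from collections import Counter


def annual_growth_rate(pub_years):
    """Compound annual growth rate (%) between the first and last publication year.

    Assumption: rate = ((N_last / N_first) ** (1 / (t_last - t_first)) - 1) * 100,
    where N_t is the number of documents published in year t.
    """
    counts = Counter(pub_years)
    first, last = min(counts), max(counts)
    if last == first:
        return 0.0
    return ((counts[last] / counts[first]) ** (1 / (last - first)) - 1) * 100


# Hypothetical output doubling every year: 1, 2, 4, 8 documents.
years = [2000] + [2001] * 2 + [2002] * 4 + [2003] * 8
print(round(annual_growth_rate(years), 2))  # 100.0
```

A constant rate r implies $N_t = N_{first}(1 + r)^{t - t_{first}}$, which is why a stable positive rate corresponds to exponential growth.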

Fig. 2

Annual scientific production

Attention to PLS-based SEM intensified from 2011–2013, and several motivations can be hypothesised. One is the book “A primer on partial least squares structural equation modelling (PLS-SEM)” (Hair et al., 2014a), which significantly impacted the use of SEM, particularly in combination with the SmartPLS software, whose intuitive GUI helps users. Moreover, the upsurge from 2011 may be due to the article “PLS-SEM: Indeed a Silver Bullet” by Hair et al. (2011). The steeper slope of the curve from 2017 may be due to the appearance of the book’s second edition ().

Moreover, Rönkkö and Evermann (2013) published a paper with an emblematic title, “A critical examination of common beliefs about partial least squares path modelling”, that criticises the PLS approach. At the same time, it stimulated a large production of new papers showing, in a more structured way, the advantages (Hair et al., 2017b; Henseler et al., 2014), opportunities (Sarstedt et al., 2016), and new proposals (Dijkstra & Henseler, 2015a) from a theoretical point of view, as well as applications in different fields (Petter, 2018; Sarstedt et al., 2014) and guidelines and tips for its use (Benitez et al., 2020; Cepeda-Carrion et al., 2019; Hair et al., 2019; Henseler et al., 2016a; Lowry & Gaskin, 2014).

3.2 Most-Cited Documents

Global citations (GC) measure the impact of a document on the entire database (provided by Clarivate), whereas local citations (LC) count the citations a document receives within the collection analysed (in this case, n = 3,854 documents). The two indices therefore express, respectively, the general influence of a document across all subject areas (e.g., business, tourism, and engineering), reflecting its use in different contexts, and its influence on the specific field considered (PLS-SEM).

The LC/GC ratio reflects the specificity of a document, i.e., its influence on the well-defined field of PLS-SEM as delimited by the collection built following the PRISMA flow chart (see Sect. 2.1).
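The two indices and their ratio can be sketched as follows, assuming each document record carries its global citation count (as provided by WoS) and the identifiers of the collection documents it cites; the record layout and function name are hypothetical. Because local citations are a subset of global citations, the ratio normally lies between 0% and 100%.

```python
def citation_specificity(docs):
    """Local citations (LC), global citations (GC) and LC/GC ratio (%) per document.

    docs maps id -> {"gc": global citation count from the database,
                     "refs": ids of documents it cites}.
    LC counts only citations coming from documents inside the collection.
    """
    lc = {d: 0 for d in docs}
    for info in docs.values():
        for ref in set(info["refs"]):
            if ref in lc:
                lc[ref] += 1
    table = {}
    for d, info in docs.items():
        gc = info["gc"]
        # Guard against documents with no global citations yet.
        ratio = 100 * lc[d] / gc if gc else 0.0
        table[d] = (lc[d], gc, ratio)
    return table


docs = {
    "P1": {"gc": 10, "refs": []},
    "P2": {"gc": 4, "refs": ["P1"]},
    "P3": {"gc": 0, "refs": ["P1", "P2", "EXTERNAL"]},
}
print(citation_specificity(docs)["P1"])  # (2, 10, 20.0)
```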

Based on GC, the most-cited document is the paper by Henseler et al. (2015), titled “A new criterion for assessing discriminant validity in variance-based structural equation modelling”, followed by Chin et al. (2003) and Tenenhaus et al. (2005) in second and third place, respectively (Table 2).

Table 2 Most cited document local and global

If LC is also considered, the paper “An assessment of the use of partial least squares structural equation modelling in marketing research” by Hair et al. (2012b) ranks second among the most-cited papers.

From a global perspective, the first three papers show considerable interest in the methodological development of PLS-SEM, as also found by Khan et al. (2019). These results show that methodological developments are the primary references for all types of papers and that authors pay great attention to them. Furthermore, the second-ranked paper by local citations is a review that supports scholars in the marketing field by proposing guidelines for avoiding common pitfalls in PLS-SEM use.

The most influential papers in terms of the LC/GC ratio (%) are Rigdon (2012), Sarstedt et al. (2014), and Hair et al. (2017a): a large share of the citations they receive in the whole Clarivate database comes from within the collection, so these documents balance a specific and a general role.

The paper by Rigdon (2012) celebrates the divorce between PLS and the Maximum Likelihood Estimator (MLE), i.e., between factor-based and composite-based models, and rethinks the PLS approach as a purely composite-based method. The second and third documents, respectively, guide family business research and review information systems papers published in Industrial Management & Data Systems and MIS Quarterly. Overall, theoretical papers rank highest in terms of the LC/GC ratio, followed by guidelines and review papers.

Reference Publication Year Spectroscopy (RPYS), proposed by Marx et al. (2014), was designed to create a temporal profile of the cited references of a collection, identifying the seminal publications in the defined research context (in our case, PLS-SEM) that represent its historical roots.

The purpose of using RPYS is to answer the question: Which studies, theories, and ideas have influenced PLS-SEM and boosted new research and applications? Moreover, RPYS allows us to consider documents outside the collection, including, for instance, books, handbooks, or theses not indexed in WoS.

Figure 3 helps identify the historical roots of PLS-SEM. As described in Sect. 2.2, it shows two lines: the black line indicates the Number of Cited References (NCRs) per year, and the red line the deviation from the 5-year median. Considering both NCRs and the deviation from the 5-year median, the possible seminal documents are reported in Table 3, where the first column is the year, the second the document, and the last a description of its main theme.

Fig. 3

Reference publication year spectroscopy

Table 3 Seminal documents

From 1975 to 1998, apart from the books by Cohen (1988) and Nunnally and Bernstein (1994), the seminal documents are mainly theoretical papers contrasting the covariance-based approach with PLS-SEM, including the first suggestion of its use in the IS field by Chin (1998).

Venkatesh et al. (2003) present the first significant use of PLS-SEM as a tool to validate the Unified Theory of Acceptance and Use of Technology (UTAUT), while Chin et al. (2003) present a simulation study on the interaction effect that accounts for measurement error.

Over the next twelve years, the seminal documents are mainly theoretical papers, handbooks, guidelines, and technical documents. However, as already highlighted in the discussion of Table 2, although many PLS studies have influenced and boosted the theoretical development of PLS-SEM, for example on hierarchical models, multigroup analysis, and the HTMT method, some authors still continuously criticise the absence of this type of contribution.

Moreover, handbooks, guidelines, and technical documents support scholars in adopting the approach in different fields, especially by combining consolidated theory with software such as SmartPLS and the R package plspm. Both have played, and still play, a vital role in the adoption of PLS-SEM in various research fields. SmartPLS continues to improve its usability and to integrate new theoretical developments; see, for instance, the third edition of the PLS-SEM Primer by Hair et al. (2022) and the advanced PLS-SEM textbook by Hair et al. (2018). The R package plspm, now discontinued, inspired the development of new open-source R packages such as SEMinR, cSEM, and semPLS (Hair et al., 2021).

3.3 Author Network

The collection includes 8,972 authors, and Fig. 4 shows the production of the top 10 authors over time, where the red line is each author’s timeline. For instance, Ringle CM has the longest publication span in our collection (2008–2020), whereas Cheah JH started in 2018. Bubble size is proportional to the number of documents per year, and colour intensity to the total citations per year.

Fig. 4

Author production over time

In terms of lifelong production, the top four authors are Ringle CM, Sarstedt M, Henseler J, and Hair JF. They show ongoing production and citations along the timeline, particularly in 2012, 2014, 2016, and 2018–2020. Although they entered the field later, some authors have become prominent in terms of production and citations, reflecting the increasing diffusion of PLS-SEM among authors from different countries.

It is of great interest to analyse the social structure of the authors in a way that shows scientific collaboration (Glänzel, 2001) and the strength of the links among scholars in the global research community. The collaboration network reported in Fig. 5 is obtained with the multilevel algorithm (Blondel et al., 2008; Newman, 2003). The plot is interpreted as follows:

  • The colour of the bubble defines a cluster of scholars.

  • The size of the bubbles is proportional to the total number of papers published by an author.

  • Edges show the connections in the collaboration network and the strength of cooperation. Edges of the same colour as the bubbles depict intra-cluster collaboration, whereas grey edges show collaboration between clusters.

Fig. 5

Author collaboration network

Figure 5 shows 10 clusters (nodes = 50; minimum edges = 1). The largest, shown with orange bubbles, includes Hair JF, Ringle CM, Sarstedt M, Gudergan SP, Henseler J, Nitzl C, Dijkstra TK, Schuberth F, and Chin WW, who are among the most active scholars from both a theoretical and an applied point of view. The orange group published two of the leading textbooks on PLS-SEM (Hair et al., 2014a, 2018) and developed the software SmartPLS (Ringle et al., 2015) and PLS-Graph (Chin, 2001) and the R package cSEM (Rademaker & Schuberth, 2020). Looking at the principal connected authors, Fig. 5 shows that the orange group is connected with the green group, in particular through Sarstedt M, Ringle CM, and Hair JF, who are linked with Cheah JH. Cheah JH appears to be the pivot connecting them with the green group (including Ting H, Memon MA, Thurasamy R, and Ciavolino E) and the blue group (Ramayah T, Roldan J, Rasoolimanesh SM, Kock N, Jaafar M, Ali F, and Li J); the blue group is also directly linked to the orange one, mainly through Ramayah T, Roldan J, and Rasoolimanesh SM. The other two groups, light green (Leal-Rodriguez AL, Ali M, Albort-Morant G, Ariza-Montes A, and Hernandez-Perlines F) and pink (Latan H, Jabbour CJC, and Nejati M), are also connected.

The green group works both from a theoretical point of view, developing new contributions, methods, and strategies, and from an applied point of view in the fields of marketing and consumer behaviour, also making an outstanding contribution to reviews and guidelines on PLS-SEM.

The blue group focuses on technology, tourism and hospitality, and management research, in particular on technology adoption and usage in business and management.

The remaining groups are not linked to the others in the database considered. Of course, the collaboration network is based only on co-authorship and does not account for the organisation of conferences, informal groups, research projects, or documents not included in the WoS collection.

Moreover, a comparison with the results of Khan et al. (2019) reveals an evolution in the collaboration between groups (see Fig. 5), as fewer clusters are totally isolated. This suggests that new connections have been created in the last three years and that new scholars can benefit from the ties and information that come from the experience of writing a joint paper.

3.4 Country Network

Table 4 reports the most productive countries and each country’s inclination to cooperate with others. International collaboration is evaluated through the following two indicators:

  • Intra-country collaboration as the number of documents by authors from the same country: Single Country Publications (SCP).

  • Inter-country collaboration as the number of documents produced by authors from different countries: Multiple Country Publications (MCP).
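A minimal sketch of the two indicators. It rests on an assumption that is not stated above: each document is attributed to the country of its corresponding author and counts as an MCP when its authors come from more than one country. The MCP share is MCP / (SCP + MCP), the kind of percentage discussed in this section; the input layout and function name are hypothetical.

```python
def country_collaboration(docs):
    """SCP, MCP and MCP share (%) per country.

    docs is a list of (corresponding_country, [countries of all authors]).
    Assumption: a document is attributed to its corresponding author's country
    and is an MCP when its authors come from more than one country.
    """
    stats = {}
    for country, author_countries in docs:
        scp, mcp = stats.get(country, (0, 0))
        if len(set(author_countries)) > 1:
            mcp += 1
        else:
            scp += 1
        stats[country] = (scp, mcp)
    return {c: (scp, mcp, 100 * mcp / (scp + mcp))
            for c, (scp, mcp) in stats.items()}


docs = [
    ("UK", ["UK"]),
    ("UK", ["UK", "Spain"]),
    ("UK", ["UK", "Italy", "UK"]),
    ("Spain", ["Spain", "Spain"]),
    ("Spain", ["Spain"]),
]
print(country_collaboration(docs)["Spain"])  # (2, 0, 0.0)
```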

Table 4 Main country

Among the top ten productive countries, the most collaborative are the United Kingdom (MCP = 61.17%), Australia (MCP = 57.23%), and Pakistan (MCP = 51.18%), whereas the most productive countries (China, Malaysia, the United States, and Spain) show a lower propensity for cooperation (MCP = 36.59%, 32.23%, 33.90%, and 28.09%, respectively).

Figure 6 shows three clusters of countries (nodes = 50; minimum number of edges = 1) emerging from the multilevel analysis (Blondel et al., 2008). In the Blue Cluster, the USA has the highest betweenness centrality (99.195), followed by Australia (59.043), Germany (30.067), and the Netherlands (9.512). In the Green Cluster, the UK has the highest betweenness centrality (133.060), followed by Spain (39.125), France (22.422), and Italy (12.929). In the Red Cluster, Malaysia has the highest betweenness centrality (127.451), followed by China (90.123), Pakistan (14.848), and Saudi Arabia (8.194).
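Betweenness centrality counts, for each node, the shortest paths between other pairs of nodes that pass through it (fractionally, when several shortest paths exist). The sketch below implements Brandes' algorithm for an unweighted, undirected graph stored as an adjacency dictionary. It is illustrative only: the values above were produced by the analysis software on the actual country network, possibly with edge weights or other conventions, so the sketch is not expected to reproduce them.

```python
from collections import deque


def betweenness(adj):
    """Unnormalised betweenness centrality of an unweighted, undirected graph (Brandes' algorithm).

    adj maps each node to an iterable of its neighbours.
    """
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: shortest-path counts (sigma) and predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # In an undirected graph each pair was counted once from each endpoint.
    return {v: c / 2 for v, c in cb.items()}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness(path))  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```

On the path a–b–c, node b lies on the only shortest path between a and c and scores 1, while the end nodes score 0.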

Fig. 6

Country collaboration network

Collaboration among the three clusters is ensured by the countries with the highest closeness, which act as pivots in their communities (clusters): the USA, Australia, Germany, and the Netherlands cooperate closely with China and Malaysia, but also with the UK and Spain; Malaysia, China, and Pakistan are also closely connected to Australia, Germany, the UK, and Spain.

A more comprehensive overview of collaboration among countries is given by the country collaboration map (Fig. 7), which shows at a glance the cooperation between scientific communities worldwide (minimum number of edges = 5).

Fig. 7

Country collaboration map

The results largely mirror the author network but also offer a general worldwide picture that can help scholars design international projects. In many calls (e.g., Erasmus+ and the Horizon programmes), a consolidated collaboration network is one of the evaluation requirements or strengths. Fig. 6 highlights the country collaboration network, which can support future opportunities to create new collaborations or to strengthen poorly developed ties with funds from international calls.

3.5 Most Important Journals

The top ten places for the most productive journals in 1985–2020 (with 4 journals ex aequo) are reported in Table 5, and their trend is shown in Fig. 8. The journals in these top ten places, which cover about 17% (658 documents) of the whole collection (3,854), are as follows:

  • Places 1–6: The first journal is Sustainability, with 199 documents, although it can be considered an outlier compared with the distribution of all the other journals. Journal of Business Research published 75 documents, followed by Industrial Management and Data Systems (60), Journal of Cleaner Production (48), Journal of Retailing and Consumer Services (44), and, ex aequo, Computers in Human Behavior and International Journal of Environmental Research and Public Health, both with 41 documents.

  • Places 7–10: Cogent Business and Management with 37 documents, followed by the International Journal of Contemporary Hospitality Management and the International Journal of Hospitality Management ex aequo with 29 documents, Frontiers in Psychology (28), and the Asia Pacific Journal of Marketing and Logistics (27).

Table 5 Most important journals
Fig. 8

Cumulative occurrences per journal

These journals fall into five main subject categories: Health and Environment (Environmental Studies, Green and Sustainable Science and Technology, Environmental Engineering, and Occupational Health); Business; Industrial Engineering; Psychology (Experimental Psychology and Multidisciplinary Psychology); and Tourism Management (Hospitality, Leisure, Sport, and Tourism).

It is also interesting to evaluate the contribution of these journals relative to the total number of papers each journal published up to 2020.

This relative contribution is reported in the fourth column of Table 5. By this measure, the top five journals become Cogent Business and Management, Asia Pacific Journal of Marketing and Logistics, Industrial Management and Data Systems, Journal of Retailing and Consumer Services, and International Journal of Contemporary Hospitality Management.

Figure 8 shows the trend of the journals over the period. The first observation is that the open-access journal Sustainability has seen a substantial increase in production. Among the other journals, the trend lines show that production was driven by three journals (Journal of Business Research, Industrial Management and Data Systems, and Journal of Cleaner Production) with three different main topics (Business, Industrial Engineering, and Health and Environment), followed by journals on Psychology and Tourism Management.
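The two quantities used in this section can be sketched as follows: cumulative occurrences per journal and year (as plotted in Fig. 8) and the relative contribution, taken here as the share of a journal's total output that belongs to the collection. That definition, the function names, and the journal totals, which would have to come from outside the collection, are assumptions for illustration.

```python
from collections import Counter


def cumulative_occurrences(records, start, end):
    """Cumulative number of collection documents per journal for each year in [start, end].

    records is a list of (journal, year) pairs for documents in the collection.
    """
    per_year = Counter(records)
    series = {}
    for journal in sorted({j for j, _ in records}):
        running, values = 0, []
        for year in range(start, end + 1):
            running += per_year[(journal, year)]  # Counter returns 0 for missing keys
            values.append(running)
        series[journal] = values
    return series


def relative_contribution(collection_counts, journal_totals):
    """Share (%) of each journal's total output that belongs to the collection (assumed definition)."""
    return {j: 100 * n / journal_totals[j] for j, n in collection_counts.items()}


records = [("J1", 2018), ("J1", 2018), ("J2", 2019), ("J1", 2020)]
print(cumulative_occurrences(records, 2018, 2020))  # {'J1': [2, 2, 3], 'J2': [0, 1, 1]}
print(relative_contribution({"J1": 3, "J2": 1}, {"J1": 300, "J2": 4}))  # {'J1': 1.0, 'J2': 25.0}
```

Ranking by relative contribution favours smaller, specialised journals, which is why the order changes compared with the absolute counts.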

3.6 Historiographic

An in-depth investigation based on a chronological network map offers interesting insights into the historical evolution of PLS-SEM in the period 2003–2019. Fig. 9 shows the historiographic map, which lays out the chronological network of the most relevant direct citations in the selected collection (Garfield, 2004). As described in Sect. 2.4, each node is a document cited by other documents, each edge is a direct citation, and nodes are placed by publication year along the horizontal axis.

Fig. 9

Historiography

The map considers 20 documents and starts in 2003 with the paper by Chin et al. (2003), titled “A partial least squares latent variable modelling approach for measuring interaction effects: Results from a Monte Carlo simulation study and an electronic-mail emotion/adoption study”, published in Information Systems Research. This paper introduces the PLS approach with latent variable models to estimate interaction effects while accounting for measurement error. The PLS approach is illustrated via a Monte Carlo simulation, considering two case studies in the information systems domain. The year-by-year evolution and contribution of the papers in the collection are as follows.

  • The year 2005. This paper presents the basics of the algorithm with a more statistically oriented discussion of the potential of the method compared with the MLE estimator, and offers insight into some extensions to more classical data analysis methods.

  • Tenenhaus et al. (2005). PLS Path Modelling.

  • The year 2009. The first paper analyses the performance of PLS in contrast with MLE via Monte Carlo simulation, while in the second the authors show how to assess hierarchical models and compare the results of PLS with MLE, giving some suggestions and limitations for adoption.

  • Reinartz et al. (2009). An empirical comparison of the efficacy of covariance-based and variance-based SEM.

  • Wetzels et al. (2009). Using PLS path modelling for assessing hierarchical construct models: Guidelines and empirical illustration.

  • The year 2012. This year saw one of the first papers to rethink PLS, showing the main advantages of a composite-based method. Next, an in-depth simulation study of the four types of hierarchical models is presented, considering the repeated indicator, two-stage, and hybrid approaches. Finally, three reviews in strategic management, operations management, and marketing research compare different PLS studies to offer comprehensive, rigorous guidelines that help scholars avoid common pitfalls in PLS adoption.

  • Rigdon (2012). Rethinking partial least squares path modelling: In praise of simple methods.

  • Becker et al. (2012). Hierarchical latent variable models in PLS-SEM: Guidelines for using reflective-formative type models.

  • Hair et al. (2012a). The use of partial least squares structural equation modelling in strategic management research: a review of past practices and recommendations for future applications.

  • Peng and Lai (2012). Using partial least squares in operations management research: A practical guideline and summary of past research.

  • Hair et al. (2012b). An assessment of the use of partial least squares structural equation modelling in marketing research.

  • The year 2013. An extensive simulation study and discussion evaluate the properties of the GoF index (Esposito Vinzi et al., 2008; Tenenhaus et al., 2004) and of the relative GoF, GoFrel (Vinzi et al., 2010b).

  • Henseler and Sarstedt (2013). Goodness-of-fit indices for partial least squares path modelling.

  • The year 2014. Henseler et al. (2014) respond to the paper by Rönkkö and Evermann (2013): building on the limitations of that study, the authors debunk the myths about PLS and show the importance of the estimator in the social sciences. Moreover, a paper on the opportunities offered by PLS-SEM in family business research is presented.

  • Henseler et al. (2014). Common beliefs and reality about PLS: Comments on Rönkkö and Evermann (2013).

  • Sarstedt et al. (2014). Partial least squares structural equation modelling (PLS-SEM): A useful tool for family business researchers.

  • The year 2015. This year sees the presentation of consistent PLS, which gives new impetus to theoretical developments: from a statistical point of view, it overcomes a limitation of PLS by providing consistent parameter estimates, together with a new criterion for assessing goodness of fit. In addition, the heterotrait-monotrait ratio of correlations (HTMT) is proposed as a new criterion for assessing discriminant validity.

  • Dijkstra and Henseler (2015b). Consistent partial least squares path modelling.

  • Henseler et al. (2015). A new criterion for assessing discriminant validity in variance-based structural equation modelling.

  • The year 2016. Three tutorials are published: one on mediation analysis, which presents the latest procedures to adopt in light of new statistical findings; one that analyses common factor and composite models through a simulation study and offers guidance; and one with updated guidelines for new technology research, covering recent developments (cPLS, confirmatory composite analysis, and the heterotrait-monotrait ratio of correlations). Moreover, a new methodological approach is proposed to test the measurement invariance of composite models (MICOM).

  • Nitzl et al. (2016). Mediation analysis in partial least squares path modelling: Helping researchers discuss more sophisticated models.

  • Sarstedt et al. (2016). Estimation issues with PLS and CBSEM: Where the bias lies!

  • Henseler et al. (2016a). Using PLS path modelling in new technology research: updated guidelines.

  • Henseler et al. (2016b). Testing measurement invariance of composites using partial least squares.

  • The year 2017. A review of information systems studies published in Industrial Management and Data Systems (IMDS) and MIS Quarterly (MISQ) between 2010 and 2014 shows an increasing use of more complex models.

  • Hair et al. (2017a). An updated and expanded assessment of PLS-SEM in information systems research.

  • The year 2019. A paper appears that serves as a guideline on when to choose PLS-SEM and how to report its results, including rules of thumb and new guidelines on PLSpredict.

  • Hair et al. (2019). When to use and how to report the results of PLS-SEM.

4 Discussion and Conclusions

This study analyses the structure of scientific activity in PLS-SEM through a systematic and reproducible bibliometric citation analysis. The purpose is to guide researchers through the history and future of theoretical and applied PLS-SEM research.

The documents were selected and analysed following the PRISMA guidelines (see Subsect. 2.1). After the selection process was applied to the adopted query, the final collection included 3,854 documents, retrieved from Clarivate's Web of Science (WoS) database for the period from January 1985 to December 2020.

The adoption of the PRISMA approach, the availability of the WoS data as supplementary material, and the workflow implemented with the bibliometrix R package guarantee the reproducibility of the results.

The scientific activities investigated through bibliometric citation analysis are the annual trend of scientific production, cited and seminal documents, authors’ production, scientific collaborations, source dynamics, and a historiographic overview.

  • The annual trend of scientific production shows an annual growth rate of 29.08%, reflecting a rapid increase in the number of new papers. One reason is the publication of the handbook A primer on partial least squares structural equation modelling (PLS-SEM) by Hair et al. (2014a), which synthesised previous research and provided practical examples linked to the GUI software SmartPLS. Moreover, the critical paper by Rönkkö and Evermann (2013) provided the impetus for new theoretical and applied papers (Cheah et al., 2021a, 2021b; Dijkstra & Henseler, 2015a; Hair et al., 2017b; Henseler et al., 2014; Sarstedt et al., 2014, 2016) and for guidelines and practical tips (Hair et al., 2019; Henseler et al., 2016a; Lowry & Gaskin, 2014).

  • Cited and seminal documents. The most cited documents in the WoS (Global Citations) and in the local collection (Local Citations) are the papers by Henseler et al. (2015), Hair et al. (2012b), and Tenenhaus et al. (2005). In terms of the LC/GC ratio (%), which measures the balance between citations from within the collection and citations overall, the leading papers are Sarstedt et al. (2014), Rigdon (2012), and Ali et al. (2018); in terms of diffusion speed, the paper by Hair et al. (2019) stands out. Apart from the papers on introducing and rethinking PLS and on discriminant validity, all dominant papers address the same question: how to interpret and report PLS-SEM results. Analysing the references cited by the collected papers also provides a historical view based on documents outside the collection, which helps identify the seminal papers. Excluding works from the 2000s, the seminal papers are Wold (1975), Fornell and Bookstein (1982), Cohen (1988), Bagozzi and Yi (1988), Lohmöller (1989), Nunnally and Bernstein (1994), and Chin (1998).

  • Authors’ production and collaborations. Considering both the number of publications and their trend over time, the four leading authors in PLS-SEM are Ringle, Sarstedt, Henseler, and Hair. Even more revealing is the scientific collaboration network among scholars. In this network, research groups can be identified (marked in orange, green, and mauve), each with specific research characteristics, such as theoretical developments, new methods, and analysis strategies, as well as the specific fields where PLS-SEM is mainly used. Strong connections among the groups show how they collaborate on new and challenging innovations.

  • Collaborations between countries. Some countries, such as China and Malaysia, dominate in terms of the number of papers, while others, such as Australia, Pakistan, and the United Kingdom, stand out mainly for their collaborations. Overall, a good level of collaboration can be seen worldwide.

  • Source dynamics. The most prolific journals (Sustainability, Journal of Business Research, Industrial Management and Data Systems, Computers in Human Behavior, Journal of Cleaner Production, and Journal of Retailing and Consumer Services) give another picture of the main themes addressed: Environmental Studies, Business, Industrial Engineering, and Psychology.

  • The historiographic overview shows the evolution of research in the collection. The map identifies a 2003 position paper as a statistically oriented paper that discusses the potential of the method compared with maximum likelihood estimation (MLE). Year by year, research then moves towards hierarchical models, new strategies and indices, the limitations of the PLS approach, and the response to these limitations through the consistent version. Papers, guidelines, and tutorials in different application fields follow, supporting scholars, showing the method's potential, and helping them interpret real case studies. In the last few years, new methodological proposals have emerged, such as HTMT, MICOM, and mediation and moderation analysis.
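The two citation-based indicators used in the overview above, the annual growth rate of production and the LC/GC ratio, can be made concrete with a short sketch. It assumes the growth rate is a compound annual rate between the first and last year of the period, ((N_last / N_first)^(1 / (n_years - 1)) - 1) × 100; this is our assumption about how the 29.08% figure was derived, not a formula stated in this paper. The LC/GC ratio is taken as local citations divided by global citations, times 100. The function names and the example counts are hypothetical and are not data from the collection.

```python
def annual_growth_rate(first_count: int, last_count: int, n_years: int) -> float:
    """Compound annual growth rate (%) of yearly publication counts.

    Assumption: growth is measured between the first and the last year of
    a period spanning n_years calendar years (n_years - 1 growth steps).
    """
    if first_count <= 0 or n_years < 2:
        raise ValueError("need a positive first-year count and at least two years")
    return ((last_count / first_count) ** (1 / (n_years - 1)) - 1) * 100


def lc_gc_ratio(local_citations: int, global_citations: int) -> float:
    """Share (%) of a document's global (WoS-wide) citations that come
    from documents inside the local collection."""
    if global_citations <= 0:
        raise ValueError("global citations must be positive")
    return local_citations / global_citations * 100


# Hypothetical counts, for illustration only (not results of this study).
print(round(annual_growth_rate(first_count=1, last_count=8, n_years=4), 2))  # 100.0
print(lc_gc_ratio(local_citations=30, global_citations=120))  # 25.0
```

A high LC/GC ratio marks a paper cited mostly by the PLS-SEM community itself, whereas a low ratio marks a paper whose influence extends well beyond it.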

The results show that PLS-SEM research is driven mainly by a few scholars and groups. Nevertheless, these groups are clearly open to scientific communication and to building synergies that promote more translational research in this field.

The winning strategy has been worldwide collaboration among PLS-SEM scholars, across both geographical and thematic boundaries. These interconnections are also evident in the theoretical work that addresses open problems in PLS-SEM or improves the method's performance for both the measurement model and the structural model. See, for instance, the following hot topics: NCA (Richter et al., 2020), CCA (Ciavolino et al., 2021; Hair et al., 2020; Henseler & Schuberth, 2020; Hubona et al., 2021; Schuberth, 2021), HTMT2 (Roemer et al., 2021), the nonparametric distance-based test (Klesel et al., 2019, 2021), conditional mediation analysis (Cheah et al., 2021a, 2021b), CVPAT for prediction (Liengaard et al., 2021), and model selection criteria (Danks et al., 2020; Sharma et al., 2019, 2021).

Theoretical development also matters in the many research fields where the method is crucial for data analysis, because applied researchers need illustrative examples, guidelines, tutorials, and step-by-step procedures to use its new features, including in software such as SmartPLS, Adanco, and WarpPLS, or R packages such as SEMinR, cSEM, and semPLS.

Moreover, as Hwang et al. (2020) argued, it is worth exploring PLS-SEM and generalized structured component analysis (GSCA) together. This could create new ties between the two research groups that are venturing into the development of composite-based structural equation modelling. Hwang et al. (2020) also stressed that it is worth exploring new methodological developments from GSCA, such as nonrecursive models, panel and longitudinal analysis, and multilevel modelling. Learning from these GSCA techniques could help PLS-SEM researchers in different fields achieve breakthroughs, particularly by pursuing similar directions of methodological development.

The interconnection among scholars worldwide, the openness of research groups to collaboration, and the balance between theoretical and applied developments seem to be the key to the success of PLS-SEM, which can be seen as a best practice for developing, improving, and supporting any research field.

4.1 Limitations and Further Directions of Research

The main limitation of this study stems from its use of Clarivate's WoS database, whose coverage is narrower than that of other databases such as Scopus or Google Scholar. In particular, it does not include books and handbooks, nor some journals that are not indexed, mainly in the humanities and social sciences. Nevertheless, WoS remains the best choice, as it is the most consistent and reliable database from the metadata point of view.

Moreover, this paper is limited to bibliometric citation analysis and therefore does not consider the content of the documents. Future research should include a bibliometric content analysis of parts of the documents, such as titles, abstracts, and keywords, which could help scholars understand how the content of PLS-SEM research has evolved.