Introduction

With the growth of cloud computing, both incumbent IT service providers and start-up companies have had the opportunity to take up new roles in this emerging market (Hogan et al. 2011; Leimeister et al. 2010). A role in this context stands for a “[…] set of similar services offered by market players to similar customers” (Böhm et al. 2010, p. 133). These roles primarily comprise providers of the three basic cloud service layers: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) (Marston et al. 2011; Mell and Grance 2011). Building upon these three consecutive and interrelated layers, a multitude of further roles, such as aggregators, integrators and market place operators, has evolved (Floerecke and Lehner 2015; Keller and König 2014). An organization can implement one or more of these roles at the same time (Floerecke and Lehner 2016). An organization that takes on multiple roles covers a so-called role cluster (Pelzl et al. 2013).

The evolution of the IT service market has led to an expansion and partial replacement of traditional value chains in IT service provision by network-like relations, forming a complex business ecosystem (Böhm et al. 2010; Leimeister et al. 2010). A business ecosystem represents, in general, a pertinent scope for systemic innovations, where different interrelated and interdependent companies cooperate to deliver customer solutions (Adner 2017; Peltoniemi and Vuori 2004). In line with the biological ecosystem perspective of Moore (1993), an organization cannot actively choose whether or not to be part of the cloud ecosystem. Each organization that provides or uses any cloud service or product automatically becomes part of it. The main trigger for the emergence of the cloud ecosystem was that cloud computing became an enabler for new, innovative business models on both the provider’s and the customer’s side (Böhm et al. 2011; Iyer and Henderson 2012; Marston et al. 2011).

The cloud ecosystem has been continuously expanding with regard to the number of organizations, roles and service linkages, and has become increasingly complex and opaque (Floerecke and Lehner 2015; Herzfeldt et al. 2018; Karunagaran et al. 2016). This missing transparency is considered one of the main reasons why the worldwide adoption rate of cloud services has so far failed to meet the high expectations (Appelrath et al. 2014; Hentschel and Leyh 2016; Sunyaev and Schneider 2013). End customers in particular face major challenges, especially with respect to vendor selection and system integration (Hentschel et al. 2018; Karunagaran et al. 2016).

In order to understand the structure and composition of the cloud ecosystem, the roles of its market actors and the relationships between them, and thus to improve the urgently needed transparency, the concept of ecosystem mapping (Bahari et al. 2015; Benedict 2018) can be used. Several attempts have already been undertaken to describe and visualize the cloud ecosystem by means of a descriptive model (e.g., Böhm et al. (2010); Hogan et al. (2011); Pelzl et al. (2013); Walterbusch et al. (2014)). However, the proposed models differ significantly with regard to their descriptive elements (Floerecke and Lehner 2015). Against this background, Floerecke and Lehner (2016) developed a cloud ecosystem model based on a systematic analysis and synthesis of the previously published models. The resulting model, named the Passau Cloud Computing Ecosystem Model (PaCE model), comprises 26 roles for market actors, which are grouped into five categories – (1) client, (2) vendor, (3) hybrid role, (4) support and (5) environment – and entails the relationships between the roles (Floerecke and Lehner 2016). The model was developed under the guiding principles of the design science research paradigm, which consists of two phases: build and evaluate. Whereas building is the process of constructing an artefact for a specific purpose, evaluation is the process of determining how useful an artefact is with regard to certain predefined metrics (Hevner et al. 2004; March and Smith 1995). The building phase of the PaCE model was mainly based on the existing cloud ecosystem models and was thus dependent on their quality and correctness. As a basic evaluation of the PaCE model, an Internet search was conducted to check whether each of the proposed roles is covered by at least one real market actor. However, the PaCE model has not yet been examined regarding its structural equivalence and completeness with respect to the real cloud business.
Apart from this, the cloud ecosystem may have undergone structural and compositional changes since then due to market dynamics (Herzfeldt et al. 2018; Karunagaran et al. 2016). In addition, existing cloud ecosystem models lack proof of their contribution to theory and practice (Floerecke and Lehner 2016).

Considering the identified research problems, the first goal is to evaluate the PaCE model regarding its structural equivalence and completeness and thus to improve and adjust it to the real cloud business. It can be assumed that there are prevailing role clusters in the cloud ecosystem which have proven suitable in practice; this assumption is examined on the basis of the PaCE model. The second goal is therefore to demonstrate the PaCE model’s usefulness by applying it as an instrument for identifying role clusters that are frequently covered by organizations, as well as isolated and non-isolated roles. This provides hitherto missing insights into which roles, and hence business models, lead to synergy effects, are disjunctive, are mutually dependent or are even mutually exclusive (Schwarz et al. 2017; Winterhalter et al. 2016). So far, Pelzl et al. (2013) have been the only scholars who tried to identify role clusters in the cloud computing domain. However, their study dates back to 2013 and was restricted to German cloud providers. Moreover, they applied a rudimentary value network model comprising only twelve roles and used a small sample. Therefore, a new, more systematic examination of role clusters within the cloud ecosystem is indispensable. The third goal is to create a benchmark data set that can be used in future for further investigations, in particular for longitudinal studies of the cloud ecosystem.

To achieve the research goals, a quantitative cross-sectional analysis (Wilde and Hess 2007) of an adequate and representative subset of existing cloud providers was conducted. As no comprehensive list of cloud providers is available, a systematic Internet search with the Google search engine served as the instrument of data collection. All identified organizations were manually analysed and matched with the PaCE model’s current roles. When an organization’s activities were not covered by the PaCE model, the model was expanded or refined. Elements of the model that did not correspond to reality were removed. Based on the organization-role assignments, the dominant role clusters as well as the isolated and non-isolated roles were determined by carrying out a two-step cluster analysis (Chiu et al. 2001). To facilitate future investigations, in particular longitudinal studies of the cloud ecosystem, the generated data set contains additional attributes characterizing the firms, among them headquarters location, size, legal form, offered deployment models and the share of cloud revenue in total turnover.

Overall, the cloud business is characterized by a multitude of activities that are not considered in common market overviews by market research institutes such as Forrester or Gartner. These overviews describe the cloud ecosystem coarsely in terms of IaaS, PaaS, SaaS and a few further broad segments. The consequence is a lack of essential information that is necessary to analyse and explain both the market development as a whole and the success of individual organizations. A role-based description of the whole cloud ecosystem is intended to close this gap and to provide a model for analysing relevant phenomena. The result is a categorization scheme and thus an analytic theory (Gregor 2006), which enables, guides and supports future research in the field of cloud computing.

The rest of the paper is structured as follows: Section two reviews related work on cloud computing and business ecosystems and describes the initial version of the PaCE model. Section three justifies and details the research design of this study. The results of the model evaluation and of the cluster analysis concerning dominant role clusters in the cloud ecosystem are presented in section four and discussed in section five. Section six outlines contributions to research and practice, limitations and an outlook on future research.

Related work

Cloud computing research

Literature has come up with numerous definitions of cloud computing over the years, with either a stronger business or a stronger technical focus (Madhavaiah et al. 2012; Vaquero et al. 2008). The rather technically oriented definition by the National Institute of Standards and Technology (NIST) has in the meantime become the standard in both science and practice. According to NIST, “[c]loud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (Mell and Grance 2011, p. 2). Viewed from a higher level of abstraction, cloud computing is, comparable to car sharing in the automotive industry (Wolfenstetter et al. 2015), an example of the ongoing paradigm shift from selling products to providing integrated bundles of hardware, software and service components that deliver value through their use (Floerecke et al. 2015; Sultan 2014). These bundles normally remain the property of the provider and are shared by various customers, which is associated with both economic and ecological benefits (Boehm and Thomas 2013). A closer look shows that the technology enabling cloud services is not new. Cloud computing is an IT operation model that combines a set of existing technologies and concepts such as virtualization, autonomic computing, grid computing and usage-based pricing (Foster et al. 2008; Zhang et al. 2010). Despite its low degree of technological novelty, cloud computing has radically transformed the way IT resources and applications are implemented, deployed, provided, managed and used (Armbrust et al. 2010; Marston et al. 2011). Several scholars (e.g., Böhm et al. (2011); Iyer and Henderson (2010); Leimeister et al. (2010)) therefore consider cloud computing as a co-evolution of computing technology and business models. In the meantime, cloud computing has come to be regarded as a foundational enabler for the digital transformation initiatives currently taking place in many firms across all industries (Benlian et al. 2018).

The relevant literature distinguishes between three fundamental cloud service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). IaaS supplies infrastructural resources (CPU, storage and network). PaaS facilitates the development and deployment of applications based on a software development environment with programming languages, libraries and tools. SaaS refers to directly usable software applications. These three cloud service models form layers that are interrelated and build upon each other (Armbrust et al. 2010; Mell and Grance 2011). Cloud services on each service layer can be delivered via four main deployment models, namely as public, private, hybrid and community cloud (Marston et al. 2011; Zhang et al. 2010). In addition, there are intermediate forms, e.g., a virtual private cloud, which is a mixture of a private and a public cloud (Dillon et al. 2010). The deployment models differ particularly in their degree of operational isolation with regard to access to a specific cloud service and the physical location of the underlying hardware servers. The general key characteristics of cloud services are on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service. These characteristics distinguish cloud services from on-premises IT solutions (Marston et al. 2011; Mell and Grance 2011).

Research on cloud computing has so far focused mainly on technical aspects. Less consideration has been given to the major changes within the business perspective of IT provisioning (Herzfeldt et al. 2019; Wang et al. 2016). According to a recent literature review by Senyo et al. (2018), the most frequently addressed business issues in cloud research are adoption, cost, trust and privacy, legislation and ethics. Studies dealing with (parts of) business models and the business ecosystem have also become more popular in recent years (Floerecke and Lehner 2018b; Herzfeldt et al. 2018).

Business ecosystem research

The discussion on business ecosystems was initiated by Moore (1993, p. 76) by taking biological ecosystems as a metaphor to describe multi-organizational networks in which “[…] companies coevolve capabilities around a new innovation: they work co-operatively and competitively to support new products, satisfy customer needs, and eventually incorporate the next round of innovations”. Subsequently, several scholars (e.g., Anggraeni et al. 2007; Mäkinen and Dedehayir 2012; Rong et al. 2018) picked up the idea of business ecosystems and studied it from various perspectives using different research methods. This is the main reason why no precise and generally accepted definition of business ecosystems has been established within the scientific literature so far (Adner 2017; Nischak et al. 2017). Nevertheless, as collaboration between organizations has increasingly become an imperative in a globalised world and many industries are facing radical changes due to fast-growing digital technologies, business ecosystems as a research topic have gained significant importance in recent years (Jacobides et al. 2018; Järvi and Kortelainen 2017).

There is a broad consensus among scholars that a business ecosystem contains a multitude of loosely coupled entities acting as partners, subcontractors, complementors, customers or competitors, which build the inner part of the ecosystem. Further entities or organizations form the periphery (environment) of the ecosystem (Iansiti and Levien 2004; Moore 2006). The environment of an ecosystem, which can be characterized by legal, political, cultural and social forces, stands in a reciprocal relationship with the inner part of the ecosystem (Adner and Kapoor 2010; Moore 1993; Peltoniemi 2006). Reciprocal means here, on the one hand, that the environment facilitates and moderates the interplay between organizations located in the ecosystem, e.g., by defining laws and standards. On the other hand, the environment reacts to and is influenced by the actions and measures of participating organizations and is forced to adjust regularly. Overall, a business ecosystem represents a pertinent scope for systemic innovations, where different interrelated and interdependent companies cooperate to deliver customer solutions (Adner 2017; Peltoniemi and Vuori 2004). Business ecosystems are considered at different levels of abstraction, e.g., at company (e.g., Apple’s ecosystem), sector (e.g., cloud ecosystem) or regional level (e.g., Silicon Valley) (Adner 2017; Jacobides et al. 2018).

Besides business ecosystems, several further types of ecosystems, such as digital, innovation, platform, service, software or technology ecosystems, can be found in the information systems literature (Benedict 2018; Mäkinen and Dedehayir 2012; Peltoniemi and Vuori 2004). They differ slightly in their features and perspectives. However, the boundaries between them are fluid, which is why scholars often use the terms synonymously (Järvi and Kortelainen 2017; Nischak et al. 2017). The focus of business ecosystems in the narrow sense is to establish economic relationships and to foster cooperation between the various participants using a certain technology. Existing research focuses on actors (roles), the relationships between them and (economic) effects (Benedict 2018).

An organization cannot actively choose whether or not to be part of a business ecosystem. Each organization that provides or utilizes the specific technology automatically becomes part of it; formal registration as a member is thus not required (Nischak et al. 2017). This understanding is in line with the biological perspective on multi-organizational networks by Moore (1993). As organizations continuously enter and leave the business ecosystem and relationships are formed, renewed and dissolved, a business ecosystem is a highly dynamic system (Basole et al. 2015).

In accordance with biological ecosystems, business ecosystems can be characterized by high complexity, interdependence, co-opetition and coevolution (Iansiti and Levien 2004; Moore 2006; Peltoniemi and Vuori 2004). Moreover, business ecosystems are regarded as self-organized: no single organization can control the entire ecosystem, although certain major companies may dominate it (Moore 1993; Rong et al. 2018). Such dominant companies are named “keystone players” (Iansiti and Levien 2004) or “lead firms” (Williamson and De Meyer 2012). The remaining firms are “niche players” (Iansiti and Levien 2004; Mäkinen and Dedehayir 2012), which form the majority of a business ecosystem. Overall, a business ecosystem allows companies to create value that no single player acting outside this system could create (Jacobides et al. 2018). The drawback, however, is that companies become mutually dependent, so that the failure of one can cause failures of others (Adner and Kapoor 2010; Peltoniemi 2006).

Many organizations that are part of a business ecosystem offer a broad range of products and services and are thus characterized by significantly different profiles (Böhm et al. 2010; Floerecke and Lehner 2016). It is hence beneficial and necessary to use a role concept for capturing service portfolios. According to Böhm et al. (2010, p. 133), a role is a “[…] set of similar services offered by market players to similar customers”. To understand the relationships and interdependencies between the roles in a business ecosystem, ecosystem mapping has often been used for visualization (Bahari et al. 2015; Benedict 2018). Within such role-based ecosystem models, each organization is described as an independent agent that can take one or multiple roles (many-to-many relationship) (Floerecke and Lehner 2015; Tian et al. 2008). Business ecosystem models can be classified as descriptive models and afterimages, defined by their structural similarity to, and completeness with respect to, the ecosystem (Lehner 1995).
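The many-to-many relationship between organizations and roles can be illustrated with a small data structure. The following Python sketch is an illustration under stated assumptions, not the authors' tooling: it stores hypothetical organization-role assignments as a mapping from organizations to sets of role codes, inverts the relation, and derives a binary organization-by-role matrix of the kind a cluster analysis over ecosystem roles could take as input. The organization names are invented; hyb8 is the PaCE code for the managed service provider, while sup1 and vend1 are placeholder codes, not actual PaCE role numbers.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sample: organization names are invented; "hyb8" is the PaCE code
# for the managed service provider, while "sup1" and "vend1" are placeholder codes.
ASSIGNMENTS: Dict[str, Set[str]] = {
    "OrgA": {"vend1", "hyb8"},
    "OrgB": {"hyb8", "sup1"},
    "OrgC": {"sup1"},
}


def roles_to_orgs(assignments: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert the many-to-many relation: role -> organizations taking that role."""
    inverse: Dict[str, Set[str]] = {}
    for org, roles in assignments.items():
        for role in roles:
            inverse.setdefault(role, set()).add(org)
    return inverse


def incidence_matrix(
    assignments: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Binary organization-by-role matrix; 1 means the organization takes the role."""
    orgs = sorted(assignments)
    roles = sorted({role for roles in assignments.values() for role in roles})
    matrix = [[int(role in assignments[org]) for role in roles] for org in orgs]
    return orgs, roles, matrix
```

For this sample, incidence_matrix returns the roles hyb8, sup1 and vend1 and the rows [1, 0, 1], [1, 1, 0] and [0, 1, 0] for OrgA, OrgB and OrgC.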

Role-based business ecosystem models have been proposed for a multitude of domains, e.g., blockchain (Riasanow et al. 2018a), internet of things (Papert and Pflaum 2017) and FinTech (Riasanow et al. 2018b). Despite the existence of business ecosystem models in the relevant literature, their added value for research and practice remains unclear. Commonly, authors highlight that ecosystem models create transparency and allow organizations to reflect on their roles or to allocate themselves to a role cluster (Floerecke and Lehner 2016). This paper aims to address this shortcoming by demonstrating the applicability and practical use of ecosystem models in the area of cloud computing.

Relevant literature provides only little guidance for the modelling procedure and the identification of entities forming a business ecosystem (Basole et al. 2016; Benedict 2018). The contribution by Adner (2017) is an exception in this respect. According to his recommendations, the following elements should be part of an ecosystem model: (1) roles of market actors, (2) their position within the ecosystem and (3) their relations (e.g., services, materials, information, influence and funds). Beyond that, Adner (2017, p. 43) suggests including activities, which he defines as “[…] discrete actions to be undertaken in order for the value proposition to materialize”. Activities are a constitutive component of business models, besides key resources, revenue streams and cost structure (Wirtz et al. 2016). From an ecosystem perspective, a business model can be seen as a detailed specification of how ecosystem roles are realized by organizations in practice (Floerecke and Lehner 2018b). Nevertheless, the activity element is not considered in this study because it pertains to the use of the model and therefore requires a solid, evaluated model.

Research on cloud ecosystem models

Applying the business ecosystem concept to the cloud computing context, each organization can be characterized by the set of roles related to the services it offers to its customers (Floerecke and Lehner 2016). The cloud ecosystem jointly enables services in a loosely coupled network through service refinement and resource integration (Iyer and Henderson 2012; Leimeister et al. 2010). This is particularly facilitated by the multi-layered architecture of cloud services (IaaS, PaaS, SaaS), but also by the increasingly modular structure within each layer, the growing degree of standardization and on-demand self-service (Floerecke and Lehner 2016). As a consequence, the traditional distinction between customers and providers is blurred in the cloud ecosystem (Floerecke and Lehner 2016). This is in line with the axioms of the service-dominant logic (SDL): both are considered in a more generic sense as actors in a system of actors, co-creating value through resource integration and service provision (Lusch and Nambisan 2015). From the end customer’s viewpoint, single providers are increasingly being replaced by service bundles stemming from different vendors (Böhm et al. 2009; Iyer and Henderson 2012). It is therefore not uncommon that end customers do not know which component of the bundle is provided by which provider (Floerecke and Lehner 2016).

The cloud ecosystem has been continuously expanding with respect to the number of organizations, roles and service linkages, and thus has become increasingly complex and opaque (Floerecke and Lehner 2015; Herzfeldt et al. 2018; Keller and König 2014). This missing transparency is seen as one of the main reasons why the worldwide adoption rate of cloud services has so far failed to meet the high expectations (Appelrath et al. 2014; Hentschel and Leyh 2016; Sunyaev and Schneider 2013). End customers face major challenges, in particular with regard to vendor selection and system integration (Hentschel et al. 2018; Karunagaran et al. 2016). This circumstance explains the growing relevance of cloud service brokers, such as aggregators, integrators and market place operators, and of consulting firms within the cloud ecosystem (Fowley et al. 2018). Providers, too, face challenges: they are continuously threatened by new market entrants and confronted with price pressure (Herzfeldt et al. 2018; Trenz et al. 2019). This particularly affects providers of IaaS, because basic IaaS services – without extensions such as managed or platform services – have become a commodity, similar to electricity and gas, in recent years (Floerecke and Lehner 2018a, 2019a). Commodities are products and services that are highly standardized and to a large extent equivalent with respect to functionality and quality, irrespective of the specific vendor (Bruhn 2011). Therefore, the price of IaaS services has turned into the central decision criterion for end customers (Floerecke and Lehner 2018b, 2019b).

To gain an overview and improve transparency in this field, several attempts (e.g., Böhm et al. (2010); Hogan et al. (2011); Pelzl et al. (2013); Walterbusch et al. (2014)) have been made to formally describe the cloud computing ecosystem by means of a model. However, the proposed models show considerable differences with regard to their constructs (e.g., number of roles, types of relationships and scope of the model) and their form of presentation (e.g., non-standardized graph-based and process models) (Floerecke and Lehner 2016). Against this background, Floerecke and Lehner (2016) developed a revised cloud-specific ecosystem model based on a systematic analysis (for the detailed results the reader is referred to Floerecke and Lehner (2016)) and synthesis of previously published models following the design science research paradigm. The comparative analysis of the existing cloud ecosystem models revealed that only a few of them have actually been scientifically evaluated. Where an evaluation took place, the authors conducted interviews with domain experts, allocated single market actors to their suggested model roles, combined both approaches, tested a limited number of hypothetical business scenarios or applied use cases (Floerecke and Lehner 2016). The evaluation of the PaCE model consisted merely of an Internet search checking whether each of the proposed roles is covered by at least one real market actor. An extensive model evaluation, which is an indispensable part of the design science research paradigm (Hevner et al. 2004; March and Smith 1995), is still outstanding.

By linking concrete organizations to their corresponding ecosystem roles, it is possible to identify prevailing role clusters. As each ecosystem role of a profit-seeking market actor corresponds to (at least) one business model (Floerecke and Lehner 2018b; Labes et al. 2013), the identification of dominant role clusters provides insights into which roles, and hence business models, can lead to synergy effects (Winterhalter et al. 2016). Conversely, disjunctive or mutually exclusive ecosystem roles might be unveiled (Schwarz et al. 2017). Research on role clusters in the cloud ecosystem, however, is still nascent. So far, Pelzl et al. (2013) have been the only scholars who tried to identify role clusters. Their study identified ten role clusters among German cloud providers. As their study was published in 2013 and the cloud ecosystem is highly dynamic, its results can be considered at least partly outdated today. In addition, the authors applied only a rudimentary value network model and used a small sample of only 80 cloud providers. Therefore, a new, more systematic investigation of role clusters is imperative.
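As a rough first look at prevailing role clusters, one can count how often each exact combination of roles occurs and how often pairs of roles are taken by the same organization. The sketch below does this for a hypothetical sample with placeholder role codes (not study data). It is a plain frequency count, not the two-step cluster analysis used in this study, and a high pairwise co-occurrence only hints at, rather than proves, synergy effects.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set


def role_combination_counts(assignments: Dict[str, Set[str]]) -> Counter:
    """Count how many organizations cover exactly each combination of roles."""
    return Counter(frozenset(roles) for roles in assignments.values())


def pairwise_cooccurrence(assignments: Dict[str, Set[str]]) -> Counter:
    """Count how many organizations take both roles of each unordered role pair."""
    pairs: Counter = Counter()
    for roles in assignments.values():
        for pair in combinations(sorted(roles), 2):
            pairs[pair] += 1
    return pairs


# Hypothetical sample with placeholder role codes (not real study data).
SAMPLE: Dict[str, Set[str]] = {
    "OrgA": {"vend1", "hyb8"},
    "OrgB": {"hyb8", "vend1"},
    "OrgC": {"sup1"},
    "OrgD": {"hyb8", "sup1", "vend1"},
}
```

Here, the set of hyb8 and vend1 is the most frequent exact role combination (two organizations), and the pair (hyb8, vend1) co-occurs in three organizations.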

The Passau cloud computing ecosystem model (PaCE model)

Figure 1 shows the latest version of the PaCE model (Floerecke and Lehner 2016), which is to be evaluated and finalised in this paper. It comprises 26 roles for market actors, which are grouped into five categories – (1) client (c), (2) vendor (vend), (3) hybrid role (hyb), (4) support (sup) and (5) environment (env) – and entails the relations between the roles.

Fig. 1

The Passau Cloud Computing Ecosystem Model (PaCE model)

A vendor provides basic services and/or products for its customers. In line with the central principle of service refinement and resource integration towards the end customer (c1), one organization can take the role of both a customer and a vendor. This role category is labelled hybrid role and illustrated by split nodes. The end customer (c1), the only representative of the client category, does not deliver services to any other role. As the starting point of the service request and the end point of the service delivery, the end customer ultimately pays for all value-adding activities within the ecosystem. Supporters comprise roles offering non-technological services, such as certification, training and consulting, which are fundamental to enabling optimal use of cloud services by the end customer (c1). The environment category contains roles in the periphery of the ecosystem, whose actors are mostly public administrations or non-governmental organizations (NGOs). The labels of the directed edges indicate the main services associated with an individual role. Overall, the PaCE model is an attempt to describe and visualize the cloud ecosystem, but it has not yet been evaluated with respect to its structural equivalence and completeness.

Research design

Research methodology

The intention of this research is explorative, attempting to generate a novel analytic theory according to Gregor (2006). Analytic theories are the most basic type of theory and primarily analyse “what is”. They are thus appropriate when little is known about a specific field or phenomenon. Variants of this theory type include new and revised frameworks, classification schemes and taxonomies. Such systematics provide a clear description of the uniformities of classes of phenomena, which makes them an important prerequisite for targeted research (Gregor 2006). This is especially needed in the cloud computing domain, where the structure and composition of the ecosystem are largely unknown and business actors are commonly investigated in a general and undifferentiated manner, hampering research progress. Evaluation criteria for such artefacts are conformity with reality, completeness, clarity and, in particular, usefulness (Gregor 2006). In summary, an analytic theory corresponds to the goals of this paper: testing the PaCE model regarding its structural equivalence and completeness, and demonstrating its usefulness in an exemplary application scenario.

To achieve the research goals, the authors followed the design science paradigm. Design science creates and evaluates IT artefacts intended to solve identified organizational problems (Hevner et al. 2004; March and Smith 1995). The research process consists of two phases: build and evaluate. Building means constructing an artefact for a specific purpose. Evaluation is the process of determining how well the artefact performs regarding certain predefined metrics (Hevner et al. 2004; March and Smith 1995). The improvement and finalisation of the PaCE model can be assigned to the build phase and, to some extent, to the evaluation phase. The demonstration of the PaCE model’s usefulness is part of the evaluation phase. For both phases, a quantitative cross-sectional analysis (Wilde and Hess 2007) was conducted. Cross-sectional analyses usually comprise a single data collection across several organizations, whose data are subsequently encoded and analysed quantitatively or qualitatively. The result is a cross-sectional image of the sample organizations, which permits inferences about the population (Wilde and Hess 2007).

As no comprehensive list of cloud providers is available, a systematic Internet search with the Google search engine served as the data collection instrument. In general, only organizations were considered that provide or use any cloud service or product, are an immediate supplier of a cloud provider, offer non-technological services based on cloud services or belong to the environment. The identified organizations were manually analysed and matched with the PaCE model’s current roles. When an organization’s value propositions and activities could not be represented by the PaCE model, the model was expanded or refined. Elements of the model that did not correspond to reality were removed. Overall, the focus was on the ecosystem roles, whereas the relationships between the ecosystem roles were not part of the data collection and thus not subject to the model’s evaluation. Additionally, insights from 21 expert interviews with representatives of 17 cloud providers were used as a source of information. These interviews were originally part of another research project (Floerecke 2018; Floerecke and Lehner 2018b).

Based on the generated list of cloud providers and their role assignments, the dominant role clusters were determined by a cluster analysis. The goal was to group organizations with a similar combination of ecosystem roles. A cluster analysis in general is an exploratory method to determine unknown correlations in a pool of data and to group similar records into clusters. The resulting clusters should necessarily be characterized by internal homogeneity and external heterogeneity concerning the relevant attributes (here: ecosystem roles) (Backhaus et al. 2016; Kaufman and Rousseeuw 2009).
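Internal homogeneity and external heterogeneity can be made tangible with a distance measure between the role sets of organizations. The following sketch assumes the Jaccard distance, a common choice for binary attributes but not the log-likelihood distance of the two-step method, and compares the mean distance within clusters with the mean distance between clusters for a given, hypothetical cluster labelling. The role codes r1 to r4 used below are placeholders.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|: 0 for identical role sets, 1 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def within_between(
    assignments: Dict[str, Set[str]], labels: Dict[str, int]
) -> Tuple[float, float]:
    """Mean pairwise distance inside clusters and between clusters."""
    within, between = [], []
    for org1, org2 in combinations(sorted(assignments), 2):
        d = jaccard_distance(assignments[org1], assignments[org2])
        (within if labels[org1] == labels[org2] else between).append(d)
    return (mean(within) if within else 0.0, mean(between) if between else 0.0)
```

A partition satisfying both criteria shows a low first value and a high second value. For example, for the role sets {r1, r2}, {r1, r2}, {r3} and {r3, r4}, split into the clusters {first two} and {last two}, the sketch returns 0.25 and 1.0.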

The two-step clustering method developed by Chiu et al. (2001) was chosen for the cluster analysis. This method is particularly suitable for categorical variables and large data sets. In addition, the algorithm can determine the optimal number of clusters automatically. These three features address weaknesses of more widespread methods such as k-means or agglomerative hierarchical clustering (Bacher et al. 2004; Chiu et al. 2001; Sarstedt and Mooi 2019) and explain the choice of method.

Data collection

A systematic internet search consisting of two phases (bottom-up and top-down search) served as the data collection instrument. The two-phase search approach was selected because no comprehensive list of cloud providers could be found. Independently of this division into two phases, the data collection was split into two time intervals (July 2018 and November/December 2018) and carried out by two different researchers. The reason for this time gap was a short-term lack of human resources at the research department. However, it provided an advantage: before the second round of data collection started, the organization-role assignments were double-checked and erroneous assignments were corrected. Since an ecosystem role is defined as a set of similar services, the central assessment criterion was the value propositions offered (the core of a business model). More precisely, the goal was to partition the value propositions of the various companies according to recurring atomic classes of products and/or services.

The data collection process was as follows. Based on the descriptions of the original ecosystem roles (Floerecke and Lehner 2016), an iterative search process was executed to find organizations representing the respective role (bottom-up search). For this purpose, a list of keywords was derived from each role description. The list was then processed role by role using Google’s search engine. As the search revealed further keywords for certain roles, the list was successively extended, and the new keywords served as additional input for the ongoing search. Organizations were selected independently of further characteristics, such as legal form or size. Whether an organization could be assigned to the respective ecosystem role was decided based on publicly available information on the organization’s website. More precisely, the organization’s self-description, its offered services and its main activities were first screened for the keywords and then compared with the full role description. To be assigned to a role, an organization had to meet the core characteristics of that role. For example, to be considered a managed service provider (hyb8), a company not only had to offer a cloud service extended by an additional service on top of it, but also had to base its contractual relationship with the customer on service level agreements. If an organization did not fully meet the respective ecosystem role, it was evaluated whether it fit any other role. If not, a new role was created, and an appropriate description was derived from the website and, where possible, from existing literature. The newly constructed role was then integrated into the search process.
The bottom-up search for a role was terminated when either no additional organization was found or the predefined maximum of 100 organizations was reached. The second termination condition was defined because the cloud ecosystem was assumed to be extremely large, which would otherwise lead to a nearly infinite search process. The cap of 100 was considered suitable because, on the one hand, it is reachable at a reasonable expense and, on the other hand, it was regarded as sufficient to generate a widely representative subset of the cloud ecosystem. In total, 758 organizations were identified and assigned to the roles of the PaCE model.
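The bottom-up search for a single role can be sketched as a keyword queue with the two termination conditions described above. This is a minimal illustration under assumptions, not the authors’ procedure: `search_fn` is a hypothetical stand-in for one manual Google query plus website screening, and an exhausted keyword queue is used as a proxy for “no additional organization was found”.

```python
from collections import deque
from typing import Callable, Iterable

# Hypothetical interface: keyword -> (organizations judged to match the role,
#                                     further keywords discovered while screening)
SearchFn = Callable[[str], tuple[set[str], set[str]]]


def bottom_up_search(keywords: Iterable[str], search_fn: SearchFn,
                     max_orgs: int = 100) -> set[str]:
    """Collect organizations for one role until no keywords remain or the cap is hit."""
    queue = deque(dict.fromkeys(keywords))  # de-duplicated, order-preserving
    seen_keywords = set(queue)
    found: set[str] = set()
    while queue and len(found) < max_orgs:
        keyword = queue.popleft()
        orgs, new_keywords = search_fn(keyword)
        for org in sorted(orgs):  # sorted only to make the cap deterministic
            if len(found) >= max_orgs:
                break
            found.add(org)
        # keywords revealed during the search extend the list for this role
        for kw in sorted(new_keywords - seen_keywords):
            seen_keywords.add(kw)
            queue.append(kw)
    return found
```

The queue makes the successive extension of the keyword list explicit: each newly discovered keyword is searched exactly once.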

Thereafter, all 758 organizations were examined in detail, role by role, by systematically screening the related websites to check whether they hold further roles (top-down search). This helped to unveil ecosystem roles that had not yet been part of the PaCE model and was the prerequisite for a meaningful identification of role clusters. This additional top-down search considerably increased the number of organization-role assignments, to a final total of 2294.

Besides the organization-role assignments, additional information was collected on headquarters location, size (small: <50 employees; medium: <250 employees; large: ≥250 employees; according to EU Recommendation 2003/361/EC), legal form, deployment models and amount of cloud turnover. To retrieve this information, financial and business reports, the organizations’ websites and Crunchbase (a platform for business information about companies) were used.
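The headcount thresholds can be expressed as a small helper. This is a sketch: it uses only the three headcount classes stated above (EU Recommendation 2003/361/EC additionally distinguishes micro enterprises and applies turnover or balance-sheet ceilings), and returning `None` models organizations for which no size data could be obtained.

```python
from typing import Optional


def size_class(employees: Optional[int]) -> Optional[str]:
    """Map a headcount to the study's size classes; None if no data are available."""
    if employees is None:
        return None
    if employees < 50:
        return "small"
    if employees < 250:
        return "medium"
    return "large"  # 250 employees or more
```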

Descriptive statistics

Representatives were found for all roles of the final PaCE model. The number of organizations per role varies strongly (Fig. 2), despite the comparatively low maximum target value of 100 organizations per ecosystem role in the data collection. The mean is 76.47 organizations per role, with a standard deviation of approximately 48. On closer inspection, however, the variance turns out to be rather small when the sub-roles are aggregated to their respective meta-role. The role with the highest number of representatives is consultant (sup5), while infrastructure market place operator (hyb7) has the lowest number. The high standard deviation naturally influences the outcome of the subsequent cluster analysis. To reach a stable result, data that would have distorted the clustering results, namely specific variables and extreme outliers, were eliminated step by step (see section “Cluster analysis” for details). Although no reliable information is available on the population of the presumably very large cloud ecosystem, this sample (available upon request from the authors) is considered a first and important step towards a representative subset.
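As a consistency check, the reported mean equals the total number of organization-role assignments divided by the number of role variables in the data matrix (30), where $n_r$ denotes the number of organizations assigned to role $r$ and $R = 30$:

```latex
\bar{n} = \frac{1}{R}\sum_{r=1}^{R} n_r = \frac{2294}{30} \approx 76.47
```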

Fig. 2 Organization-role assignments

Table 1 shows the distribution of organization sizes in the sample. Large organizations make up the majority with 57.3%. Note that no size data could be obtained for 93 organizations.

Table 1 Organization size

Most of the organizations are headquartered in the USA, Germany and Great Britain (Table 2). Remarkably, India is also comparatively strongly represented in the sample. Overall, the sample clearly confirms that the cloud ecosystem is a global business: each continent is represented, and the identified organizations are spread over 52 countries.

Table 2 The five main headquarters locations

Cluster analysis

In a pre-processing step, the collected organization-role assignments were transformed into binary variables (1 = role held, 0 = role not held). This resulted in a matrix of 31 columns (organization name and 30 roles) and 759 rows (column headings and the 758 identified organizations).
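The pre-processing step can be sketched as follows, assuming the assignments are held as a mapping from organization name to the set of role codes it holds. With 758 organizations and 30 roles, this layout yields the 759 × 31 matrix described above.

```python
def role_matrix(assignments: dict[str, set[str]], roles: list[str]) -> list[list]:
    """Build the binary data matrix: a header row, then one row per organization
    with its name followed by 1 (role held) or 0 (role not held) per role."""
    header = ["organization"] + roles
    rows = [[org] + [1 if role in held else 0 for role in roles]
            for org, held in sorted(assignments.items())]
    return [header] + rows
```

For instance, with the hypothetical organization “Org A”, `role_matrix({"Org A": {"hyb1", "sup4"}}, ["hyb1", "sup4", "sup5"])` returns the header row followed by `["Org A", 1, 1, 0]`.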

The two-step clustering method comprises two steps. In the first step, the algorithm scans the records one by one and decides, based on a distance criterion, whether the current record should be merged with a previously formed pre-cluster or start a new pre-cluster. The log-likelihood distance measure used here is suitable for categorical variables. It assumes that the variables are independent and follow a multinomial distribution (Bacher et al. 2004; Sarstedt and Mooi 2019; Satish and Bharadhwaj 2010). This is seldom the case in practice, but studies (e.g., Garson (2014); IBM (2018)) have shown that the algorithm is very robust against violations of this assumption. Fisher’s exact tests showed that most of the variables are independent at a 5% level of significance. According to binomial tests, the variables follow a binomial distribution (dichotomous variables; sample from a presumably very large population), which is a special case of the multinomial distribution (Agresti 2018). In the second step, the pre-clusters from the first step serve as input and are merged hierarchically into solutions with different numbers of clusters. The optimal number of clusters is determined automatically using the Schwarz Bayesian Criterion (Bacher et al. 2004; Satish and Bharadhwaj 2010). Evaluation studies (e.g., Bacher et al. (2004); Biernacki et al. (2000); Chiu et al. (2001)) have shown this approach to be reliable.
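For binary role variables, the log-likelihood distance and the Schwarz Bayesian Criterion reduce to entropy terms: for a cluster v with N_v records, ξ_v = −N_v Σ_k Ê_vk, where Ê_vk is the entropy of variable k within v; merging clusters i and j costs d(i, j) = ξ_i + ξ_j − ξ_⟨i,j⟩, and BIC(J) = −2 Σ_j ξ_j + m_J log N with m_J = J·K for K binary variables. The sketch below illustrates these quantities with a deliberately simplified second step. It is not the SPSS implementation: the CF-tree pre-clustering of the first step is skipped (every record is its own pre-cluster), and the number of clusters is simply the one with minimal BIC, whereas SPSS refines this choice with additional criteria.

```python
import math
from itertools import combinations

Row = tuple[int, ...]  # one organization's 0/1 role vector


def xi(cluster: list[Row]) -> float:
    """xi_v = -N_v * sum_k E_vk, with E_vk the entropy of binary variable k in cluster v."""
    n = len(cluster)
    entropy = 0.0
    for k in range(len(cluster[0])):
        ones = sum(row[k] for row in cluster)
        for count in (ones, n - ones):
            if count:
                p = count / n
                entropy -= p * math.log(p)
    return -n * entropy


def ll_distance(a: list[Row], b: list[Row]) -> float:
    """Log-likelihood distance d(a, b) = xi_a + xi_b - xi_(a merged with b); always >= 0."""
    return xi(a) + xi(b) - xi(a + b)


def bic(clusters: list[list[Row]], n_vars: int) -> float:
    """BIC(J) = -2 * sum_j xi_j + m_J * log(N), with m_J = J * n_vars for binary variables."""
    n = sum(len(c) for c in clusters)
    return -2 * sum(xi(c) for c in clusters) + len(clusters) * n_vars * math.log(n)


def cluster_binary(rows: list[Row]) -> list[list[Row]]:
    """Agglomerative merging by log-likelihood distance; keep the solution with minimal BIC."""
    clusters = [[row] for row in rows]  # simplification: every record is a pre-cluster
    n_vars = len(rows[0])
    best_score, best = bic(clusters, n_vars), list(clusters)
    while len(clusters) > 1:
        # merge the pair whose union loses the least log-likelihood
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: ll_distance(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        score = bic(clusters, n_vars)
        if score < best_score:
            best_score, best = score, list(clusters)
    return best
```

The exhaustive pairwise search is only feasible for toy data; the pre-clustering step exists precisely to keep the number of objects in the hierarchical step small.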

In the following, the steps used to identify the role clusters are described and explained in detail:

  1. Step 1:

    The complete data set was imported into IBM SPSS Statistics and the two-step clustering algorithm was executed. This, however, did not lead to meaningful results. Following the standard approach in cluster analyses (Backhaus et al. 2016; Wiedenbeck and Züll 2010), the conditions of the cluster analysis, i.e., the considered set of variables, were modified in the subsequent steps, and outliers (specific data records) were eliminated in different ways. Comparable approaches can be found, e.g., in Okazaki (2006) and Rundle-Thiele et al. (2015).

  2. Step 2:

    The roles of the environment category (standard developer (env1), legislator (env2), research institute (env3), market analyst (env4) and open source community (env5)) were excluded from the analysis because it had already become apparent during data collection that the environment roles form an exception: many of them are independent of the other roles, and the corresponding organizations often are public administrations or non-governmental organizations (NGOs). Still, this modification did not lead to satisfactory results.

  3. Step 3:

    Starting again from the initial data set, all roles that predominantly occur in isolation (data centre developer (vend2), hardware developer (vend3), standard developer (env1), legislator (env2) and research institute (env3)) were excluded. Then the corresponding organizations that did not hold any further role were removed as well. The remaining 521 organizations led to five clusters. However, these clusters were imprecise and wide-ranging.

  4. Step 4:

    Among the remaining records, 118 organizations held only one role. Since all other remaining organizations held more than one role, these records were extreme outliers, which according to Wiedenbeck and Züll (2010) can have a negative effect on the results of a cluster analysis; they were therefore omitted as well. As a result, 403 of 758 organizations and 1839 of 2294 organization-role assignments remained, and the five identified clusters became much more precise.
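Steps 3 and 4 amount to a simple filter over the organization-role assignments. The sketch assumes a mapping from organization name to the set of role codes it holds; the codes of the isolated roles are taken from the text, while the function name and any example organizations are hypothetical.

```python
# Roles identified in the text as predominantly occurring in isolation
ISOLATED_ROLES = {"vend2", "vend3", "env1", "env2", "env3"}


def filter_records(assignments: dict[str, set[str]],
                   isolated: set[str] = ISOLATED_ROLES
                   ) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Return the data after step 3 and after step 4."""
    # Step 3: drop the isolated roles, then organizations left without any role
    step3 = {org: roles - isolated for org, roles in assignments.items()
             if roles - isolated}
    # Step 4: drop single-role organizations (extreme outliers)
    step4 = {org: roles for org, roles in step3.items() if len(roles) > 1}
    return step3, step4
```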

Closer inspection of the 355 excluded organizations shows that 321 of them hold only one ecosystem role. In most cases (75%), this single role is one of the roles identified as isolated. The remaining 34 organizations hold two or three roles. These role combinations, however, differ significantly from each other and do not resemble any of the five role clusters. It can therefore be concluded that no relevant information was lost by excluding these organizations from the cluster analysis. As the final solution may depend on the order of the organizations and the roles in the data set, the cluster solution was tested with several different random orders. For all of them, the final solution was identical, which underpins its stability and validity. Even though the k-means algorithm is not appropriate for dichotomous variables (Mann and Kaur 2013), it led to very similar results.
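The order-dependence check can be sketched as repeatedly shuffling the records, re-running a clustering function and comparing the resulting partitions of the original records, independently of label names. `cluster_fn` is a placeholder for the actual clustering run; `identical_rows` is a hypothetical stand-in used only to make the sketch executable.

```python
import random
from collections.abc import Callable, Sequence


def partition(labels: Sequence[int]) -> set[frozenset[int]]:
    """Represent a clustering as a set of index groups, ignoring label names."""
    groups: dict[int, set[int]] = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, set()).add(idx)
    return {frozenset(g) for g in groups.values()}


def is_order_stable(rows: Sequence, cluster_fn: Callable[[list], list[int]],
                    trials: int = 10, seed: int = 0) -> bool:
    """Re-run cluster_fn on shuffled copies of rows and compare partitions."""
    reference = partition(cluster_fn(list(rows)))
    rng = random.Random(seed)
    for _ in range(trials):
        order = list(range(len(rows)))
        rng.shuffle(order)
        shuffled_labels = cluster_fn([rows[i] for i in order])
        # map labels back to the original positions before comparing
        labels = [0] * len(rows)
        for pos, original_idx in enumerate(order):
            labels[original_idx] = shuffled_labels[pos]
        if partition(labels) != reference:
            return False
    return True


def identical_rows(rows: list) -> list[int]:
    """Hypothetical stand-in clustering: one cluster per distinct role vector."""
    ids: dict = {}
    return [ids.setdefault(tuple(r), len(ids)) for r in rows]
```

Comparing partitions rather than raw labels matters because a re-run may number the same clusters differently.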

To assess the overall goodness-of-fit of the clustering solution, the silhouette measure of cohesion and separation (Kaufman and Rousseeuw 2009) was used. This criterion is based on the average distances between the objects and takes a value between −1 and +1, where −1 reflects a very poor and +1 a perfect clustering. In this case, the silhouette measure reached a value of 0.24, indicating fair cluster quality. Note that the allocation of organizations to the role clusters is based on similarity measures. An organization allocated to a specific role cluster therefore need not hold all the roles of the cluster’s role set and can even hold additional roles. With 25 remaining roles (30 roles minus the 5 removed isolated roles), there are 2^25 (more than 33 million) possible role combinations per organization, so it is not surprising that only some of the organizations show exactly the same role combination. This is in line with the prevailing assumption that the organizations within the cloud ecosystem and their business model characteristics are quite heterogeneous (Floerecke 2018; Floerecke and Lehner 2018b). This heterogeneity explains the comparatively low silhouette value.
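For reference, the silhouette value of an object i is s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is its mean distance to the other members of its own cluster and b(i) its smallest mean distance to the members of another cluster; the overall measure is the mean of s(i). The sketch below assumes Hamming distance on the 0/1 role vectors, which need not be the distance underlying the reported value of 0.24, and sets s(i) = 0 for singleton clusters, following Kaufman and Rousseeuw’s convention.

```python
from collections.abc import Hashable, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of roles in which two organizations differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))


def silhouette(rows: Sequence[Sequence[int]], labels: Sequence[Hashable]) -> float:
    """Mean silhouette over all objects; requires at least two clusters."""
    clusters: dict = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for row, label in zip(rows, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a(i): mean distance to the other members (the object itself adds 0)
        a = sum(hamming(row, other) for other in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(sum(hamming(row, other) for other in members) / len(members)
                for lab, members in clusters.items() if lab != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Two groups of identical, fully disjoint role profiles score 1.0; heterogeneous profiles within clusters pull a(i) up and the score down.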

Results

Finalisation of the PaCE model

The evaluated and finalised version of the PaCE model, shown in Fig. 3, comprises 31 roles (not counting meta-roles) and includes the relationships between the roles. To illustrate the relationship between meta- and sub-roles, generalizations from UML class diagrams are used. A generalization represents an “is a” relationship, whereby a specific element inherits the features of the more general element (OMG 2011). In the PaCE model, sub-roles were abstracted to a meta-role, and roles were broken down into sub-roles, when the roles, despite clear differences, shared central elements with respect to their value propositions and the implemented business model. The fundamental categorization of the roles remains unchanged. A so-called infrastructure area has been added, containing the roles of the vendor category that focus on infrastructural products and/or services.

Fig. 3 Final version of the PaCE model

Based on the check of the PaCE model’s structural equivalence and completeness with respect to the actual cloud business, the following modifications were made (see Fig. 6 in the appendix for a graphical summary). Roles have been

  (1) newly added (application reengineering/management provider (hyb4), data centre developer (vend2), managed service provider (hyb8), market analyst (env4) and open source community (env5)),

  (2) removed (data provider; reason: excessive overlap with application provider (hyb3)),

  (3) broken down into sub-roles (integrator into transition services provider (hyb9) and IT landscape/process integrator (hyb10)),

  (4) abstracted to a meta-role (second-party auditor (sup1) and third-party auditor/certification institute (sup2) to auditor; the three subtypes of market place operators, i.e., application (hyb5), platform (hyb6) and infrastructure market place operator (hyb7), to market place operator), and

  (5) renamed (help-desk to support provider (sup4), service bundler to service bundler/multi-cloud provider (hyb12), certification authority to third-party auditor/certification institute (sup2) and auditor to second-party auditor (sup1)).

The newly added roles are briefly explained below in terms of their main value propositions, activities and attributes. The remaining roles should be understandable from the information given in the PaCE model (Fig. 3); their detailed descriptions can be found in Floerecke and Lehner (2016).

  • An application reengineering/management provider (hyb4) transforms or rebuilds a traditional on-premise application to make it cloud-ready, e.g., in the form of microservices. This is required in several scenarios, as on-premise software is often monolithic and not per se cloud-ready (Floerecke 2018; Floerecke and Lehner 2018b).

  • A data centre developer (vend2) plans and constructs data centres (the building, safety precautions, cooling infrastructure, etc.), where the physical hardware of cloud services is located and operated.

  • A managed service provider (hyb8) offers managed services on the basis of IaaS, PaaS and/or SaaS services. Managed services are cloud services extended by additional components, such as monitoring, update, security or backup services, based on clearly defined service level agreements. The scope can range from individual items to a complete IT outsourcing (Floerecke and Lehner 2019a, 2019b).

  • A transition services provider (hyb9) helps customers to move existing applications and infrastructure into the cloud. Only in rare cases can the migration of the systems be managed without external help. During the transition period, extensive consulting and customizing support is necessary to make a firm cloud-ready (Böhm et al. 2010; Floerecke and Lehner 2018a).

  • An IT landscape/process integrator (hyb10) integrates cloud services into the existing customer’s IT landscape and business processes, e.g., by developing interfaces to on-premise applications (Herzfeldt et al. 2018; Walterbusch et al. 2014).

  • A market analyst (env4) offers market research reports, forecasts on the development of IT, evaluations of service offerings and studies of customer needs.

  • An open source community (env5) develops and shares applications, platforms, experiences and best practices. Organizations actively involved in an open source community can benefit from the accumulated know-how (Floerecke and Lehner 2018b; Ismaeel et al. 2015).

Use of the PaCE model for the investigation of the cloud ecosystem

Role clusters

The cluster analysis unveiled five role clusters (Fig. 4). Of the 403 organizations, 63 could not be clearly assigned to one of the clusters. In general, an organization associated with a cluster does not have to hold all indicated roles; rather, it has the shortest distance to the respective cluster centre.

Fig. 4 Identified role clusters

Cluster 1 – Service providers supporting new cloud adopters:

The first cluster includes the roles consultant (sup5), managed service provider (hyb8), application reengineering/management provider (hyb4) and transition services provider (hyb9). The composition of this cluster is heterogeneous: it comprises organizations of all sizes from 14 different countries. Typical representatives are Booz Allen Hamilton Inc., Clearscale LLC, Cognizant Technology Solutions Corporation and Deloitte Touche Tohmatsu Limited. Only 20 of the 92 organizations offer core cloud services (IaaS, PaaS, SaaS). Instead, they focus on advising customers on the selection of cloud services with respect to their requirements, without favouring any specific provider in advance. Moreover, they transform traditional on-premise software into a cloud-based version (application reengineering/management provider (hyb4)) and assist in the cloud migration process (transition services provider (hyb9)). In addition, they offer managed services, so that customers do not have to concern themselves with cloud service operations and can concentrate on their core competencies (managed service provider (hyb8)). In short, the organizations of this cluster target firms that want to use cloud services for the first time.

Cluster 2 – The big players of cloud business:

The second cluster comprises by far the largest number of roles. It aggregates infrastructure provider (hyb1), platform provider (hyb2), application provider (hyb3), transition services provider (hyb9), physical infrastructure provider (vend4), independent application software provider (vend5), almost all support roles (third-party auditor/certification institute (sup2), training provider (sup3), support provider (sup4) and consultant (sup5)) and open source community (env5). Large and prominent representatives of the IT industry, such as Alibaba Group Holding Limited, Amazon Web Services Inc., Google LLC, IBM Corporation, Microsoft Corporation and SAP SE, belong to this cluster. The majority are from the USA (68.97%). The organizations differ strongly in their history and had already been very successful in other business areas (e.g., online shop, search engine, on-premise software and hardware) before engaging in cloud computing. They possess substantial financial and human resources and offer a broad product and service portfolio, so that the end customer (c1) receives a full service from one source.

Cluster 3 – Telecommunications companies with complementary cloud offerings:

The third cluster comprises the roles infrastructure provider (hyb1), physical infrastructure provider (vend4) and network operator (vend1). More than 70% of this cluster’s organizations are large companies, and their headquarters are spread over more than 30 countries. These organizations provide basic computing, storage and network resources for other cloud service providers, such as platform (hyb2) or application providers (hyb3), or for direct use by end customers (c1). To offer IaaS, these companies use server capacities from their own data centres (physical infrastructure provider (vend4)). In addition, organizations in this cluster offer internet access (network operator (vend1)), often specifically for corporate customers. Well-known representatives of this cluster are Vodafone Group Plc, Verizon Communications Inc., 1&1 Ionos GmbH, China Telecom Ltd., Telecom Italia S.p.A. and Telefónica S.A. Several of them are formerly state-owned companies. To summarize, this cluster contains telecommunications companies that have expanded their portfolio with basic cloud services.

Cluster 4 – SaaS developers and providers with support services:

The fourth cluster includes the roles independent application software vendor (vend5), application provider (hyb3) and support provider (sup4). As in the second cluster, mainly large companies from the United States were assigned to this cluster. Examples are Bill.com Inc., Dropbox Inc., Slack Technologies Inc., Zoom Video Communications Inc. and Zuora Inc. Many of them are comparatively young companies that offer only one application and are therefore highly specialized. These companies usually offer their applications cloud-only, which means that they generate all their revenue through cloud services. Zoom Video Communications Inc. is a good example of this cluster: it offers video conferencing software developed by the company itself (independent application software vendor (vend5)), which can be obtained via the cloud (application provider (hyb3)). In case of problems, its support centre is available online or by phone (support provider (sup4)).

Cluster 5 – Certification and training providers:

The fifth cluster has the lowest number of roles. The organizations assigned to this cluster assume the roles of third-party auditor/certification institute (sup2) and training provider (sup3). This means that they analyse existing cloud services and award certificates if certain criteria are met (third-party auditor/certification institute (sup2)). Developing the quality criteria for evaluating cloud services requires extensive knowledge. It therefore stands to reason that the accumulated know-how is used not only for cloud service evaluation but also for training courses (training provider (sup3)) for providers of all kinds as well as for end customers (c1). The Cloud Credential Council, EuroCloud Deutschland e.V., Cloud Security Alliance and TÜV Rheinland AG are examples of organizations belonging to this cluster. The sizes of this cluster’s organizations are quite heterogeneous, and most of them are located in the United States.

The five clusters were confirmed in a separate cluster analysis in which the sub-roles in the data set were replaced by their meta-roles (e.g., transition services provider (hyb9) and IT landscape/process integrator (hyb10) by integrator). The only difference was that the cluster of the big players of cloud business (cluster 2) was expanded by the market place operators (application (hyb5), platform (hyb6) and infrastructure market place operator (hyb7)).
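The substitution of sub-roles by meta-roles can be sketched as a logical OR over the corresponding binary columns: an organization holds a meta-role if it holds at least one of its sub-roles. The mapping below is assembled from the meta-roles named in the text (integrator, aggregator, market place operator, auditor); the function and variable names are illustrative.

```python
# Sub-role -> meta-role, as named in the text; roles not listed keep their own column
META_ROLES = {
    "hyb9": "integrator", "hyb10": "integrator",
    "hyb11": "aggregator", "hyb12": "aggregator",
    "hyb5": "market place operator", "hyb6": "market place operator",
    "hyb7": "market place operator",
    "sup1": "auditor", "sup2": "auditor",
}


def aggregate_roles(row: dict[str, int], meta: dict[str, str]) -> dict[str, int]:
    """Merge sub-role columns into meta-role columns (logical OR of the 0/1 values)."""
    out: dict[str, int] = {}
    for role, value in row.items():
        target = meta.get(role, role)
        out[target] = max(out.get(target, 0), value)
    return out
```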

Overall, it is remarkable that the three core roles of the cloud ecosystem, infrastructure (hyb1), platform (hyb2) and application provider (hyb3), do not occur in isolation. Additional services (i.e., instantiations of further roles) are apparently necessary to succeed in the cloud ecosystem. Presumably, there is a reason why organizations cover these five role clusters in particular. Against this background, it can be assumed that the five role clusters are success-relevant role combinations and that organizations in the clusters perform better than organizations with a different profile.

Isolated and non-isolated roles

In the course of the cluster analysis, it was examined which roles are predominantly held as a single role and which mostly occur in combination with others. To this end, for each role, the number of organizations that hold only this role was divided by the total number of organizations identified for that role (Table 3).
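The ratio reported in Table 3 can be computed per role as sketched below, assuming a mapping from organization name to the set of role codes it holds (the function name and any example data are hypothetical).

```python
from collections import Counter


def single_role_share(assignments: dict[str, set[str]]) -> dict[str, float]:
    """For each role: organizations holding only this role / all organizations holding it."""
    total = Counter(role for roles in assignments.values() for role in roles)
    alone = Counter(next(iter(roles)) for roles in assignments.values() if len(roles) == 1)
    return {role: alone[role] / n for role, n in total.items()}
```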

Table 3 Roles and proportion of organizations that solely fulfil this specific role

In total, a considerable proportion of organizations hold only one role (321 of 758). Nearly half or more of the organizations assigned to the vendor roles data centre developer (vend2) and hardware developer (vend3) and to the environment roles standard developer (env1), legislator (env2) and research institute (env3) hold exclusively this role. 303 of the 758 organizations hold at least one of these five roles. More precisely, 237 of these 303 organizations exclusively hold one of these five roles, and 5 organizations exclusively hold two roles. Only 28 of the organizations assigned to the five role clusters also hold one of the isolated roles. On the other hand, many roles are held exclusively or very often in combination with others and are thus never or rarely found in isolation (particularly the last two rows of Table 3).

Table 4 provides further details on the five isolated roles. Restricting the analysis to the roles of the core ecosystem (i.e., excluding the environment roles and the roles at the edge of the inner ecosystem, data centre developer (vend2) and hardware developer (vend3)) shows that the majority of organizations (here: 384 of 485; 79%) hold multiple roles.

Table 4 Summary of the results regarding the five isolated roles

The results of the analysis with regard to isolated and non-isolated roles can be summarized as follows: The roles standard developer (env1), hardware developer (vend3), research institute (env3), data centre developer (vend2) and legislator (env2) are commonly occupied in isolation from the other roles. The rest of the roles form the core of the cloud ecosystem and these roles typically are found in combinations. Figure 5 summarizes the results of the whole data analysis.

Fig. 5 Summary of the results of the data analysis

Robustness test

To check the robustness of the results, the variables size, headquarters location and cloud turnover were included in the cluster analysis.

Clustering the data by organization size revealed that large companies take up clusters 1–4, whereas small and medium-sized organizations are often assigned to cluster 5. Moreover, small and medium-sized organizations hold roles similar to those of clusters 1 (without managed service provider (hyb8)) and 4 (without support provider (sup4)). Beyond that, small and medium-sized organizations form a new role cluster consisting of infrastructure provider (hyb1), physical infrastructure provider (vend4) and support provider (sup4). These are small and medium-sized infrastructure providers from a variety of countries, many of them primarily serving local markets (city, region or country) (Floerecke and Lehner 2018a, 2019a). Examples are Itenos GmbH, NxtGen Data Center & Cloud Services Ltd., Calligo Ltd. and Green Cloud Technologies LLC. Clusters 2 and 3 do not exist for small and medium-sized organizations.

A cluster analysis based on the share of cloud turnover in total turnover showed that “cloud-only” organizations often take up roles similar to those of clusters 1 and 4, whereas more diversified (“non-cloud-only”) firms are especially represented in clusters similar to clusters 1, 2 and 3. Clusters 2, 3 and 5 could not be found among “cloud-only” organizations; clusters 4 and 5 could not be found among “non-cloud-only” organizations. However, only a minority of the organizations could be included in this cluster analysis because of missing values for cloud turnover.

Since the majority of organizations are based in the USA, another cluster analysis was performed, dividing the data set into US and non-US organizations. The latter predominantly hold the roles of clusters 1 and 3. American companies take up the roles of clusters 1 and 2 as well as roles similar, but not identical, to those of cluster 3 (infrastructure provider (hyb1), managed service provider (hyb8), physical infrastructure provider (vend4), support provider (sup4)) and cluster 4 (application provider (hyb3), independent application software provider (vend5), training provider (sup3) and support provider (sup4)). Clusters 2, 4 and 5 could not be identified among non-US organizations; cluster 5 could not be found among US organizations. The reason why cluster 5 could be found neither for US nor for non-US organizations is probably the small number of its representatives in the entire data set, which after the split became too low to be considered significant by the cluster algorithm.

In summary, organizational size, geographical location and cloud turnover do not substantially change the clustering results; however, restricting the data to small and medium-sized organizations yields an additional cluster that bundles roles specific to them.

Comparison with the results of Pelzl et al. (2013)

The five role clusters differ considerably from those of Pelzl et al. (2013), which has so far been the only study on role clusters in the cloud domain. At first glance, they identified ten clusters, a significantly higher number. A closer look at their results shows that, contrary to this study, all of their clusters comprise at least one of the ecosystem’s core roles (infrastructure (hyb1), platform (hyb2) and application provider (hyb3)). The only similarities are that largely equivalent clusters exist for clusters 3 and 4 and that aggregators play no important role in cluster formation, as their number is comparatively low.

The discrepancy in the results is attributable to the different research designs: Pelzl et al. (2013) applied a rudimentary value network model including only twelve roles, restricted their sample to German cloud providers and used a comparatively small sample of 80 cloud providers (82 cloud services). Apart from that, it can be assumed that the structure and composition of the cloud ecosystem have changed substantially since 2013. Furthermore, Pelzl et al. (2013) assigned all organizations to clusters exactly, without any overlaps, so that their clustering is based on equality rather than similarity. This explains their greater number of clusters. Applying their clustering approach to the data set of this study would have led to an excessively high number of clusters, which would not be meaningful.
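The difference between equality- and similarity-based grouping can be illustrated by grouping organizations by their exact role combination. This is a sketch of the equality principle only, not a reconstruction of Pelzl et al.’s procedure, and the example organizations are hypothetical. Two organizations that differ in a single role end up in different groups, and with 25 role variables there are 2^25 possible combinations, so the number of groups can grow quickly.

```python
from collections import Counter


def equality_groups(assignments: dict[str, set[str]]) -> Counter:
    """Group organizations by their exact role combination (equality-based grouping)."""
    return Counter(frozenset(roles) for roles in assignments.values())
```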

Discussion

The empirical evaluation of the PaCE model, which was published in 2016, revealed that it already performed well with respect to structural equivalence and completeness. Nevertheless, some modifications were necessary: roles have been newly added, removed, broken down into sub-roles or abstracted to a meta-role, and renamed. Now every organization of the large sample belonging to the cloud ecosystem can be adequately mapped using the PaCE model. At the same time, each role of the PaCE model is legitimate because each has been identified and evidenced in practice. Some of these modifications address recent developments within the cloud ecosystem, such as the increasing popularity of multi-cloud management (hyb12) and the growing awareness, on both the provider’s and the customer’s side, that traditional on-premise applications cannot meet cloud requirements (e.g., cost, performance and scalability advantages) (hyb4). Other added roles, e.g., data centre developer (vend2) and managed service provider (hyb8), were not identified in the initial model version because of the evaluation approach applied at that time. Instead, they were added based on the results of the first data collection stage of this study, which underlines the effectiveness of the approach applied here.

However, for some roles only a small number of representatives could be found. At this point, the reasons can only be speculated upon. Market place operators (application (hyb5), platform (hyb6) and infrastructure market place operator (hyb7)) act in two-sided markets that bring customers and providers of cloud services together (Roson 2005). Customers' unwillingness to use multiple two-sided markets is one reason for the market concentration among search engines and auction platforms and may likewise explain the low number of market place operators. Among these, application market place operators (hyb5) were found most frequently, possibly because of the large number of SaaS services available in the cloud ecosystem. Private cloud software (vend6) and virtualization software (hyb7) are highly complex to develop (Habib 2008). This complexity and the resulting costs for development and maintenance (Ogheneovo 2014) may constitute barriers to market entry. The rare occurrence of the role second-party auditor (sup1) might be explained by the fact that the cloud auditor market is still young and emerging (Lins et al. 2016).

Overall, it was shown that a considerable proportion of organizations hold only one ecosystem role. The isolated occurrence of the environment roles in particular is not surprising, as the corresponding organizations are often public administrations or NGOs without commercial interests. The organizations holding the roles data centre developer (vend2) and hardware developer (vend3) are not limited to the cloud ecosystem but are simultaneously part of other ecosystems. This may explain why they often do not hold any further roles in the cloud ecosystem.

Furthermore, this study revealed that some roles are (nearly) exclusively occupied in combination with others. Aggregators (service integrator (hyb11) and service bundler/multi-cloud provider (hyb12)) integrate or bundle multiple cloud services, often across all three cloud layers (IaaS, PaaS, SaaS). It therefore stands to reason that they offer these services not only in combination but also separately. For organizations primarily taking up the core cloud ecosystem roles (infrastructure (hyb1), platform (hyb2) and application provider (hyb3)), it is plausible that they also help their customers migrate to the cloud (transition services provider (hyb9)) and manage the purchased cloud services (managed service provider (hyb8)). Support providers (sup4) might often be found in combination with other roles because cloud services are highly provider-specific and thus do not allow vendor-independent support.

To summarize, the most surprising finding was that only a few of the roles in the cloud ecosystem are mostly occupied in isolation and, therefore, that the actual cloud business takes place in role clusters. This was previously known neither in academic nor in market research. Moreover, it could be shown that it is not only the big players in cloud computing that often combine certain roles. Instead, numerous other organizations take on similar combinations of roles, leading to the emergence of four additional, disjoint clusters. This contrasts with the public perception, which often focuses on the big players. Furthermore, it was somewhat astonishing that, already while checking the organization-role assignments gathered in the first round of data collection, some of the companies had disappeared because they had become insolvent or had been acquired by another company. This indicates considerable dynamics and consolidation within the cloud ecosystem.

Conclusion

Contributions

The role-based description of the cloud ecosystem provides a categorization schema for its organizations and thus an instantiation of an analytic theory (Gregor 2006), which enables, guides and supports future research in the field of cloud computing. Compared with common market overviews by market research institutes, which take a rather technical view and mainly distinguish between the IaaS, PaaS and SaaS segments, it offers a broader and more precise perspective. Success in the cloud business is characterized by a multitude of activities that go beyond the traditional market segmentation. Hence, essential information needed to analyse and explain both the market development as a whole and the success of individual organizations has so far been lacking.

The information on prevailing role clusters as well as isolated and non-isolated roles within the cloud ecosystem offers insights into which roles, and thus business models, lead to synergy effects, are disjunctive, are mutually dependent or are even mutually exclusive. These insights enable future in-depth investigations. In this regard, the PaCE model supports the examination of the cloud ecosystem at different levels of abstraction, including the company or regional level. The company level is currently of particular importance, as many companies are announcing plans to build their own platform ecosystems.

As the PaCE model is neither the first nor the only cloud ecosystem model, the following points highlight how it differs from previous models and which of their limitations it overcomes:

  1. The final PaCE model is the only cloud ecosystem model that has been systematically evaluated with regard to its structural equivalence and completeness. Together with the identified role clusters as well as the isolated and non-isolated roles, the PaCE model remedies the lack of transparency regarding the structure and composition of the cloud ecosystem, both for theory and for practice.

  2. The usefulness of the PaCE model has been demonstrated by its use as a research framework that enabled and supported the identification of role clusters and the detailed, goal-oriented analysis of cloud business models (Floerecke 2018; Floerecke and Lehner 2018b).

  3. The PaCE model is the only cloud ecosystem model that includes the important environment of the ecosystem, which stands in a reciprocal relationship with the ecosystem's inner part.

  4. Whereas the existing cloud ecosystem models are restricted to the public cloud, the PaCE model covers all common cloud deployment models (public, private, hybrid and community). In this respect, its role- and cluster-based structure is superior to the segmentation of cloud computing by service and deployment models.

With regard to end customers, the authors hope that the PaCE model contributes to a better understanding of the structure and composition of the cloud ecosystem, so that more companies, especially those that still have concerns, are encouraged to make greater use of cloud services in future. Specifically, end customers are able to compare and evaluate different usage scenarios from a strategic viewpoint in accordance with their specific requirements, e.g., direct delivery from infrastructure (hyb1), platform (hyb2) and application (hyb3) providers or indirect delivery from brokers (e.g., aggregators and market place operators). As a comprehensive overview of cloud providers is missing, the collected list of cloud providers and, in particular, the assignment of organizations to ecosystem roles can serve as valuable support for vendor selection. To make the data set directly usable for end customers, a tool with an integrated search function needs to be developed. It should allow users to search for organizations, e.g., by role, country or deployment model.
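A minimal sketch of such a search function is given below. The record layout (`Organization` with name, country, roles and deployment models), the function `search` and the three example organizations are all hypothetical and invented for illustration; the actual schema of the study's data set is not described here. Every criterion left as `None` is simply not applied, so criteria can be combined freely.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Organization:
    # Hypothetical record layout; not the schema of the study's data set.
    name: str
    country: str
    roles: frozenset = field(default_factory=frozenset)
    deployment_models: frozenset = field(default_factory=frozenset)


def search(orgs, roles=None, country=None, deployment_model=None):
    """Return names of organizations matching all given criteria (None = no filter)."""
    hits = []
    for org in orgs:
        # The organization must hold every requested role.
        if roles is not None and not set(roles) <= org.roles:
            continue
        if country is not None and org.country != country:
            continue
        if deployment_model is not None and deployment_model not in org.deployment_models:
            continue
        hits.append(org.name)
    return hits


# Invented example records using PaCE role codes.
DATA = [
    Organization("ExampleCloud", "DE", frozenset({"hyb1", "hyb8"}), frozenset({"public", "private"})),
    Organization("SampleSaaS", "US", frozenset({"hyb3"}), frozenset({"public"})),
    Organization("DemoBroker", "DE", frozenset({"hyb5", "hyb12"}), frozenset({"public", "hybrid"})),
]

if __name__ == "__main__":
    print(search(DATA, country="DE"))
    print(search(DATA, roles={"hyb1"}, deployment_model="private"))
```

A real tool would add an index or a database query layer, but the filtering logic by role, country and deployment model would remain essentially this conjunction of conditions.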

Established cloud providers can map themselves to a role or a role cluster and derive or examine further service options. In this way, they gain support in deciding whether to offer a specific service themselves or to leave it to another organization.

New market entrants benefit from a better understanding of the cloud ecosystem and of the potential roles and role clusters they could take up from the outset. This supports them in designing their initial business models and identifying niches within the ecosystem.

For researchers, the PaCE model serves as a research framework. Researchers are able to identify, formulate and locate research questions and topics and can use the model to compare the results. They can focus on specific parts of the cloud ecosystem – specific roles, their interrelations or role clusters – and investigate them in detail, e.g., from a business model, a risk or a network economic perspective.

Limitations

The authors are aware that this study has several limitations. Firstly, the data constitute a sample drawn from the global cloud ecosystem. The key problem is that the population is largely unknown and presumably very large. It can be assumed that German organizations are overrepresented in the sample due to the use of the German version of the Google search engine. The same probably applies to organization size, as large and popular cloud providers are easier to find. Nevertheless, with 2294 organization-role assignments identified for 758 cloud providers, it can be argued that the sample is at least partially representative, particularly owing to its size and the systematic search strategy underlying it.

A second limitation is that the organization-role assignments may not be error-free, owing to the relatively abstract role specifications on the one hand and the partly ambiguous, incomplete or even incorrect descriptions of the organizations and their service and product portfolios on their websites on the other. However, before the second round of data collection began, the organization-role assignments were double-checked and erroneous assignments were corrected, which at least partly reduces this influence.

Thirdly, the cluster analysis considered only the case in which an organization holds one or more roles. The presumably rare case in which several organizations fulfil a specific role only jointly was neglected.

A fourth limitation is the comparatively low silhouette measure of the role cluster solution. However, as a huge number of role combinations is possible for each of the 758 organizations, it is not surprising that only some of the organizations show an identical combination of roles. This is in line with the prevailing assumption that the organizations, and thus the business models, within the cloud ecosystem are quite heterogeneous.
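The link between heterogeneous role combinations and a low silhouette value can be sketched as follows. The sketch assumes the standard silhouette coefficient, s = (b - a) / max(a, b), with a as the mean distance of an organization to the other members of its own cluster and b as the mean distance to the nearest other cluster, and it assumes a Jaccard distance between role sets; the distance measure and clustering procedure actually used in the study may differ. The two toy solutions (`TIGHT`, `MIXED`) are invented.

```python
from statistics import mean


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|: 0.0 for identical role sets, 1.0 for disjoint ones.

    Assumes non-empty role sets (every organization holds at least one role).
    """
    return 1 - len(a & b) / len(a | b)


def silhouette(clusters):
    """Mean silhouette coefficient over all organizations.

    `clusters` is a list of at least two clusters, each a list of role sets.
    Organizations in singleton clusters get s = 0 by convention.
    """
    scores = []
    for ci, cluster in enumerate(clusters):
        for i, x in enumerate(cluster):
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            # a: cohesion, mean distance to the other members of the own cluster
            a = mean(jaccard_distance(x, y) for j, y in enumerate(cluster) if j != i)
            # b: separation, mean distance to the nearest other cluster
            b = min(
                mean(jaccard_distance(x, y) for y in other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return mean(scores)


# Invented toy solutions: identical role sets per cluster vs. merely similar ones.
TIGHT = [[{"hyb1", "hyb2"}, {"hyb1", "hyb2"}], [{"hyb3"}, {"hyb3"}]]
MIXED = [[{"hyb1", "hyb2"}, {"hyb1", "hyb8"}], [{"hyb3"}, {"hyb1", "hyb3"}]]

if __name__ == "__main__":
    print(silhouette(TIGHT))            # 1.0
    print(round(silhouette(MIXED), 4))  # 0.2875
```

Clusters of identical role sets reach the maximum value of 1.0, while clusters whose members share only some roles drop to about 0.29 even though the grouping is still plausible.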

Fifthly, a certain degree of subjectivity can be assumed regarding the actual design of the PaCE model. This is a consequence of the decisive role of the modeler within the modelling process (Festl and Sinz 2012; Lehner 1995). The set of organizations within the cloud ecosystem could have been partitioned into roles in a slightly different way. Moreover, the roles vary in their level of abstraction and broadness.

Outlook for future research

This study paves the way for future research. In particular, the PaCE model, in combination with the identified role clusters and the isolated and non-isolated roles, enables in-depth studies on the success of organizations within the cloud ecosystem. However, not only the nodes of the network but also its edges should receive special attention in future. As the PaCE model has so far considered only the ideal-typical relationships between the roles, investigating the actual relationships could be an important contribution.

Overall, the PaCE model serves as a research framework, meaning that it enables researchers to identify, formulate and locate research questions and topics as well as to compare research results. Researchers can pick out specific parts of the cloud ecosystem and investigate them in detail by collecting appropriate qualitative (e.g., expert interviews and Delphi studies) or quantitative (e.g., experiments and web-based surveys) data. The data set created in this study (available upon request from the authors) can serve as a useful basis that can be extended according to the requirements of a specific research question. Several particularly important and interesting research questions are listed below:

  • Which individual ecosystem roles are generally more profitable than others?

  • How do the identified role clusters vary according to profitability?

  • Is it advantageous to be a member of one of the five role clusters rather than not being part of any of them?

  • Is fulfilling a bundle of roles generally more successful than a single role?

  • Why are some roles usually occupied not in isolation but only in combination with other roles?

  • How can a role cluster, particularly the corresponding business model portfolio, be managed successfully?

  • What business model characteristics of the various ecosystem roles contribute to the role’s economic success?

  • Which value streams generate the most monetary value?

  • Which risks in general, and network risks in particular, are associated with the occupation of specific ecosystem roles? How can these risks be managed in view of the shared performance responsibility towards the end customers?

  • Where in the cloud ecosystem do linear value creation relationships predominate, and where do real network relationships predominate?

  • Which forms of network effects can be observed, and where in the cloud ecosystem?

To conclude, this study provided empirical access to the complex and opaque cloud ecosystem. It thus responded to the call for more empirical research in the field of business ecosystems (e.g., Anggraeni et al. 2007; Floerecke and Lehner 2015; Järvi and Kortelainen 2017). To this end, this study took a snapshot of the cloud ecosystem, which is common in research on business ecosystems (Nischak et al. 2017). However, because of the dynamic character of the cloud ecosystem, it can be assumed that its structure and composition will continue to change. Therefore, it is indispensable to verify and, if deemed necessary, adapt the PaCE model at regular intervals in order to ensure its continuing validity. To this end, the underlying data set must be reviewed and updated periodically. In case of substantial changes in the data set, a new cluster analysis will be required in order to check whether the identified role clusters remain valid.

The authors plan to continue the data collection and to periodically reassess the data set in the course of a longitudinal study aiming to evaluate how the organizations, the roles, the role clusters and thus the whole cloud ecosystem evolve over time. An interesting issue is how many organizations will survive and how many will disappear due to mergers, acquisitions or insolvencies. The authors intend to enrich the data with business figures in order to investigate the economic development. As a whole, the PaCE model is an important step towards a better understanding of the cloud ecosystem.