1 Introduction

A digital service ecosystem (DSE) has been defined as an open, loosely coupled, domain-clustered, demand-driven, self-organizing agents’ environment where each entity is proactive and responsive for its own benefit [1, 2]. A DSE can be seen as a new kind of self-organized environment that addresses openness and dynamicity, enabling collaborative innovation and co-creation among the members of the ecosystem [3]. In this context, a digital service can be any added value/benefit that is delivered digitally [3]. It is automated entirely and ideally controlled by the customer of the service.

DSEs are complex and dynamic for several reasons, such as the increasing number of components, devices and services; changes in the technology used; and applications becoming more difficult to manage. As a result, DSEs are evolving rapidly without much control. In this context, service engineering of DSEs faces new challenges, such as change and evolution of requirements; gathering and assessment of quality requirements; and uncertainty caused by the dynamic nature of DSEs and by unknown deployment environments, compositions and users [3]. Another important challenge in digital ecosystems is co-evolution among ecosystem members and in customer participation. The complexity and dynamics of the environments in which these digital services are deployed therefore call for solutions that make such services autonomic [4, 5], i.e., capable of dynamically self-adapting their behavior in response to changing situations. The autonomic computing (AC) initiative can provide strong elements for overcoming the main challenges and obstacles to the exploitation of DSEs.

The AC initiative’s influence has been present in many computing domains, e.g., grid computing [6, 7], artificial intelligence [8], robotics [9], control systems [10], service-oriented architecture (SOA) [11, 12], cloud computing [13] and complex adaptive systems [14]. In recent years, several methods and techniques have been proposed to exploit the benefits of the AC initiative in service-oriented ecosystems, for example, SAPERE [15,16,17,18,19,20], CASCADAS [21,22,23] and BIONETS [24,25,26,27]. However, very little work has applied the AC initiative in the DSE domain [28,29,30]. Looking at the state of the art, none of the methods seems to address the service engineering of DSEs in a generic and adaptive way; in particular, an ecosystem-based method for applying the AC initiative is missing in the DSE domain. Furthermore, there is no systematic review of the scientific literature on AC in the DSE domain. There are several literature reviews on the general research area of AC [31,32,33,34] and a few narrow literature reviews focusing on its application in domains such as grid computing [35] and self-adaptive systems [36, 37]. None of these reviews covers the DSE domain. A survey article that addresses the following is missing in the literature: (1) the main requirements of a service engineering methodology for autonomic DSEs; (2) the shortcomings or gaps in existing AC methods in DSEs; and (3) the research activities required to overcome the shortcomings. This article aims to fill this gap.

This survey article presents a review and comparison of the AC methods in DSEs from the viewpoint of service engineering, i.e., requirements engineering and architecting of services. The review is based on systematic queries in four leading scientific databases and Google Scholar, and it is organized in four thematic research areas. After the literature searches and analysis, 12 primary methods were selected as most relevant to our study, and the authors reviewed them to identify the most relevant aspects of the research. To this end, 13 research questions were formulated and incorporated in a comparison framework. This framework can be used as a guide for comparing the different scientific methods selected from the research areas.

This article unfolds as follows. In Sect. 2, we provide background information and definitions of the terminology that are frequently used in the context of the methods reviewed in this survey. Section 3 outlines the research method used in the literature review. Section 4 introduces our comparison framework for comparing the different primary methods selected from the research areas. In Sect. 5, we present an overview of each primary method and a comparison of these methods using the framework defined. Section 6 discusses the results of our survey, and Sect. 7 concludes the survey.

2 Background and definitions of the main terminology

In this section, we provide background information and definitions of the terminology that are often used in the context of the methods analyzed. To this end, we define terms for AC, DSEs, digital services and quality attributes for the purposes of this article and place them in context.

2.1 Autonomic computing initiative

The terms autonomic, autonomy, autonomous and autonomicity have been used in various domains such as language, biology and philosophy. In general, the term autonomic implies occurring involuntarily, unconsciously or automatically, or resulting spontaneously from internal causes, as in autonomic reflexes. Meanwhile, the term autonomous, which dates from the early nineteenth century, originates from Greek, in which it means having its own laws. According to the Oxford English Dictionary [38], autonomous signifies one’s capability of self-governance or having the freedom to act independently, also implying self-containment and self-direction. Autonomicity signifies the state of being autonomic. Meanwhile, the term autonomic computing is named after the human body’s autonomic nervous system (ANS). The ANS enables the human body to perceive, adapt to and interact with the world in order to manage dynamically changing and unpredictable circumstances.

The evolution of AC from its inception can be described as follows. Since the early 1990s, both industry and academia have undertaken several initiatives to develop self-managing and autonomous systems, thus contributing to the AC initiative. In this regard, the Small Unit Operation Situational Awareness System is a notable early self-managing project initiated in 1997 by the Defense Advanced Research Projects Agency (DARPA) [39]. This project developed technological aids that give the army operational superiority, for example, by providing soldiers with richer information about the battle space or environment through improved communication and electronic sensing capabilities. Later, DARPA initiated another project on self-management, called Dynamic Assembly for Systems Adaptability, Dependability, and Assurance. Its objective was to enable mission-critical systems to meet high-assurance, dependability and adaptation requirements. In the late 1990s, NASA made use of the AC initiative in its space projects, such as the Mars Pathfinder and Deep Space 1. NASA’s main aim was to make deep space probes more autonomous so that the probes can adapt speedily to extraordinary situations and spacecraft are able to carry out autonomous operations for longer periods of time with no human intervention [35].

On March 8, 2001, Dr. Horn, research director at IBM, presented the importance of AC and its direction during a keynote speech at Harvard University [40, 41]. Soon afterward, the IBM server group introduced the eLiza project, which was later known as the AC project, thus beginning the AC journey at IBM. The AC initiative is a vision introduced by IBM for creating self-managed systems [4]. It seeks to render a computing system self-managed, that is, to enable computer systems to manage themselves so as to minimize the need for human intervention [42]. The main goal of AC is to address the increasing complexity of modern computing systems by removing the demand for skilled administrative interventions and automating system management [43]. AC benefits the IT domain in the short term by reducing the dependence on human involvement and the system total cost of ownership. More specifically, the short-term benefits are: improved user experience because of better system quality of service (QoS); reduced requirements for human intervention; better user access to services due to more natural human–machine interaction facilities; lower maintenance costs due to reduced requirements for human intervention; and lower usage costs due to better resource management [43].

In 2003, IBM introduced an architectural blueprint for building AC systems, in which five building blocks for an autonomic system are presented [42]. The blueprint also identified four self-* characteristics that are considered fundamental for any autonomic system and are, as a consequence, the most cited in the AC domain: self-configuration, self-healing, self-optimization and self-protection [42]. These features are referred to in short as self-chop [42]. Since the AC domain’s inception, the list of self-* features has been growing continuously. However, many of the newer features can be incorporated in the original self-chop list. Examples of the other self-* features are: self-anticipating, self-adapting, self-adjusting, self-aware, self-critical, self-defining, self-destructing, self-diagnosis, self-governing, self-installing, self-managing, self-monitoring, self-organized, self-recovery, self-reflecting, self-simulation, self-stabilizing. Other than these self-* properties, context-awareness represents an additional key capability of an autonomic system. It means that an autonomic system must be able to detect and adapt to changes in its execution environment, which can concern user behavior, available resources or interactions with neighboring systems [43].

IBM has proposed five incremental levels of maturity in autonomy in [40, 42], through which self-management and autonomicity are progressively integrated into the continuously evolving software system. They are basic, managed, predictive, adaptive and autonomic [43]. In another, complementary classification scheme presented in [32], the autonomy of systems is divided into four classes: support, core, autonomous and autonomic. Today the AC initiative’s influence is present in many computing domains, such as grid computing, artificial intelligence and multi-agent systems, robotics, control systems, SOA, cloud computing and complex adaptive systems. However, very little scientific literature exists on the application of the AC initiative in the DSE domain.

2.2 Digital service ecosystems

A service ecosystem is a socio-technical complex system where service providers can reach shared goals and utilize the services of other members in the ecosystem to gain added value [44, 45]. A DSE is part of a service ecosystem, but it only covers the digital part, leaving out the social part. An example of a DSE is an interactive multi-screen TV services ecosystem in the Innovative Cloud Architecture for Real Entertainment (ICARE) project [3, 46]. This DSE includes 25 service ecosystem members from Europe providing and using digital cloud-based services on operating end-to-end interactive multi-screen TV services. There are two dimensions in a DSE: species and underlying infrastructure and services support [1]. According to [1], several factors characterize a DSE, for example a strong information infrastructure, a domain-oriented cluster and rich resources offering cost-effective digital services.

A DSE contains several elements: ecosystem members, ecosystem infrastructure, capabilities, and digital services [3]. The main members of a DSE are service providers, service brokers, service consumers and infrastructure providers. The ecosystem capabilities describe the capability model that defines the properties of the ecosystem. It also describes how the properties have been implemented using the ecosystem services provided by the ecosystem infrastructure. The ecosystem capabilities are implemented by the infrastructure, which supports the utilization of core competencies and core assets, flexible business networking and efficient business decision-making. Independent ecosystem members provide digital services in a DSE where the members provide additional value for both service consumers and other service providers [3].

In this context, a digital service can be any added value that is delivered digitally [1,2,3]. It is automated entirely and ideally controlled by the customer of the service [3]. Users can use digital services to enrich their everyday life, for example, by exploiting services that help a person to monitor and manage his or her health and well-being. An example of a digital service can be found in [47], which describes a situation-aware safety service for children. In it, sensor and social web technologies have been exploited in the development of a safety service to enable proactive and instantaneous assistance and guidance for children in their daily lives.

2.2.1 Quality attributes

In DSEs, achieving the expected quality of a digital service is very challenging as the quality goals of all the supporting services need to be satisfied as well. Therefore, addressing quality attributes in the earliest possible phases of the software lifecycle like requirements engineering and architecture design is central in DSEs.

Service requirements for DSEs can be categorized as functional, non-functional, business requirements and constraints [3]. Functional requirements describe the behavior of a service that fulfills the tasks of the user. On the other hand, non-functional requirements describe the qualities of the service system, which can be defined as internally and externally observable properties. Meanwhile, business requirements help service providers to achieve business goals, and constraints are characteristics that limit the development and use of the service [3].

Quality is a term with a multi-dimensional meaning, which depends on the context in which it is used. Software quality has been defined in IEEE 1061 [48] as the degree to which software possesses a desired combination of attributes. ISO/IEC 9126 [49] presents a software quality model with six categories of characteristics (i.e., functionality, reliability, usability, efficiency, maintainability and portability), which are then divided into sub-characteristics. These non-functional characteristics of a component or system are commonly known as quality attributes. Quality attributes can be categorized as execution and evolution quality attributes [50]. Execution qualities (e.g., performance, security, availability, usability, scalability, reliability, interoperability, adaptability) are observable at runtime. In comparison, evolution qualities (e.g., maintainability, flexibility, modifiability, extensibility, portability, reusability, integrability and testability) cannot be observed at runtime, and as a result, solutions for evolution qualities lie in the static structures of the software system [50].

Several challenges and limitations can be identified for the ecosystem-based service requirements engineering process, such as service co-innovation, service value co-creation, enabling infrastructure and utilization of the ecosystem’s assets [3]. Also, the definition of quality requirements for DSEs needs further exploration, and special skills are required in the innovation and requirements analysis, negotiation and specification phases. However, quality ontologies, quality-driven methods and tool support for attaching quality properties to architectural elements, as discussed in [51,52,53,54,55], can aid the quality requirements engineering process.

3 Research method

This section outlines the research method used in this survey. Our method was motivated by the normative information model-based systems analysis and design (NIMSAD) framework [56]. NIMSAD focuses on classification and thematic analysis of scientific literature. It is a general framework for evaluating any methodology, and it uses the entire problem-solving process as the basis of evaluation. A main goal of our survey is to describe and compare each primary method against the comparison framework (see Sect. 4) defined in the study. Typically, surveys based on the systematic literature review (SLR) method [57] focus more on the guidelines followed, and thematic analysis and detailed comparisons of the primary methods are not given much emphasis. As in the SLR method, the current study follows three different stages: planning, conducting and reporting (see Fig. 1 for process steps and outcomes). In this section, we provide an overview of the procedure followed and describe in detail the research questions and the search strategy followed; the primary method selection procedure and criteria applied; the quality of the selected papers; the data elements extracted from the papers; and the data analysis and synthesis methods used. The review was conducted by a research Fellow in AC systems, and the results were reviewed by a research professor in digital systems and services.

Fig. 1 Overall procedure of the survey

3.1 Planning stage

Research questions, search strategy and databases: The most important activity during planning (Fig. 1) is formulating the research questions. To this end, we have expressed our objectives in the form of 13 research questions (see Table 1), which have been defined from a broad perspective. Our objective was to capture as full a range as possible of the literature on AC methods in DSEs.

A search strategy was defined to detect as much of the relevant literature as possible. That is, it needs to identify all relevant primary methods that address the research questions. To this end, literature searches were conducted from March to May 2015 (updated in January 2016) using four scientific databases (Scopus, IEEE Xplore, ACM digital library and Springer link) as well as Google Scholar. The scientific databases used are the most relevant in the software engineering area [57], and with the inclusion of Google Scholar, an exhaustive list of databases is not necessary. Our review is based on an automatic search process, which depends on the search engines of the scientific databases used. However, as the general search string (Boolean ANDs and ORs) has been adapted to each database according to its internal requirements, we contend that relevant studies have not been excluded. In each case, the search string “autonomic computing” AND “service ecosystem” was entered, with no temporal limitation. The initial results were as follows:

Scopus returned 58 results. Scopus is the largest abstract and citation database of peer-reviewed literature, indexing about 20,000 peer-reviewed journals, books and conference proceedings.

IEEE Xplore returned 2 results. This database covers electrical engineering, computer science and electronics, and indexes more than 160 journals and 1200 conference proceedings.

ACM digital library returned 1 result. ACM is the world’s largest scientific and educational computing society.

Springer link returned 26 results. This database contains journals and conference proceedings published by the Springer publishing house, and indexes over 8.3 million scientific documents.

Google Scholar returned 88 results. Google Scholar provides a simple way to broadly search for scholarly literature, allowing search across many disciplines and sources. This is beneficial in gaining an overall understanding of the results as it is based on various disciplines and sources.

Table 1 Research questions

3.2 Conducting stage

Once the planning stage is completed, the review proper (conducting stage) starts.

Primary method selection procedure and criteria: As an initial screening, titles and abstracts were read and the following three main research areas were manually identified:

  • AC methods in DSEs

  • AC methods in service ecosystems

  • quality-driven software engineering methods.

The papers were considered from the perspective or viewpoint of service engineering, i.e., requirements engineering and architecting of services. A research area here represents an important study area considered for analysis and comparison. The use of solid quality-driven software engineering methods is essential in the service engineering of DSEs, as handling and managing quality in an ecosystem is a more complex and challenging process. In DSEs, service systems are integrated solutions from several service providers, and therefore, in order to achieve the intended quality of a digital service, quality goals of all the supporting services need to be satisfied too.

As the initial result set and the number of research areas identified were very limited, the scope of the search was broadened. As stated in Sect. 1, DSEs are characterized by uncertainty caused by environmental disturbances or evolving requirements. Although methods that address uncertainty using self-* features of the AC initiative in the DSE domain are scarce, as evidenced by the very limited results returned in the initial search process, valuable lessons can be learnt from methods in other related domains such as dynamically adaptive systems (DASs). Therefore, a further literature search was performed in which the search string “autonomic computing” AND “dynamically adaptive system” was entered with no temporal limitation. The result of this search is as follows:

“autonomic computing” AND “dynamically adaptive system”—Scopus: 20, IEEE Xplore: 2, ACM digital library: 35, Springer link: 48, Google Scholar: 69.

The titles and abstracts of the research articles returned were read and the following research area was manually identified:

  • DASs-based methods that support self-* properties (in requirements engineering or architecting phases of the software lifecycle)

Figure 2 shows the research areas identified during the analysis. As shown in Fig. 2, the four research areas are represented by:

  1. intersection of DSEs and AC

  2. intersection of service ecosystems and AC

  3. intersection of DASs and AC

  4. quality-driven software engineering.

Note that, although an overlapping of the quality-driven software engineering research area can be identified with other domains (e.g., AC, service ecosystems, DASs), we consider quality-driven approaches independently from their application domain. Thus, it has been represented independently in Fig. 2.

After this analysis, the resulting 349 articles contained overlaps, so duplicate articles indexed by two or more databases were eliminated. To handle the inconsistent meta-data formats of the different databases, we used the RefWorks reference management system, which automates the task of aggregating research papers into a consistent list in a unified format.
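The arithmetic behind the figure of 349 articles, and the idea of the deduplication step, can be sketched as follows. The per-database counts are those reported above for the two search strings. The `deduplicate` helper and its title-normalization rule are illustrative assumptions only; they do not describe how RefWorks actually matches records.

```python
# Hit counts reported for the two search strings, per database.
SEARCH_1 = {"Scopus": 58, "IEEE Xplore": 2, "ACM digital library": 1,
            "Springer link": 26, "Google Scholar": 88}
SEARCH_2 = {"Scopus": 20, "IEEE Xplore": 2, "ACM digital library": 35,
            "Springer link": 48, "Google Scholar": 69}

# Total number of hits before duplicates are removed.
total_hits = sum(SEARCH_1.values()) + sum(SEARCH_2.values())


def deduplicate(records):
    """Keep the first record per normalized title.

    Hypothetical matching rule: titles are compared case-insensitively
    after collapsing whitespace. Real reference managers use richer keys.
    """
    seen = set()
    unique = []
    for record in records:
        key = " ".join(record["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


if __name__ == "__main__":
    print(total_hits)
```

The first search contributes 175 hits and the second 174, which together give the 349 articles mentioned in the text.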

The selection criteria are generally used to determine which studies are included in or excluded from a review. In this review, both theoretical and empirical studies, and studies conducted in both industry and academia, were considered for inclusion. The inclusion and exclusion criteria need to be based on the research questions, and for this purpose, the following criteria were used:

Fig. 2 Thematic research areas identified

Inclusion criteria:

  • The primary method is in one of the four main research areas identified during initial screening.

  • The primary method provides evidence of service engineering, i.e., requirements engineering and architecting of services, which is the perspective considered in this study.

Exclusion criteria:

  • The primary method provides no abstract or full text of the approach.

  • The primary method is written in a language other than English.

Finally, 12 primary methods were selected as most relevant to our study, and the authors reviewed them to identify the most relevant aspects of the research.
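As a rough illustration, the inclusion and exclusion criteria listed above can be read as a simple filter over candidate methods. The sketch below is our own reading, not a tool used in the study: the `Candidate` record, its field names and the `is_selected` function are hypothetical, and the final choice of the 12 primary methods also involved manual judgment that this filter does not capture.

```python
from dataclasses import dataclass

# The four research areas identified during screening.
RESEARCH_AREAS = {
    "AC methods in DSEs",
    "AC methods in service ecosystems",
    "quality-driven software engineering methods",
    "DASs-based methods that support self-* properties",
}


@dataclass
class Candidate:
    """Hypothetical record describing one candidate primary method."""
    title: str
    research_area: str
    addresses_service_engineering: bool
    has_abstract_or_full_text: bool
    language: str


def is_selected(candidate):
    """Apply both inclusion criteria, then both exclusion criteria."""
    included = (
        candidate.research_area in RESEARCH_AREAS
        and candidate.addresses_service_engineering
    )
    excluded = (
        not candidate.has_abstract_or_full_text
        or candidate.language != "English"
    )
    return included and not excluded
```

A candidate passes only if it satisfies both inclusion criteria and triggers neither exclusion criterion.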

Quality of the selected papers: Quality criteria are important for assessing the quality of the primary methods; they are aimed at minimizing bias and maximizing internal and external validity. To this end, first, quality instruments [57] can be formed, which are checklists of factors that need to be evaluated for each primary method. Second, how quality data are to be used can be specified. However, in this review, no detailed quality assessment was performed, as the goal of our survey was to identify as many of the AC methods in DSEs as possible. Existing scientific literature applying the AC initiative in DSEs is very limited, possibly because it is still a very new research topic. Yet, as mentioned earlier, we used several general inclusion and exclusion criteria when selecting the primary methods for analysis.

Data elements extracted from papers: During the data extraction step (Fig. 1), data extraction forms were used to extract the properties of each primary method. These properties correspond to the different characteristics (see Sect. 4) defined in the comparison framework. The intention was to help address all 13 research questions for each primary method. Some interpretation of the data was necessary, as the information available was not always sufficient to answer all 13 research questions. In addition, the following items were used during data collection: (1) the author(s) with their affiliations, the source (e.g., journal article, conference paper, technical report) and year; (2) research area and scope; (3) the most relevant papers on the primary method; (4) a summary of the method; and (5) additional notes.

Data analysis and synthesis methods used: The data analysis step (Fig. 1) is used to synthesize the data so that the research questions can be answered. This step involved collating and summarizing the results of the primary methods in tables. Tables were used to organize the data with basic information about each study. The synthesis here is descriptive, exploratory and comparative. It is descriptive in that the analysis is made by defining the research questions and elements of the comparison framework. Exploratory analysis is performed by finding out the thematic research areas and mapping the identified data/methods to them. Comparative analysis is done by studying and presenting characteristics of each primary method in the thematic research areas, and summarizing and analyzing the main findings.

In addition to answering the research questions, we used the data to identify interesting trends or limitations, such as how long research has been carried out in each research area, who has led it (i.e., any specific organization or group of researchers), and the limitations of the current research approaches.

3.3 Reporting stage

We disseminate the results of the review in a journal article (this publication, see Fig. 1). The results of this review are provided in Sect. 5, while a discussion of the results is provided in Sect. 6.

4 A comparison framework for autonomic computing methods in digital service ecosystems

In this section, we introduce our comparison framework that we use for comparing the different scientific methods from the four thematic research areas. It incorporates the 13 research questions identified in Table 1, Sect. 3.1. We explain the different characteristics of the framework and provide justifications for their inclusion.

As mentioned in Sect. 1, none of the surveyed methods appears to address in a generic and adaptive way the service engineering of DSEs. That is, specifically, an ecosystem-based method on applying the AC initiative is missing in the DSE domain. Therefore, there is a need for a coherent, systematic ecosystem-based method and framework to support the requirements engineering and architecting of digital services with AC capabilities. This needs to be performed by adopting a generic and adaptive way to tackle the complex needs of adaptation behavior of these systems. To this end, several characteristics are significant, such as top-down vs. bottom-up approach, decentralized control, self-* features, context-awareness, reflexivity, quality attributes (e.g., evolvability, interoperability, scalability) and method validation (see Fig. 3). These characteristics are intended to make the framework both theoretical and practical, in which method validation focuses on the practical side of a method while the other characteristics focus on its theoretical side.

Fig. 3 Comparison framework taxonomy

The categories of the comparison framework are based on the NIMSAD framework [56]. NIMSAD has been used in the development of a number of comparison frameworks in software engineering (e.g., [58, 59]). NIMSAD defines four essential elements for evaluating a methodology: method context, method user, method content and evaluation of method. A distinctive feature of NIMSAD is its fourth element, evaluation, which is missing in many other similar frameworks [56]. For these reasons, NIMSAD has been selected in the present survey. The 13 research questions (RQs) established in Table 1 can be broadly categorized under these four categories (i.e., context: RQ1, user: RQ2, method: RQ3–RQ11, evaluation: RQ12–RQ13). First, in the context category, the analyzed method is examined from the problem situation point of view (see RQ1, Table 1). Second, in the user category, the method is examined from the viewpoint of the intended users of the method (RQ2). Third, the method contents category focuses on the content of the method itself (RQ3–RQ11). Finally, the evaluation category focuses on the validation details of the method (RQ12–RQ13). Descriptions of each characteristic of the comparison framework are provided next.
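The assignment of the 13 research questions to the four NIMSAD categories can be written down as a small lookup structure. This is only a sketch for checking that the mapping is complete and non-overlapping; the names `NIMSAD_CATEGORIES` and `category_of` are hypothetical, and the wording of the research questions themselves (Table 1) is not reproduced here.

```python
# RQ numbers per NIMSAD category, as stated in the text:
# context: RQ1, user: RQ2, method: RQ3-RQ11, evaluation: RQ12-RQ13.
NIMSAD_CATEGORIES = {
    "context": range(1, 2),
    "user": range(2, 3),
    "method": range(3, 12),
    "evaluation": range(12, 14),
}


def category_of(rq_number):
    """Return the NIMSAD category that a research question belongs to."""
    for category, rq_numbers in NIMSAD_CATEGORIES.items():
        if rq_number in rq_numbers:
            return category
    raise ValueError(f"RQ{rq_number} is not part of the framework")


if __name__ == "__main__":
    for rq in range(1, 14):
        print(f"RQ{rq}: {category_of(rq)}")
```

Enumerating RQ1 to RQ13 with `category_of` confirms that every question falls into exactly one category.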

Goal and expected benefits: First, the goal of the analyzed method must be clearly defined. Also, the expected benefits of using the method need to be described.

Top-down versus bottom-up approaches: Autonomic systems can be characterized by their operating conditions and by multiple dimensional properties such as top-down and bottom-up approaches, and centralization and decentralization [36].

On the one hand, traditional top-down approaches can be adopted to engineer systems in which specific functionalities or behaviors are achieved by explicit design. On the other hand, bottom-up approaches (e.g., nature-inspired or bio-inspired approaches [21]) are used to achieve functionalities via spontaneous self-organization [17]. Both approaches are beneficial: a top-down approach can be used to engineer specific local functionalities, while a bottom-up approach can be adopted to engineer large-scale behaviors. The line between these two approaches is often not clear, and a method can incorporate techniques from both alternatives.

Decentralized control: Adaptation logic can be decentralized, centralized or applied in a hybrid manner [37]. A method needs to define models and tools to support decentralized control so that both collective adaptation and adaptation by subparts can be provided. Decentralization (e.g., see [60]) is a feature of cooperative self-adaptive or self-organizing systems, which function without a central authority [36]. Decentralized systems are usually bottom-up: the large numbers of components contained in these systems interact locally according to simple rules, and the global behavior of the overall system emerges from these interactions. In a centralized approach, a central unit controls the system, but this approach is not suitable for large systems because of their size and real-time constraints. Meanwhile, a hybrid approach has both centralized and decentralized elements [37, 61]; thus, both collective adaptation and adaptation by subparts can be provided.
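The idea that global behavior can emerge from simple local rules, without a central unit, can be illustrated with a toy neighbor-averaging (consensus) rule. This example is our own choice and is not taken from any of the surveyed methods: components are assumed to sit on a ring, and in every round each one replaces its value with the average of its own value and those of its two neighbors. The function names `local_step` and `run` are hypothetical.

```python
def local_step(values):
    """One synchronous round: each component averages with its two ring neighbors.

    No component sees the whole system; it reads only its own value and
    the values of its immediate left and right neighbors.
    """
    n = len(values)
    return [
        (values[(i - 1) % n] + values[i] + values[(i + 1) % n]) / 3.0
        for i in range(n)
    ]


def run(values, rounds):
    """Repeat the local rule for a fixed number of rounds."""
    for _ in range(rounds):
        values = local_step(values)
    return values


if __name__ == "__main__":
    initial = [10.0, 0.0, 4.0, 6.0, 2.0, 8.0]  # mean is 5.0
    print(run(initial, 100))
```

Because every value contributes equally to three components, the sum is preserved in each round, and the values converge toward the global mean even though no component ever computes it centrally.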

Self-* features: As described in Sect. 2.1, the four self-* characteristics (self-chop) that are considered fundamental for any autonomic system and are, as a result, the most cited in the AC domain are self-configuration, self-healing, self-optimization and self-protection [42]. Self-configuration describes the adjustment of system components in a user-independent manner to achieve overall system behavior according to higher-level goals. Self-optimization is achieved when the system provides operational efficiency by tuning resources and balancing workload. Meanwhile, self-healing means that the system provides resiliency by discovering and preventing disruptions and by recovering from malfunctions. Self-protection means that the system secures critical assets and resources by anticipating, detecting and protecting against any security risks. Other than these self-chop properties, self-adaptation [36] is a key characteristic of an autonomic system. It is realized as a situation-based behavior that takes into consideration the functional and quality properties of the environment and of the system itself, and the needs of the users.

Context-awareness: The need for context-awareness (e.g., see [62, 63]) is a recognized issue in complex adaptive systems such as DSEs [3]. Although acquiring data in order to support context-awareness is not an issue, handling a significant amount of data is very challenging [17]. Also, awareness can encompass situations occurring not only at the locality of individual components but also at many different levels of the system. Therefore, autonomous adaptation activities that are to be performed in a collective and coordinated way need to be driven by more comprehensive levels of awareness than those provided by traditional context-aware computing models.

Reflexivity: Reflexivity is an important characteristic of a self-managed autonomic system, which means that the system must have knowledge of its components, current status, capabilities, limits, boundaries and interdependencies with other systems and available resources [64]. It is the capability of making intelligent decisions based on self-awareness. Also, the system must be aware of its possible configurations and how they affect specific non-functional, quality requirements. The knowledge processing is based on rules, machine learning algorithms and software agents. In the current study, we consider reflexivity as a technique that can be exploited to support evolution (evolvability) of the ecosystem.

Although reflexivity is a relatively new term in service engineering, reflection is a widely known mechanism that can be used to support reactive or proactive adaptation of software systems. Reflection is defined as the ability of software to examine and modify its structure or behavior at runtime [65, 66]. Reflection can be of two types: introspection and intercession. Introspection is the observation of an application’s own behavior, while intercession is the reaction to the results of introspection, which can be structural, parameter or context adaptation [67]. Reflection techniques have been investigated with self-adaptive systems as an underlying principle for self-awareness on different levels of software, e.g., architectural reflection [68] and behavioral reflection. However, these methods apply reflection to the software itself, while we consider reflexive behavior with respect to unanticipated changes at the larger ecosystem level, rather than at the system level, to support evolution of the ecosystem.
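The two types of reflection can be illustrated with a minimal Python sketch that relies only on the standard `inspect` module and the built-in `vars` and `setattr` functions. The `Service` class, its `timeout` parameter and the `max_timeout` threshold are hypothetical names chosen for illustration and are not taken from any of the cited works.

```python
import inspect


class Service:
    """Hypothetical service with one tunable parameter."""

    def __init__(self):
        self.timeout = 1.0

    def handle(self, request):
        return f"handled {request} within {self.timeout}s"


def introspect(obj):
    """Introspection: observe the object's own structure and state."""
    methods = [name for name, _ in inspect.getmembers(obj, inspect.ismethod)
               if not name.startswith("_")]
    return {"methods": methods, "state": dict(vars(obj))}


def intercede(obj, observation, max_timeout=0.5):
    """Intercession: react to the observation with a parameter adaptation."""
    if observation["state"]["timeout"] > max_timeout:
        setattr(obj, "timeout", max_timeout)


service = Service()
observation = introspect(service)   # snapshot taken before any change
intercede(service, observation)     # adapts the parameter at runtime
print(observation["methods"], service.timeout)
```

Here the adaptation is a parameter adaptation; a structural adaptation would instead replace or add methods or components at runtime.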

Quality attributes: Non-functional requirements describe the qualities of the system. From the service development point of view, QoS defines a set of quality attributes that a particular service has to fulfill. As a consequence, the quality attributes defined in the QoS specification of a service system have to be dealt with in each software engineering phase: requirements specification, architecture design and implementation.

As discussed in Sect. 2.2.1, quality attributes can be categorized as execution and evolution quality attributes. While all these attributes are important, in this survey we focus only on quality attributes that are significant from the ecosystem viewpoint of service engineering of digital services (e.g., evolvability, interoperability, scalability).

Evolvability: By evolvability we refer to the ability of the ecosystem to evolve in dynamic situations (for example, see [29]). An ecosystem is dynamic, evolving all the time as new members, services and value networks emerge [3]. Therefore, in order to adapt to the needs of the ecosystem, the ecosystem’s knowledge management model should evolve too. Additionally, new support services need to emerge as and when required. As new requirements emerge, requirements innovation is a continuous process inside the ecosystem.

Interoperability: Interoperability is the ability of software to exchange information and to provide something new, which originates from exchanged information [69]. The main goal of interoperability models and rules is to enable the loosely coupled services to collaborate. In [53], six interoperability levels have been defined for smart environments, i.e., conceptual, behavioral, dynamic, semantic, communication and connection. In order to support ecosystem interoperability, four interrelated metamodels have been proposed in [70], which are domain ontology, methodology, domain reference model and knowledge management metamodels.

In DSEs, proper service engineering techniques are required to develop digital services that are interoperable, available and easily consumed by taking into consideration the specific capabilities of the ecosystem [3]. In order to support service interoperability, two main elements are required to engineer services in an ecosystem: ecosystem infrastructure and knowledge repositories [71]. Ecosystem infrastructure makes services interoperable, available and easily consumed and therefore manages all service ecosystem operations. Meanwhile, knowledge repositories store the collaboration models, service descriptions and ontologies of service types that support interoperability. Other than service interoperability, pragmatic interoperability is achieved between ecosystem members when their intentions, business rules and organizational policies are compatible [3]. Pragmatic interoperability deals with context data, which is specified as the internal state of the system [71]. It also deals with the specification of the system process that employs the data. For examples and usage of service and pragmatic interoperability, refer to [71].

Scalability: In general, scalability in software engineering has been commonly known as the ability of a system, network or process to handle growing amounts of work in a graceful manner or its ability to be enlarged to accommodate that growth. A formal definition of scalability for digital ecosystems has been provided in [72] as: “to a certain degree, a digital ecosystem is scalable if its performance stays effective and efficient while large amount of input data or large quantities of heterogeneous participating entities are added.”

The component model for a DSE can potentially cover a very wide range of target scenarios; thus, it must promote scalability of both design (i.e., software engineering scalability [73, 74]) and execution complexity (i.e., performance scalability [73]). In other words, the component model for a DSE should be based on sound design principles that can be practically applied to both small and very large systems, and it needs to exhibit scalable performance and QoS.

Method validation: There should be some level of evidence regarding the maturity of the method, such as evidence of its use and applicability. It is important to ascertain whether the method has been developed and refined over several research papers. Also, the method should provide a way to validate its results. In this regard, a method can be applied at the conceptual level, as a proof of concept in the lab, or in the development of a large-scale industrial product using a case study.

5 Overview and comparison of autonomic computing methods in digital service ecosystems

This section presents each of the 12 primary methods organized in the four research areas (see Sect. 3.2, Fig. 2) in greater detail. To this end, an overview of each primary method is provided followed by a comparison of the primary methods against the comparison framework (Tables 2, 3, 4). The four research areas are:

  • AC methods in DSEs

  • AC methods in service ecosystems

  • quality-driven software engineering methods

  • DASs-based methods that support self-* properties (in requirements engineering or architecting).

Table 2 Comparison summaries of the AC methods in DSEs
Table 3 Comparison summaries of the AC methods in service ecosystems
Table 4 Comparison summaries of methods from DASs with self-* properties and quality-driven software engineering research areas

5.1 AC methods in DSEs

This research area includes the articles found explicitly using the AC initiative in the DSE domain. Digital ecosystems are not characterized by only one reference model as they crosscut different business domains and value chains [72]. As a consequence, architectures need mechanisms to allow the participants to publish any model and investigate on models that are most suitable to their needs. In order to handle these challenges, the AC initiative has been exploited in three main studies in the DSE domain, which are:

  • self-controlled components [28, 75]

  • evolving SOAs [29]

  • autonomic SOA for DSEs [30, 76,77,78].

See Fig. 4 and Table 2 for a comparison of these three primary methods against the framework.

Self-controlled components

Cloud computing and the future Internet create a new ecosystem where everything is a service with custom composition and dynamic management of resources at runtime. In this context, in the OpenCloudware project [28, 75], the authors introduce a compositional framework to compose components as services which can be self-controlled. Self-controlled components (SCCs) have self-control mechanisms attached to them to enable autonomic application management during execution. The objective is to provide strong QoS guarantees for composed applications.

The authors adopt a top-down approach from architectural modeling to service implementation and runtime support, including autonomic contract management. The SCCs are based on the grid component model, which the authors have extended with service-oriented features. Autonomy has been introduced in the SCCs using feedback control loops with elements of monitoring, analyzing, planning and execution of adaptation activities (MAPE loops). A high level of decentralized control is provided using MAPE loops defined at the top global level of the composition, and interactions that occur through the hierarchical component model. This method supports several self-* properties such as self-reconfiguration, self-adaptation and self-management. MAPE loops in an SCC provide self-reconfiguration with actions to change the component structure or dependencies between the involved components at runtime. Self-management of resources is supported for each QoS criterion in the quality model. A significant feature and contribution of their work is the support for QoS control and management. For this, each SCC has a QoS control component to ensure compliance with the service contract. It defines four QoS criteria: availability, integrity, time and capacity.
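To make the idea of a MAPE loop guarding a service contract more concrete, the following Python sketch runs one monitor, analyze, plan and execute pass over a single component. It is an illustrative sketch only and does not reproduce the grid component model or the implementation in [28, 75]; the `Contract` and `Component` classes, the replica-based reconfiguration and all numeric values are assumptions made for the example, and only the time criterion is modeled.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    """Hypothetical service contract on one QoS criterion (time)."""
    max_response_ms: float


@dataclass
class Component:
    """Hypothetical managed component; replicas stand in for its structure."""
    replicas: int
    load_ms: float  # total work shared evenly among the replicas

    def response_ms(self):
        return self.load_ms / self.replicas


def monitor(component):
    return {"response_ms": component.response_ms()}


def analyze(metrics, contract):
    # True when the contract is violated.
    return metrics["response_ms"] > contract.max_response_ms


def plan(component, contract):
    # Smallest replica count that satisfies the contract again.
    needed = component.replicas
    while component.load_ms / needed > contract.max_response_ms:
        needed += 1
    return needed


def execute(component, replicas):
    component.replicas = replicas


def mape_step(component, contract):
    """One pass of the loop; returns True if a reconfiguration happened."""
    metrics = monitor(component)
    if not analyze(metrics, contract):
        return False
    execute(component, plan(component, contract))
    return True


component = Component(replicas=1, load_ms=900.0)
contract = Contract(max_response_ms=250.0)
changed = mape_step(component, contract)
print(changed, component.replicas, component.response_ms())
```

In a hierarchical composition, each component could run such a loop locally while a loop at the top level of the composition coordinates them; that coordination is not shown here.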

Fig. 4 AC methods in digital service ecosystems compared against the framework characteristics

The authors apply a use case called Springoo to show how the adoption of SCCs helps the service composition to provide a guarantee of QoS. Springoo is a web application providing online merchant applications using Apache/Jonas/MySQL components. The architecture has been partially validated using the grid component model/ProActive middleware to provide the monitors, QoS control and MAPE components. While QoS guarantees have been comprehensively provided in their work, a detailed context-awareness model to support autonomous adaptation activities is missing. Also, reflexivity and quality attributes of evolvability, interoperability and scalability of the ecosystem have not been addressed in [28, 75].

Evolving SOAs

Briscoe and De Wilde [29] have presented a largely bottom-up method called an ecosystem-oriented architecture (EOA) of digital ecosystems by extending SOA with distributed evolutionary computing, thus allowing services to recombine and evolve over time and increasing their effectiveness for the users. Here, the word ecosystem is more than just a metaphor. Digital ecosystems, which are digital counterparts of biological ecosystems, have been defined as software systems that exploit the properties of biological ecosystems, such as robustness, scalability and self-organization, in order to solve dynamic and complex problems automatically.

The architecture of the digital ecosystem provides a two-level optimization scheme that supports a high level of decentralized control. The first level, the underlying tier of distributed agents, consists of a decentralized peer-to-peer network. The second optimization level is based on an evolutionary, genetic algorithm that operates locally on single habitats (peers). The self-* properties supported are self-organization and self-management (see Table 2). The digital ecosystem here is a multi-agent system, which uses distributed evolutionary computing to combine appropriate agents so that user requests for applications are satisfied. In this architecture, each service is a habitat and the network of habitats creates the digital ecosystem. The continuously changing requirements and their complexity in contextual, adaptive environments are the driving force for the evolution and self-organization of agents. The authors consider two main models from several variants of distributed evolutionary computing: the coarse-grained island model and the fine-grained diffusion neighborhood model. However, they propose to use a reconfigurable network topology so that habitat connectivity can be dynamically adapted based on the observed migration paths of the agents in the habitat network.
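The coarse-grained island model mentioned above can be sketched in a few lines of Python. This is a generic textbook-style illustration, not the algorithm of [29]: it uses a fixed ring topology rather than the reconfigurable topology the authors propose, a toy objective (OneMax), and parameter values (population sizes, mutation rate, migration interval) that are assumptions chosen only for the example.

```python
import random

GENOME_LEN = 20


def fitness(genome):
    """Toy objective (OneMax): number of 1-bits in the genome."""
    return sum(genome)


def evolve(island, rng):
    """One generation on a single island: elitism plus mutated offspring."""
    best = max(island, key=fitness)
    offspring = [best]  # keep the best individual unchanged
    while len(offspring) < len(island):
        a, b = rng.sample(island, 2)
        parent = a if fitness(a) >= fitness(b) else b  # tournament selection
        child = [bit ^ (rng.random() < 0.05) for bit in parent]  # bit-flip mutation
        offspring.append(child)
    return offspring


def migrate(islands):
    """Ring migration: each island's best replaces the worst on the next island."""
    migrants = [max(island, key=fitness) for island in islands]
    for i, migrant in enumerate(migrants):
        target = islands[(i + 1) % len(islands)]
        worst = min(range(len(target)), key=lambda j: fitness(target[j]))
        target[worst] = list(migrant)


def run(n_islands=4, island_size=10, generations=30, interval=5, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(GENOME_LEN)]
                for _ in range(island_size)] for _ in range(n_islands)]
    start = max(fitness(g) for island in islands for g in island)
    for gen in range(1, generations + 1):
        islands = [evolve(island, rng) for island in islands]  # local evolution
        if gen % interval == 0:
            migrate(islands)  # occasional exchange between islands
    end = max(fitness(g) for island in islands for g in island)
    return start, end, islands


start, end, islands = run()
print(start, end)
```

Each island evolves independently and exchanges individuals only occasionally, which is what distinguishes the coarse-grained model from the fine-grained diffusion model, where individuals interact only with their immediate neighbors.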

One of the key features and benefits of the authors’ work is the support for scalable architectures in order to meet user requests for applications. This has been fulfilled by applying a fundamental paradigm shift where a push-oriented approach has been used instead of a pull-oriented one. In a push-oriented approach, the digital ecosystem composes applications preemptively as well as upon request. In contrast, the pull-oriented approach of SOAs generates applications only upon request. This method has been validated using a simulation of an EOA-based digital ecosystem. This EOA-based simulation has been compared against a simple SOA with a distributed UDDI (Universal Description, Discovery and Integration) service registry. The results indicate that, with an increasing number of services, the digital ecosystem outperformed the traditional SOA system. Maintaining the self-organizing evolution of digital ecosystems in a scalable architecture is a main benefit of the authors’ method. However, Briscoe and De Wilde [29] have not considered context-awareness, reflexivity and interoperability in their method.

Autonomic SOA for DSEs

In [30, 76,77,78], following a top-down approach, the SOA has been extended with the AC initiative (autonomic SOA) to achieve a more adaptive and a robust architecture for DSEs. This is to keep up with the dynamic changes of requirements and environment. The authors elaborate on the design and implementation model of an autonomic SOA using a case study in computational engineering. Compared to traditional SOA, the autonomic SOA technique includes an autonomic manager and a knowledge base, which provides the ability to adapt to changes.

The proposed architecture of the autonomic, self-organizing SOA contains three layers: presentation layer, processing layer and service/resource layer. The presentation layer, which is the top-most layer of the architecture, provides an interface for various users. The processing layer, which is the middle layer, performs and coordinates autonomic functionality. The bottom service/resource layer provides utilization of the distributed resources using web services. It is provided as a typical SOA framework that contains a service registry and service providers. The service registry’s functionality has been extended by introducing a knowledge base for the autonomic processes. The actual autonomic concept has been provided by an autonomic manager in the processing layer. The autonomic manager performs the autonomic cycle of monitoring, analysis, planning and execution over the knowledge base (MAPE-K loop), and this provides some degree of decentralized control. The authors have used a Unified Modeling Language (UML)-based metamodeling concept to model the proposed architecture as a UML class diagram (basic metamodel for context-awareness). There, several domain-independent stereotypes have been used to specify the relationships (e.g., request, call, instance, publish, find and bind) between the components (e.g., user, autonomic manager, composer, brokers, service registry and service provider) in the architecture. Also, a sequence diagram of the service architecture has been derived.

The method has been validated using a case study in computational engineering [76], and an implementation of a work-in-progress prototype of the proposed SOA has been presented. However, reflexivity and quality attributes (e.g., evolvability, interoperability, scalability) have not been addressed in the method.

5.2 AC methods in service ecosystems

In the following, we discuss four primary methods that apply the AC initiative in the domain of service ecosystems. As mentioned in Sect. 2.2, service ecosystems are socio-technical complex systems where service providers can reach shared goals and utilize the services of other members in the ecosystem to gain added value [44, 45]. Service ecosystems are closely related to the research areas of Internet of Services (IoS) and service value networks. IoS considers the Internet as a global platform for retrieving, combining and utilizing interoperable services. Meanwhile, service value networks [79] provide business value through agile and market-based composition of complex services from a pool of service modules, by the use of a universally accessible network orchestration platform. The four primary methods reviewed are:

  • SAPERE [15,16,17,18,19,20]

  • CASCADAS [21,22,23]

  • BIONETS [24,25,26,27]

  • self-reconfiguration for service ecosystems [80].

Figure 5 and Table 3 provide a comparison of these four primary methods.

Fig. 5 AC methods in service ecosystems compared against the framework

SAPERE

In [15,16,17,18,19,20], the authors propose a nature-inspired reference architecture called SAPERE (Self-Aware Pervasive Service Ecosystems), which can be a useful guide in the design and implementation of self-adaptive pervasive service ecosystems. They identify several research challenges emerging from the convergence of cyber-physical worlds, such as comprehensive situation-awareness, top-down vs. bottom-up design, power of masses, decentralized control, and diversity and evolvability [19].

The authors explain how the SAPERE middleware infrastructure supports the SAPERE model and framework. The SAPERE middleware follows a bottom-up approach that draws inspiration from natural systems. Decentralized control is exhibited by the bottom-up emergence of self-organized patterns of coordinated behaviors. The self-* properties supported are self-adaptation, self-organization and self-management (see Table 3). In the SAPERE framework, pervasive services are modeled and deployed as autonomous individuals in an ecosystem of other services and devices. All of these interact according to a limited set of self-organizing, self-adaptive coordination laws called eco-laws. The provisioning of distributed pervasive services is realized by a variety of adaptive, self-organizing patterns (context-awareness support). The authors survey and analyze a number of natural metaphors that can be adopted in the modeling and architecting of innovative pervasive service ecosystems. This is to support spatiality, adaptability, openness and long-lasting evolvability of the ecosystem. The key metaphors introduced are physical, chemical, biological and social, and the key differences between them are the way the species, space and eco-laws are modeled and implemented. They have discussed how diversity and evolution of the ecosystem can be supported by these four metaphors. Interoperability has only been partially addressed in [20], where they describe a mechanism for explicitly externalizing knowledge out of services and using it to carry out interactions. The authors highlight scalability as one of the main challenges of data storage and analysis in pervasive and mobile computing [17].

The authors’ method has matured and evolved in many research papers [15,16,17,18,19,20]. The middleware implemented has been validated in the context of exemplary use cases on information and guidance services in a smart museum. Although the need for interoperability and scalability has been highlighted as important characteristics in the reference architecture, it is not clear how the implemented middleware infrastructure supports these qualities. Also, reflexivity has not been supported in their method.

CASCADAS

In the EU project CASCADAS (Component-ware for Autonomic Situation-aware Communications, and Dynamically Adaptable Services) [21,22,23], the authors introduce a model of an autonomic component to support the evolution of the ecosystem through self-awareness and self-organization. The architecture of the ecosystem is based on distributed autonomic components called autonomic communication elements (ACE). The internal behavior of ACE is described by means of a declarative representation called the self-model.

CASCADAS has elements of both top-down and bottom-up approaches where autonomic mechanisms have been included using a top-down approach while bio-inspired mechanisms are provided through a bottom-up approach. A high level of decentralized control is supported as self-organization capabilities are part of the ACE autonomic behavior defined within the self-model. The self-* properties supported are self-awareness, self-organization and self-management. Their work supports a detailed level of context-awareness with its self-model, which is defined as a set of extended finite state machines. These state machines include rules for modifying them to adapt ACE behavior to the changes of internal and environmental conditions. Explicit support for quality attributes has not been mentioned in [21,22,23], but evolvability of the ecosystem is provided by programming the self-model of the ACEs. Using experiments, the authors have shown that the ACE architecture is scalable in several dimensions, such as memory, threads and communication delay. Thus, the applicability of the ACE model in large autonomic communication scenarios is clear.
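The idea of a self-model expressed as extended finite state machines, together with rules that modify them, can be sketched as follows in Python. This is a hypothetical illustration and not the ACE self-model format or API from [21,22,23]; the `SelfModel` class, the state and event names, and the `load` context variable are all assumptions made for the example.

```python
class SelfModel:
    """Hypothetical self-model: a state machine whose transitions can be rewritten."""

    def __init__(self, initial, transitions):
        self.state = initial
        # Maps (state, event) to (guard over context variables, next state).
        self.transitions = dict(transitions)

    def fire(self, event, context):
        """Take the transition for this event if its guard holds."""
        guard, target = self.transitions.get((self.state, event), (None, None))
        if guard is not None and guard(context):
            self.state = target
        return self.state

    def rewrite(self, state, event, guard, target):
        """Modification rule: replace or add a transition at runtime."""
        self.transitions[(state, event)] = (guard, target)


model = SelfModel("idle", {
    ("idle", "request"): (lambda ctx: ctx["load"] < 0.8, "serving"),
    ("serving", "done"): (lambda ctx: True, "idle"),
})

# Under high load the guard fails, so the component stays idle.
first = model.fire("request", {"load": 0.9})

# Adapt the behavior to the new condition: requests are now delegated.
model.rewrite("idle", "request", lambda ctx: True, "delegating")
second = model.fire("request", {"load": 0.9})
print(first, second)
```

The guards over context variables are what make the machine "extended", and the `rewrite` operation stands in for the rules that adapt behavior to changes in internal and environmental conditions.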

The CASCADAS method has been experimentally validated using simulations of a use case concerning a decentralized server farm, as part of a complex service ecosystem. However, reflexivity and interoperability have not been supported in CASCADAS [21,22,23].

BIONETS

The BIONETS (BIOlogically inspired autonomic NETworks and Services) project [24,25,26,27], which is a European Commission FET (Future and Emerging Technologies) initiative on Situated and Autonomic Communications, aims at enabling autonomic pervasive computing environments through the introduction of biologically inspired approaches. The project uses evolutionary techniques embedded in the system components as means to achieve full autonomic behavior. BIONETS looks at how nature and biology in particular (e.g., chemical computing, artificial embryogenies and evolutionary games) can be used to achieve self-chop features through open-ended evolution [24]. The authors describe four main challenges stemming from Future Internet scenarios: scale, heterogeneity, complexity and dynamicity [24]. The overall goal of BIONETS is provisioning of a service ecosystem for autonomic services. This service ecosystem needs to be able to fulfill user demands and needs in a transparent, efficient manner by exploiting the unique features of pervasive computing and communication environments.

Like the SAPERE method, BIONETS also follows a largely bottom-up approach where it gets inspiration from nature to build a distributed autonomic system based on local interactions. Decentralized control has been provided to allow services to adapt and evolve at the component level and global ecosystem level. BIONETS places greater emphasis on four specific AC initiative properties, which are self-configuration, self-healing, self-optimization and self-protection [24] (see Table 3). There are three main actors in BIONETS networks with respect to devices: T-Nodes, U-Nodes and access points [25]. T-Nodes gather data from the environment and are read by U-Nodes, which are complex, powerful devices passing by the T-Nodes. U-Nodes use T-Nodes to interact with the environment and gather information to run the context-aware services (context-awareness support). Access points are complex, powerful devices that act as proxies between BIONETS networks and IP networks. The BIONETS project is built on two main pillars, networks and services, which converge to provide a full autonomic environment for network services. The latter is provided by self-evolving services, a bio-inspired platform centered on the notion of evolution. Evolution here builds on the notion of self-organization, and it has been considered at two levels: single components (micro) and global ecosystem (macro). At the single component level, each service is able to design and build its own protocol stack and its own network. On the other hand, at the global ecosystem level, the interactions among service entities provide the means for rapid service evolution while maintaining global stability properties. BIONETS achieves scalability through an autonomic and localized peer-to-peer service-driven communication paradigm [25].

Lahti et al. [26] present a validation of the BIONETS concepts as a simulation case and proof-of-concept implementation for a service mobility framework. However, as in CASCADAS, reflexivity and interoperability have not been defined in their framework (see Table 3).

Self-reconfiguration for service ecosystems

Li et al. [80] propose an AC method to enable a service-based system to continue adjusting its configuration by means of an autonomic loop of monitoring, analyzing, planning and executing actions. Their top-down approach shows how the AC initiative can be implemented to perform self-reconfiguration for service-based systems to satisfy two common metrics of non-functional requirements, i.e., response time of services and the system resource consumption. The focus of reconfiguration here is to satisfy non-functional requirements; support for functional requirements, business requirements, constraints and quality attributes has not been mentioned. Their method focuses on the geometry configuration of service-based systems as opposed to dynamic reconfiguration exploited in traditional, distributed systems. The authors have used heuristics [80] to formalize a basic model of configuration and reconfiguration definitions (context-awareness support).

The main AC functions implemented to support self-reconfiguration of a service-based system include the following MAPE feedback loop activities: monitor to initiate reconfiguration; analyze to diagnose the configuration; plan to select reconfiguration; and execute for implementing reconfiguration. In addition, knowledge has been presented as a configuration of service-based systems described using architecture description standards, goals or policies. These MAPE loops provide some degree of decentralized control of the service-based system.

The authors have used preliminary experiments to evaluate their method. The method has been demonstrated using a service ecosystem, which provides mechanisms to dynamically change the location of services on machines while executing service requests. The service ecosystem here is a resilient service-operating environment in which the deployed services (e.g., grid services or web services) can be dynamically migrated in response to changing demand on resources to guarantee service-level agreements and to optimize resource utilization. However, their method [80] does not support reflexivity or any quality attributes (e.g., evolvability, interoperability, scalability). Also, it has not matured over several research papers, and therefore, it is difficult to establish the applicability of their method more clearly (see Table 3).

5.3 DASs-based methods that support self-* properties

In the following, we discuss four primary methods selected for comparison from the research area of DASs-based methods that support self-* properties. These were selected because they are most relevant to our study or because they provide valuable lessons that can be learnt from the DASs domain and applied to the present context. The methods are from the perspective of service engineering, and these can be from requirements engineering and architecting phases of the software lifecycle (see Fig. 2). The four primary methods are:

  • requirements reflection [81,82,83,84]

  • architectural styles for runtime adaptation [85, 86]

  • digital evolution of behavioral models for autonomic systems [87,88,89,90,91]

  • evolutionary computation for DASs [92, 93].

DASs continuously monitor their environment and adapt behavior in response to changing environmental conditions [94]. In these systems, reconfiguration of software may need to be performed at runtime (e.g., software uploaded or removed) in order to handle new environmental conditions. Example domains that apply DASs include automotive systems, telecommunication systems, power grid management systems and ubiquitous systems.

The requirements reflection method supports runtime representation of requirements for DASs. Although there are several existing methods on requirements specification of DASs [94,95,96], the requirements reflection method supports the synchronization between requirements and architecture, from which the current study can learn and draw parallels to the notion of reflexivity introduced here for DSEs. Thus, it has been selected for comparison here. In the same manner, at the architectural level, the architectural styles for runtime adaptation method has comprehensive support for context-awareness modeling with its architectural styles for DASs.

Recently, there has been considerable interest within the software engineering research community (e.g., [87,88,89,90,91,92,93]) in applying evolutionary computation techniques to handle the threat of uncertainty [97] on adaptation capabilities of DASs. In [97], a taxonomy of potential sources of uncertainty from the DASs perspective has been presented with techniques for mitigating them. Evolutionary computation is a subfield of computer science which applies the basic principles of genetic evolution to problem-solving [91]. Digital evolution [98] is a branch or form of evolutionary computation. In digital evolution, self-replicating computer programs exist in a user-defined computational environment and are subject to mutations and natural selection. In this context, we analyze two primary methods: (1) digital evolution of behavioral models for autonomic systems and (2) evolutionary computation for DASs. Compared to other related methods in evolutionary computation, these methods support self-* properties and, more importantly, they have matured over several research papers.

See Fig. 6 and Table 4 for a comparison of these four primary methods against the framework.

Requirements reflection

In [81,82,83,84], the authors, following a top-down approach, introduce a method for requirements reflection, which means making requirements available as runtime objects. Requirements reflection is important as future software systems will be self-managing and these systems need to adapt continuously to changing environmental conditions. Requirements reflection can support such self-adaptive systems by making requirements first-class runtime entities, allowing software systems to reason about, understand, explain and modify requirements at runtime. It supports self-adaptation by using a runtime goal model and qualitative and quantitative reasoning about how the goal model’s organization changes over time.

Fig. 6 DAS-based methods and quality-driven software engineering methods compared against the framework

Bencomo [81] classifies uncertainty and adaptations that a self-adaptive system has to face as foreseen, foreseeable and unforeseen. Several research challenges on requirements engineering of self-adaptive systems have been identified, such as dealing with uncertainty, runtime representation of requirements, evolution of the requirements model and synchronization with the architecture, and dynamic generation of software [81,82,83,84]. In order to deal with uncertainty, they use and extend goal-oriented requirements modeling (context-awareness support) with the RELAX language [99, 100], which has been developed to support modeling and reasoning about uncertainty in design time and runtime models. Runtime representation of requirements has been achieved by providing language support for representing, navigating and manipulating instances of a metamodel for goal modeling such as the KAOS metamodel [101]. In order to facilitate requirements reflection and synchronization between the goals and the architecture, the authors propose a two-layer model, that is, a base layer that consists of runtime requirements objects and a metalayer that allows the dynamic manipulation of requirements objects. This results in two layers, one for requirements and one for architecture, each of which has a causally connected base layer and metalayer. For the dynamic generation of software, they recommend the use of generation and transformational techniques in software engineering.
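The base layer and metalayer for requirements can be illustrated with the following Python sketch, in which goals are ordinary runtime objects and a metalayer object navigates and manipulates them. The sketch is an assumption-laden illustration only: it does not use the KAOS metamodel or the RELAX language, and the `Goal` and `MetaLayer` classes, the goal names and the thresholds are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """Base layer: a requirement kept as a runtime object."""
    name: str
    threshold: float
    subgoals: list = field(default_factory=list)

    def satisfied(self, measurements):
        own = measurements.get(self.name, 0.0) >= self.threshold
        return own and all(g.satisfied(measurements) for g in self.subgoals)


class MetaLayer:
    """Metalayer: navigates and manipulates the goal objects at runtime."""

    def __init__(self, root):
        self.root = root

    def find(self, name, goal=None):
        """Depth-first navigation of the goal tree."""
        if goal is None:
            goal = self.root
        if goal.name == name:
            return goal
        for sub in goal.subgoals:
            found = self.find(name, sub)
            if found is not None:
                return found
        return None

    def set_threshold(self, name, new_threshold):
        """Manipulation: change a requirement while the system is running."""
        self.find(name).threshold = new_threshold


root = Goal("availability", 0.99, [Goal("throughput", 100.0)])
meta = MetaLayer(root)
measurements = {"availability": 0.995, "throughput": 80.0}

before = root.satisfied(measurements)   # throughput goal is not met
meta.set_threshold("throughput", 75.0)  # requirement adjusted at runtime
after = root.satisfied(measurements)
print(before, after)
```

Because the metalayer operates on the same objects the running system evaluates, a change made through it takes effect immediately, which is the essence of a causal connection between the two layers.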

The authors’ research has matured over many research papers. In [84], their method has been applied to synthesize emergent middleware to achieve interoperability in the context of the CONNECT project [102]. In emergent middleware, mediators are synthesized from runtime models, which provide support to reason about interoperability issues. However, the authors do not mention decentralized control or any scalability features of the architecture (see Table 4).

Architectural styles for runtime adaptation

Taylor, Medvidovic and Oreizy [85, 86] present a top-down method on architectural styles for runtime software evolution. Runtime software evolution or dynamic adaptation is the ability of a software system’s functionality to be changed during runtime without reloading or restarting the system [85]. Architectural styles are “named collections of architectural design decisions that (1) are applicable in a given development context, (2) constrain architectural design decisions that are specific to a particular system within that context, and (3) elicit beneficial qualities in each resulting system” [85]. The architectural styles considered in [85] are REST (representational state transfer), event-based, service-oriented and peer-to-peer, and these styles can be used to provide decentralized control of the architecture.

The main targeted self-* property is self-adaptation, while architectural styles can be used to provide a comprehensive level of context-awareness modeling. The authors assess a range of styles with respect to a four-element evaluation framework called BASE, introduced previously in [86]. The BASE framework provides means for evaluating, comparing and combining techniques for runtime adaptation. It can be applied to differentiate techniques based on the system model they operate on and on how they confront the four key aspects of runtime change, i.e., behavior, asynchrony, state and execution context. Architectural styles provide a technique for representing quality properties in architectural models and for supporting a quality-aware architecture modeling process. Architectural styles and patterns promote different quality attributes, and in [85], the quality attribute of dynamic adaptability has been supported.

The authors do not specify any details of validating their method [85]. Their work [85] also does not support the reflexivity, interoperability and scalability features of the framework. There are several other existing methods that leverage architectural styles to enable dynamic adaptation, such as the Rainbow framework [103] and Kramer and Magee’s layered reference architecture for self-adaptation [104], which includes mechanisms to swap out components and/or connectors at runtime.

Digital evolution of behavioral models for autonomic systems

By leveraging Darwinian evolution, in [87,88,89,90,91] the authors propose a software development methodology capable of producing self-* software. They investigate the application of digital evolution to the design of software that exhibits self-* properties. In their method, a population of computer programs exists in a user-defined computational environment and is subject to mutations and natural selection. Applying digital evolution in DASs provides means to produce economical software solutions that exhibit robustness, flexibility and adaptability.

The authors’ method has been applied to generate behavioral models that capture autonomic system behavior. Their model-driven engineering process for DASs follows a top-down approach using several phases, such as goals, requirements, design models and implementation. A digital evolution-based tool called Avida-MDE (Avida for model-driven development) has been developed for generating behavioral models, which satisfy requirements specified as scenarios and properties. The authors propose a development model with three stages: cultivation, evaluation and deployment [87]. The Avida-MDE tool extends the Avida digital evolution platform in three ways to support state diagram generation. They are: first, defining search space by providing instinctual knowledge, which is information available to an organism at birth; second, generating behavioral models using this instinctual knowledge; and third, evaluating an organism based upon how well its generated behavioral model satisfies the requirements using model checking tools. The authors highlight two scalability challenges and present how their method will scale when used with larger applications. The two challenges are, first, allowing organisms to evolve large and increasingly complex diagrams, and second, model checking of the diagrams to verify that the functional properties are satisfied [88]. These potential scalability challenges have been addressed through the model abstraction and incremental development features of their method [88].
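The generate-and-evaluate cycle described above can be sketched in a few lines of Python. This is not Avida-MDE and makes no claim about its internals: the states, events and scenarios are hypothetical, candidate behavioral models are reduced to plain transition tables, and fitness simply counts satisfied scenarios where the authors' tool relies on model checking.

```python
import random

STATES = ["idle", "moving", "avoiding"]          # hypothetical robot states
EVENTS = ["start", "obstacle", "clear", "stop"]  # hypothetical events

# Scenarios stand in for requirements: (state, event) -> expected next state.
SCENARIOS = {
    ("idle", "start"): "moving",
    ("moving", "obstacle"): "avoiding",
    ("avoiding", "clear"): "moving",
    ("moving", "stop"): "idle",
}


def random_model(rng: random.Random) -> dict:
    """Create a random transition table: (state, event) -> next state."""
    return {(s, e): rng.choice(STATES) for s in STATES for e in EVENTS}


def fitness(model: dict) -> int:
    """Count how many scenarios the candidate model satisfies."""
    return sum(model[key] == target for key, target in SCENARIOS.items())


def mutate(model: dict, rng: random.Random) -> dict:
    """Copy the model and reassign one randomly chosen transition."""
    child = dict(model)
    key = rng.choice(list(child))
    child[key] = rng.choice(STATES)
    return child


def evolve(generations: int = 200, size: int = 20, seed: int = 0) -> dict:
    rng = random.Random(seed)
    population = [random_model(rng) for _ in range(size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: size // 2]  # selection keeps the fitter half
        offspring = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + offspring
    return max(population, key=fitness)


best = evolve()
print(fitness(best), "of", len(SCENARIOS), "scenarios satisfied")
```

Because the fitter half of the population always survives, the best fitness never decreases from one generation to the next; how quickly all scenarios are met depends on the mutation scheme and the random seed.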

The method has been validated using two main case studies. First, it has been applied to generate behavioral models, describing the navigation behavior of an autonomous robot navigation system [87, 89, 91]. Second, in [90], the method has been validated by applying it to an adaptive flood warning system. Their work has matured in several research papers. However, decentralized control, context-awareness, reflexivity and interoperability issues have not been defined in their method (see Table 4).

Evolutionary computation for DASs

In a method related to the preceding one, in [92, 93] the authors describe a process and a suite of tools to support the development of DASs. Their top-down approach starts with requirements and moves through to designs that are reconfigurable at runtime. They bring the power of evolutionary computation into model-based development and runtime support of high-assurance DASs.

The authors have defined uncertainty that can arise in three different aspects of cyber-physical systems: physical environment, cyber environment and components themselves. The sources of uncertainty in these aspects can happen at runtime, design time and requirements, and the authors try to address uncertainty with three enabling technologies: model-based development, assurance and dynamic adaptation. They highlight several evolutionary computation methods, such as genetic algorithms, genetic programming, artificial life and digital evolution, and evolved artificial neural networks [92, 93]. Novel evolutionary algorithms have been harnessed at both design and runtime. For example, their method has been applied to evolve collective communication algorithms for a variety of distributed behaviors that include synchronization, quorum sensing, constructing networks, responding to attacks and reaching consensus [92, 93]. At design time, evolutionary algorithms have been used to better explore possible conditions requiring a DAS to self-reconfigure. At runtime they have been applied to generate safe adaptations, which are unanticipated at design time. The authors use OLYMPUS [92] for handling environmental uncertainty in a DAS where a goal model (context-awareness) is defined as a point of reference at runtime.

The authors have devised a case study to demonstrate how a goal model supports the modeling, monitoring and reconfiguring of an intelligent vehicle system (IVS), which needs to perform adaptive cruise control, keep its lane and avoid collisions [92]. A simulation has been built using the Webots simulation platform to demonstrate a version of the IVS application. Additionally, their work has been validated using experiments conducted in the context of robotics [93]. However, decentralized control, reflexivity, interoperability and scalability features are missing in their method [92, 93].

5.4 Quality-driven software engineering methods

In DSEs, service systems and applications are integrated solutions from several service providers. Therefore, in order to achieve the intended quality of a digital service, the quality goals of all the supporting services need to be satisfied too. Dealing with quality in an ecosystem is thus a complex process, which stresses the need for solid quality-driven software engineering methods.

Quality-driven software engineering methods (e.g., [105,106,107,108,109,110,111,112,113,114]) emphasize the importance of addressing quality attributes in the earliest possible phases of the software lifecycle, such as requirements engineering and architecture design. In Sect. 2.2.1, we defined quality and quality attributes and classified quality attributes as execution and evolution qualities. Quality attributes are gathered, categorized and documented as requirements that are at least as important as functional requirements. The gained knowledge is used in the requirements engineering and software architecture design phases. In [106], the authors describe five key industrial software architecture design methods and compare how they address quality requirements in architecture design. The compared methods are Attribute-Driven Design [107], Siemens’ 4 Views (S4V) [108], Rational Unified Process 4+1 Views [109], Business Architecture Process and Organization [110], and Architectural Separation of Concerns [111]. For this research area, our goal is not to provide an exhaustive analysis of the scientific literature on quality-driven software engineering methods, which is out of scope here. Instead, we analyze one key method, the quality-driven analysis process based on URDAD, which has been selected for comparison as it provides comprehensive support for quality attributes from a service engineering perspective (i.e., a service-oriented methodology used by requirements engineers). In addition, its classification of different stakeholders with quality requirements for both the process and the quality model can provide valuable insights on how quality can be similarly classified in an ecosystem-based method, which has many users.

Table 4 (see also Fig. 6) presents a comparison of this method against the framework.

Quality-driven analysis process based on URDAD

Solms et al. [105] propose a top-down, quality-driven analysis and design process based on URDAD (Use-Case, Responsibility-Driven Analysis and Design). URDAD is a service-oriented methodology used by requirements engineers to design services. This method provides comprehensive support for quality attributes, as the authors provide a set of quality requirements and drivers specified for both the quality model and the process. Quality drivers are activities that improve one or more process or model quality criteria [105]. URDAD is used to generate the computation-independent models of the model-driven architecture with sufficient detail that they can be used directly as platform-independent models. The authors have defined a domain-specific language for URDAD, called URDAD-DSL, which can be used to specify syntactically correct URDAD models. They identify the stakeholders and their quality requirements for both process and model quality. Then, for each quality criterion, a set of quality drivers is provided, and the authors demonstrate how these quality drivers are embedded within the URDAD methodology. The authors identify the main stakeholders for the model as requirements engineers, architects, developers, quality assurance staff, project managers and clients. The corresponding quality requirements for the requirements model are simplicity, completeness, modifiability, consistency, decoupling, cohesion, reusability and traceability. Meanwhile, the stakeholders defined for the process are project managers, requirements engineers and clients, and their quality requirements are low cost, repeatability, estimatability, trainable, measurability, consistency and isolation.

The authors validate the internal consistency of their method by using it to design a service-oriented analysis and design methodology, which generates its own URDAD metamodel and process. Their work does not target the AC initiative; thus, support for decentralized control, self-* properties, context-awareness and reflexivity is not applicable. Although a comprehensive set of quality requirements and drivers has been specified for both the quality model and the process, the quality attributes of evolvability, interoperability and scalability have not been mentioned in their method [105] (see Table 4).

6 Results of the survey

Given the characterization of the state of research, several observations can be made. In the following, these observations are summarized and discussed. In this process, we identify some open problems and insights for future work. Table 5 provides a summary of the results of this survey.

6.1 Main findings, open problems and future work

The comparison framework (Sect. 4) defined several characteristics which have been used to analyze and review the methods, i.e., top-down or bottom-up approach, decentralized control, self-* features, context-awareness, reflexivity, quality attributes (e.g., evolvability, interoperability, scalability) and method validation. We can identify dependencies between these different characteristics (see Fig. 7). This assists in structuring the scientific literature and in identifying the state of the art in the research areas with respect to the framework characteristics, that is, what has been achieved so far and what is missing. More importantly, it can serve as an effective starting point for defining a road map for service engineering of digital service ecosystems with AC capabilities.

Fig. 7 Dependencies between the framework characteristics

As discussed in Sect. 4, we have used the NIMSAD framework to examine a method from four different points of view sequentially: context, user, method content and evaluation. Here, we can identify a sequential dependency order between these categories (see Fig. 7). When we examine the individual characteristics within the method content category, top-down vs bottom-up approach and decentralized control precede all other characteristics, as these characteristics are reviewed from a higher level. Two characteristics are fundamentally important for reflexivity: self-awareness (self-* features) and context-awareness. Reflexivity is the capability of making intelligent decisions based on self-awareness (Sect. 4). We only consider quality attributes that are significant from the ecosystem viewpoint, such as evolvability, interoperability and scalability. Reflexivity can be identified as a technique that can be exploited to support evolvability of the DSE. Therefore, as shown in Fig. 7, the dependency between reflexivity and quality attributes can be described.
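The dependency order described in this paragraph can be written down as a small directed graph. The sketch below uses Python's standard `graphlib` module; the edges are our reading of the text above (not a transcription of Fig. 7), and the node labels are shortened for illustration.

```python
from graphlib import TopologicalSorter

# Each key depends on the characteristics in its value set.
# Assumption: the edges paraphrase the paragraph above, not Fig. 7 itself.
DEPENDS_ON = {
    "user": {"context"},
    "top-down vs bottom-up": {"user"},
    "decentralized control": {"user"},
    "self-* features": {"top-down vs bottom-up", "decentralized control"},
    "context-awareness": {"top-down vs bottom-up", "decentralized control"},
    "reflexivity": {"self-* features", "context-awareness"},
    "quality attributes": {"reflexivity"},
    "method validation": {"quality attributes"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Any order produced this way places reflexivity after both self-* features and context-awareness, and quality attributes after reflexivity, which mirrors the sequence in which the characteristics are discussed below.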

  • The goal (from context) and benefits (from user) characteristics are specific to a particular method. These are only used to describe the purpose and advantages of using a method, and not directly used to analyze the results of the primary methods.

  • In DSEs, both top-down and bottom-up approaches can be beneficial: a top-down approach can be used to engineer specific local functionalities, while a bottom-up approach can be adopted to engineer large-scale behaviors. Most of the analyzed primary methods (eight) are top-down approaches, which achieve specific functionalities or behavior through explicit design. Of the reviewed primary methods, evolving SOAs, SAPERE and BIONETS largely follow a bottom-up approach.

Sometimes the line between these two approaches is not clear and a method can incorporate techniques from both alternatives, and this has been exhibited by the CASCADAS method.

  • As mentioned in Sect. 4, adaptation logic can be applied in a decentralized, centralized or hybrid manner [37]. A method needs to define models and tools to support decentralized control so that both collective adaptation and adaptation by its subparts can be provided. In general, decentralized systems are bottom-up: the components contained in them interact locally according to simple rules, and the global behavior of the overall system emerges from these interactions. A hybrid system has both centralized and decentralized elements within it. As a result, both collective adaptation and adaptation by subparts are provided. We note that all the methods in the DSE and service ecosystem domains provide some level of decentralized control. To this end, some popular techniques that have been applied are MAPE loops (e.g., self-controlled components, autonomic SOA for DSEs), or techniques inspired by natural (e.g., SAPERE) or biological mechanisms (e.g., CASCADAS). However, except for the architectural styles for runtime adaptation method, decentralized control is largely missing in the primary methods reviewed from the DASs domain, where it has received far less attention than in the service ecosystem and DSE domains.

  • Self-configuration, self-healing, self-optimization and self-protection are the four self-* features considered fundamental for any autonomic system [42]. The BIONETS project supports all four of these characteristics, and self-adaptation and self-management have been the most supported self-* properties in the reviewed primary methods. The quality-driven analysis process based on URDAD does not target the AC initiative, so support for self-* properties is not applicable there (see Table 5).

An open issue and challenge when applying the AC initiative to DSEs is the open-loop structure of an ecosystem, which can create problems when incorporating autonomic software systems that have a closed-loop structure. Typically, an autonomic system is a closed-loop system: as the system changes continuously, it modifies itself at runtime using feedback. Closed-loop mechanisms allow software systems to adapt themselves to changing conditions, thus reducing the human effort in human-computer interaction. However, due to the open-loop structure of DSEs, there is a need for continuous human supervision, which is a challenge.
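The closed-loop behavior described above can be illustrated with a minimal monitor-analyze-plan-execute (MAPE) step in Python. The managed element, its toy latency model and the 50 ms target are hypothetical values chosen for illustration; none of them comes from the surveyed methods.

```python
from dataclasses import dataclass


@dataclass
class ManagedService:
    """Hypothetical managed element: a service with a replica count."""
    replicas: int = 1
    load: float = 0.0  # requests per second arriving at the service

    def latency_ms(self) -> float:
        # Toy performance model: latency grows with load per replica.
        return 10.0 + self.load / self.replicas


def mape_step(service: ManagedService, target_ms: float = 50.0) -> str:
    observed = service.latency_ms()               # Monitor
    violated = observed > target_ms               # Analyze
    action = "scale_out" if violated else "none"  # Plan
    if action == "scale_out":                     # Execute
        service.replicas += 1
    return action


service = ManagedService(replicas=1, load=200.0)
actions = []
for _ in range(10):  # bounded loop; feedback drives each iteration
    actions.append(mape_step(service))
    if actions[-1] == "none":
        break
```

Each iteration feeds the effect of the previous action back into the next observation, so no human intervention is needed once the target is set. An open-loop ecosystem lacks this feedback path, which is why the text above notes the need for continuous human supervision.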

  • The need for context-awareness is significant in complex adaptive systems such as DSEs [3]. More comprehensive levels of awareness than those of traditional context-aware computing models are needed to support and drive autonomous adaptation activities. Given the prominent place of context-awareness in complex adaptive systems, more effort needs to be directed at designing and developing these models in DSEs, where context-awareness is virtually absent from the literature except for the autonomic SOA for DSEs method, which defines a basic UML metamodel for adaptation handling. In contrast, support for context-awareness has received more attention in the service ecosystems domain. To this end, the CASCADAS method defines a comprehensive context-awareness model called a self-model, which is defined as a set of extended finite state machines. In addition, the SAPERE method defines a variety of adaptive and self-organizing patterns to provision distributed pervasive services. However, that work targets the middleware infrastructures of pervasive service ecosystems.

  • We see opportunity and potential in combining the notions of self-awareness and context-awareness to obtain a further understanding of the situation, in the context of reflexivity as introduced for DSEs in the current study. In Sect. 4, we introduced the notion of reflexivity and compared it with existing reflection techniques. Reflexivity is an important characteristic of an autonomic self-managed system, which means that the system must have knowledge of its components to make intelligent decisions based on self-awareness [64]. For this, two characteristics are fundamental: self-awareness and context-awareness. One of the major deficiencies of the surveyed methods is that they do not commit to supporting reflexivity, except for one work at the requirements level, namely requirements reflection. However, in that work reflexivity has been applied to synchronize the requirements model with the architecture, while in DSEs it needs to be applied at the larger ecosystem level to support the evolution of the ecosystem.

  • With respect to support for quality attributes, the method—quality-driven analysis process based on URDAD—provides comprehensive support from a service engineering perspective where the authors describe a set of quality requirements and drivers specified for both quality model and process. The different stakeholders identified there can provide valuable insights on how quality can be similarly classified in an ecosystem-based method such as in the DSE domain which has many users.

Table 5 Results summary

Quality attributes can be categorized as execution and evolution quality attributes (Sect. 2.2.1). One challenge is to extend this characterization of quality criteria and properties with metrics to measure them in the DSE domain. Some of these quality attributes may be more easily guaranteed at runtime than at design time. In [115], the authors discuss several adaptation properties defined as assurance criteria on the adaptation process, and map them to quality attributes measurable at runtime for both the target system and the adaptation mechanism. However, that work targets self-adaptive systems and not the DSE domain. In this survey, we have focused on quality attributes that are important from the ecosystem viewpoint of service engineering of digital services, such as evolvability, interoperability and scalability.

  • An ecosystem is dynamic and evolves all the time as new members, services and value networks emerge [3]. Therefore, service engineering methods need to support the evolution of the ecosystem. However, except for evolving SOAs, CASCADAS, BIONETS and SAPERE, the methods analyzed here do not support evolution from the ecosystem’s perspective but from other viewpoints (e.g., the individual system level). For example, the evolving SOAs method introduces an ecosystem-oriented architecture of digital ecosystems, in which evolutionary (genetic) algorithms are used to combine appropriate agents so that user requests for applications are satisfied. In CASCADAS, evolvability of the ecosystem is programmed by the self-model of the ACEs. Meanwhile, in BIONETS, evolution builds on the notion of self-organization and has been considered at both the single-component level and the global ecosystem level. However, a method that explores and combines reflexivity as a technique to support evolution of the ecosystem is completely missing in the literature. In the DASs domain, the primary methods do not support evolution from an ecosystem perspective; thus, this has been shown as partial satisfaction of that characteristic (see Table 5).

  • One noteworthy quality attribute that is almost completely absent in the analyzed primary methods is interoperability, which can be of two types—service interoperability and pragmatic interoperability (see Sect. 4). In DSEs, proper service engineering techniques are required to develop digital services that are interoperable, available and easily consumed by considering specific capabilities of the ecosystem [3]. In requirements reflection, the authors have applied their method to dynamically synthesize emergent middleware that ensures interoperation between heterogeneous networked systems. However, in that method, interoperability has been considered from the network systems viewpoint (partial satisfaction of criteria, Table 5). Meanwhile, pragmatic interoperability has not been considered in any of the analyzed primary methods.

  • A component model for a DSE can potentially cover a very large range of target scenarios. As a result, it must promote scalability of both design and execution complexity, through software engineering scalability and performance scalability, respectively. The evolving SOAs method follows a push-oriented approach to support scalable architectures to meet user requests for applications. In BIONETS, scalability is achieved through an autonomic and localized, peer-to-peer service-driven communication paradigm. In SAPERE, the authors have highlighted scalability as one of the main challenges of data storage and analysis in pervasive and mobile computing. However, it is not clear how it has been supported in the implemented middleware infrastructure [15]. While software engineering scalability has been addressed in some methods as mentioned above, performance scalability is absent in the analyzed primary methods.

  • With respect to method validation, in most cases, the validation of the method is empirically based. One common aspect of all the primary methods is that they do not present well-designed quantitative or qualitative evaluations, but mainly focus on their own experience in the use of ad hoc methods or informal case studies. In this context, one of the main shortcomings and challenges we note is the lack of actual industrial case studies and scenarios on ecosystem-based digital services. The case studies need to exhibit situations with frequently evolving requirements, the dynamic nature of the ecosystem, and digital services developed by several partners. An example of an industrial case study is the ICARE project [3, 46] mentioned in Sect. 2.2. Yet, scenarios that identify the aforementioned complex situations in DSEs are completely missing in the scientific literature. Also, there is a lack of publications covering some of the reviewed methods (e.g., self-reconfiguration for service ecosystems), and as a result, the applicability of those methods could not be clearly established.

In this manner, first identifying the dependencies between the framework characteristics and then analyzing what has been achieved so far and what is missing on the framework characteristics can provide an effective starting point for defining a road map for service engineering digital service ecosystems with AC capabilities.

As shown in Table 5, none of the analyzed primary methods entirely fulfills the requirements defined in the comparison framework. The goal is to compare the primary methods and not the projects. There are three main reasons for a method not fulfilling a particular characteristic of the comparison framework: (1) the application domain: ecosystem-based service engineering is a natural progression of networking and pervasive computing, and therefore, the primary methods SAPERE, CASCADAS and BIONETS cover most of the characteristics of the framework; (2) the purpose of the method: it is more focused and naturally covers only one part of the framework topics, so the intended usage context of the method is narrower than the comparison framework; and (3) the timing: newer publications are more relevant and consider the changes of the ecosystems, at least partially. According to the framework, the most suitable methods are CASCADAS and requirements reflection. CASCADAS satisfies all the requirements of the framework except reflexivity and interoperability, and requirements reflection specifically supports reflexivity. In order to advance the findings of this survey, the use of solid modeling languages, techniques, tools and practices is highly desirable. Toward this end, models@runtime or runtime models [115] can be a key technique to explore for addressing the shortcomings of the existing primary methods. The requirements reflection method has explored models@runtime to support synchronization between goal-based requirements and the architecture (evolution) and to dynamically generate software artifacts at execution time. In the context of DSEs, future studies are needed to investigate how models@runtime could be employed to mitigate uncertainty through runtime adaptation and evolution, and to provision digital services with autonomic capabilities.

Models@runtime are up-to-date abstractions of the running system [115]. They constitute a core concept for enabling adaptation and tackling uncertainty by reflecting the system and its context at runtime. Runtime models provide reflective capability as they are causally connected to the system being modeled. They can be utilized for two purposes: (1) supporting reasoning about uncertainty and leveraging self-adaptation and (2) supporting the generation of software artifacts themselves, using model-driven engineering at execution time [84]. Both runtime adaptation and evolution are necessary to address the frequent changes imposed by DSEs. Runtime adaptation and runtime evolution are two interwoven activities that influence each other [86]. Evolution can be understood as a longer sequence of modifications to a software system over its lifetime [116]. On the other hand, adaptations can be seen as modifications of the software system performed in an automated way.
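The causal connection mentioned above means that changes propagate in both directions between the model and the running system. The following Python sketch shows one simple way to obtain this with callbacks; `RunningSystem`, `RuntimeModel` and the `pool_size` setting are hypothetical names used only for illustration, not part of any surveyed method.

```python
from typing import Callable


class RunningSystem:
    """Hypothetical running system with a single tunable setting."""

    def __init__(self, pool_size: int = 4) -> None:
        self.pool_size = pool_size
        self.listeners: list[Callable[[int], None]] = []

    def set_pool_size(self, value: int) -> None:
        self.pool_size = value
        for notify in self.listeners:  # tell every observer about the change
            notify(value)


class RuntimeModel:
    """Abstraction of the system, kept in sync in both directions."""

    def __init__(self, system: RunningSystem) -> None:
        self._system = system
        self.pool_size = system.pool_size
        system.listeners.append(self._on_system_change)

    def _on_system_change(self, value: int) -> None:
        # System -> model: the model reflects the observed change.
        self.pool_size = value

    def update(self, value: int) -> None:
        # Model -> system: a change made on the model is pushed to the
        # system, which in turn notifies the model via its listener.
        self._system.set_pool_size(value)


system = RunningSystem()
model = RuntimeModel(system)

system.set_pool_size(8)   # change originates in the system
seen_by_model = model.pool_size

model.update(2)           # change originates in the model
seen_by_system = system.pool_size
```

Reasoning and adaptation logic can then operate on the model alone, relying on the connection to keep the system and its abstraction consistent.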

7 Conclusions

This survey article presented a review of the scientific literature on AC methods in DSEs. Based on systematic queries in four leading scientific databases and Google Scholar, 349 articles were analyzed, out of which 12 primary methods were selected to be most relevant to our study from a service engineering perspective, which were then clustered, succinctly described and compared.

A comparison framework was defined as a contribution, which can be used as a guide for comparing the different scientific methods selected. To this end, the framework proposed several characteristics, such as top-down vs. bottom-up approach, decentralized control, self-* features, context-awareness, reflexivity, quality attributes (e.g., evolvability, interoperability, scalability) and method validation. The main contribution of this article is the comparison of the primary methods selected from the four thematic research areas. The comparison process using the framework was straightforward. The framework is a valuable tool for searching for an applicable method for service engineering of digital service ecosystems with AC capabilities. It is evident from the comparison that none of the analyzed methods supports all the requirements of the framework. Furthermore, looking at the state of the art, none of the methods addresses the service engineering of DSEs in a generic and adaptive way; in particular, an ecosystem-based method for applying the AC initiative is missing in the DSE domain. This survey also introduced a technique called reflexivity for DSEs, which can be further explored to address the uncertainty and frequent changes imposed by DSEs.

We note two main perspectives for future research: (1) investigate how models@runtime can be employed as a technique to mitigate uncertainty through runtime adaptation and evolution in DSEs, and to provision digital services with autonomic capabilities, where the reflective capability of runtime models can be explored to address the uncertainty and frequent changes imposed by DSEs at different lifecycle phases such as requirements, architecture design and runtime; and (2) develop real industrial case studies and scenarios that exhibit evolving requirements in DSEs, the dynamic nature of DSEs and digital services developed by ecosystem members.