Introduction

DevOps has emerged as a practical approach to software development in the context of agile [1, 2]. DevOps enables automation, continuous integration, monitoring, and team collaboration [3, 4] to support the fast deployment and delivery of quality software [5]. The Internet of Things (IoT) is a digital technology that connects a large number of physical devices or things as virtual objects over an established network [6]. There is increasing interest in IoT applications within the overall context of a digital ecosystem that involves several heterogeneous devices and protocols [7]. There is also increasing interest among organizations in adopting DevOps for IoT application deployment to multi-cloud [8, 9]. Multi-layer cloud and DevOps could prove useful for IoT applications that require frequent updates and real-time interactions with IoT devices [10, 11]. Hence, the question arises: how can IoT applications be developed, deployed, and governed on multi-cloud within the parameters of an organization's existing ecosystem? To address this complex problem, this research presents the DevOps reference architecture (DRA) [8].

The DRA architectural design is founded on five models: (1) contextual, (2) conceptual, (3) logical, (4) physical, and (5) operational. The DRA was constructed using the guidelines of the design science research (DSR) method [12]. The adopted DSR [12] has six steps [8]; this paper explains the empirical evaluation (step 5) of the DRA and discusses the DSR outcome (step 6) to determine the DRA's contribution to both research and practice.

The main scope of this paper is limited to the DRA architecture design and empirical evaluation of the new DRA framework [8]. The empirical evaluation involves two main steps: (1) industry case study, (2) industry field survey. A case study template (CST) was developed, tested and then offered to the case study participant along with the instructions and guidelines for the testing of the DRA design. The survey was offered online to industry practitioners and experts from the software engineering community. The survey participants came from a cohort of international and local organizations.

There are two types of collected data in the empirical evaluation: (1) qualitative (case study and survey); (2) quantitative (survey). The qualitative data were examined to ascertain the correlation between the DRA design models and a list of validation criteria. The validation criteria used in this research are explicitly applicable to the evaluation of DSR artefact outcomes [13, 14]. The quantitative data collected in the field survey were analyzed using well-known statistical analysis techniques (frequency, percentage, and χ2 p value). The quantitative results are plotted as bar graphs and tables that provide a visual presentation of the survey results.

This paper is organized as follows. First, the article outlines the research problem and discusses related work. Second, the research presents the DSR method. Third, the paper includes an overview of the updated DRA. Fourth, the study discusses the empirical evaluation results of the DRA. Finally, the document discusses the framework's applicability based on the evaluation results and concludes the article with future scope directions.

The DRA empirical evaluation and the framework's applicability are the main contributions of this research. The results of the empirical evaluation aim to deliver sufficient proof that the DRA addresses the research problem and provides an effective solution for researchers and practitioners to automate the deployment of IoT applications to multi-cloud.

Literature Review and Related Work

DevOps aims to improve collaboration and communication between Development and Operations [3]. DevOps provides a set of practices [15] to enable the automation of application deployment for timely release and delivery [16, 17]. DevOps offers broader support for the deployment of applications to the cloud [5, 18, 19] using a wide range of tools that enable automation and continuous integration (CI) [20].

There is growing interest among organizations in adopting DevOps for IoT [6, 21]. IoT, supported by cloud computing [22], aims to achieve interoperability and fast data exchange [23, 24]. Cloud offers PaaS (Platform as a Service) as a virtual platform for IoT application developers. It also provides a back-end solution to manage the vast data streams of IoT application data using Infrastructure as a Service (IaaS) and Software as a Service (SaaS) [25]. The complex contexts of DevOps and IoT present both opportunities and challenges for cloud computing [26, 27].

The value of IoT for enterprises resides in the fast deployment of IoT applications by developers [28] and in their effective, seamless integration with other systems such as the cloud [29]. Models such as SysADL are explicitly designed to preserve a system-oriented perspective using an ADL based on SysML [30]. IoT applications tend to be data-driven [31, 32]. The performance of IoT applications is assessed by measuring the latencies of interactions over communication protocols (e.g., MQTT, RSS, SSH, Wi-Fi, and mobile [33]) and by handling the increasing volume of data [34, 35].

Cloud computing enables ubiquitous, on-demand shared resources (cloud APIs, configurations, and services) [36] that assist with IoT application deployment and auto-scaling [25, 37]. Cloud computing can provide advanced resources to assist with the design of conceptual models to improve the auto-scaling trends of software applications [38, 39]. The interplay and cooperation between the fog (edges supporting IoT devices) and the core (cloud) can be characterized by the integration between code and devices [29, 40]. Multiple clouds, or multi-cloud, is the integration of multiple cloud services in a single heterogeneous architecture. Organizations, developers, and researchers can benefit from open-source cloud platforms because they encourage the use of multi-cloud through broader user access, flexibility, availability, and a high quality of service (QoS) for the same or different applications deployed to multi-cloud [41, 42]. However, cloud applications are often hardwired to their native cloud API [43].

The major obstacle for adopting a multi-cloud strategy is vendor lock-in [43, 44]. Vendor lock-in may occur in two cases: when a cloud from the multi-cloud cohort hosts the deployment configuration and when a cloud from the multi-cloud cohort hosts the database. Several studies and frameworks have introduced innovative ideas to achieve heterogeneous architecture for continuous deployment to the multi-cloud [45]. For instance, CYCLONE [46] is a software stack that focuses on the areas of application deployment, management, and security and user authentication on the multi-cloud. Another model, CloudMF [23], is an object-oriented domain-specific model tailored for IoT applications. The deployment process to the multi-cloud can follow specific migration patterns [47], such as multi-cloud refactoring and multi-cloud rebinding. The dynamic Data-Driven Cloud and Edge Systems (D3CES) approach enables real-time IoT data collection and provides feedback that promotes effective decision-making to deploy IoT applications to the cloud [47, 48]. IoT can benefit from multi-cloud services and techniques that enable portability and interoperability [49].

The literature review and related work draw our attention to further research in the possible integration of DevOps, multi-cloud, and IoT applications [50, 51]. DevOps adoption for IoT application deployment to multi-cloud requires concrete architecture and guidelines [19, 52]. This need highlights the following research problem and challenges:

  • Automate IoT application deployment to multi-cloud.

  • Manage connectivity between the IoT application and IoT sensors.

  • Avoid vendor lock-in for IoT application deployment to the multi-cloud.

The new proposed DRA framework addresses the research problem. This paper presents the updated DRA version and its empirical evaluation.

Research Method

This research adopts a well-known DSR method [12, 53], which is a system of principles, practices, and procedures pertaining to a specific branch of knowledge, used to produce and present high-quality research artefacts. The DSR aims to provide verifiable contributions through the design, development, and evaluation of the proposed DRA artefact. The artefact development may involve the review of existing theories and knowledge to develop a solution or artefact for the intended purpose and audiences. The DSR is composed of three primary stages (Fig. 1):

  • Stage 1—DSR main flows. The DSR is composed of two flows:

  • Stage 2—DSR process steps. The DSR process in this research is composed of six steps:

    • Problem identification: Initial research into the background and related work helped identify the research problem, its gaps, and its underlying objectives (see “Literature Review and Related Work”).

    • Analysis: The related work and background research provided rich information about DevOps, multi-cloud and IoT.

    • Design: A general design model is created for the new DRA founded on DevOps concepts, and cloud infrastructure and services.

    • Development: DRA architectural model is developed based on the DevOps concepts and cloud services. The architectural model is not fixed and can be applied to numerous instances in multiple contexts.

    • Evaluation: The DRA is evaluated using the DSR evaluation criteria [13, 14] (Table 2). The evaluation involves an industry case study and a field survey. The evaluation results and the updated DRA are the main contributions of this paper.

    • Outcome: The updated DRA and its contribution to the SE body of knowledge.

  • Stage 3—DSR outcomes. The DSR process in this research produces six outputs:

Fig. 1 DSR method

DSR Evaluation Method

The DRA has been evaluated using an empirical evaluation approach. The empirical evaluation includes an industry case study and a survey. The case study consists of five steps: design, prepare, collect, analyze, and report. The industry survey consists of five steps: plan, prepare, develop, deliver, and report. The empirical evaluation process is presented in Fig. 2 (based on [55, 56]), whereas the evaluation criteria are shown in Table 2 (based on [13, 14]).

Fig. 2 DRA empirical evaluation

The DRA empirical evaluation overview (Fig. 2) shows the evaluation steps (case study and survey). The evaluation data are analyzed to determine the applicability of the DRA.

Case Study Design

The case study approach is commonly used for the evaluation of software engineering research artefacts [55], such as the DRA in this research. In software engineering, an artefact could be an architecture, method, process model, or software tool [56]. This research uses the interpretive case study approach [55], which includes the following steps.

  1. Case study planning: Plan the case study and identify the objectives.

  2. Preparation for data collection: Prepare the data collection method used in the case study.

  3. Collecting data: Present the case study implementation and data collection method.

  4. Data analysis: Analyze the qualitative data collected using the CST and compare the participants’ feedback to the evaluation criteria (Table 2).

  5. Reporting: The report summarizes the outcomes of the case study. The report table includes the testing steps of the case study and the description of each step. The report provides evidence of the DRA applicability in the real context of the case study organization.

Survey Design

The survey utilized in this research follows a commonly used survey structure [57]:

  1. Plan: Outline the survey objectives (purpose, need, knowledge requirements).

  2. Prepare: Identify the target participants (ethical considerations are required).

  3. Develop the questionnaire: Industry survey questionnaires are defined by the researcher using artefact evaluation criteria [13].

  4. Collect: Present the survey collection method.

  5. Analyze the data: The survey evaluation is composed of two main steps: Survey Quantitative Evaluation and Survey Qualitative Evaluation.

The survey uses the ratings described in Table 1. The ratings transform the participants’ responses to every question into numerical data to be used for the statistical formulas in the quantitative evaluation process. The qualitative ratings in Table 1 are explained as follows:

  1. Strongly agree: The participants consider the question claim very satisfactory.

  2. Agree: The participants agreed with the statement.

  3. Average: The participants somewhat agreed with the statement.

  4. Disagree: The participants disagreed with the statement.

  5. Strongly disagree: The participants strongly disagreed with the statement.

Table 1 Rating table
Table 2 Evaluation criteria

Statistical Analysis Method

This research used a statistical method to analyze the survey data. A statistical approach is well suited to drawing out essential insights from the survey data. According to Hyndman [57], ‘Statistics is the study of making sense of data.’ The statistical formulas used to analyze the numerical survey data are explained in (1), (2), and (3).

Equation (1) describes the chi-squared test, which calculates the probability (p value, 0 ≤ p ≤ 1) and compares it to a critical value α = 0.01. If the p value < α, then H0 is rejected and H1 is accepted.

$$\chi^{2} = \sum \frac{(O - E)^{2}}{E} \quad (O = \text{observed frequency},\ E = \text{expected value}), \qquad p\ \text{value} < 0.01,$$
(1)

where $E = \sum O / N$ ($O$ = frequency and $N$ = total number of observations).

The p value determines if the null hypothesis H0 is accepted or rejected based on a critical value α = 0.01.

If p value < α, then H0 is rejected and H1 is accepted, and there is a positive association between the test variables (DRA models) and the evaluation criteria (see Table 2).

(If the computed p value is smaller than 0.001, it is reported as p < 0.001.)

H0 (null hypothesis): there is no association between the test variables and the evaluation criteria.

H1 (alternative hypothesis): test variables and the evaluation criteria are positively associated.

Equation (2) describes the sum of the frequencies of participants scoring three or above on the rating scale (Table 1).

$$\text{AAF} = \sum \text{Frequency}\left( \text{rating} \ge 3 \right)$$
(2)

AAF is the sum of all participants’ responses rated Average (3), Agree (4), or Strongly Agree (5).

Equation (3) describes the sum of the percentages of participants scoring three or above on the rating scale (Table 1).

$$\text{AAP} = \sum \text{Percentage}\left( \text{rating} \ge 3 \right)$$
(3)

AAP is the sum of all percentages of responses rated Average (3), Agree (4), or Strongly Agree (5).
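To make the statistical procedure concrete, the following Python sketch applies Eqs. (1)–(3) to a small set of hypothetical ratings on the scale of Table 1; the example responses are invented for illustration and are not the survey data.

```python
from collections import Counter
from scipy.stats import chisquare

# Hypothetical ratings on the scale of Table 1 (1 = strongly disagree ... 5 = strongly agree).
ratings = [5, 4, 4, 3, 5, 4, 2, 5, 3, 4, 5, 4, 3, 4, 5, 1, 4, 5, 4, 3]

freq = Counter(ratings)        # observed frequency O of each rating
n = len(ratings)               # total number of observations N

# Eq. (2): AAF = sum of frequencies for ratings >= 3.
aaf = sum(count for rating, count in freq.items() if rating >= 3)

# Eq. (3): AAP = percentage of responses rated >= 3.
aap = 100.0 * aaf / n

# Eq. (1): chi-squared goodness of fit; expected counts default to E = N / 5 per category.
observed = [freq.get(r, 0) for r in range(1, 6)]
chi2, p_value = chisquare(observed)

alpha = 0.01
decision = "reject H0 (accept H1)" if p_value < alpha else "fail to reject H0"
print(f"AAF = {aaf}/{n}, AAP = {aap:.2f}%, chi2 = {chi2:.2f}, p = {p_value:.4f} -> {decision}")
```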

Evaluation Criteria

The purpose of the empirical evaluation is to evaluate the DRA models. The evaluation method used in this research comprises two phases:

  • The relationship of the survey questionnaire sets to the evaluation criteria.

  • The relationship of the case study participants’ feedback to the evaluation criteria.

The evaluation criteria elements are selected based on models used in related work [13, 14]. The survey questionnaire sets are developed to assess the DRA models against the chosen criteria (Table 2). Feedback from the participants in the survey and the case study was cross-examined to determine its relationship with the evaluation criteria (Table 2).

The DRA Overview

The DRA is founded on the concepts and practices of DevOps [2, 3]. The DRA uses the multi-cloud ecosystem to support the deployment of IoT applications [21, 58] and enables their automated deployment to multi-cloud [8]. The initial work-in-progress version of the DRA was published [8] to obtain early feedback from the research community before commencing this empirical study. This paper presents the updated version of the DRA with the empirical evaluation results. The updated DRA architecture design is composed of five models [8]: contextual (model 1), conceptual (model 2), logical (model 3), physical (model 4), and operational (model 5) (Figs. 3, 4).

Fig. 3 DRA contextual model

Fig. 4 DRAv2.0 instance (based on Bou Ghantous & Gill 2018)

DRA Contextual Model

The DRA contextual model (model 1, Fig. 3) describes the relationship between DevOps, Multi-Cloud, and IoT at a high level. The DRA contextual model is founded on DevOps concepts [3]. Multi-Cloud offers the DRA broader user access to virtual servers and a vast array of services [8]. The DevOps approach and Multi-Cloud technologies aim to facilitate and support IoT application deployment [5, 10, 18, 21].

DRA Conceptual Model

The DRA conceptual model (model 2, Fig. 4) expands in detail the high-level contextual model components (DevOps, Multi-Cloud, and IoT) [8]. The DRA context system enables:

  • Automation of agile software development.

  • Automation of the application deployment process using the CI-broker (continuous integration broker).

  • Automated, faster application delivery.

  • Automated and integrated testing of the application.

  • Enhanced team experience and improved collaboration and communication.

  • Real-time monitoring of the application.

The CI broker component was introduced in the DRA to address the research gap (“Literature Review and Related Work”, e.g., multi-cloud deployment, connectivity, and vendor lock-in). With the research problem in mind, it was essential to devise an approach to deploy IoT applications to the multi-cloud and avoid vendor lock-in, which occurs when a cloud vendor hosts the deployment configurations or the database. The CI broker is an essential and novel part of the DRA conceptual model. For instance, the CI broker enables automation (build, testing, logging, deployment), CI, branching development, and automated code synchronization. Most importantly, it hosts the deployment configurations for the IoT application independently of any cloud technology and vendor. The CI-Broker also seems to address the issue of a single or fixed cloud DevOps environment and aims to enable the adaptability, integration, and interoperability of the DevOps approach for supporting multi-cloud IoT application deployment. The CI broker packages the IoT application in a container and deploys it to the multi-cloud platforms. Thus, the DRA, enabled by the CI broker, is generic and can be used in different contexts.
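As a purely illustrative sketch of the CI-broker idea, the following Python script keeps the cloud-agnostic deployment configuration in the broker itself and pushes the same container image to several clouds; the image name, cluster contexts, and helper function are hypothetical assumptions and do not represent the DRA tool set.

```python
import subprocess

# Hypothetical cloud-agnostic deployment configuration held by the CI broker,
# rather than by any single cloud vendor (avoids configuration lock-in).
IMAGE = "registry.example.com/iot-app:1.0"             # placeholder image name
TARGETS = [
    {"name": "cloud-a", "context": "cloud-a-cluster"},  # placeholder kubectl contexts
    {"name": "cloud-b", "context": "cloud-b-cluster"},
]

def run(cmd):
    """Run a shell command and fail fast; stands in for the broker's job runner."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def build_and_test():
    # CI stage: build the IoT application into a container, run its tests, publish the image.
    run(["docker", "build", "-t", IMAGE, "."])
    run(["docker", "run", "--rm", IMAGE, "pytest", "-q"])
    run(["docker", "push", IMAGE])

def deploy_to_multi_cloud():
    # CD stage: roll the same image out to every configured cloud.
    for target in TARGETS:
        run(["kubectl", "--context", target["context"],
             "set", "image", "deployment/iot-app", f"iot-app={IMAGE}"])
        print(f"deployed {IMAGE} to {target['name']}")

if __name__ == "__main__":
    build_and_test()
    deploy_to_multi_cloud()
```

Because the configuration above lives with the broker rather than inside any one cloud, swapping or adding a deployment target does not require changes to the application or to the other clouds.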

DRA Logical Model

The DRA logical model (model 3) further unpacks the conceptual model and is organized into components (M1–M5). The logical model architecture uses DevOps and cloud to create a functional model. The logical model illustration shows how DevOps practices [3] are transformed into features and functions to support the IoT application deployment to multi-cloud (Fig. 4).

DRA Physical Model

The DRA physical model (model 4) (Fig. 4) is an implementation of the DRA logical model. The DRA physical model creates a tangible design based on the logical components (M1–M5) and can be subdivided into three tiers:

  1. DevOps Team tier:

     • M1 enables team communication and real-time notification. M1 receives build/test logs from M2 and deployment and performance logs from M4.

  2. Cloud tier—composed of:

     • M2 is the CI-Broker that handles the build, testing, and deployment of the application.

     • M3 is the deployment cloud(s) platform.

     • M4 is the monitoring and tracking platform.

     • M5 is the data management cloud. Applications exchange NoSQL or SQL data with M5. The database using the M5 model is managed separately from the deployment cloud to avoid vendor lock-in.

  3. User tier—represents the user devices operating the application and exchanging data with M5.

The DRA Physical Model can be instantiated to create an end-to-end deployment pipeline (Operational Model) using numerous technology stacks and DevOps tools.
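As one hypothetical instantiation of the physical model, the sketch below maps M1–M5 to a possible tool set; the specific tools named here are illustrative assumptions and are not prescribed by the DRA.

```python
# One hypothetical instantiation of the DRA physical model (M1-M5).
# The tool choices below are illustrative assumptions, not part of the DRA itself.
DRA_INSTANCE = {
    "M1": {"tier": "DevOps Team", "role": "collaboration and real-time notification", "tool": "Slack"},
    "M2": {"tier": "Cloud", "role": "CI-Broker: build, test, deploy", "tool": "Jenkins"},
    "M3": {"tier": "Cloud", "role": "multi-cloud deployment platforms", "tool": ["AWS", "Azure"]},
    "M4": {"tier": "Cloud", "role": "monitoring and tracking", "tool": "Grafana"},
    "M5": {"tier": "Cloud", "role": "data management (NoSQL), kept separate from the deployment clouds", "tool": "MongoDB Atlas"},
}

# Any other stack covering the same responsibilities would be an equally valid instance.
for component, spec in DRA_INSTANCE.items():
    print(f"{component} ({spec['tier']} tier): {spec['role']} -> {spec['tool']}")
```

Swapping any entry for an equivalent tool yields another valid operational instance, which is what makes the physical model reusable across technology stacks.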

DRA Operational Model

The operational model (model 5—live instance) of the DRA is based on the physical model. The DRA operational model pipeline instance (Fig. 4) is configured using an integrated set of DevOps tools [3, 8]. The DRA pipeline provides automated IoT-application deployment to multi-cloud [8].

The operational aspects (M1–M5 components) of the DRA pipeline are explained below:

  1. The IoT-app code is pushed from M1 to M2.

  2. M2 (CI-Broker) deploys the IoT application to M3.

  3. M3 is the deployment platform of the DRA (multi-cloud).

  4. M4 monitors the build, test, and deployment logs.

  5. M5 stores the IoT application NoSQL data and provides central management.

  6. M1 notifies the DevOps team with the logs from M4.
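To make the control and data flow of steps 1–6 concrete, the following Python stub walks through the pipeline; every function is a placeholder standing in for the tool that realizes the corresponding component, not an implementation of it.

```python
# Stub walk-through of the DRA operational pipeline (steps 1-6 above).
# Each function is a placeholder for the DevOps tool realizing that component.

def push_code_m1_to_m2(commit):
    print(f"M1 -> M2: pushed commit {commit} to the CI-Broker")

def broker_build_and_deploy_m2_m3(clouds):
    print("M2: building, testing, and packaging the IoT application as a container")
    for cloud in clouds:                             # M3: the multi-cloud deployment targets
        print(f"M2 -> M3: deployed the container to {cloud}")

def monitor_m4():
    logs = ["build: ok", "test: ok", "deploy: ok"]   # placeholder log entries
    print("M4: collected build, test, and deployment logs")
    return logs

def store_data_m5(payload):
    print(f"M5: stored IoT application data {payload} in the central NoSQL store")

def notify_team_m1(logs):
    print("M4 -> M1: notified the DevOps team with", logs)

if __name__ == "__main__":
    push_code_m1_to_m2("a1b2c3")                            # step 1
    broker_build_and_deploy_m2_m3(["cloud-a", "cloud-b"])   # steps 2-3
    logs = monitor_m4()                                     # step 4
    store_data_m5({"sensor": "temp-01", "value": 22.5})     # step 5
    notify_team_m1(logs)                                    # step 6
```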

Figure 4 is composed of four images representing the DRAv2.0 models (model 2 to model 5). Figure 4 is split into separate images to improve the resolution and the view of the relationships between the DRA design models (model 2 to model 5).

Industry Case Study

The DRA evaluation was conducted using the case study template for case study organization CPF (coded name) (Link) (see “Appendix”). The data collected from the case study is qualitative feedback about the DRA. The case study process and results are explained below.

Plan

The case study was designed to demonstrate the capabilities of the DRA for the CPF organization. It includes the following:

  • Identify the case study organization: The case study is conducted at the Australian organization [CODE_NAME: CPF] based in Sydney. Date: 24/04/2019

    The organization name was kept anonymous—see ethics approval stored on CloudStor (Link).

  • Case study organization context: CPF provides a single AWS cloud-based modelling platform for digital strategy, architecture, and project delivery. CPF aims to adopt a multi-cloud environment for its modelling platform to address its customer needs. They also have a keen interest in emerging IoT applications.

  • Need and problem: CPF needs a DevOps approach and a middleware broker to deploy its platform features to a multi-cloud environment, to meet its different customer needs, and to avoid cloud vendor lock-in.

  • Solution: The DRA seems to address the case study organization’s need mentioned above.

  • Objective: The objective of the researcher is to demonstrate the applicability of the research-based DRA framework in CPF’s practical organizational context. The CPF case study organization aims to have a functional DevOps architecture and solution that supports the deployment of its modelling platform to multi-cloud.

  • DRA Proof of Concept (POC) demo and presentation: To demonstrate the applicability of the DRA framework, a proof of concept presentation and demo (including video of the demo) were developed in alignment with the case study need. The demo successfully demonstrated the deployment of a pre-developed sample IoT application to a multi-cloud environment. This demonstration shows the applicability of the DRA for the case study organization.

    • Presentation Slides (anonymized): Link

    • Demo Video YouTube video: Link

Prepare

The case study template (see CST Link) was prepared to collect the DRA evaluation data from the case study organization.

Collect

An evaluation session was organized with the CPF case study organization. They appointed the DevOps lead from their organization, who has expert knowledge of DevOps. The CPF DevOps lead participated in this evaluation and provided the relevant feedback data about the applicability of the DRA to their organization. The total duration of the data collection, including the presentation and demo, was approximately 120 min. The case study data was stored in CloudStor, which is the recommended cloud storage (Link). The following sections discuss the evaluation feedback.

DRAv2.0 Architecture

The DRAv2.0 architecture was presented to the case study organization CPF. DRAv2.0 is composed of four architecture design models: conceptual, logical, physical, and operational. The expert from CPF reviewed the design and provided positive feedback with further opportunities for improvement. Feedback about the DRA architecture is provided below.

“It has been very well thought and process-driven. I think it would be excellent to include some controls which could be used in the case to re-deploy or even roll back to a previous version in an automated fashion”.

DRA Operational Model Pipeline

In this step, the CST provides a checklist for the DRAv2.0 pipeline implementation. The participants may re-use the recommended tool set or configure DRAv2.0 with other tools of choice (see CST Link). Feedback about the pipeline is noted below.

“Tool used in Operation model pipeline are industry used tools and are an excellent choice for the DRA Operation Mode Pipeline.”

“Configuration template is easy to use and can be replicated.”

Software Component

DRAv2.0 can be configured to deploy software applications, including IoT applications, to a multi-cloud environment. The CST provides the participants with a demo application to test the DRAv2.0 architecture. The demo application source code can be accessed from the code repository using public access (see CST Link). The following feedback was received on the software components.

“Testing software component is functioning properly.”

Hardware Component

The CST provides information about the IoT-devices and network used to test the deployment of the IoT software application to multi-cloud. The IoT-network (see CST Link) is configured to provide proof of the concept of the DRA operation model and its applicability for deploying the IoT application to multi-cloud. However, one may use their own choice of IoT-devices and sensors to test the IoT application deployment using the DRAv2.0. The following feedback was received from the case study organization.

“Testing hardware component is functioning properly and responding appropriately.”

Overall Feedback

Based on the feedback, it can be suggested that DRA is fit for its intended purpose. Overall feedback about the DRA is noted below.

“Demo was easy to understand and well presented.”

“DRA framework would help organizations understanding DevOps methodologies and agile application deployment and delivery.”

Analyze

The case study data were analyzed and are presented in Table 3. The analysis method is a cross-examination comparison between the feedback and the case study evaluation criteria in Table 2. This analysis aims to connect or relate the evaluation criteria to the organization’s feedback. The output of the analysis is organized into three columns: “participant feedback,” “interpretation” (the researchers’ interpretation of the participant feedback), and “evaluation criteria.”

Table 3 Case study analysis
Table 4 Case study report

Report

The case study report aims to summarize the findings about the DRA framework and its applicability. The report summary is presented in Table 4.

Industry Survey

The survey (see “Appendix”) was used to evaluate the DRA from practitioners’ perspectives. Thus, data were collected from a broad audience [55] as opposed to a single organization case study. The survey was offered online to local and international industry experts. The survey was conducted using the following steps [57]:

  • Plan survey

  • Prepare the delivery method

  • Develop the survey questionnaires

  • Collect the data

  • Analyze the data

Plan

The plan is to obtain experts’ feedback about the DRA framework models. The survey (Link) was offered to participants and experts from the IT industry specialized in the areas of software engineering, DevOps, cloud computing and architecture, and IoT.

Prepare

The survey was offered online (Link) to industry experts who were contacted using the author’s LinkedIn account. The survey was opened between January 2019 and June 2019. A total of 82 participants came from different companies located in Australia and several other countries. The demographic representation of the participants (Link) includes information about their professional experience, their organization location, and years of experience in their IT field.

Develop

The survey is composed of nine questionnaire-sets (Link) as follows:

  • Q1-Set: DRA contextual model (5 questions)

  • Q2-Set: DRA conceptual model (6 questions)

  • Q3-Set: DRA logical model (5 questions)

  • Q4-Set: DRA logical model features (9 questions).

  • Q5-Set: DRA physical model (5 questions)

  • Q6-Set: DRA operational model (8 questions)

  • Q7-Set: DRA usefulness feedback (2 questions)

  • Q8-Set: DRA suggested improvements (1 question)

  • Q9-Set: DRA overall feedback (2 questions).

Collect

The survey (Link) generated two types of data:

  • Quantitative Data: Represents the participants’ responses for each questionnaire-set (Q1-Set, Q2-Set, Q3-Set, Q4-Set, Q5-Set, Q6-Set, Q7-Set, and Q9-Set). The responses are transformed into numerical data using the rating table (see Table 1).

  • Qualitative data: Represents the participants’ feedback collected in questionnaire-sets Q7-Set, Q8-Set, and Q9-Set.

The collected survey raw data were stored on CloudStor (Link).

Analyze

The collected data from the survey were analyzed in two phases:

  • Quantitative data analysis: Presents a detailed analysis of the DRA design models based on the data collected from each questionnaire-set. The collected data are used to test if the DRA models meet the criteria in Table 2 (Coverage, Relevance, and Importance).

  • Qualitative data analysis: Presents the analysis of the feedback and rating data collected from the Q7-Set and Q9-Set. To complement the quantitative analysis, qualitative feedback was collected and analyzed for productive results and insights, including ideas for further improvement (Q8-Set).

Quantitative Data Analysis

The collected data were analyzed using the statistical formulas (see Eqs. 1, 2, and 3) to test if the DRA design models (contextual, conceptual, logical, physical, and operational) meet the evaluation criteria positively (see Tables 2, 5). The quantitative data analysis is composed of six statistical cards (Labeled Card[index], see Fig. 5). Each statistical card (see Fig. 5) includes the following items:

Table 5 DRA contextual model questions group
Fig. 5 Design models analysis cards

Item 1: Organize the questionnaire-sets according to their relationship with the evaluation criteria into tables labelled QT[index].

Item 2: Collect and map the survey quantitative raw data into tables labelled RT[index].

Item 3: Arrange the RT[index] tables into category rating tables labelled CT[index] based on the questionnaire-set groups in the QT[index] tables.

Item 4: Plot the RT[index] tables into bar graphs labelled RF[index].

Item 5: Calculate the statistical values AAP and AAF for all the RT[index] tables, and the goodness-of-fit χ2 for the CT[index] tables (see Eqs. 1, 2, and 3).

Item 6: Generate result summary reports labelled RS[index]. The result reports summarize the analysis of the DRA design models.
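The six items could be automated along the lines of the following pandas sketch, run once per card; the question grouping, column names, and synthetic responses below are placeholders, not the survey raw data.

```python
import pandas as pd
from scipy.stats import chisquare

# Synthetic stand-in for one RT[index] table: rows = participants, columns = questions.
rt = pd.DataFrame({
    "Q1": [5, 4, 4, 3, 5, 2, 4, 5],
    "Q2": [4, 4, 3, 5, 4, 3, 5, 4],
    "Q3": [5, 3, 4, 4, 2, 4, 5, 5],
})

# Items 1 and 3: an assumed QT[index]-style grouping of questions by evaluation criterion.
criteria = {"Coverage": ["Q1"], "Relevance": ["Q2"], "Importance": ["Q3"]}

# Item 5: AAF and AAP (Eqs. 2-3) over the whole card.
responses = rt.to_numpy().ravel()
aaf = int((responses >= 3).sum())
aap = 100.0 * aaf / responses.size
print(f"AAF = {aaf}/{responses.size}, AAP = {aap:.2f}%")

# Item 5 (continued): goodness-of-fit chi-squared (Eq. 1) per criterion group (CT[index]).
for criterion, questions in criteria.items():
    observed = (rt[questions].stack()
                  .value_counts()
                  .reindex(range(1, 6), fill_value=0))   # frequency of each rating 1-5
    chi2, p = chisquare(observed)
    print(f"{criterion}: chi2 = {chi2:.2f}, p = {p:.4f}")

# Items 4 and 6 (bar graph RF[index] and summary RS[index]) would follow from the same
# data frame, e.g. rt.stack().value_counts().sort_index().plot(kind="bar").
```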

Contextual Model Analysis: Card 1

RS1: Result Summary The contextual model evaluation results in Tables 6 and 7 can be interpreted as follows:

  • AAF = 370 out of a total of 410 responses indicates that most of the participants agree that the DRA contextual model positively meets the test criteria.

  • AAP = 90.25% indicates that a high percentage of participants agree that the DRA contextual model positively meets the test criteria.

  • Coverage p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA contextual model seems to cover industry contextual design needs.

  • Relevance p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA contextual model seems relevant.

  • Importance p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA contextual model seems important.

Table 6 Contextual questionnaire raw data
Table 7 Contextual group data

The statistical values indicate that the participants consider the DRA contextual model a relevant and essential design that covers the industry needs. Figure 6 illustrates the frequency of participants’ responses to add further visual insight into the results (Table 8).

Fig. 6 Contextual data graph (RF1)

Table 8 DRA conceptual model questions group
Conceptual Model Analysis: Card 2

RS2: Result Summary Tables 9 and 10 present the statistical analysis results based on the participants’ responses. The conceptual model evaluation results can be interpreted as follows:

  • AAF = 382 out of a total of 410 responses indicates that most of the participants agree that the DRA conceptual model positively meets the test criteria.

  • AAP = 93.17% indicates that a high percentage of participants agree that the DRA conceptual model positively meets the test criteria.

  • The p value for the test variables:

    • Coverage p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA conceptual model seems to cover industry design needs.

    • Relevance p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA conceptual model seems relevant to the industry.

    • Importance p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA conceptual model seems important for the industry.

Table 9 Conceptual questionnaire data
Table 10 Conceptual group data

The statistical values indicate that the participants consider the DRA conceptual model a relevant and essential design that covers the industry needs. Figure 7 illustrates the frequency of participants’ responses to add further visual insight into the results (Table 11).

Fig. 7 Conceptual data graph (RF2)

Table 11 DRA logical model design questions group
Logical Model Design Analysis: Card 3

RS3: Result Summary Tables 12 and 13 present the statistical analysis results based on participants’ responses. The logical model design evaluation results can be interpreted as follows:

  • AAF = 386 out of a total of 410 responses indicates that most participants agree that the DRA logical model design positively meets the test criteria.

  • AAP = 94.14% indicates that a high percentage of participants agree that the DRA logical model design positively meets the test criteria.

  • The p value for the test variables:

    • Coverage p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and DRA logical model design seems to cover industry needs.

    • Relevance p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA logical model design seems relevant to the industry.

    • Importance p = 0.001 < α = 0.01. Hence, H0 is rejected, H1 is accepted, and the DRA logical model design seems important for the industry.

Table 12 Logical design questionnaire data
Table 13 Logical design group data

The statistical values indicate that the participants consider the DRA logical model design a relevant and essential design and that it covers the industry needs. Figure 8 illustrates the frequency of participants’ responses to add further visual insight into the results (Table 14).

Fig. 8 Logical design data graph (RF3)

Table 14 DRA logical model functions questions group
Logical Model Features Analysis: Card 4

RS4: Result Summary The numerical data in Tables 15 and 16 produced key statistical values based on participants’ responses. The evaluation results can be interpreted as follows:

  • AAF = 711 out of a total of 738 responses indicates that most of the participants agree that the DRA logical model features positively meet the test criteria.

  • AAP = 96.33% indicates that a high percentage of participants agree that the DRA logical model features positively meet the test criteria.

  • The p value for the test variables:

    • Coverage p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and DRA logical model features seem to cover industry contextual design needs.

    • Relevance p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA logical model features seem relevant to the industry.

    • Importance p = 0.001 < α = 0.01. Hence, H0 is rejected, H1 is accepted, and the DRA logical model features seem important for the industry.

Table 15 Logical features questionnaires data
Table 16 Logical features group data

The statistical values indicate that the participants consider the DRA logical model features relevant and essential and that they cover the industry needs. Figure 9 illustrates the frequency of participants’ responses to add further visual insight into the results (Table 17).

Fig. 9 Logical features data graph (RF4)

Table 17 DRA physical model questions group
Physical Model Analysis: Card 5

RS5: Result Summary The numerical data in Tables 18 and 19 produced key statistical values based on participants’ responses. The evaluation results can be interpreted as follows:

  • AAF = 625 out of a total of 656 responses indicates that most of the participants agree that the DRA physical model positively meets the test criteria.

  • AAP = 95.28% indicates that a high percentage of participants agree that the DRA physical model positively meets the test criteria.

  • The p value for the test variables:

    • Coverage p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA physical model seems to cover industry design needs.

    • Relevance p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA physical model seems relevant to the industry.

    • Importance p = 0.001 < α = 0.01. Hence, H0 is rejected, H1 is accepted, and the DRA physical model seems important for the industry.

Table 18 Physical model questionnaires data
Table 19 Physical model group data

The statistical values indicate that the participants consider the DRA physical model a relevant and essential design that covers the industry needs. Figure 10 illustrates the frequency of participants’ responses to add further visual insight into the results (Table 20).

Fig. 10 Physical model data graph (RF5)

Table 20 DRA operational model questions group
Operational Model Analysis: Card 6

RS6: Result Summary The numerical data in Tables 21 and 22 produced key statistical values based on participants’ responses. The evaluation results can be interpreted as follows:

  • AAF = 625 out of a total of 656 responses indicates that most of the participants agree that the DRA operational model positively meets the test criteria.

  • AAP = 95.28% indicates that a high percentage of participants agree that the DRA operational model positively meets the test criteria.

  • The p value for the test variables:

    • Coverage p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA operational model seems to cover industry design needs.

    • Relevance p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA operational model seems relevant to the industry.

    • Importance p = 0.001 < α = 0.01. Hence, H0 is rejected, H1 is accepted, and the DRA operational model seems important for the industry.

Table 21 Operational model questionnaires data
Table 22 Operational model group data

The statistical values indicate that the participants consider the DRA operational model a relevant and essential design that covers the industry needs. Figure 11 illustrates the frequency of participants’ responses to add further visual insight into the results.

Fig. 11 Operational model data graph (RF6)

Qualitative Data Analysis

The qualitative data analysis is composed of two sections:

  • DRA usefulness (for teaching, research, and industry) evaluation [Q7-Set]

  • DRA overall feedbacks and ratings [Q9-Set]

DRA suggested improvements [Q8-Set] are used to indicate the future scope based on the participants’ feedback.

DRA Usefulness Feedback and Rating

This section presents the analysis of the participants’ responses about DRA from a usefulness perspective in the industry, teaching, and research. The evaluation process is as follows:

  • Collect and organize the feedback about DRA usefulness into Table 23.

  • Analyze Table 23 feedback based on the occurrence of criteria elements (see Table 2) in the text using a cross-examination method.

  • Collect DRA usefulness rating numerical data and organize it into Table 24 labelled RT7.

  • Plot Table 24 (RT7) data into a bar graph representation Fig. 12 labelled RF7.

  • Calculate the statistical values AAF and AAP from Table 24 (RT7) data:

    • AAF: determines the frequency of participants who consider the DRA useful.

    • AAP: determines the percentage of participants who consider the DRA useful.

    • Calculate the Goodness of Fit χ2 and p value for the test variables (teaching, industry, research) at a critical value α = 0.01.

If p < α then the null hypothesis H0 is rejected, H1 is accepted.

H0: There is no association between the test variable (usefulness for teaching, industry, and research) and DRA design models.

H1: There is an association between the test variables, and DRA positively meets the test criteria.

Table 23 Q7-Set DRA usefulness feedback analysis
Table 24 DRA usefulness ratings
Fig. 12 DRA usefulness ratings graph (RF7)

  • The results summary is labelled RS7. RS7 presents the analysis of the DRA usefulness feedback data.

RS7: Results Review The Table 24 (RT7) statistical analysis results of the Q7-Set responses can be interpreted as follows:

  • AAF = 229 out of a total of 246 responses indicates that most of the participants agreed that the DRA is useful for teaching, research, and industry.

  • AAP = 93.09% indicates that a high percentage of participants agree that the DRA is useful for teaching, research, and industry.

  • The p value for the test variables:

    • Research: p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA models are related to usefulness for research.

    • Teaching: p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA models are related to usefulness for teaching.

    • Industry: p = 0.001 < α = 0.01. This result means that H0 is rejected, H1 is accepted, and the DRA models are related to usefulness for industry.

Table 25 contains a total of T = 50 references related to the evaluation criteria (see Table 2). The frequency distribution of the criteria from Table 23 is mapped into Table 25 and interpreted as follows:

Table 25 DRA usefulness categories frequencies

The participants consider DRA models useful (28.00%), relevant (26.00%), and essential (24.00%). DRA design is generic (10.00%), but it provides sufficient coverage for industry needs (12.00%). Overall, it can be suggested based on the analysis of results that DRA meets the evaluation criteria set in Table 2. However, participants considered DRA more from usefulness, relevance, and importance perspectives when compared to generalization and coverage. Figure 12 adds further visual evidence that the participants consider DRA useful.

DRA Overall Feedback and Rating

This section presents the evaluation of the participants’ overall feedback and ratings about the DRA. The evaluation process is as follows:

  • Collect and map the feedback provided about the DRA into Table 26.

  • Analyze Table 26 feedback based on the frequency of evaluation criteria (see Table 2) relationship with the DRA aspects in the text using the cross-examination method.

  • Collect DRA overall rating and arrange it as numerical data into Table 27 labelled RT8.

  • Plot Table 27 (RT8) data into a bar graph representation Fig. 13 labelled RF8.

  • Calculate the statistical value AAP from Table 27 (RT8) data.

  • AAP: determines the percentage of participants satisfied with the DRA overall.

  • The results review is labelled RS8. RS8 presents an analysis of the results data.

Table 26 Q9-set overall feedback analysis
Table 27 Survey overall feedback rating (RT8)
Fig. 13 Survey overall feedback graph (RF8)

RS8: Results Summary The numerical data in Table 27 (RT8) produced a principal statistical value based on participants’ overall responses. The DRA overall questionnaire Q9-Set showed that:

  • The participants consider the DRA satisfactory at AAP = 77%.

  • Table 26 contains a total of T = 22 references related to the evaluation criteria. The frequency distribution of the criteria from Table 26 is mapped into Table 28.

Table 28 DRA overall criteria occurrences

Table 28 shows that participants seem to consider the DRA useful (27.27%) and relevant (45.45%). This result indicates that participants seem to consider the DRA relevant to the industry and potentially useful for their organizations’ contexts. Overall, based on the analysis of the results, it can be suggested that the DRA meets the evaluation criteria set in Table 2. Figure 13 adds further insight that, overall, the participants consider the DRA useful and relevant.

Key Insights: Summary and Analysis

This section presents the summary and analysis of the DRA empirical evaluation results. The purpose of this investigation is to evaluate the DRA using the industry case study and a field survey. The evaluation data is used to indicate the relationship between the DRA models and the evaluation criteria (Table 2). There are two types of indicative measures designed to determine the significance of the DRA.

  • The quantitative indicator matrix (QIM).

  • The qualitative evaluator matrix (QEM).

The QIM and QEM aim to provide indicative measurements and a qualitative review to verify that the DRA meets the evaluation criteria (Table 2). To complement the quantitative analysis, qualitative feedback was collected and analyzed for rich results and insights, including ideas for further improvement.

The Quantitative Indicator Matrix (QIM)

The quantitative indicator matrix (QIM, Table 29) collects the numerical results reported in the industry survey analysis. The collected data are the χ2 test, AAF, and AAP values (see Eqs. 1, 2, and 3) from the statistics report cards (Card 1 to Card 6; see Fig. 5).

Table 29 Quantitative Indicator Matrix (QIM)

The QI formula is an average value that indicates the significance of DRA at a probability above 75%. The QI formula is described in (4) as follows:

QI definition: The quantitative indicator (QI) is the average probability (AAP) over the source tables listed in the QIM (Table 29).

$$P(X) = \text{AAP}\left[\text{Table}_{(\text{index})}\right], \qquad \text{QI} = \frac{\sum_{i} P(X)_{\text{row}[i]}}{\text{Count}(\text{row}[i])}$$
(4)

where i is the row index in the QIM (Table 29) and X denotes a participant score ≥ 3.

QI objective: Determine the average probability that participants agree with the results of the source tables (Table(index)), indicating whether the DRA models positively meet the evaluation criteria.

QI condition: Participants agree with a table’s results if [AAP > 75% and p value < 0.01].

QI Result: QI = Σ[P(X) > 75%]row[i]/Count(row[i]) [Indicates the average probability of participants scoring ≥ 3 in a particular survey table].

Conclusion: If QI > 75%, then the DRA positively meets the evaluation criteria.

Data Source: Tables 6, 7, 9, 10, 12, 13, 15, 16, 18, 19, 21, 22 and 24

The QIM (Table 29) aims to determine whether there is a probability of 75% or above that the participants agree that the DRA meets the survey evaluation criteria in Table 2. Table 29 generates a probability indicator, QI, that estimates the probability of a participant scoring three or above on the rating scale (Table 1).

Review: QI = 93.56% indicates that the cohort of survey participants seems to agree that the DRA is significant under the condition [probability > 75% and p value < 0.001]. Hence, it can be concluded that the participants seem to consider the DRA framework fit for its intended purposes. Overall, based on the QIM analysis of the results, it can be suggested that the DRA meets the evaluation criteria set in Table 2.
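A minimal Python sketch of Eq. (4), assuming the per-table AAP values have already been extracted from Table 29; the numbers below are placeholders, since the QIM table itself is not reproduced here.

```python
# Placeholder AAP values, one per QIM source table (the real values are in Table 29).
aap_per_table = [90.3, 93.2, 94.1, 96.3, 95.3, 95.3, 93.1]

# Eq. (4): QI is the average of the per-table probabilities P(X) = AAP[Table(index)].
qi = sum(aap_per_table) / len(aap_per_table)

# QI condition: every source table should satisfy AAP > 75% (with p value < 0.01).
all_tables_pass = all(aap > 75.0 for aap in aap_per_table)

print(f"QI = {qi:.2f}%")
if qi > 75.0 and all_tables_pass:
    print("DRA positively meets the evaluation criteria (QI > 75%)")
```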

The Qualitative Evaluator Matrix (QEM)

The qualitative evaluator matrix (QEM) collects the feedback data gathered in “Industry Case Study” and “Industry Survey”. Three data sources feed into the QEM (Table 30); the data are acquired from Tables 3, 23, and 26. To complement the quantitative analysis in the QIM, the qualitative feedback in the QEM was analyzed for rich results and insights that support that the DRA meets the evaluation criteria (see Table 2). The QEM (Table 30) shows the relationships between the evaluation criteria and the experts’ feedback.

Table 30 Qualitative evaluator matrix (QEM)

Review: The QEM (Table 30) indicates that the participants agree that:

  • DRA seems instantiable and easy to implement using various tools and technologies

  • DRA seems flexible and reconfigurable with other technology stacks

  • DRA includes DevOps tools integration that enables automated deployment to multi-cloud

  • DRA is based on high-level modelling that supports DevOps concepts and practices

  • DRA seems to enable DevOps concepts and multi-cloud services and to support the IoT process.

  • DRA seems to support DevOps culture and human factor

  • DRA is a comprehensive architecture that may support digital transformation

  • DRA seems to provide a new knowledge base about DevOps approach adoption.

  • DRA seems to provide a fast and straightforward path to software production in Agile.

  • DRA is generic and applicable to a class of situations.

Future Scope

This section evaluates the participants’ responses regarding suggested improvements to the DRA at the industry level. The feedback was provided in response to the Q8-Set. Feedback to the Q8-Set provided vital suggestions to further improve the DRA, as well as valuable ideas that may be considered for future research projects or DRA upgrades. The evaluation data results for the Q8-Set are presented in Table 31, which is organized as follows: the participants’ suggestions column contains the participants’ feedback and comments, and the interpretations column includes the researcher’s responses to the suggestions. In this column, the researcher identifies the key ideas considered for the possible future scope of the research.

Table 31 Future scope and suggested improvements

Discussion

DevOps has emerged to enable the integration of development and operations capabilities and to complement current agile approaches [1, 2]. The DevOps approach provides developers with concepts, practices, and tools to enable automation, continuous integration, and fast deployment and delivery on the cloud [5, 58]. This paper discusses the evaluation of the proposed DRA framework for deploying IoT applications to multi-cloud using DevOps [8]. The DRA was constructed in previous research [8] using a well-known design science research methodology [12]. The DRA architectural model provides agile teams with real-time monitoring, team collaboration, and automated multi-cloud deployment capability for the application at run-time, using the DevOps approach and cloud tools and services.

The DRA seems to offer developers a research-based, practical, and architecture-driven approach to plan, analyze, architect, and implement the DevOps automation environment that supports the deployment stage of software engineering for multi-cloud IoT applications. To provide sufficient proof of the applicability and novelty of the DRA, this paper presents the results of the empirical evaluation of the DRA architectural models [13, 59], which is an essential step for developing a novel Agile-DevOps theory. The empirical evaluation is composed of two main parts: (1) an industry case study and (2) an online field survey.

First, the industry case study used a CST (Link) to provide instructive guidelines for the DRA implementation at the CPF organization. The feedback obtained from the DevOps expert at CPF was analyzed in Table 3 and reported in Table 4. The Table 4 results indicate that the DRA is useful, reusable, and covers the organization's needs.

Second, the industry survey (Link) used in this paper was offered online, via the author’s LinkedIn, to industry experts from organizations located locally and internationally. The survey data analysis is composed of two phases: quantitative data analysis and qualitative data analysis. The RT[index] and CT[index] tables produced key statistical values (AAP, AAF, and the χ2 p value).

Third, an overall summary and analysis of the results were presented for crucial insights. The QIM (Table 29) produced a probability QI = 93.56%, which means that the participants in the case study and survey agree that the DRA is fit for its intended purposes. The QEM (see Table 30) demonstrates that the participants consider the DRA architectural models fit for their intended purposes. The QEM indicates that the DRA positively meets the evaluation criteria in Table 2. Hence, based on these evaluation results, it can be suggested that the DRA framework is a practical and applicable approach for IoT application deployment to multi-cloud. Also, it has been found during this research that the new DRA, and in particular the CI-Broker concept, may also be used for non-IoT applications. The empirical evaluation results provide sufficient information about the usefulness and applicability of DevOps adoption for software development.

This study does not claim to offer complete knowledge of the integration of the three contexts [DevOps, Multi-Cloud, and IoT]. However, it is anticipated that the results of this study will provide sufficient consolidated and synthesized information and insights to practitioners and researchers and enable them to make informed decisions about the adoption of DevOps for IoT on the cloud. Further, it will provide a strong foundation, grounded on empirical evaluation results, for developing the theories in this vital area of research.

Conclusion

The integration of Development and Operations is itself a complex subject, and it becomes more challenging when it is associated with the emerging IoT and multi-cloud contexts. This paper presented the evaluation of one such framework, the DRA. The proposed DRA has been developed using a well-known design science research methodology.

To conclude the research, this paper focused on the evaluation phase of the DRA. Thus, the DRA has been evaluated using the industry case study and survey. Based on the evaluation results, it was determined that DRA is an appropriate framework for architecting and implementing the DevOps for the automated deployment of IoT applications to the multi-cloud. This paper is a crucial contribution to the development of a theory in this vital area of research. Further, this research demonstrates how to systematically evaluate a complex artifact, such as the DRA, using the DSR evaluation criteria. Finally, this research can be extended to include a future investigation into the areas of DevSecOps security and DataOps.