1 Introduction

During the last few years, we have witnessed a massive increase in the number of devices connected to the Internet: the number of connected devices was expected to exceed three times the global population by 2023, with Machine-To-Machine (M2M) connections accounting for half of them (up to 14.7 billion M2M connections) [7]. This growth has fostered the deployment of the Internet of Things (IoT) paradigm, a comprehensive network of intelligent objects that can organize themselves automatically, share information, data, and resources, and react and act in the face of situations and changes in the environment [35]. It has also led to an exponential growth in the amount of data traffic flowing through the network.

The computation of shared data can be intensive and usually cannot be completed by the IoT devices themselves, due to their limited resources (memory, battery, etc.) [2]. The deployment and exploitation of the cloud paradigm has helped companies address this capacity limitation by offloading intensive computing tasks to the cloud. However, this offloading penalizes the quality of service (QoS) offered: it increases the latency imposed by the distance between the cloud and the end devices, adds network overhead, and raises the security and privacy risks that such offloading entails.

To reduce this penalty, the edge computing paradigm has been proposed over the last few years. Edge computing allows computing tasks to be offloaded to nodes closer to the end devices (at one hop from them). These tasks therefore run closer to both the source and the consumer of the data, improving the quality of service offered.

However, the exact definition and coverage of edge computing are unclear. Some researchers indicate that edge computing addresses the offloading of computing tasks from the cloud to the last hop before smart devices, others indicate that it only covers the set of devices at one hop from end devices, some also include the IoT devices themselves, etc. [60]. In addition, different studies recommend the application of edge computing for different purposes: for applications with a strict response time, for reducing infrastructure costs, or for improving privacy management. There is therefore currently a lack of consensus on the specific coverage, targeted domains, and benefits of edge computing.

For this paradigm to achieve an industry adoption similar to that of other proposals, such as cloud computing, all stakeholders should share a common vision, clearly knowing the different elements and concepts involved in it, the main problems it allows them to address, the main benefits it provides, and the domains it targets.

This paper presents an analysis of the industry's vision of edge computing. To this end, 29 international companies were surveyed to identify what they understand by edge computing, what problems it addresses, and what benefits it provides. This analysis allows us to bring the vision of the business world closer to academia, allowing both to better focus their efforts to improve its adoption.

The remainder of the paper is structured as follows. Section 2 analyzes the main concerns that led us to carry out this study. Section 3 is devoted to the research methodology applied in this work. The development and final description of the created theory on IoT edge computing is presented in Sect. 4. In Sect. 5 we analyze the threats to validity of this theory and its limitations. A description of existing work related to our proposal is provided in Sect. 6, complemented by the information shown in Appendix B. Finally, Sect. 7 draws the main conclusions of this work.

2 Motivation

Cloud computing is an architecture based on accessing centralized computing resources ubiquitously and on demand over the network. The paradigm was standardized by the National Institute of Standards and Technology (NIST). The concept of cloud computing is to have hugely powerful servers in data centers connected to the network; the resources of these servers are then virtualized and offered to clients. Cloud computing has been the de facto solution over the past decade. One of the main reasons for its rise and wide adoption is the clarity and common vision that the entire industry has about the advantages and disadvantages it provides [22]. For Edge Computing (EC) and other similar paradigms to be successfully incorporated by the industry, it is likewise necessary for all stakeholders to clearly share their vision and the benefits they bring.

Specifically, there are some key requirements of QoS-stringent IoT applications that cannot be met by applying a pure cloud paradigm alone, such as response time, cost, data sensitivity, data volume, bandwidth limitation, resilience, etc. [27]. The Internet Research Task Force (IRTF), in its draft on IoT Edge Challenges and Functions [31], concluded that these limitations should be overcome by applying EC. The basis of EC is to use different computing devices closer to the end user and distribute the application workload to them [36]. However, there is no consensus on the definition of this paradigm in the literature, nor on its border with other closely related paradigms, which jeopardizes the adoption of these paradigms by the industry [60].

ISO defines the term Edge Computing (EC) as “a form of distributed computing in which processing and storage takes place on a set of networked machines which are near the edge, where the nearness is defined by the system’s requirements” [30]. Nevertheless, NIST also introduces a couple of closely related terms, Fog Computing (FC) and Mist Computing (MC), which makes it more difficult to clearly identify the borders of EC and its benefits. FC is defined by the OpenFog Consortium as “a horizontal system-level architecture that distributes computing, storage, and networking functions closer to the user along a cloud-to-thing continuum” [20]. Moreover, FC can be multitiered: fog nodes need not be placed at a single point or network tier and can be spread across multiple tiers. In turn, MC is “a lightweight computation distribution proposal that resides directly at the edge of the network fabric, bringing the FC layer closer to the smart end devices”.

Several architectures have been proposed, with concrete nuances, to support these paradigms, for instance, ETSI’s Multi-access Edge Computing (MEC) [9]. MEC is an architecture based on the provision of computing and storage resources closer to users, similar to FC, but with resources directly connected to a 4G or 5G base station. In addition, the Cloudlet concept proposes to bring cloud servers themselves to the edge, instead of placing intermediate devices with computing and storage resources closer to the edge [49]. Aside from the previously defined architectures, some works present different proposals contributing to EC. In [4], the authors present Mobile Edge Clouds, a three-tier architecture for IoT comprising IoT devices, a middle tier of mobile devices that can run services, and the cloud. At execution time, the bottom tier requests a service from the middle tier, which can execute it within the middle tier or offload it to the top tier. The osmotic computing platform is another proposal closely related to edge computing [56]. Osmotic computing bases its model on microelements, which are deployed and provisioned on the basis of the concept of osmosis: initially, microelements are deployed in the cloud. As requests for microelements arrive at the infrastructure, the osmotic computing platform detects the requirements of each request. If these requirements demand that the microelement be deployed on the edge, the platform automatically provisions it at the appropriate edge node and maintains it as long as required.

Regarding MC, some works define their own MC architectures [42, 53], while others define it as a layer of fog or edge computing [28]. Thus, although it is possible to combine MC with other architectures [28, 42, 53], it is not clear whether such a combination is allowed by design. Other works aim at merging different architectures into a single one by combining their proposals and concepts; this is the case of [3], which proposes a combination of MEC and FC. In the literature, EC is in some cases described as a concept implemented by FC, MC, or MEC, as described by K. Dolui et al. [11]; in other cases, the terms are used interchangeably, as pointed out by [60].

There is thus a plethora of approaches that define frameworks to apply the edge computing paradigm. However, each approach addresses the paradigm from a different point of view, defines different elements, and tries to meet different goals and challenges. Therefore, there is no clear and unanimous agreement on what EC is, which is crucial for its adoption in the industry; more importantly, this view should come from the industry itself, so as to share a common language and better transmit the benefits of this paradigm.

3 Research methodology

This paper presents empirical research on the practice of IoT edge computing. It is based mainly on the constructivist model as an underlying philosophy (epistemological and ontological positions) [12]. Constructivism or interpretivism states that scientific knowledge cannot be separated from its human context and that a phenomenon can only be fully understood by considering the perspectives and context of the participants involved. Therefore, the most suitable methods to support this approach are those collecting rich qualitative data, from which theories (tied to the context under study) may emerge.

One of these methods is Grounded Theory (GT), which aims at the iterative development of a theory from qualitative data [16] and encourages deep immersion in the data [46]. “In grounded theory, initial analysis of the data begins without any preconceived categories. As interesting patterns emerge, the researcher repeatedly compares these with existing data, and collects more data to support or refute the emerging theory” [12]. Thus, GT is adequate for our purposes, and according to our philosophical stance, we used Constructivist Grounded Theory (aka Charmaz’s GT variant [6]). Specifically, we applied a novel process that extends the Charmaz GT variant to allow multiple researchers to participate in the coding process, i.e., collaborative coding, while ensuring consensus on the constructs that support the theory and, thus, improving the rigor of qualitative research (cf. [10]). To this end, we conducted an analysis of the intercoder agreement (ICA) to measure the extent to which different raters assign the same precise value (code or category) for each item being rated (qualitative data item or quotations) [14].

GT allows an in-depth analysis of the phenomenon to be studied, that is, the perception that companies have of IoT edge computing. Once the data had been collected through a survey, GT was the methodology (in the field of qualitative research) that seemed most appropriate. Others, such as ethnographic or action research, require the researcher to enter the context under study for a prolonged period of time, which is unfeasible for this study given the circumstances surrounding it, namely software companies protective of the way they work. We did not find it appropriate to use focus groups (given that the individuals work for different organizations) or content/thematic analysis (subsumed in GT, but which do not generate any theory).

3.1 Initial research questions

We began by asking what the industry thinks about what IoT edge computing is, and the expected benefits and challenges associated with this paradigm.

3.2 Data collection

GT involves iteratively performing interleaved rounds of qualitative data collection and analysis to lead to a theory (e.g., concepts, categories, patterns) [47]. The selection of participants is also iterative and can be considered a combination of “convenience sampling” as we are restricted to organizations and relevant stakeholders to which we had access; “theoretical sampling”, in the sense that we chose which data to collect based on the concepts or categories that were relevant to the emerging theory, i.e., data from organizations that have been adopting IoT edge computing; and “maximum variation sampling”, in the sense that we tried to choose highly diverse people and organizations in our sample, strengthening the transferability of our theory.

In line with the sampling strategy described above, we initially collected data from a set of participants from several leading international organizations in the Internet of Things domain, which are currently committee members of the Master’s Degree in Distributed and Embedded Systems Software and the Master’s Degree in IoT at Universidad Politécnica de Madrid (Spain), and international industrial contacts of the Universidad de Extremadura (Spain). We then moved on to theoretical sampling and iteratively collected more data based on the concepts or categories that were relevant to the emerging theory, until the ICA value exceeded a given threshold and theoretical saturation was reached. Table 1 lists the organizations involved in the study, their ID, scope (international or national), size, business core, and the role and experience of the respondent. A total of 29 responses were collected from an open-ended questionnaire available at https://es.surveymonkey.com/r/PMWD7ZM.

Table 1 Description of organizations

3.3 Qualitative data analysis

As introduced above, GT iteratively develops theory from qualitative data [16], starting the analysis without preconceived categories and repeatedly comparing emerging patterns with existing data while collecting more data to support or refute the emerging theory [12]. To conduct a constructivist GT, we followed these steps: initial/open coding, selection of core categories, selective coding, sorting, theoretical coding, and write-up. These steps are detailed in the next section.

4 A Theory on IoT edge computing

This section describes a theory on how the IoT industry perceives the edge computing paradigm, as well as the benefits companies expect from adopting it and the challenges they face. To analyze the data from the survey responses of the 29 companies and construct the theory, we followed the steps described in the previous section. Two concerns should be kept in mind: i) the theory to be developed is a substantive theory; and ii) the theory is about how companies perceive the IoT edge computing paradigm, not about how the paradigm has been defined in other scientific sources and/or standards.

4.1 Initial/Open coding

This activity aims to discover the concepts underlying the data and instantiate them in the form of codes. At each iteration of the open coding, n documents of the survey are analyzed, that is, segmented into quotations that are assigned either to a previously discovered code or to a new one that emerges to capture a new concept.

4.1.1 Iteration 1

In the first iteration of the open coding process, researchers R1, R2, and R3 analyzed 6 documents. R1 created a codebook with 29 codes, which was subsequently refined by R2 and R3. As a by-product of this process, 40 codes were discovered and divided into 7 semantic domains (denoted S1, S2, ..., S7) (see Table 2).

Table 2 Domains and Codes resulting from Open Coding - Iteration 1

After completing the coding process, Krippendorff’s \(\alpha \) coefficients [26, 32] were computed (see [18] for a thorough introduction to these techniques). Specifically, the \(Cu\text {-}\alpha \) and \(cu\text {-}\alpha \) coefficients were computed; their values are shown in Table 3. As the table shows, the value of the global coefficient \(Cu\text {-}\alpha \) did not reach the acceptability threshold of 0.8 established in the literature [32]. For this reason, a review meeting was necessary to discuss disagreements and the application criteria of the different codes. The results of this meeting are documented in the disagreements diary file of the open coding folder in the public repository.
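For reference (our gloss, not part of the original analysis), the nominal Krippendorff’s \(\alpha \) from which these coefficients derive is computed from a coincidence matrix \(o_{ck}\) of paired values across coders:

\[ \alpha = 1 - \frac{D_o}{D_e}, \qquad D_o = \frac{1}{n}\sum _{c \ne k} o_{ck}, \qquad D_e = \frac{1}{n(n-1)}\sum _{c \ne k} n_c\, n_k, \]

where \(n_c = \sum _k o_{ck}\) and \(n = \sum _c n_c\). Roughly speaking, \(Cu\text {-}\alpha \) applies this scheme to the assignment of semantic domains as a whole, while \(cu\text {-}\alpha \) applies it to the codes within a single domain; values of \(\alpha \ge 0.80\) are conventionally taken as acceptable reliability [32].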

Table 3 Values of the different Krippendorff’s \(\alpha \) coefficients in iteration 1 of the open coding. In bold, the values above the acceptability threshold (\( \ge 0.80\))

To highlight problematic codes, we used the \(cu\text {-}\alpha \) coefficients calculated per semantic domain. From Table 3, we observe that domain S3 had a remarkably low \(cu\text {-}\alpha \) value. A closer look at the codes within S3 shows that this domain includes codes related to the functionality of the system. This is a particularly fuzzy domain in which several concepts can be confused. During the review meeting, clarifications about these codes were necessary to avoid misconceptions. Afterwards, a new codebook was released. In this new version, memos and comments were added and a code was removed, so 39 codes (and 7 semantic domains) proceeded to the second iteration of the open coding.

4.1.2 Iteration 2

Researchers R1, R2, and R3 analyzed six further documents. Since the coders had agreed on a common codebook in the previous iteration, greater agreement, materializing as a higher ICA value, was to be expected. As a by-product of this second iteration, 8 new codes arose, leading to a new version of the codebook with 47 codes and 7 semantic domains (see Table 4).

Table 4 Domains and Codes resulting from Open Coding - Iteration 2

The ICA values for this second iteration are shown in Table 5. From these results, we observe that, after this refinement of the codebook, \(Cu\text {-}\alpha \) reached the acceptability threshold of agreement. The open coding process could therefore stop: there was consensus on the interpretation of the codes in the codebook, and we could proceed with the selection of core categories and selective coding.

Table 5 Values of the different Krippendorff’s \(\alpha \) coefficients in iteration 2 of the open coding. In bold, the values above the acceptability threshold (\( \ge 0.80\))

4.2 Selection of core categories

In this activity, R1 and R2 selected the core categories, that is, the most relevant codes among the 47 codes obtained in the open coding. To this end, we focused on the groundedness of the codes and semantic domains (i.e., the number of quotations coded by a code) and on their density (i.e., the number of relationships between codes, that is, the co-occurrence of codes in the same quotation). Table 6 shows these values. The detailed analysis is documented in the selection of core categories file of the selection of core categories folder in the public repository. As a result of the analysis, four semantic domains (S1, S2, S3, and S6) and 29 codes were selected for the next activity. This codebook is available in the selection of core categories - codebook file of the public repository.

Table 6 Groundedness of the codes per semantic domain

4.3 Selective coding

This is an inductive-deductive process in which new data are labeled with the codes of the selected core categories (semantic domains). The coders focused only on the core categories, but the number and definition of their inner codes were modified according to the analysis of the new data. Researchers R1, R2, and R3 analyzed 6 documents using S1, S2, S3, and S6, which comprise a total of 29 codes. After coding, 9 codes were added to the codebook, for a total of 38 core codes (see Appendix A). The ICA coefficients obtained after coding are shown in Table 7.

Table 7 Values of the different Krippendorff’s \(\alpha \) coefficients in the selective coding phase. In bold, the values above the acceptability threshold (\( \ge 0.80\))

As we can observe from this table, the value of \(Cu\text {-}\alpha \) reached the acceptable reliability threshold of 0.8, which evidences a consensus among the coders on the meaning and limits of the codes within the core categories. The coders also agreed that adding new data did not lead to new information, i.e., theoretical saturation had been reached. Therefore, after this single iteration, the GT process could proceed to the next activity.

4.4 Sorting procedure

From the analysis of the memos together with the co-occurrence tables, we drew the relationships between the different categories (see Fig. 1). The core categories are boxed, while the font size of each category, as well as the thickness of the lines that relate them, correspond to the groundedness of semantic domains and the density of codes, respectively.

Fig. 1 Relations between Categories

Fig. 2 Scope of the theory

4.5 Theoretical coding

Theoretical coding is defined as “the property of coding and constant comparative analysis that yields the conceptual relationship between categories and their properties as they emerge” [17]. According to Gregor’s taxonomy [19], we developed an analytic theory: “Theories of this type include descriptions and conceptualizations of ‘what is’”. Taxonomies, classifications, and ontologies, as defined by Gruber [21], are also included. In fact, Gregor states: “Some examples of grounded theory can also be examples of Type I theory, where the grounded theory method gives rise to a description of categories of interest”, where Type I refers to theories that “analyze ‘what is’ as opposed to explaining causality or attempting predictive generalizations”. Such theories are valuable when little is known about the phenomena they describe, which is the case for the relatively new edge computing paradigm. That is, the theory to be built answers “What is edge computing?” We do not attempt to answer questions related to why edge computing is used (explanation theory), nor do we intend to develop mathematical/probabilistic models to support predictions (prediction theory), nor to describe how to do things (design and action theory, or prescription theory).

To develop the theory, we followed the steps below, each thoroughly described in the following subsections.

  • Determining the scope of the theory

  • Defining the constructs of the theory

  • Defining the propositions of the theory

  • Providing explanations to justify the theory

  • Testing the theory

4.5.1 Determining the scope of the theory

Figure 2 shows, using UML 2.0, the elements and relations that determine the scope of this theory. We describe the theory scope through four archetype classes: Actor, Technology, Activity, and Software_System (“An actor applies technologies to perform certain activities on an (existing or planned) software system” [52]). The four archetype classes are represented as abstract classifiers (classes), and the relationships between these archetypes are as stated in [52]. We added several classifiers (subclasses) to indicate that the activities of IoT edge systems are performed on the edge side (see the enumerated type Location). Specifically, the subclass IoT_Edge_Computing represents a technology understood as a set of skills, techniques, methods, and processes, all specialized for the IoT edge computing paradigm. The subclass Activity_IoT_Edge represents an activity performed at the edge; in addition, this class has been declared as active since, by its very nature, its instances have their own control flows. Finally, the subclass IoT_Edge_Software_System represents a software system in the IoT edge computing domain.

Since the construction of the theory was based on the qualitative analysis of data obtained from a set of surveys of 29 companies in the sector, this theory must necessarily be limited to a substantive (local) theory, as opposed to what could be a formal (all-inclusive) theory. However, the scope of the theory will be revealed during the testing phase.

4.5.2 Defining the constructs of the theory

The constructs have been derived from the codes associated with the core categories. The codes of each core category are described in Appendix A. Table 8 shows the constructs and the code(s) from which they are derived.

Table 8 Building constructs from codes

Figure 3 shows the relations between the elements of the scope and the constructs. In this figure, the class Device_in_the_Edge and its subclasses represent constructs C1 to C4. The classes Sensor and Actuator represent construct C5. Distributed_Architecture represents construct C6. The classes Advantage and Problem represent constructs C7 and C8, respectively. Construct C9 (represented by the classifier Activity_IoT_Edge) is an artifice (it is not central to the theory, although it is part of its scope) that allows us to establish two levels of abstraction in the operations performed by an IoT_Edge_Software_System. These high-level operations generate the benefits of the IoT_Edge_Computing technology. For example, storage and analytics in the Device_in_the_Edge, together with filtering and artificial intelligence techniques, enable local data processing and avoid sending raw data to the fog/cloud, yielding better bandwidth throughput. Table 9 relates the constructs to the classifiers in Fig. 3.

Fig. 3 Constructs & scope of the theory

Table 9 Relations between constructs and classifiers

4.5.3 Defining the propositions of the theory

The propositions of the theory are derived from the relationships between the constructs that make up the theory. In this sense, we extract the relationships described in Fig. 3 as propositions. We characterize each proposition by means of three elements: i) the actual textual statement of the proposition; ii) its formalization by means of the OCL language; and iii) an excerpt, as an example, obtained from the surveys that mention this relationship. The code of each proposition is composed of the letter P followed by an order number and, optionally in square brackets, the classifiers to which it relates, a comma, and the constructs. Additionally, we use a hyphen to indicate a range of items and & for a sequence of items. Thus, the code P1 [1 & 6, C1-C4 & C9] refers to Proposition 1, which states the relationship between classifiers 1 and 6 (Device_in_the_Edge and Activity_IoT_Edge) that supports the relationship of constructs C1-C4 and C9.

P1 [1 & 6, C1-C4 & C9]. A device located at the edge (i.e., an instance of one of the subclasses of the class Device_in_the_Edge) participates in the execution of one or more Activity_IoT_Edge. Since classifier 6 (see Table 9) is an artifice, there is no excerpt to support it. The OCL syntax is as follows:

(Listing a: OCL)
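Since the original listing survives only as a figure, a plausible OCL rendering of P1 follows, assuming an association end named activities between Device_in_the_Edge and Activity_IoT_Edge (the role name is ours, not taken from the original model):

    context Device_in_the_Edge
    -- P1: every edge device participates in at least one edge activity
    inv ParticipatesInActivities:
      self.activities->size() >= 1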

P2 [1 & 6, C1-C4 & C9]. An Activity_IoT_Edge invokes the operations of one or more instances of one of the subclasses of Device_in_the_Edge to carry out its responsibilities. Since classifier 6 (see Table 9) is an artifice, there is no excerpt to support it.

(Listing b: OCL)
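Under the same assumptions (an inverse association end named devices), a sketch of P2 could be:

    context Activity_IoT_Edge
    -- P2: an activity invokes operations of at least one edge device
    inv InvokesDevices:
      self.devices->size() >= 1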

P3 [1 & 3, C1-C4 & C6]. A device located at the edge (that is, an instance of one of the subclasses of the class Device_in_the_Edge) participates in a Distributed_Architecture. The OCL syntax and excerpt are as follows:

(Listing c: OCL; Listing d: survey excerpt)
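A plausible OCL sketch of P3, assuming an association end named architecture from Device_in_the_Edge to Distributed_Architecture (again, an illustrative role name), could be:

    context Device_in_the_Edge
    -- P3: every edge device takes part in a distributed architecture
    inv PartOfDistributedArchitecture:
      self.architecture->notEmpty()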

P4. An IoT_Edge_Software_System has one and only one architecture, and this architecture is unique to it. This relationship is established from knowledge of the problem domain: every software system has an associated architecture, whatever its type. The OCL syntax is as follows:

(Listing e: OCL)
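A plausible OCL sketch of P4 (with an assumed architecture role name) could combine a multiplicity constraint with a uniqueness check:

    context IoT_Edge_Software_System
    -- P4: exactly one architecture per system ...
    inv HasExactlyOneArchitecture:
      self.architecture->size() = 1
    -- ... and no two systems share the same architecture instance
    inv ArchitectureIsUnique:
      IoT_Edge_Software_System.allInstances()->forAll(s1, s2 |
        s1 <> s2 implies s1.architecture <> s2.architecture)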

P5 [4 & 6, C9 & C7]. The execution of Activity_IoT_Edge instances generates advantages, defined by the enumeration type named Benefit. The OCL syntax and excerpt are as follows:

(Listing f: OCL; Listing g: survey excerpt)
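A plausible OCL sketch of P5, assuming an end named advantages typed by the Benefit enumeration, could be:

    context Activity_IoT_Edge
    -- P5: executing an activity yields at least one advantage (a Benefit literal)
    inv GeneratesAdvantages:
      self.advantages->notEmpty()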

P6 [1 & 5, C1-C4 & C8]. The IoT_Edge_Computing technology (represented by the IoT_Edge_Computing classifier) has a number of problems (defined by the Challenge type) to be solved, the solution of which would bring new benefits (NewAdvantage classifier). The OCL syntax and excerpts are as follows:

(Listing h: OCL; Listings i and j: survey excerpts)
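A plausible OCL sketch of P6, assuming ends named challenges and newAdvantages (illustrative role names), could be:

    context IoT_Edge_Computing
    -- P6: the technology has open problems typed by Challenge ...
    inv HasOpenChallenges:
      self.challenges->notEmpty()
    -- ... and solving each of them would bring new benefits
    inv ChallengesYieldNewAdvantages:
      self.challenges->forAll(c | c.newAdvantages->notEmpty())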

Both excerpts refer to different challenges faced when using IoT edge technology, such as security, scalability, vendor lock-in, and deployment time.

P7 [1 & 2, C1-C4 & C5]. An instance of type Device_in_the_Edge (or any of its subtypes) is connected to an arbitrary number of instances of the types Sensor and Actuator. The OCL syntax and excerpts are as follows:

(Listing k: OCL; Listings l and m: survey excerpts)
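Since P7 places no lower or upper bound on the connected peripherals (a 0..* multiplicity on both ends), the corresponding invariant is deliberately unconstraining; assuming ends named sensors and actuators, a sketch could be:

    context Device_in_the_Edge
    -- P7: any number of sensors and actuators (including none) may be attached
    inv PeripheralsUnbounded:
      self.sensors->size() >= 0 and self.actuators->size() >= 0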

P8. The use of edge computing techniques (represented by the IoT_Edge_Computing classifier), such as containers, addresses some of the challenges (described in the enumeration type Challenge), such as scalability. The OCL syntax and excerpts are as follows:

(Listing n: OCL; Listings o and p: survey excerpts)
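A plausible OCL sketch of P8, assuming an end named faces typed by the Challenge enumeration (with literals such as Challenge::scalability), could be:

    context IoT_Edge_Computing
    -- P8: the applied techniques (e.g., containers) face declared challenges
    inv FacesChallenges:
      self.faces->notEmpty()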

P9. The use of edge computing techniques (represented by the IoT_Edge_Computing classifier), such as downlinks of wireless communication networks, enables some of the benefits (described in the enumeration type Benefit), such as energy saving.

(Listing q: OCL; Listing r: survey excerpt)
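Analogously, a sketch of P9, assuming an end named enables typed by the Benefit enumeration, could be:

    context IoT_Edge_Computing
    -- P9: the applied techniques (e.g., wireless downlinks) enable benefits
    -- such as energy saving
    inv EnablesBenefits:
      self.enables->notEmpty()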

The remaining associations indicated in Fig. 3 (other than the generalization ones) refer to the scope of the theory (Fig. 2) and are therefore outside the scope of the propositions; for example, the relations “define” and “execute”. The generalization (inheritance) relations are implicitly present in the propositions listed above (reflected in the OCL expressions navigating between classifiers).

4.5.4 Providing explanations to justify the theory

An explanation is a relation between constructs and other categories that are not central enough to become constructs. The code of each explanation comprises the letter E, followed by a number referring to its order and, optionally in square brackets, the numbers of the propositions related to the explanation, separated by hyphens. In this way, the code “E1 [1-2]” refers to Explanation 1 about Propositions 1 and 2.

E1 [1-2]. An activity (instance of the classifier Activity_IoT_Edge) may involve several instances of the classifier Device_in_the_Edge, and one of the latter may intervene in several activities. These activities are high-level operations whose results are amenable to analysis to measure the benefits of this paradigm. In contrast, it is complicated to analyze these benefits in operations of a smaller scope carried out by a single type of device.

E2 [3]. The very nature of an IoT application makes it a strong candidate to be based on an architecture with distributed and interconnected elements. In this architecture, many types (instances of the subclasses) of Device_in_the_Edge may appear arbitrarily.

E3 [4]. It may seem that the same architecture can support different IoT edge software systems. However, this rarely occurs, since the number and types of components involved are typically characteristic of a particular system. Nevertheless, several IoT edge software systems may share a reference architecture (comprising a reference model and an architectural style).

E4 [5]. The explanation of some of the benefits captured by the enumeration type Benefit is the following (obtained by abductive reasoning):

  • E4.1 “better_bandwidth_throughput”. Since part of the post-processing of the data ingestion process is done locally, a large amount of bandwidth is saved by transmitting only the data in “cooked” format instead of “raw” format.

  • E4.2 “better_performance”. If all the context (elements needed to carry out a computation) is saved locally, then much time is saved in service requests to other computational nodes, leading to an increase in performance.

  • E4.3 “better_user_experience”. Since the performance of the system has been improved, better response times may be expected for user queries, leading to an improvement in the quality of the user experience.

  • E4.4 “customer_has_the_control_of_his/her_data”. When we transmit the data in “cooked” format, the customer retains control of the “raw” data that were generated on the IoT edge devices and not sent through the network.

  • E4.5 “greater_efficiency”. We should understand efficiency as the reduction of the wasted resources used to produce a given number of goods or services. In other words, to produce the same results, fewer bandwidth requests and service requests to remote nodes are needed.

  • E4.6 “less_cloud_overhead”. Since a large amount of processing is done locally, the cloud is not responsible for this task.

  • E4.7 “less_response_time”. This is strongly related to the time invested in communications. When we reduce this time due to local processing, we also decrease the response time observed by the user.

  • E4.8 “lower_latency”. Latency is related to the use of the network. If remote service requests are needed, we must send the request through the network and wait for a response from the server. This increases the waiting time to get a response, i.e., the network latency. Thus, the fewer service requests issued, the lower the global latency observed.

E5 [6]. The explanation of some of the challenges captured by the enumeration type Challenge is the following (obtained by abductive reasoning):

  • E5.1 “complexity”. The complexity of these systems is determined by: i) the heterogeneity of the devices to be connected, regarding their properties and functions but also the definition of their interfaces; ii) the requirements of real-time operation; iii) the costs of developing and maintaining the system to achieve permanent operation; iv) the financial and human consequences of a malfunction of the system.

  • E5.2 “deployment_time”. The time needed to deploy the system in production environments must be as low as possible if we want to compete with similar products. This also implies that we must deploy new functionalities and fix errors quickly. The complexity of these systems, as pointed out previously, as well as the necessary automation of the CI/CD process, requires a continuous effort to update the infrastructure (hardware and software) over the air in order to exploit the potential of its new functionalities.

  • E5.3 “latency”. Under strict real-time conditions, the latency of the network remains an issue. Perhaps 6G systems will cushion this problem, but removing it entirely is quite unlikely: the higher the available traffic capacity, the higher the expected demand. This phenomenon is analogous to the well-known problem with RAM memory, in which programs tend to occupy all the available space.

  • E5.4 “maintenance_cost”. This cost refers not only to the hardware infrastructure (devices and networks to be maintained), but also to the software infrastructure that has to be updated and the applications that require more and more resources.

  • E5.5 “reliability”. These types of applications often have strong requirements on the expected reliability. In this context, reliability must be understood as the “degree to which a system, product, or component performs specified functions under specified conditions for a specified period of time” [29]. The lack of this attribute may jeopardize customers and their resources. However, to achieve reliability, one must balance cost and risk. In very complex systems it is not possible to achieve 100% reliability, but reaching levels close to this value is feasible.

  • E5.6 “scalability”. In general terms, a system is scalable if it can grow to adapt to new and more demanding service requirements without requiring a change of architecture, only an increase in the amount of invested resources. For instance, an intelligence system for agricultural tasks is scalable if it can be adapted to new croplands (with a new area to be screened and new types of crops) by only increasing the number of resources (devices, communications), without altering the architecture or the implementation.

  • E5.7 “security & privacy”. In IoT systems in domains such as health, the privacy of the data used and the mechanisms applied to meet these constraints are crucial for the success of the system.

  • E5.8 “time_to_market”. The speed with which a new version of an IoT edge software system is released is crucial to the survival of any organization.

E6 [7]. A device located at the edge will be connected with sensors (to obtain data from the context) and actuators (to modify the context). The device is fed with sensor data, processes them locally or remotely, and uses the results to command the actuators.

E7 [8-9]. The use of techniques from the IoT edge computing domain (containers; virtual environments such as machines, networks, and servers; downlinks of wireless communication networks; orchestration and coordination) allows developers to address problems like scalability and to obtain benefits like energy saving or a lower response time.

4.5.5 Testing the theory

The last step of the theory-building process involves examining the validity of the theory. To this aim, we examine the following elements:

  1. The data from the surveys not used in the previous steps, to contrast how the theory fits the new data.

  2. The standard ISO/IEC TR 30164 (Internet of things (IoT) - Edge computing), to validate the alignment of the developed theory with this standard.

  3. The clarity and precision of the elements that are part of the theory.

  4. The extent to which the theory has been validated.

  5. The scope of the theory.

Analysis of the remaining surveys. The remaining 11 surveys were analyzed to test whether the propositions established in Sect. 4.5.3 are aligned with or contradict the data contained in the surveys. Recall that, as with the previously analyzed surveys, only the answers to questions 7, 10, 13, 14, 15, 16, 19, and 21 were parsed.

This analysis confirmed that no new constructs emerged apart from those described in Sect. 4.5.2, and no new relations were needed; therefore, no new propositions were added. Furthermore, the previously formulated propositions were validated, confirming the conclusions.

Analysis of the standard ISO/IEC TR 30164. Section 1 (Scope) of that document says: “This document describes the common concepts, terminologies, characteristics, use cases and technologies (including data management, coordination, processing, network functionality, heterogeneous computing, security, hardware/software optimization) of edge computing for IoT systems applications”.

For this reason, it makes sense to compare this standard with our theory in order to validate the theory. The main conclusions drawn were the following.

  • The main motivations for edge computing pointed out by the standard (latency, disconnected operations, the need to minimize the volume of data transmitted upstream, and data provenance) are reflected in the theory developed (Benefit::lower latency, Device_in_the_Edge::disconnected mode, Benefit::better bandwidth throughput, and Challenge::security & privacy).

  • Our theory encompasses the main classifiers (constructs) indicated in the conceptual viewpoint of the standard as follows. We associate the classifier IoT_System of the standard with the classifier IoT_Edge_Software_System of the theory, and the classifier IoT_Component of the standard with the classifier Device_in_the_Edge of the theory. However, it is worth mentioning that the theory does not distinguish between physical and digital entities, whereas the standard does include this distinction.

  • The functional viewpoint of the standard claims that “An edge computing entity can have but is not limited to the functions mentioned in 6.3.” The functions described in Section 6.3 of the standard are subsumed in the methods of the Device_in_the_Edge classifier. We should understand that these functions are those extracted from the surveys and do not represent an exhaustive list of the functions that can be assigned to an IoT_Edge_Software_System.

  • Regarding the deployment viewpoint, the standard defines two deployment models (three levels vs. four layers). Both models rest on an underlying distributed architecture, as pointed out in the theory developed.

In summary, for the aforementioned reasons, we consider that the theory is perfectly aligned with the standard.

Clarity and precision. The constructs and propositions of a theory should be clear and precise so that they are understandable, internally consistent, and free of ambiguities. In our case, the definitions and descriptions of both constructs and propositions have been expressed in UML and, in the case of the propositions, also in OCL. The semantics of each of the elements that appear in the UML/OCL diagrams are described in their respective specifications [41] and [40] (see also [48]). Due to the formal language applied, there is no room for ambiguity or inconsistency, problems that would have been detected by the tool used to draw the diagrams [37].

It is worth mentioning that the semantics of some of the operations/attributes defined in some of the classifiers may be misleading; however, clarifying them would mean artificially adding information from the researchers’ knowledge, since it is not reflected in the documentation analyzed. In this sense, we limited ourselves to defining/characterizing the elements that arise in the theory based only on the data extracted from the surveys, trying not to include any extra knowledge.

Extent to which the theory has been validated. Following [52], we must differentiate between two terms: the scope of interest and the scope of validity of a theory. In our case, the scope of interest was explained in Sect. 4.5.1.

On the other hand, quoting [52], “The theory’s scope of validity refers to that part of the scope of interest in which the theory has actually been validated. The scope of validity of a theory is the accumulated scopes of validity of the results of the studies that have tested the theory, or the studies from which the theory has been generated”. In our case, the scope of validity covers the 18 surveys used to generate the theory, plus the remaining 11 surveys used to validate it.

The scope of the theory. In general, this concept refers to the fact that conditions must be explicitly and clearly specified, so that the domain or situations in which the theory should be (dis)confirmed and applied are clear. In our case, the scope was set in Sect. 4.5.1 and graphically depicted in Fig. 2. Roughly speaking, the theory can be applied to IoT edge software systems.

4.6 Discussion

As noted by Glaser [15], “The task of the GT researcher is to generate a theory within the chosen data boundaries, not a formal theory”. The same author also highlights “The researcher, if using the classical GT method, is set up to write – and must – to conclude a substantive GT. He/she should stop, write.”

Independently of the description in the standard ISO/IEC TR 30164, the theory developed in this work is based on the perception that the professionals involved in the surveying process have about what edge computing is. Indeed, the standard is relatively recent (April 2020), so its adoption by the industry, if finally reached, will take some time.

IoT edge computing is a computational paradigm within the IoT framework characterized by the aim of moving computation as close as possible to the data source. This computation is carried out on edge devices that frequently have severe limitations in computing speed and storage resources. These limitations are common to several types of devices (gateways, servers, microcontrollers, etc.) and condition the functionality they can host (filtering, video processing, storage and analytics, etc.). As a result of carrying out the computation at the edge, we obtain a set of benefits (better performance, less response time, greater efficiency, etc.). However, a set of problems related to this paradigm must also be addressed if it is to be applicable to the IoT framework (deployment time, reliability, scalability, etc.). Finally, we would like to mention that all the applications supported by this paradigm must present a highly distributed architecture, with interconnected remote nodes in different topologies.

From the analysis conducted, it is possible to deduce that all the companies identify several dimensions that must be taken into account in, or are affected by, the application of edge computing to the design of IoT applications. The dimensions identified by the vast majority of companies are computing, networking, functionality, and technology. Indeed, all of these dimensions are affected, because all organizations highlight that a distributed architectural design is crucial for this paradigm and that greater control of this distribution is also necessary to achieve the desired QoS. With respect to the challenges highlighted, a lower consensus may initially be perceived, but a more thorough analysis shows otherwise: the organizations interviewed identify different challenges, but these are closely related. Thus, an important challenge is the complexity of managing these applications, which is highly related to other challenges such as the need to automate this management and the deployment of functionalities, which in turn lead to better control of the scalability, reliability, and maintenance cost of the systems.

Furthermore, the deployment and monitoring of highly distributed applications, where quality can be affected by several highly related dimensions, entails greater management complexity. Companies demand tools that allow them to automate this process, to state their needs in a simpler way, to automate how applications should scale in these highly distributed environments, and to keep the operational cost under control. Therefore, one of the key conclusions of this study is that methodologies, techniques, and tools are needed in this direction, so that organizations can apply this paradigm more boldly and confidently. In this regard, the GT study has detected some challenges that have not been described in the literature so far, such as those related to the delivery and deployment of IoT edge applications. This could indicate that further research is required to introduce or adapt existing paradigms that have proven successful in dealing with highly distributed workloads, such as DevOps or GitOps. In that case, new research is required to analyze which other benefits and challenges arise when adopting these paradigms in the domain of IoT edge computing.

5 Threats to validity and reliability. Limitations

Criteria for judging the quality of the research design are key to establishing its validity (i.e., the accuracy of the findings) and reliability (i.e., the consistency of the procedures and of the researcher’s approach) [8, 59]. We considered the quality criteria defined by Lincoln and Guba [34] for qualitative research, as follows:

  • Credibility is also referred to as trustworthiness, i.e., the extent to which conclusions are supported by rich, multivocal evidence. The strategy to mitigate this threat was data triangulation. We received surveys from 29 companies, which means that we collected data at different times and locations and from different populations, as can be seen in Table 1.

  • Resonance is the extent to which a study’s conclusions make sense to (i.e., resonate with) participants. A key strategy to that end is member checking, so some participants received preliminary results to ensure the correctness of our findings.

  • Usefulness is the extent to which a study provides actionable recommendations to researchers, practitioners, or educators, and the degree to which the results extend our cumulative knowledge. The usefulness of this study lies in validating that the vision of IoT companies aligns with the standards generated in the IoT domain. We assume that the industry has also been involved in defining the constructs and relationships in these standards.

  • Transferability shows whether the findings could plausibly apply to other situations. Data were iteratively gathered from 29 companies, a number large enough to build a complete picture of the phenomenon. This multiplicity is what provides the basis for “theoretical generalization”, where the results are extended to cases that have common characteristics and hence for which the findings are relevant [58]. Furthermore, it is necessary to consider that the theory is substantive (i.e., local to the analyzed surveys). Like any grounded theory study, the result is only applicable to the domain and context being studied and therefore cannot be assumed to be applicable to other contexts or in general.

  • Dependability shows that the research process is systematic and well documented and can be traced. The public repository contains all the data and procedures used in this research so that other researchers can replicate it.

  • Confirmability assesses whether the findings emerge from the data collected from cases and not from preconceptions. As explained in Sect. 4.5.5, we deliberately omitted any interpretation of the analyzed data, even if this may lead to ambiguities or a vague interpretation of the data. Additionally, as pointed out in Sect. 4.5.4, to explain some phenomena we applied abductive reasoning: we assume the premise to be true and seek the most probable explanation.

5.1 Limitations

As with any research methodology, there are limitations to our choice of research methods. The first limitation of our study lies in the number of surveys. The goal of our study was not to generalize a phenomenon observed in a sample to a population; instead, we generated a theory about a complex phenomenon from a set of observations obtained through theoretical sampling. Grounded Theory does not support statistical generalization. Although the proposed theory appears widely applicable, organizations with different software development cultures in the IoT edge computing domain could have different perspectives.

6 Related work

During the last few years, different works have analyzed the main characteristics of edge computing, its application to specific domains (such as IoT), and the open challenges that should be addressed to increase its adoption by the industry.

Specifically, few resources can be found that apply Grounded Theory to create hypotheses and theories by analyzing the industry’s perception of edge computing, as also stated by [13]. Some works, such as the one presented by Mengru Tu [55], studied the intention to embrace IoT in Indian organizations, focusing on the logistics and supply chain management area and identifying that benefit and cost perceptions weighed more than trust in the technology. Furthermore, Radanliev et al. [45] use Grounded Theory to identify current gaps in cyber risk standards and policies, defining the design principles of a future cyber risk impact assessment for the IoT. The same authors use Grounded Theory to build a conceptual cascading model for the future integration of cognition in Industry 4.0 [44], identifying current and future challenges in the use of Artificial Intelligence in cyber-physical systems.

However, a greater number of works can be found in the literature that analyze these paradigms, through surveys or systematic literature reviews, to characterize edge computing and its benefits and challenges. Some of these works have been analyzed in order, first, to better outline some of the questions presented to the industry in this work and, second, to compare the conclusions obtained from applying Grounded Theory with those obtained by analyzing the related works. A summary of the analyzed works can be found in Appendix B.

In summary, the reviewed related works highlight some benefits of edge computing that have also been identified in this work by analyzing the responses of the industry, such as the improvement in the quality of service, the performance of the applications, a better user experience, the increase in data privacy, the decrease in network and cloud overhead, and also the decrease in energy consumption. Nevertheless, a notably key benefit for the industry is to better satisfy business needs. This is a crucial benefit for any company, yet one that related work usually addresses only at a second level (Table 13). Only [39] identifies that one of the benefits of edge computing is the ability to create more innovative solutions. This is because academia usually focuses on more technical aspects, so greater coordination between both worlds is required to address and provide more business-related benefits. Therefore, the research community needs to invest effort in showing how this paradigm can be applied by the industry to create solutions that better meet the needs of businesses and their customers.

Linked to these benefits, from the companies’ responses we can also identify a series of challenges that still need to be addressed more deeply, some of which are also highlighted in the related works analyzed. For example (as can be seen in Appendix B), both companies and academia identify complexity, latency, cost, scalability, security and privacy, and certifications as challenges that need to be further addressed. Alignment between the two worlds will allow these challenges to be addressed in an agile and successful way and will provide greater usefulness.

However, we have identified some challenges that are relevant to companies and have not been described in the literature so far (Table 12), such as the complexity of systems using edge computing, the time required to deploy these hyperdistributed systems, the speed of product delivery, how to decrease the time to market when applying these solutions, and whether certifications are needed to guarantee the quality of these distributed systems and of the people developing them. As can be seen, these challenges are closely related to the development and maintenance of IoT applications, which is the main concern of companies. On the industry side, companies are the first to detect these challenges as relevant because they need to solve them to massively develop systems that apply this paradigm. On the research side, they are currently not core challenges because, although important, they tend to be addressed once they are demanded by the industry. This gap between the two worlds may show that the industry’s need to apply this paradigm is closer than expected. Therefore, more effort must be invested to make the application and adoption of edge computing smoother. Furthermore, by solving these challenges, a success similar to that of cloud computing, in which these challenges were also addressed, can be achieved.

Therefore, although the main benefits and challenges of edge computing are similar in the research and industry contexts, there are some issues, mainly related to the impact on business, such as the improvement of the quality of the applications and the decrease in the time to market, that have to be addressed in depth in both contexts to increase the adoption of edge computing.

7 Conclusion

In recent years, the expression “edge computing” has become familiar in the IoT domain. However, not all stakeholders seem to share the same semantics for this expression, leading to confusion in its implementation and application.

The aim of this work has been to develop comprehensive qualitative research that sheds some light on the meaning of edge computing for the industry. The theory developed in this work comprises nine constructs and nine propositions that define the ingredients that, according to the companies interviewed, are central in the edge computing paradigm. From this point of view, the theory satisfies the parsimony criterion (the degree to which a theory is economically constructed with a minimum of concepts and propositions) and is a substantive and analytic theory.

The main contribution of this work is to show, by constructing a theory, that industry and the standard ISO/IEC TR 30164 are mostly aligned. This leads us to expect that, in the near future, the interoperability issues experienced in the edge computing world will vanish or at least become less predominant. It is worth mentioning that the best alignment between industry and the standard is achieved in the discussion of which functionalities the edge computing paradigm should support. Our results show unanimity in the expected benefits of the paradigm in terms of resource consumption, security, performance, etc.

There is also clear agreement on the sketch of the challenges that the paradigm must address, even though many of them are not specific to the paradigm itself but to software engineering in general. Among these shared challenges, concerns were detected about how to address the construction and maintenance of complex systems, as well as reliability and efficiency issues. Other challenges are more characteristic of the IoT environment, such as how to increase the computational performance and storage of devices at the edge and how to improve the use of communication networks.