1 Introduction

Information systems increasingly link to the physical world. Technological advancements and declining unit costs of sensor technology, combined with increased connectivity, drive the spread and complexity of the Internet of Things (IoT) (Wortmann and Flüchter 2015) or so-called cyber-physical systems (Lasi et al. 2014; Lin et al. 2017). Today, billions of sensors feed information systems (IS) with data describing physical phenomena (such as temperature, pressure, humidity, velocity, chemical components, or material composition) across many areas ranging from industrial applications (e.g., smart factories) to consumer applications (e.g., smart watches). These sensors form a key foundation for AI-based information systems that apply machine learning and generate analytics-based solutions. In particular, sensor data represents an essential building block of digital twins, an important phenomenon of interest for the BISE community (van der Aalst et al. 2018). As digital duplicates of real assets in the physical world, digital twins rely on sensor technology for continuous data acquisition. For example, the digital representation of a production plant (captured via physical or virtual sensors) may be used to optimize the production process by means of simulation or to develop predictive maintenance services (Tao et al. 2019). The increasing importance of sensors and IoT-based data for IS is also evident from the rapidly growing number of articles in academic IS journals dealing with ‘sensors’, which has increased more than tenfold within the last two decades.

This development of cyber-physical systems is drawing attention to the question of how data can be captured from the physical world and be fed into a connected IS: the condition of the physical world can either be “directly” observed (by a physical sensor) or indirectly derived by fusing data from one or more physical sensors, i.e., applying virtual sensors.

Typically, embedding physical sensor output into IS is subject to a number of limitations: equipping assets with sensors is cost-intensive, sensor signals are noisy or may interfere with each other, sensors may lose accuracy over time, or their use is even technically not feasible due to spatial or environmental conditions.

However, software-based virtual sensors offer an additional abstraction layer built on digital representations of sensor hardware. They issue signals that aggregate input from physical sensors; thus, they may overcome the limitations mentioned above, offering lower operating cost or increased reliability, agility, or even indirect measurement of physically non-measurable properties. In addition, virtual sensors can make low-level physical sensor information more broadly available for application in cyber-physical systems: they foster collaboration on the level of sensors (e.g., improving accuracy of individual sensors), on the level of assets (e.g., replacing or substituting individual sensors) and even on the level of organizations (e.g., enabling different service providers to offer services based on the same sensor hardware). Thus, while physical sensors typically feed specific, isolated applications only, virtual sensors become the primary source of physical world data for generalized and connected cyber-physical systems.

While the basic concept of virtual sensors dates back to Muir (1990), a number of unresolved challenges (such as data access and availability, standardization, and platform deployment) still limit its application within IS and stand in the way of effective and efficient cyber-physical systems and IoT-based solutions. In this article, we clarify the terminology around virtual sensors and describe their advantages (Sect. 2), then present the virtual sensor concept and differentiate four levels of application, from pure sensor virtualization to dynamic cooperative sensing (Sect. 3). We emphasize the importance of systems thinking for the effective application of virtual sensors (Sect. 4) and outline research challenges for the BISE community (Sect. 5).

2 Physical Versus Virtual Sensors

In general, sensors are technical devices that monitor their environment and continuously produce signals at a regular frequency (either in analog form, like electric impulses, or digitally, like measurement data). A physical sensor is a sensor that reacts to a physical stimulus (e.g., temperature, light, pressure, magnetism, or a particular motion) and transmits a resulting impulse, typically through electrical signals that can be captured and stored in digital form (Fraden 2016; Merriam-Webster n.d.). In contrast, a so-called virtual sensor is a pure software sensor that autonomously produces signals by combining and aggregating signals that it receives (synchronously or asynchronously) from physical or other virtual sensors (Kabadayi et al. 2006). Fig. 1 illustrates various constellations of virtual sensors (VS): (a) a virtual sensor based on physical sensors (PS) only, (b) a virtual sensor based on another virtual sensor only, and (c) a virtual sensor based on both physical and virtual sensors. Thus, virtual sensors only process data originally gathered by physical sensors. The data they deliver is then typically embedded into more complex functions or software applications that merge this input with data from other sources and execute analytics algorithms on the combined set of data.

Fig. 1 Various constellations of virtual sensors
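The three constellations can be illustrated with a minimal Python sketch. It assumes that a sensor is simply a callable returning the current reading; the helper `virtual_sensor`, the stand-in sensors `ps1` and `ps2`, and all values are hypothetical and not taken from the cited literature.

```python
from statistics import mean
from typing import Callable, Sequence

# Assumption: a "sensor" is anything that returns a reading when called.
Sensor = Callable[[], float]


def virtual_sensor(sources: Sequence[Sensor],
                   fuse: Callable[[Sequence[float]], float]) -> Sensor:
    """Build a virtual sensor that fuses the current readings of its sources."""
    def read() -> float:
        return fuse([source() for source in sources])
    return read


# Two stand-in physical sensors with fixed, illustrative readings.
ps1: Sensor = lambda: 20.0
ps2: Sensor = lambda: 22.0

vs_a = virtual_sensor([ps1, ps2], mean)                 # (a) physical sources only
vs_b = virtual_sensor([vs_a], lambda xs: xs[0] + 1.0)   # (b) virtual source only
vs_c = virtual_sensor([ps1, vs_a], max)                 # (c) physical and virtual

print(vs_a(), vs_b(), vs_c())
```

Because a virtual sensor exposes the same interface as its sources, it can itself serve as a source for further virtual sensors, which is what constellations (b) and (c) rely on.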

By fusing and processing multiple physical sensor inputs, virtual sensors are able to measure abstract conditions or process variables that may not be physically measurable themselves (Albertos and Goodwin 2002; Kabadayi et al. 2006) – as, for instance, a type of sealing defect indicated by a function of several process signals (Martin and Kühl 2019): this condition could not be detected by any physical sensor built into the sealing itself. In existing literature, however, the distinction between physical and virtual sensors is fuzzy as most physical sensors are typically described as not capturing a measurand in a direct way. In fact, most physical sensors measure the phenomena of interest (e.g., pressure or force) by using physical correlations (e.g., the piezoelectric effect) to translate the variable to be measured into a processable electric signal. Thus, most real-world sensors already include additional hardware and software components for signal processing (Fraden 2016) – and in a strict sense would in fact be virtual sensors.

In the literature, the general idea of combining several (homogeneous or heterogeneous) sensors has been discussed for decades under different terms. A sensor network comprises a number of “sensor devices that are deployed in an ad hoc fashion [to] cooperate on sensing a physical phenomenon” (Tilak et al. 2002, p. 28). Nodes in sensor networks usually have no or limited computing power and, thus, transmit the sensed data to a central location where it can be processed further (Yick et al. 2008). While the concept of sensor networks focuses on connecting sensors at the physical (i.e., hardware and connectivity) level, sensor fusion describes the merging of different sensors at the data and information level. The concept of sensor fusion denotes strategies that serve to overcome issues of individual physical sensors (such as limited spatial and temporal coverage, uncertainty, or limited robustness). It describes the combination of “information from multiple sensors and sensor types to increase the accuracy and to resolve ambiguities in the knowledge about the environment” (Chiu et al. 1986, p. 1629). In other words, fusion enables both more precise measurements of one specific phenomenon (e.g., temperature at a specific location within a system) and abstract representations of diverse signals (e.g., a defect within the system).

Based on these concepts, the term virtual sensor (sometimes also referred to as soft sensor) has evolved to denote the implementation of sensor fusion based on a sensor network. However, the term is still not uniformly defined, and we also observe other, slightly different meanings. Some authors emphasize architectural aspects and describe virtual sensors as a pure software abstraction layer without further specifying data processing aspects (Madria et al. 2014; Bose et al. 2019). Other authors only address certain aspects of virtual sensors, such as the ability to leverage different data sources in order to measure an unobservable target, without considering aspects like the pure virtualization of a single physical sensor (Kabadayi et al. 2006; Tegen et al. 2019). To address this discord, this article aims to consolidate the different definitions and to propose a coherent conceptualization.

Virtual sensors serve to overcome a number of weaknesses of purely physical sensors. First, there is the obvious advantage of significantly lower costs of software compared to hardware, applying to both initial investment and ongoing maintenance (Tegen et al. 2019). Second, virtual sensors provide an interesting alternative when a physical sensor cannot be placed in the preferred position due to spatial conditions (e.g., lack of space for a sensor) or a hostile environment (e.g., exposure to acids or extreme temperatures). The resulting delay or inaccuracy of the measurement, when installing the sensor in a less suitable spot, may be compensated by virtual sensors (Tegen et al. 2019). Third, virtual sensor technology can reduce signal noise and, thus, increase confidence in the signals, when a sensor’s output is confirmed by other sensors measuring the same phenomenon (Albertos and Goodwin 2002). Fourth, so-called drifts of physical sensors are a well-known phenomenon rendering a sensor inaccurate over time due to, e.g., wear or pollution (Baier et al. 2019). These drifts can be recognized or compensated by virtual sensors. Finally, virtual sensors are extremely flexible and can be redesigned as required, while physical sensors, once installed, often can only be repositioned by mechanical intervention (Neidhardt et al. 2008; Tegen et al. 2019).

In addition to this functionality of “replacing” physical sensors, virtual sensors are used to deliver a “higher level” output as a function of various, heterogeneous sensor signals (as stated above). For instance, they may transform various sensor data into information about the condition of an asset (e.g., the wear and tear level of an industrial robot) forming a small-scale information system themselves. Based on this output, better decisions could be made (e.g., the scheduling of maintenance).

3 Key Characteristics of Virtual Sensors

Virtual sensors represent a software layer that provides indirect measurements of a process variable or an abstract condition based on data gathered by physical (or other virtual) sensors leveraging a fusion function. In order to clearly describe the concept of a virtual sensor and also to identify key properties, Fig. 2 graphically illustrates its building blocks and their relationships. In the following, we first elaborate on a conceptual framework of a virtual sensor and its inherent assumptions. In a second step, we focus on describing different application levels of the virtual sensor concept.

Fig. 2 Virtual sensor concept

An asset describes an object, subject, or system which, as a whole or in parts, is to be monitored or observed in any form. It is a delimitable, natural or artificial “thing” consisting of various components that can be regarded as a common whole due to certain relationships between them. Examples include technical systems such as machines, cars, or airplanes, but also social or sociotechnical systems such as patients to be monitored or a work environment.

Data sources provide data streams about the asset generated by physical or other virtual sensors at a regular frequency. This sensor data may originate from the same asset or other assets in cyber-physical systems. The data can be of any type (e.g., numeric, categorical, etc.) and is typically made available in a continuous fashion. Nevertheless, interruptions of the data streams, time delays and batchwise provision of the data are also conceivable. Moreover, the number of sources or their format may dynamically change over time.

A data fusion function describes a transformation procedure of any complexity which converts source data into a desired output variable or information. The simplest fusion function reproduces the input signal without any modification. Slightly more elaborate, but still simple, fusion functions apply methods such as scaling, filtering, linearization, aggregation, smoothing, or extrapolation to the source data in order to provide a final measurement result (Albertos and Goodwin 2002). These functions depend on the characteristics of the sensor and the sensing environment. Moreover, machine learning-based functions are applicable, which are able to infer a target of interest from data sources of different resolution, availability, type and form (Meng et al. 2020).
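To make the range of simple fusion functions more tangible, the following Python sketch shows three of them: an identity function, a linear scaling, and a moving-average smoother. The names `identity`, `make_scaler`, and `MovingAverage` as well as the window size and sample values are illustrative assumptions, not taken from the cited sources.

```python
from collections import deque
from typing import Callable, Deque


def identity(value: float) -> float:
    """Simplest fusion function: reproduce the input unchanged."""
    return value


def make_scaler(gain: float, offset: float) -> Callable[[float], float]:
    """Linear scaling, e.g. converting a raw signal into a physical unit."""
    def scale(value: float) -> float:
        return gain * value + offset
    return scale


class MovingAverage:
    """Smooth a stream with the mean of the last `window` samples."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self._buffer: Deque[float] = deque(maxlen=window)

    def __call__(self, value: float) -> float:
        self._buffer.append(value)  # oldest sample drops out automatically
        return sum(self._buffer) / len(self._buffer)


smooth = MovingAverage(window=3)
smoothed = [smooth(x) for x in [1.0, 2.0, 6.0, 2.0]]
print(smoothed)
```

The smoother keeps state between calls, which reflects that a fusion function operating on a data stream may depend on earlier samples and not only on the current one.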

The derived measurements produced by a data fusion function represent the virtual sensor data. This time series data can be of any type and form but should be directly attributable to the asset to be observed (e.g., a part of the system).

In order to persist this data, a digital twin is required. According to Dietz and Pernul (2020), a digital twin represents an asset’s virtual counterpart that can be leveraged to digitally mirror and constantly manage it. It combines and integrates an asset’s data sources and controls their availability and validity. This includes providing meta-data, semantics and context information that refines data into information, e.g., the interpretation of a transmitted floating-point number as the measure of electric current in a particular module. Additionally, the digital twin provides the necessary interfaces between the virtual and the real world, and enables bi-directional data sharing as well as synchronization (Alam and El Saddik 2017). Thus, virtual sensors can serve both as data sources for digital twins and as their integrators, since a digital twin is also an integral part of the virtual sensor concept.

Based on these generic building blocks, different degrees of complexity, expansion stages or facets of virtual sensors can be observed in applications or conceptual descriptions that appear in literature. The typology illustrated in Fig. 3 schematically describes different levels of application on the interaction and data level. The degree of complexity with regard to data integration and fusion increases from left to right.

Fig. 3 Application levels of virtual sensors

3.1 Sensor Virtualization

The simplest form of a virtual sensor obtains data from exactly one physical sensor and mirrors it either completely unchanged (Madria et al. 2014; Ko et al. 2015) or in aggregated (Corsini et al. 2006), cleaned, or otherwise modified form (Albertos and Goodwin 2002). This kind of virtual sensor is very common in practice, as advances in communication technologies and increased bandwidth allow measurement data from many physical sensors to be made digitally available via cloud infrastructures (Fraden 2016; Matt 2018). A typical example of a virtualized sensor based on a single input signal is the pedometer in smartphones: simple algorithms transform the output signal of an accelerometer into the number of steps taken over time (Abadleh et al. 2017). An accelerometer, in turn, is a force sensor with a seismic mass attached, which leverages the piezoelectric effect to translate a force into a proportional, measurable electric signal (Gautschi 2002). An acceleration sensor can also be leveraged, for instance, to detect abnormal behavior in mechanical components such as pumps or bearings through defined threshold values in order to initiate maintenance actions (Donelson and Dicus 2002).
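A pedometer of this kind can be sketched in a few lines of Python. The sketch counts upward threshold crossings of the acceleration magnitude; this is a deliberately naive stand-in and not the algorithm of Abadleh et al. (2017). The function name `count_steps`, the threshold of 11.0 m/s^2, and the synthetic trace are assumptions for illustration.

```python
import math
from typing import Iterable, Tuple


def count_steps(samples: Iterable[Tuple[float, float, float]],
                threshold: float = 11.0) -> int:
    """Count upward crossings of `threshold` by the acceleration magnitude.

    `samples` are (x, y, z) accelerations in m/s^2. The default threshold
    is an illustrative assumption, slightly above gravity (about 9.81 m/s^2).
    """
    steps = 0
    above = False
    for x, y, z in samples:
        magnitude = math.sqrt(x * x + y * y + z * z)
        if magnitude > threshold and not above:
            steps += 1  # rising edge: a new peak begins
        above = magnitude > threshold
    return steps


# Synthetic trace: resting at about 1 g with two short peaks.
trace = [(0, 0, 9.8), (0, 0, 12.5), (0, 0, 12.0), (0, 0, 9.7),
         (0, 0, 9.8), (0, 0, 13.1), (0, 0, 9.9)]
print(count_steps(trace))  # 2
```

The same thresholding pattern applies to the condition monitoring case mentioned above, where a defined limit on a vibration signal triggers a maintenance action.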

3.2 Competitive Sensing

Sensor configurations where each sensor provides independent measurements of the same property are called competitive or redundant. If several sensors, possibly with different accuracies, perceive the same features in the environment, overall accuracy may increase, while uncertainty and transmission volume are reduced, since less data needs to be transmitted (Luo and Kay 1989; Tegen et al. 2019). Multiple sensors providing redundant information can also increase reliability in the event of a sensor failure or malfunction (Luo and Kay 1989). Furthermore, the influence of drifts caused by decreasing sensor accuracy can be detected and optionally corrected (Dornfeld and DeVries 1990; Baier et al. 2019). Guérin et al. (2003) present an exemplary implementation of competitive sensing, in which the signals of two microphones are leveraged to improve the audio quality for hands-free car kits.
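One common way to fuse redundant readings is an inverse-variance weighted mean, sketched below in Python. The sketch assumes unbiased sensors with independent noise and known variances; the function name `fuse_redundant` and the example values are hypothetical, and the cited works may use different fusion schemes.

```python
from typing import Sequence, Tuple


def fuse_redundant(readings: Sequence[float],
                   variances: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean of independent redundant readings.

    Returns the fused estimate and its variance. Assumes unbiased sensors
    with independent noise and known, strictly positive variances.
    """
    if len(readings) != len(variances) or not readings:
        raise ValueError("need one variance per reading")
    weights = [1.0 / v for v in variances]  # more precise sensors weigh more
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, readings)) / total
    return estimate, 1.0 / total


# Two sensors observing the same temperature with different accuracies.
estimate, variance = fuse_redundant([20.0, 22.0], [1.0, 4.0])
print(estimate, variance)
```

Under these assumptions, the variance of the fused estimate is lower than that of the best individual sensor, which corresponds to the accuracy gain described above.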

3.3 Static Cooperative Sensing

Cooperative sensing leverages data provided by several independent sensors to derive information that would not be available from an isolated view. However, one problem is the increased sensitivity to inaccuracies of the individual sensors involved (Brooks and Iyengar 1998). Moreover, because different types of sensors are involved, suitable fusion functions for cooperative sensing are usually more complex than those for competitive sensing. An example is a neural network predicting NOx at cylinder level based on individual cylinder pressures and a downstream cylinder-aggregated NOx sensor (Henningsson et al. 2012). These cylinder-specific measurements can support the design of improved engines that meet customer demands for low fuel consumption and comply with legal regulations. In the static case, the incorporated sensors are available at any time, so that the fusion function may permanently access a constant set of features.

3.4 Dynamic Cooperative Sensing

When the permanent availability of physical sensors is not guaranteed, dynamic fusion functions provide flexible adaptation to systemic changes (Tegen et al. 2019). Reasons can be dynamic changes in the system itself, such as the omission or addition of a system component equipped with sensors, as well as the limited availability of physical sensors for technical or economic reasons. An example would be the observation of the motion profile of a person who, depending on the time of day, is carrying either a smartphone or a fitness tracker with different built-in sensors. Another example is the pedestrian recognition function of autonomous vehicles, which can rely on camera signals in good weather conditions, but not in fog or at night. Dynamic cooperative sensing, thus, requires a complex fusion function that is able to handle dynamic feature availability in order to adequately accommodate the context as well as the accuracy of a measurement (Mihailescu et al. 2017).
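The core idea of handling changing source availability can be sketched in Python as follows. The sketch deliberately uses a very simple fusion function (a weighted mean over whichever sources currently deliver a value), whereas real dynamic cooperative sensing typically requires far more complex models. The function name `fuse_available`, the source names, the weights, and the confidence values are assumptions for illustration.

```python
from typing import Mapping, Optional, Tuple


def fuse_available(readings: Mapping[str, Optional[float]],
                   weights: Mapping[str, float]) -> Tuple[float, float]:
    """Weighted mean over the sources that currently deliver a value.

    Returns (estimate, coverage), where coverage is the share of the total
    weight that was available; it can serve as a rough quality indicator.
    """
    available = {name: value for name, value in readings.items()
                 if value is not None}
    if not available:
        raise ValueError("no data source available")
    used_weight = sum(weights[name] for name in available)
    estimate = sum(weights[name] * value
                   for name, value in available.items()) / used_weight
    coverage = used_weight / sum(weights.values())
    return estimate, coverage


# Hypothetical pedestrian-detection confidences from two sources.
weights = {"camera": 0.6, "radar": 0.4}
clear = fuse_available({"camera": 0.9, "radar": 0.7}, weights)
fog = fuse_available({"camera": None, "radar": 0.7}, weights)  # camera unusable
print(clear, fog)
```

Returning the coverage alongside the estimate lets downstream applications account for the reduced accuracy of a measurement that rests on fewer sources.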

4 Towards a Systems Thinking Mindset

As described above, there are different application levels of virtual sensors. Higher levels allow for increasing accuracy, reliability and informational value of a virtual sensor, but hinge on the use of a richer set of data. This in turn calls for inclusion of a broader set of data sources across different assets (as in Sect. 3) or even across different organizational entities, and, thus, the extension of the system boundary along these dimensions (cf. Fig. 4). Thus, higher performance of a virtual sensor is linked to the inclusion of additional resources from the enlarged system – as generally postulated for service system engineering in IS (Böhmann et al. 2014) and evident from the advancement of cross-industry platforms (Beverungen et al. 2020).

Fig. 4 System boundary extension potentials

With respect to the different application levels of virtual sensors described in Sect. 3, access to a broader set of data may improve sensor performance within any particular level and may also allow progression to the next level:

In sensor virtualization, where only one sensor signal is used, there may be different options for sensor positioning affecting the measurable correlation to the actual target variable. The more options for picking a sensor signal from own assets or even from those of other organizations, the more accurately the target variable may be measured. However, this positioning may entail the permission or support of other entities: A humidity or temperature sensor at a public weather station may be a good (isolated) data source for estimating the weather conditions for a particular target location nearby (Fig. 4, scenario I). Access to and a switch to a similar sensor at other self-owned weather stations may yield even better predictions (Fig. 4, scenario II) (Maniscalco and Rizzo 2017), while additional access to private weather stations would offer even more options to identify the best suited sensor (Fig. 4, scenario III). Therefore, an application-specific assessment of the benefit (increase of sensor performance) against the potential costs for the extension of the system boundaries (price of integration) is required.

In competitive sensing, more fusion options become available when different data sources can be joined: in the example above, simultaneous access to all available sensors of the different weather stations and the triangulation of their (“competitive”) signals might improve sensor performance.

For static and dynamic cooperative sensing, sensors tapping additional data sources for different signals may be the key to adequate performance: in an industrial context, the condition of a sealing cannot be monitored on the basis of an individual sensor or type of signal. Only a higher-level cooperative sensing solution achieves this, when sensor data across different assets in the operational process are combined (Martin and Kühl 2019). This, however, calls for extending the system boundary around several industrial assets provided by different manufacturers (Fig. 4, scenario III), which requires interoperability, connectivity and a common platform.

Thus, in designing effective virtual sensor solutions – as part of larger information systems – we need to strive for exploiting data sources across assets and organizations. Joining physical sensor data will allow the creation of virtual sensors that can exploit connections and correlations among individual system components (e.g., assets or organizations). Thus, completely new avenues for the design of information systems and value co-creation will emerge: The BISE community is to contribute design knowledge and concrete methods to systematically develop virtual sensor concepts across assets and organizations. This also encompasses the economic evaluation of the trade-offs between benefits of higher precision and the costs of extending the system’s boundaries.

Today most decisions on necessity, type and position of physical sensors in different assets are made by companies reflecting their own individual needs (Ji and Zha 2004) limiting data availability. In addition, even data already available within a system is not sufficiently shared with other actors (e.g., customers, suppliers) (Chanson et al. 2019). On the one hand, this is caused by a lack of technical solutions, as the exposure of data to other actors is limited by a lack of data standardization, insufficient exchange platforms and still low communication bandwidths (Matt 2018; Chanson et al. 2019; Martin et al. 2020). On the other hand, data is perceived as a valuable resource that needs to be protected and should not be shared at all (Zhang et al. 2008; Spagnoletti et al. 2015; Chanson et al. 2019). Accordingly, suitable approaches need to be developed to commercialize data as resources in order to create mutual benefits.

An example would be a scenario in which an OEM of a vehicle fleet obtains data from the rain sensors below the windshields (which are currently only used to activate the wipers and the headlights). If this data could be exposed on a suitable platform, it might be integrated into advanced local weather forecast models. Such an IoT platform would have to provide interfaces to receive and manage a huge variety of data from diverse actors, to ensure enterprise-grade security, and to manage access for other participating actors. Furthermore, such a platform would have to enable the actors to filter the potentially most suitable data sources from the multitude of options for any particular purpose. Not only meteorological institutes could benefit from this, but also other drivers in traffic, who might be provided with an individualized alert by their vehicle or an external navigation app. A multitude of potential applications of these and similar scenarios are conceivable, once not only the technical exchange of data is feasible, but also adequate incentives and remunerations for data providers are in place.

5 Challenges for Information Systems Research

The previous paragraph already pointed to the challenges that the application of virtual sensors poses to information systems research and to which the BISE community could contribute:

First, the use of physical sensors in the design and construction of assets has to be informed by potential uses of the produced data in “downstream” virtual sensors. Sensorization of assets has to be planned purposefully to enable the particular data-based, digital services that are to be built and run on the generated data. This requires a customer-oriented mindset that, in the manner of design thinking, tries to anticipate user information needs and makes it possible to equip assets with the appropriate sensor technology. The sensor configuration can either be realized while designing an asset (proactive sensorization) or be retrofitted to respond quickly to needs that were not known during the initial design phase (reactive sensorization). In particular, the ability to retrofit sensor technology by means of virtual sensors adds ample possibilities to satisfy needs that have arisen “post-design”, even without additional hardware. In both options, however, methods are needed that help to “reverse engineer” products with regard to sensors: if the customer or operator of a milling machine needs certain data on the asset’s usage to run effective predictive maintenance, the manufacturer may be able to generate additional value by installing sensors to provide this data. The manufacturer will only be able to do so, though, with the awareness, methods and tools to elicit the customer’s need for information.

Second, in order to use the potential of joining more data sources, in particular for cooperative sensing, easy, intuitive, and secure exchange of data needs to be enabled. Interoperability standards and information exchange platforms for IoT-based, cyber-physical systems have to be developed. In the concrete sealing sensor example above, a whole range of data from different actors within the value network is required in order to draw conclusions about the condition of the seal via cooperative sensing. Although this data is already collected for isolated use, it has not yet been shared. With data being kept in non-standard, proprietary or poorly documented formats, manual pre-processing could prove the added value of the virtual sensor, but it has so far not been possible to implement it for efficient productive use (Martin and Kühl 2019). Therefore, there is an urgent need to develop uniform communication standards for sensor data, which provide detailed meta-information, semantics and context in addition to the actual measured values, such as the unit of measurement, measuring range, time of measurement or update frequency. This would enable companies to provide sensor data for other actors in a simple manner on dedicated exchange platforms. The quest for these platforms has already begun: the International Data Spaces (IDS) initiative aims to design and develop a platform for trusted and secure data exchange, even beyond sensor data (Otto and Jarke 2019; International Data Spaces Association 2020). This endeavor also reveals that data sovereignty in particular seems to be a limiting factor for inter-organizational data exchange. Although the initiative shows that the questions and challenges identified in the context of virtual sensors also emerge in a broader context, simple solutions are not yet in sight. Moreover, the allocation of ownership of data originating from a multi-layer setting is still under debate (Hirt and Kühl 2018).
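What such a self-describing sensor message might look like can be sketched in Python. The record below carries the meta-information named above (unit of measurement, measuring range, time of measurement, update frequency) alongside the measured value. It is a hypothetical structure for illustration only and does not follow any existing standard, including IDS; the class name `SensorReading`, all field names, and all values are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class SensorReading:
    """A measured value together with the context needed to interpret it."""
    sensor_id: str
    value: float
    unit: str
    measuring_range: Tuple[float, float]
    measured_at: str            # ISO 8601 timestamp
    update_frequency_hz: float

    def in_range(self) -> bool:
        """Check that the value lies within the declared measuring range."""
        low, high = self.measuring_range
        return low <= self.value <= high


reading = SensorReading(
    sensor_id="module-7/current",        # hypothetical identifier
    value=3.2,
    unit="A",
    measuring_range=(0.0, 10.0),
    measured_at="2020-01-01T12:00:00Z",  # illustrative timestamp
    update_frequency_hz=1.0,
)
payload = json.dumps(asdict(reading))    # serialized form for exchange
print(payload)
```

With the unit and range attached, a receiving party can interpret and validate the floating-point value without access to proprietary documentation.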

Third, enabling the technical exchange of data will not suffice. Only if (data-based) business models are developed that incentivize data providers to expose physical and virtual sensor data will data sharing for building virtual sensors actually happen. This will require exploring and sizing the value of sensor-provided data, developing appropriate data-based services and revenue models (Legner et al. 2017), or even analyzing the benefits of open data provision (Enders et al. 2020).

Fourth, cooperative virtual sensors may provide information on a higher abstraction level that is not directly measurable by individual physical sensors, such as the condition of an industrial asset. While this is a key benefit of virtual sensors, it may complicate the “downstream” analysis of the data in machine learning applications, e.g., predictive maintenance forecasts: when the condition is used as a target in AI-based fusion functions, a known subset of the true conditions is required as the “ground truth” to enable training of machine learning models. For situations where obtaining this ground truth is tedious or excessively costly, methods are needed to deal with insufficient or sparse labelling. Techniques from the fields of semi-supervised learning or domain adaptation, for example, could serve to address these issues. However, this requires research into the suitability of these methods for sensor-specific applications.

Fifth, there is an economic tradeoff between the benefit of a virtual sensor performance level and the cost of incorporating additional data sources to reach this: A single reverse vending machine sensor may help to predict the filling level with certain accuracy (Walk et al. 2020). A (costly) ample set of sensors, though, may significantly improve the prediction quality. Economic information value models are needed to manage the tradeoff between approximate, cheap virtual sensor prediction and more precise, but costly “brute force” physical sensor detection.
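The tradeoff can be illustrated with a toy calculation in Python. The sketch assumes that the benefit grows linearly with prediction accuracy, which is a strong simplification; the names `Configuration` and `best_configuration` and all accuracy, cost, and value figures are made up for illustration and are not taken from Walk et al. (2020).

```python
from typing import NamedTuple, Sequence


class Configuration(NamedTuple):
    name: str
    accuracy: float   # expected prediction accuracy in [0, 1]
    cost: float       # cost of sensors and integration


def best_configuration(options: Sequence[Configuration],
                       value_per_accuracy: float) -> Configuration:
    """Pick the option with the highest net value (benefit minus cost).

    Assumption: benefit = value_per_accuracy * accuracy (linear).
    """
    return max(options,
               key=lambda o: value_per_accuracy * o.accuracy - o.cost)


options = [
    Configuration("single sensor", accuracy=0.80, cost=100.0),
    Configuration("ample sensor set", accuracy=0.95, cost=400.0),
]
low_stakes = best_configuration(options, value_per_accuracy=1000.0)
high_stakes = best_configuration(options, value_per_accuracy=3000.0)
print(low_stakes.name, high_stakes.name)
```

In this toy setting, the cheap single sensor is preferable when accuracy is worth little, while the costly sensor set pays off once the value of accuracy is high enough. An economic information value model would have to estimate these quantities for a concrete application.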

For quite some time, virtual sensors have been offering promising and cost-effective options to augment or even replace physical sensors. With the explosion of data generated by IoT-based assets in cyber-physical systems, their understanding and competent use will be key for rendering competitive products and data-based services. Information systems research can and should contribute to the closure of existing research gaps and to exploiting this business potential.