The following discussion of technical execution starts with a deep dive into the topics from the AIoT 101 section, namely AI, Data, Digital Twin, IoT, and Hardware: the key ingredients of many AIoT products and solutions. Each topic is looked at specifically from the execution perspective (hence the play on “*.exe”), with a focus on both technology and organization. For each topic, we will also discuss how the technical pipeline and the pipeline organization should be addressed, and how it can all be integrated (mainly through the IoT perspective) (Fig. 21.1).

Fig. 21.1 AIoT.exe: the Data.exe, AI.exe, DT.exe, and HW.exe pipelines, integrated via IoT.exe (source: aiotplaybook.org)

This will then be followed by a more detailed discussion on AIoT product/solution design, Agile AIoT, AIoT DevOps, and finally Trust & Security, Functional Safety, Reliability & Resilience and Quality Management.

1 AI.exe (Fig. 21.2)

Naturally, AI plays a central role in every AIoT initiative. If this is not the case, then it is maybe IoT – but not AIoT. In order to get the AI part right, the Digital Playbook proposes to start with the definition of the AI-enabled value proposition in the context of the larger IoT system. Next, the AI approach should be fleshed out in more detail. Before starting the implementation, one will also have to address skills, resources, and organizational aspects. Next, data acquisition and AI platform selection are on the agenda, before actually designing and testing the model and then building and integrating the AI microservices. Establishing MLOps is another key prerequisite for enabling an agile approach, which should include PoC, MVP, and continuous AI improvements.

Fig. 21.2 Ignite AIoT – artificial intelligence: the AI.exe pipeline (business understanding, data understanding, data preparation, modeling, evaluation, deployment), with deployment spanning swarm intelligence, the Digital Twin cloud, asset and product intelligence, and the edge

1.1 Understanding the Bigger Picture

Many AIoT initiatives initially have only a vague idea of the use cases and how they can be supported by AI. It is important to clarify this in the early stages. The team must identify and flesh out the key use cases (including KPIs) and how they are supported by AIoT. Next, one should identify what kind of analysis or forecasting is required to support these KPIs. Based on this, potential sensors can be identified to serve as the main data sources. In addition, the AIoT system architecture must be defined. Both will have implications for the type of AI/ML that can be applied (Fig. 21.3).

Fig. 21.3 AI value proposition and IoT: key AIoT use cases and business case, with KPIs for product design, sales support, APM, SMR, and digital services, supported by online data stream processing and offline batch processing

1.2 The AIoT Magic Triangle

The AIoT Magic Triangle describes the three main driving forces of a typical AIoT solution:

  • IoT Sensors & data sources: What sensors can be used, taking physical constraints, cost and availability into consideration? What does this mean for the type of sensor data/measurements which will be available? What other data sources can be accessed? And how can relevant data sets be created?

  • AIoT system architecture: What does the overall architecture look like, e.g., how are data and processing logic distributed between cloud and edge? What kind of data management and AI processing infrastructure can be used?

  • AI algorithm: Finally, which AI method/algorithm can be used, based on the available data and selected system architecture?

The AIoT magic triangle also looks at the main forces that influence these three elements:

  • Business requirements/KPIs, e.g., required classification accuracy

  • UX requirements, e.g., expected response times

  • Technical/physical constraints, e.g., bandwidth and latency (Fig. 21.4)

The AIoT magic triangle is definitely important for anybody working on the AIoT short tail (i.e., products), where there are different options for defining any of the three elements of the triangle. For projects focusing on the AIoT long tail, the triangle might be less relevant – simply because in AIoT long tail scenarios, the available sensors and data sources are often predefined, as is the architecture into which the new solutions have to fit. Keep in mind that the AIoT long tail usually involves multiple, lower-impact AIoT solutions that share a common platform or environment, so freedom of choice might be limited.

Fig. 21.4 The AIoT magic triangle: the AI algorithm (supervised, unsupervised, reinforcement), IoT sensors and data sources (data sets), and the AIoT system architecture (edge vs. cloud)

1.3 Managing the AIoT Magic Triangle

As a product/project manager, managing the AIoT magic triangle can be very challenging. The problem is that the three main elements have very different lifecycle requirements in terms of stability and changeability:

  • The IoT sensor design/selection must be frozen earlier in the lifecycle, since the sensor nodes will have to be sourced/manufactured/assembled – which means potentially long lead times

  • The AIoT System Architecture must usually also be frozen some time later, since a stable platform will be required at some point in time to support development and productization

  • The AI Method will also have to be fixed at some point in time, while the actual AI model is likely to continuously change and evolve. Therefore, it is vital that the AIoT System Architecture supports remote monitoring and updates of AI models deployed to assets in the field

Figure 21.5 shows the typical evolution of the AIoT magic triangle in the time leading up to the launch of the system (including the potential Start of Production of the required hardware).

Fig. 21.5 AIoT magic triangle evolution: freeze IoT sensor design and selection, freeze AIoT system architecture, release AI model 1.0, then product launch and SOP (source: aiotplaybook.org)

Especially in the early phase of an AIoT project, it is important that all three angles of the AIoT magic triangle are tried out and brought together. A Proof-of-Concept or even a more thorough pilot project should be executed successfully before the next stages are addressed, where the elements of the magic triangle are frozen from a design spec point of view, step by step.

1.4 First: Project Blueprint

Establishing a solid project blueprint as early as possible in the project will help align all stakeholders and ensure that all are working toward a common goal. The project blueprint should include an initial system design, as well as a strategy for training data acquisition. A proof-of-concept will help validate the project blueprint.

Proof-of-Concept

In the early stages of the evaluation, it is common to implement a Proof-of-Concept (PoC). The PoC should provide evidence that the chosen AIoT system design is technically feasible and supports the business goals. This PoC is not to be confused with the MVP (Minimum Viable Product). For an AIoT solution or product, the PoC must identify the most suitable combination of sensors and data sources, AI algorithms, and AIoT system architecture. Initially, the PoC will usually rely on very restricted data sets for initial model training and testing. These initial data sets will be acquired through the selected sensors and data sources in a lab setting. Once the team is happy that it has found a good system design, more elaborate data sets can be acquired through additional lab test scenarios or even initial field tests.

Initial System Design

After the PoC is completed successfully, the resulting system architecture should be documented and communicated with all relevant stakeholders. The system architecture must cover all three aspects of the AIoT magic triangle: sensors and data selection, AIoT architecture, and AI algorithm. As the project goes from PoC to MVP, all the assumptions have to be validated and frozen over time, so that the initial MVP can be released. Depending on the requirements of the project (first-time-right vs. continuous improvement), the system architecture might change again after the release of the MVP.

It should be noted that changes to a system design always come at a cost. This cost will be higher the further the project is advanced. Changing a sensor spec after procurement contracts have been signed will come at a cost. Changing the design of any hardware component after the launch of the MVP will cause issues, potentially forcing existing customers to upgrade at extra cost. This is why a well-validated and stable system architecture is worth a lot. If continuous improvement is an essential part of the business plan, then the system architecture will have to be designed to support this. For example, by providing means for monitoring AI model performance in the field, allowing for continuous model retraining and redeployment, and so on.

Define Strategy for Training Data Acquisition and Testing

In many AI projects, the acquisition of data for model training and testing is one of the most critical – and probably one of the most costly – project functions. This is why it is important to define the strategy for training data acquisition early on. There will usually be a strong dependency between system design and training data acquisition:

  • Training data acquisition will rely on the system architecture, e.g., sensor selection. The same sensor that is defined by the system architecture will also have to be used for the acquisition of the training data.

  • The system architecture will have to support training data acquisition. Ideally, the system used for training data acquisition should be the same system that is later put into production. Once the system is launched, the production system can often be used to acquire even more data for training and testing.

Training data acquisition usually evolves alongside the system design – the two go hand in hand. In the early stages, the PoC environment is used to generate basic training data in a simple lab setup. In later stages, more mature system prototypes are deployed in the field, where they can generate even better and more realistic training data, covering an increasing number of real-world cases. Finally, if feasible, the production system can generate even more data from an entire production fleet.

Advanced organizations are using the so-called “shadow mode” to test model improvements in production. In this mode, the new ML model is deployed alongside the production model. Both models are given the same data. The outputs of the new model are recorded but not actively used by the production system. This is a safe way of testing new models against real-world data, without exposing the production system to untested functionality. Again, methods such as the “shadow mode” must be supported by the system design, which is why all of this must go hand in hand.
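
The following is a minimal sketch of how shadow mode could be wired into an inference service. All names (production_model, shadow_model, the logged fields) are illustrative assumptions, not a prescribed API:

    # Shadow-mode sketch: the candidate model sees the same inputs as the
    # production model, but only the production output is acted upon.
    import json, time

    def production_model(features):   # stand-in for the deployed model
        return {"label": "no_smoke", "confidence": 0.97}

    def shadow_model(features):       # stand-in for the new candidate model
        return {"label": "no_smoke", "confidence": 0.91}

    def handle_sensor_reading(features, shadow_log):
        prod_out = production_model(features)   # drives actual system behavior
        shadow_out = shadow_model(features)     # recorded, never acted upon
        shadow_log.append({
            "ts": time.time(),
            "features": features,
            "production": prod_out,
            "shadow": shadow_out,
            "disagreement": prod_out["label"] != shadow_out["label"],
        })
        return prod_out                         # only the production result is used

    if __name__ == "__main__":
        log = []
        handle_sensor_reading({"opacity": 0.02, "ion_current": 1.1}, log)
        print(json.dumps(log, indent=2))

Recorded disagreements between the two models are exactly the cases worth analyzing before promoting the shadow model to production.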

1.5 Second: Freeze IoT Sensor Selection

The selection of suitable IoT sensors can be a complex task, including business, functional, and technical considerations. Especially in the early phase of the project, the sensor selection process will have to be closely aligned with the other two elements of the AIoT magic triangle to allow room for experimentation. The following summarizes some of the factors that must be weighed for sensor selection before making the final decision (a simple scoring sketch follows the list):

  • Functional feasibility: does the sensor deliver the right data?

  • Response speed: does it capture time-sensitive events at the right speed?

  • Sensing range: does it cover the required sensing range?

  • Repetition accuracy: does it treat similar events equally?

  • Adaptability: can the sensor be configured as required, are all required interfaces openly accessible?

  • Form factor: size, shape, mounting type

  • Suitability for target environment: ruggedness, protection class, temperature sensitivity

  • Power supply: voltage range, power consumption, electrical connection

  • Cost: What is the cost for sensor acquisition? What about additional operations costs (direct and indirect)?
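
One common way to weigh such factors is a weighted scoring matrix. The following is a minimal sketch; the criteria weights, candidate names, and scores are made-up example values, not recommendations:

    # Illustrative weighted scoring matrix for comparing sensor candidates.
    CRITERIA = {                       # weight per selection factor (sums to 1.0)
        "functional_feasibility": 0.30,
        "response_speed": 0.15,
        "sensing_range": 0.10,
        "repetition_accuracy": 0.15,
        "suitability_for_environment": 0.15,
        "cost": 0.15,
    }

    candidates = {                     # scores 1 (poor) to 5 (excellent)
        "photoelectric_sensor_A": {
            "functional_feasibility": 5, "response_speed": 4, "sensing_range": 4,
            "repetition_accuracy": 4, "suitability_for_environment": 3, "cost": 4},
        "ionization_sensor_B": {
            "functional_feasibility": 4, "response_speed": 5, "sensing_range": 3,
            "repetition_accuracy": 5, "suitability_for_environment": 4, "cost": 2},
    }

    def weighted_score(scores):
        return sum(CRITERIA[c] * scores[c] for c in CRITERIA)

    for name, scores in sorted(candidates.items(),
                               key=lambda kv: weighted_score(kv[1]), reverse=True):
        print(f"{name}: {weighted_score(scores):.2f}")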

Of course, sensor selection cannot be performed in isolation. Especially in the early phase, it is important that sensor candidates be tested in combination with potential AI methods. However, once the team is convinced at the PoC level that a specific combination of sensors, AI architecture, and AI method is working, the decision for the sensor is the first one that must be frozen, since the acquisition of the sensors will have the longest lead time. Additionally, once this decision is fixed, it will be very difficult to change. For more details on IoT and sensors, refer to the AIoT 101 and IoT.exe discussions.

1.6 Third: Freeze AIoT System Architecture

The acquisition of an AI platform is not only a technical decision but also encompasses strategic aspects (cloud vs. on-premises), sourcing, and procurement. The latter should not be underestimated, especially in larger companies. The often lengthy decision-making processes for technology acquisition and procurement can potentially derail an otherwise well-planned project schedule.

However, what actually constitutes an AI system architecture? Some key elements are as follows:

  • Distributed system architecture: how much processing should be done on the edge, how much in the cloud? How are AI models distributed to the edge, e.g., via OTA? How can AI model performance be monitored at the edge? This is discussed in depth in the AIoT 101, as well as the data/functional viewpoint of the AIoT Product/Solution Design.

  • AI system architecture: How is model training and testing organized? How is MLOps supported?

  • Data pipeline: How are data ingestion, storage, transformation and preparation managed? This is discussed in the Data.exe part.

  • AI platform: Finally, should a dedicated AI platform be acquired, which supports collaboration between different stakeholders? This is discussed at the end of this chapter.

1.7 Fourth: Acquisition of Training Data

Potentially one of the most resource-intensive tasks of an AIoT project is the acquisition of the training data. This is usually an ongoing effort, which starts in the early project phase. Depending on the product category, this task will either go on until the product design freeze (“first-time-right”) or continue as an ongoing activity (continuous model improvements). In the context of AIoT, we can identify a number of different product categories:

  • Category I: mechanical or electro-mechanical products with no intelligence on board.

  • Category II: software-defined products, where the intelligence is encoded in hand-coded rules or software algorithms.

  • Category III: “first-time-right” products, which cannot be changed or updated after manufacturing. For example, a battery-operated fire alarm might use embedded AI for smoke analysis and fire detection. However, since it is a battery-operated, lightweight product, it does not contain any connectivity, which would be the prerequisite for later product updates, e.g., via OTA.

  • Category IV: connected test fleets, usually used to support the generation of additional test data, as well as the validation of model training results. A category III product can be created using a category IV test fleet. For example, a manufacturer of fire alarms might produce a test fleet of dozens or even hundreds of fire alarm test systems equipped with connectivity for testing purposes. This test fleet is then used to help finalize the “first-time-right” version of the fire alarm, which is mass-produced without connectivity.

  • Category V: continuous AI improvements. A category IV test fleet can also be the starting point for developing an AI that then moves into a production environment with connected assets or products in the field. Such a category V system will use the connectivity of the entire fleet to continuously improve the AI and re-deploy updated models using OTA.

Such a self-supervised fleet of smart, connected products is the ideal approach. However, due to technical constraints (e.g., battery lifetime) or cost considerations, this might not always be possible.

This approach to classifying AIoT product categories was introduced by Marcus Schuster, who heads the embedded AI project at Bosch. It is a helpful tool for discussing requirements and managing the expectations of stakeholders from different product categories. The following will look in more detail at two examples (Fig. 21.6).

Fig. 21.6 AI product categories: (I) mechanical or electromechanical products, (II) software-defined products, (III) first-time-right AI in production, (IV) connected test fleet, (V) continuous AI improvements

Example 1: “First-Time-Right” Fire Alarm

The first example we want to look at is a fire alarm, e.g., used in residential or commercial buildings. A key part of the fire alarm will be a smoke detector. Since smoke detectors usually have to be installed at different parts of the ceiling, one cannot always assume that a power line or even internet connectivity will be available. Especially if they are battery operated, wireless connectivity is usually not an option either, because it would consume too much energy. This means that any AI-enabled smoke detection algorithm will have to be “first-time-right” and implemented on a low-power embedded platform. Sensors used for smoke detection usually include photoelectric and ionization sensors.

In this example, the first product iteration is developed as a proof-of-concept, which helps validate all the assumptions which must be made according to the AIoT magic triangle: sensor selection, distribution architecture, and AI model selection. Once this is stabilized, a data campaign is executed which uses connected smoke sensors in a test lab to create data sets for model training, covering as many different situations as possible. For example, different scenarios covered include real smoke coming from different sources (real fires, or canned smoke detector tester spray), nuisance smoke (e.g., from cooking or smoking), as well as no smoke (ambient).

The data sets from this data campaign are then validated and organized as the foundation for creating the final product, where the trained AI model is put into silicon, e.g., using TinyML and an embedded platform, or even by creating a custom ASIC (application-specific integrated circuit). This standardized, “first-time-right” hardware is then embedded into the mass-manufactured smoke detectors. This means that after the Start of Production (SOP), no more changes to the model will be possible, at least not for the current product generation (Fig. 21.7).

Fig. 21.7 Example: “first-time-right” AIoT product (fire alarm): from a proof-of-concept with sensors and a basic AI/ML model, via a data campaign creating data sets for model training, to mass manufacturing of the first-time-right product

Example 2: Continuous Improvement of Driver Assistance Systems

The second example is the development of a driver assistance system, e.g., to support highly automated driving. Usually, such systems and the situations they have to be able to deal with are an order of magnitude more complex than those of a basic, first-time-right type of product.

Development of the initial models can be well supported by a simulation environment. For example, the simulation environment can simulate different traffic situations, which the driver assistance system will have to be able to handle. For this purpose, the AI is trained in the simulator.

As a next step, a test fleet is created. This can be, for example, a fleet of normal cars, which undergo a retrofit with the required sensors and test equipment. Usually, the vehicles in the test fleet are connected, so that test data can be extracted, and updates can be applied.

Once the system has reached a sufficient level of reliability, it will become part of a production system. From this moment onwards, it will have to perform under real-world conditions. Since a production system usually has many more individual vehicles than a test fleet, the amount of data which can now be captured is enormous. The challenge now is to extract from this huge data stream the data segments that are most relevant for enhancing the model. This can be done, for example, by selecting specific “scenes” from the fleet data which represent particularly relevant real-world situations that the model has not yet been trained on. A famous case here is the “white truck crossing a road making a U-turn on a bright, sunny day”, since such a scenario once led to a fatal accident involving a vehicle autopilot.

When comparing the “first-time-right” approach with the continuous improvement approach, it is important to note that the choice of approach has a fundamental impact on the entire product design and how it evolves in the long term. A first-time-right fire alarm is a much more basic product than a vehicle autopilot. The former can be trained using a data campaign that probably takes a couple of weeks, while the latter takes an entire product organization with thousands of AI and ML experts and data engineers, millions of cars on the road, and billions of test miles driven. But then the value creation is also hugely different. This is why it is important for a product manager to understand the nature of the product and which approach to choose (Fig. 21.8).

Fig. 21.8 Example: continuous improvement of AI models (driver assistance): from lab simulation (training and validation), via a test fleet (initial training and re-training), to continuous learning with the production fleet

The AIoT Data Loop

Getting feedback from the performance of the products in the field and applying this feedback to improve the AI models is key for ensuring that products are perfected over time, and that the models adapt to any potential changes in the environment. For connected products, the updated models can be re-deployed via OTA. For unconnected products, the learning can be applied to the next product generation.

The problem with many AIoT-enabled systems is: how to identify areas for improvement? With physical products used in the field, this can be tricky. Ideally, edge-based model monitoring will automatically filter out all standard data and only report “interesting” cases to the backend for further processing. But how can the system decide which cases are interesting? For this, one usually needs to find an ingenious approach, which often will not be obvious at first.

For example, for automated driving, the team could deploy an AI running in so-called shadow mode. This means the human driver is controlling the car, and the AI is running in parallel, making its own decisions but without actually using them to control the car. Every time the AI makes a decision different from the one of the human driver, this could be of interest. Or, let us take our vacuum robot example. The robot could try to capture situations which indicate sub-optimal product performance, e.g., the vacuum being stuck, or even being manually lifted by the homeowner. Another example is leakage detection for pneumatic systems, using sound pattern analysis. Every time the on-site technician is not happy with the system’s recommendations, they could make this known to the system, which in turn would capture the relevant data and mark it for further analysis in the back office.
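
A simple, generic variant of such a pre-filter is to flag cases where the deployed model itself is uncertain. The following is a minimal sketch; the entropy threshold and all names are illustrative assumptions:

    # Edge-side pre-filter: only readings where the model is uncertain
    # (prediction entropy above a threshold) are queued for upload.
    import math

    ENTROPY_THRESHOLD = 0.8    # assumed value; tune per use case

    def entropy(probs):
        return -sum(p * math.log(p, 2) for p in probs if p > 0)

    def edge_filter(reading, class_probs, upload_queue):
        if entropy(class_probs) > ENTROPY_THRESHOLD:
            upload_queue.append(reading)   # "interesting": send to backend
        # otherwise the reading is processed locally and discarded

    queue = []
    edge_filter({"sensor_id": 7}, [0.55, 0.45], queue)   # uncertain -> uploaded
    edge_filter({"sensor_id": 7}, [0.99, 0.01], queue)   # confident -> dropped
    print(len(queue))                                    # 1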

The processing of the monitoring data which has been identified as relevant will often be a manual or at least semi-manual process. Domain experts will analyze the data and create new scenarios, which need to be taught to the AI. This will result in extensions to existing data sets (or even new data sets), and new labels which represent the new lessons learned. This will then be used as input to the model re-training. After this, the re-trained models can be re-deployed or used for the next product generation (Fig. 21.9).

Fig. 21.9 The AIoT data loop: products in the field; monitoring of production data and pre-filtering on the edge; processing of field data and model re-training in the cloud; then re-deployment or next product generation

This means that in the AIoT Data Loop, data really is driving the development process. Marcus Schuster, project lead for embedded AI at Bosch, comments: “Data-driven development will have the same impact on engineering as the assembly line had on production. Let’s go about it with the necessary passion.”

1.8 Fifth: Productize the AI Approach

Based on the lessons learned from the Proof-of-Concept, the chosen AI approach must now be productized so that it can support real-world deployment. This includes refining the model inputs/outputs, choosing a suitable AI method/algorithm, and aligning the AI model metrics with UX and IoT system requirements.

Model Inputs/Outputs

A key part of the system design is the definition of the model inputs and outputs. These should be defined as early as possible and without any ambiguity. For the inputs, it is important to identify early on which data are realistic to acquire. Especially in an AIoT solution, it might not be possible, technically or from a cost point of view, to access certain data that would be ideal from an analytics point of view. In the UBI (usage-based insurance) example introduced earlier, the obvious choice would be to have access to the driving performance data via sensors embedded in the vehicle. This would require either that the insurer can gain access to existing vehicle data or that a new, UBI-specific appliance be integrated into the vehicle. This is obviously a huge cost factor, and the insurer might look for ways to cut it, e.g., by requiring its customers to install a UBI app on their smartphones and trying to approximate the driving performance from these data instead.

One can easily see that the choice of input data has a huge impact on the model design. In the UBI example, data coming directly from the vehicle will have a completely different quality than data coming from a smartphone, which might not always be in the car, etc. This means that UBI phone app data would require additional layers in the model to determine if the data are actually likely to be valid.

It is also important that all the information needed to determine the model output is observable in the input. For example, if very blurry photos are used for manual labeling, the human labeling agent would not be able to produce meaningful labels, and the model would not be able to learn from it [19].

Choosing the AI Algorithm

The choice of the AI method/algorithm will have a fundamental impact not only on the quality of the predictions but also on the requirements regarding data acquisition/data availability, data management, AI platforms, and skills and resources. If the AI method is truly at the core of the AIoT initiative, then these factors will have to be designed around the AI methods. However, this might not always be possible. For example, there might be existing restrictions with respect to available skills, or certain data management technologies that will have to be used.

Figure 21.10 provides an overview of typical applications of AI and the matching AI algorithms. The table is not complete, and the space is constantly evolving. When choosing an AI algorithm, it is important that the decision is not only based on the data science point of view but also simply from a feasibility point of view. An algorithm that provides perfect results but is not feasible (e.g., from the performance point of view) cannot be chosen.

Fig. 21.10 AI selection matrix: a table mapping goals and typical questions and examples to matching algorithms

In the context of an AIoT initiative, it should be noted that the processing of IoT-generated sensor data will require specific AI methods/algorithms. This is because sensor data will often be provided in the form of streaming data, typically including a time stamp that makes the data a time series. For this type of data, specific AI/ML methods need to be applied, including data stream clustering, pattern mining, anomaly detection, feature selection, multi-output learning, semi-supervised learning, and novel class detection [20].
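
As a minimal illustration of anomaly detection on streaming sensor data, the following sketch uses a rolling z-score; a production system would use the more specialized streaming methods cited above, and the window size and threshold here are illustrative assumptions:

    # Rolling z-score anomaly detection over a stream of (timestamp, value) pairs.
    from collections import deque
    from statistics import mean, stdev

    WINDOW, Z_THRESHOLD = 50, 3.0

    def detect(stream):
        window = deque(maxlen=WINDOW)
        for t, value in stream:
            if len(window) >= 10:              # wait for a minimal history
                mu, sigma = mean(window), stdev(window)
                if sigma > 0 and abs(value - mu) / sigma > Z_THRESHOLD:
                    yield (t, value)           # anomalous reading
            window.append(value)

    readings = [(t, 20.0 + 0.1 * (t % 5)) for t in range(100)]
    readings.append((100, 35.0))               # injected anomaly
    print(list(detect(readings)))              # -> [(100, 35.0)]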

Eric Schmidt, AI Expert at Bosch: “We have to ensure that the reality in the field – for example, the speed at which machine sensor data can be made accessible in a given factory – is matching the proposed algorithms. We have to match these hard constraints with a working algorithm but also the right infrastructure, e.g., edge vs. batch.”

Aligning AI Model Metrics with Requirements and Constraints

There are usually two key model metrics that have the highest impact on user experience and/or IoT system behaviour: model accuracy and prediction times.

Model accuracy has a strong impact on usability and other KPIs. For example, if the UBI model from the example above is too restrictive (i.e., rating drivers as more risk-taking than they actually are), then the insurer might lose customers simply because it is pricing itself out of the market. On the other hand, if the model is too lax, then the insurer might not make enough money to cover future insurance claims.

Eric Schmidt, AI Expert at Bosch: “We currently see that there is an increasing demand in not only having accurate models, but also providing a quantification of the certainty of the model outcome. Such certainty measurements allow – for example – for setting thresholds for accepting or rejecting model results”.

Similarly, in autonomous driving, if the autonomous vehicle cannot provide a sufficiently accurate analysis of its environment, then this will result (in the worst case) in an unacceptable rate of accidents, or (in the best case) in an unacceptable rate of avoidable full-brake maneuvers or manual override requests.
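
The certainty-based accept/reject thresholds mentioned in the quote above can be sketched as a simple gating function. The model output format and both thresholds are illustrative assumptions:

    # Accept/reject gating on model certainty.
    ACCEPT_THRESHOLD = 0.90    # act on the prediction automatically
    REVIEW_THRESHOLD = 0.60    # below this, reject; in between, flag for review

    def gate(label, certainty):
        if certainty >= ACCEPT_THRESHOLD:
            return ("accept", label)
        if certainty >= REVIEW_THRESHOLD:
            return ("review", label)   # e.g., route to a human or fallback logic
        return ("reject", None)

    print(gate("leak_detected", 0.97))   # ('accept', 'leak_detected')
    print(gate("leak_detected", 0.72))   # ('review', 'leak_detected')
    print(gate("leak_detected", 0.40))   # ('reject', None)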

Prediction times tell us how long the model needs to actually make a prediction. In the case of the UBI example, this would probably not be critical, since scoring is likely executed as a monthly batch. In the case of the autonomous driving example, it is extremely critical: if a passing pedestrian is not recognized in (near) real time, this can be deadly. Another example would be the recognition of a speed limit by an AIoT solution in a manually operated vehicle: if this information is displayed with a huge delay, the user will probably not accept the feature as useful.
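
Prediction-time requirements can be checked with a simple latency benchmark against a budget. The model stub and the 50 ms budget below are illustrative assumptions:

    # Measure inference latency percentiles against a UX/safety budget.
    import time
    from statistics import quantiles

    def predict(x):                    # stand-in for the real inference call
        time.sleep(0.002)
        return x * 2

    LATENCY_BUDGET_MS = 50.0

    latencies = []
    for i in range(200):
        t0 = time.perf_counter()
        predict(i)
        latencies.append((time.perf_counter() - t0) * 1000.0)

    cuts = quantiles(latencies, n=100)         # 99 percentile cut points
    p50, p95 = cuts[49], cuts[94]
    print(f"p50={p50:.1f} ms, p95={p95:.1f} ms, budget={LATENCY_BUDGET_MS} ms")
    assert p95 <= LATENCY_BUDGET_MS, "95th percentile latency exceeds budget"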

1.9 Sixth: Release MVP

In the agile community, the MVP (Minimum Viable Product) plays an important role because it helps ensure that the team is delivering a product to the market as early as possible, allows valuable customer feedback and ensures that the product is viable. Modern cloud features and DevOps methods make it much easier to build on the MVP over time and enrich the product step-by-step, always based on real-world customer feedback.

For most AIoT projects, the launch of the MVP is a much “bigger deal” than in a pure software project. This is because any changes to the hardware setup – including sensors for generating data processed by an AI – are much harder to implement. In manufacturing, the term used is SOP (Start of Production). After SOP, changes to the hardware design usually require costly changes to the manufacturing setup. Even worse, changing hardware already deployed in the field requires a costly product recall. So being able to answer the question “What is the MVP of my smart coffee maker, vacuum robot, or electric vehicle?” becomes essential.

Jan Bosch, Professor at Chalmers University and Director of the Software Center: “If we look at traditional development, I think the way in which you are representing the ‘When do I freeze what’ is spot on. However, there is a caveat. In traditional development, I spend 90% of my energy and time obtaining the first version of the product. So I go from greenfield to first release, and I spend as little as possible afterwards. However, I am seeing many companies which are shifting toward a model that says ‘How do I get to a V1 of my product with the lowest effort possible?’. Say I am spending 10% on the V1, then I can spend 90% on continuously improving the product based on real customer feedback. This is definitely a question of changing the mindset of manufacturing companies.”

Continuous improvement of software and AI models can be ensured today using a holistic DevOps approach, which covers all elements of AIoT: code and ML models, edge (via OTA) and cloud. This is discussed in more detail in the AIoT DevOps section.

Managing the evolution of hardware is a complex topic, which is addressed in detail in the Hardware.exe section.

Finally, the actual rollout or Go-to-Market perspective for AIoT-enabled solutions and products is not to be underestimated. This is addressed in the Rollout and Go-to-Market section.

1.10 Required Skills and Resources

AI projects require special skills, which must be made available with the required capacity at the required time, as in any other project. Therefore, it is important to understand the typical AI roles and how to utilize them. Additionally, it is important to understand how the AI team should be structured and how it fits into the overall AIoT organization.

There are potentially three key roles required in the AI team: Data Scientist, ML Engineer, and Data Engineer. The Data Scientist creates deep, new intellectual property in a research-centric approach that can require a 3 to 12-month development time or even longer. The project will therefore have to decide to what extent a data-science-centric approach is required and feasible, or whether re-use of existing models would be sufficient. The ML Engineer turns models developed by data scientists into live production systems. They sit at the intersection of software engineering and data science and ensure that raw data from data pipelines are properly fed to the AI models for inference. They also write production-level code and ensure scalability and performance of the system. The Data Engineer creates and manages the data pipeline that is required for training data set creation, as well as for feeding the required data to the trained models in the production systems (Fig. 21.11).

Fig. 21.11 AI roles for AIoT: domain expert, data scientist, ML engineer, and data engineer, with their respective tools of the trade

Another important question is how the AI team works with the rest of the software organization. The Digital Playbook proposes the adoption of feature teams, which combine all the required skills to implement and deploy a specific feature. On the other hand, especially with a new technology such as AI, it is also important that experts with deep AI and data skills can work together in a team to exchange best practices. Project management has to carefully balance this out.

1.11 Model Design and Testing

In the case of the development of a completely new model utilizing data science, an iterative approach is typically applied. This will include many iterations of business understanding, data understanding, data preparation, modeling, evaluation/testing, and deployment. In the case of reusing existing models, the model tuning or – in the case of supervised learning models – data labeling should also not be underestimated (Fig. 21.12).

Fig. 21.12 Model development: iterative cycle of business understanding, data understanding, data preparation, modeling, evaluation, and deployment

1.12 Building and Integrating the AI Microservices

A key architectural decision is how to design microservices for inference and business logic. It is considered good practice to separate the inferencing functions from the business logic (in the backend, or – if deployed on the asset – also in the edge tier). This means that there should be separate microservices for model input provisioning, AI-based inferencing, and model output processing. While decoupling is generally good practice in software architecture, it is even more important for AI-based services in case specialized hardware is used for inferencing (Fig. 21.13).
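
The following sketch shows this separation with plain functions exchanging dictionaries; in production, each would be its own microservice behind an API or message queue. All names and the toy scoring logic are illustrative assumptions, loosely following the UBI example:

    # Three decoupled stages: input provisioning, inference, output processing.
    def input_provisioning(raw_event):
        # normalize raw trip data into the model's feature schema
        return {"features": [raw_event["speed_kmh"] / 200.0,
                             raw_event["brake_events"] / 10.0]}

    def inference_service(model_input):
        # stand-in for the inference call; may run on specialized hardware,
        # which is one more reason to isolate it from business logic
        score = sum(model_input["features"]) / len(model_input["features"])
        return {"risk_score": round(score, 3)}

    def output_processing(model_output):
        # business logic: map the raw model output to a domain decision
        return {"premium_tier": "low" if model_output["risk_score"] < 0.5 else "high"}

    event = {"speed_kmh": 95, "brake_events": 2}
    print(output_processing(inference_service(input_provisioning(event))))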

Fig. 21.13 UBI microservices: the IoT development and testing process alongside the AI development and testing process

1.13 Setting Up MLOps

Automating the AI model development process is a key prerequisite, not only from an efficiency point of view but also for ensuring that model development is based on a reproducible approach. Consequently, a new type of DevOps is emerging: MLOps. With the IoT, MLOps not only has to support cloud-based environments but also potentially the deployment and management of AI models on hundreds – if not hundreds of thousands – of remote assets. Because this topic is seen as so important, the Digital Playbook has a dedicated section on Holistic DevOps for AIoT (Fig. 21.14).
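
A minimal sketch of such a reproducible, gated pipeline is shown below: pinned seeds, content-addressed model versions, a quality gate, and an OTA rollout stub. Everything here is a simplified stand-in, not the API of any specific MLOps product:

    # Train -> evaluate -> register -> (gate) -> OTA deploy, reproducibly.
    import hashlib, json, random

    def train(seed):
        random.seed(seed)                      # pinned seed for reproducibility
        return {"weights": [random.random() for _ in range(3)]}

    def evaluate(model):
        return 0.93                            # stand-in for a real test-set metric

    def register(model, metrics):
        blob = json.dumps({"model": model, "metrics": metrics}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()[:8]   # model version

    def deploy_ota(version, fleet):
        for asset in fleet:                    # stand-in for a real OTA rollout
            print(f"pushing model {version} to asset {asset}")

    model = train(seed=42)
    accuracy = evaluate(model)
    if accuracy >= 0.90:                       # quality gate before deployment
        deploy_ota(register(model, {"accuracy": accuracy}),
                   fleet=["pump-01", "pump-02"])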

Fig. 21.14 Holistic DevOps for AIoT: plan, code and model, build, test, release, deploy, operate, and monitor

1.14 Managing the AIoT Long Tail: AI Collaboration Platforms

When addressing the long tail of AI-enabled opportunities, it is important to provide a means to rapidly create, test, and deploy new solutions. Efficiency and team collaboration are important, as is reuse. This is why a new category of AI collaboration platforms has emerged, which addresses this space. While high-end products on the short tail usually require very individual solutions, the idea here is to standardize a set of tools and processes that can be applied to as many AI-related problems as possible within a larger organization. A shared repository must support the workflow from data management through machine learning to model deployment. Specialized user interfaces must be provided for data engineers, data scientists, and ML engineers. Finally, it is also important that the platforms support collaboration between the aforementioned AI specialists and domain experts, who usually know much less about AI and data science (Fig. 21.15).

Fig. 21.15 AI collaboration platform: data engineers, domain experts, data scientists, and ML engineers collaborating across data management, machine learning, and model deployment, on a shared platform integrated with enterprise systems

2 Data.exe (Fig. 21.16)

As part of their digital transformation initiatives, many companies are putting data strategies at the center stage. Most enterprise data strategies are a mixture of high-level vision, strategic principles, goal definitions, priority setting, data governance models, architecture tools and best practices for managing semantics and deriving information from raw data.

Fig. 21.16 Overview of Ignite AIoT framework: the Data.exe framework for AIoT products and solutions, with a data strategy spanning data sourcing, use cases, data pipeline, data capabilities and resource availability, and data governance

Since both AI and IoT are also very much about data, every AIoT initiative should also adopt a data strategy. However, it is important to note that this data strategy must work on the level of an individual AIoT-enabled product or solution, not the entire enterprise (unless, of course, the enterprise is pretty much built around said product/solution). This section of the AIoT Framework proposes a structure for an AIoT Data Strategy and identifies the typical dependencies that must be managed.

2.1 Overview

The AIoT Data Strategy proposed by the AIoT Framework is designed to work well for AIoT product/solution initiatives in the context of a larger enterprise. Consequently, it focuses on supporting product/solution implementation and long-term evolution and tries to avoid replicating typical elements of an enterprise data strategy (Fig. 21.17).

Fig. 21.17 AIoT data strategy: use cases, data sourcing, data pipeline, data capabilities and resource availability, and data governance

The AIoT Data Strategy has four main elements. First, a prioritization framework that makes the relationship between use cases and their data needs visible. Second, management of the data-specific implementation aspects, as well as Data Lifecycle Management. Third, the Data Capabilities required to support the data strategy. Fourth, a lean and efficient Data Governance approach designed to work at the product/solution level.

Of course, each of these four elements of the AIoT Data Strategy has to be seen in the context of the enterprise that is hosting product/solution development: Enterprise Business Strategy must be well aligned with the use cases. Data-specific implementation projects frequently have to take cross-organization dependencies into consideration, e.g., if data are imported or exported across the boundaries of the current AIoT product/solution. Product/solution-specific data capabilities must be aligned with the existing enterprise capabilities. Product/solution-specific data governance always has to take existing enterprise-level governance into consideration.

2.2 Business Alignment & Prioritization

The starting point for business alignment and prioritization should be the actual use cases, which are defined and prioritized by business sponsors, or Epics which have been prioritized in the agile backlog. Sometimes, Epics might be too coarse grained. In this case, Features can be used instead.

For each Use Case/Epic, an analysis from the data perspective should be completed:

  • What are the actual data needs to support the Use Case/Epic?

  • Which of these data are believed to be already available, and which must be newly acquired?

  • How can the required data quality be ensured for the particular use case?

  • What are potential financial aspects of the data acquisition?

  • How do the use cases support the monetization side of things?

  • Is this a case where the required data adds functional value to the use case, or is there a direct data monetization aspect to it?

  • What are the relationships between the identified data and the other elements of the AIoT Data Strategy: Implementation & Data Lifecycle Management, specific capabilities applying to this particular kind of data, and Data Governance?

A key aspect of the analysis will be the Data Acquisition perspective. For data that can (at least theoretically) be acquired within the boundaries of the AIoT product/solution organization, the following questions should be answered:

  • Is the required technical infrastructure already available?

  • Does the team have the required capabilities and resources available?

  • Especially in the case of AIoT data acquired via sensors:

    • Are new sensors required?

    • If so, what is the additional development and unit cost?

    • Is there an additional downstream cost from the asset/sensor line-fit point of view (i.e. additional manufacturing costs)?

    • What is the impact on the business plan?

    • What is the impact on the project plan?

    • What are the technical risks for new, unknown sensor technologies?

    • What are required steps in terms of sourcing and procurement?

For data that need to be acquired from other business units, a number of additional questions will have to be answered:

  • Is it technically feasible to access the data (availability of APIs, bandwidth, support of required data access frequency and volume, etc.)?

  • Can the neighboring business unit support your requirements, not only in terms of technical access, but also in terms of project support and timelines?

  • Are there costs involved in technical implementation and/or data access (internal billing)?

  • Are there potential limitations or restrictions due to existing internal data governance guidelines, regional or organizational boundaries, etc.?

For data that have to be acquired from external partners or suppliers, there are typically a number of additional complexities that will have to be addressed:

  • Technical feasibility across enterprise boundaries

  • Legal framework required for data access

  • SLA assurance

  • Billing and cost management

Based on all of the above, the team should be able to assess the overall feasibility and costs/efforts involved on a per use case/per data item basis. This information is then used as part of the overall prioritization process.

2.3 Data Pipeline: Implementation & Data Lifecycle Management

Sometimes it can be difficult to separate data-specific implementation aspects from general implementation aspects. This is an issue that the AIoT Data Strategy needs to deal with to avoid redundant efforts. Typical data-specific implementation and Data Lifecycle Management aspects include the following:

  • Data Ingestion: In our context, data ingestion should first be seen as moving data from outside of our organization’s boundary to within. Second, technical aspects such as stream vs. batch processing need to be addressed. Typical data ingestion tasks also include cleansing and quality assurance.

  • Storage: Depending on the business and technical requirements, data can be stored permanently or temporarily, structured or unstructured, with or without backup, with cache-only or with operational/transactional support, etc. This often needs to be addressed differently for different data types.

  • Integration: Data integration is the process of merging data from different sources into a single, unified view. In the case of AIoT, this can be – for example – sensor data fusion, done close to the sensors in the edge layer. Or it can be – usually on a high-level of abstraction – a real-time data stream integration process. Or it can be – typically further in the backend – a batch-oriented integration process.

  • Transformation: Many projects spend significant time on data transformation, since it is often a prerequisite for data integration or further data processing. The approaches chosen usually vary widely depending on the format, structure, complexity, and volume of the data being transformed.

  • Modeling: Data modeling is usually a key step toward dealing with the semantics of data and deriving information from raw data. There are different levels of data modeling, including the conceptual, logical, and physical levels. Another important type of model, building on top of data models, is the AI/ML model. However, these models are usually less data-structure-oriented and more mathematical/statistical in nature.

  • Validation: Data validation is the tool that helps ensure data quality, e.g., by applying data cleansing and validation checks. Data validation can use simple, local “validation rules” or “validation constraints” that check for correctness and meaningfulness (e.g., a date of birth cannot be in the future; a minimal sketch follows this list). In some cases, data validation can actually be much more complex, e.g., involving interactions with remote systems, or even AI/ML-based validation algorithms.

  • Analysis: In many cases, data analysis is a key use case in its own right, alongside, for example, transactional use of the data. Generally, data analysis supports the discovery of useful information and supports decision-making. Data analysis is a multifaceted topic. It is key that the required Data Capabilities are provided to support it.

  • Access Control & Security: Finally, effectively ensuring confidentiality and secure handling of data must be part of every AIoT data strategy. This includes both IoT data coming from assets and data combining from users, other business units, or event external data sources. While security is sometimes dealt with on a different level, fine-grained data access control must usually be dealt with as part of the data strategy.

Another key aspect of Implementation & Data Lifecycle Management is dealing with cross-organizational dependencies. While the earlier data acquisition phase might have already answered some of the high-level questions related to this topic, on the implementation level, efficient stakeholder management is a key success factor. Often, earlier agreements with respect to technical data access or commercial conditions will have to be reviewed, revised, or refined during the implementation phase. Some practitioners say that this can sometimes be more difficult in the case of cross-divisional data integration within one enterprise than across enterprise boundaries.

2.4 Data Capabilities and Resource Availability

Data-related capabilities can be important in a number of different areas, including:

  • Skills: Data-related skills span a number of areas, including specific data-processing technologies and mathematical, statistical, or algorithmic skills in AI/ML.

  • Technology: For an AIoT product/solution initiative, it is usually important that technical management agree on a fixed set of technologies that covers most of the required use cases, e.g., batch vs. real-time processing, basic analytics vs. AI/ML, etc.

  • Processes & Methods: Depending on the specific environment, this can also be a very important aspect. Data-related processes and methods can be specific to a certain analytics method, or they can be related to certain processes and methods defined by an enterprise organization as mandatory.

Depending on the project requirements, it is also important that specific capabilities be supported by appropriate resources. For example, if it is clear that an AIoT project will require the development of certain AI/ML algorithms, then the project management will have to ensure that this particular capability is supported by skilled resources that are available during the required time period. Managing the availability of such highly specialized resources is a topic that can be difficult to align with the pure agile project management paradigm and might require longer-term planning, involving alignment with HR or sourcing/procurement.

2.5 Data Governance

Larger AIoT product/solution initiatives will require Data Governance as part of their Data Strategy. This Data Governance cannot be compared with a Data Governance approach typically found on the enterprise level. It needs to be lightweight and pragmatic, covering basic aspects such as:

  • Data & Trust Policies: How is this specific AIoT product/solution dealing with this topic? This is likely to be very use case specific, so the AIoT initiative will have to build on generic enterprise-level requirements but will have to add policies specific to its own use case.

  • Data Architecture: It is not always clear if data architecture is a discipline on its own, or if this is simply one facet of the product/solution architecture. For example, the AIoT Framework has a dedicated viewpoint to support the combination of data and functionality.

  • Data Lineage: Data lineage traces where data originate, what happens to them along the way, and where they move over time. Data lineage provides visibility and transparency and can help simplify root cause analysis in the data analytics process. Data Governance can either support the central documentation of data lineages or provide tools and best practices for implementation teams.

  • Metadata Management and Data Catalog: Efficient management of metadata is a prerequisite for efficient data processing and analytics. Types of metadata include descriptive, structural and administrative. A data catalog can provide support for metadata management, together with other tools, such as search.

  • Data Model Management: For many AIoT applications, centrally managing a high-level data model that describes key entities and their relationships, as well as dependencies on different use cases and components, can be of great help in creating transparency and improving alignment between different teams. The AIoT Framework proposes a lightweight AIoT Domain Model approach. In addition, the Data Governance team could also provide tooling and best practices for teams that need more detailed models in their areas. This can also be linked back to the Metadata Management and Data Catalog topics.

  • API Management: In his famous “API Mandate”, Amazon CEO Jeff Bezos declared that “All teams will henceforth expose their data and functionality through service interfaces.” This executive-level support for an API-centric way of dealing with data exchange (and exposing component functionality) shows how important API management has become at the enterprise level. The success of an AIoT initiative will also depend strongly on it. If there is no enterprise-wide API infrastructure and management approach available, this is a key support element that must be provided and enforced by the Data Governance team.

Finally, the Data Governance/Data Strategy team should give itself a set of KPIs by which it can measure its own success and the effectiveness and efficiency of the AIoT Data Strategy.

3 Digital Twin.exe (Fig. 21.18)

As discussed in Digital Twin 101, a Digital Twin is the virtual representation of a real-world physical object. Digital Twins help manage complexity by providing a semantically rich abstraction layer, especially for systems with a high level of functional complexity and heterogeneity. As an AIoT project lead, one should start by looking at the question “Is a Digital Twin needed, and if so – what kind of Digital Twin?” before defining the Digital Twin implementation roadmap.

Fig. 21.18 Digital Twin: the DT.exe pipeline from business understanding, Digital Twin requirements, twin design and modeling, development and integration, and test and simulation to deployment; in production, the Digital Twin supports applications such as digital product features, operations management, and product design and manufacturing

3.1 Is a Digital Twin Needed?

The decision of whether and when to apply the Digital Twin concept in an AIoT initiative will depend on at least two key factors: Sensor Data Complexity/Analytics Requirements and System Complexity (e.g., the number of different machine types, organizational complexity, etc.).

If both are low, the system will probably be fine with using the Digital Twin more as a logical design concept and applying traditional data analytics. Only with increasing sensor data complexity and analytics requirements will the use of AI be required.

High system complexity is an indicator that dedicated Digital Twin implementation should be considered, potentially utilizing a dedicated DT platform. The reason is that a high system complexity will make it much harder to focus on the semantics. Here, a formalized DT can help (Fig. 21.19).

Fig. 21.19 Conclusions – Digital Twin and AIoT: how the Digital Twin relates to AIoT

3.2 If So, What Kind of Digital Twin?

Since Digital Twin is a relatively generic concept, the concrete implementation will heavily depend on the type of data that will be used as the foundation. Since Digital Twins usually refer to physical assets (at least in the context of our discussions), the potential data can be identified along the lifecycle of a typical physical asset: design data or digital master data, simulation data, manufacturing/production data, customer data, and operational data. For the operational data, it is important to differentiate between data related to the physical asset itself (e.g., state, events, configuration data, and history) versus data relating to the environment of the asset.

Depending on the application area, the Digital Twin (DT) can have a different focus. The Operational DT will mainly focus on operational data, including the internal state and data relating to the environment. The PLM-focused DT will combine the product/asset design perspective with the operational perspective, sometimes also adding manufacturing-related data. The simulation-focused DT will combine design data with operational data and apply simulation to it. And finally, the holistic DT will combine all of the above (Fig. 21.20).
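
As a minimal sketch, an operational DT record covering the data categories named above (state, configuration, environment, and event history) could look as follows; the schema is an illustrative assumption, not a standardized DT model:

    # Minimal operational Digital Twin record.
    from dataclasses import dataclass, field

    @dataclass
    class OperationalTwin:
        asset_id: str
        state: dict = field(default_factory=dict)          # current internal state
        configuration: dict = field(default_factory=dict)  # device/asset config
        environment: dict = field(default_factory=dict)    # data about surroundings
        history: list = field(default_factory=list)        # time-stamped events

        def ingest(self, ts, event, new_state):
            self.history.append((ts, event))
            self.state.update(new_state)

    twin = OperationalTwin(asset_id="compressor-17",
                           configuration={"max_pressure_bar": 8.0})
    twin.ingest(ts=1700000000, event="pressure_reading",
                new_state={"pressure_bar": 6.4})
    print(twin.state, len(twin.history))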

Fig. 21.20
An illustration of the categorization of D T with icons at the top and below are design data and digital master, simulation data, product data, customer, and operational data.

DT categories
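As a simple illustration of these categories, the following sketch maps each kind of Digital Twin to the lifecycle data sources it combines. The category names follow the text above; the structure itself is a hypothetical simplification.

```python
# Illustrative mapping of DT categories to the lifecycle data they combine.
DT_CATEGORIES = {
    "operational":        {"operational"},
    "plm_focused":        {"design", "operational", "manufacturing"},
    "simulation_focused": {"design", "simulation", "operational"},
    "holistic":           {"design", "simulation", "manufacturing",
                           "customer", "operational"},
}

def required_sources(category: str) -> set[str]:
    """Which lifecycle data sources a given kind of Digital Twin must integrate."""
    return DT_CATEGORIES[category]

print(required_sources("plm_focused"))
# e.g. {'design', 'operational', 'manufacturing'} (set order may vary)
```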

3.3 Examples

The Digital Twin concept is quite versatile and can be applied to many different use cases. Figure 21.21 provides an overview of four concrete examples and how they map to the DT categories introduced earlier.

Fig. 21.21
A table displays the data type and examples for the operational digital twin, P L M-focused digital twin, simulation-focused digital twin, and holistic digital twin.

Examples for different DT categories

The drone-based building facade inspection is covered in detail in the TÜV SÜD case study. The physics simulation example is covered in the Digital Twin 101 section. The following provides an overview of the pneumatic system example, as well as the elevator example.

Operational DT: Pneumatic System

Leakage detection for pneumatic systems is a good example of an operational Digital Twin. Pneumatic systems provide compressed air for many different use cases, e.g., drying cars in a car wash, sorting out bad grains from a grain stream analyzed using high-speed video data analytics, or cleaning bottles in a bottling plant. Experts estimate that pneumatic systems consume 16 billion kilowatt hours annually, with a savings potential of up to 50% (mader.eu). To address this savings potential, an AIoT-enabled leakage detection system can help identify and fix leakages at customer sites. One such solution is currently being developed by the AIoT Lab. It is based on a combination of ultrasound sensors and edge-ML for sound pattern analysis. The solution can be used on-site to analyze the customer’s pneumatic application for leakages. The results can then be used by a service technician to fix the problems and eliminate the leakages (Fig. 21.22).

Fig. 21.22
The flow diagram of A I and I o T has a digital twin in the cloud and starts with the process dashboard, service technician, fix leakages, ultrasound microphone, edge node, and leakage detection.

Example: Digital Twin for a pneumatic system
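The production solution relies on trained edge-ML models for the sound pattern analysis. As a stand-in, the following sketch illustrates the basic idea of ultrasound-based leak detection with a simple band-energy heuristic; the sample rate, frequency band, and threshold are made-up values, not calibrated parameters.

```python
# Minimal sketch of edge-side leak detection on ultrasound samples.
import numpy as np

SAMPLE_RATE = 192_000          # Hz; assumes an ultrasound-capable microphone
LEAK_BAND = (35_000, 45_000)   # Hz; illustrative band for leak signatures
THRESHOLD = 0.5                # illustrative relative band-energy threshold

def leak_score(samples: np.ndarray) -> float:
    """Fraction of total signal energy that falls into the leak band."""
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / SAMPLE_RATE)
    band = (freqs >= LEAK_BAND[0]) & (freqs <= LEAK_BAND[1])
    return float(spectrum[band].sum() / spectrum.sum())

def detect_leak(samples: np.ndarray) -> bool:
    return leak_score(samples) > THRESHOLD

# Synthetic one-second recording: background noise plus a 40 kHz tone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
background = 0.05 * np.random.randn(SAMPLE_RATE)
leak_tone = 0.2 * np.sin(2 * np.pi * 40_000 * t)
print(detect_leak(background))              # False
print(detect_leak(background + leak_tone))  # True
```

In the actual solution, an ML model trained on labeled sound patterns would replace the fixed threshold, and its weights would be updated on the edge nodes via OTA.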

The foundation for the leakage detection system is an operational Digital Twin. Since customers usually do not provide detailed design information about their own systems, the focus here is to gather as much information as possible during the site visit and to build up the main part of the Digital Twin dynamically while on site. The system is based on Digital Twin data in four domains:

  • Domain I includes the components of the AIoT solution itself, e.g., the mobile gateways and ultrasound sensors. This DT domain is important to support the system administration, e.g., OTA-based updates of the ML models for sound detection.

  • Domain II includes the pneumatic components found on-site, including pressure generators, pressure tanks, valves, etc. The definitions of these components are provided via the product catalogue, and can be selected dynamically on-site.

  • Domain III includes the customer equipment and how it is mapped to the applications of the customer. Key parts of the customer equipment must be identified and included in the DT model for documentation purposes. Usually, only those parts of the customer equipment that are involved with any of the leakages found are captured.

  • Domain IV includes the leakages that are identified during the on-site assessment. These leakages are also captured as Digital Twins, including information about the related sound patterns, as well as the position of the leakage relative to DT information from domains II and III.

The creation of the Digital Twins happens along these domains: DT data in domain I are created once per test equipment pack, while domains II–IV are created dynamically for each customer site (Fig. 21.23).

Fig. 21.23
A framework diagram of Digital Twin domain 4 for the leakages, together with Digital Twin domains 3, 2, and 1. Below is a roadmap from the initial setup to the site analysis.

Digital Twin – pneumatics example – domains
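The following sketch shows how the four domains could be represented as simple data structures; all type and field names are hypothetical and only illustrate the separation of concerns described above.

```python
# Illustrative data model for the four DT domains (hypothetical names).
from dataclasses import dataclass, field

@dataclass
class SolutionComponent:          # Domain I: gateways, ultrasound sensors
    component_id: str
    ml_model_version: str         # updated via OTA

@dataclass
class PneumaticComponent:         # Domain II: selected from the product catalogue
    catalogue_id: str
    kind: str                     # e.g., "valve", "pressure tank"

@dataclass
class CustomerEquipment:          # Domain III: captured only where relevant
    label: str

@dataclass
class Leakage:                    # Domain IV: found during the on-site assessment
    sound_pattern_uri: str
    at_component: PneumaticComponent
    near_equipment: CustomerEquipment

@dataclass
class SiteTwin:
    """Domains II-IV; created dynamically, once per customer site."""
    components: list[PneumaticComponent] = field(default_factory=list)
    equipment: list[CustomerEquipment] = field(default_factory=list)
    leakages: list[Leakage] = field(default_factory=list)
```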

Daniel Burkhardt, Chief Product Owner, AIoT Lab: We have the goal of providing a solution architecture that enables ML model reuse and holistic AIoT DevOps. The implementation of leakage detection based on a Digital Twin of a pneumatic system provided us with relevant insights about the requirements and design principles for achieving this goal. In comparison to typical software development, reuse and AIoT DevOps require design principles such as continuous learning, transferability, modularization, and openness. Realizing these principles will ensure ease of use of AIoT even for organizations without deep technological expertise, which in the long term leads to more detailed and meaningful Digital Twins and thus more accurate and valuable analytics.

Holistic Digital Twin: DT and Elevators

A good example of the use of a holistic Digital Twin approach is elevators, since they have a quite long and complex lifecycle that can benefit from this approach. What is also interesting here is the combination of the elevator lifecycle with the building lifecycle, since most elevators are deployed in buildings. The following example shows how a standard elevator design is fitted into a building design. This is a complex process that needs to take into consideration the elevator design specification, the building design, the elevator shaft design, and the required performance parameters (Fig. 21.24).

Fig. 21.24
A screenshot depicts a view from a camera displaying the 3-D model of the digital twin.

Digital Twin of building and elevator – 3D model

The CAD model and EBOM data of the elevator design can be a good foundation for the Digital Twin. To support efficient monitoring of the elevator during the operations phase, an increasing number of advanced sensors are being deployed. These include, for example, sensors to monitor elevator speed, braking behavior, positioning of the elevator in the elevator shaft, vibrations, ride comfort, doors, etc. Based on these data, a dashboard can be created that reports on the physical condition and the utilization of the elevator.
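As a small illustration, the sketch below derives two typical dashboard KPIs, a ride-comfort proxy and utilization, from raw sensor data; the field names and figures are made-up examples.

```python
# Minimal sketch: deriving elevator dashboard KPIs from sensor data.
import numpy as np

def vibration_rms(accel: np.ndarray) -> float:
    """Ride-comfort proxy: RMS of cabin acceleration in m/s^2."""
    return float(np.sqrt(np.mean(accel ** 2)))

def utilization(trip_durations_s: list[float], window_s: float) -> float:
    """Share of the reporting window the elevator spent moving."""
    return sum(trip_durations_s) / window_s

cabin_accel = 0.02 * np.random.randn(10_000)   # synthetic vibration samples
trips = [30.0, 45.0, 20.0]                     # three trips within one hour
print(f"vibration RMS: {vibration_rms(cabin_accel):.3f} m/s^2")
print(f"utilization:   {utilization(trips, 3600.0):.1%}")   # 2.6%
```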

One pain point for building operators is the usually mandatory on-site inspections by a third-party inspection service. Using advanced remote monitoring services based on a Digital Twin of the elevator, some countries already allow a combination of remote and on-site inspections. For example, instead of 12 on-site inspections per year, this could be reduced to 4 on-site inspections, with 8 inspections being performed remotely. This helps save costs and reduces interruptions of operations due to inspection work.

The Digital Twin concept helps bring together all relevant data and allows semantic mappings between data from different perspectives, created during different stages of the lifecycle (Fig. 21.25).

Fig. 21.25
A framework displays the process of digital twins from product design to custom fit and manufacturing, to deployment and operations, to inspection and maintenance. Below is the description and digital twin data.

Holistic DT example – elevator

3.4 Digital Twin Roadmap

From the execution perspective, a key question is how to design a realistic roadmap for the different types of Digital Twins we have looked at here. The following provides two examples, one from the automotive perspective and one from the building perspective.

Operational Digital Twin (Vehicle Example)

Let us assume an OEM wants to introduce the Digital Twin concept as part of their Software-defined Vehicle initiative. Over time, all key elements of the vehicle should be represented on the software layer as Digital Twin components. How should this be approached?

Importantly, this should be done step by step, or more precisely, use case by use case. Developing a Digital Twin for a complex physical product can be a huge effort. The risk of doing this without specific use cases and interim releases is that the duration and cost involved will lead to a cancellation of the effort before it can be finished. This is why it is better to select specific use cases, develop the Digital Twin elements required for them, release them, and show value creation along the way. Over time, the Digital Twin can then develop into an abstraction layer that covers the entire asset, hopefully enabling reuse for many different applications and use cases (Fig. 21.26).

Fig. 21.26
A diagram displays 4 use cases for digital twins applications, D T based applications, and the physical world.

DT evolution

Holistic Digital Twin (Building Example)

A good example of the use of a holistic Digital Twin concept from design to operation and maintenance is the digital building lifecycle:

  • During the building design phase, the BIM (Building Information Model) approach can help optimize the design with simulation and automated validation. This way, aspects such as future operational sustainability and capacity can be evaluated. Automated design validation provides a higher level of planning certainty.

  • During the building construction process, AIoT-enabled solutions such as robot-based construction progress monitoring can provide transparency and reliability, making it easier to meet budgets and timelines.

  • Sub-systems like elevators can also be integrated into the Digital Twin approach, as discussed in the previous section.

  • Finally, building inspection can be supported by solutions such as the Drone-based façade inspection. The results of the façade inspection can be mapped back to the Digital Twin, augmenting the planning data with real-world as-is data.

The decision for a BIM/Digital Twin-based approach to building and construction is strategic. Upfront investments will have to be made, which must be recuperated through efficiency increases further downstream. The holistic Digital Twin approach is promising here, but requires a certain level of rigor to be successful (Fig. 21.27).

Fig. 21.27
A diagram displays building design, building construction, building sub-systems, and building inspection for D T-based applications, digital twins, and the physical world.

DT evolution – building example

3.5 Expert Opinion

The following short interview with Dominic Kurtaz (Managing Director for Dassault Systèmes in Central Europe) highlights the experiences that a global PLM company is currently having with its customers in the area of AIoT and Digital Twins.

  • Dirk Slama: Welcome Dominic. Can you briefly introduce your company?

  • Dominic Kurtaz: Dassault Systèmes consists of 20,000 inspired people around the world, developing software solutions and supporting clients in the manufacturing, healthcare and life science sector, as well as the infrastructure sector. We help to digitally design and manufacture more than 1 in 4 of the physical products you touch every day, with a focus on how they are being used by the end users and consumers. We believe that the virtual world can enhance and improve the overall physical world toward a more sustainable world, which I think is probably a good segue to the whole topic of AIoT.

  • Dirk: In this context, AIoT and Digital Twins can play an important role as enablers. What kind of activities and investments are you currently seeing in this space?

  • Kurtaz: When people think of AIoT or IoT, they immediately think of operational performance measurements with sensors, predictive maintenance, and so on. This is of course a very valid application, but we need to think far beyond that. This is why I like this concept of a holistic Digital Twin. We need to take a step back from IoT right now. When you are looking at the Experience Economy, you will see that the value that we perceive as customers and consumers is increasingly moving away from the actual product itself. Today, it is often much more about the end-to-end experience: how the product is perceived, how we select it, how we are using it, and how we dispose of and recycle the product. The end-to-end lifecycle experience is clearly important. From my experience, we need to look at the IoT through the eyes of the customer and the eyes of the consumer. First, we have to understand how business strategies and business execution with AIoT can truly support and improve those aspects.

  • Second, I believe that the Digital Twin is truly becoming pervasive across industries and all products. Take, for example, one of the most mundane products that we experience in our lives – the light bulb. If you go back 10 years, it was just this item at the end of the shopping list that you grab off the shop shelf without thinking much about it – you bought it, you screwed it in, you turned it on and off, and hopefully you would never have to think about that product again for years…until it breaks.

  • Today, this is fundamentally changing. I am not just buying a commodity product for my house anymore. I am buying something that is part of a connected ecosystem. I can set different moods at home using different light configurations. I can use smart lighting as part of my home security system. From a business perspective, this is a game changer. Light bulb manufacturers are no longer just producing light bulbs – today, they are connected to their customers. In the past, we did not know our customers or how they were using our products. Today, enabled by the IoT, I can have a direct relationship with my customer. This will change things on many levels and open up new business models.

  • Thus far, we have only seen the tip of the iceberg: although many of the enabling technologies are reaching a good level of maturity, the actual implementations are often still very immature and limited to basic connectivity features – not yet delivering the holistic Digital Twin experience. For example, I recently bought a new kitchen, including connectivity to my smart home. Now I can control it and integrate it into my own kitchen facilities. This is really good and interesting, and it delivers additional features, but I was not able to experience and understand the value it can really bring until after I had purchased all of those IoT-enabled and connected products. In today’s world, I should have been able to use a Digital Twin of the product prior to my purchase, to fully understand not just the product, but its behavior, its context, and its operational aspects after my purchase – and that is simply not yet possible. Take, as another example, mobility. As a customer, I should be able to experience all these new features, such as advanced driver assistance, before I acquire the physical product – enabled by a holistic Digital Twin. I really want to be able to experience in the virtual world how these products are going to behave before using them in the physical world. This is also very helpful for product development, because it allows us to validate the customer experience in the virtual world – before making expensive investments in physical prototypes.

  • From what I am seeing from our customers, this is not just a hype or a fad. I think it is absolutely mission critical for anybody who is designing and manufacturing products, and dealing with the digital experience of those products. We see this across all industries where we are operating: manufacturing, healthcare, life science, and infrastructure.

  • Dirk Slama: What are your recommendations from the implementation perspective?

  • Dominic Kurtaz: You need a clear focus on the end user experience that you are trying to deliver. This will determine the holistic design philosophy you need to apply. Many companies have started with Big Data, and they are now drowning in it. The problem is to find and connect the data that are relevant for the end user experience. The connection of digital, semantic models with data will open up potential for all industries. Of course, this has to be done step-by-step, use case by use case – building up the holistic Digital Twin with a clearly value-driven approach.

  • Another key aspect is the alignment between the digital supply chain and the physical supply chain. For the IT side, we have Continuous Integration and Continuous Delivery (CI/CD). For the physical product, we have simultaneous engineering and closed-loop PLM. The challenge now is to close the even bigger loop around all of this: bringing IT DevOps together with physical product engineering. This is exactly where AIoT and the Digital Twin will play an important role. AIoT enables new digital/physical product features. And the Digital Twin is the semantic interface between the digital and the physical world. During design and development, the Digital Twin helps create the required interfaces at the technical and organizational levels. During runtime, it enables a new customer experience.

4 IoT.exe (Fig. 21.28)

The IoT perspective in AIoT is usually much more focused on the physical product and/or the site of deployment, as well as the end-to-end system functionality. In this context, it makes sense to look at the IoT through the lens of the process that will support building, maintaining and enhancing the end-to-end system functionality. The AIoT Framework is based on the premise that overall an agile approach is desirable, but that due to the specifics of an AIoT system, some compromises will have to be made. For example, such compromises could concern the development of embedded and hardware components, as well as safety and security requirements.

Fig. 21.28
The I o T dot e x e application framework has feature and product increments, A I o T pipelines, and sprints.

Ignite AIoT – Internet of Things perspective

Consequently, the assumption is that there is an overarching agile release train with different (more or less) agile workstreams. Each workstream represents some of the key elements of the AIoT system, including cloud services, communication services and IoT/EDGE components. In addition, AIoT DevOps & Infrastructure as well as cross-cutting tasks such as security and end-to-end testing are defined as workstreams. Finally, asset preparation is a workstream that represents the interface to the actual asset/physical product/site of deployment.

The following provides a more detailed description of each of the standard workstreams:

  • Agile Release Train: Responsible for end-to-end coordination, UX, and system architecture; ultimately responsible for ensuring that the AIoT system is implemented, tested, deployed and released

  • Cross-Cutting: Addresses tasks that cut across the cloud and the IoT/EDGE components, including end-to-end security, testing and QA

  • AIoT DevOps & Infrastructure: Must provide the infrastructure and processes for automating the AIoT system lifecycle, utilizing the AIoT DevOps concepts outlined in the AIoT Framework

  • Cloud Services: Should more accurately be called Backend services, including cloud and on-premises AIoT applications, as well as enterprise system integration/EAI. Must also address the backend side of Digital Twin, as well as AIoT-related business processes

  • Communication Services: Must provide LAN and WAN communication services. Can involve complex service contract negotiations in case a global AIoT WAN is required

  • IoT/EDGE Components: Includes responsibility for the development/procurement of all hardware (e.g., gateways, sensors), software, firmware and AI/ML execution environments deployed on or near the asset/product

  • Asset Preparation: Must ensure that the asset/physical product (or, in the case of an AIoT solution, the sites of deployment) are prepared to work with the AIoT system. Must include basic tasks such as ensuring power supply and providing storage/assembly points for AIoT hardware components

The following will look at both the product and solution perspectives in more detail.

4.1 Digital OEM: Product Perspective

This section looks at key milestones for an AIoT-enabled product, along the work streams defined earlier:

  • Basic prototype/pilot: Must include a combination of what will later become the AI-enabled functionality (could be scripted/hard coded at this stage), plus basic system functionality and ideally a rudimentary prototype of the actual asset/physical product (A/B samples). Should show end-to-end how the different components will interact to deliver the desired user experience

  • Fully functional prototype: Functional, basic prototype with full AIoT functionality and a relatively high level of maturity. Must include the first real AI models and AI-driven functionality, as well as full asset/physical product functionality (C/D samples). After this milestone, the APIs between the cloud and the EDGE should be stable, as should the interfaces to the asset/physical product (power lines, antenna and gateway fastening, etc.).

  • AIoT MVP: This focuses only on the AIoT elements, assuming that the asset/physical product will no longer undergo any major changes. The AIoT MVP must not only be functionally complete, but also ensure that all procurement aspects are finalized. Furthermore, a fully automated AIoT DevOps infrastructure, including cloud, IoT and AI pipelines, should be developed

  • SOP (Start of Production): This is the point of no return: the manufacturing lines will now start processing assets/physical products and shipping them to customers around the world. Any changes/fixes on the hardware side will now become very costly or nearly impossible. By this point, the required operations support must also be fully operational (either providing fully automated online support services, call-center support, or even on-site field services)

  • Cloud SW Updates after SOP: This must utilize the AIoT DevOps pipeline, including Continuous Integration and Continuous Testing for quality purposes

  • EDGE SW Updates after SOP: Finally, this must utilize the established OTA infrastructure to deliver updates to assets in the field (which will have already been established in the later stages of the system field tests)

Note that this perspective does not differentiate between the hardware engineering and manufacturing perspectives of the on-asset AIoT hardware vs. the actual asset/physical product itself. Furthermore, it also does not differentiate between line-fit and retrofit scenarios (Fig. 21.29).

Fig. 21.29
The A I o T framework has a basic prototype and pilot, a fully functional prototype, A I o T M V P, S O P start of production, first cloud S W- update, and first E D G E S W- update.

AIoT product perspective
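To illustrate the OTA-based EDGE updates from the milestone list above, the following is a minimal device-side sketch of an update check. The manifest URL and field names are hypothetical; a production OTA stack would additionally verify signatures and support staged roll-outs and rollback.

```python
# Minimal sketch of a device-side OTA update check (hypothetical endpoint).
import hashlib
import json
import urllib.request

MANIFEST_URL = "https://ota.example.com/fleet/manifest.json"  # assumed endpoint
INSTALLED_VERSION = "1.4.2"

def check_for_update() -> bytes | None:
    """Fetch the manifest; download and verify a newer artifact if available."""
    with urllib.request.urlopen(MANIFEST_URL) as resp:
        manifest = json.load(resp)
    if manifest["version"] == INSTALLED_VERSION:
        return None  # already up to date
    with urllib.request.urlopen(manifest["artifact_url"]) as resp:
        artifact = resp.read()
    # Verify integrity before handing the artifact to the installer.
    if hashlib.sha256(artifact).hexdigest() != manifest["sha256"]:
        raise ValueError("checksum mismatch - rejecting update")
    return artifact
```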

4.2 Digital Equipment Operator: Solution Perspective

An AIoT solution is usually not focused on the design/manufacturing of assets/physical products. In many cases, assets are highly heterogeneous, and the AIoT solution components will be applied using a retrofit approach. Instead of asset preparation, the focus is on site preparation. Additionally, the level of productization is usually not as high.

This makes the process and the milestones easier and less complex:

  • Pilot: Usually, much more lightweight; could simply be some sensors retrofitted to an existing asset, with a WLAN connection to a standard cloud backend

  • MVP: Again, more lightweight and most likely also less sophisticated in terms of process automation

  • Roll-out: A critical part of the process, not only in technical terms but also in terms of fulfilling on-site user expectations

  • First Cloud SW-Update: Should be automated, utilizing existing standard cloud DevOps mechanisms

  • First EDGE SW-Update: Can be automated utilizing OTA; for small-scale solutions, potentially also performed manually (Fig. 21.30)

Fig. 21.30
The A I o T framework has pilot, M V P, roll-out, first cloud S W-update, and first E D G E S W-update.

AIoT solution perspective

5 Hardware.exe (Fig. 21.31)

The execution of the hardware implementation can vary widely. For a simple retrofit solution using commercial-off-the-shelf hardware components, this will mainly be a procurement exercise. For an advanced product with complex, custom hardware, this will be a multidisciplinary exercise combining mechanical engineering, electric and electronic engineering, control system design, and manufacturing.

Fig. 21.31
The A I o T hardware dot e x e application has a Venn diagram of 4 sets. The sets are labeled mechanical systems, electronic systems, control systems, and computers.

AIoT hardware

5.1 A Multidisciplinary Perspective

The development of custom hardware often requires a multidisciplinary perspective. Take, for example, the development of the predictive maintenance solution for hydraulic systems, introduced in the case study section. Here, the design and manufacturing of the actual hydraulic system components is not in scope. However, hardware design and manufacturing still include a number of elements:

  • Custom hardware for the Data Acquisition Hub (DAQ)

  • A number of custom sensor packages to monitor electric motors, hydraulic pumps, tanks, oil quality, filters, and so on

  • Custom connecting elements for fitting the sensors onto the hydraulic components

To develop this hardware, a number of different skills are required, including strong domain knowledge, knowledge about electronic systems, control systems, and embedded compute nodes.

If we go even further and consider real digital/physical products – like a vacuum robot or a smart kitchen appliance – we will even need to include mechanical systems engineering in the equation to build the physical product.

Mechatronics is the discipline that brings all these perspectives together, combining mechanical system engineering, electronic system engineering, control system engineering and embedded as well as general IT system engineering. The intersection between mechanical systems and electronic systems is often referred to as electromechanics. The intersection between electronic systems and control systems includes control electronics. The intersection between control systems and computers includes digital control systems. Mechanical systems usually require mechanical CAD/CAM for system design and modelling, as well as validation via simulation. Model Based System Engineering (MBSE) supports this with collaboration platforms covering system requirements, design, analysis, verification and validation (Fig. 21.32).

Fig. 21.32
A Venn diagram of 4 sets. The sets are labeled mechanical systems, electronic systems, control systems, and computers. Inner sets are sensors and actuators, system modeling and simulation, control electronics, digital control systems, and microcontrollers.

Mechatronics – a multidisciplinary perspective

5.2 Embedded Hardware Design and Manufacturing

Embedded hardware design and manufacturing are often at the heart of AIoT development; even for a retrofit solution, custom embedded hardware is often a key requirement. Even if standard microprocessors, CPUs, sensors and communication modules are used, they often have to be combined into a custom design to exactly fit the project requirements. During the planning phase, hardware requirements are captured in a specification document. The analysis and design phase includes the feasibility assessment, schematic PCB (Printed Circuit Board) design and layout, and BOM (Bill of Materials) optimization. Procurement should not be underestimated, including component procurement and supply chain setup. The actual board bring-up includes hardware assembly, software integration, testing and validation, and certification. Manufacturing preparation includes machine configuration, assembly preparation, as well as automated inspection. After the SOP (Start of Production), logistics and shipment operations as well as customer support will have to be ensured (Fig. 21.33).

Fig. 21.33
A pipeline flow diagram starts with planning, analysis and design, procurement, board bring-up, manufacturing preparation, and manufacturing and support.

Embedded hardware design and manufacturing
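BOM optimization in particular is heavily driven by order volumes. The toy calculation below shows a cost rollup across volume price breaks; all components and prices are made-up numbers for illustration.

```python
# Illustrative BOM cost rollup with volume price breaks (made-up numbers).
PRICE_BREAKS = {  # component -> list of (minimum order quantity, unit price)
    "mcu":    [(1, 4.80), (1_000, 3.90), (10_000, 3.20)],
    "sensor": [(1, 2.10), (1_000, 1.70), (10_000, 1.35)],
    "pcb":    [(1, 1.50), (1_000, 0.95), (10_000, 0.60)],
}

def unit_price(component: str, qty: int) -> float:
    """Best applicable price break for the given order quantity."""
    return min(price for moq, price in PRICE_BREAKS[component] if qty >= moq)

def bom_cost_per_board(qty: int) -> float:
    return sum(unit_price(c, qty) for c in PRICE_BREAKS)

for qty in (100, 1_000, 10_000):
    print(f"{qty:>6} units: {bom_cost_per_board(qty):.2f} per board")
# 100 units: 8.40 / 1,000 units: 6.55 / 10,000 units: 5.15
```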

5.3 Minimizing Hardware Costs vs. Planning for Digital Growth

In the past, almost all digital/physical products were optimized to minimize hardware costs. This is especially true for mass-market products such as household appliances and other consumer products. In these markets, margins are often thin, and minimizing hardware costs is essential for the profit margin.

However, the introduction of smartphones has started to challenge this approach. Smartphone revenues and profits are now driven to a large extent by apps delivered through app stores. Smartphones are often equipped with new capabilities such as extra sensors, which have no concrete use cases upon release of the new hardware. Instead, manufacturers are betting on the ingenuity of the external developer community to make use of these new capabilities and deliver additional, shared revenue via apps. This means that the revenue and profit perspective is not limited to the initial phone sales; instead, this is looked at through the lens of the total lifetime value.

The same holds true for some car manufacturers: Instead of minimizing the cost for the car BOM, they invest more in advanced hardware, even if this hardware is not fully utilized by the software in the beginning. Utilizing Over-the-Air capabilities, OEMs are constantly optimizing and extending the software that uses advanced hardware capabilities.

Of course this can be a huge bet, and it is not always clear whether it will pay off. Take, for example, a smart kitchen appliance. Instead of building it according to a minimal spec, one can provide a more generous hardware spec, including additional sensors (cooking temperature, weight, volume, etc.), which might only be fully utilized after the Start of Production of the hardware – either by providing a partner app store or even a fully open app store. In the early stages of such new product development, this can be a risk if there is no proof point that partners will jump on board – but on the other hand, the upside can be significant (Fig. 21.34).

Fig. 21.34
A table lists the difference between minimal spec versus the growth-enabled along with icons. Minimal specs lead to cost optimization and growth enabled includes pre invest for future revenues.

Minimizing costs vs. planning for digital growth
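A back-of-the-envelope calculation makes this trade-off concrete. All numbers below are made-up illustrations: the growth-enabled variant gives up hardware margin upfront but wins on lifetime value, provided the assumed app revenue actually materializes.

```python
# Toy comparison: minimal spec vs. growth-enabled (all numbers invented).
UNITS = 100_000

hw_margin_per_unit   = 12.0   # margin with a lean, cost-optimized BOM
growth_bom_premium   = 8.0    # extra sensors/compute for future features
app_revenue_per_year = 6.0    # shared app-store revenue per unit and year
service_years        = 4

minimal_profit = UNITS * hw_margin_per_unit
growth_profit = UNITS * (hw_margin_per_unit - growth_bom_premium
                         + app_revenue_per_year * service_years)

print(f"minimal spec:   {minimal_profit:>12,.0f}")   # 1,200,000
print(f"growth-enabled: {growth_profit:>12,.0f}")    # 2,800,000
```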

5.4 Managing System Evolution

One of the biggest challenges for successful products with multiple system components is managing the evolution of the system design and its components over time. This is already true for the software/AI side, especially from a configuration and version management point of view. However, at least here we can apply as many changes as needed, even after the SOP (assuming OTA is enabled for all edge components). This is much more difficult for hardware, since hardware upgrades are significantly costlier than software upgrades. The sketch below illustrates one way of keeping the resulting variance space under control; the examples that follow then discuss this in more detail.
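A common technique is an explicitly maintained compatibility whitelist: only tested hardware/software combinations may ever be deployed, which caps the variance space. The version identifiers below are hypothetical.

```python
# Sketch: proactively limiting the variance space via a compatibility
# whitelist (version identifiers are hypothetical).
SUPPORTED = {               # (hardware platform, software release)
    ("hw-3", "sw-18"),
    ("hw-3", "sw-19"),
    ("hw-4", "sw-19"),      # latest hardware only supports the latest software
}

def can_deploy(hw: str, sw: str) -> bool:
    return (hw, sw) in SUPPORTED

print(can_deploy("hw-3", "sw-19"))  # True
print(can_deploy("hw-2", "sw-19"))  # False: hw-2 is past its grace period
```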

Example 1: Smartphones

Since the release of the first iPhone in 2007, the smartphone industry has constantly enhanced its offerings, releasing many new versions of phone hardware, phone OS upgrades, core application upgrades, and updates for cloud-based backend services. Backward compatibility is a key concern here: smartphone manufacturers are interested in continuously evolving and optimizing their offerings. However, it is not always possible to ensure backward compatibility for every change – both on the software and the hardware side – because managing too many variants simply increases complexity to a level where it becomes unmanageable. A key prerequisite for managing compatibility across different hardware versions often boils down to creating and maintaining standardized interfaces. Examples include the interfaces between smartphones and headsets, or smartphones and chargers. How open these interfaces are is another topic for debate – some vendors prefer closed ecosystems.

Jan Bosch is Professor at Chalmers University and Director of the Software Center: Today, all the different hardware and software components of the smartphone ecosystem are intrinsically interwoven, forming an integrated digital offering. I usually don’t care about the individual bits or atoms anymore. I am buying into the digital offering as a whole. And this should always be available to me in the most current version. This means frequent and proactive software upgrades, as well as periodic upgrades of electronics and hardware. I get a new phone every one or two years, and I don’t even notice the difference anymore. OK, the camera is a little bit better, but basically it’s the same thing, right? And this is a good thing. Because I am getting the value from the digital offering, and I don’t care how they are handling the mechanics and electronics and the software and the AI. I just want the offering, and I am paying for that. Would it be better if I could replace only parts of the phone, like the battery or the main board? Yes, but in the greater scheme of things it is working for me. I am not looking at my smartphone as a physical offering anymore. It’s the digital end-to-end offering that I have bought into (Fig. 21.35).

Fig. 21.35
A diagram with icons depicts the evolution of hardware such as earphones and smartphones. Generation 1 has a wired earpiece icon, labeled cable, play forward slash pause forward slash volume. Generation 2 has a headphone icon labeled wireless. Generation 3 has a wireless earphones icon labeled wireless plus A I. All these are linked to smartphone annual model releases that are stored on the cloud with daily updates.

Hardware evolution

Example 2: Electric Vehicles and Automated Driving

Another interesting example of system evolution is modern electric vehicles (EVs), and especially their Driver Assistance (DA) or Automated Driving (AD) systems. Early movers such as Tesla are constantly evolving and optimizing their products. Tesla has even gone as far as to develop their own chips and computers, both for the on-board computers and for the backend AI-training platforms.

In 2019, Tesla introduced their “Hardware 3.0” or “FSD Computer” (Full Self-Driving). This is custom AI hardware, which replaces the off-the-shelf GPUs that Tesla had been using until then. Tesla claims that their FSD hardware is orders of magnitude more efficient for AI inference processing, which is needed for DA/AD functions. What is also very significant is that Tesla supports upgrades to the new hardware for customers with older cars. This requires a quite advanced level of modularity and well-defined interfaces.

In 2021, Tesla unveiled its new supercomputer “Dojo”, which is built entirely in-house, including the Dojo D1 chip. Dojo is optimized for training Tesla’s advanced neural networks to support their self-driving technology. Together, these are significant, multibillion-dollar investments in creating a deeply integrated AI company (Fig. 21.36).

Fig. 21.36
A diagram displays the evolution of H W upgrade for cars with the onboard hardware and custom F S D computer, and A I training backend, which has standard H W and customer H W.

Modern EV HW evolution

Jan Bosch from Chalmers University and the Software Center: With cars effectively becoming digital/physical products, we are seeing much more fluidity in this space, to match the fast advancements in technology development. Some of the leading EV manufacturers are taking a very different approach here. For them, the car is constantly evolving. The car manufactured in July is an updated version of the car manufactured in June. And again, the same in August. Taking this to the extreme, they might have two floors in their factory: On one floor, they are manufacturing the cars according to the latest spec. On the floor below, they are constantly tweaking and twisting and doing all kinds of improvements to the car architecture. And whenever they are satisfied with the improvements, they bring them up to the manufacturing floor to get the update into production. This means that the next version of the car will be manufactured with the next version of the mechanics or electronics, or whatever it was they have optimized. This might not be a reality for most of the incumbent OEMs today. There are of course questions regarding functional safety and homologation. However, if you look at the market valuation of some of these more agile OEMs, it seems clear that this is where the world is going.

This is also a general mindset thing. Instead of long-term planning where everything is cast in stone for a longer period, we need to look at this as a flow. This relates to manufacturing but also to procurement. We need more flexible contracts which support this. Instead of focusing on getting the best possible deal for the next 100,000 sensors, I need a contract to get a flow of sensors, which I can change at any point in time. If I then decide to go from one sensor to another or from one hardware board to another, I can do this, because everything is set up as a flow system – enabling fast and rapid change. This is no longer about the upfront cost for the Bill of Materials. This is about the lifetime value that I can create from that system.

You can categorize the companies I am working with into two buckets. The first bucket includes the companies that do not want any improvements. They just want the system to continue to work as is. So all you do here is bug fixing in the beginning, with feedback from early versions in the field. But once this is stabilized, you do not change the running system. The second bucket includes the companies that are looking for continuous product improvements. For these customers, the most important rule is ‘Thou shalt be on the latest and greatest version at any point in time’. So you do not get to choose between hardware version 17.14, software version 18.15, and machine learning model version 19.16. No, you will always have the latest version. That is the only way you can manage the complexity here. Let’s say you are doing two hardware platform updates a year. This means that in a two-year period, you have four versions of the hardware. Customers might obtain a grace period of six months before they are forced to upgrade. This means you will always only have two or three versions of your hardware platforms to support at any point in time. You must take a very proactive approach to limiting the variance space. Otherwise, the complexity will kill you.

Of course, you must find a suitable model with your customers. For example, many of the car manufacturers that I am working with initially sell their new cars via a leasing model for two to three years. After this, the cars are sold to the private market. This is a very typical pattern. What if I could provide an electronics package upgrade after the initial leasing period, and thus extend the lifetime of the car and increase its value for the next owner? Tesla did exactly this with their upgrade option to FSD 3.0, which they are offering to owners of older cars. In the future, being able to not only upgrade software and AI but also entire sub-systems, including hardware, will be a key differentiator.