1 Introduction

Technical developments in the fields of artificial intelligence (“AI”) and machine learning (“ML”) have the potential to hugely impact the industrial sector. For embedded software running in control loops, on edge devices and through various machine interfaces, machine learning models are becoming an increasingly popular addition. Knowledge of AI and ML will therefore grow in importance as traditional embedded solutions are supplemented by additional algorithmic handling of system data in order to improve, for example, reliability in service, optimization of features and prediction of maintenance. Working with ML in embedded systems requires knowledge of specialized software tools and frameworks developed for resource-constrained environments.

To adapt to a changing industrial sector, Realtime Embedded AB (“RTE”) has been a long-time partner in pan-European/ECSEL projects such as SCOTT, DEWI, PaPP and SMECY. The goal has been to participate in research activities in close proximity to industry-driven use cases. RTE is a consultancy that focuses on embedded systems and IoT solutions for Swedish industry, with customers in lines of business ranging from life science and medical technology to the automotive industry.

To discuss ML in embedded software applications, this chapter takes as its vantage point a three-part collaborative effort carried out within the InSecTT project, in which RTE serves as the middle partner. The use case is centered on a digitalized marine vessel located in the port of Gdansk, and its focus is to perform predictive maintenance for various onboard systems. In short, the project includes digitalization and data logging (overseen by Gdansk University of Technology, GUT), algorithmic data management through ML modeling (performed by RTE) and visualization (implemented by Vemco). The focus of this chapter is the middle part: ML developed as part of an embedded system that targets industrial business clients, with special emphasis on recommendations, guidelines and lessons learned. Even though the use case is specific, the discussion is—when applicable—broadened to a more general level.

The remainder of this chapter is organized as follows: the first sections cover the overall project description and goals, and describe the project on an organizational level. The following sections discuss project design on a technical level, including both general and case-specific aspects. The technical design sections are followed by two short introductions to cloud versus edge computing and IoT security, and lastly the reader will find a short concluding discussion.

2 Project Description and Goals

The three-part collaborative project, from here on referred to as the Tucana project,Footnote 1 is based on vast sampling of operational data from a marine vessel. The onboard sensors are connected with the plug-and-play marine communications standard NMEA 2000. For RTE, the project aims to develop an algorithm that can detect behavioral discrepancies in order to avoid a complete malfunction in one or more of the onboard systems. This is commonly referred to as predictive maintenance. Given the economic implications of such diagnostic tools, predictive maintenance has grown to be a sought-after application within the field of ML.

All in all, eleven sensors that report electrical battery voltages, course over ground, heading, geographical position, rate of turn, speed over ground, engine revolution and rudder angle are deployed on the vessel. For all sensors that collect operational data, see Table 1. The data is stored in a time series database, and subscriptions to all data can be set up through a message broker, as shown in Fig. 1. Thus, historical data is fetched from the time series database and used as training data for the ML model, while during runtime, the ML model is invoked using real-time data samples received either through the broker or by requests to the time series database. After real-time analysis using ML, the result is translated into a corresponding status message and forwarded in accordance with a third-party API. The API describes four possible messages that mirror vessel status. This status is integrated in a visualization software that shows vessel whereabouts and condition on a map of the port area. In addition to the forward signaling, a message describing a prediction of a possible need for maintenance can be sent back to the vessel personnel. This message is sent only if such a result is the conclusion of the ML analysis. For a full architectural overview, see Fig. 1. Please note that the concept “real-time” is used in this use case in a broader sense than in traditional real-time computing scenarios. Due to the sampling frequency of the sensors and latency in connectivity, subscription data can be up to some seconds old, but will nonetheless be referred to as live or real-time data. Older data is referred to as historical data.

Table 1 List of sensors deployed on the Tucana vessel
Fig. 1 Architectural overview: the ship, the InfluxDB time series database, the analyzer and the Vemco visualization software, connected through the message broker

3 Project Design

This section covers the design of the software developed by RTE, which serves as the middle part in the complete Tucana project setup. The software consists of a communication platform that includes an analyzer integrating one or more ML models, and tools for creating a training data set. Additionally, a test environment that simulates the third-party software is designed for verification purposes.Footnote 2 The design is described through its two phases. In stage 1, a simple ML model is used as a proof of concept in order to test the platform and data flow. In this stage, live data is fetched through queries to the time series database. In stage 2, the analyzer is generalized to be able to invoke different kinds of ML models, and live data is retrieved by setting up subscriptions through a message broker. Both stages of the design use pre-trained ML models that do not learn beyond a defined moment in time. Alternatively, a learning model that is continuously retrained as new data become available could be used. This hypothetical scenario is discussed briefly in the section Alternative model setup. During stage 1, the complete RTE software is run on a regular PC with a Linux installation. However, since all software is written in Python, it is not platform dependent.

During design stage 2, the analyzer part can also be run on an edge device, in this case a Nitrogen6X board from Boundary Devices, with a 4 core, 1 GHz ARM Cortex-A9 CPU. For device environment, see Appendix A.

4 Machine Learning in Embedded Systems

Working with ML in embedded systems requires specific knowledge of software tools and frameworks developed for resource-constrained environments. In this context, an embedded system is defined as a system dedicated to a specific set of functionality that has limited hardware resources, such as processor power or operating system capabilities (cf. Haigh et al., 2015). Embedded software is low-level software, tightly integrated with the specific hardware's design. These systems are often run through various predefined machine interfaces and therefore need to handle legacy interfaces. In the Tucana project this is reflected by the choice of InfluxDB and the RabbitMQ message broker, which was made during the preceding digitalization part of the project, carried out by GUT.

Performing near real-time predictions in a live system affects the choice of algorithms, communication protocols and overall software design. Embedded designs need to be reliable and include a sufficient amount of error handling, which extends to the parts that implement ML. Hence, the tools, frameworks and standards chosen for the Tucana project have been selected with embedded applications in mind, although the ML part itself is platform independent. In stage 1 of the project, the ML model is a sequential neural network, implemented using the TensorFlow framework. The model is saved as a protobuf (.pb) file, which makes it easy to deploy on different hardware devices.

Stage 2 includes training a neural network in a cloud environment and exporting it as a TensorFlow Lite (.tflite) model. TensorFlow Lite is optimized for on-device machine learning and has multi-platform support, covering Android and iOS devices, embedded Linux and a range of microcontrollers. The second development stage also includes an exploratory approach, in which an unsupervised ML model is trained (clustering).
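To illustrate the export step, the following is a minimal sketch of converting a trained Keras model to a .tflite file. The file names are illustrative and the optimization flag is optional; this is a sketch, not the exact Tucana toolchain.

import tensorflow as tf

# Load a trained model (SavedModel directory; name is illustrative).
model = tf.keras.models.load_model("tucana_model")

# Convert to TensorFlow Lite, optionally applying default optimizations
# (e.g. weight quantization) to reduce the on-device footprint.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("tucana_model.tflite", "wb") as f:
    f.write(tflite_model)

The resulting file can then be loaded on the target with tf.lite.Interpreter, or with the slimmer tflite_runtime package, which avoids installing the full TensorFlow framework on the edge device.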

The edge device used in the second development stage is built around an i.MX6 processor, equipped with four ARM Cortex-A9 CPU cores. The software is written in Python, which provides easy access to libraries for socket communication as well as modules developed especially for ML purposes. For a list of enclosed software and versions, see Appendix A.

5 Communication Platform

5.1 Design Layout

The Tucana project’s communication platform consists of four independently deployable processes in a client–server arrangement, see Fig. 2. The controller unit acts as server and handles all interprocess messaging. Three clients—an analyzer, an MQTT consumer and an InfluxDB client—can connect to the controller using unique identification. The connection is a standard socket connection bound to a port number, allowing the TCP layer to identify the application communication. The database client implements a periodic query to the Influx database, the MQTT consumer sets up message subscriptions to the broker, and the analyzer handles system data processing and predictions.

Fig. 2 Design overview: the controller (socket bind, client handling and publishing) at the top, with the InfluxDB client, the analyzer and the MQTT consumer connecting to it

Each client is run as a separate process that connects to the controller in a client–server socket connection setup. Using sockets allows clients and server to be run on different machines, should that be desired. In the Tucana project, all processes are run on the same machine. See Appendix B for sample code of a client controller class’s connect and disconnect methods. In case a client disconnects, the controller will keep serving the remaining clients. If the controller shuts down, clients will try to reconnect until a new connection is established. Neither shutdown of the server nor of clients should result in uncontrolled crashes. In the controller, each connected client is appended to a client list. The controller loops through all connected clients to handle incoming and outgoing messages.
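As a sketch of the connect and reconnect behavior described above (the project's actual implementation is listed in Appendix B; class and message names here are hypothetical):

import socket
import time

class ClientConnection:
    """Socket client that identifies itself to the controller."""

    def __init__(self, client_id, host="localhost", port=5000):
        self.client_id = client_id
        self.host, self.port = host, port
        self.sock = None

    def connect(self):
        # Retry until the controller accepts the connection, so that a
        # controller restart never crashes the client.
        while self.sock is None:
            try:
                self.sock = socket.create_connection((self.host, self.port))
                # Unique identification, cf. the identity message in Table 2.
                self.sock.sendall(f"identity:{self.client_id}".encode())
            except OSError:
                time.sleep(2)  # controller not reachable yet; try again

    def disconnect(self):
        if self.sock is not None:
            self.sock.close()
            self.sock = None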

Six inter-process messages are handled by the controller, see Table 2.

Table 2 Inter-process messages

5.2 Message Queuing with RabbitMQ

Message Queuing Telemetry Transport (MQTT) and Advanced Message Queuing Protocol (AMQP) are open-source protocols used for asynchronous messaging. Both are binary protocols that work on top of TCP/IP. They allow messaging between applications irrespective of the underlying software stack and are widely deployed for IoT services. MQTT is more lightweight and generally deployed in embedded systems, whereas AMQP is a more complete message protocol that is often used in larger systems and originates from the banking industry. For applications and small edge devices operating on minimum bandwidth, MQTT would likely be a preferable starting point, although security requirements, network reliability and scale might impact the choice of protocol. Although MQTT and AMQP are common within IoT, they are still less commonly used in industrial control environments.

For the Tucana project at RTE, the use of RabbitMQ as message broker and AMQP over SSL was a prerequisite. It serves as a flexible solution that is both scalable and modular. The connectivity service is implemented as middleware, and the design in large part follows the strategy laid out by [4]. The design aims at separating protocol handling and control logic so that they can be independently deployed. For this purpose, we chose to implement the message queuing using Pika. Pika is a pure-Python client implementation of the AMQP 0-9-1 protocol for RabbitMQ. Several other Python libraries are similarly easy to use, for example Paho, but not as straightforward in combination with RabbitMQ. Since RabbitMQ has an MQTT plugin that transparently translates between MQTT and AMQP, the Tucana project design depicts its message queuing client as MQTT.

The Pika implementation is made as simple as possible. It creates a channel connection and starts to consume messages from a queue dedicated to Tucana messaging. The channel.basic_consume method binds messages for a specific consumer tag to a callback. The consumer tag is automatically created should a specific tag not be declared. If there is a message with this tag in the Tucana queue, all subscribers will be notified. On message, the callback handles the AMQP message and forwards its payload in an inter-process message to the controller, as described in Figs. 3 and 4. For a list of all inter-process messages, see Table 2. In order to remove the AMQP message from the queue, the client finishes by sending a channel basic_ack. For sample code of message queuing using the Pika client, see Appendix B.
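A minimal sketch of such a consumer, assuming a locally reachable RabbitMQ broker (the queue name and the controller-forwarding helper are illustrative stand-ins, not the actual Tucana code):

import pika

def forward_to_controller(payload):
    # Stand-in for the inter-process message to the controller.
    print("forwarding:", payload)

def on_message(channel, method, properties, body):
    # Handle the AMQP message: forward its payload to the controller,
    # then acknowledge so that the broker removes it from the queue.
    forward_to_controller(body)
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_consume(queue="tucana", on_message_callback=on_message)
channel.start_consuming()  # blocking I/O loop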

Fig. 3 Stage 1 system messaging: identity, publish, MSG analyze and alarm status messages exchanged between the MQTT consumer, the analyzer, the InfluxDB client and the communication controller

Fig. 4 Stage 2 system messaging: subscribe, identity, publish, MSG analyze and alarm status messages exchanged between the MQTT domain, the MQTT consumer, the analyzer and the communication controller within the communication platform

5.3 Inter-Process Messaging

In design stage 1, vessel data is fetched from the time series database at an interval of 20 s. In stage 2, subscriptions to live data are set up through the MQTT broker, and the interval for performing an analysis is configurable. The data is routed to the analyzer through the controller. The analyzer always starts an analysis by checking that all sensor values pass basic rules and thresholds. After this initial sequence, it invokes the ML model in order to classify the present status of the marine vessel’s onboard systems. Should the model predict a state of needed maintenance, an MQTT status message is published to a dedicated queue. Any other software with adequate credentials, within or outside the Tucana project, can subscribe to such messages. MQTT publishing is handled directly by the controller. Please note that for a larger project, a dedicated MQTT publisher might be a preferable option. For an overview of all inter-process messaging, see Figs. 3 and 4.
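The two-step analysis sequence can be sketched as follows. The threshold values, status strings and model interface are illustrative; the actual status messages follow the third-party API and the inter-process messages in Table 2.

def analyze(sample, model, rules):
    """Run one analysis pass: basic rules first, then the ML model.

    sample: dict of sensor name -> latest value (order must match training)
    rules:  dict of sensor name -> (low, high) thresholds
    """
    # Step 1: basic rule and threshold checks per sensor value.
    for name, value in sample.items():
        low, high = rules[name]
        if not low <= value <= high:
            return "alarm"  # rule violation; no need to invoke the model

    # Step 2: classify the current system status with the ML model.
    score = float(model.predict([list(sample.values())])[0])
    return "maintenance" if score >= 0.5 else "ok"

The controller then maps the returned status onto one of the four API messages and, when maintenance is predicted, publishes it to the dedicated queue.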

6 Data Extraction

This section discusses data and data extraction. These are central topics for any embedded software project that aims to integrate ML methods. The rapid development of machine learning over the past decade, and its applications within the industrial sector, brings forth new questions of data, data extraction and data representation.

Three issues of a general character quickly present themselves when dealing with data-driven development: (1) what kind of data is accessible, (2) how is the data accessed and (3) what kinds of interpretations is that data open to? For the Tucana project, eleven sensors connected through the NMEA 2000 system are deployed on the marine vessel. The data is accessible via InfluxDB queries or as MQTT subscriptions, and the goal of the project is to interpret the sensor data to predict a need for onboard maintenance. Therefore, an ML model will have to recognize system anomalies. This means that the model has to be trained either by using labeled anomalies or by setting up model boundaries for how to interpret model output.

If the system in question is operating as a live system in an industrial environment, as is the case in the Tucana project, real anomaly data might be hard to come by. The vessel cannot be taken out of service for experimental purposes, and the onboard equipment is expensive. Putting high pressure on valuable industrial equipment to provoke errors that lead to anomaly data may be impossible. Instead, synthesizing anomaly data can be a way forward. To interpret and synthesize data accurately, thorough exploration of historical data is necessary. Data is by nature not solid, long-lasting pieces of knowledge that make universal sense, but should rather be looked at within the context of its creation. However counterintuitive a word like ‘creation’ might appear in this regard, it serves as a reminder of the complex process of interpreting data, as it captures the significant degree of human–machine interference that data extraction necessitates.

According to conventional wisdom around so-called Big Data, more data is equated with a more accurate result. However, in embedded systems—as well as in general—relying on data analysis also requires a process of interpretation and translation in order to reveal “truth-telling” knowledge about a system. Sensor data and measurements might vary with temperature, air pressure or computational load, they might depend on geographical position, or they are perhaps only interpretable during a limited period of time. To address the question of the representative nature of data, even the term “data” itself has been questioned as a misnomer, as its etymological definition is that which is ‘given’ [5]. Instead, the term ‘capta’ has been suggested, meaning that which is ‘taken’ [2]. This mirrors the fact that which data to use is a choice.

Despite the possible difference between the very fluid character of cultural data and the technical measurements reported by, for example, digital sensors, keeping this discussion in mind has proven helpful during the development of the Tucana project. Here, system data is strictly bound to one specific vessel, rendering the statistical analysis closely tied to certain environmental circumstances. However, in order to make the platform more general, the Tucana project strives for a modular design, where the ML model can easily be swapped for another. The statistical analysis is also divided into two separate parts: a set of threshold rules that can be configured in accordance with the origin of the data, and a machine learning part that is trained on vessel-specific data. Should the platform code be deployed for another vessel, the ML model would most likely have to be retrained.

Retraining for another vessel would, using the same model design and corresponding amount of training data, take approximately three to four hours on a standard laptop.

For this purpose, the project has developed a special toolchain for creating a training set database; a more detailed description follows in the next section. Depending on latency or restricted-access issues, mirroring databases from the customer site to the development site can also constitute good practice to facilitate the development work.

7 Training Data Set and Model

7.1 Design Stage 1

This section describes a simple classifier model used in design stage 1, and the construction of a training set database. Neural-network-based ML applications can be particularly challenging for embedded systems with limited resources. A number of methods exist to address these difficulties, including efficient model design, customized hardware accelerator designs and hardware/software co-design strategies [6].

In this proof-of-concept example, the model input vector consists of twelve data points, and the output is 0 or 1, meaning that the vessel is estimated to be located either inside (0) or outside (1) the port area. It is not an elaborate model, but it gives some information about the data itself and can tell whether the vessel machinery behaves differently depending on its geographical location. However, in order to anticipate more detailed misbehavior, a more complex model must be deployed.

The training data build-up is based on a so-called “base measurement”. For the Tucana project, this is represented by the rate of turn measurement. Historical data is fetched from the time series database (InfluxDB), which stores measurements with individual timestamps. Therefore, data from different sensors will only in rare cases carry the same timestamp. Fetching the latest of all eleven measurements thus means the timestamps will differ depending on each measurement's update frequency. Since rate of turn has the lowest update frequency, choosing this as base measurement means that the other measurements can be selected as “close” as possible to the rate of turn values. To start extracting a training data set, a script tool reads entries from a base measurement file, see Fig. 5. As shown, twenty minutes worth of data is fetched for each entry. This results in one JSON file per entry, containing the stated time series for the rate of turn measurement. In this example, 12 JSON files containing time series for the rate of turn values will be created.
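A sketch of this extraction step is given below, using the InfluxDB 1.x Python client. The database, measurement and field names, as well as the example entry, are illustrative assumptions rather than the project's actual configuration.

import json
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="tucana")

def fetch_chunk(start, stop, measurement="rate_of_turn", field="value"):
    # Fetch one time window of the base measurement.
    query = (
        f'SELECT "{field}" FROM "{measurement}" '
        f"WHERE time >= '{start}' AND time <= '{stop}'"
    )
    return list(client.query(query).get_points())

# Entries as parsed from the base measurement file: start and stop
# timestamps plus a label (one illustrative twenty-minute window here).
entries = [("2022-05-01T10:00:00Z", "2022-05-01T10:20:00Z", "fishing")]

for i, (start, stop, label) in enumerate(entries):
    with open(f"rate_of_turn_{i}.json", "w") as f:
        json.dump({"label": label, "points": fetch_chunk(start, stop)}, f)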

Fig. 5 Base measurement JSON file (top) and resulting data chunks (bottom). The first and second columns in the base measurement file specify between which timestamps the vessel has been located inside or outside the port area. The third column shows the name of the measurement, the fourth and fifth specify what key(s) to look for in the database query, and the last column sets the label. The label “fishing” is used as a metaphor for when the vessel is located outside the port area

The script tool then continues by fetching the closest possible values for all other measurements. All measurements’ time series are stored in separate files. Thus, the final database will contain 11 × 12 files, or 12 chunks of data, each with a varying number of timestamp entries depending on the length of the respective time series.

When all data chunks are ready, the script tool finishes by reading all the data into an input matrix. The data values are normalized between 0 and 1, shuffled and split into one training set (70%) and one validation set (30%). The training set is used to train the model, and the validation set is used for evaluation. The model itself then splits the training set into one training suite and one test suite while fitting the model, see Fig. 6. In Fig. 7, validation of the neural network model is represented by true and false positives (tp, fp), true and false negatives (tn, fn), as well as precision (P) and recall (R):

Fig. 6 Sequential model definition. The model splits the training data set into training data (67%) and test data (33%)

Fig. 7 Data dimensions (left: train set, label, test set, test labels and model input shapes) and evaluation metrics (right: loss, fn, fp, tp, tn, precision and recall)

$$P = \frac{tp}{tp + fp}, \qquad R = \frac{tp}{tp + fn}.$$

The neural network used in this example is a simple TensorFlow sequential model with 12 input nodes, one hidden layer of 6 nodes and one output layer with a sigmoid function, see Fig. 6. The reason that the number of input nodes is 12 and not 11 (the number of onboard sensors) is that the position sensor reports both latitude and longitude, which are represented by one node each.

Using the base measurement as shown above, the training data, test data, label data and model end up with the dimensions shown in Fig. 7. Training with a batch size of 128 for 100 epochs evaluates the model to a precision of 0.98 and a recall of 0.99. These metrics clearly show that the ML model can determine with a high degree of certainty whether the vessel is inside or outside the port area, which of course is not surprising given that the position data is included in the input vector. Excluding position data renders the output less accurate, with P approximately 0.83 and R approximately 0.85. Although these P and R values are not as good as when position data is included, the result is still useful to some extent.
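A minimal sketch of this model and training run is shown below. The hidden layer activation, the optimizer and the placeholder data are assumptions; the layer sizes, batch size, epochs, split and metrics follow the text and Figs. 6 and 7.

import numpy as np
import tensorflow as tf

# Placeholder data: normalized feature rows (12 values each) and binary
# labels (0 = inside the port area, 1 = outside).
x_train = np.random.rand(1000, 12).astype("float32")
y_train = np.random.randint(0, 2, size=(1000, 1))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(6, activation="relu", input_shape=(12,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[
        tf.keras.metrics.TruePositives(), tf.keras.metrics.FalsePositives(),
        tf.keras.metrics.TrueNegatives(), tf.keras.metrics.FalseNegatives(),
        tf.keras.metrics.Precision(), tf.keras.metrics.Recall(),
    ],
)

# validation_split mirrors the 0.67/0.33 train/test split from Fig. 6.
model.fit(x_train, y_train, batch_size=128, epochs=100, validation_split=0.33)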

7.2 Design Stage 2

In the second development stage, the idea was to use a clustering method to determine whether any identifiable clusters would show up in the data. The initial plan was to use a clustering algorithm such as KMeans or DBSCAN. However, after the number of dimensions in the data had been reduced from 12 to 2, two distinct areas of data observations could be identified and used as clusters. After trying a number of other clustering methods, this approach turned out to have the best outcome. The result is presented here as a series of visualized steps. The number of feature dimensions was reduced using Principal Component Analysis (PCA). Using two components proved to cover approximately 63% of the data variability, see Fig. 8, and we settled for this dimensional reduction. As shown in Fig. 9, two different data point areas appear in the 2D variability scatter plot: one “inner” area and one “outer”, for label 0 and 1 respectively. Figure 10 shows a suggestion for how to determine whether an observation should be classified as label 0 or 1.
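The reduction and variability check can be sketched as follows, with placeholder data standing in for the real vessel observations (on the real data, two components capture roughly 63% of the variance):

import numpy as np
from sklearn.decomposition import PCA

# Placeholder for the normalized 12-dimensional observations.
X = np.random.rand(5000, 12)

pca = PCA(n_components=2)
X2 = pca.fit_transform(X)                     # shape (n_samples, 2)
print(pca.explained_variance_ratio_.sum())    # ~0.63 on the real data

# One way to separate the "inner" area from the "outer" (cf. Fig. 10):
# a distance threshold around the inner cluster's centre. The labels
# and radius here are illustrative.
labels = np.random.randint(0, 2, size=len(X2))
centre = X2[labels == 0].mean(axis=0)
is_inner = np.linalg.norm(X2 - centre, axis=1) < 1.0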

Fig. 8 Cumulative explained variance versus the number of components. Two components cover 62.71% of the variability

Fig. 9 2D scatter plot of the two principal components (62.7% of the variability captured). Label 0 corresponds to the vessel being inside the breakwater area, and 1 corresponds to the vessel being outside

Fig. 10 Example of how an inner cluster can be separated from the outside area (2D scatter plot of the two principal components with a separating circle)

7.3 Alternative Model Setup

As discussed above, the Tucana project design integrates an ML model that does not learn beyond a fixed point in time. This means that data reported after this moment is not included in the training data set, and the model, once deployed, is not retrained. As an alternative to this setup, a model that is continuously retrained can be used. Our suggestion for such an arrangement is to add another client process, cf. Fig. 2. This process should handle extraction and storing of new data as well as trigger retraining sessions of the model. New data chunks of course have to be labeled, which is why the process is preferably initiated on user command in combination with a label to use. This can be done in a number of ways: through a message subscription sent from a web application, from the command line interface, or in other ways. If more training data becomes available, retraining the model can also be done automatically at a configurable interval, provided the data is accurately labeled; a sketch of such a client is given below. For an example of a dynamically retrained framework architecture for embedded AI, see Fig. 11 (proposed by Brandalero et al.).
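This is a hypothetical sketch of the suggested client process; the store and trainer objects are stand-ins for the data persistence and the training pipeline, and all names are illustrative.

import time

class RetrainingClient:
    """Collects newly labeled data and triggers retraining sessions."""

    def __init__(self, store, trainer, interval_s=24 * 3600):
        self.store = store          # persists labeled data chunks
        self.trainer = trainer      # wraps the ML training pipeline
        self.interval_s = interval_s

    def on_user_command(self, data_chunk, label):
        # User-initiated: store a new chunk together with its label
        # and retrain immediately.
        self.store.save(data_chunk, label)
        self.trainer.retrain(self.store.all())

    def run_periodic(self):
        # Automatic alternative: retrain at a configurable interval,
        # assuming the stored data is accurately labeled.
        while True:
            time.sleep(self.interval_s)
            self.trainer.retrain(self.store.all())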

Fig. 11 AITIA framework for embedded AI as proposed by Brandalero et al. (2020): sensors provide training data to an ML training framework, the trained model is deployed on the processing platform, and the platform's intelligent actions feed back into the environment

8 Cloud or Edge?

For a project such as the Tucana project, both cloud and edge computing can be suitable choices. Cloud computing means that model training and data analysis are performed in the cloud, while edge computing means that these processes are handled on a small network device with less capacity. On the one hand, system data is time-driven. Additionally, connectivity on the marine vessel could be limited if the vessel is far from any cell tower. To avoid latency, these circumstances point towards edge computing (for example, that the computing unit is located onboard). On the other hand, the ML algorithm can potentially be modeled with large amounts of input data, and although the input data is time-driven, the predictions made by the ML model are not very time-sensitive. Maintenance work on a marine vessel will not start immediately, and decisions made by the crew or the shipping company will most likely not depend on the latest prediction. This does not mean that one single prediction could not make a difference, but decisions relying on model output will not have to be made within a very short period of time. Since the task of the ML algorithm is to predict a need for more acute maintenance, a likely scenario is that the vessel will undergo maintenance procedures by the end of the day, or when it returns to port. These circumstances allow for cloud computing. A mixed scenario, with edge computing onboard and cloud computing when connectivity allows, is also conceivable. For practical reasons during the development phase, the Tucana software is executed on a PC located far from the vessel. Whether the computational part will later be deployed using edge or cloud computing is at the time of writing not decided. The ML model itself can be trained in a cloud environment, on a PC or on an edge device, as long as it is exported in a suitable format.

9 Security

This section serves as a very brief introduction to IoT security. The purpose is not to provide a detailed review, but to direct attention to this important matter. Many IoT devices are not designed with security in mind, but for industrial purposes security is often a central aspect. Thus, developing IoT systems that are scalable, error tolerant and easy to extend is not enough. An industrial online system also has to be secured from breaches and hostile takeover. Therefore, the Tucana software is developed to fit a secure cloud-based reference architecture developed by RTE during the SCOTT project (Secure Connected Trustable Things).Footnote 3 For a general architecture overview, see Fig. 12. The secure architecture is based on a Google Cloud design. In many aspects, the three major cloud services, Amazon Web Services, Google Cloud and Microsoft Azure, are roughly equivalent, but easy access to APIs and an intuitive user interface resulted in the choice of Google Cloud. In the Tucana design, the IoT unit symbolizes the marine vessel, and the Tucana software, as described in this document, would run on the gateway device. Note that a gateway can also act as an IoT unit; there is no clear definition that tells them apart. For a smaller IoT unit, the communication link between the IoT units and the gateway can be realized with, for example, BLE (Bluetooth Low Energy), but in the Tucana design, the marine data is communicated using MQTT. Communication between the gateway and the cloud is set up using MQTT, and browser access is realized with HTTPS. All communication links are encrypted, and communication between components in the architecture follows a pattern of authentication—to establish the sender’s identity—and authorization—to establish the sender’s right to access services supplied by the receiver. To handle this, each component is equipped with a public/private key pair. A dedicated server that stores security policies performs the authorization. Figure 13 shows an overview of the different services that are integrated in the RTE secure platform. For further reading, please refer to the online documentation for each component/service shown in Fig. 13.
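As an illustration of an encrypted, mutually authenticated link, the following sketch sets up AMQP over TLS with Pika, in line with the requirement above that all communication links be encrypted. The host name, port and certificate paths are placeholders, not the project's actual configuration.

import ssl
import pika

# The client verifies the broker against a trusted CA and presents its
# own certificate, i.e. both sides authenticate each other.
context = ssl.create_default_context(cafile="ca.crt")
context.load_cert_chain(certfile="gateway.crt", keyfile="gateway.key")

params = pika.ConnectionParameters(
    host="broker.example.com",
    port=5671,                         # RabbitMQ's default TLS port
    ssl_options=pika.SSLOptions(context),
)
connection = pika.BlockingConnection(params)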

Fig. 12 General reference architecture: web browser, cloud, gateway and IoT unit

Fig. 13 RTE reference architecture for secure IoT: web browser with YubiKey, HTTPS access through an ingress server with Google OAuth and public keys, an authentication server, microservices, and the gateway and IoT unit

Important note: In August 2022, Google announced that Google Cloud’s IoT Core service would be discontinued within the year. For business customers and other professional users, this can potentially lead to large and time-consuming migration projects. Thus, using cloud solutions that rely on services maintained by other parties can cause extra work and should be part of a risk discussion with concerned clients and users. In this particular case, the discontinuation of IoT Core does not affect the Tucana project.

10 Conclusion

Working with ML in embedded systems requires decisions that concern working with limited resources, security, real-time response requirements, high-throughput performance, cloud or edge computing, requirements of robustness, transparency and so on. This chapter provides a number of short introductions to areas that are central to embedded systems that integrate AI solutions. We conclude that no ML algorithm will be more intelligent than data exploration and interpretation allow it to be, and hence we want to put extra emphasis on this domain. As the Tucana project proceeds, an important conclusion is also that ML can serve as a powerful tool during the development phase, even though an ML model might not have to be included in the final product. Since data exploration is a necessary part of all ML development, learning about the data itself, using for example ML technologies, can lead to other ways to integrate this knowledge in the final product. This can result in more stable and robust software. Thus, ML can also serve as a feasible learning tool, and evaluation and validation of ML models will have to guide the development team along the way. Suggestions for further development in the Tucana project include completing a design that can combine cloud and edge computing depending on network availability, and implementing a dynamically retrained framework architecture for embedded AI.