
Introduction into Artificial Intelligence and Machine Learning


Abstract

Artificial intelligence (AI) has become an important topic in research as well as industry since its birth in the 1950s. Research on early approaches to machine learning has actually been going on since the 1940s. Still, the question often arises as to what is actually meant by AI, especially in practice. In this chapter, we give a brief introduction to artificial intelligence and, more specifically, machine learning (ML). We briefly summarise the history of artificial intelligence and machine learning, introduce supervised learning, unsupervised learning, and reinforcement learning as the three main types of ML algorithms, and discuss how to measure the quality of machine learning algorithms. For the interested reader, the appendix of the chapter includes a brief description of artificial neural networks and machine learning metrics.


References

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.

Rey, G. D., & Wender, K. F. (2018). Neuronale Netze: Eine Einführung in die Grundlagen, Anwendungen und Datenauswertung. Hogrefe.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.


Appendix

1.1 Artificial Neural Networks

The basic idea behind the development of artificial neural networks (ANNs) is to simulate the (human) brain. In general, an ANN consists of nodes (neurons) and edges (synapses). As the following figure (Fig. 4) shows, three types of neurons, also called units, are distinguished (Goodfellow et al., 2016; Rey & Wender, 2018):

  • Input units receive the input data, for example pixels in an image recognition algorithm or blood values when diagnosing diseases. Input units are denoted by x in Fig. 4.

  • Hidden units are located between input and output units and thus represent the inner layers of an ANN. They can be arranged in several layers one after the other and are denoted by h1…hn in Fig. 4.

  • Output units contain the output data, for example the classification "dog" or "cat" in an algorithm for the recognition of animals. These are denoted by y in Fig. 4.

Fig. 4 Example of an artificial neural network: three input units x, followed by two hidden layers (with four and three units) and two output units y; each unit is connected by forward arrows to every unit of the next layer

A simple neural network contains only one hidden layer, which is often sufficient for many applications. Deep neural networks have multiple hidden layers; the appropriate number of layers and neurons depends on the individual application.

As Fig. 4 shows, the neurons are connected by edges, drawn as arrows. If we denote two neurons by i and j, respectively, then wij expresses the weight along the edge between i and j (Fig. 5).

Fig. 5 Two neurons i and j connected by a directed edge labelled with the weight wij

Ultimately, the acquired knowledge of an ANN is stored in these weights, which can conveniently be written as matrices (Fig. 6).

Fig. 6 Representation of the weights: neurons 1 and 2 in layer n are fully connected to neurons 3 and 4 in layer n + 1; the weights form the matrix W = (wij) with entries w13, w14, w23 and w24

The input that one neuron receives from others depends on the output of the sending neuron(s) and the weights along the edges. If Outputi denotes the activity level of a sending neuron i, then the input that a neuron j receives can be expressed as the sum over the weighted outputs of the neurons feeding it, adjusted with a bias offset value bj, as in the following equation.

$$ \mathit{Input}_j = \sum_i \left( \mathit{Output}_i \cdot w_{ij} \right) + b_j $$

The output of a neuron is based on the input and an activation function. Various function types are conceivable for this activation function a; in the simplest case it is linear, while common nonlinear choices include the sigmoid function and the rectified linear unit (ReLU).

$$ \mathit{Output}_i = a\left( \mathit{Input}_i \right) $$
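To make these two equations concrete, the following minimal sketch (ours, not from the chapter) computes the inputs and outputs of layer n + 1 from the outputs of layer n, using the weight labels of Fig. 6. It assumes the numpy library, and all numerical values are made-up examples:

```python
import numpy as np

# Outputs of neurons 1 and 2 in layer n (example values).
output_n = np.array([0.5, 1.0])

# Weight matrix from Fig. 6: W[i, j] connects neuron i in layer n
# to neuron j in layer n + 1 (example values).
W = np.array([[0.2, -0.4],   # w13, w14
              [0.7,  0.1]])  # w23, w24

# Bias values b_j for neurons 3 and 4 in layer n + 1.
b = np.array([0.1, -0.2])

# Input_j = sum_i(Output_i * w_ij) + b_j, written as a vector-matrix product.
input_n1 = output_n @ W + b

# Output_j = a(Input_j): here the linear case (identity) and, as a
# common nonlinear alternative, the sigmoid function.
output_linear = input_n1
output_sigmoid = 1.0 / (1.0 + np.exp(-input_n1))

print(input_n1, output_linear, output_sigmoid)
```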

The weights represent the knowledge of the ANN. These weights are modified based on learning rules. For example, when applying a supervised learning algorithm, the weights are adjusted based on the training data. The most common procedure today is probably the so-called backpropagation method. Put simply, the errors at the output layer are proportionately attributed to the hidden units involved, and the weights are iteratively adjusted (Rumelhart et al., 1986).
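As an illustration of this procedure, the following sketch (our own simplification, not the authors' implementation) performs a single backpropagation step for a network with one hidden layer, sigmoid activations and a squared-error loss; the network sizes, learning rate and sample values are assumptions for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# A tiny network: 3 input units, 4 hidden units, 2 output units.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

x = np.array([0.1, 0.5, -0.3])   # one training sample (example values)
t = np.array([1.0, 0.0])         # its target output

# Forward pass: Input_j = sum_i Output_i * w_ij + b_j, Output_j = a(Input_j).
h = sigmoid(x @ W1 + b1)
y = sigmoid(h @ W2 + b2)

# Backward pass: the output error is attributed proportionately
# (via the weights) to the hidden units involved.
delta_out = (y - t) * y * (1.0 - y)             # error at the output layer
delta_hid = (delta_out @ W2.T) * h * (1.0 - h)  # share attributed to hidden units

# Gradient step on the weights (learning rate is an assumed value).
lr = 0.1
W2 -= lr * np.outer(h, delta_out)
b2 -= lr * delta_out
W1 -= lr * np.outer(x, delta_hid)
b1 -= lr * delta_hid
```

Repeating such steps over many training samples gradually moves the weights towards values that reduce the error on the training data.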

1.2 Machine Learning Metrics

Different metrics exist to measure the quality of machine learning approaches, often depending on the type of approach that is used, e.g. classification, regression or deep learning. Note that a metric is different from a loss function. A loss function maps one or several variables to a real number and typically serves as the objective function in mathematical optimization. While metrics are usually used to measure the performance of an approach, a loss function is used to train it: a classifier may, for example, be trained by minimising a cross-entropy loss while its performance is reported as accuracy or F1 score.

1.2.1 Classification Metrics

For classification problems several metrics exist, including accuracy, precision, recall and the F1 score. They can all be computed from the confusion matrix (CM, see Fig. 3); a short code sketch at the end of Sect. 1.2.1 illustrates their computation.

1.2.1.1 Classification Accuracy

Classification accuracy is computed as the ratio of the number of correct predictions to the total number of input samples. While it is a simple metric, it can be misleading for imbalanced data sets and is problematic when the costs of one type of misclassification are very high. If a patient is wrongly classified as non-cancerous, for example, this can have fatal consequences.

In general, accuracy can be computed as:

$$ \boldsymbol{accuracy}=\frac{\boldsymbol{number}\ \boldsymbol{of}\ \boldsymbol{correct}\ \boldsymbol{predictions}}{\boldsymbol{total}\ \boldsymbol{number}\ \boldsymbol{of}\ \boldsymbol{predictions}} $$

With respect to the CM, accuracy can be computed by taking the values on the main diagonal:

$$ \boldsymbol{accuracy}=\frac{\boldsymbol{true}\ \boldsymbol{positives}+\boldsymbol{true}\ \boldsymbol{negatives}}{\boldsymbol{total}\ \boldsymbol{number}\ \boldsymbol{of}\ \boldsymbol{predictions}} $$
1.2.1.2 Detection Rate

The detection rate gives the percentage of correctly predicted trues (or 1s) with respect to the total number of predictions:

$$ \boldsymbol{detection}\ \boldsymbol{rate}=\frac{\boldsymbol{true}\ \boldsymbol{positives}}{\boldsymbol{total}\ \boldsymbol{number}\ \boldsymbol{of}\ \boldsymbol{predictions}} $$
1.2.1.3 Precision

The precision or positive predictive value gives the percentage of correctly predicted 1s with respect to all predicted 1s:

$$ \boldsymbol{precision}=\frac{\boldsymbol{true}\ \boldsymbol{positives}}{\boldsymbol{true}\ \boldsymbol{positives}+\boldsymbol{false}\ \boldsymbol{positives}} $$
1.2.1.4 Recall

A recall score measures the percentage of correctly predicted 1s with respect to all actual 1s. It is also called sensitivity or true positive rate:

$$ \boldsymbol{recall}=\frac{\boldsymbol{true}\ \boldsymbol{positives}}{\boldsymbol{true}\ \boldsymbol{positives}+\boldsymbol{false}\ \boldsymbol{negatives}} $$
1.2.1.5 Specificity

The specificity is also called the true negative rate. It determines the percentage of all 0s that were correctly predicted:

$$ \boldsymbol{specificity}=\frac{\boldsymbol{true}\ \boldsymbol{negatives}}{\boldsymbol{false}\ \boldsymbol{positives}+\boldsymbol{true}\ \boldsymbol{negatives}} $$
1.2.1.6 Balanced Accuracy

The balanced accuracy is computed as the mean of recall and specificity and therefore balances the percentages of correctly predicted 1s and 0s:

$$ \boldsymbol{balanced}\ \boldsymbol{accuracy}=\frac{\boldsymbol{recall}+\boldsymbol{specificity}}{\mathbf{2}} $$
1.2.1.7 F1 Score

The F scores combine the precision and recall metrics. In general, an F score for a value β can be computed as:

$$ {\boldsymbol{F}}_{\boldsymbol{\beta}}=\left(\mathbf{1}+{\boldsymbol{\beta}}^{\mathbf{2}}\right)\cdotp \frac{\boldsymbol{precision}\cdotp \boldsymbol{recall}}{{\boldsymbol{\beta}}^{\mathbf{2}}\cdotp \boldsymbol{precision}+\boldsymbol{recall}} $$

In the special case of β = 1, the F1 score is the harmonic mean of precision and recall. The range of the F1 score is [0, 1]; the greater the F1 score, the better the performance of the model. F1 can be computed as:

$$ {\boldsymbol{F}}_{\mathbf{1}}=\mathbf{2}\cdotp \frac{\boldsymbol{precision}\cdotp \boldsymbol{recall}}{\boldsymbol{precision}+\boldsymbol{recall}} $$
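The following sketch (ours; plain Python) computes all of the classification metrics above from the four confusion-matrix counts, which are given here as assumed example values:

```python
# Confusion-matrix entries (example values).
tp, fp, fn, tn = 40, 10, 5, 45
total = tp + fp + fn + tn

accuracy          = (tp + tn) / total
detection_rate    = tp / total
precision         = tp / (tp + fp)
recall            = tp / (tp + fn)   # sensitivity, true positive rate
specificity       = tn / (fp + tn)   # true negative rate
balanced_accuracy = (recall + specificity) / 2

def f_score(precision, recall, beta=1.0):
    """General F_beta score; beta = 1 gives the harmonic mean F1."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

f1 = f_score(precision, recall)  # = 2 * precision * recall / (precision + recall)

print(accuracy, detection_rate, precision, recall,
      specificity, balanced_accuracy, f1)
```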

1.2.2 Regression Metrics

Typical regression metrics are the mean absolute error and the mean squared error; a short code sketch at the end of this section illustrates their computation.

1.2.2.1 Mean Absolute Error

The mean absolute error is equal to the average of the absolute differences between the original values vi and the predicted values wi. It expresses how far the predictions were from the actual values. However, it does not give the direction of the error, i.e. whether the data was over- or under-predicted. With N denoting the number of values, it can be computed as:

$$ \boldsymbol{mean}\ \boldsymbol{absolute}\ \boldsymbol{error}=\frac{\mathbf{1}}{\boldsymbol{N}}\ {\sum}_{\boldsymbol{i}=\mathbf{1}}^{\boldsymbol{N}}\left|{\boldsymbol{v}}_{\boldsymbol{i}}-{\boldsymbol{w}}_{\boldsymbol{i}}\right| $$
1.2.2.2 Mean Squared Error

The mean squared error is closely related to the mean absolute error. The only difference is that it uses the average of the squared differences between the original and the predicted values. By squaring the errors, larger errors become more dominant compared to smaller ones; the mean squared error therefore emphasises large errors:

$$ \boldsymbol{mean}\ \boldsymbol{squared}\ \boldsymbol{error}=\frac{\mathbf{1}}{\boldsymbol{N}}\ {\sum}_{\boldsymbol{i}=\mathbf{1}}^{\boldsymbol{N}}{\left({\boldsymbol{v}}_{\boldsymbol{i}}-{\boldsymbol{w}}_{\boldsymbol{i}}\right)}^{\mathbf{2}} $$

The root mean squared error takes the square root of the mean squared error and is therefore also sensitive to outliers.
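A corresponding sketch for the regression metrics (ours; it assumes numpy, with v the original and w the predicted values as in the equations above, filled with made-up example data):

```python
import numpy as np

v = np.array([3.0, -0.5, 2.0, 7.0])   # original values (example data)
w = np.array([2.5,  0.0, 2.0, 8.0])   # predicted values

mae  = np.mean(np.abs(v - w))    # mean absolute error
mse  = np.mean((v - w) ** 2)     # mean squared error
rmse = np.sqrt(mse)              # root mean squared error

print(mae, mse, rmse)
```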

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this chapter

Reuter-Oppermann, M., Buxmann, P. (2022). Introduction into Artificial Intelligence and Machine Learning. In: Reinhold, T., Schörnig, N. (eds) Armament, Arms Control and Artificial Intelligence. Studies in Peace and Security. Springer, Cham. https://doi.org/10.1007/978-3-031-11043-6_2
