
Introduction into Artificial Intelligence and Machine Learning


Abstract

Artificial intelligence (AI) has become an important topic in research as well as industry since its birth in the 1950s. Research on early approaches to machine learning has actually been going on since the 1940s. Still, the question often arises as to what is actually meant by AI, especially in practice. In this chapter, we give a brief introduction to artificial intelligence and, more specifically, machine learning (ML). We briefly summarise the history of artificial intelligence and machine learning, introduce supervised learning, unsupervised learning, and reinforcement learning as the three main types of ML algorithms, and discuss how to measure the quality of machine learning algorithms. For the interested reader, the appendix of the chapter includes a brief description of artificial neural networks and machine learning metrics.


References

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.

Rey, G. D., & Wender, K. F. (2018). Neuronale Netze: Eine Einführung in die Grundlagen, Anwendungen und Datenauswertung. Hogrefe.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.


Appendix

1.1 Artificial Neural Networks

The basic idea behind the development of artificial neural networks (ANNs) is to simulate the (human) brain. In general, an ANN consists of nodes (neurons) and edges (synapses). As the following figure (Fig. 4) shows, three types of neurons, also called units, are distinguished (Goodfellow et al., 2016; Rey & Wender, 2018):

  • Input units receive the input data, for example pixels in an image recognition algorithm or blood values when diagnosing diseases. Input units are denoted by x in Fig. 4.

  • Hidden units are located between input and output units and thus represent the inner layers of an ANN. They can be arranged in several layers one after the other and are denoted by h1…hn in Fig. 4.

  • Output units contain the output data, for example the classification "dog" or "cat" in an algorithm for the recognition of animals. These are denoted by y in Fig. 4.

Fig. 4 Example of an artificial neural network: three input units x, followed by two hidden layers (with four and three units) and two output units y; each unit is connected by forward arrows to every unit of the next layer

A simple neural network contains only one hidden layer, which is often sufficient for many applications. Deep neural networks have multiple hidden layers; the appropriate number of layers and neurons depends on the individual application.

As Fig. 4 shows, the neurons are connected by edges, drawn as arrows. If we denote two neurons by i and j, respectively, then wij expresses the weight along the edge between i and j (Fig. 5).

Fig. 5 Two neurons i and j connected by a directed edge labelled with the weight wij

Ultimately, the acquired knowledge of an ANN is stored in these weights, which can conveniently be written as matrices (Fig. 6).

Fig. 6 Representation of the weights: neurons 1 and 2 in layer n are fully connected to neurons 3 and 4 in layer n + 1; the weights form the matrix W = (wij) with entries w13, w14, w23 and w24

The input that one neuron receives from others depends on the output of the sending neuron(s) and the weights along the edges. If Outputi denotes the activity level of a sending neuron i, then the input that a neuron j receives can be expressed as the sum over the weighted outputs of the neurons feeding it, adjusted with a bias offset value bj, as in the following equation.

$$ \mathit{Input}_j = \sum_i \left( \mathit{Output}_i \cdot w_{ij} \right) + b_j $$

The output of a neuron is based on the input and an activation function. Various function types are conceivable for this activation function a; in the simplest case it is linear, while common nonlinear choices include the sigmoid function and the rectified linear unit (ReLU).

$$ \mathit{Output}_i = a\left( \mathit{Input}_i \right) $$
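To make these two equations concrete, the following minimal sketch (ours, not from the chapter) computes the inputs and outputs of layer n + 1 from the outputs of layer n, using the weight labels of Fig. 6. It assumes the numpy library, and all numerical values are made-up examples:

```python
import numpy as np

# Outputs of neurons 1 and 2 in layer n (example values).
output_n = np.array([0.5, 1.0])

# Weight matrix from Fig. 6: W[i, j] connects neuron i in layer n
# to neuron j in layer n + 1 (example values).
W = np.array([[0.2, -0.4],   # w13, w14
              [0.7,  0.1]])  # w23, w24

# Bias values b_j for neurons 3 and 4 in layer n + 1.
b = np.array([0.1, -0.2])

# Input_j = sum_i(Output_i * w_ij) + b_j, written as a vector-matrix product.
input_n1 = output_n @ W + b

# Output_j = a(Input_j): here the linear case (identity) and, as a
# common nonlinear alternative, the sigmoid function.
output_linear = input_n1
output_sigmoid = 1.0 / (1.0 + np.exp(-input_n1))

print(input_n1, output_linear, output_sigmoid)
```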

The weights represent the knowledge of the ANN. These weights are modified based on learning rules. For example, when applying a supervised learning algorithm, the weights are adjusted based on the training data. The most common procedure today is probably the so-called backpropagation method. Put simply, the errors at the output layer are proportionately attributed to the hidden units involved, and the weights are iteratively adjusted (Rumelhart et al., 1986).
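As an illustration of this procedure, the following sketch (our own simplification, not the authors' implementation) performs a single backpropagation step for a network with one hidden layer, sigmoid activations and a squared-error loss; the network sizes, learning rate and sample values are assumptions for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# A tiny network: 3 input units, 4 hidden units, 2 output units.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

x = np.array([0.1, 0.5, -0.3])   # one training sample (example values)
t = np.array([1.0, 0.0])         # its target output

# Forward pass: Input_j = sum_i Output_i * w_ij + b_j, Output_j = a(Input_j).
h = sigmoid(x @ W1 + b1)
y = sigmoid(h @ W2 + b2)

# Backward pass: the output error is attributed proportionately
# (via the weights) to the hidden units involved.
delta_out = (y - t) * y * (1.0 - y)             # error at the output layer
delta_hid = (delta_out @ W2.T) * h * (1.0 - h)  # share attributed to hidden units

# Gradient step on the weights (learning rate is an assumed value).
lr = 0.1
W2 -= lr * np.outer(h, delta_out)
b2 -= lr * delta_out
W1 -= lr * np.outer(x, delta_hid)
b1 -= lr * delta_hid
```

Repeating such steps over many training samples gradually moves the weights towards values that reduce the error on the training data.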

1.2 Machine Learning Metrics

Different metrics exist to measure the quality of machine learning approaches, often depending on the type of approach that is used, e.g. classification, regression or deep learning. Note that a metric is different from a loss function. A loss function maps one or several variables to a real number and typically serves as the objective function in mathematical optimization. While metrics are usually used to measure the performance of an approach, a loss function is used to train it: a classifier may, for example, be trained by minimising a cross-entropy loss while its performance is reported as accuracy or F1 score.

1.2.1 Classification Metrics

For classification problems several metrics exist, including accuracy, precision, recall and the F1 score. They can all be computed from the confusion matrix (CM, see Fig. 3); a short code sketch at the end of Sect. 1.2.1 illustrates their computation.

1.2.1.1 Classification Accuracy

Classification accuracy is computed as the ratio of the number of correct predictions to the total number of input samples. While it is a simple metric, it can be misleading for imbalanced data sets and is problematic when the costs of one type of misclassification are very high. If a patient is wrongly classified as non-cancerous, for example, this can have fatal consequences.

In general, accuracy can be computed as:

$$ \boldsymbol{accuracy}=\frac{\boldsymbol{number}\ \boldsymbol{of}\ \boldsymbol{correct}\ \boldsymbol{predictions}}{\boldsymbol{total}\ \boldsymbol{number}\ \boldsymbol{of}\ \boldsymbol{predictions}} $$

With respect to the CM, accuracy can be computed by taking the values on the main diagonal:

$$ \boldsymbol{accuracy}=\frac{\boldsymbol{true}\ \boldsymbol{positives}+\boldsymbol{true}\ \boldsymbol{negatives}}{\boldsymbol{total}\ \boldsymbol{number}\ \boldsymbol{of}\ \boldsymbol{predictions}} $$
1.2.1.2 Detection Rate

The detection rate gives the percentage of correctly predicted trues (or 1s) with respect to the total number of predictions:

$$ \boldsymbol{detection}\ \boldsymbol{rate}=\frac{\boldsymbol{true}\ \boldsymbol{positives}}{\boldsymbol{total}\ \boldsymbol{number}\ \boldsymbol{of}\ \boldsymbol{predictions}} $$
1.2.1.3 Precision

The precision or positive predictive value gives the percentage of correctly predicted 1s with respect to all predicted 1s:

$$ \boldsymbol{precision}=\frac{\boldsymbol{true}\ \boldsymbol{positives}}{\boldsymbol{true}\ \boldsymbol{positives}+\boldsymbol{false}\ \boldsymbol{positives}} $$
1.2.1.4 Recall

A recall score measures the percentage of correctly predicted 1s with respect to all actual 1s. It is also called sensitivity or true positive rate:

$$ \boldsymbol{recall}=\frac{\boldsymbol{true}\ \boldsymbol{positives}}{\boldsymbol{true}\ \boldsymbol{positives}+\boldsymbol{false}\ \boldsymbol{negatives}} $$
1.2.1.5 Specificity

The specificity is also called the true negative rate. It determines the percentage of all 0s that were correctly predicted:

$$ \boldsymbol{specificity}=\frac{\boldsymbol{true}\ \boldsymbol{negatives}}{\boldsymbol{false}\ \boldsymbol{positives}+\boldsymbol{true}\ \boldsymbol{negatives}} $$
1.2.1.6 Balanced Accuracy

The balanced accuracy is computed as the mean of recall and specificity and therefore balances the percentages of correctly predicted 1s and 0s:

$$ \boldsymbol{balanced}\ \boldsymbol{accuracy}=\frac{\boldsymbol{recall}+\boldsymbol{specificity}}{\mathbf{2}} $$
1.2.1.7 F1 Score

The F scores combine the precision and recall metrics. In general, an F score for a value β can be computed as:

$$ {\boldsymbol{F}}_{\boldsymbol{\beta}}=\left(\mathbf{1}+{\boldsymbol{\beta}}^{\mathbf{2}}\right)\cdotp \frac{\boldsymbol{precision}\cdotp \boldsymbol{recall}}{{\boldsymbol{\beta}}^{\mathbf{2}}\cdotp \boldsymbol{precision}+\boldsymbol{recall}} $$

In the special case of β = 1, the F1 score is the harmonic mean of precision and recall. The range of the F1 score is [0, 1]; the greater the F1 score, the better the performance of the model. F1 can be computed as:

$$ {\boldsymbol{F}}_{\mathbf{1}}=\mathbf{2}\cdotp \frac{\boldsymbol{precision}\cdotp \boldsymbol{recall}}{\boldsymbol{precision}+\boldsymbol{recall}} $$
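The following sketch (ours; plain Python) computes all of the classification metrics above from the four confusion-matrix counts, which are given here as assumed example values:

```python
# Confusion-matrix entries (example values).
tp, fp, fn, tn = 40, 10, 5, 45
total = tp + fp + fn + tn

accuracy          = (tp + tn) / total
detection_rate    = tp / total
precision         = tp / (tp + fp)
recall            = tp / (tp + fn)   # sensitivity, true positive rate
specificity       = tn / (fp + tn)   # true negative rate
balanced_accuracy = (recall + specificity) / 2

def f_score(precision, recall, beta=1.0):
    """General F_beta score; beta = 1 gives the harmonic mean F1."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

f1 = f_score(precision, recall)  # = 2 * precision * recall / (precision + recall)

print(accuracy, detection_rate, precision, recall,
      specificity, balanced_accuracy, f1)
```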

1.2.2 Regression Metrics

Typical regression metrics are the mean absolute error and the mean squared error; a short code sketch at the end of this section illustrates their computation.

1.2.2.1 Mean Absolute Error

The mean absolute error is equal to the average of the absolute differences between the original values vi and the predicted values wi. It expresses how far the predictions were from the actual values. However, it does not give the direction of the error, i.e. whether the data was over- or under-predicted. With N denoting the number of values, it can be computed as:

$$ \boldsymbol{mean}\ \boldsymbol{absolute}\ \boldsymbol{error}=\frac{\mathbf{1}}{\boldsymbol{N}}\ {\sum}_{\boldsymbol{i}=\mathbf{1}}^{\boldsymbol{N}}\left|{\boldsymbol{v}}_{\boldsymbol{i}}-{\boldsymbol{w}}_{\boldsymbol{i}}\right| $$
1.2.2.2 Mean Squared Error

The mean squared error is closely related to the mean absolute error. The only difference is that it uses the average of the squared differences between the original and the predicted values. By squaring the errors, larger errors become more dominant compared to smaller ones; the mean squared error therefore emphasises large errors:

$$ \boldsymbol{mean}\ \boldsymbol{squared}\ \boldsymbol{error}=\frac{\mathbf{1}}{\boldsymbol{N}}\ {\sum}_{\boldsymbol{i}=\mathbf{1}}^{\boldsymbol{N}}{\left({\boldsymbol{v}}_{\boldsymbol{i}}-{\boldsymbol{w}}_{\boldsymbol{i}}\right)}^{\mathbf{2}} $$

The root mean squared error takes the square root of the mean squared error and is therefore also sensitive to outliers.
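A corresponding sketch for the regression metrics (ours; it assumes numpy, with v the original and w the predicted values as in the equations above, filled with made-up example data):

```python
import numpy as np

v = np.array([3.0, -0.5, 2.0, 7.0])   # original values (example data)
w = np.array([2.5,  0.0, 2.0, 8.0])   # predicted values

mae  = np.mean(np.abs(v - w))    # mean absolute error
mse  = np.mean((v - w) ** 2)     # mean squared error
rmse = np.sqrt(mse)              # root mean squared error

print(mae, mse, rmse)
```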

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this chapter

Reuter-Oppermann, M., Buxmann, P. (2022). Introduction into Artificial Intelligence and Machine Learning. In: Reinhold, T., Schörnig, N. (eds) Armament, Arms Control and Artificial Intelligence. Studies in Peace and Security. Springer, Cham. https://doi.org/10.1007/978-3-031-11043-6_2
