1 Introduction

Dartmouth College, 1956, USA. Renowned scientists from various disciplines, including Claude Shannon, the founder of information theory; Herbert Simon, who later won the Nobel Memorial Prize in Economics; and the computer scientists Marvin Minsky and John McCarthy, met to explore the potential of the emerging computer technology. The term “artificial intelligence” had already been coined the year before in the course of planning the meeting, and now the following idea was formulated: “If computers manage to perform tasks such as calculating ballistic trajectories better than any human just by applying simple calculation rules, it should be possible to simulate or even generate human thinking by working through simple logical rules.” In fact, in the late 1950s and 1960s, the first computer programs were equipped with logical methods that could create mathematical proofs (the “Logic Theorist”) or beat humans at games like chess. The euphoria of those days fizzled out relatively quickly, however, and we will discuss the reasons in more detail in Sect. 2.1.

One disappointment resulted from the fact that while the explicit specification of rules (“symbolic AI”) works well in areas such as proving mathematical statements or planning a sequence of concrete steps to reach a specified goal, other supposedly simpler cognitive feats, such as recognizing objects in a picture or understanding language, turned out to be extremely difficult, if not impossible, to specify in this way. For tasks of this kind, a different approach proved more effective (see Sect. 2.2): it had existed in theory since the late 1940s but led to breakthroughs only in the twenty-first century, once the necessary huge data sets became available. Here, the computer is not given rules whose execution solves the problem; instead, solutions are learned from data by self-learning. This approach, of course, requires large amounts of data and computing power.

Understanding and distinguishing between these two methods is central to grasping the limitations of current AI research, as well as the resulting problems; we will discuss this in more detail in Sect. 3. From a digital humanism perspective, we consider it paramount to start from an understanding of the existing methods when discussing the dangers, but also the opportunities, that arise from the pervasiveness and availability of AI systems in various areas of life today. We will therefore not address issues such as the treatment of AIs with consciousness, implications of the so-called singularity (Walsh, 2017), or transhumanist visions. For space reasons, we also omit topics from the field of robotics (“embodied AI”) as well as their implications (e.g., autonomous weapon systems). For other aspects such as bias, trustworthiness, or AI ethics, we refer to the corresponding chapters in this book.

2 Methods of AI

2.1 Symbolic AI

Symbolic AI refers to those methods that are based on explicitly describing the problems or the necessary solution steps to the computer. Logic or related formal languages are used to describe the problems; actually finding possible solutions (“search”) is a central aspect of symbolic AI. It should already be pointed out at this stage that in this model, the “explainability” of a solution is conceptually easy to obtain (although, for larger specifications, the explanations tend to become incomprehensible to humans). Furthermore, the correctness of a solution is generally definite and not subject to probabilities. The “intelligent” behavior results here simply from computing power.

Let’s consider an example: board games like chess are defined by clear rules that tell the players the possibilities of their moves. Assume we are in a game situation where I can mate black in two moves, i.e., there is a move for me such that no matter what the other player decides, I have a move that mates the opponent; this is called a winning strategy. To find such a strategy, I simply let the computer try all possible moves on my part. For each such move, I let the computer calculate all possible choices of the opponent and my possible answers to them. If we assume, in a simplified way, that there are 10 moves to choose from in each situation, we have $10^3 = 1000$ operations to perform. If we want to calculate one turn further ahead, it is already $10^5 = 100{,}000$, and so on. It is clear that this cannot be carried on arbitrarily, since the problem of the “combinatorial explosion” comes to bear. In chess programs, this is solved by so-called board evaluations (with which move do I have the best possible position after three rounds, e.g., guaranteed more pieces on the board than the opponent?). Mediocre players can already be beaten with such a look-ahead using reasonable computing power and simple board evaluations; for grandmasters, however, it took until 1997, when Deep Blue was able to defeat the then world chess champion Garry Kasparov.
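To make the search concrete, here is a minimal minimax sketch in Python. As an illustrative stand-in for chess, we use the trivial game of Nim (take 1, 2, or 3 sticks; whoever takes the last stick wins), which is small enough to search exhaustively; the chess case differs only in the move generator and in cutting off the search with a board evaluation.

```python
# A minimal minimax sketch on the toy game of Nim (an illustrative
# assumption standing in for chess; the search logic is the same).

def legal_moves(sticks):
    # A player may take 1, 2, or 3 sticks.
    return [m for m in (1, 2, 3) if m <= sticks]

def minimax(sticks, maximizing):
    # Terminal position: the player to move cannot move and has lost.
    if sticks == 0:
        return -1 if maximizing else +1
    # Try all moves recursively; the number of explored positions grows
    # exponentially with the depth -- the "combinatorial explosion."
    values = [minimax(sticks - m, not maximizing) for m in legal_moves(sticks)]
    return max(values) if maximizing else min(values)

print(minimax(5, True))  # +1: the player to move has a winning strategy
```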

The Power of Propositional Logic

It is important to emphasize that for problems where the computational effort increases exponentially with the problem size, symbolic methods have a scalability problem. This is true in many areas: finding models for logical formulas, creating an optimal shift schedule for workers, designing timetables, computing routes in a traffic network, or building expert systems of different kinds. Since it was clear that progress in the computing power of chips could not keep pace with exponential growth, symbolic AI methods were not considered to have much potential for solving problems on the scale needed in industrial applications. However, the tide turned in the mid-1990s when Kautz and Selman (1992) proposed to reduce problems of this type to a single problem that is as easy to handle as possible (but still subject to the combinatorial explosion) and to use the most efficient search methods available for this one problem. This problem is the satisfiability problem of propositional logic (SAT).

In this logic, atomic propositions (which can be true or false) are combined via connectives. The truth value of the compound formula is then given by the assignment of truth values to the atomic propositions and the semantics of the connectives. Consider a simple example with the atomic propositions “ai” (standing for “one should study artificial intelligence”) and “dh” (standing for “one should study digital humanism”). The state of an agent might be represented by the following formula:

$$ \left(\mathrm{ai}\ \mathrm{OR}\ \mathrm{dh}\right)\ \mathrm{AND}\ \mathrm{NOT}\left(\mathrm{ai}\ \mathrm{AND}\ \mathrm{dh}\right) $$

stating that one should study AI or digital humanism or both (the part “ai OR dh”) but, at the same time—maybe due to time constraints—not both at once (the part “NOT (ai AND dh)”). We have four possible assignments to the atomic propositions: setting both ai and dh to true; setting ai to true and dh to false; setting ai to false and dh to true; and, finally, setting both to false. Without giving an exact definition of the semantics of the connectives “AND,” “OR,” and “NOT,” it should be quite intuitive that only two of the assignments make the entire formula true, namely, those stating that one should study either AI or digital humanism. The formula is thus satisfiable. Suppose now we add the knowledge that one should study AI whenever studying digital humanism and, likewise, one should study digital humanism whenever studying AI. Formally, this leads to the formula

$$ \left(\mathrm{ai}\ \mathrm{OR}\ \mathrm{dh}\right)\ \mathrm{AND}\ \mathrm{NOT}\left(\mathrm{ai}\ \mathrm{AND}\ \mathrm{dh}\right)\ \mathrm{AND}\ \left(\mathrm{dh}\rightarrow \mathrm{ai}\right)\ \mathrm{AND}\ \left(\mathrm{ai}\rightarrow \mathrm{dh}\right). $$

This formula is now unsatisfiable: whatever assignment is provided to the atomic propositions, the formula does not evaluate to true. What makes the SAT problem computationally challenging is the fact that the number of possible assignments to be checked grows exponentially with the number of atomic propositions in the formula.
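The following brute-force sketch checks both example formulas by enumerating all truth assignments; this exhaustive enumeration is exactly what SAT solvers avoid by clever search.

```python
from itertools import product

# Brute-force satisfiability check: try all 2^n truth assignments.
def satisfiable(formula, n_vars):
    return any(formula(*assignment)
               for assignment in product([True, False], repeat=n_vars))

# (ai OR dh) AND NOT (ai AND dh): the exclusive choice from above
f1 = lambda ai, dh: (ai or dh) and not (ai and dh)
# ... AND (dh -> ai) AND (ai -> dh), with x -> y written as (not x) or y
f2 = lambda ai, dh: f1(ai, dh) and ((not dh) or ai) and ((not ai) or dh)

print(satisfiable(f1, 2))  # True: two of the four assignments work
print(satisfiable(f2, 2))  # False: the extended formula is unsatisfiable
```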

However, it turned out that by using clever search heuristics, exploiting shortcuts in the search space, and employing highly optimized data structures, certain formulas with millions of variables can be solved, while other, randomly generated, formulas cannot (Ganesh & Vardi, 2020). The formulas that can be solved well are often exactly those found in the “wild.” This is partly explained by the fact that they have certain structural properties that are exploited by the search procedure: if one reduces, e.g., routing problems in traffic networks to such formulas, the formulas have “good” properties, because real-world traffic networks have, e.g., a maximum node degree (number of connections per node) of about 10 and are not arbitrary graphs. In recent years, this has led to a success story of SAT-based methods in many areas, especially the verification of hardware and software specifications.

Since these applications are often no longer attributed to AI, here is an example where SAT has actually led to the solution of an open problem in mathematics, namely, the problem of Pythagorean triples: the question is whether the natural numbers can be divided into two parts in such a way that neither of the two parts contains a triple (a, b, c) with $a^2 + b^2 = c^2$. For the numbers 1 to 10, this is still possible, because I only have to avoid putting the numbers 3, 4, and 5 into the same pot. If we have to divide the numbers from 1 to 15, more caution is needed, since now 5, 12, and 13 must not end up in the same pot either, but it still works. The question is now: Is this division always possible, no matter how big the range of numbers is? The SAT solver said no: the numbers from 1 to 7825 can no longer be divided in this way! We refer to Heule et al. (2016) for further details on this project.
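For tiny ranges, the question can be settled by brute force, as in the following sketch, which enumerates all two-colorings of the numbers 1 to n; the SAT encoding of Heule et al. (2016) scales far beyond this naive approach.

```python
from itertools import product

def triples(n):
    # All Pythagorean triples (a, b, c) with c <= n
    return [(a, b, c) for a in range(1, n + 1) for b in range(a, n + 1)
            for c in range(b, n + 1) if a * a + b * b == c * c]

def partitionable(n):
    # Try all 2^n two-colorings of 1..n; feasible only for tiny n.
    # (Heule et al. (2016) used a SAT solver to settle n = 7825.)
    ts = triples(n)
    for coloring in product([0, 1], repeat=n):
        if all(len({coloring[a - 1], coloring[b - 1], coloring[c - 1]}) > 1
               for (a, b, c) in ts):
            return True
    return False

print(partitionable(15))  # True: 3, 4, 5 and 5, 12, 13 can be kept apart
```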

The Limits of Propositional Logic

We have thus seen that (propositional) logic can be used to solve problems that exceed human capabilities. In fact, the pioneers of symbolic AI considered logic a central vehicle to describe and simulate human thinking. However, apart from the problem of combinatorial explosion outlined above, another obstacle has arisen here. Human thinking does not always follow (classical) logical steps; we have to deal with uncertainties, process contradictory information, or even revise conclusions once made. In fact, in classical logic, it is already immensely complex to represent plausible facts like “if I put block A on B, the position of all other blocks remains unchanged”; see Hayes (1973). Consequently, in the 1970s and 1980s, symbolic AI was centrally concerned with other types of logical systems that allow formalizations of “common-sense reasoning.” The numerous varieties cannot be enumerated here comprehensively, but it should not remain unmentioned that they are today often subsumed under the term “knowledge representation and reasoning” (van Harmelen et al., 2008) and offer a rich portfolio of methods that could find relevance in future AI applications—in particular when it comes to explainability.

2.2 Machine Learning

General Considerations

The defining characteristic of algorithms in machine learning (ML) is that they are self-learning, meaning that the algorithm improves itself, or learns, using data. Classical chess programs were explicitly programmed using rules that describe the advantage or disadvantage a player has in terms of points. For example, taking a rook is worth about five points, and dominating the center of the board is advantageous. Self-learning algorithms, by contrast, draw their own conclusions by watching many chess games; there is no programmer who tunes built-in rules. Hence, in ML, the availability of larger and larger data sets makes the time-consuming and error-prone fine-tuning of internal rules or parameters of the algorithm superfluous.

In other words, the machine learns, while the human designs the learning algorithm. It was already recognized at the Dartmouth Workshop in 1956 that self-improvement and self-learning are central notions of intelligence.

In the modern view of ML, the data used for learning are assumed to be drawn from a probability distribution. Therefore, any learning is stochastic by nature, which gives rise to two fundamental considerations. Because the number of data samples is always finite, although it may be huge, we may never observe samples that are important, or we may observe samples that are not representative. The first issue means that our learning result can only be probably correct; the second issue means that it can only be approximately correct. Therefore, the best learning results are “probably approximately correct” (PAC) statements about the quality of a learned result.
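In symbols, a PAC guarantee has the following form: with a confidence parameter δ (for “probably”) and an accuracy parameter ε (for “approximately”), a learned hypothesis h satisfies

$$ \Pr \left[\mathrm{error}(h)\le \varepsilon \right]\ge 1-\delta, $$

where the probability is taken over the random draw of the finitely many training samples.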

To illustrate these considerations, let us consider the example of searching for black swans. The black swan (Cygnus atratus) lives in southeastern and southwestern Australia. We must always sample the whole space, but if the number of samples is insufficient, we will never encounter a black swan. This is the first issue. The second issue is that the first black swans that we encounter may have an uncharacteristically light color, misleading us in our approximation of its color.

ML is a large field, but it broadly consists of supervised learning, unsupervised learning, and reinforcement learning. We consider these three large subfields in turn and mention some important applications of ML.

Supervised Learning

Supervised learning (SL) is concerned with finding functions that correctly classify or predict an output value given an input value. These functions are called classifiers or predictors; they are chosen from a predefined class of functions and parameterized by parameters that are to be learned. For example, in image recognition, the inputs are the pixels of an image, and the output may be whether an object that belongs to a certain class of objects (cats, dogs, etc.) is visible or whether the image satisfies a certain property. In SL, the learning algorithm uses training data that consist of inputs and outputs—hence the name. The outputs are often called labels; e.g., an input sample may be a photo of a dog, and the corresponding output may be the label “dog.” In classification tasks, the set of all outputs is finite, whereas in prediction tasks, the set of all outputs is infinite (e.g., the real numbers).

Many algorithms have been developed for SL, and we mention some of the most important ones: artificial neural networks (ANN), decision trees, random forests, ensemble learning, k-nearest neighbor, Bayesian networks, hidden Markov models, and support vector machines.

Without doubt, nowadays, the most prominent approach to SL is the use of ANNs as classifiers/predictors (Heitzinger, 2022, Chapter 13). ANNs are functions that are arranged in layers, where linear functions alternate with pointwise applied nonlinear functions, the so-called activation functions (see Fig. 1). ANNs have a long history, having already been discussed at the Dartmouth Workshop in 1956. A first breakthrough was the backpropagation algorithm (which is automatic backward differentiation), because it enabled the efficient training of ANNs.

Fig. 1
A schematic of the ANN model: the nodes of the input layer feed into hidden layer 1, followed by hidden layers 2 and 3, whose nodes feed into the output layer.

This schematic diagram shows how an ANN works. On the left-hand side, a vector of real numbers is the input to the ANN. In the hidden layers, whose number is the depth of the network, the input vector is transformed until, in the final output layer, the output vector is calculated. In each hidden layer, the previous vector is multiplied by a matrix (the weights), another vector (the bias) is added, and a nonlinear function (the activation function) is applied element-wise. All these weight matrices, bias vectors, and activation functions must be adjusted such that the ANN solves the given classification/prediction problem. The arrows indicate how the values of one layer influence those of the next. In this example, the output vector consists of three numbers; if the network is used as a classifier, the largest of the three signifies one of the three classes. (Figure from Heitzinger (2022, Chapter 13))
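To make this description concrete, here is a minimal forward-pass sketch in Python/NumPy; the layer sizes and the use of ReLU and softmax are illustrative assumptions, not the specific network of Fig. 1.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # a common activation function

def forward(x, layers):
    # Hidden layers: multiply by the weight matrix, add the bias vector,
    # and apply the activation function element-wise.
    for W, b in layers[:-1]:
        x = relu(W @ x + b)
    W, b = layers[-1]
    z = W @ x + b                       # output layer: three class scores
    return np.exp(z) / np.exp(z).sum()  # softmax turns scores into probabilities

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(5, 4)), rng.normal(size=5)),  # hidden layer 1
          (rng.normal(size=(5, 5)), rng.normal(size=5)),  # hidden layer 2
          (rng.normal(size=(3, 5)), rng.normal(size=3))]  # output layer
probs = forward(rng.normal(size=4), layers)
print(probs, probs.argmax())  # the largest entry signifies the predicted class
```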

Why are ANNs so successful? Although classification is a discrete problem, ANNs are differentiable functions and, as such, have gradients, which point in the direction of fastest change of a function. Knowing this direction is extremely useful for solving optimization problems, as the gradient provides a useful search direction. For training in SL, it is hence expedient to use the gradient of the classifier/predictor in order to solve the error-minimization problem. In ANNs, calculating the gradient is surprisingly fast due to the backpropagation algorithm, taking only about twice as long as evaluating the ANN itself.
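As a minimal illustration of this idea, the following sketch runs gradient descent on a toy error function (an illustrative assumption, not an actual ANN); stepping against the gradient decreases the error.

```python
# Gradient descent on the toy error function E(w) = (w - 3)**2,
# whose gradient is E'(w) = 2 * (w - 3).
w, learning_rate = 0.0, 0.1
for _ in range(50):
    gradient = 2 * (w - 3)
    w -= learning_rate * gradient  # step against the gradient: error decreases
print(w)  # close to the minimizer w = 3
```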

ANNs are very flexible data structures, and many different variants have been employed, since the number of layers and their sizes can or must be adjusted to the SL problem at hand.

If the number of layers is small but the sizes of the layers become larger, any continuous function can be approximated; this is the famous universal approximation property of ANNs. However, this property of wide ANNs is misleading: in practice, increasing the number of layers helps image recognition and many other applications, resulting in deep, not wide, ANNs. This is the main observation behind deep learning, which is learning using deep ANNs.

A recent breakthrough development is the transformer, a certain kind of ANN that uses the so-called attention mechanism to learn relationships between words across long distances in a text. Transformers originated in machine translation (Vaswani et al., 2017), yielding the best and fastest machine translation at that time. They were adapted for use in InstructGPT, ChatGPT, and GPT-4 and are a milestone in natural language processing.

The attention mechanism solves two main challenges in natural language processing, both for translation and for text generation. The first challenge is that natural language presupposes a lot of background knowledge—a world model or common sense—in order to make sense of ambiguities. The second challenge is the use of pronouns and other relationships between words, sometimes across large distances in a text. The attention mechanism addresses both challenges surprisingly well and can learn the grammar of most natural languages.
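The following is a minimal sketch of the scaled dot-product attention at the heart of transformers (Vaswani et al., 2017); the matrix sizes are illustrative assumptions. Each word (a row of Q) is compared with every other word (the rows of K), however far apart in the text, and the resulting weights mix the corresponding values (the rows of V).

```python
import numpy as np

def attention(Q, K, V):
    # Compare every word (row of Q) with every other word (rows of K),
    # no matter how far apart they are in the text ...
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: attention weights
    # ... and mix the corresponding values accordingly.
    return weights @ V

rng = np.random.default_rng(0)
n_words, d = 6, 4  # illustrative sizes
Q, K, V = (rng.normal(size=(n_words, d)) for _ in range(3))
print(attention(Q, K, V).shape)  # (6, 4): one mixed vector per word
```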

Unsupervised Learning

In contrast to SL, there are no outputs in unsupervised learning (UL). In UL, the learning task is to find patterns in input samples from untagged or unlabeled data. Often, the input samples are to be grouped into classes according to their features, or relationships between the samples are to be found.

These relationships are often expressed as graphs or by special kinds of ANNs such as autoencoders.

Common approaches in UL are clustering methods, anomaly detection methods, and learning latent variable models. Clustering methods include hierarchical clustering, k-means, and mixture models. An example of a clustering problem is grouping a set of patients into clusters according to the similarities of their current states. Anomaly detection methods include local outlier factors and isolation forests. Latent variables can be learned by the expectation-maximization algorithm, the method of moments, and blind signal separation techniques.
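As an illustration, here is a minimal k-means sketch (one of the clustering methods just mentioned); the two-dimensional toy data and the choice k = 2 are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, steps=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(steps):
        # Assign each sample to its nearest center ...
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
        # ... and move each center to the mean of its assigned samples.
        # (This sketch ignores the rare case of a cluster becoming empty.)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)),   # one group of samples ...
               rng.normal(5, 1, (50, 2))])  # ... and a clearly different one
labels, centers = kmeans(X, k=2)
print(centers)  # two centers, near (0, 0) and (5, 5)
```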

Reinforcement Learning

Reinforcement learning (RL) is the subfield of machine learning that is concerned with finding optimal policies to control environments in time-dependent settings (Sutton & Barto, 2018). In each time step, the actions of the agent influence the environment (and possibly the agent itself), and the new state of the environment and a reward are then communicated to the agent. The learning task is to find optimal policies that maximize the expected value of the return, i.e., the discounted sum of all future rewards that the agent will receive while following a policy.
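In the standard notation of Sutton and Barto (2018), with rewards $R_{t+1}, R_{t+2}, \ldots$ and a discount factor $0\le \gamma \le 1$, the return whose expected value is to be maximized is

$$ G_t = R_{t+1}+\gamma R_{t+2}+{\gamma}^2 R_{t+3}+\cdots = \sum_{k=0}^{\infty }{\gamma}^k R_{t+k+1}. $$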

RL is a very general concept that includes random environments as well as policies whose actions are random. It encompasses all board games such as Go, chess, and backgammon, where the rewards are typically non-zero only at the end of the game. The agent receives a reward of +1 for winning, a reward of −1 for losing, and a reward of 0 for a draw. Other applications occur in robotics, user interactions at websites, finance, autonomous driving, medicine (Böck et al., 2022), etc.

Reinforcement learning problems are particularly hard when a lot of time passes between taking an action and receiving the positive rewards due to this action or a combination of actions. This is the so-called credit assignment problem.

In deep reinforcement learning, deep neural networks are used to represent the policies and the so-called action-value functions. In this context, deep neural networks serve as powerful function approximators in infinite state (and action) spaces. In distributional reinforcement learning, an extension of the classic approach, not only is the expected value of the return maximized, but the whole probability distribution of the return is calculated. This makes it possible to quantify the risk associated with an action that may be taken in a given state.
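As a concrete illustration of these ideas, here is a minimal tabular Q-learning sketch (Sutton & Barto, 2018) on a toy "chain" environment; the environment, rewards, and hyperparameters are illustrative assumptions, and deep RL replaces the table below by a neural network.

```python
import numpy as np

# States 0..4 in a row; action 1 moves right, action 0 moves left.
# Only reaching state 4 yields a reward: a small credit assignment problem.
n_states, n_actions, gamma, alpha, eps = 5, 2, 0.9, 0.5, 0.2
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy policy: explore randomly, otherwise act greedily.
        explore = rng.random() < eps or not Q[s].any()
        a = rng.integers(n_actions) if explore else int(Q[s].argmax())
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) toward the reward plus the
        # discounted best value of the successor state.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.argmax(axis=1)[:-1])  # learned policy: action 1 ("right") everywhere
```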

An early success of RL in the 1990s was mastering the board game of backgammon, a game with a huge search tree and a large random component (Tesauro, 1995). From the 2010s until today, reinforcement learning has been the field that enabled a string of milestones in the history of AI. A series of publications (Silver et al., 2016, 2017, 2018) showed, with progressively simpler but at the same time more powerful algorithms, that Go, chess, shogi, and a large collection of Atari 2600 games can be solved by self-learning algorithms. Quite impressively, a single algorithm, AlphaZero, can learn to play these games at a superhuman level. It also learns starting from zero knowledge (tabula rasa), hence the Zero in its name.

In the following years, more complicated computer games were solved by similar approaches. Computer games and card games such as poker pose their own challenges, as they contain a considerable amount of hidden information, while all state information is observable by the agent in board games such as chess and Go.

RL is also the reason that InstructGPT (Ouyang et al., 2022)—a precursor of ChatGPT—and ChatGPT/GPT-4 (OpenAI, 2023) work so well. A generative pre-trained transformer (GPT), having been trained on vast amounts of text, can generate beautiful text, but it is hard to make it give helpful answers.

The final, but crucial, step in training InstructGPT and ChatGPT is reinforcement learning from human feedback (RLHF) (Ouyang et al., 2022), where four answers to prompts are ordered by human labelers, and these orderings are used to train the reward model for the final RL training step. This RL training step aims to align the language model with the needs of the user.

The needs of the user are essentially the 3H (OpenAI, 2023). The first H is for honest: the language model should give honest/correct answers. (Truthful or correct would be better names, as the inner beliefs of the language model are unknown.) The second H is for helpful: the answers should be helpful and useful. The third H is for harmless: the system should not give any answers that may cause harm. Unfortunately, these three goals are very difficult to achieve in practice and can even be contradictory. If we ask a language model how to rob a bank, it cannot be helpful and harmless at the same time. Much of the ongoing research on AI safety is concerned with satisfying the 3H requirements.

Applications of ML

Because its algorithms are very versatile, ML has found many applications, and its range of applications is still expanding. Thanks to the speed of ML algorithms and because they have reached near-human or superhuman capabilities in many areas, they have become practically important across many domains.

Applications include bioinformatics, computer vision, data mining, earth sciences, email filtering, natural language processing (grammar checker, handwriting recognition, machine translation, optical character recognition, speech recognition, text-to-speech synthesis), pattern recognition (facial recognition systems), recommendation systems, and search engines.

2.3 Combination of Methods

Human intelligence relies on different cognitive tasks, with the separation into fast and slow thinking (Kahneman, 2011) being a popular model. Fast thinking refers to automatic, intuitive, and unconscious tasks (e.g., pattern recognition), while slow thinking describes conscious tasks such as planning, deduction, and deliberation. Machine learning is the natural method for simulating fast thinking, while symbolic approaches are better suited to problems related to slow thinking. Consequently, the combination of both approaches is seen as the holy grail for next-level AI systems.

In recent years, the term neuro-symbolic AI has been established to name this line of research. However, it comes in many different flavors, and we shall list just a few of them. First, the famous AlphaGo system is often mentioned as a prototypical system in this context: the symbolic component is Monte Carlo tree search to traverse the search space (recall our considerations on chess in Sect. 2.1), but the board evaluation is done via ML techniques (in contrast to Deep Blue, where the board evaluation was explicitly coded and designed by experts).

A second branch comprises neuro-symbolic architectures in which neural nets are generated from symbolic rules (for instance, graph neural networks—GNNs). Finally, approaches like DeepProbLog offer a weak coupling between the neural and the symbolic parts: essentially, deep neural networks are treated as predicates that can be incorporated, with an approximate probability distribution over the network output, into logical rules, and semantic constraints can be used to guide gradient-based learning of the network. However, it has to be mentioned that such hybrid architectures do not immediately lead to human-like cognitive capabilities, let alone consciousness or self-awareness.

3 Reflections

3.1 AI4Good

Through the lens of digital humanism, one might ask where AI provides us with a valuable tool to support human efforts toward solutions to vexing problems. Such applications are often termed “AI for Good,” and there are indeed many examples where we benefit from AI. They range from medicine (treatments, diagnosis, early detection, cancer screening, drug design, etc.) to the identification of hate speech or fake news (cf. the chapter by Prem and Krenn) and tools for people with disabilities. A more subtle domain is climate change: while AI techniques can be used to save energy, control herbicide application, and much more, AI itself requires a considerable amount of energy (in particular during the training phase). For a thorough discussion of this important topic, we refer to Rolnick et al. (2023).

3.2 Is ChatGPT a Tipping Point?

ChatGPT is without doubt a major milestone in the history of AI. It is the first system that can interact in a truly helpful manner with users, as demonstrated by its scores on many academic tests (Eloundou et al., 2023). It shows surprising powers of reasoning, considering that it is a system for generating text. Its knowledge is encyclopedic, since during learning, the GPT part has ingested vast amounts of text, probably a good portion of all existing knowledge.

Interestingly enough, ChatGPT’s creativity is closely coupled to its so-called temperature parameter and can therefore be adjusted easily. During text generation, the next token (a word fragment or syllable) is chosen from an ordered list of likely continuations. At a low temperature, only the first tokens on the list have a chance of being selected, but at a higher temperature, tokens further down the list also stand a chance. Thus, a higher temperature during text generation increases creativity. Again, the Dartmouth Workshop turned out to be prescient, since creativity in the context of intelligence was a major topic discussed there.
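A minimal sketch of this sampling mechanism follows; the four candidate scores ("logits") are illustrative assumptions.

```python
import numpy as np

def sample(logits, temperature, rng):
    p = np.exp(logits / temperature)
    p /= p.sum()  # softmax over the rescaled scores
    return rng.choice(len(logits), p=p)

logits = np.array([3.0, 2.0, 1.0, 0.5])  # ordered list of likely continuations
rng = np.random.default_rng(0)
for T in (0.1, 1.0, 2.0):
    picks = [sample(logits, T, rng) for _ in range(1000)]
    print(T, np.bincount(picks, minlength=4))
# At T = 0.1, almost only token 0 is chosen; at T = 2.0, tokens further
# down the list also stand a chance: higher temperature, more creativity.
```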

ChatGPT can also be used to solve mathematical or logical problems. As a program for generating text, however, it is no substitute for specialized programs such as computer algebra systems and SAT solvers. Still, it is straightforward to couple such specialized programs to ChatGPT: ChatGPT can be trained to become proficient in programming languages and can therefore generate input for those coupled programs. We expect that such interfaces will become more and more refined and will gain importance, supplementing ChatGPT’s capabilities with procedural domain-expert knowledge.

Therefore, we predict that ChatGPT will revolutionize human-computer interfaces (HCI). The reason is that it can act as a knowledgeable translator of the user’s intention to a vast array of specialized programs, reducing training time for human users and resulting in interactive usage with lots of guidance and easily accessible documentation.

Its potential for revolutionizing HCIs in this sense may well turn out to be its true power and a tipping point, but its effects on society should not be underrated, and critical reflection across different disciplines is needed.

3.3 Pressing Issues

Media often associate the danger of AI with robot apocalypses of various kinds. However, we see the main issue in the ever-growing use of data (see Sect. 2.2) to train more and more advanced AI models. This leads to several problems related to copyright, privacy and personalized systems, and low-paid workers who label data and help train the models. Due to limited space, we will not discuss other important issues here, such as education and the impact of AI on the working world.

In fact, ChatGPT has been fed with sources such as books, (news) articles, websites, posts, and even comments from social networks to perform its core function as a dialogue system. No one has asked us whether we agree that our research papers, blog articles, or comments in social media may be used to train such an AI. While copyright issues of this kind have always been pressing on the Web, the use of private data becomes even more problematic with personal AI assistants. We know that social media platforms are designed to keep users on the platform (and thus to present more advertisements) using data about their personal preferences, browsing history, and so on. As side effects, we have seen what are called filter bubbles, echo chambers, etc., leading to political polarization and undermining democratic processes in the long run.

All these effects have the potential to be multiplied when AI assistants start to use knowledge about their users to give them the answers they want to hear, supporting them in radical views, etc. We should have learned our lessons and be extremely careful in feeding AI systems with personal data! Finally, it should not be overlooked that AI often relies on hidden human labor (often in the Global South) that can be damaging and exploitative: for instance, these workers have to label hate speech, violence in pictures and movies, and even child pornographic content. For ChatGPT, it has been revealed that the fine-tuning of the system to avoid toxic answers was delegated to outsourced Kenyan laborers earning less than $2 per hour. For the first time, this led to some media echo in this respect, raising awareness among a broader public. However, this particular problem is not new and seems inherent to the way advanced AI systems are built today (Casilli, 2021).

4 Conclusions

There are three main factors that have led to the current state of the art of AI. The first is the availability of huge data sets and databases for learning purposes, due in part to the Internet. This includes all modalities, e.g., labeled images for supervised learning and large collections of high-quality texts for machine translation. The second is the availability of huge computational resources for learning, in particular graphics cards (GPUs) and clusters of GPUs for calculations with ANNs. The third factor comprises algorithmic advancements and software tools. While ANNs have appeared throughout the history of AI, new structures and new gradient-descent algorithms have been, and still are, instrumental to applications. Other examples are the advancements in RL algorithms and SAT solvers.

A division of AI into symbolic AI on the one hand and machine learning, or non-symbolic AI, on the other can also be viewed as a division of the methods employed in AI into discrete and continuous ones. Here, continuous methods are methods that use the real numbers or vectors thereof, while discrete methods do not and often focus on logic and symbolic knowledge. Furthermore, many problems in AI can be formulated as (stochastic) optimization problems; for example, in supervised learning, an error is to be minimized, and in reinforcement learning, optimal policies are sought.

Among optimization problems, continuous optimization problems can be solved much more efficiently than discrete optimization problems due to the availability of the gradient, which indicates a useful search direction and which is the basis of the fundamental gradient-descent and gradient-ascent algorithms. Thus, the formulation of learning problems as problems in continuous optimization has turned out to be tremendously fruitful. An example is image classification, a problem in supervised learning, which is discrete by its very nature: the question whether an image shows a dog has a discrete answer. Using ANNs and a softmax output, this discrete problem is translated into a continuous one, and training the ANN benefits from gradient descent.

Since the Dartmouth Workshop, AI has seen tremendous, albeit nonlinear, progress. Throughout the history of AI, we have witnessed AI algorithms becoming able to replicate more and more capabilities that were previously unique to human minds, in many cases surpassing human capabilities. ChatGPT is the most recent example, revolutionizing how AI deals with natural language; it is remarkable that it can compose poems much better than nearly all humans. Moreover, systems such as AlphaZero and ChatGPT took many people, including AI researchers, by surprise.

We expect these developments and the quest for superhuman capabilities to continue. The recent breakthroughs will see some consolidation in the sense that learning algorithms will become more efficient and better understood. At the same time, many open questions and challenges remain, and the three driving factors of AI discussed at the beginning of this section will remain active.

Research will continue at a fast pace, and more and more human capabilities will be matched and surpassed. The defining characteristic of humans has always been that we are the smartest entities and the best problem-solvers. This defining characteristic is eroding. It will be up to us to improve the human condition and to answer the philosophical question of what makes us human; it will not be our capabilities alone.

Discussion Questions for Students and Their Teachers

1. Which are, in your opinion, the major opportunities and positive effects of AI technology?

2. Provide a list of cognitive tasks humans are capable of performing, and discuss which AI method would be best suited to solve each of them.

3. Which are, in your opinion, the major risks of AI technology?

4. Which types of questions can be answered well by large language models such as ChatGPT? Which cannot be answered well?

5. For which types of questions and in which areas do you trust the answers of large language models such as ChatGPT?

6. What do you expect to use computers for in 5 years’ time for which you are not using them nowadays? In 10 years’ time?

7. In their book Why Machines Will Never Rule the World, Jobst Landgrebe and Barry Smith argue that human intelligence is a capability of a complex dynamic system that cannot be modeled mathematically in a way that allows it to operate inside a computer (see also the interview at https://www.digitaltrends.com/computing/why-ai-will-never-rule-the-world/). Find arguments for and against their claim.

8. For a provocative article on machine learning and its limits, see Darwiche (2018). Discuss this article in the light of recent developments.

Learning Resources for Students

1. Marcus, G., and Davis, E. (2019) Rebooting AI—Building Artificial Intelligence We Can Trust. Pantheon.

    This is a popular science book by a psychologist and a computer scientist; it offers an analysis of the current state of the art and discusses the need for robust, trustworthy AI systems.

2. Russell, S.J., and Norvig, P. (2021) Artificial Intelligence: A Modern Approach. 4th edition. Pearson.

    This is a standard textbook on artificial intelligence, which comprises 7 parts (artificial intelligence; problem-solving; knowledge, reasoning, and planning; uncertain knowledge and reasoning; machine learning; communicating, perceiving, and acting; conclusions) on more than one thousand pages. The two authors, highly accomplished researchers, provide comprehensive treatments of all major strands of AI.