1 Introduction

Dartmouth College, 1956, USA. Renowned scientists from various disciplines, including Claude Shannon, the founder of information theory; Herbert Simon, who later won the Nobel Memorial Prize in Economics; and the computer scientists Marvin Minsky and John McCarthy, met to explore the potential of the emerging computer technology. The term “artificial intelligence” had already been coined the year before in the course of planning the meeting, and now the following idea was formulated: “If computers manage to perform tasks such as calculating ballistic trajectories better than any human just by applying simple calculation rules, it should be possible to simulate or even generate human thinking by working through simple logical rules.” In fact, in the late 1950s and 1960s, the first computer programs were equipped with logical methods that could create mathematical proofs (the “Logic Theorist”) or beat humans at games like chess. The euphoria of those days fizzled out relatively quickly, however, and we will discuss the reasons in more detail in Sect. 2.1.

One disappointment resulted from the fact that while the explicit specification of rules (“symbolic AI”) works well in areas such as proving mathematical statements or planning a sequence of concrete steps to reach a specified goal, other supposedly simpler cognitive feats, such as recognizing objects in a picture or understanding language, turned out to be extremely difficult, if not impossible, to specify in this way. For tasks of this kind, a different approach proved more effective (see Sect. 2.2): it had existed in theory since the late 1940s but led to breakthroughs only in the twenty-first century, once the necessary huge data sets became available. Here, the computer is not given rules whose execution solves the problem; instead, solutions are learned from data by self-learning. This approach, of course, requires large amounts of data and computing power.

Understanding and distinguishing between these two methods is central to grasping the limitations of current AI research, as well as the resulting problems; we will discuss this in more detail in Sect. 3. From a digital humanism perspective, we consider it paramount to start from an understanding of the existing methods when discussing the dangers, but also the opportunities, that arise from the pervasiveness and availability of AI systems in various areas of life today. We will therefore not address issues such as the treatment of AIs with consciousness, implications of the so-called singularity (Walsh, 2017), or transhumanist visions. For space reasons, we also omit topics from the field of robotics (“embodied AI”) as well as their implications (e.g., autonomous weapon systems). For other aspects such as bias, trustworthiness, or AI ethics, we refer to the corresponding chapters in this book.

2 Methods of AI

2.1 Symbolic AI

Symbolic AI refers to those methods that are based on explicitly describing the problems or the necessary solution steps to the computer. Logic or related formal languages are used to describe the problems; actually finding possible solutions (“search”) is a central aspect of symbolic AI. It should already be pointed out at this stage that in this model, the “explainability” of a solution is conceptually easy to obtain (although, for larger specifications, the explanations tend to become incomprehensible to humans). Furthermore, the correctness of a solution is generally definite and not subject to probabilities. The “intelligent” behavior results here simply from computing power.

Let’s consider an example: board games like chess are defined by clear rules that tell the players the possibilities of their moves. Assume we are in a game situation where I can mate black in two moves, i.e., there is a move for me such that no matter what the other player decides, I have a move that mates the opponent; this is called a winning strategy. To find such a strategy, I simply let the computer try all possible moves on my part. For each such move, I let the computer calculate all possible choices of the opponent and my possible answers to them. If we assume, in a simplified way, that there are 10 moves to choose from in each situation, we have $10^3 = 1000$ operations to perform. If we want to calculate one turn further ahead, it is already $10^5 = 100{,}000$, and so on. It is clear that this cannot be carried on arbitrarily, since the problem of the “combinatorial explosion” comes to bear. In chess programs, this is solved by so-called board evaluations (with which move do I have the best possible position after three rounds, e.g., guaranteed more pieces on the board than the opponent?). Mediocre players can already be beaten with such a look-ahead using reasonable computing power and simple board evaluations; for grandmasters, however, it took until 1997, when Deep Blue was able to defeat the then world chess champion Garry Kasparov.
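To make the search concrete, here is a minimal minimax sketch in Python. As an illustrative stand-in for chess, we use the trivial game of Nim (take 1, 2, or 3 sticks; whoever takes the last stick wins), which is small enough to search exhaustively; the chess case differs only in the move generator and in cutting off the search with a board evaluation.

```python
# A minimal minimax sketch on the toy game of Nim (an illustrative
# assumption standing in for chess; the search logic is the same).

def legal_moves(sticks):
    # A player may take 1, 2, or 3 sticks.
    return [m for m in (1, 2, 3) if m <= sticks]

def minimax(sticks, maximizing):
    # Terminal position: the player to move cannot move and has lost.
    if sticks == 0:
        return -1 if maximizing else +1
    # Try all moves recursively; the number of explored positions grows
    # exponentially with the depth -- the "combinatorial explosion."
    values = [minimax(sticks - m, not maximizing) for m in legal_moves(sticks)]
    return max(values) if maximizing else min(values)

print(minimax(5, True))  # +1: the player to move has a winning strategy
```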

The Power of Propositional Logic

It is important to emphasize that for problems where the computational effort increases exponentially with the problem size, symbolic methods have a scalability problem. This is true in many areas: finding models for logical formulas, creating an optimal shift schedule for workers, designing timetables, computing routes in a traffic network, or building expert systems of different kinds. Since it was clear that progress in the computing power of chips could not keep pace with exponential growth, symbolic AI methods were not considered to have much potential for solving problems on the scale needed in industrial applications. However, the tide turned in the mid-1990s when Kautz and Selman (1992) proposed to reduce problems of this type to a single problem that is as easy to handle as possible (but still subject to the combinatorial explosion) and to use the most efficient search methods available for this one problem. This problem is the satisfiability problem of propositional logic (SAT).

In this logic, atomic propositions (which can be true or false) are combined via connectives. The truth value of the compound formula is then given by the assignment of truth values to the atomic propositions and the semantics of the connectives. Consider a simple example with the atomic propositions “ai” (standing for “one should study artificial intelligence”) and “dh” (standing for “one should study digital humanism”). The state of an agent might be represented by the following formula:

$$ \left(\mathrm{ai}\ \mathrm{OR}\ \mathrm{dh}\right)\ \mathrm{AND}\ \mathrm{NOT}\left(\mathrm{ai}\ \mathrm{AND}\ \mathrm{dh}\right) $$

stating that one should study AI or digital humanism or both (the part “ai OR dh”) but, at the same time—maybe due to time constraints—not both at once (the part “NOT (ai AND dh)”). We have four possible assignments to the atomic propositions: setting both ai and dh to true; setting ai to true and dh to false; setting ai to false and dh to true; and, finally, setting both to false. Without giving an exact definition of the semantics of the connectives “AND,” “OR,” and “NOT,” it should be quite intuitive that only two of the assignments make the entire formula true, namely, those stating that one should study either AI or digital humanism. The formula is thus satisfiable. Suppose now we add the knowledge that one should study AI whenever studying digital humanism and, likewise, one should study digital humanism whenever studying AI. Formally, this leads to the formula

$$ \left(\mathrm{ai}\ \mathrm{OR}\ \mathrm{dh}\right)\ \mathrm{AND}\ \mathrm{NOT}\left(\mathrm{ai}\ \mathrm{AND}\ \mathrm{dh}\right)\ \mathrm{AND}\ \left(\mathrm{dh}\rightarrow \mathrm{ai}\right)\ \mathrm{AND}\ \left(\mathrm{ai}\rightarrow \mathrm{dh}\right). $$

This formula is now unsatisfiable: whatever assignment is provided to the atomic propositions, the formula does not evaluate to true. What makes the SAT problem computationally challenging is the fact that the number of possible assignments to be checked grows exponentially with the number of atomic propositions in the formula.
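The following brute-force sketch checks both example formulas by enumerating all truth assignments; this exhaustive enumeration is exactly what SAT solvers avoid by clever search.

```python
from itertools import product

# Brute-force satisfiability check: try all 2^n truth assignments.
def satisfiable(formula, n_vars):
    return any(formula(*assignment)
               for assignment in product([True, False], repeat=n_vars))

# (ai OR dh) AND NOT (ai AND dh): the exclusive choice from above
f1 = lambda ai, dh: (ai or dh) and not (ai and dh)
# ... AND (dh -> ai) AND (ai -> dh), with x -> y written as (not x) or y
f2 = lambda ai, dh: f1(ai, dh) and ((not dh) or ai) and ((not ai) or dh)

print(satisfiable(f1, 2))  # True: two of the four assignments work
print(satisfiable(f2, 2))  # False: the extended formula is unsatisfiable
```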

However, it turned out that by using clever search heuristics, exploiting shortcuts in the search space, and employing highly optimized data structures, certain formulas with millions of variables can be solved, while other, randomly generated, formulas cannot (Ganesh & Vardi, 2020). The formulas that can be solved well are often exactly those found in the “wild.” This is partly explained by the fact that they have certain structural properties that are exploited by the search procedure: if one reduces, e.g., routing problems in traffic networks to such formulas, the formulas have “good” properties, because real-world traffic networks have, e.g., a maximum node degree (number of connections per node) of about 10 and are not arbitrary graphs. In recent years, this has led to a success story of SAT-based methods in many areas, especially the verification of hardware and software specifications.

Since these applications are often no longer attributed to AI, here is an example where SAT has actually led to the solution of an open problem in mathematics, namely, the problem of Pythagorean triples: the question is whether the natural numbers can be divided into two parts in such a way that neither of the two parts contains a triple (a, b, c) with $a^2 + b^2 = c^2$. For the numbers 1 to 10, this is still possible, because I only have to avoid putting the numbers 3, 4, and 5 into the same pot. If we have to divide the numbers from 1 to 15, more caution is needed, since now 5, 12, and 13 must not end up in the same pot either, but it still works. The question is now: Is this division always possible, no matter how big the range of numbers is? The SAT solver said no: the numbers from 1 to 7825 can no longer be divided in this way! We refer to Heule et al. (2016) for further details on this project.
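For tiny ranges, the question can be settled by brute force, as in the following sketch, which enumerates all two-colorings of the numbers 1 to n; the SAT encoding of Heule et al. (2016) scales far beyond this naive approach.

```python
from itertools import product

def triples(n):
    # All Pythagorean triples (a, b, c) with c <= n
    return [(a, b, c) for a in range(1, n + 1) for b in range(a, n + 1)
            for c in range(b, n + 1) if a * a + b * b == c * c]

def partitionable(n):
    # Try all 2^n two-colorings of 1..n; feasible only for tiny n.
    # (Heule et al. (2016) used a SAT solver to settle n = 7825.)
    ts = triples(n)
    for coloring in product([0, 1], repeat=n):
        if all(len({coloring[a - 1], coloring[b - 1], coloring[c - 1]}) > 1
               for (a, b, c) in ts):
            return True
    return False

print(partitionable(15))  # True: 3, 4, 5 and 5, 12, 13 can be kept apart
```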

The Limits of Propositional Logic

We have thus seen that (propositional) logic can be used to solve problems that exceed human capabilities. In fact, the pioneers of symbolic AI considered logic a central vehicle to describe and simulate human thinking. However, apart from the problem of combinatorial explosion outlined above, another obstacle has arisen here. Human thinking does not always follow (classical) logical steps; we have to deal with uncertainties, process contradictory information, or even revise conclusions once made. In fact, in classical logic, it is already immensely complex to represent plausible facts like “if I put block A on B, the position of all other blocks remains unchanged”; see Hayes (1973). Consequently, in the 1970s and 1980s, symbolic AI was centrally concerned with other types of logical systems that allow formalizations of “common-sense reasoning.” The numerous varieties cannot be enumerated here comprehensively, but it should not remain unmentioned that they are today often subsumed under the term “knowledge representation and reasoning” (van Harmelen et al., 2008) and offer a rich portfolio of methods that could find relevance in future AI applications—in particular when it comes to explainability.

2.2 Machine Learning

General Considerations

The defining characteristic of algorithms in machine learning (ML) is that they are self-learning, meaning that the algorithm improves itself, or learns, using data. Classical chess programs were explicitly programmed using rules that describe the advantage or disadvantage a player has in terms of points. For example, taking a rook is worth about five points, and dominating the center of the board is advantageous. Self-learning algorithms, by contrast, draw their own conclusions by watching many chess games; there is no programmer who tunes built-in rules. Hence, in ML, the availability of larger and larger data sets makes the time-consuming and error-prone fine-tuning of internal rules or parameters of the algorithm superfluous.

In other words, the machine learns, while the human designs the learning algorithm. It was already recognized at the Dartmouth Workshop in 1956 that self-improvement and self-learning are central notions of intelligence.

In the modern view of ML, the data used for learning are assumed to be drawn from a probability distribution. Therefore, any learning is stochastic by nature, which gives rise to two fundamental considerations. Because the number of data samples is always finite, although it may be huge, we may never observe samples that are important, or we may observe samples that are not representative. The first issue means that our learning result can only be probably correct; the second issue means that it can only be approximately correct. Therefore, the best learning results are “probably approximately correct” (PAC) statements about the quality of a learned result.
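In symbols, a PAC guarantee has the following form: with a confidence parameter δ (for “probably”) and an accuracy parameter ε (for “approximately”), a learned hypothesis h satisfies

$$ \Pr \left[\mathrm{error}(h)\le \varepsilon \right]\ge 1-\delta, $$

where the probability is taken over the random draw of the finitely many training samples.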

To illustrate these considerations, let us consider the example of searching for black swans. The black swan (Cygnus atratus) lives in southeastern and southwestern Australia. We must always sample the whole space, but if the number of samples is insufficient, we will never encounter a black swan. This is the first issue. The second issue is that the first black swans that we encounter may have an uncharacteristically light color, misleading us in our approximation of its color.

ML is a large field, but it broadly consists of supervised learning, unsupervised learning, and reinforcement learning. We consider these three large subfields in turn and mention some important applications of ML.

Supervised Learning

Supervised learning (SL) is concerned with finding functions that correctly classify or predict an output value given an input value. These functions are called classifiers or predictors; they are chosen from a predefined class of functions and parameterized by parameters that are to be learned. For example, in image recognition, the inputs are the pixels of an image, and the output may be whether an object that belongs to a certain class of objects (cats, dogs, etc.) is visible or whether the image satisfies a certain property. In SL, the learning algorithm uses training data that consist of inputs and outputs—hence the name. The outputs are often called labels; e.g., an input sample may be a photo of a dog, and the corresponding output may be the label “dog.” In classification tasks, the set of all outputs is finite, whereas in prediction tasks, the set of all outputs is infinite (e.g., the real numbers).

Many algorithms have been developed for SL, and we mention some of the most important ones: artificial neural networks (ANN), decision trees, random forests, ensemble learning, k-nearest neighbor, Bayesian networks, hidden Markov models, and support vector machines.

Without doubt, nowadays, the most prominent approach to SL is the use of ANNs as classifiers/predictors (Heitzinger, 2022, Chapter 13). ANNs are functions that are arranged in layers, where linear functions alternate with pointwise applied nonlinear functions, the so-called activation functions (see Fig. 1). ANNs have a long history, having already been discussed at the Dartmouth Workshop in 1956. A first breakthrough was the backpropagation algorithm (which is automatic backward differentiation), because it enabled the efficient training of ANNs.

Fig. 1
A schematic of the ANN model: the nodes of the input layer feed into hidden layer 1, followed by hidden layers 2 and 3, whose nodes feed into the output layer.

This schematic diagram shows how an ANN works. On the left-hand side, a vector of real numbers is the input to the ANN. In the hidden layers, whose number is the depth of the network, the input vector is transformed until, in the final output layer, the output vector is calculated. In each hidden layer, the previous vector is multiplied by a matrix (the weights), another vector (the bias) is added, and a nonlinear function (the activation function) is applied element-wise. All these weight matrices, bias vectors, and activation functions must be adjusted such that the ANN solves the given classification/prediction problem. The arrows indicate how the values of one layer influence those of the next. In this example, the output vector consists of three numbers; if the network is used as a classifier, the largest of the three signifies one of the three classes. (Figure from Heitzinger (2022, Chapter 13))
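To make this description concrete, here is a minimal forward-pass sketch in Python/NumPy; the layer sizes and the use of ReLU and softmax are illustrative assumptions, not the specific network of Fig. 1.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # a common activation function

def forward(x, layers):
    # Hidden layers: multiply by the weight matrix, add the bias vector,
    # and apply the activation function element-wise.
    for W, b in layers[:-1]:
        x = relu(W @ x + b)
    W, b = layers[-1]
    z = W @ x + b                       # output layer: three class scores
    return np.exp(z) / np.exp(z).sum()  # softmax turns scores into probabilities

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(5, 4)), rng.normal(size=5)),  # hidden layer 1
          (rng.normal(size=(5, 5)), rng.normal(size=5)),  # hidden layer 2
          (rng.normal(size=(3, 5)), rng.normal(size=3))]  # output layer
probs = forward(rng.normal(size=4), layers)
print(probs, probs.argmax())  # the largest entry signifies the predicted class
```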

Why are ANNs so successful? Although classification is a discrete problem, ANNs are differentiable functions and, as such, have gradients, which point in the direction of fastest change of a function. Knowing this direction is extremely useful for solving optimization problems, as the gradient provides a useful search direction. For training in SL, it is hence expedient to use the gradient of the classifier/predictor in order to solve the error-minimization problem. In ANNs, calculating the gradient is surprisingly fast due to the backpropagation algorithm, taking only about twice as long as evaluating the ANN itself.
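As a minimal illustration of this idea, the following sketch runs gradient descent on a toy error function (an illustrative assumption, not an actual ANN); stepping against the gradient decreases the error.

```python
# Gradient descent on the toy error function E(w) = (w - 3)**2,
# whose gradient is E'(w) = 2 * (w - 3).
w, learning_rate = 0.0, 0.1
for _ in range(50):
    gradient = 2 * (w - 3)
    w -= learning_rate * gradient  # step against the gradient: error decreases
print(w)  # close to the minimizer w = 3
```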

ANNs are very flexible data structures, and many different variants have been employed, since the number of layers and their sizes can or must be adjusted to the SL problem at hand.

If the number of layers is small but the sizes of the layers become larger, any continuous function can be approximated; this is the famous universal approximation property of ANNs. However, this property of wide ANNs is misleading: in practice, increasing the number of layers helps image recognition and many other applications, resulting in deep, not wide, ANNs. This is the main observation behind deep learning, which is learning using deep ANNs.

A recent breakthrough development is the transformer, a certain kind of ANN that uses the so-called attention mechanism to learn relationships between words across long distances in a text. Transformers originated in machine translation (Vaswani et al., 2017), yielding the best and fastest machine translation at that time. They were adapted for use in InstructGPT, ChatGPT, and GPT-4 and are a milestone in natural language processing.

The attention mechanism solves two main challenges in natural language processing, both for translation and for text generation. The first challenge is that natural language presupposes a lot of background knowledge—a world model or common sense—in order to make sense of ambiguities. The second challenge is the use of pronouns and other relationships between words, sometimes across large distances in a text. The attention mechanism addresses both challenges surprisingly well and can learn the grammar of most natural languages.
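The following is a minimal sketch of the scaled dot-product attention at the heart of transformers (Vaswani et al., 2017); the matrix sizes are illustrative assumptions. Each word (a row of Q) is compared with every other word (the rows of K), however far apart in the text, and the resulting weights mix the corresponding values (the rows of V).

```python
import numpy as np

def attention(Q, K, V):
    # Compare every word (row of Q) with every other word (rows of K),
    # no matter how far apart they are in the text ...
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: attention weights
    # ... and mix the corresponding values accordingly.
    return weights @ V

rng = np.random.default_rng(0)
n_words, d = 6, 4  # illustrative sizes
Q, K, V = (rng.normal(size=(n_words, d)) for _ in range(3))
print(attention(Q, K, V).shape)  # (6, 4): one mixed vector per word
```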

Unsupervised Learning

In contrast to SL, there are no outputs in unsupervised learning (UL). In UL, the learning task is to find patterns in input samples from untagged or unlabeled data. Often, the input samples are to be grouped into classes according to their features, or relationships between the samples are to be found.

These relationships are often expressed as graphs or by special kinds of ANNs such as autoencoders.

Common approaches in UL are clustering methods, anomaly detection methods, and learning latent variable models. Clustering methods include hierarchical clustering, k-means, and mixture models. An example of a clustering problem is grouping a set of patients into clusters according to the similarities of their current states. Anomaly detection methods include local outlier factors and isolation forests. Latent variables can be learned by the expectation-maximization algorithm, the method of moments, and blind signal separation techniques.
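As an illustration, here is a minimal k-means sketch (one of the clustering methods just mentioned); the two-dimensional toy data and the choice k = 2 are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, steps=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(steps):
        # Assign each sample to its nearest center ...
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
        # ... and move each center to the mean of its assigned samples.
        # (This sketch ignores the rare case of a cluster becoming empty.)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)),   # one group of samples ...
               rng.normal(5, 1, (50, 2))])  # ... and a clearly different one
labels, centers = kmeans(X, k=2)
print(centers)  # two centers, near (0, 0) and (5, 5)
```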

Reinforcement Learning

Reinforcement learning (RL) is the subfield of machine learning that is concerned with finding optimal policies to control environments in time-dependent settings (Sutton & Barto, 2018). In each time step, the actions of the agent influence the environment (and possibly the agent itself), and the new state of the environment and a reward are then communicated to the agent. The learning task is to find optimal policies that maximize the expected value of the return, i.e., the discounted sum of all future rewards that the agent will receive while following a policy.
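In the standard notation of Sutton and Barto (2018), with rewards $R_{t+1}, R_{t+2}, \ldots$ and a discount factor $0\le \gamma \le 1$, the return whose expected value is to be maximized is

$$ G_t = R_{t+1}+\gamma R_{t+2}+{\gamma}^2 R_{t+3}+\cdots = \sum_{k=0}^{\infty }{\gamma}^k R_{t+k+1}. $$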

RL is a very general concept that includes random environments as well as policies whose actions are random. It encompasses all board games such as Go, chess, and backgammon, where the rewards are typically non-zero only at the end of the game. The agent receives a reward of +1 for winning, a reward of −1 for losing, and a reward of 0 for a draw. Other applications occur in robotics, user interactions at websites, finance, autonomous driving, medicine (Böck et al., 2022), etc.

Reinforcement learning problems are particularly hard when a lot of time passes between taking an action and receiving the positive rewards due to this action or a combination of actions. This is the so-called credit assignment problem.

In deep reinforcement learning, deep neural networks are used to represent the policies and the so-called action-value functions. In this context, deep neural networks serve as powerful function approximators in infinite state (and action) spaces. In distributional reinforcement learning, an extension of the classic approach, not only is the expected value of the return maximized, but the whole probability distribution of the return is calculated. This makes it possible to quantify the risk associated with an action that may be taken in a given state.
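As a concrete illustration of these ideas, here is a minimal tabular Q-learning sketch (Sutton & Barto, 2018) on a toy "chain" environment; the environment, rewards, and hyperparameters are illustrative assumptions, and deep RL replaces the table below by a neural network.

```python
import numpy as np

# States 0..4 in a row; action 1 moves right, action 0 moves left.
# Only reaching state 4 yields a reward: a small credit assignment problem.
n_states, n_actions, gamma, alpha, eps = 5, 2, 0.9, 0.5, 0.2
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy policy: explore randomly, otherwise act greedily.
        explore = rng.random() < eps or not Q[s].any()
        a = rng.integers(n_actions) if explore else int(Q[s].argmax())
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) toward the reward plus the
        # discounted best value of the successor state.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.argmax(axis=1)[:-1])  # learned policy: action 1 ("right") everywhere
```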

An early success of RL in the 1990s was mastering the board game of backgammon, a game with a huge search tree and a large random component (Tesauro, 1995). From the 2010s until today, reinforcement learning has been the field that enabled a string of milestones in the history of AI. A series of publications (Silver et al., 2016, 2017, 2018) showed, with progressively simpler but at the same time more powerful algorithms, that Go, chess, shogi, and a large collection of Atari 2600 games can be solved by self-learning algorithms. Quite impressively, a single algorithm, AlphaZero, can learn to play these games at a superhuman level. It also learns starting from zero knowledge (tabula rasa), hence the Zero in its name.

In the following years, more complicated computer games were solved by similar approaches. Computer games and card games such as poker pose their own challenges, as they contain a considerable amount of hidden information, while all state information is observable by the agent in board games such as chess and Go.

RL is also the reason that InstructGPT (Ouyang et al., 2022)—a precursor of ChatGPT—and ChatGPT/GPT-4 (OpenAI, 2023) work so well. A generative pre-trained transformer (GPT), having been trained on vast amounts of text, can generate beautiful text, but it is hard to make it give helpful answers.

The final, but crucial, step in training InstructGPT and ChatGPT is reinforcement learning from human feedback (RLHF) (Ouyang et al., 2022), where four answers to prompts are ordered by human labelers, and these orderings are used to train the reward model for the final RL training step. This RL training step aims to align the language model with the needs of the user.

The needs of the user are essentially the 3H (OpenAI, 2023). The first H is for honest: the language model should give honest/correct answers. (Truthful or correct would be better names, as the inner beliefs of the language model are unknown.) The second H is for helpful: the answers should be helpful and useful. The third H is for harmless: the system should not give any answers that may cause harm. Unfortunately, these three goals are very difficult to achieve in practice and can even be contradictory. If we ask a language model how to rob a bank, it cannot be helpful and harmless at the same time. Much of the ongoing research on AI safety is concerned with satisfying the 3H requirements.

Applications of ML

Because its algorithms are very versatile, ML has found many applications, and its range of applications is still expanding. Thanks to the speed of ML algorithms and because they have reached near-human or superhuman capabilities in many areas, they have become practically important across many domains.

Applications include bioinformatics, computer vision, data mining, earth sciences, email filtering, natural language processing (grammar checker, handwriting recognition, machine translation, optical character recognition, speech recognition, text-to-speech synthesis), pattern recognition (facial recognition systems), recommendation systems, and search engines.

2.3 Combination of Methods

Human intelligence relies on different cognitive tasks, with the separation into fast and slow thinking (Kahneman, 2011) being a popular model. Fast thinking refers to automatic, intuitive, and unconscious tasks (e.g., pattern recognition), while slow thinking describes conscious tasks such as planning, deduction, and deliberation. Machine learning is the natural method for simulating fast thinking, while symbolic approaches are better suited to problems related to slow thinking. Consequently, the combination of both approaches is seen as the holy grail for next-level AI systems.

In recent years, the term neuro-symbolic AI has been established to name this line of research. However, it comes in many different flavors, and we shall list just a few of them. First, the famous AlphaGo system is often mentioned as a prototypical system in this context: the symbolic component is Monte Carlo tree search to traverse the search space (recall our considerations on chess in Sect. 2.1), but the board evaluation is done via ML techniques (in contrast to Deep Blue, where the board evaluation was explicitly coded and designed by experts).

A second branch comprises neuro-symbolic architectures in which neural nets are generated from symbolic rules (for instance, graph neural networks—GNNs). Finally, approaches like DeepProbLog offer a weak coupling between the neural and the symbolic parts: essentially, deep neural networks are treated as predicates that can be incorporated, with an approximate probability distribution over the network output, into logical rules, and semantic constraints can be used to guide gradient-based learning of the network. However, it has to be mentioned that such hybrid architectures do not immediately lead to human-like cognitive capabilities, let alone consciousness or self-awareness.

3 Reflections

3.1 AI4Good

Through the lens of digital humanism, one might ask where AI provides us with a valuable tool to support human efforts toward solutions to vexing problems. Such applications are often termed “AI for Good,” and there are indeed many examples where we benefit from AI. They range from medicine (treatments, diagnosis, early detection, cancer screening, drug design, etc.) to the identification of hate speech or fake news (cf. the chapter by Prem and Krenn) and tools for people with disabilities. A more subtle domain is climate change: while AI techniques can be used to save energy, control herbicide application, and much more, AI itself requires a considerable amount of energy (in particular during the training phase). For a thorough discussion of this important topic, we refer to Rolnick et al. (2023).

3.2 Is ChatGPT a Tipping Point?

ChatGPT is without doubt a major milestone in the history of AI. It is the first system that can interact in a truly helpful manner with users, as demonstrated by its scores on many academic tests (Eloundou et al., 2023). It shows surprising powers of reasoning, considering that it is a system for generating text. Its knowledge is encyclopedic, since during learning, the GPT part has ingested vast amounts of text, probably a good portion of all existing knowledge.

Interestingly enough, ChatGPT’s creativity is closely coupled to its so-called temperature parameter and can therefore be adjusted easily. During text generation, the next token (a word fragment or syllable) is chosen from an ordered list of likely continuations. At a low temperature, only the first tokens on the list have a chance of being selected, but at a higher temperature, tokens further down the list also stand a chance. Thus, a higher temperature during text generation increases creativity. Again, the Dartmouth Workshop turned out to be prescient, since creativity in the context of intelligence was a major topic discussed there.
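A minimal sketch of this sampling mechanism follows; the four candidate scores ("logits") are illustrative assumptions.

```python
import numpy as np

def sample(logits, temperature, rng):
    p = np.exp(logits / temperature)
    p /= p.sum()  # softmax over the rescaled scores
    return rng.choice(len(logits), p=p)

logits = np.array([3.0, 2.0, 1.0, 0.5])  # ordered list of likely continuations
rng = np.random.default_rng(0)
for T in (0.1, 1.0, 2.0):
    picks = [sample(logits, T, rng) for _ in range(1000)]
    print(T, np.bincount(picks, minlength=4))
# At T = 0.1, almost only token 0 is chosen; at T = 2.0, tokens further
# down the list also stand a chance: higher temperature, more creativity.
```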

ChatGPT can also be used to solve mathematical or logical problems. As a program for generating text, however, it is no substitute for specialized programs such as computer algebra systems and SAT solvers. Still, it is straightforward to couple such specialized programs to ChatGPT: ChatGPT can be trained to become proficient in programming languages and can therefore generate input for those coupled programs. We expect that such interfaces will become more and more refined and will gain importance, supplementing ChatGPT’s capabilities with procedural domain-expert knowledge.

Therefore, we predict that ChatGPT will revolutionize human-computer interfaces (HCI). The reason is that it can act as a knowledgeable translator of the user’s intention to a vast array of specialized programs, reducing training time for human users and resulting in interactive usage with lots of guidance and easily accessible documentation.

Its potential for revolutionizing HCIs in this sense may well turn out to be its true power and a tipping point, but its effects on society should not be underrated, and critical reflection across different disciplines is needed.

3.3 Pressing Issues

Media often associate the danger of AI with robot apocalypses of various kinds. However, we see the main issue in the ever-growing use of data (see Sect. 2.2) to train more and more advanced AI models. This leads to several problems related to copyright, privacy and personalized systems, and low-paid workers who label data and help train the models. Due to limited space, we will not discuss other important issues here, such as education and the impact of AI on the working world.

In fact, ChatGPT has been fed with sources such as books, (news) articles, websites, posts, and even comments from social networks to perform its core function as a dialogue system. No one has asked us whether we agree that our research papers, blog articles, or comments in social media may be used to train such an AI. While copyright issues of this kind have always been pressing on the Web, the use of private data becomes even more problematic with personal AI assistants. We know that social media platforms are designed to keep users on the platform (and thus to present more advertisements) using data about their personal preferences, browsing history, and so on. As side effects, we have seen what are called filter bubbles, echo chambers, etc., leading to political polarization and undermining democratic processes in the long run.

All these effects have the potential to be multiplied when AI assistants start to use knowledge about their users to give them the answers they want to hear, supporting them in radical views, etc. We should have learned our lessons and be extremely careful in feeding AI systems with personal data! Finally, it should not be overlooked that AI often relies on hidden human labor (often in the Global South) that can be damaging and exploitative: for instance, these workers have to label hate speech, violence in pictures and movies, and even child pornographic content. For ChatGPT, it has been revealed that the fine-tuning of the system to avoid toxic answers was delegated to outsourced Kenyan laborers earning less than $2 per hour. For the first time, this led to some media echo in this respect, raising awareness among a broader public. However, this particular problem is not new and seems inherent to the way advanced AI systems are built today (Casilli, 2021).

4 Conclusions

There are three main factors that have led to the current state of the art of AI. The first is the availability of huge data sets and databases for learning purposes, due in part to the Internet. This includes all modalities, e.g., labeled images for supervised learning and large collections of high-quality texts for machine translation. The second is the availability of huge computational resources for learning, in particular graphics cards (GPUs) and clusters of GPUs for calculations with ANNs. The third factor comprises algorithmic advancements and software tools. While ANNs have appeared throughout the history of AI, new structures and new gradient-descent algorithms have been, and still are, instrumental to applications. Other examples are the advancements in RL algorithms and SAT solvers.

A division of AI into symbolic AI on the one hand and machine learning, or non-symbolic AI, on the other can also be viewed as a division of the methods employed in AI into discrete and continuous ones. Here, continuous methods are methods that use the real numbers or vectors thereof, while discrete methods do not and often focus on logic and symbolic knowledge. Furthermore, many problems in AI can be formulated as (stochastic) optimization problems; for example, in supervised learning, an error is to be minimized, and in reinforcement learning, optimal policies are sought.

Among optimization problems, continuous optimization problems can be solved much more efficiently than discrete optimization problems due to the availability of the gradient, which indicates a useful search direction and which is the basis of the fundamental gradient-descent and gradient-ascent algorithms. Thus, the formulation of learning problems as problems in continuous optimization has turned out to be tremendously fruitful. An example is image classification, a problem in supervised learning, which is discrete by its very nature: the question whether an image shows a dog has a discrete answer. Using ANNs and a softmax output, this discrete problem is translated into a continuous one, and training the ANN benefits from gradient descent.

Since the Dartmouth Workshop, AI has seen tremendous, albeit nonlinear, progress. Throughout the history of AI, we have witnessed AI algorithms becoming able to replicate more and more capabilities that were previously unique to human minds, in many cases surpassing human capabilities. ChatGPT is the most recent example, revolutionizing how AI deals with natural language; it is remarkable that it can compose poems much better than nearly all humans. Moreover, systems such as AlphaZero and ChatGPT took many people, including AI researchers, by surprise.

We expect these developments and the quest for superhuman capabilities to continue. The recent breakthroughs will see some consolidation in the sense that learning algorithms will become more efficient and better understood. At the same time, many open questions and challenges remain, and the three driving factors of AI discussed at the beginning of this section will remain active.

Research will continue at a fast pace, and more and more human capabilities will be matched and surpassed. The defining characteristic of humans has always been that we are the smartest entities and the best problem-solvers. This defining characteristic is eroding. It will be up to us to improve the human condition and to answer the philosophical question of what makes us human; it will not be our capabilities alone.

Discussion Questions for Students and Their Teachers

1. Which are, in your opinion, the major opportunities and positive effects of AI technology?

2. Provide a list of cognitive tasks humans are capable of performing, and discuss which AI method would be best suited to solve each of them.

3. Which are, in your opinion, the major risks of AI technology?

4. Which types of questions can be answered well by large language models such as ChatGPT? Which cannot be answered well?

5. For which types of questions and in which areas do you trust the answers of large language models such as ChatGPT?

6. What do you expect to use computers for in 5 years’ time for which you are not using them nowadays? In 10 years’ time?

7. In their book Why Machines Will Never Rule the World, Jobst Landgrebe and Barry Smith argue that human intelligence is a capability of a complex dynamic system that cannot be modeled mathematically in a way that allows it to operate inside a computer (see also the interview at https://www.digitaltrends.com/computing/why-ai-will-never-rule-the-world/). Find arguments for and against their claim.

8. For a provocative article on machine learning and its limits, see Darwiche (2018). Discuss this article in the light of recent developments.

Learning Resources for Students

1. Marcus, G., and Davis, E. (2019) Rebooting AI—Building Artificial Intelligence We Can Trust. Pantheon.

    This is a popular science book by a psychologist and a computer scientist; it offers an analysis of the current state of the art and discusses the need for robust, trustworthy AI systems.

2. Russell, S.J., and Norvig, P. (2021) Artificial Intelligence: A Modern Approach. 4th edition. Pearson.

    This is a standard textbook on artificial intelligence, which comprises 7 parts (artificial intelligence; problem-solving; knowledge, reasoning, and planning; uncertain knowledge and reasoning; machine learning; communicating, perceiving, and acting; conclusions) on more than one thousand pages. The two authors, highly accomplished researchers, provide comprehensive treatments of all major strands of AI.