1 Introduction

Artificial intelligence (AI) research can benefit immensely from insights gained in application projects. For this special issue, we focus on two application areas, namely healthcare and public sector projects. One can hardly deny, however, that the mere application of so-called “AI methods” dominates application-oriented AI research these days. Unfortunately, in this context the question of how a project contributes to solving genuine AI research problems is often neglected. In other words, insights from specific application projects are rarely propagated back to AI as a science. It is therefore our understanding that application-oriented research in these fields should be viewed from the perspective of its impact on foundational (“genuine”) AI research, i.e., by propagating application insights and results back to potential solutions for foundational AI problems. To enable this kind of backpropagation, we should first attempt to verbalise what the overarching long-term goal of AI research is or should be. This is anything but easy and may provoke debate, but we cannot leave the definition of genuine AI research problems entirely to common sense. Starting from long-term research goals, formulated at a high level of abstraction and in sync with international AI research as well as regulation activities, we can systematically argue how the papers in this special issue contribute to genuine AI research, even if the grand vision of AI is only partially realised or addressed in a particular application paper. In this way, we contribute to a long-term perspective for application-oriented AI research that is at the same time grounded, in the mid-term, in practical applications that are already directly relevant for application partners.

2 Social Mechanisms and Research in AI

As we argued before, suggesting to backpropagate the results of (healthcare or public sector) applications to genuine AI research without giving a tentative definition of AI as a science would be a rather weak suggestion. Let us give it a try at least. The long-term research objective of AI is to enable the development of flexible systems that, based on explicitly or implicitly given task descriptions and internal models of the environment (the “world”), are able (i) to appropriately interpret dynamically perceived percepts (of which task descriptions might be a special case) and (ii) to reactively determine and execute actions in the world that best solve the posed tasks. Interpretations of task descriptions must therefore be meaningfully evolved by the systems themselves (possibly by exploiting feedback) when performing actions to the benefit of all participants [18] in a social context, also called a social mechanism (cf. [6, 24, 26]). Since the systems described above act in a social mechanism, and thus on behalf of humans and with impact on (other) humans, the systems addressed are called agents. Actions of agents can be of a communicative nature (and thus influence the recipient of a communicated message) or of a physical nature (and, for example, influence the physical environment).
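
To make this description concrete, the following minimal sketch (purely illustrative; the toy task, names, and thresholds are our own and not taken from the cited literature) shows an agent that (i) interprets percepts into an internal world model and (ii) selects an action with respect to a given task description:

```python
# Minimal illustrative agent skeleton (hypothetical names and toy task):
# the agent holds a task description and an internal model of the world,
# interprets percepts by updating the model, and selects the action that
# best serves the task given its current beliefs.
class Agent:
    def __init__(self, task_description):
        self.task = task_description          # explicitly given task description
        self.model = {}                       # internal model of the "world"

    def interpret(self, percept):
        """(i) Interpret a percept by folding it into the internal model."""
        self.model.update(percept)

    def choose_action(self):
        """(ii) Determine the action that best solves the posed task."""
        if self.model.get("room_temperature", 21) > self.task["max_temperature"]:
            return "open_window"
        return "do_nothing"


agent = Agent(task_description={"max_temperature": 24})
agent.interpret({"room_temperature": 27})     # dynamically perceived percept
print(agent.choose_action())                  # -> "open_window"
```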

Research in AI as a science at an international level includes the analysis of social mechanisms with possibly large numbers of agents, their interactions with one another, and the benefits for humans in terms of goals, constraints, and regulations imposed by society. Due to general complexity considerations, which are independent of any ontological and epistemological commitments [17, p. 254f.] used in internal models, an agent can only act optimally in a limited way. Acting as close to perfect (or optimal) as possible from a local perspective (rationality), while reflecting natural limits (bounded optimality), defines the concept of intelligence used in AI, but it is meaningless without reference to an agent. The concept of agent intelligence as optimal acting under (strong) resource constraints, while continuously adapting the posed task descriptions to the preferences of all humans in a social mechanism with cooperative action, defines a long-term umbrella of research goals of genuine AI as a science, considering manifold ontological and epistemological commitments for modelling as well as the social interaction context [6] in which humans interact with intelligent agents. Descriptive and normative ethics research as well as other fields of philosophy are intimately related to AI as a science [14].

Given the idea of task descriptions that can be given to agents, and incorporating well-established research on realising agents that can appropriately interpret task descriptions in a social context, genuine AI research provides long-term support for future system development projects and can actively counteract the brittleness of contemporary IT systems. When we talk about trustworthiness and similar concepts, it becomes apparent that the social interaction context needs to be addressed in detail as well. The social mechanism of using AI technology is indeed human-defined and, after all, is supposed to serve a purpose; AI does not just ‘come upon us’ (or does it?).

AI projects should be able to analyse social contexts and – from this perspective – to avoid, e.g., discriminatory behaviour of agents with respect to humans. There might be a social context in which a local technical system of AI agents would be trustworthy to the humans in that context, whereas the same system might be questionable in another context (e.g., when humans are involved who have not received dedicated instructions about the pros and cons of system behaviour). It may very well be that system developers use the latest and greatest AI techniques (e.g., contemporary foundation models of the LLaMA, PaLM, or GPT families and respective interactive front-end systems), but overall cannot prove, justify, or at least make plausible that the social mechanism in which humans and intelligent agents interact has certain desired properties or properties required by law. In that case, it should be argued that one simply cannot implement the mechanism, even though it might offer advantages to individual humans involved (and these humans might give others doubtful incentives to participate, such as those exhibited by clickbait these days). This is exactly where regulation comes in: to block unwholesome incentives to humans in a mechanism. In any case, trustworthy AI may not even go far enough for genuine AI research; the social interaction context must be addressed more carefully, i.e., the focal points of certification, regulation, and control should be named.

3 Representation Formalisms in AI

Genuine AI research also needs to address the ontological and epistemological commitments of representation formalisms. On the one hand, as an ontological commitment when modelling real-world phenomena, it has proven beneficial in computer science to speak about objects, i.e., linguistically tangible things, phenomena, or effects. For the appropriate handling of incoming information, the ontological commitment is made explicit inside an agent by dealing with objects, relations, and facts. Also fitting into this ontology are attributes of (multiple) objects, where interrelationships of attribute values may be described via statistical notions and/or notions used in calculus (e.g., differential equations). On the other hand, by modelling uncertainty as an epistemological commitment, an agent can represent (subjective) estimates about attribute values or facts in terms of distributions in order to optimise its own actions towards internal goals stemming from tasks communicated to the agent.
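
As a minimal illustration of these two commitments (the names and numbers below are made up for exposition and are not taken from any of the cited works), an agent's internal model might explicitly store objects, relational facts, and subjective distributions over attribute values:

```python
# Illustrative sketch of the ontological commitment (objects, relations, facts,
# attributes) and the epistemological commitment (uncertainty as distributions).
from dataclasses import dataclass, field

@dataclass
class WorldModel:
    objects: set = field(default_factory=set)    # linguistically tangible things
    facts: set = field(default_factory=set)      # ground relational facts
    beliefs: dict = field(default_factory=dict)  # attribute -> distribution over values

model = WorldModel()
model.objects |= {"Alice", "Paracetamol"}
model.facts.add(("Treat", "Alice", "Paracetamol"))            # relation over objects
# Subjective estimate about an attribute value, represented as a distribution:
model.beliefs[("Temperature", "Alice")] = {"normal": 0.7, "elevated": 0.3}

# The agent can act on its beliefs, e.g. by picking the most probable value.
dist = model.beliefs[("Temperature", "Alice")]
print(max(dist, key=dist.get))                                # -> "normal"
```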

Ontological and epistemological commitments of representations are made explicit when latent embedding representations, as used in, e.g., transformers, are employed for object recognition (a combination of object localisation and classification approaches) in percept sequences. Various mathematical representation approaches beyond embedding vector spaces and (piece-wise) linear mappings qualify for making some aspects of percept embeddings explicit in an embedding-explicit representation. For instance, probabilistic circuits have been shown to be equally successful as transformers for object recognition based on learning technology [10]. Furthermore, they can be used to control transformers or steer diffusion models for inpainting tasks [12, 27]. Indeed, from a systems perspective, any formalism employed is to be understood in the context of ontological and epistemological commitments, as mentioned above.

On the one hand, it becomes clear that embedding data in vector space representations is a useful but very old technique (cf. TF-IDF [23], locality-sensitive hashing [7], or locality-preserving hashing [28]). On the other hand, it also becomes apparent that conceptions of symbolic representations are sometimes based on wishful thinking. Non-specialists fall into the trap of believing that a human understanding of the meaning of symbols is also built into reasoning systems. For instance, the common-sense semantics of the names used for nodes in a Bayesian network is not captured by the formal semantics of the Bayesian network (i.e., the joint distribution of the respective random variables). This effect could be reduced by employing a foundation model (e.g., from the LLaMA, PaLM, or GPT families) to also capture common-sense aspects of the meaning of a formal symbolic representation, e.g., a Bayesian network or a set of logical formulas that are assumed to be true (knowledge base, belief base). Recent AI research explores this kind of symbol grounding in embedding spaces built with foundation models (symbol-implicit representation).
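
The point about node names can be made explicit with a tiny, made-up two-node network: its formal semantics is nothing but the joint distribution obtained from the prior and the conditional probability table, and renaming the nodes changes nothing about it.

```python
# Illustrative toy example: the formal semantics of a Bayesian network A -> B
# is the joint distribution P(A, B) = P(A) * P(B | A). Whether the nodes are
# called "Fever"/"Rash" or "X1"/"X2" is irrelevant to the inference machinery.
def joint(prior, cpt):
    """Joint distribution of a two-node network parent -> child."""
    return {(a, b): p_a * cpt[a][b]
            for a, p_a in prior.items()
            for b in cpt[a]}

prior = {True: 0.1, False: 0.9}                      # P(parent)
cpt = {True:  {True: 0.4,  False: 0.6},              # P(child | parent = True)
       False: {True: 0.05, False: 0.95}}             # P(child | parent = False)

fever_rash = {"nodes": ("Fever", "Rash"), "joint": joint(prior, cpt)}
x1_x2      = {"nodes": ("X1", "X2"),      "joint": joint(prior, cpt)}
assert fever_rash["joint"] == x1_x2["joint"]         # identical formal semantics
```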

Very few approaches in the literature investigate the combination of embedding-explicit and symbol-implicit representations from a semantics-based perspective, but see [19] and [9] as notable exceptions. The real problem of boosting embedding-based models with formal representations still needs to be investigated in genuine AI research, however, possibly with important insights gained from backpropagating results from application projects, which can shed light on the heart of the embedding-explicit-symbol-implicit problems to be solved. If we explicitly encode ontological structures in embedding spaces, e.g., describe the world via variables represented as attributes of objects, their relations, and facts, then the central question is how to preserve the possibility of gradual influence during model adaptations (learning) at the level of the vector space embedding with respect to the explicit structures encoded in it. Whether one models a fact as Treat(Alice, Paracetamol) being true or in the form Medication(Alice) = Paracetamol also depends on pragmatic modelling considerations. But here it becomes apparent why machine learning in symbolic formalisms, as opposed to learning in more “uniform” vector space formalisms (based on linear algebra), is still particularly challenging, despite many fruitful advances (e.g., in inductive probabilistic logic programming [1, 16]). The change from one modelling form to the other, as in the example above, is in some way discontinuous or disruptive with respect to the employed terminology and the given ontology of objects, relations, facts, and temporal structures (based on logic and, sometimes, (differential) equations as constraints).
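
The following small sketch (hypothetical encodings, not taken from the literature) contrasts the two modelling forms mentioned above and shows why the switch between them is disruptive: the same query has to be rephrased against a different terminology, rather than being adjusted by a smooth parameter update as in a vector space formalism.

```python
# Relational form: the ground fact Treat(Alice, Paracetamol) is true.
relational_kb = {("Treat", "Alice", "Paracetamol")}

# Functional form: the attribute assignment Medication(Alice) = Paracetamol.
functional_kb = {("Medication", "Alice"): "Paracetamol"}

# The same question ("which medication does Alice receive?") must be posed
# differently against each form; a discontinuous change of terminology.
meds_relational = {m for (rel, person, m) in relational_kb
                   if rel == "Treat" and person == "Alice"}
med_functional = functional_kb.get(("Medication", "Alice"))
print(meds_relational, med_functional)   # {'Paracetamol'} Paracetamol
```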

An adaptation of the ontology used by an agent at certain points in time during its operation (does it switch to propositional logic for reasons of computational efficiency, or does it temporarily take vagueness into account?) or an adaptation of its epistemological commitments (e.g., is uncertainty explicitly represented?) is rarely investigated. However, this would be very worthwhile for AI research, in particular with respect to the concept of intelligence mentioned above: optimal action under (severe) resource constraints while constantly adapting the task descriptions that govern the action to the preferences of all people in a social mechanism.

When the focus is on “what” the research is all about, the “how” becomes a secondary issue, which helps to gain insights. For instance, transformers such as those used in GPT-like systems have clearly started a new era of AI. It has become clear that what matters are automatically computed mappings of (possibly high-dimensional) data into respective embedding spaces, together with the exploitation of properties of those embedding spaces (e.g., notions of point proximity in the destination space). With transformers or probabilistic circuits and the respective massively parallel compute being available, AI as a science has finally emancipated itself successfully from (computational) neuroscience. The data employed can indeed be very diverse, e.g., signals in general, videos, images, natural language strings, graphs, etc. Differently structured embedding spaces are defined for different purposes, as is usual in mathematics. These days, fortunately, AI research focuses on the “what” again, namely, on solving largely ill-posed inverse problems with encoder mappings, with decoders used for learning only or also for solving generative problems (generative AI). Often, the functionality of decoders can be formally specified and formulated algorithmically, and no longer always needs to be computed by expensive, imprecise, and hard-to-control function approximation processes based on huge input–output datasets (machine learning).
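
As a small, self-contained illustration of “exploiting point proximity” in an embedding space (the three vectors below are made up; real embeddings would come from a learned encoder), relatedness can simply be read off a similarity measure:

```python
# Toy illustration: once data are mapped to vectors, semantic relatedness is
# read off a proximity measure in the destination space (here cosine similarity).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

fever       = [0.9, 0.1, 0.3]   # hypothetical embedding of "fever"
temperature = [0.8, 0.2, 0.4]   # hypothetical embedding of "temperature"
invoice     = [0.1, 0.9, 0.0]   # hypothetical embedding of "invoice"

print(cosine(fever, temperature) > cosine(fever, invoice))   # True
```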

Directly using foundation models of the GPT family in application programs defines a new software engineering paradigm. With very interesting recent developments, such as GPTs or AgentGPT, it has become clear that very high-level symbols, namely agents, are indeed very useful abstractions for building intelligent systems (without reference to an agent, the notion of intelligence is not even well-defined). In addition, with GPTs being used in industrial applications, even inside agents, well-established AI concepts such as plans are back in mainstream industrial AI and are used for computing useful prompts for internal GPT calls. Interacting with foundation models in this way, agents can even derive new plan operators on the fly for complex task descriptions they need to deal with. This might reduce brittleness in agent operation. How trustworthiness (e.g., with graceful degradation in performance) can be obtained in this setting in a social mechanism is an open AI research question, though.
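
A minimal sketch of this pattern is given below (purely illustrative: `call_llm` is a hypothetical stand-in for whatever foundation-model API is actually used, and the plan operators are made up); the plan provides the structure, and each plan step is turned into a prompt for an internal model call:

```python
# Illustrative sketch of plan-driven prompting (hypothetical names throughout).
def call_llm(prompt: str) -> str:
    # Placeholder: a real system would query a foundation model here.
    return f"<model answer to: {prompt!r}>"

def execute_plan(task: str, plan: list[str]) -> list[str]:
    """Compute one prompt per plan operator and collect the model outputs."""
    results: list[str] = []
    for step in plan:
        prompt = (f"Task: {task}\n"
                  f"Current step: {step}\n"
                  f"Results so far: {results}")
        results.append(call_llm(prompt))
    return results

outputs = execute_plan(
    task="Summarise a patient record and propose follow-up questions",
    plan=["extract key findings", "summarise findings", "derive follow-up questions"],
)
print(outputs[-1])
```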

4 Closing Thoughts

In healthcare, the empirical AI approaches of the past are hardly applicable at scale due to certification and regulation requirements, which indeed consider the social mechanism of agent interaction. But even the largely empirical way of doing research in AI as practised in the past has been overcome in the era of post-neural AI we live in now. Nowadays, the “what”, to be computed internally by an agent to map given task descriptions to actions, is again given more careful attention, e.g., by being formulated as query-answering problems on dynamic probabilistic relational models. For instance, image inpainting, as an example of generative AI, can be understood as solving the most-probable-explanation problem (aka the maximum a posteriori problem). Once the “what” is specified, the “how” can really be evaluated systematically. For instance, practical computational architectures such as convolutional neural nets, transformers, or probabilistic circuits compute approximations of problem specifications with well-understood error bounds for solving the inverse problems underlying lower-level task descriptions.
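
To illustrate the most-probable-explanation view on inpainting with a deliberately tiny example (three binary “pixels” and a made-up compatibility potential, not a model from any of the articles), an MPE query fills in the masked variable with the assignment of highest probability given the observed evidence:

```python
# Toy MPE/MAP query: given evidence on observed "pixels", find the most
# probable assignment to the hidden (masked) ones under a joint potential.
from itertools import product

def mpe(score, evidence, hidden_vars):
    """Return the best assignment to hidden_vars consistent with the evidence."""
    best, best_score = None, float("-inf")
    for values in product([0, 1], repeat=len(hidden_vars)):
        assignment = dict(evidence, **dict(zip(hidden_vars, values)))
        s = score(assignment)
        if s > best_score:
            best, best_score = assignment, s
    return best, best_score

def score(a):
    # Unnormalised potential: neighbouring pixels tend to agree.
    s = 1.0
    for x, y in [("p1", "p2"), ("p2", "p3")]:
        s *= 0.9 if a[x] == a[y] else 0.1
    return s

# Pixel p2 is masked ("inpainted"); p1 and p3 are observed.
print(mpe(score, evidence={"p1": 1, "p3": 1}, hidden_vars=["p2"]))
# most probable completion: p2 = 1 (score of about 0.81)
```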

5 The Special Issue Articles Viewed Under a Backpropagation Lens

We now survey the articles of this issue and briefly review them with respect to their contributions to the genuine AI research problems described above.

Analysing Semantically Enriched Trajectories

Jana Seep [21]

To understand the movements of persons in the public sector, e.g., in order to reduce street traffic accidents or to support better orientation of people in complex buildings, the article analyses so-called semantically enriched trajectories that are used to describe observed movements. While it is widely known that trajectories can be mapped into embedding spaces, the article proposes a new model based on an extended finite state machine, which allows for the representation of information about the context of a trajectory in terms of symbolic (“semantic”) attachments based on the state of the associated automaton. Then, as trajectories are clustered in the embedding space, semantically enriched representative trajectories are computed. Clustering considers the context information attached to the trajectories (an extension of k-means is proposed in the paper to accomplish this). The author shows the feasibility of the approach by evaluating its potential to provide decision support for domain experts in two different public-sector-related contexts.

From the perspective of AI research, the article investigates how temporal location information (trajectories) and events represented as states of an automaton can be embedded into a common space. Thus, the paper provides deep new insights into a combination of embedding-explicit and symbol-implicit representation formalisms, and it demonstrates possible use cases using the public sector as an example.

Partial Image Active Annotation (PIAA): Efficient Active Learning using Edge Information in Limited Data Scenarios

Md Abdul Kadir, Hasan Md Tusfiqur Alam, Devansh Srivastav, Hans-Jürgen Profitlich, Daniel Sonntag [8]

Active learning (AL) algorithms are increasingly being used to train models with limited data for annotation tasks. However, the selection of data for AL is a complex issue due to the restricted information available on unseen data. To tackle this problem, the authors introduce the Partial Image Active Annotation (PIAA) technique, which employs edge information of unseen images to measure uncertainty. Uncertainty reasoning is the key to exploring for which parts of the input data additional annotations should be acquired. The approach is investigated in an application setting in which agents need to deal with multi-class Optical Coherence Tomography (OCT) segmentation tasks. On top of state-of-the-art embedding architectures, uncertainty measurement is implemented with convolutional neural nets and edge detection operators. The results presented in the article are very promising. Thinking outside the box of the application, it becomes clear that the insights described are indeed very relevant for genuine AI research in general, because it is shown how agents can reason about investments into their own performance, e.g., by contacting humans to label agent-computed datasets or by contacting suitable expert agents that might help reduce uncertainty in segmentation and, as a consequence, in decision support.

Requirements for a Social Robot as an Information Provider in the Public Sector

Thomas Sievers, Nele Russwinkel [22]

The article deals with a dedicated social mechanism in the public sector, and it asks whether it might be possible to integrate a humanoid social robot into the work processes or customer care of an official environment, e.g., in municipal offices. Using an application example from the Kiel City Council, Germany, the article illuminates what kind of skills a robot needs when interacting with human customers. One of the most important insights gained in the project was that a humanoid robot with natural language processing capabilities based on large language models, as well as human-like gestures and posture changes (animations), was clearly preferred by users over standard browser-based tablet solutions for an information system in the City Council. Another important insight concerns the simulation of cognitive processes to drive the dialogue between human and robot. In particular, it is argued that, equipped with the respective domain knowledge, the ACT-R cognitive architecture can enhance user-robot communication. Providing information is seen as acting on the mental state of the human communication partners. Consequently, the article directly contributes to genuine AI research questions, namely appropriate dialogue planning and conversation management to avoid explanation requests in advance, such that a fluent style of communication is achieved that is tailored to the respective state of knowledge of the human communication partners. The analysis of the exact social mechanism in which the technology is used is also very important, since the AI mental modelling technology might not easily capture all aspects of human psychology. Acting on the mental state of elderly people in a retirement home, for example, requires early anticipation of negative emotions, say, to avoid unwanted developments of the conversation. Emotion recognition for empathic assistants is discussed in the next article of this volume.

Auditive Emotion Recognition for Emphatic AI-Assistants

Roswitha Duwenbeck and Elsa Andrea Kirchner [2]

The article investigates how to use speech and other biosignals for emotion recognition to improve remote and direct healthcare. It looks at the use cases, goals, and challenges of implementing a possible solution with the following subgoals: speech emotion recognition, stress detection and classification, as well as emotion detection from physiological signals. Possible pitfalls and difficulties in specific social contexts are outlined, and it is emphasised that continual learning needs to be integrated into the architecture of the envisioned multimodal emotion recognition system. The article directly constitutes an important contribution to genuine AI research because, as we argued before, despite effective communication planning, emotion recognition is urgently needed to compensate for the unavoidable lack of user model information. Emotion recognition is a fundamental part of agent perception and acts as a safety net in anticipatory decision making. Emotion recognition might also help in agent decision making about providing (multimodal) explanations in certain conversation states, as discussed in the next article.

Multimodality in Explanations: Lessons Learned from Image Classification for Medical and Clinical Decision Making

Bettina Finzel [3]

Depending on the application domain, data modality, and classification model, the requirements for the expressiveness of explanations vary, and they depend on the respective information need of explainees, which is not necessarily made explicit in certain communication situations. To address this explanation gap, the article motivates multimodal explanations and demonstrates the need for combined and expressive approaches. The article is based on two image classification use cases: digital pathology and clinical pain detection using facial expressions. Various explanatory approaches that have emerged in this context are categorised with respect to their expressiveness, according to their modality and scope. The article highlights open challenges and suggests future directions for explanation frameworks, and it thereby directly contributes to central AI research questions, namely how to use the communication bandwidth available with a combination of modalities effectively, while still completing the overall task, such that, e.g., decision-making support is still accomplished. Appropriate decision support is often highly error-prone due to the variability in knowledge of different stakeholders, however. This issue is dealt with in the following article as well.

Building an AI Support Tool for Realtime Ulcerative Colitis Diagnosis

Bjørn Leth Møller, Bobby Zhao Sheng Lo, Johan Burisch, Flemming Bendtsen, Ida Vind, Bulat Ibragimov, Christian Igel [15]

The Mayo Endoscopic Subscore (MES) index is the standard for measuring the severity of ulcerative colitis during endoscopic evaluation. However, MES is subject to high inter-observer variability, possibly leading to misdiagnosis and suboptimal treatment. In order to mitigate observer variability effects, the article proposes a machine-learning-based MES classification system to support endoscopic processes. The system runs in real time in a clinical context and augments doctors’ decision-making during endoscopy. The approach is evaluated with a combination of a standard non-clinical model test and a first clinical test of the system on a real patient. From an AI perspective, an agent is realised that, given the task description of classifying endoscopic videos in real time, can reason about percepts (video subsequences) and, besides classification results, can give feedback to the environment on success estimation, e.g., by proposing different viewpoints for the endoscope in case classification results become very uncertain. Classification results as well as camera viewpoint feedback are shown to the human operator whom the agent is expected to support. For computing image classification results, the agent is configured with a convolutional neural network architecture, and fine-tuning of the agent for MES is done based on dedicated endoscopic image data labelled by different experts (with majority voting). The paper shows that image processing agents can be successfully set up for a clinical setting with a reasonable amount of effort.

EpiPredict: Agent-Based Modeling of Infectious Diseases

Janik Suer, Johannes Ponge, Bernd Hellingrath [25]

Decision-makers in the public-health sector are faced with the challenge of selecting effective countermeasures for a newly emerging disease with limited historical data and little understanding of its dynamics. To evaluate these decisions, infectious disease modelling has proven to be a valuable tool, providing insights into disease dynamics as well as predictions of future outcomes for different scenarios. Agent-based models, which simulate populations at the individual level, are well suited to capturing complex individual behaviours and the resulting aggregated system evolution, making them fitting tools for evaluating disease progression within highly heterogeneous populations. From an AI perspective, the article investigates the interaction of multiple agents in a mechanism using simulation technology. While a single agent is modelled just as a simple reflex agent, the interaction of multiple agents as models of humans can be used to analyse epidemic effects.

A Toolchain for Privacy-Preserving Distributed Aggregation on Edge-Devices

Johannes Liebenow, Timothy Imort, Yannick Fuchs, Marcel Heisel, Nadja Käding, Jan Rupp, Esfandiar Mohammadi [11]

To facilitate the analysis of sensitive location data, the article presents a toolchain for the distributed, privacy-preserving aggregation of localisation data that takes the limited resources of edge devices into account. The toolchain realises an agent with distributed sensors, and the paper analyses differential privacy in the data processing of such an agent. Distributed aggregation is based on secure summation, so that other parties cannot learn sensitive data of single human interaction partners. The article evaluates power consumption, running times, and bandwidth overhead on real as well as simulated devices, and it demonstrates that differential privacy can indeed be realised in designs involving agents with distributed sensors.

Automated Computation of Therapies Using Failure Mode and Effects Analysis in the Medical Domain

Malte Luttermann, Edgar Baake, Juljan Bouchagiar, Benjamin Gebel, Philipp Grüning, Dilini Manikwadura, Franziska Schollemann, Elisa Teifke, Philipp Rostalski, Ralf Möller [13]

Failure mode and effects analysis (FMEA) is a systematic approach to identify and analyse potential failures and their effects in a system or process. The article provides a formal framework to allow for automatic planning and acting of agents using FMEA models. More specifically, from the viewpoint of a task description, an FMEA model is given a semantics as a Markov decision process (MDP). Based on information about a concrete patient and using the MDP semantics, it is shown that agents can derive optimal therapies for the treatment of patients from FMEA models supplied to them as task descriptions. Furthermore, the article provides techniques for formalising the knowledge state of an acting agent in the MDP context.

AutoRAG: Grounding Text and Symbols

Tim Schulz, Malte Luttermann, Ralf Möller [20]

In safety-critical domains like healthcare, agents serving humans with a certain information need might be used to generate accurate and reliable diagnoses and treatment recommendations employing natural language question-answering systems (based on large language models). In the social mechanism of this context, humans advised by an agent can benefit from a formal (symbolic) model of (aspects of) the application domain (cf. the FMEA approach discussed in the previous paragraph). Thus, agents communicate with humans in a multimodal fashion, i.e., with natural language output complemented by fragments of the formal model that are relevant to the automatically generated text. Inspired by retrieval-augmented generation, which requires a language model to ground its output in verified knowledge from a formal model, the article introduces AutoRAG. AutoRAG aims to build an internal feature space with formal model supervision by autoencoding text and concepts from the formal model’s domain. This allows AutoRAG to explain the concepts it associates with the input and to provide a graphical depiction from the expert-defined formal model, enabling feasible sanity checks of the text. Thus, the symbols used in the formal model are grounded in the AutoRAG large language model and vice versa. In the future, agents can use these techniques together with reinforcement learning to better ground their models and computed actions in the real-world environment.

Dissertation Abstract: Taming Exact Inference in Temporal Probabilistic Relational Models

Marcel Gehrke [4]

The article argues that real-world processes involving agents are best modelled using a temporal probabilistic relational formalism. From an ontological perspective, this allows agents to reason about objects, their attributes, and the relationships between them, all within a temporal context. This approach also enables agents to handle reasoning under uncertainty. To illustrate this, the article presents the Lifted Dynamic Junction Tree (LDJT) algorithm as a temporal probabilistic relational inference technique, demonstrating its application in the analysis of an epidemic scenario. The key point made is that ontological commitments, such as the representation of objects and relations, are necessary to avoid the pitfalls of combinatorial explosion that can arise in agent reasoning. Overall, the article illustrates how LDJT preserves groups of indistinguishable objects over time, providing an expressive and stable high-level reasoning framework that can effectively model the complexities of real-world processes involving intelligent agents.

Lifting in Support of Privacy-Preserving Probabilistic Inference

Marcel Gehrke, Johannes Liebenow, Esfandiar Mohammadi, Tanya Braun [5]

The aim of privacy-preserving inference is to avoid revealing identifying information about individuals during inference. Lifted probabilistic inference works with groups of indistinguishable individuals, which has the potential to prevent tracing a query result back to a particular individual in a group. Therefore, this article investigates how lifting, by providing anonymity, can help preserve privacy in probabilistic inference. Specifically, it shows correspondences between k-anonymity and lifting, presents s-symmetry as an analogue, and introduces PAULI, a privacy-preserving inference algorithm that ensures s-symmetry during query answering.