1 Introduction

As the field of catalysis and chemical engineering is quite diverse and research data are rarely disseminated across the catalysis disciplines, there are many drivers for the increasing interest in ontologies and the higher data quality they promise. Examples include the need to create interfaces to generative AI for faster catalyst design and the need to connect applications ranging from laboratory experiments to process simulations [1]. As digitalization advances, there is a constant need for methods that simplify and accelerate data processing by removing hurdles. A prominent approach in this domain is the FAIR principles [2], which advocate enhanced data processing through the use of metadata and standardized data structures. The FAIR principles prioritize findability, accessibility, interoperability, and reusability. They aim to standardize data descriptions via metadata and thereby facilitate the reuse of data sets and the critical selection of data.

Moreover, discussions often revolve around leveraging ontologies and knowledge graphs alongside the FAIR principles to enrich the value of metadata through structural organization. Ontologies serve as decentralized standards. They simplify data handling through world modeling based on description logic while avoiding excessive content intricacies. This work explores ontologies for modeling reactions and catalysis as the primary means of data description, and their benefits are examined further in the following sections.

Datasets described using ontologies and accessible in knowledge graphs are inherently interoperable and reusable at the processing level, because they share a uniform domain model. The extensive toolkit surrounding ontologies and knowledge graphs enables data quality and integrity checks. These include Shapes Constraint Language (SHACL) validation [3], SPARQL queries [4], and consistency checks via inference engines [5].

Ontologies and knowledge graphs are increasingly integrated into the realm of research data management, offering significant advantages. Their knowledge modeling, akin to object-oriented approaches, simplifies both data processing and comprehension. Notably, tools like the Python package owlready2 [6] facilitate modeling ontologies and their semantic artifacts as objects within programming languages, enabling streamlined processing.

In computer science, particularly within the semantic web, ontologies serve to describe the real world in a machine-interpretable manner [7]. They conceptualize the world as a series of delineated mathematical spaces, governed by description logic and interconnected by rules. While maintaining semantic clarity, ontologies ensure human-readable interpretations through definitions, descriptions, and references.

However, the widespread adoption of ontologies faces challenges. These mainly concern the availability of ontologies across different domains and the varying degrees of semantic richness within each application area. Differences in ontology expressivity arise not only from an ontology's developmental stage or domain specificity but also from factors such as the intended application and the desired benefits of description logic [8]. For instance, applications range from simple thesauri expressed in Web Ontology Language (OWL) syntax [9] to complex ontologies incorporating additional description logic axioms. Furthermore, logical consistency is desirable, but it remains an open question whether it should be validated only at the ontology level or also at the knowledge graph level.

This work not only explores the use of ontologies and corresponding metadata for describing data records but also delves into the direct utilization of OWL syntax for inference purposes.

Thus, an introductory overview of description logic is appropriate. Description logics and related rule languages exist in various forms [10]. They differ in expressivity and inferencing options, ranging from simple rule markup languages such as RuleML [11] to more sophisticated ones such as SWRL [12] or SPIN [13]. Among the most significant for the Semantic Web are OWL [9] and SHACL. OWL and SHACL differ in their “world view”: OWL follows the open-world assumption, whereas SHACL follows the closed-world assumption [14, 15]. The former assumes that further knowledge, data, and relations may exist beyond what is stated. The latter assumes that all knowledge, and hence all data, is available. Consequently, these languages are processed by different inference engines. The choice of engine depends not only on inference performance but also on the type of description logic used and on whether mixed description logics are supported [16].
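The practical consequence of the two world views can be shown with a deliberately simplified Python sketch. It answers the same question, whether “Reaction_1” has the catalyst “Sub_Fe”, under both assumptions. The triples and function names are hypothetical, and real OWL reasoners and SHACL engines are far more involved. The sketch only shows that missing information reads as false under the closed-world assumption but as unknown under the open-world assumption.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical asserted facts; nothing at all is stated about a catalyst.
FACTS = {
    ("Reaction_1", "hasEductComponent", "N2"),
    ("Reaction_1", "hasEductComponent", "H2"),
}


def ask_closed_world(triple, facts):
    """Closed-world reading (as in SHACL validation): what is not stated is false."""
    return Answer.TRUE if triple in facts else Answer.FALSE


def ask_open_world(triple, facts, negative_facts=frozenset()):
    """Open-world reading (as in OWL): missing facts are unknown, not false.

    Only an explicitly asserted negative fact makes the answer FALSE.
    """
    if triple in facts:
        return Answer.TRUE
    if triple in negative_facts:
        return Answer.FALSE
    return Answer.UNKNOWN


query = ("Reaction_1", "hasCatalyst", "Sub_Fe")
print(ask_closed_world(query, FACTS).value)  # false
print(ask_open_world(query, FACTS).value)    # unknown
```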

Semantic artifacts, ranging from XML [17] to RDF [18] and OWL syntax, are interconnected and can be combined with related artifacts. For instance, axioms in an ontology can be written as SWRL rules, which then require an SWRL-compatible reasoner for inference [16]. Note that ontologies inferred with different reasoners may not be compatible, especially when one reasoner lacks features present in another. To the best of the author’s knowledge, no existing ontology models the process of chemical reactions in description logic with regard to reactants, products, and catalytic components, as also addressed in previous work [19]. The ontology presented in this work is applied to modeling chemical reaction networks in catalysis research, so the following section gives a brief introduction to this topic.

2 Methods

2.1 Modeling Reaction Networks with Ontologies

Before delving into the description of a catalyzed reaction and its ontology-based data, it’s crucial to clarify a few fundamental aspects that serve as constraints for the modeling process, which will be recurrently addressed directly or indirectly in subsequent discussions.

In the context of the ontology presented here, efforts were made to establish robust connections with existing ontologies to leverage and interconnect with pre-existing knowledge. Simultaneously, the aim was to maintain the ontology’s adherence to factual information. This approach is essential not only for aligning with most top-level ontologies, such as the BFO [20], chosen for its extensive repository of reusable ontologies, but also for ensuring logical consistency.

This requires that all information contained within the ontology is at least based on empirical evidence. However, given the intangible nature of many concepts in catalysis research, certain compromises must be made. For instance, defining a reaction can be approached either as a single molecular interaction or as the aggregate of numerous such interactions. Thus, it’s crucial to address whether modeling reactions in ontologies is necessary and what benefits such modeling should entail.

Presently, several databases catalog reaction conditions, catalysts, and related information, albeit with limited automation. Researchers and digital agents typically obtain necessary information through targeted search queries, requiring manual or semi-automated evaluation by the researcher or programmer.

To address this challenge, the ontology aims to facilitate answering various competency questions more easily, such as “Which side reactions can I expect in a mixture consisting of my specified components?”, “Which of my materials have a catalytic effect for a reaction in my system?”, or “Which reactants cause which side reaction?”. Table 1 provides a list of the competency questions, allowing for both simplified classification of primary and secondary reactions and enhanced automated evaluations.

Table 1 Listing of the competency questions used to design the Reac4Cat ontology

Having outlined the competency questions the ontology seeks to address, the focus now turns to the concept of a reaction within the ontology. Since an ontology delineates specific, individually defined mathematical spaces, reactions must be defined both conceptually and mathematically/logically. A reaction is therefore defined by the general understanding of an observed reaction: a mixture of reactants reacting over time under defined conditions to form products.

Furthermore, establishing the boundary of what constitutes a chemical reaction is crucial. The selection of reactions is left to the user so that all reactions can be accommodated, including those involving very low concentrations, as is often the case in biocatalysis. To support this, the class hierarchy of the ontology is structured so that all information is modeled at the “data level” of the ontology using individuals.

With the functions and objectives established, the reused ontologies are reviewed before proceeding further. The RXNO [21] and its successor, the MOP [22], serve as the foundation for describing chemical processes, while the ChEBI [23] ontology forms the basis for characterizing chemical substances.

All three ontologies belong to the OBO Foundry [24], use the BFO as their top-level ontology, and follow the OBO Foundry guidelines for ontology development. The ontology presented here incorporates semantic artifacts from the OBO community, such as relations from the RO [25], but it does not claim conformity with the OBO community guidelines [26]. This divergence stems mainly from the intention to integrate the concept into ontologies that may not strictly adhere to OBO standards or that use a different top-level ontology, such as the EMMO [27]. Nevertheless, ChEBI, RXNO, and MOP are particularly valuable because they already contain a wide array of substance and reaction classifications. Reusing them means these classifications do not have to be recreated.

To explain the ontology’s workings further, an example based on the Haber-Bosch reaction shows how individual components within the ontology operate. The Haber-Bosch reaction is the synthesis of ammonia from nitrogen and hydrogen. In this illustrative example, nickel and iron are modeled as its catalysts.

Although this example provides only a simplified illustration, it demonstrates the ontology’s capability for more intricate modeling of reactions beyond those of explicitly named reactions. Notably, the ontology facilitates the representation of not only explicitly named reactions but also unnamed reactions and entire reaction groups.

For a description of a reaction within an ontology, it’s essential to note the relatively open definition of a reaction as “a process in which a mixture of reactants reacts partially over a certain period under defined conditions to form products”. This definition inherently implies a unidirectional process, given the constraints of OWL syntax, which primarily supports unary operations. Given the desire to assign various reaction types to an experiment, this must be modeled accordingly. To automate this process, “reaction roles” have been devised, which can be assigned to an experiment. For instance, if “Reaction_1” satisfies all conditions indicative of a Haber-Bosch reaction, it should be assigned the “HaberBoschReactionRole”. This role also encompasses specific information, which will be elaborated upon later.

A direct association of reactants with products in a given reaction experiment is often not possible. A reaction experiment is therefore initially modeled to indicate only the mixture subjected to the reaction, termed the “EductMixture”, and the resulting “ProductSet” that could be measured. OWL syntax was not designed for complex calculations [28], so critical component quantities are not considered. Instead, all measured, and thereby modeled, components are taken into account in the logic. “EductMixture” and “ProductSet” denote individuals of the “ProcessMixture” class that differ only in how they are related to the individual representing a reaction experiment. This modeling requires assigning individual components not directly to a reaction but to a mixture. Representing a mixture as an N‑ary structure [29] prevents incorrect substance assignments in a reaction. It also allows “EductMixture” and “ProductSet” to be reused as a “ProcessMixture”, for example in measurement series. Each substance individual can then be assigned to the respective classes as a single individual. For instance, “EductMixture_1”, representing the reactant mixture of “Reaction_1”, would include substances such as “H2” and “N2”, modeled as individuals of the corresponding ChEBI classes.
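The N‑ary mixture pattern can be sketched as plain subject-predicate-object triples. This is a minimal Python illustration. It assumes the property names “hasEductComponentMixture”, “hasProductComponentSet”, and “hasComponent” that are used later in this section, and it uses ChEBI labels as class names. The helper `mixture_triples` and the naming scheme for mixture individuals are hypothetical and not part of Reac4Cat.

```python
def mixture_triples(reaction, educts, products):
    """Build the N-ary mixture pattern for one reaction experiment (toy sketch).

    Substances are attached to the educt mixture and the product set,
    never directly to the reaction experiment.
    """
    suffix = reaction.rsplit("_", 1)[-1]          # "Reaction_1" -> "1"
    educt_mixture = f"EductMixture_{suffix}"
    product_set = f"ProductSet_{suffix}"
    triples = {
        (educt_mixture, "rdf:type", "ProcessMixture"),
        (product_set, "rdf:type", "ProcessMixture"),
        (reaction, "hasEductComponentMixture", educt_mixture),
        (reaction, "hasProductComponentSet", product_set),
    }
    # Each substance individual is typed by its ChEBI class label.
    for individual, chebi_class in educts.items():
        triples.add((educt_mixture, "hasComponent", individual))
        triples.add((individual, "rdf:type", chebi_class))
    for individual, chebi_class in products.items():
        triples.add((product_set, "hasComponent", individual))
        triples.add((individual, "rdf:type", chebi_class))
    return triples


triples = mixture_triples(
    "Reaction_1",
    educts={"N2": "dinitrogen", "H2": "dihydrogen"},
    products={"NH3": "ammonia"},
)
for t in sorted(triples):
    print(t)
```

Because the mixtures are individuals of their own, the same “EductMixture_1” could also be referenced by a measurement series without touching the reaction experiment.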

Fig. 1 illustrates parts of the class hierarchy, known as the Terminology Box (TBox), together with the classes and individuals used in the example. The upper section shows the reactant and product aspects of the reaction. The lower left section relates to additional materials influencing the reaction. The lower right section shows the reference to the “HaberBoschReactionRoleIndividual” along with additional relations indicating which components can catalyze a Haber-Bosch reaction.

Fig. 1

Illustration of the terminology box of the Reac4Cat ontology with the example individuals of a Haber-Bosch reaction. The most important classes are shown in beige, example individuals in violet, and relations as directed arrows between classes and individuals

To keep the illustration as clear as possible, more complex class axioms and rules are only shown as a dotted line, direct hierarchical assignments are shown with a labeled arrow and nested hierarchical assignments whose complete depth should not be shown are represented with an arrow with a double head. Relations that have an exact object property have the relation written directly on the arrows. Finally, the dashed arrow represents the relation “hasReactionRole” which is to be automatically inferred.

Automatically inferring that a reaction experiment embodies a specific role, and thus represents a distinct reaction, can be done with a logic approach rarely used in ontologies, known as left-hand-side logic. Normally, a complex relation is assigned to a predefined class or individual. In left-hand-side logic, the direction is reversed: any object satisfying a complex class expression is assigned a class, an individual, or a simple relation. Because of this formulation, the technique is less common in serialization formats such as Turtle (TTL) [30] and in ontology editors such as Protégé [31], where it is referred to as a General Class Axiom (GCA). This ontology uses left-hand-side logic to deduce directly [32], via a reasoner, that a reaction possessing all requisite reactants and products for a given role is assigned that role. The purpose is to avoid naming each precise concept explicitly, because both the number of named reactions in catalysis and the number of substances that can be used to model a named reaction are very large.

However, the reactants and products are initially linked to the reaction experiment only via the “EductMixture” and “ProductSet” individuals. The “hasEductComponent” and “hasProductComponent” properties are therefore defined via sub-property chains. They link the experiment to a substance whenever the chain “hasEductComponentMixture” followed by “hasComponent”, or “hasProductComponentSet” followed by “hasComponent”, exists. The left-hand-side logic can then check for every individual (which should apply only to reaction experiments) whether it has all the educts and products required for a given reaction. For the Haber-Bosch reaction, these are “N2” and “H2” as reactants and “NH3” as product. A reaction experiment that fulfills this left-hand side should therefore be assigned the relation “hasReactionRole” to the “HaberBoschReactionRoleIndividual”. In Protégé, this GCA can be written as:

(hasEductComponent some dinitrogen) and (hasEductComponent some dihydrogen) and (hasProductComponent some ammonia) SubClassOf hasReactionRole some ({HaberBoschReacRole_Ind})

Unfortunately, the inference cannot be written in a more generalized way here, because under the open-world assumption of OWL a generalized axiom would not yield a unique inference. The curly brackets in the GCA indicate a reference to the individual “HaberBoschReacRole_Ind” rather than to a class.
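The interplay of the property chains and the left-hand-side check can be emulated with a small forward-chaining sketch over plain triples. It is not a description logic reasoner. It evaluates existential restrictions only on explicitly asserted types and applies each rule once, which is enough for this non-recursive example. The functions and the individual “Reaction_incomplete” are hypothetical.

```python
# Sub-property chains from the text: derived property -> (first link, second link)
CHAINS = {
    "hasEductComponent": ("hasEductComponentMixture", "hasComponent"),
    "hasProductComponent": ("hasProductComponentSet", "hasComponent"),
}


def apply_chains(triples):
    """Add (a, derived, c) wherever a --first--> b --second--> c holds."""
    inferred = set(triples)
    for derived, (first, second) in CHAINS.items():
        for a, p, b in triples:
            if p != first:
                continue
            for b2, q, c in triples:
                if b2 == b and q == second:
                    inferred.add((a, derived, c))
    return inferred


def has_some(triples, subject, prop, cls):
    """'prop some cls' for subject, using explicitly asserted types only."""
    return any(s == subject and p == prop and (o, "rdf:type", cls) in triples
               for s, p, o in triples)


def apply_gca(triples, lhs, role_individual):
    """Assign hasReactionRole -> role_individual to every subject meeting all (prop, cls) pairs."""
    subjects = {s for s, _, _ in triples}
    new = {(x, "hasReactionRole", role_individual)
           for x in subjects
           if all(has_some(triples, x, p, c) for p, c in lhs)}
    return triples | new


HABER_BOSCH_LHS = [
    ("hasEductComponent", "dinitrogen"),
    ("hasEductComponent", "dihydrogen"),
    ("hasProductComponent", "ammonia"),
]

asserted = {
    ("Reaction_1", "hasEductComponentMixture", "EductMixture_1"),
    ("Reaction_1", "hasProductComponentSet", "ProductSet_1"),
    ("EductMixture_1", "hasComponent", "N2"),
    ("EductMixture_1", "hasComponent", "H2"),
    ("ProductSet_1", "hasComponent", "NH3"),
    # Hypothetical experiment without dihydrogen: the GCA must not fire
    ("Reaction_incomplete", "hasEductComponentMixture", "EductMixture_incomplete"),
    ("Reaction_incomplete", "hasProductComponentSet", "ProductSet_incomplete"),
    ("EductMixture_incomplete", "hasComponent", "N2"),
    ("ProductSet_incomplete", "hasComponent", "NH3"),
    ("N2", "rdf:type", "dinitrogen"),
    ("H2", "rdf:type", "dihydrogen"),
    ("NH3", "rdf:type", "ammonia"),
}

kg = apply_gca(apply_chains(asserted), HABER_BOSCH_LHS, "HaberBoschReacRole_Ind")
print(sorted(t for t in kg if t[1] == "hasReactionRole"))
# [('Reaction_1', 'hasReactionRole', 'HaberBoschReacRole_Ind')]
```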

Because the GCA cannot be generalized, every GCA of similar structure must be added to the ontology, either manually or automatically. How this is realized is discussed later in the context of automation.
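Since every named reaction needs its own GCA, generating the GCAs from a table is a natural automation step. The following Python sketch renders one GCA per reaction in the Manchester syntax shown above for the Haber-Bosch reaction. The `ReactionDefinition` record, the methanation entry, and the role individual “MethanationReacRole_Ind” are hypothetical illustrations, not the code in the Reac4Cat repository. Loading the strings into an ontology is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionDefinition:
    """Hypothetical input record: one named reaction and its role individual."""
    role_individual: str
    educt_classes: tuple
    product_classes: tuple


def quote(label):
    """Manchester syntax needs single quotes around labels containing spaces."""
    return f"'{label}'" if " " in label else label


def to_gca(d):
    """Render the left-hand-side GCA following the Haber-Bosch pattern."""
    conditions = [f"(hasEductComponent some {quote(c)})" for c in d.educt_classes]
    conditions += [f"(hasProductComponent some {quote(c)})" for c in d.product_classes]
    # {{...}} yields literal braces around the role individual (a nominal)
    return (" and ".join(conditions)
            + f" SubClassOf hasReactionRole some ({{{d.role_individual}}})")


definitions = [
    ReactionDefinition("HaberBoschReacRole_Ind",
                       ("dinitrogen", "dihydrogen"), ("ammonia",)),
    # Hypothetical second entry: methanation, CO + 3 H2 -> CH4 + H2O
    ReactionDefinition("MethanationReacRole_Ind",
                       ("carbon monoxide", "dihydrogen"), ("methane", "water")),
]

for d in definitions:
    print(to_gca(d))
```

The first line printed reproduces the Haber-Bosch GCA character for character. Keeping the reaction table separate from the rendering lets the same definitions be reused for other output formats.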

Catalyzing materials, as in the Haber-Bosch example, are not always found among the reactants or products. Catalyst and reactor materials must therefore also be modeled as influencing materials. The example in Fig. 1 does not cover every possible way of modeling catalytic effects. Nevertheless, iron as the material of the reactor wall and nickel as the material in a catalyst sample are listed here. The individual “HaberBoschReactionRoleIndividual” already defines that nickel and iron can have a catalytic effect on a Haber-Bosch reaction. To classify a reaction as a catalyzed variant of itself, two relations must refer to one and the same substance individual: the relation “isCatalyzedBy” of the reaction role and a relation originating from the reaction experiment itself. Analogous to the “hasEductComponent” and “hasProductComponent” relations, the “hasCatalystSampleComponent” and “hasReactionVesselComponent” relations are set up for this purpose. All of these relations are sub-relations of the “hasReactionComponent” relation.

The relationships described are illustrated in Fig. 2, which gives a simplified overview of the entities involved. Blue arrows represent the “hasReactionComponent” relations discussed above. Green arrows indicate additional relations inferred for the reaction experiment using GCAs. An additional GCA determines which component functions as a catalyst. It reads: “If a reaction experiment includes a component that is both an effective reaction component and linked to the experiment via the hasReactionRole relation followed by the isCatalyzedBy relation, then this component functions as an active catalyst.” In Protégé, for example, this relation might be expressed as:

Fig. 2

Illustration of the intended knowledge graph structure after reasoning, expressed via the individuals used for the example of a Haber-Bosch reaction. The central class of “Haber Bosch” is given in beige, the individuals in violet and relations in colored arrows as indicated via the legend

(hasReactionComponent some ({Sub_Fe})) and (isPotentiallyCatalyzedBy some ({Sub_Fe})) SubClassOf hasCatalyst some ({Sub_Fe})

It’s important to note that explicit reference is made to an individual rather than using a generalization in the form of a class, as the OWL syntax prohibits passing variables from the left-hand side to the right-hand side. Therefore, only an explicit statement can establish this relation.
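The catalyst GCA can be emulated in the same triple-based style. The sketch assumes that “isPotentiallyCatalyzedBy” results from the chain “hasReactionRole” followed by “isCatalyzedBy”, as the verbal reading of the GCA suggests. The substance individuals other than “Sub_Fe” are hypothetical. The loop over substances mirrors the need for one explicit GCA per substance individual.

```python
# Sub-property axioms: every specific component relation is a hasReactionComponent
SUB_PROPERTIES = {
    "hasEductComponent": "hasReactionComponent",
    "hasProductComponent": "hasReactionComponent",
    "hasCatalystSampleComponent": "hasReactionComponent",
    "hasReactionVesselComponent": "hasReactionComponent",
}


def infer_catalysts(triples, substances):
    kg = set(triples)
    # 1. Propagate sub-properties to hasReactionComponent.
    kg |= {(s, SUB_PROPERTIES[p], o) for s, p, o in triples if p in SUB_PROPERTIES}
    # 2. Assumed chain: hasReactionRole o isCatalyzedBy -> isPotentiallyCatalyzedBy.
    kg |= {(r, "isPotentiallyCatalyzedBy", c)
           for r, p, role in kg if p == "hasReactionRole"
           for role2, q, c in kg if role2 == role and q == "isCatalyzedBy"}
    # 3. One explicit GCA per substance individual, since OWL cannot pass a
    #    variable from the left-hand side to the right-hand side.
    reactions = {s for s, p, _ in kg if p == "hasReactionComponent"}
    for sub in substances:
        for r in reactions:
            if ((r, "hasReactionComponent", sub) in kg
                    and (r, "isPotentiallyCatalyzedBy", sub) in kg):
                kg.add((r, "hasCatalyst", sub))
    return kg


asserted = {
    ("HaberBoschReacRole_Ind", "isCatalyzedBy", "Sub_Fe"),
    ("HaberBoschReacRole_Ind", "isCatalyzedBy", "Sub_Ni"),
    ("Reaction_2", "hasReactionRole", "HaberBoschReacRole_Ind"),
    ("Reaction_2", "hasReactionVesselComponent", "Sub_Fe"),
    ("Reaction_2", "hasCatalystSampleComponent", "Sub_Ni"),
    ("Reaction_2", "hasReactionVesselComponent", "Sub_Cu"),  # present, not a listed catalyst
    ("Reaction_1", "hasReactionRole", "HaberBoschReacRole_Ind"),
    ("Reaction_1", "hasEductComponent", "Sub_N2"),          # no catalytic material in contact
}

kg = infer_catalysts(asserted, ["Sub_Fe", "Sub_Ni", "Sub_Cu"])
print(sorted(t for t in kg if t[1] == "hasCatalyst"))
# [('Reaction_2', 'hasCatalyst', 'Sub_Fe'), ('Reaction_2', 'hasCatalyst', 'Sub_Ni')]
```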

Furthermore, the “hasCatalyst” relation allows each reaction experiment to be reclassified from its non-catalyzed class into the corresponding catalyzed subclass. This uses an additional GCA, which can be written in Protégé as:

'Haber Bosch reaction' and (hasCatalyst some (('material entity' or 'chemical entity') and (inverse (hasReactionComponent) some 'Haber Bosch reaction'))) SubClassOf 'catalysed Haber Bosch reaction'

This GCA checks whether a reaction experiment has a component individual connected via the “hasCatalyst” relation and if this component is also connected back to the reaction experiment via the “hasReactionComponent” relation.
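A simplified Python evaluation of this GCA follows. Types are checked only where explicitly asserted; a reasoner would also use the ChEBI class hierarchy. The inverse restriction is evaluated exactly as written, so it is satisfied by any member of the base class that lists the catalyst as a reaction component. The function names are hypothetical.

```python
def is_a(kg, x, cls):
    """Explicit type check; a reasoner would also follow subClassOf."""
    return (x, "rdf:type", cls) in kg


def classify_catalysed(triples, base_class, catalysed_class,
                       entity_classes=("material entity", "chemical entity")):
    kg = set(triples)
    new = set()
    for r, p, c in kg:
        if p != "hasCatalyst" or not is_a(kg, r, base_class):
            continue
        # ('material entity' or 'chemical entity')
        is_entity = any(is_a(kg, c, e) for e in entity_classes)
        # inverse(hasReactionComponent) some base_class: some member of the
        # base class lists c as a reaction component (typically r itself)
        linked_back = any(q == "hasReactionComponent" and o == c and is_a(kg, s, base_class)
                          for s, q, o in kg)
        if is_entity and linked_back:
            new.add((r, "rdf:type", catalysed_class))
    return kg | new


asserted = {
    ("Reaction_2", "rdf:type", "Haber Bosch reaction"),
    ("Reaction_2", "hasCatalyst", "Sub_Fe"),
    ("Reaction_2", "hasReactionComponent", "Sub_Fe"),
    ("Sub_Fe", "rdf:type", "chemical entity"),
    ("Reaction_1", "rdf:type", "Haber Bosch reaction"),
}
kg = classify_catalysed(asserted, "Haber Bosch reaction", "catalysed Haber Bosch reaction")
print(sorted(s for s, p, o in kg if o == "catalysed Haber Bosch reaction"))  # ['Reaction_2']
```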

By using the above-mentioned relationships, axioms, classes, and individuals, an ontology is built that is able to model the knowledge structures behind the aforementioned competency questions and answer them using, for example, simple SPARQL queries. To further showcase the use of this, a real-world example of a knowledge graph is combined with this approach.
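Competency questions translate into graph patterns over the inferred knowledge graph. The Python sketch below is a self-contained stand-in for a SPARQL engine. It matches a basic graph pattern with `?`-prefixed variables against a hypothetical set of inferred triples to answer “Which of my materials have a catalytic effect for a reaction in my system?”. The matcher, the methanation role individual, and the data are illustrative assumptions. A roughly equivalent SPARQL query is given as a comment.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns.

    Terms starting with '?' are variables, as in a SPARQL basic graph pattern.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break                  # variable already bound differently
            elif term != value:
                break                      # constant does not match
        else:
            yield from match(rest, triples, candidate)


# Hypothetical inferred knowledge graph
inferred = {
    ("Reaction_2", "hasReactionRole", "HaberBoschReacRole_Ind"),
    ("Reaction_2", "hasReactionRole", "MethanationReacRole_Ind"),
    ("HaberBoschReacRole_Ind", "isCatalyzedBy", "Sub_Fe"),
    ("HaberBoschReacRole_Ind", "isCatalyzedBy", "Sub_Ni"),
    ("MethanationReacRole_Ind", "isCatalyzedBy", "Sub_Ni"),
    ("Reaction_2", "hasCatalyst", "Sub_Fe"),
    ("Reaction_2", "hasCatalyst", "Sub_Ni"),
}

# Roughly: SELECT ?material ?role WHERE { ?reaction :hasCatalyst ?material ;
#          :hasReactionRole ?role . ?role :isCatalyzedBy ?material . }
query = [
    ("?reaction", "hasCatalyst", "?material"),
    ("?reaction", "hasReactionRole", "?role"),
    ("?role", "isCatalyzedBy", "?material"),
]
answers = {(b["?material"], b["?role"]) for b in match(query, inferred)}
print(sorted(answers))
```

The third pattern keeps a material from being reported for a reaction role it does not catalyze. Without it, iron would also appear as a methanation catalyst for “Reaction_2”.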

2.2 Reaction in a Knowledge Graph for Process Simulation and Laboratory Data

Because reaction networks can be modeled quite universally, the approach can also be applied to biocatalytic reactions. In the process industry, (bio)chemical processes involve the controlled manipulation and conversion of substances. This allows reactions to be transferred from the laboratory to a larger scale and thus brought into widespread use. The reactor is a central part of these processes. It typically requires precise control of parameters such as concentrations, temperature, or pressure to optimize the yield, purity, and efficiency of the process.

Developing new bioprocesses that integrate biocatalytic reactions into industrial production is a complex task. Process simulators can accelerate the development of such industrial processes, saving both time and costs. The open-source process simulator DWSIM [33] computes process streams and thus enables users to model experiments before carrying them out in the laboratory. However, process simulation requires input from real-world experiments to produce realistic results. In these experiments, structured data uptake ensures FAIR research data integration. For biocatalytic experiments, the XML-based data exchange format EnzymeML [34] uses ontology classes from the Systems Biology Ontology (SBO) [35].

Laboratory data were integrated into process simulation in previous work [36], using both EnzymeML and DWSIM. In that work, data were recorded from laboratory experiments on the process design of a biocatalytic redox reaction with Laccase in a flow reactor. Part of these data was recorded in spreadsheets based on EnzymeML [34], which comply with the SBO. The pyEnzyme module of the EnzymeML framework imports the recorded data directly into Python objects [37], including the SBO-related data contained in the spreadsheets. The EnzymeML spreadsheets do not cover all concepts needed to describe flow (bio)chemistry, so an additional spreadsheet was set up for ease of data recording. This second spreadsheet mapped the data according to its object properties and several ontologies, such as metadata4ing [38] and the OBO Relation Ontology [25]. With this, the concepts presented in this work and in [36] sufficiently describe the mixing of two liquids and the subsequent biocatalytic reaction in a flow reactor. The result is a partially automated workflow for integrating laboratory data into a process simulation using standardized ontological concepts. Enzyme-catalyzed reaction data and process simulation results are parsed into an ontology-based knowledge graph. Fig. 3 shows an excerpt of the resulting knowledge graph, centered on the reaction and the process streams of the process simulation.

Fig. 3

Excerpt of the knowledge graph presenting the individuals of the process flow diagram with streams and unit operations, as well as the reaction. In the knowledge graph, the reaction is classified as “Biochemical Reaction”. Using the modeling of reaction networks with ontologies, the individual “ABTS Oxidation” will be classified as “Redox reaction with Laccase”

The class “Biochemical Reaction” is assigned as a general class of the specific reaction taking place in the reactor. This reaction is a “Redox reaction with Laccase”, so the approach presented in the previous section is applied to this knowledge graph to assert a more detailed classification of the reaction. To achieve this, the Reac4Cat ontology is imported manually into the knowledge graph using the ontology editor Protégé. The necessary GCAs are then added automatically with Python code, to support future automation of this method. Finally, to accelerate reasoning, the knowledge graph is stripped of unused classes included through ontology imports, using the ROBOT tool [39]. With this partially automated workflow, the semantics for modeling reaction networks with ontologies are coupled with a real-world knowledge graph of laboratory and process simulation data.
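The idea behind stripping unused imported classes can be sketched as a reachability computation over the subclass hierarchy: keep every class that is actually used, plus its ancestors, and drop the rest. This simplified Python sketch does not reproduce ROBOT, which provides its own extraction and removal commands. The toy hierarchy is illustrative and not taken from the imported ontologies.

```python
def classes_to_keep(subclass_of, used_classes):
    """Return the used classes plus all their ancestors via subClassOf.

    subclass_of maps each class to its set of direct superclasses.
    used_classes must include every class referenced by individuals *and*
    by GCAs; dropping a GCA class would silently change the inferences.
    """
    keep, stack = set(), list(used_classes)
    while stack:
        cls = stack.pop()
        if cls not in keep:
            keep.add(cls)
            stack.extend(subclass_of.get(cls, ()))
    return keep


# Illustrative toy hierarchy, not taken from any real ontology import
hierarchy = {
    "Redox reaction with Laccase": {"Biochemical Reaction"},
    "Biochemical Reaction": {"process"},
    "Photochemical reaction": {"process"},
    "enzyme": {"protein"},
    "protein": {"material entity"},
    "mineral": {"material entity"},
}
keep = classes_to_keep(hierarchy, {"Redox reaction with Laccase", "enzyme"})
dropped = set(hierarchy) - keep
print(sorted(dropped))  # ['Photochemical reaction', 'mineral']
```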

3 Results

The Reac4Cat ontology presented in this work, the knowledge graph of the laboratory and simulation data, and the code to automatically implement the necessary GCAs with Python are found in the GitHub repository at https://github.com/AleSteB/Reac4Cat.

3.1 Inferring Knowledge of Reaction Networks

In the context of ontologies, it is desirable to generate a representation of the world that is as complete and error-free as possible. However, this usually comes at a high cost in required computing capacity, extensibility of the model, and intuitive understanding. One therefore usually restricts oneself to a simplified representation of the world and checks whether it is logically error-free and simple to implement. For this reason, the ontology is examined again with a somewhat larger data set. The ontology provided with examples [40] contains two reactions in catalyzed and non-catalyzed form, as well as permutations of reactant mixture compositions and different product mixtures. This ontology (with 844 axioms) requires 718 milliseconds with the reasoner HermiT [41] to perform all inferences. To test whether any inferencing problems occur, several permutations were set up from the examples. Reaction_1 should represent a Haber-Bosch reaction, Reaction_2 both a catalyzed Haber-Bosch reaction and a catalyzed methanation reaction, Reaction_3 and Reaction_4 a regular methanation reaction, and Reaction_5 again a catalyzed methanation reaction. Similar permutations were carried out for the reactants and products, for example to test whether a catalyst can also be present among the products or reactants. An excerpt from these consistency tests is shown in Fig. 4, taken from the Protégé software. Axioms highlighted in white and written in bold are asserted, while axioms highlighted in yellow and written in regular type are inferred by the reasoner. Fig. 5 shows the knowledge graph created in the reasoning step with the most relevant relations. To avoid cluttering the representation, some relations were intentionally hidden, as indicated in the legend on the right.
The relation “hasCatalyst” between the individual “Reaction_2” and the iron and nickel substance individuals shows that the catalysts were correctly inferred. Consequently, as Fig. 4 shows, “Reaction_2” was also correctly classified as a catalyzed methanation reaction and a catalyzed Haber-Bosch reaction.
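The permutation tests described above can be expressed as a small regression check that compares a reasoner’s classification with the intended one. In the Python sketch below, the expected classes follow the permutations listed in the text. The class labels for methanation and the “inferred” input are hypothetical, and how the inferred types are exported from Protégé or a reasoner is not covered. Superclasses in the output are ignored, since a catalyzed reaction is also an instance of its non-catalyzed parent class.

```python
# Classes each individual must have, following the permutations in the text
REQUIRED = {
    "Reaction_1": {"Haber Bosch reaction"},
    "Reaction_2": {"catalysed Haber Bosch reaction", "catalysed methanation reaction"},
    "Reaction_3": {"methanation reaction"},
    "Reaction_4": {"methanation reaction"},
    "Reaction_5": {"catalysed methanation reaction"},
}
# Classes that must NOT be inferred for the non-catalyzed permutations
FORBIDDEN = {
    "Reaction_1": {"catalysed Haber Bosch reaction"},
    "Reaction_3": {"catalysed methanation reaction"},
    "Reaction_4": {"catalysed methanation reaction"},
}


def check_classification(inferred_types):
    """Compare reasoner output (individual -> set of class labels) with expectations.

    Returns an empty dict when every permutation is classified as intended.
    """
    problems = {}
    for individual in REQUIRED.keys() | FORBIDDEN.keys():
        got = inferred_types.get(individual, set())
        missing = REQUIRED.get(individual, set()) - got
        unexpected = FORBIDDEN.get(individual, set()) & got
        if missing or unexpected:
            problems[individual] = {"missing": missing, "unexpected": unexpected}
    return problems


# Hypothetical reasoner output with one deliberate error on Reaction_4
inferred = {
    "Reaction_1": {"Haber Bosch reaction"},
    "Reaction_2": {"Haber Bosch reaction", "catalysed Haber Bosch reaction",
                   "methanation reaction", "catalysed methanation reaction"},
    "Reaction_3": {"methanation reaction"},
    "Reaction_4": {"methanation reaction", "catalysed methanation reaction"},
    "Reaction_5": {"methanation reaction", "catalysed methanation reaction"},
}
print(check_classification(inferred))
```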

Fig. 4

Excerpt from the ontology visualization tool Protégé after reasoning with HermiT. Asserted relations are given in bold font, while inferred axioms are highlighted in yellow

Fig. 5

Excerpt from the Knowledge Graph created by reasoning via HermiT, depicted via the ontology visualization tool OntoGraf in Protégé

3.2 Application of Reac4Cat on Process Simulation-related Knowledge Graphs

To demonstrate the benefit of the Reac4Cat ontology, it is applied to the knowledge graph containing laboratory and process simulation data of the Laccase-catalyzed redox reaction. This refines the knowledge graph, and reasoning over the ontology yields new inferred axioms that classify the reaction individuals accordingly.

In addition to the conceptual work needed to implement the semantics as presented, Python code was written to automate the creation of the necessary GCAs and thus simplify building the ontology.

Focusing on the classification of the reaction individuals, Fig. 6 shows an excerpt of the knowledge graph inferred with HermiT, which took 360.7 seconds. Here, the relation “hasReactionRole” is assigned to the reaction individual and points to the correct reaction role of a redox reaction with Laccase. This helps to classify data in a knowledge graph automatically with regard to the specific reactions that took place in a reactor.

Fig. 6

Excerpt of the inference on the reaction individual, showing the benefit of the posed GCAs by assigning the reaction the role “indv_redox reaction with Laccase”

3.3 Limitations of the Current Reaction Model

After outlining the structure of the ontology, its known limitations must be briefly addressed. Most of them stem from the inherent constraints of OWL syntax, mainly its predominantly unary description logic nature and its restricted capacity to handle complex mathematical expressions. Reactions and catalysts, for instance, are conceptual entities that manifest only under specific environmental conditions, which can only be described adequately with mathematics.

Methods exist to model such environmental conditions, but they often entail a significant increase in the number of axioms and hence higher computational demands [42]. The modeling of reactions in this ontology is already axiom-intensive, so the cost of incorporating additional conditions could outweigh the benefits. Additionally, such mathematical expressions [28] would currently require other logic syntaxes such as SWRL or more sophisticated reasoning engines, which would make the ontology incompatible with some reasoning engines [16]. Reasoners exist that can infer such compounded description logic. However, this work does not cover approaches to modeling mathematical relations, even though these are important for restricting reactions and catalysts to certain reaction conditions.

Furthermore, it is essential to reassess the fundamental aspects of modeling. Since inference engines operating with OWL syntax can only identify logical loops to a limited extent, explicitly setting up ring closures is not advisable. This limitation poses a challenge, especially in the context of complex reactions, where a reaction system may consist of multiple sub-reactions exhibiting cyclic behavior. The issue is exemplified by simple equilibrium reactions, which inherently entail ring closures. Consequently, modeling subsequent reactions becomes problematic, as they can also result in ring closures. Many useful modeling options thereby cannot be effectively represented.
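Because ring closures are problematic for OWL reasoning, a user might screen a reaction network for cycles before asserting it in the ontology. The Python sketch below is not part of Reac4Cat. It treats substances as nodes and educt-to-product conversions as directed edges, and it returns one cycle found by depth-first search, if any. An equilibrium such as ammonia synthesis and decomposition shows up as a two-node cycle.

```python
def find_cycle(edges):
    """Return one directed cycle as a list of nodes, or None if the graph is acyclic."""
    graph = {}
    for educt, product in edges:
        graph.setdefault(educt, []).append(product)
        graph.setdefault(product, [])

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / finished
    color = {node: WHITE for node in graph}
    parent = {}

    def dfs(u):
        color[u] = GREY
        for v in graph[u]:
            if color[v] == GREY:          # back edge u -> v closes a ring
                cycle = [u]
                while cycle[-1] != v:
                    cycle.append(parent[cycle[-1]])
                return cycle[::-1]        # ordered v ... u
            if color[v] == WHITE:
                parent[v] = u
                found = dfs(v)
                if found:
                    return found
        color[u] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = dfs(node)
            if found:
                return found
    return None


# Equilibrium N2 + 3 H2 <-> 2 NH3 modeled as forward and backward conversions
network = [("N2", "NH3"), ("H2", "NH3"), ("NH3", "N2"), ("NH3", "H2")]
print(find_cycle(network))  # ['N2', 'NH3']
```

The recursive search is adequate for small networks. Very large networks would call for an iterative variant to avoid Python's recursion limit.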

Lastly, the foundational principle that the ontology should store only factual information is revisited. Close examination shows that certain aspects, such as intermediate reactions, cannot be modeled effectively, which limits the factual data that can be represented. Consequently, users who construct a knowledge graph with this ontology are responsible for including information adequate to their desired level of detail.

4 Summary and Outlook

Different goals can be achieved with the ontology and the code provided for it. On the one hand, a knowledge graph can be created for reaction and catalysis research that helps researchers answer their questions quickly and easily, similar to the competency questions described. On the other hand, the ontology can be used in automated workflows or on the Semantic Web for interaction with and between digital agents.

The use of the ontology and its application of left-hand-side logic show great potential to simplify automation but also to enable digital process intensification through extensions to adjacent domains. Domains into which these semantic structures can be introduced are, for example, experiment planning or the modeling of chemical processes. Extensions could include, for example, a chemical unit operation ontology that automatically suggests separation processes and the associated media and process conditions.

As the Python implementation shows, the left-hand-side logic and its GCAs can be introduced automatically, which could be useful in future applications of these semantics. However, large knowledge graphs are currently still very computationally intensive, so it may not be advisable to establish a single knowledge graph for the entire field of reaction and catalysis. Instead, tailoring specialized knowledge graphs to specific domains within catalysis, such as biocatalysis, makes more efficient use of computational resources. For instance, a biocatalysis-focused knowledge graph may omit the modeling of heterogeneous catalysts to conserve computing capacity. Some of the explicit GCAs could instead be introduced through a combination of semantic artifacts, for example by creating mergeable graphs via SPARQL queries or by inference via SHACL rules.

In the future, this approach could be used in connected databases such as Dataverse repositories to enhance the value of stored data through automated ontology-based classification. Querying the resulting knowledge graphs would also improve, because the GCAs implement relations in a structured way. To extend the presented approach further, LinkML [43] should be adopted for data uptake. It maps data directly to ontology concepts and thus streamlines the overall data workflow. Finally, this would enable automated classification of metadata in the field of catalysis and reaction engineering.