1 Introduction

1.1 Caenorhabditis elegans as a model for systems neuroscience

A major goal of modern neuroscience is to explain the relationship between environmental inputs and complex behaviors in terms of the properties of their underlying neural systems. C. elegans has been a productive model for neuroscience due to its wide range of easily measured behaviors, genetic tractability, and highly stereotyped anatomy. The function of individual C. elegans neurons has been studied by a variety of methods, including selective neuron ablation, either with laser microbeam irradiation [8, 9, 18, 49] or genetically encoded cell killing [34, 65]. These physical studies, complemented by genetic screens resulting in mutant animals with distinct behavioral and neuronal phenotypes, have implicated specific neurons in behaviors [7] and identified genes and neurons required for responses to environmental or pharmacological inputs [72]. Technological advances, such as cell-specific application of optogenetic and chemical perturbations [36, 55] in combination with calcium imaging of individual neurons [22], have begun to outline the causal relationships between neurons, both locally and via long-range connections [60], while calcium imaging allows the effect of physical inputs on neural activity to be determined. Thus, causal relationships can be traced from inputs through neural circuits to behavior. In addition, traditional molecular genetic methods enable the biochemical basis of these causal relationships to be elucidated. Understanding molecular participants is particularly important for the functional description of extra-synaptic connections, because they cannot be described by anatomy or gene expression alone, yet they exert powerful effects on neuronal activity [6, 11, 51]. In combination, the physical and molecular data allow detailed description of C. elegans neural circuits underlying particular behaviors.

1.2 The GO–CAM framework can be used to represent causal relationships in biology

Given the volume of biological knowledge, a method to integrate diverse types of data into causal models of biological systems, expressed in a common, machine-readable language, is highly desirable. A promising method suitable for this application has been developed. The Gene Ontology (GO) Consortium has created a semantic modelling framework for annotating causal relationships between molecular activities in the context of functional gene annotation, known as GO–CAM (Gene Ontology Causal Activity Modelling) [70].

Semantic models (also known as knowledge graphs) are machine-readable representations of knowledge in a given field, in which the edges of the graph describe the logical relationships between entities that comprise a field of study. In GO–CAM, curated knowledge of gene functions annotated using the Gene Ontology and other biologically relevant ontologies are used to create activity flow models of biological systems (Fig. 1) [44]. In these graphs, the logical relationships are described via a formalism known as a semantic triple (subject–predicate–object). Footnote 1 These models can be thought of as compositions of assertions in the form of semantic triples. For instance, the assertion “[G-protein coupled receptor activity (GO: 0004930] has input [2-heptanone (CHEBI: 5672)]” is a semantic triple that could be included in a GO–CAM. The semantic triple format allows edges to connect many different kinds of entities, including anatomy terms and biological processes. For instance, “[glucose-6-phosphate isomerase activity (GO: 0004374)] part of [canonical glycolysis (GO: 0061621)] occurs in [cytosol (GO: 0005829)]” is a pair of semantic triples that connects a GO molecular function to both a higher level biological process and an anatomical compartment. The Gene Ontology itself follows a hierarchical structure described with semantic triples, e.g., “[G-protein coupled receptor activity (GO: 0004930)] is a [transmembrane signalling receptor (GO: 0004888)]” (here the relation ‘is a’ describes a child–parent relationship). This formalism allows different kinds of entities to be connected to one another in a machine-readable format, allowing combinatorial queries and other computational analyses.

Fig. 1
figure 1

Standard GO annotations and GO–CAMs. A Standard GO annotations link genes to GO Molecular Functions, GO Biological processes or GO Cellular Component terms. B Partial GO–CAM of canonical glycolysis. Gene Ontology–Causal Activity Models (GO–CAMs) arrange GO annotations into structured models of biological processes by causally linking GO Molecular Functions that make up a process. Edges represent relations which may connect nodes according to the GO–CAM data model

In GO–CAM, curated knowledge of gene functions annotated using the Gene Ontology and other biologically relevant ontologies are used to create knowledge graphs of biological systems (Fig. 1). This framework extends traditional gene function annotation by capturing the causal flow of molecular activities, e.g., protein kinase activity or ion channel activity, using causal relations from the Relations Ontology (RO) and representing these interactions in the context of the relevant biological process and anatomy [63] (Box 1). These causal networks allow more in-depth computational analyses of a system than a set of stand-alone associations between genes and ontology terms, and have the potential to bridge the gap between biochemical and anatomical networks. Here, we explored whether the causal GO–CAM framework can enable the representation of the causal relationships between environmental inputs, neural circuits and behavior at varying levels of detail.

1.3 CeN–CAM: GO–CAM representation of C. elegans neurobiological knowledge

As for standard GO annotations, assertions in a GO–CAM are supported by evidence statements, ideally experimental evidence from the published literature [3, 30, 69]. To adapt the GO–CAM framework for modelling neurobiological statements about C. elegans egg-laying and carbon dioxide (CO2)-sensing behaviors, we selected a subset of relevant papers from the C. elegans bibliography and identified author statements that could be used to support construction of semantically rigorous, causal models. For the egg-laying circuit, these statements largely involve interactions among interneurons, motor neurons, and the egg-laying apparatus, e.g., vulval muscles and epithelia. The CO2 avoidance circuit is focused on sensory neurons, their interaction with the environment, and subsequent effects on locomotory behavior.

2 Materials and methods

To model neurobiological processes, we began by collecting author statements from published references. To ensure that our findings were broadly applicable, we collected statements from the literature on two circuits, one centred on interneurons and motor neurons (egg-laying) and one centred on sensory neurons (CO2 avoidance). For the egg-laying circuit we compiled 20 papers, and for CO2 avoidance, 8 papers. We chose statements manually, according to a few criteria. To begin with, we chose statements that provided a clear interpretation and that we, therefore, expected to be straightforward to model with GO–CAM. Later, we selected statements describing phenomena (e.g., multi-sensory integration, neuromodulation) that were missing from the initial data set.

We defined an author statement as text describing: (i) either an experiment or hypotheses, (ii) an experimental observation or result, and (iii) a clear biological interpretation of the result. These typically comprised a paragraph. We then attempted to model the interpretation, along with supporting evidence using the Evidence and Conclusion Ontology (ECO) [30] wherever possible. We avoided modelling speculative suggestions that went beyond the supporting evidence.

For each author statement, we attempted to generate one or more simple assertions (i.e., semantic triples or subject–predicate–object) that accurately modelled the author statement using classes from biological ontologies (Table 1), including the GO [3, 69], the Chemicals of Biological Interest ontology (ChEBI) [35], the Environmental Conditions, Treatments and Exposures Ontology (ECTO) [19], and the C. elegans Cell and Anatomy Ontology (WBbt) [45]. In a semantic triple, these classes are connected by relations from the Relations Ontology [63] (Box 1).

Table 1 Biological ontologies used to generate CeN–CAM models

We collected author statements and their corresponding semantic triples into a dataframe, such that the triple representation can be read from left to right (unless otherwise specified). Additional file 2: Table S1, Additional file 3: Table S3 provide the full list of author statements that were modelled for the egg-laying (91 unique statements comprising 128 entries from 20 papers) and CO2-avoidance circuits (59 unique statements comprising 99 entries from 8 papers), respectively. Table 2 enumerates detailed categories of biological phenomena captured by this approach. We used this categorization process to determine whether existing ontologies contained a sufficiently rich set of classes and whether existing RO terms were adequate to describe the relations between classes. Where applicable, we generated definitions for required novel classes and their necessary parents (Table 3). We then created illustrations of several useful examples.

Table 2 Categories of neurobiological phenomena modelled with GO–CAM
Table 3 Definitions and classification for proposed new GO classes

In generating our empirical models, we sought as far as possible to ensure that all relations followed the conventions of the GO–CAM data model. Namely, two GO Molecular Functions can be linked by causal relations, whereas a GO Molecular Function (MF) and a GO Biological Process (BP) are linked by mereological relations (e.g., part of). In addition, two BPs can be linked by mereological relations when one BPs is part of another BP (i.e., a subprocess of the other). We also found it necessary, in some cases, to link distinct BPs using causal relations to accurately describe the complexity of the biology. For instance, one neuron activating another via optogenetics can be modelled by a membrane depolarization process causally upstream of another membrane depolarization process (e.g., Fig. 4A). We sought to include whichever MFs or BPs were implied by an author statement, even if the gene was missing, or the BP was not explicitly discussed, to denote missing information. We chose the most specific relation or GO term that we felt was justified in the circumstances. For instance, when modelling individual author statements, we used causally upstream of, but when modelling compilations of statements from separate papers, we were able to use the child term positively regulates. In generating our generic template models, we chose the highest level relations and GO terms that could reasonably represent a given statement category.

3 Results

3.1 CeN–CAM: GO–CAM provides a framework to model neurobiological statements

As a first step in converting information from the scientific literature to a causal model using the GO–CAM framework, we created semantic triples to represent author statements (Additional file 2: Table S1 and Additional file 3 Table S2). As an example, a statement by Banerjee et al. describing the results of an optogenetic experiment that activates membrane depolarization in uv1 neurons shows that the uv1 cells control the duration of egg deposition during egg-laying behavior. We created a semantic triples to represent this finding: [membrane depolarization (GO: 0051899)] occurs in [uv1 (WBbt: 0006791)] part of [negative regulation of egg deposition (GO: proposed)] part of [egg-laying behavior (GO: 0018991)] (Additional file 2: Table S1, local identifier EL12). In creating triples for 123 egg-laying and 98 CO2 avoidance author statements, we found that the set of relations used in the GO–CAM data model were sufficient to model all author statements in our data set. However, we required new classes in several other ontologies (the Gene Ontology (GO), the Evidence and Conclusion Ontology (ECO), and the Environmental Conditions, Treatments and Exposures Ontology (ECTO) [19]) to describe some statements in both data sets (25/123 statements in the egg-laying data set, and 84/99 in CO2 avoidance) (Table 3). These results show that author statements describing C. elegans neurobiology can be faithfully captured using the framework of the GO–CAM data model.

We also found it necessary to re-evaluate some existing definitions and classifications of biological processes under the GO class behavior (GO: 0007610). For example, since the primary term oviposition is a subclass of reproductive behavior (GO: 0019098) in GO and oviposition can be used to describe both the entire behavior of egg laying and to describe the actual deposition of an egg onto a substrate, we requested to switch the primary label of oviposition (GO: 0046662) with the GO synonym egg-laying behavior. We also requested a refinement of the definition of egg-laying behavior to ‘A reproductive behavior that results in the deposition of eggs (either fertilized or not) upon a surface or into a medium, such as water’. In addition, we created a new term egg deposition (GO: 0160027), defined as ‘The multicellular organismal reproductive process that results in the movement of an egg from within an organism into the external environment’. In this way, the mechanical process of egg deposition is clearly distinguished from egg-laying behavior, which includes its regulation by the nervous system. We requested new terms for the positive and negative regulation of egg deposition, defined as nervous system processes. In addition, we proposed definitions for new classes required to describe CO2 avoidance, including carbon dioxide avoidance behavior and its parent behavioral response to carbon dioxide (Table 3).

Many statements describe findings from genetic perturbations, implicating specific pathways, whereas others, such as cell ablation, leave open a variety of genetic mechanisms by which a phenotype is manifested. Here, we describe the use of different relations and processes to refine models according to the range of conclusions available in each case.

3.2 Statement category: linking neurons, cellular and molecular processes, and behaviors

Fully elucidating functional neural circuits requires an understanding of the cells (e.g., neurons and muscles) involved in the behavior, the molecular basis of the behavior (e.g., the relevant gene products and their activities), and the coordinated relationships among them to affect the behavior. As with all biological processes, however, the full understanding of a neural circuit and a behavior is produced from individual, granular observations that, together and over time, combine to complete the picture. Leading up to a complete understanding, we need to also have the ability to represent the current state of knowledge at the organismal, anatomical, cellular and molecular level. Thus, in our first category of statements, we aimed to capture atomized statements that link cells and genes to cellular and molecular level processes and those processes to a specific behavior.

A traditional experiment for linking neurons to behavior is to ablate a neuron of interest and observe behavioral effects, an experiment that gives us information at the cellular level [18]. When an ablation results in a behavioral change, it is interpreted that one or more processes (either in series or in parallel) occurring in that cell has a causal effect on the behavior (Table 4A). Since cell ablation disrupts unknown cellular processes, we chose to model this result using the high level GO biological process term cellular process (GO: 0009987), and the occurs in (BFO: 0000066) relation to contextualize the cellular process with respect to the ablated neuron. We then used the children of the broader causal relation causally upstream of or within (RO: 0002418) (or preferably a positive (RO: 0004047) or negative (RO: 0004046) effect child term) to tie the cellular process to a nervous system process (GO: 0050877). In the example shown in Fig. 2A (corresponding to the statement in Table 4A), this nervous system process corresponds to the Biological Process term positive regulation of egg deposition. We used the part of relation in cases, where more specific perturbations were made (e.g., neuronal activation or inhibition, genetic knockouts and rescues), allowing an assertion about the composition of the processes involved.

Table 4 Author statements collection A
Fig. 2
figure 2

CeN–CAM annotations link cells to behaviors. CeN–CAM annotations link cells to behaviors. Cells such as HSN are represented with unique identifiers in the C. elegans Gross Anatomy Ontology. A Cell ablation phenotypes can be modelled using the generic GO cellular process class to reflect the non-molecular nature of the experiment, and causally upstream of or within relations, allowing for the most inclusive description of the relationship between cellular process and nervous system process terms. Example drawn from Waggoner et al. [72] (Table 4A, this manuscript). B CeN–CAM model describing the role of the acetylcholine biosynthetic process in the VC neuron in the regulation of egg-laying [5] (Table 4B, this manuscript). Because the acetylcholine biosynthetic process can proceed independently of electrical activity, it is modeled as causally upstream of, positive effect, acetylcholine secretion, neurotransmission. C Optogenetic activation of uv1 leads to a decrease in egg-laying [4] (Table 4C, this manuscript). This membrane depolarization process is modelled as part of the negative regulation of egg deposition, a nervous system process

For an illustrative example of this distinction, it is useful to consider experiments from our collection that generated insights by deletion and cell-specific rescue of genes involved in neurotransmitter biosynthesis. We reasoned that since the biosynthesis can proceed even, while the neuron is at rest (i.e., independent of the induction of behavior), it should not be considered part of the asserted nervous system process, but causally upstream of, positive effect (RO: 0002304) (Fig. 2B, Table 4B) to a secretion process that is part of the nervous system process (on the assumption that this secretion depends on the depolarization of the neuron).

A more recent experimental technology for discerning the effect of neurons on behavior is optogenetic activation. In these experiments, a specific neuron is activated by opening the light-sensitive Channelrhodopsin ion channel, transgenically expressed in specific neurons of interest [33]. We modelled these results similar to cell ablation, except that in this case, we were able to say that the membrane depolarization that occurs in a specific cell is part of the nervous system process (in this case, the negative regulation of egg deposition) that regulates egg-laying behavior (Fig. 2C, Table 4C).

3.3 Statement category: inputs to neural activity and behavior

A second category of experiment provides insight into the molecular basis of behavior or neural activity induced by an environmental or internal stimulus. In this type of study, a behavior or neural activity that is typically induced by some environmental or experimental (i.e., pharmacological) condition is eliminated under the same conditions when a gene is inactivated. The gene activity is often tied to a cell via rescue of a behavioral mutant phenotype by cell-specific expression of the wild-type allele in the loss-of-function background.

In these cases, we can tie the rescue gene functions to cells (e.g., in Fig. 3A, [G protein-coupled serotonin receptor signaling pathway (GO: 0098664)] occurs in [VM (WBbt: 0006917)]), and to implied GO biological process terms via part of (e.g., in Fig. 3B [intracellular receptor signalling pathway (GO: 0030522)] part of [positive regulation of negative chemotaxis (GO: 0050924)]. In contrast to the case of cell ablation, where unknown cellular processes are disrupted, these more specific biological or cellular process terms can in turn be assigned as part of the nervous system process. Additional ontology terms and relations can be used to further specify processes or functions. For example, the Chemicals of Biological Interest ontology (ChEBI) contains neurotransmitter classes [e.g., serotonin (CHEBI: 28,790)], as well as environmental chemicals (e.g., carbon dioxide (CHEBI: 16,526) which may be linked to GO receptor activities or other GO molecular functions via has small molecule activator (RO: 0012001) (Fig. 3A, Table 4D, Fig. 3B, Table 4E).

Fig. 3
figure 3

CeN–CAM models of inputs to neurons and behavior. Cell-specific genetic rescue of a behavioral response to pharmacological treatment or environmental stimuli produces models linking genes, GO molecular functions, GO biological processes, and cells in the C. elegans Gross Anatomy ontology. A Carnell et al. [15] (Table 4D, this manuscript) found that VM-specific expression of ser-1 could rescue serotonin-dependent egg-laying behavior, suggesting that ser-1 is required in VM neurons to induce egg-laying in response to serotonin. The G-protein coupled serotonin receptor activity is part of the positive regulation of egg deposition, because the part of relation is transitive (i.e., there is no need for an additional part of relation connecting these nodes). B [16] (Table 4E, this manuscript) found that npr-1 expression in neuronal subsets that include URX is sufficient to rescue the behavioral response to carbon dioxide. The activity of some CO2 receptor is implied, leading to the addition of a placeholder term without an enabling gene, indicating an important piece of missing information. This activity can be included in the CO2 sensing circuit by asserting that it is part of the positive regulation of chemotaxis, along with the npr-1-dependent signaling pathway. C AFD neuron responds to CO2 removal [13] (Table 4F, this manuscript). Currently, there are no terms within appropriate ontologies to describe temporal features of chemical or physical inputs (e.g., ‘decreasing’). The required definitions are suggested in this paper

In some cases, the response to a stimulus is measured in a neuron without knowledge of the receptor molecule. For instance, AFD neurons respond to removal of CO2, but the experiment does not identify the receptor molecule [13] (Fig. 3C) (Table 4F). Because the receptor molecule is unknown, a rescue experiment cannot localise the receptor activity to a cell, meaning that the response may depend on receptor activity in another neuron. This is indicated by the absence of a relationship between the receptor activity and a neuron (similarly, a gene knockout experiment that disrupts neural activity without cell-specific rescue would tie only the membrane depolarization GO term to the neuron). These examples also demonstrate the use of a nervous system process term as an intermediate between the cellular process terms and the behavior terms. For instance, in our model of the role of npr-1 in the carbon dioxide sensing circuit, a CO2 receptor activity is implied, but not tied to a gene or cell (Fig. 3B). However, the nervous system process term (positive regulation of negative chemotaxis (GO: 0050924) provides a natural point of integration by which the receptor activity (and by implication, the cell in which it acts) can be included as part of the same neural circuit. A representative GO–CAM model can be found here.Footnote 2

3.4 Statement category: neuron-to-neuron functional connectivity

An additional type of information necessary for fully modelling neural circuits and behaviors is the functional link between neurons. We were able to model statements describing functional connectivity between neurons. For example, an optogenetic experiment in which one neuron is depolarized by a light stimulus and electrical currents are recorded in another neuron may show how a membrane depolarization process occurring in the upstream neuron results in a subsequent membrane depolarization process in the downstream neuron. To capture this relationship, we can connect two membrane depolarization (GO: 0051899) processes to one another with the causally upstream of, positive effect relation (Fig. 4A, Table 4G).

Fig. 4
figure 4

CeN–CAM models of neuron-to-neuron functional connectivity. A Optogenetic activation of HSN neuron causes membrane depolarization in HSN [25] (Table 4G, this manuscript). B Inhibiting neuron–neuron synaptic transmission in VC causes increased activity (membrane depolarization) in HSN, suggesting inhibition of HSN dependent on synaptic transmission from VC [42] (Table 4H, this manuscript). C Model of the mechanisms involved in RIA activation by BAG, based on data from Choi et al. [21] (Table 4J–K). Blue boxes represent a GO cellular component term. D Alternative, more basic model of the same statement modelled in Fig. 3C (Table 4F). Figure 3C represents the preferable method

GO also contains classes sufficient to indicate that the transmission occurs through a synapse, when this is explicitly tested by authors. For instance, Kopchock et al. [42] showed a synapse-dependent inhibitory connection between HSN and VC, using tetanus toxin to perturb synaptic transmission. This could be modelled using the GO term chemical synaptic transmission (GO: 0007268) or one of its children, and the causally upstream of, negative effect (RO: 0002305) relation to describe the inhibition (Fig. 4B, Table 4H). A similar representation would be appropriate for an experiment describing increase or loss of activity from a recorded neuron in mutants defective for synaptic transmission via mutation of unc-13 (encodes Munc13), which is required for synaptic vessel exocytosis [58]. In contrast, mutation of unc-31 (encodes CAPS), which disrupts dense-core vesicle exocytosis, is required for extra-synaptic transmission [64] Footnote 3GO does not have an explicit term for extra-synaptic signaling, or neuropeptide ligand activity. We include an example representation for an extra-synaptic peptidergic connection between two neurons (Additional file 1: Figure S2C), and provide a definition for the required new GO classes (Table 3). Finally, we include an example that illustrates how CeN–CAM models can represent sub-cellular phenomena involved in neuron-to-neuron functional connectivity in molecular detail (Fig. 4C) (Table 4J–K). This model compiles findings from Choi et al. [21], who use the connection between RIA and BAG neurons to investigate mechanisms by which neurotransmitters are loaded into synaptic vesicles.

3.5 Generic data models for statement categories

In modeling author statements, we found it possible to construct models with varying levels of detail, e.g., cell types, gene products, etc. For instance, Fig. 4D represents a ‘minimal model’ of the same statement described in Fig. 3B, representing the rescue of CO2 avoidance by expression of the npr-1 gene in URX. We sought to provide a set of standards for the ideal model of a given category of experimental finding. In our view, a satisfying model will have a structure that corresponds to the conceptual framework of the field (here, the causal flow from inputs to circuits to behavior), and will explicitly illustrate missing knowledge. By modelling the biology that results from different categories of experimental studies, we were able to produce such generic data models for every category (Additional file 1: Figures S1, S2). In these models, the availability of GO terms and RO relations is constrained by parentage, i.e., only the generic term in the model or one of its children should be used. Importantly, the models are intended to be flexible, i.e., editable using the Noctua GO–CAM modelling software [70]. In particular, high-level cellular process and nervous system process terms can be attached to as many GO molecular functions and genes as required to represent the biology. These generic models could accommodate results from both the egg-laying and CO2 circuits, suggesting that they may be more broadly applicable to C. elegans neurobiology. These models can serve as useful starting points for researchers or biocurators to generate representations of the experimental results, with minimal prerequisite knowledge of the underlying data model.

3.6 GO–CAM can model neural circuits

Systems neuroscience seeks to understand the causal relationships between neural circuits, the behaviors they control, and the inputs that stimulate these circuits, in molecular detail. Having established that a wide variety of author statements describing neurobiological knowledge can be represented in semantic triples, and describing the required GO classes, we generated a model that captures some of the causal relationships within a single circuit. This graph represents interactions between four of the cells that influence egg-laying behavior, from a limited subset of statements in our collection (Fig. 5).

Fig. 5
figure 5

CeN–CAM representation of several cells in the egg-laying circuit and their interactions. Drawn from several statements in Additional file 2: Table S1 [1, 15, 25, 28, 42]. Compared with models of individual author statements, more specific relations can be used here, given the biological context (for instance positively regulates, rather than causally upstream of, positive effect)

Though this diagram does not contain all cells, or all known connections that contribute to egg-laying, it illustrates several useful features of using the GO–CAM framework to model this biology. For instance, the influence of AWC in the circuit is connected to the rescue of HSN inhibition through AWC-specific expression of tax-4 [28]. This presumably involves chemical output from AWC that depends on its electrical activity; however, the author statement does not assert this specifically. Similarly, the serotonin synthesized in HSN is likely to be causally involved in the activation of VC, via activity-dependent release into the synapse connecting these two neurons, but this has not been demonstrated directly—only that exogenous serotonin can substitute for the absence of HSN, where there is evidence for tph-1-dependent serotonin biosynthesis [74]. Finally, we used two nodes to represent serotonin, because it allows the possibility that the HSN-VC serotonergic connection may be synaptic, while the HSN-VM connection is extra-synaptic. Thus, CeN–CAM models can represent causal flow within anatomical networks in molecular detail, at the level of what is known, supported by statements in the published literature, and as a result, also indicate what knowledge is missing.

In addition, we show that it is possible to use more informative relations in the context of a model that integrates various findings from the egg-laying literature, compared to those used to model individual author statements. In the case of representing author statements, our models were restricted to the use of information contained in those statements. Here, in the larger CeN–CAM model, we are able to use relations that reflect an overall interpretation of the biology, such as positively regulates (RO: 0002213) (a child of causally upstream of, positive effect) to describe interactions between processes in different neurons.

3.7 GO–CAM can model simple circuit phenomena

Many studies of neural circuits investigate the mechanistic basis for information processing capabilities in the brain, such as the integration of inputs from multiple sensory modalities, and changes in behavior that depend on memory of past experience. We extended our modelling efforts to represent some of these findings, primarily from our CO2 avoidance behavior data set.

3.7.1 Context-dependence and multisensory integration

An important function of nervous systems in any organism is the ability to execute behavioral responses in a context-dependent manner. This requires integrating multiple kinds of environmental information, ‘computing’ on that information and eliciting an appropriate response. This integration may commonly be performed either by individual neurons responsive to multiple inputs, or by small circuits of three or more neurons, e.g., single interneurons that integrate input from multiple sensory neurons [29]. Capturing this type of integration requires relations that imply the necessity of multiple conditions toward a single response, sometimes referred to as AND logic.

We found a relevant example in our CO2 avoidance data set. In one study, tax-2-dependent rescue of CO2 avoidance was found to depend on the presence of food [13] (Table 5A). We considered whether any of the GO–CAM relations can be interpreted as conveying necessity, in particular the relation part of. When considering processes, such as those represented in a model, if one process is part of another process, then the latter process necessarily has the former process as a part (or subprocess), meaning that in these contexts part of and has part (BFO: 0000051) are inverse relations [63]. Figures 6A–C shows how the necessity for AND logic might be modelled. The cell-specific rescue of CO2 avoidance via tax-2 expression in BAG neurons, along with the inferred CO2 receptor activity, constitutes one ‘branch’ of the model. A second ‘branch’ represents the involvement of food, via an inferred signal transduction (GO: 000716) process. These two branches converge on a proposed GO term signal integration process via part of relations, capturing their joint necessity. We chose to include a new GO biological process for their integration (rather than having them converge on positive regulation of CO2 avoidance) to represent that the mechanism enabling the AND logic should be asserted. This representation leaves open many possible biological models for the mechanism by which the asserted integration might occur (for example, one in which food and CO2 are sensed by distinct sensory neurons, and integrated in a third interneuron) while capturing AND logic.

Table 5 Author statements collection B
Fig. 6
figure 6

Modeling signal integration with the part of relation. Bretscher et al. [13] (Table 5A, this manuscript) found that restoring tax-2 expression to BAG neurons rescued CO2 avoidance on food, but not off food, suggesting that tax-2-dependent avoidance behavior requires food input. This implies an AND-gated interaction to integrate food and CO2 signals. A One of the relations in GO–CAM, causally upstream of does not capture the necessity of each input, whereas B part of does imply necessity, as required to capture the AND logic involved in sensory integration. C GO–CAM representation of the author statement listed in Table 2A (this manuscript). D This relation may also be useful for modelling co-ordination of behaviors. Kopchock et al. [42] (Table 5B, this manuscript) found that optogenetic activation of the vulval muscles was insufficient to induce an egg-laying event; instead co-ordination of VM activation with a particular phase in the body bend during locomotion was required

3.7.2 Co-ordination of neural activity and physical features of behavior

Our egg-laying data set contains statements describing a mechanical feature of egg-laying regulation, namely, neural activity in the vulval muscles is co-coordinated with the phase of body bending during locomotion [42] (Table 5B). We suggest that part of and a new GO Biological Process class behavior co-ordination process could be used to model this (Fig. 6D). This representation captures the author’s interpretation that neural activity in VM drives oviposition conditional on features of locomotory behavior.

3.7.3 Neuromodulation

An important goal of neural modeling is to capture neuromodulatory effects, which may be defined as changes in neuronal excitability or dynamics, due to changes in internal state or external context [6]. We found a small number of entries in our egg-laying data set that described changes in membrane excitability (e.g., Table 5C, D). We chose to model these with the GO term regulation of resting membrane potential (GO: 0060075), with the view that changing the ability of the cell to maintain its resting potential is the primary mechanism for regulating neuronal excitability. However, it may be more appropriate to use a parent GO class that can model changes in excitability, rather than implying any mechanism. For instance, one might imagine induced changes in receptor expression that could alter excitability or responsiveness, without changing the resting membrane potential [61]. The term regulation of membrane depolarization (GO: 0003245) and its children may be more appropriate when the mechanism is not known.

3.8 Extending existing ontology classes for modelling neurobiology

We found that the use of existing ontologies provided the correct classes for building our models of neural circuitry. However, in some cases we found that additional classes would be useful for a complete and accurate description of the type of biology we are modeling. These proposed additional classes would be added to GO, the Evidence and Conclusion Ontology (ECO) and the Environmental Conditions, Treatments and Exposures Ontology (ECTO) and are listed in Table 3. Evidence supported by four types of experiments, chemical inhibition of neurons via histamine chloride [55], inhibition of synaptic transmission [67], mechanical perturbation, and long-term exposure experiments require additional classes in ECO. The categories below describe the biological phenomena that require new GO Biological Process terms, GO Molecular Function terms and ECTO terms to model. Inclusion of these new classes would enrich the kinds of queries that could be supported by CeN–CAM (for instance, we may want a list of all interneurons whose activity is known to be modulated by peptidergic output from ASI neuron). Particularly useful would be the addition of the previously mentioned requirement for GO terms describing extra-synaptic neuropeptide signaling and neuropeptide activity. OBO ontologies are carefully managed, and ontology developers provide processes for the addition of new classes. For instance, we were able to add a GO term for carbon dioxide receptor activity (GO: 0170015) via the GO GitHub repository by providing the necessary information for its incorporation into the ontology (see https://github.com/geneontology/go-ontology/issues/24994). We discuss other proposed classes below.

3.8.1 Fine temporal dynamics of neural activity and behavior

Many statements described neural activity in fine temporal detail. Experimental treatments are sometimes reported to result in changes to either magnitude, duration and/or frequency of membrane depolarization or hyperpolarization (e.g., Table 5E). In some cases, these phenotypes lead authors to the interpretation that these parameters of a neuron’s behavior are under selection in wild-type organisms, and required to perform the given behavioral task (for instance, changes in the frequency of calcium transients in neurons of the egg-laying circuit are thought to reflect shifts from ‘active’ to ‘inactive’ states of the circuit, reflecting phases of the behavior [25]). However, the GO class for membrane depolarization (GO: 0051899) does not distinguish these variations, and related terms such as positive regulation of membrane depolarization (GO: 1904181) explicitly groups these phenomena together under one term. In the future, it may be useful to have these classes separated into explicit categories for a more comprehensive and informative view of how neural activity is regulated.

Likewise, many assays of egg-laying behavior document its temporal features, dividing it into active and inactive phases, and measuring the effect of various perturbations on their duration and frequency (e.g., Table 4D). In the CO2 avoidance literature, a small number of entries described fine details in motor output as a result of neuronal perturbations, such as changes in rates of reversal or frequency of omega turns [13] (Table 5F). We were unable to model these features due to a lack of sufficiently fine-grained GO terms in the Biological Process ontology. However, we note that WormBase has a phenotype ontology to describe behavior in many of the appropriate ways (for example turning frequency increased [WBPhenotype: 0002313)] (Schindelman et al. 2011). Since these are mutant phenotypes and not Biological Processes, these ontology classes are a poor fit for CeN–CAM. Conversion of these phenotype classes into meaningful GO Biological Processes would be helpful to create more fine-grained models of behavior.

3.8.2 Temporal features of environmental input

In modelling environmental inputs, we found it necessary to model several temporal features. For instance, PATO lacked terms required to model changes in input concentration or intensity over time, as required to model the OFF response to CO2 in ADF neurons (Fig. 5B). We found that terms in the Environmental Conditions, Treatments and Exposures Ontology (ECTO) came closer to these requirements [e.g., exposure to decreased methane (ECTO: 4000005)], but a specific exposure term for many chemicals, such as carbon dioxide, does not exist. We propose and define new classes specifying temporal properties that could be hosted in ECTO (Fig. 7) (Table 3).

Fig. 7
figure 7

Proposed classes for addition to the environmental conditions, treatments and exposures ontology (ECTO) to represent temporal features of environmental inputs. The new classes increasing amount and decreasing amount can be used in combination with the relation has quality (RO: 0000086)

4 Discussion

Given the size, scope and rapid growth of the biological literature, new methods are required to integrate, represent and interpret accumulating knowledge at varying levels of detail. One method for achieving this is to integrate objects from relevant ontologies in semantic graphs [38, 70]. In this work, we demonstrate the applicability of a gene ontology-based semantic modelling framework, GO–CAM, for representing knowledge of neural circuits in C. elegans. By capturing author statements in select papers, we were able to construct simple semantic statements and then link those statements together to begin building causal models of two C. elegans behaviors, egg-laying and carbon dioxide avoidance. We found that the existing Relations Ontology (RO) relations used in GO–CAMs are adequate, but new classes are required in several ontologies, including the Gene Ontology (GO), the Evidence and Conclusions Ontology (ECO) and the Experimental Conditions, Treatments and Exposures Ontology (ECTO) to fully represent the statements in our collection.

In general, the GO contains a rich vocabulary for neurobiology, in part due to projects, such as SynGO [41], which expanded GO’s representation of synaptic function, and deposited corresponding annotations in the GO repository as GO–CAM models. In addition, the Reactome knowledgebase contains pathways for synaptic transmission, and these have been converted to GO–CAMs [31]. To complement the synaptic transmission part of the ontology, new terms will be required to describe features of extra-synaptic (i.e., peptidergic) connectivity. We also anticipate a more widespread need to model temporal details of sensory neuron input, since chemotactic behaviors typically involve sensing of spatial gradients, experienced by sensory neurons as change over time, resulting in movement toward or away from the odour source. For instance, the sensory neuron AWA adapts to a given concentration of diacetyl, requiring increasing concentration for continued depolarization and associated positive chemotaxis [43]. Adding these temporal details to the inputs of individual neurons would allow for more expressive representations. In addition, many of the GO Biological Process terms that we propose as additions to the Gene Ontology are the result of describing the processes at the level of an organism or cell and are not derived from attempts to annotate gene function. Such temporal details are often derived from phenotypic measurements resulting from non-genetic perturbation (e.g., cell ablation, pharmacological inputs, in anticipation of the involvement of gene activities in the programmed regulation of these processes. In practice, new GO BP terms based on these observations will likely need genetic evidence before they can be included in the GO; however, we include them here as suggestions, which may guide future proposals as the need arises.

In addition, the models presented here go beyond the minimal requirements for the conversion of author statements into semantic triple format. According to our criteria, a satisfying model should reflect the conceptual framework of the field (in this case, representing causal flow from inputs through circuits to behavior). In this way, the models indicate which knowledge is missing. For instance, Fig. 3B depicts the role of npr-1 in the URX neuron in carbon dioxide avoidance behavior. By including a nervous system process term indicating the involvement of neural circuit, it is possible to indicate that a carbon dioxide receptor, whose encoding gene and cellular site of action require identification, are part of the circuit. The data modelling work presented here also provided us with an empirical basis for creating generic models or templates for each of the statement categories described above (Additional file 1: Figs. S1, S2). In constructing these generic models, we followed structures that reflected the relevant conceptual framework into which particular classes of experimental results should fit. For instance, the full description of a peptidergic connection between neurons should involve the relevant ligand(s), receptor(s), ion channel(s) and encoding genes (Additional file 1: Fig. S2C). Including the overarching biological process term neuron-to-neuron signaling by neuropeptide allows a database to be indexed for these types of connections. In this way, scientists and biocurators can collaborate to generate models with a common understanding of their proper criteria.

We also tried to capture simple ‘computations’ important for nervous system function, and arrived at some modelling principles that are noteworthy. First, when representing the AND logic involved in multisensory integration, it is important to use relations that convey necessity, and have separate causal flows that converge on a single biological process. We note that the proposed GO Biological Process terms (signal integration process and behavior coordination process) describe an information processing event that could in theory be carried out via any molecular mechanism that satisfies the task. Representing similar kinds of neurobiological knowledge in the GO may require further understanding of the types of molecular mechanisms that typically underlie this type of nervous system process [29].

In this study, we focused on modelling interactions within neural circuits, and their relationship to broad features of behavior, rather than the detailed mechanics of motor programs that they control. In principle, it is possible to link neural activities to the mechanical outputs of neural activity, where both are considered part of the organismal behavior under study. In the case of egg-laying, this motor output is simple, involving only the contraction of the vulval muscles. However, CO2 avoidance involves a complex series of locomotory processes, each of which is regulated by specific patterns of neural activity (for example, see [13]. As discussed above, inference of new biological process terms by conversion of the appropriate terms from the C. elegans Phenotype Ontology will allow modelling of these features of behavior. These motor outputs could then be modelled as part of the organismal behavior carbon dioxide avoidance behavior (i.e., they are the targets of the regulation of chemotaxis term in the models diagrammed here).

One limitation not previously discussed is that GO–CAM currently has no way of incorporating negative data. In some cases, this prevented documentation of important discoveries from our literature search. For instance, [62] found that in the absence of VC neurons and HSN neurons, spontaneous Ca2+ transients continued in the vulval muscles, suggesting that these neurons are not necessary for VM activity (Table 5G). These are arguably important omissions from these knowledge graphs.

With these adjustments, this work demonstrates the possibility of creating a machine-readable knowledge base for neurobiology that can return information based on queries. An important part of this resource will be to generate a representation of the C. elegans brain that is computable, since the current anatomy ontology does not contain synaptic or gap junction connections between neurons [45]. Incorporating connectome data that contains the appropriate neuron to neuron relations and property chain algebra [i.e., (Neuron A synapses to Neuron B) and (Neuron B synapses to Neuron C) implies that (Neuron A connects with Neuron C)] will allow queries that include or depend on synaptic connectivity information.

The application and widespread use of this technology depends on the amount of information incorporated into the knowledgebase, much of which at this point is directly dependent on manual input by curators. Given our definition of an author statement as a passage of text following a stereotyped form (hypothesis, observation, interpretation), it is possible to envision how author statements could be identified automatically. We envision a scenario in which machine intelligence could be applied to identify not only author statements, but identify the category of experiment they describe, and the GO terms that correspond to words within them. Using the generic data models described here as templates could help to ensure that machine-generated models are constrained by a desirable structure. With these capabilities, a large volume of the C. elegans neural circuit literature could potentially be converted into CeN–CAM models computationally. The author statements that we collected as part of this work will serve as training data to pursue this type of approach.

It is also important for biologists to have usable and intuitive ways of interacting with and analysing synthesized knowledge. One way to achieve this is by representing compiled neurobiological data in an anatomical context. For instance, the Virtual Fly Brain project has used an ontology-based approach to integrate connectivity and single-cell gene expression data, which can be visualised in a three-dimensional visualization of the brain, using a semantic integration framework [26, 52]. This allows users to run queries to explore gene expression and phenotype data in an anatomic context. We are exploring the possibility of functionally annotating the C. elegans connectome in molecular detail using CeN–CAM (Fig. 8). The relevant data are the same as those captured by the statement categories for which we have generated templates, namely, causal relationships between inputs to neurons, neurons to behavior, and causal connection between neurons. In addition to populating template data models, the GO terms in the relevant author statements could be used to populate a dataframe of the kind used by visualization software, such as Cytoscape [59] (Additional file 1: Figure S3), ideally in an automated manner. This visualization could serve as an intuitive entry point for exploring neural circuit function on a connectome scale, where evidence behind individual elements of the graph could be accessed by linking to the corresponding CeN–CAM models. An anatomical visualization that includes functional and connectivity data would allow predictions to be made about functional relationships between different circuits. For instance, CO2 has been shown to inhibit egg-laying [28] in an AWC-dependent manner. Representing neurons that respond to CO2 along with neurons that control egg-laying in a connectome context (Fig. 8) suggests that ASH is a CO2 responsive neuron synaptically linked to HSN. Indeed, ASH was later shown to inhibit both egg-laying and HSN activity [73]. Functional connectome annotation would also enable various kinds of system-wide analysis of the C. elegans brain—a research avenue that has so far been pursued in the absence of functional information [1, 37, 57]. We also envision the ability to make useful predictions using the underlying semantic models. For instance, the graphs may include causal links between molecular functions and behaviors that result from synthesis of disparate literature, leading to new predictions about how genetic or pharmacological perturbations may affect behavior. Thus, the work described here provides semantically and biologically rigorous foundations for an integrated systems neuroscience resource combining knowledge representation, connectome annotation and associated computational analyses of C. elegans nervous system function.

Fig. 8
figure 8

Functional annotation of the C. elegans connectome allows visualization of causal relationships within and among neural circuits. Black or grey arrows indicate synaptic connections from electron microscopy of serial sections. Coloured solid arrows indicate activating (green) or inhibiting (blue) synaptic connections. Coloured dotted arrows indicate activating or inhibiting indirect or extra-synaptic connections. Bold outlines on neurons indicate the sign of the neuron on the specified behavior. Fill colour on neurons indicates the effect of the specified input on neural activity. A Synaptic connectivity for subset of neurons involved in both CO2 avoidance behavior and egg-laying behavior. B Neurons activated or inhibited by CO2 are indicated by fill color, neurons contributing to CO2 avoidance behavior indicated by outline color. C Neurons activated or inhibited by serotonin are indicated by fill color, neurons contributing to egg-laying behavior indicated by outline color