At the core of development is the predictable production of functional and differentiated tissues from early, less well-defined tissues. It would therefore be sensible if, when one person uses, for example, the term “E14.5 mouse left atrium” in his systems model of heart development, another person using the same term in her model can link to that of the first. The way that such linkage is done for proteins is to use an ID from a standard database (e.g., the protein ID from Uniprot, http://www.ebi.uniprot.org), and because proteins are all amino-acid strings and hence of the same rank, they can readily be stored in the tables of relational databases.
Anatomical tissue organization, in contrast, is hierarchical in nature: The vertebrate hindlimb, for instance, is partitioned into regions (thigh, knee, calf, foot), each of which has its own parts, and the concept of “hindlimb” would naturally be expected to include these subordinate parts, together with information about their relationship to the hindlimb and to one another. While it is straightforward to assign a unique ID to a given tissue at a given developmental age, the hierarchical organization of tissues poses organizational problems beyond those that arise in handling sequence data.
Such hierarchical information is most appropriately handled through ontologies. These are domains of knowledge formalized in a way that makes them computationally accessible. In practice, ontologies are built up by linking facts in a hierarchical way. Here, a fact is a triad of the general form <term><relationship><term>, and terms can have parents and children (e.g., the E14.5 left atrium is part of the E14.5 heart; the E14.5 heart is part of the E14.5 cardiovascular system, etc.). Although they are tedious to produce (even the simplest organ system has a great many tissues and a lot of organization), there are now part-of ontologies for the adult tissues of all the main model organisms and for the developmental anatomy of the mouse, zebrafish, and Drosophila (accessible from the Open Bio-Ontologies site, http://www.obo.sourceforge.net). Every term in these ontologies carries a standard ID of the form <abcd>:<ijkl>, where abcd gives a short letter code for the ontology (e.g., EMAP for mouse development) and ijkl gives the number for a specific tissue at a specific developmental age (e.g., EMAP:7917 is the ID for the E14.5 mouse left atrium, with EMAP standing for the Edinburgh Mouse Atlas Project, http://www.genex.hgu.mrc.ac.uk). It is these IDs that allow for interoperability because they represent defined concepts (or terms) that can be used anywhere, even as synonyms.
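The idea of facts as triads and of prefixed IDs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Fact` type, the `part_of` spelling of the relationship, and the helper names are hypothetical and are not taken from any ontology tool; only the ID EMAP:7917 and the atrium/heart/cardiovascular-system example come from the text above.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One ontology fact: <term><relationship><term> (hypothetical type)."""
    subject: str
    relationship: str
    obj: str


def parse_id(term_id):
    """Split an ID such as 'EMAP:7917' into its ontology code and number."""
    prefix, _, number = term_id.partition(":")
    if not prefix.isalpha() or not number.isdigit():
        raise ValueError(f"not a prefixed ontology ID: {term_id!r}")
    return prefix, int(number)


# The two example facts given in the text.
FACTS = [
    Fact("E14.5 left atrium", "part_of", "E14.5 heart"),
    Fact("E14.5 heart", "part_of", "E14.5 cardiovascular system"),
]


def parents(term, facts, relationship="part_of"):
    """Direct parents of a term under one relationship."""
    return [f.obj for f in facts
            if f.subject == term and f.relationship == relationship]


def ancestors(term, facts, relationship="part_of"):
    """All parents, grandparents, etc., found by walking up the hierarchy."""
    found = []
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents(current, facts, relationship):
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found
```

Calling `ancestors("E14.5 left atrium", FACTS)` walks from the atrium to the heart and then to the cardiovascular system, which is the chain of part-of facts described in the text.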
It is worth noting that such an anatomical ontology is more than just a list of parts, as it includes a great deal of knowledge about how these parts are organized into larger structures and these larger structures into organ systems (e.g., Fig. 1). Such an ontology may also include additional knowledge built on other relationships such as derives from (an ontology of developmental anatomy could well include lineage relationships) and type data (e.g., the femur is a bone). There is also no reason why a child should have only a single parent in the ontology: For example, it is as appropriate to describe the femur as <part of><the skeleton> as it is to describe it as <part of><the hindlimb>, and a rich ontology could well include both relationships (this multiparenting of terms means that the ontology is, in technical terms, a Directed Acyclic Graph, or DAG, rather than a simple tree). This is not the place to include a detailed discussion of how anatomical ontologies are built and used (the interested reader should consult Bard 2005), but it should be mentioned that the internal organization of an anatomy ontology is usually rather complex (the structure needs to be able to handle many relationships as well as definitions and links) and is best viewed in a GUI-based browser program such as OBO-Edit or COBrA (Figs. 1 and 2; Aitken et al. 2004; Harris et al. 2004) rather than read as a list on paper. There are several languages in current use for handling ontologies (the best known are OBO and OWL), and they can be translated into each other using the COBrA tool.
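A minimal sketch of multiparenting, assuming the ontology is held as a plain list of (child, relationship, parent) edges, is given below. The femur edges are the examples from the text; the function names and the `part_of`/`is_a` spellings are hypothetical, and the cycle check is a generic depth-first search rather than the method of any particular ontology editor.

```python
from collections import defaultdict

# The femur has two part-of parents and one is-a parent (examples from the text).
EDGES = [
    ("femur", "part_of", "hindlimb"),
    ("femur", "part_of", "skeleton"),
    ("femur", "is_a", "bone"),
]


def build_parent_index(edges):
    """Map each child term to its list of (relationship, parent) pairs."""
    index = defaultdict(list)
    for child, relationship, parent in edges:
        index[child].append((relationship, parent))
    return index


def parents_of(term, edges, relationship):
    """Parents of a term under one relationship, in alphabetical order."""
    index = build_parent_index(edges)
    return sorted(p for r, p in index.get(term, []) if r == relationship)


def is_acyclic(edges):
    """Return True if no term is its own ancestor (depth-first search)."""
    index = build_parent_index(edges)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = defaultdict(int)

    def visit(node):
        state[node] = GREY
        for _, parent in index.get(node, []):
            if state[parent] == GREY:
                return False  # walked back onto the current path: a cycle
            if state[parent] == WHITE and not visit(parent):
                return False
        state[node] = BLACK
        return True

    nodes = {c for c, _, _ in edges} | {p for _, _, p in edges}
    return all(visit(n) for n in nodes if state[n] == WHITE)
```

Here `parents_of("femur", EDGES, "part_of")` returns both the hindlimb and the skeleton, which a simple tree could not represent, while `is_acyclic` guards against an edge that would make a tissue part of one of its own parts.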
In the context of systems developmental biology, two general ontologies are useful in addition to the appropriate anatomy ontology. The first is the Cell-Type Ontology (Bard et al. 2005) and the second is the Gene Ontology (Ashburner et al. 2000; Harris et al. 2004). The former includes all the common and many of the uncommon cell types that are found across the phyla and, unlike the anatomy ontologies, is essentially species-independent and so facilitates cross-species analyses and comparisons. This ontology is structured to include our knowledge of the many properties of these cell types, and each cell type is separately coded under function, morphology, ploidy, development, etc., using two relationships, is-a and descends-from (see Fig. 2). This ontology is thus a terse summary of a great deal of knowledge about cell types and their properties.
The Gene Ontology, or GO, is by far the best known and most used of the standard bio-ontologies (it is used for protein annotation in Uniprot). Unlike Uniprot, it does not include sequence information but focuses on the properties of proteins and includes hierarchical knowledge about (1) cellular locations, (2) molecular functions, and (3) the functional processes in which they are involved. For systems developmental biology, it is the last of these that is the most important, and the process hierarchy includes a wide variety of developmental processes (although they are distributed across the ontology rather than integrated under a single heading [Fig. 3]), each of which, of course, has a unique ID. Of particular interest here is the database of proteins that is linked to the GO so that a user can easily identify all the stored proteins associated with a GO term, or the GO terms associated with a chosen protein (although it should be said that keeping this database up-to-date is a major task).
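The two-way lookup between terms and proteins can be illustrated with a single association table in a relational database. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table name, column names, and all IDs (`TERM:1`, `PROT:A`, etc.) are hypothetical placeholders and do not reflect the schema or contents of the actual GO database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table linking term IDs to protein IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotation ("
        " term_id TEXT NOT NULL,"
        " protein_id TEXT NOT NULL,"
        " PRIMARY KEY (term_id, protein_id))"
    )
    # Placeholder rows: one protein may carry several terms and vice versa.
    conn.executemany(
        "INSERT INTO annotation (term_id, protein_id) VALUES (?, ?)",
        [
            ("TERM:1", "PROT:A"),
            ("TERM:1", "PROT:B"),
            ("TERM:2", "PROT:A"),
        ],
    )
    return conn


def proteins_for_term(conn, term_id):
    """All stored proteins associated with one ontology term."""
    rows = conn.execute(
        "SELECT protein_id FROM annotation WHERE term_id = ? ORDER BY protein_id",
        (term_id,),
    )
    return [row[0] for row in rows]


def terms_for_protein(conn, protein_id):
    """All ontology terms associated with one protein."""
    rows = conn.execute(
        "SELECT term_id FROM annotation WHERE protein_id = ? ORDER BY term_id",
        (protein_id,),
    )
    return [row[0] for row in rows]
```

Because the link is held as rows of ID pairs, the same table answers both questions posed in the text: which proteins go with a term, and which terms go with a protein.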
One important feature of ontology terms is that they can be associated with data (usually held in a standard relational database and linked to the ontology via the appropriate IDs); examples include the proteins that satisfy the definition of a GO term (http://www.godatabase.org), the genes expressed in a particular mouse tissue at a particular time (http://www.informatics.jax), and the micrographs associated with a pathologic state (http://www.pathbase.net). Here, the hierarchical knowledge within the ontology comes into play: If, for example, a user requires the genes associated with the developing mouse forelimb at E12.5, the response comes from searching the ontology to identify the constituent tissues in the limb and using their IDs to collect all the associated data. This can be done because the part-of relationship has the property known as upwards propagation: If a term has data associated with it, then these data can also be associated with its parent (e.g., a gene expressed in the tarsus is also expressed in the hindlimb). Propagation is associated with some bio-ontology relationships (e.g., part of, is a) but not with others (e.g., develops from; one would not expect pigment cells to have the same properties as their neural-crest-cell precursors).
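Upwards propagation can be sketched as a query that first collects a term's descendants along the propagating relationships and then gathers the data attached to each of them. In the sketch below, the small hierarchy (tarsus, foot, hindlimb), the gene names `geneA` to `geneC`, and the function names are hypothetical illustrations; only the choice of which relationships propagate follows the text.

```python
# Relationships along which data propagate upwards, as stated in the text.
PROPAGATING = {"part_of", "is_a"}

# Illustrative (child, relationship, parent) edges.
EDGES = [
    ("tarsus", "part_of", "foot"),
    ("foot", "part_of", "hindlimb"),
    ("pigment cell", "develops_from", "neural crest cell"),
]

# Hypothetical data attached directly to individual terms.
ANNOTATIONS = {
    "tarsus": {"geneA"},
    "foot": {"geneB"},
    "pigment cell": {"geneC"},
}


def descendants(term, edges):
    """The term itself plus every term below it via propagating relationships."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, relationship, parent in edges:
            if (parent == current and relationship in PROPAGATING
                    and child not in found):
                found.add(child)
                frontier.append(child)
    return found


def data_for(term, edges, annotations):
    """All data associated with a term, including data propagated from its parts."""
    result = set()
    for t in descendants(term, edges):
        result |= annotations.get(t, set())
    return result
```

With these illustrative data, a query on the hindlimb returns the genes attached to the foot and the tarsus, whereas a query on the neural crest cell does not pick up the gene attached to the pigment cell, because develops from is not in the propagating set.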