I now turn to a description of how function is conceived of, and represented in practice, in the Gene Ontology.
2.1 Gene Products, Not Genes, Have Functions
In order to understand how gene function is represented in the GO, some basic molecular biology knowledge is required.
- A gene is a contiguous region of DNA that encodes instructions for how the cell can make a large (“macro”) molecule (or potentially multiple different macromolecules).
- A macromolecule is called a gene product (as it is produced deterministically according to the instructions from a gene), and can be of two types, a protein (the most common type) or a noncoding RNA.
- A gene product can act as a molecular machine; that is, it can perform a chemical action that we call an activity.
- Gene products from different genes can combine into a larger molecular machine, called a macromolecular complex.
Each concept in the Gene Ontology relates to the activity of a gene product or complex, as these are the entities that carry out cellular processes. A gene encodes a gene product, so it can obviously be considered the ultimate source of these activities and processes. But strictly speaking, a gene does not perform an activity itself. Thus, when the Gene Ontology refers to “gene function,” it is actually shorthand for “gene product function.”
2.2 Assertions About Functions of Particular Genes Are Made by “GO Annotations”
The Gene Ontology defines the “universe” of possible functions a gene might have, but it makes no claims about the function of any particular gene. Those claims are, instead, captured as “GO annotations.” A GO annotation is a statement about the function of a particular gene. But our biological knowledge is extremely incomplete. Accordingly, the GO annotation format is designed to capture partial, incomplete statements about gene function. A GO annotation typically associates only a single GO concept with a single gene. Together, these statements comprise a “snapshot” of current biological knowledge. Different pieces of knowledge regarding gene function may be established to different degrees, which is why each GO annotation always refers to the evidence upon which it is based.
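The shape of such a statement can be sketched as a small record type. This is only an illustrative sketch, not the GO Consortium's actual annotation format: the class name, field names, and the gene identifier GENE:0001 are hypothetical, and the GO terms and evidence codes are chosen purely as examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    """One partial statement about the function of one gene (hypothetical layout)."""
    gene_id: str    # identifier of the annotated gene; the format is an assumption
    go_id: str      # identifier of a single GO concept
    go_label: str   # human-readable name of that concept
    evidence: str   # what the statement is based on; must not be empty

    def __post_init__(self):
        # Every annotation must refer to the evidence it is based on.
        if not self.evidence.strip():
            raise ValueError("a GO annotation must cite its evidence")


# A tiny "snapshot" of knowledge: each entry links one concept to one gene.
# GENE:0001 is an invented identifier used only for illustration.
snapshot = [
    GOAnnotation("GENE:0001", "GO:0004672", "protein kinase activity", "IDA"),
    GOAnnotation("GENE:0001", "GO:0006468", "protein phosphorylation", "IEA"),
]


def annotations_for(gene_id, annotations):
    """Collect every statement currently recorded for one gene."""
    return [a for a in annotations if a.gene_id == gene_id]
```

Because each record carries only one concept, what is known about a gene is the collection of its records, and an empty result means only that nothing has been recorded, not that the gene has no function.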
2.3 The Model of Gene Function Underlying the GO
The Gene Ontology (GO) considers three distinct aspects of how gene functions can be described: molecular function, cellular component, and biological process (note that throughout this chapter, bold text will denote specific concepts, or classes, from the Gene Ontology). In order to understand what these aspects mean and how they relate to each other, it may be helpful to consider the biological model assumed in GO annotations. GO follows what could be called the “molecular biology paradigm,” as described in the previous section. In this representation, a gene encodes a gene product, and that gene product carries out a molecular-level process or activity (molecular function) in a specific location relative to the cell (cellular component), and this molecular process contributes to a larger biological objective (biological process) comprised of multiple molecular-level processes. An example, elaborating on the example in the original GO paper [9], is shown in Fig. 1.
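As a rough illustration of how the three aspects fit together, the model can be written as a record that reads back as a single sentence. The class, its field names, and the example gene product below are hypothetical and are not taken from Fig. 1 or from any GO data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneProductActivity:
    """One gene product described along the three GO aspects (hypothetical layout)."""
    gene_product: str
    molecular_function: str   # the molecular-level activity it carries out
    cellular_component: str   # where, relative to the cell, the activity occurs
    biological_process: str   # the larger objective the activity contributes to

    def describe(self):
        # Read the three aspects back as one statement about the gene product.
        return (
            f"{self.gene_product} carries out {self.molecular_function} "
            f"in the {self.cellular_component}, contributing to "
            f"{self.biological_process}"
        )


# Invented example; "kinase X" does not refer to any real gene product.
example = GeneProductActivity(
    gene_product="kinase X",
    molecular_function="protein kinase activity",
    cellular_component="cytoplasm",
    biological_process="canonical Wnt signaling pathway",
)
```

Calling example.describe() returns the sentence form of the model: who acts, what it does, where, and toward which larger objective.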
To reiterate, GO concepts were designed to apply specifically to the actions of gene products, i.e., macromolecular machines comprising proteins, RNAs, and stable complexes thereof. In the GO representation, a region of DNA (e.g., a regulatory region) is treated not as carrying out a molecular process, but rather as an object that gene products can act upon in order to perform their specific activities.
2.4 Molecular Functions Define Molecular Processes (Activities)
In the GO, a molecular function is a process that can be carried out by the action of a single macromolecular machine, via direct physical interactions with other molecular entities. Function in this sense denotes an action, or activity, that a gene product performs. These actions are described from the two distinct but related perspectives commonly employed by biologists: (1) biochemical activity, and (2) role as a component in a larger system/process. Biochemical activities include binding and catalytic activities, and are only functions in the broad sense, i.e., how something functions, the molecular mechanism of operation. Component role descriptions, on the other hand, refer to roles in larger processes, and are sometimes described by analogy to a mechanical or electrical system. For example, biologists may refer to a protein that functions (acts) as a receptor. This is because the activity is interpreted as receiving a signal, and converting that signal into another physicochemical form. Unlike biochemical activities, these roles require some degree of interpretation that includes knowledge of the larger system context in which the gene product acts.
2.5 Cellular Components Define Places Where Molecular Processes Occur
A cellular component is a location, relative to cellular compartments and structures, occupied by a macromolecular machine when it carries out a molecular function. There are two ways in which biologists describe locations of gene products: (1) relative to cellular structures (e.g., cytoplasmic side of plasma membrane) or compartments (e.g., mitochondrion), and (2) in terms of the stable macromolecular complexes of which they are parts (e.g., the ribosome). Unlike the other aspects of GO, cellular component concepts refer not to processes but rather to cellular anatomy. Nevertheless, they are designed to be applied to the actions of gene products and complexes: a GO annotation to a cellular component provides information about where a molecular process may occur during a larger process.
2.6 Biological Processes Define Biological Programs Comprised of Regulated Molecular Processes
In the GO, a biological process represents a specific objective that the organism is genetically “programmed” to achieve. A biological process is often described by its outcome or ending state, e.g., the biological process of cell division results in the creation of two daughter cells (a divided cell) from a single parent cell. A biological process is accomplished by a particular set of molecular processes carried out by specific gene products, often in a highly regulated manner and in a particular temporal sequence.
An annotation of a particular gene product to a GO biological process concept should therefore have a clear interpretation: the gene product carries out a molecular process that plays an integral role in that biological program. But a gene product can affect a biological objective even if it does not act strictly within the process, and in these cases a GO annotation aims to specify that relationship insofar as it is known. First, a gene product can control when and where the program is executed; that is, it might regulate the program. In this case, the gene product acts outside of the program, and controls (directly or indirectly) the activity of one or more gene products that act within the program. Second, the gene product might act in another, separate biological program that is required for the given program to occur. For instance, animal embryogenesis requires translation, though translation would not generally be considered to be part of the embryogenesis program. Thus, currently a given biological process annotation could have any of these three meanings (namely a gene activity could be part of, regulate, or be upstream of but still necessary for, a biological process). The GO Consortium is currently exploring ways to computationally represent these different meanings so they can be distinguished.
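The ambiguity described above can be made concrete with a short sketch. The enumeration and function names are hypothetical and do not reflect any representation adopted by the GO Consortium; they only show what it would mean for the three readings to be distinguishable.

```python
from enum import Enum
from typing import Optional


class ProcessRelation(Enum):
    """The three ways a gene product's activity can relate to a biological process."""
    PART_OF = "acts within the process"
    REGULATES = "controls when or where the process is executed"
    UPSTREAM_REQUIRED = "acts in a separate process that is required for it"


def possible_meanings(relation: Optional[ProcessRelation] = None):
    """Return the readings consistent with one biological process annotation.

    Passing None models an annotation that does not record the relation,
    so all three readings remain possible.
    """
    if relation is None:
        return set(ProcessRelation)
    return {relation}
```

With no relation recorded, possible_meanings() returns all three members; once a relation such as ProcessRelation.REGULATES is supplied, only that single reading remains.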
Biological process is the largest of the three ontology aspects in the GO, and also the most diverse. This reflects the multiplicity of levels of biological organization at which genetically encoded programs can be identified. Biological process concepts span the entire range of how biologists characterize biological systems. They range from generic enzymatic processes, e.g., protein phosphorylation, through molecular pathways such as glycolysis or the canonical Wnt signaling pathway, to complex programs like embryo development or learning, and even reproduction, the ultimate function of every evolutionarily retained gene.
Because of this diversity, in practice not all biological process classes actually represent coherent, regulated biological programs. In particular, GO biological process also includes molecular-level processes that cannot always be distinguished from molecular functions. Taking the previous example, the process class protein phosphorylation overlaps in meaning with the molecular activity class protein kinase activity, as protein kinase activity is the enzymatic activity by which protein phosphorylation occurs. The main difference is that while a molecular function annotation has a precise semantics (e.g., the gene carries out protein kinase activity), the biological process annotation does not (e.g., the gene either carries out, regulates, or is upstream of but necessary for a particular protein kinase activity).