Assembly of Logic-Based Diagrams of Biological Pathways

  • Tom C. Freeman
Chapter
Part of the Systems Biology book series (SYSTBIOL)

Abstract

The networks of molecular interactions that underpin cellular function are highly complex and dynamic. The topology, behaviour and logic of these systems, even on a relatively small scale, are far too complicated to understand intuitively. Furthermore, enormous amounts of systems-level data pertaining to the nature of genes and proteins, and their potential cellular interactions, have now been generated, but we struggle to interpret these data. There is therefore general agreement amongst biologists about the need for good pathway diagrams. However, the challenge of creating models that reflect our current understanding of these systems and displaying this information in an intuitive and logical manner is not trivial. The modified Edinburgh pathway notation (mEPN) scheme is founded on a notation system originally devised a number of years ago, but through use has now been refined extensively. This has been primarily driven by the author’s attempts to produce process diagrams for a diverse range of biological pathways, particularly with respect to immune signalling in mammals. Whilst requiring a considerable effort, the assembly of pathway models provides a resource for training, literature/data interpretation, computational pathway modelling and hypothesis generation. Here I discuss the mEPN scheme, its symbols and rules for its use and thereby hope to provide a coherent guide to those planning to construct pathway diagrams of their biological systems of interest.

Keywords

Pathway modelling; Notation scheme; Process diagram; Graphical representation

6.1 Introduction

Complete genome sequencing of hundreds of pathogenic and model organisms over the last decade has provided us with the parts list of life (Janssen et al. 2003). At the same time, enormous amounts of data pertaining to the nature of genes and proteins and their potential cellular interactions have been generated using new analytical platforms including, but not limited to, gene coexpression analysis, yeast two-hybrid assays, mass spectrometry and RNA interference (Reed et al. 2006). With the advent of next-generation sequencing technologies and advances in other fields, this deluge of data on biological systems looks set to continue and increase. Whilst the data from these ‘omics’ platforms can be overwhelming, these analyses finally allow us to open a window on to the complex cellular and molecular networks that underpin life (Kitano 2002; Nurse 2003). The main problem we now face is how to interpret all these data and use them to better understand the structure and function of biological pathways in health and disease (Cassman 2005).

Our existing knowledge of biological pathways and systems is still largely based on the painstaking efforts of countless investigators whose work has been, and continues to be, focused on a specific cell type and the function of one or a small number of proteins within that cell. Their studies have produced our current framework of understanding of how proteins and genes interact with each other to form the metabolic, signalling and effector systems that together regulate biological form and function. Much of this work, however, remains locked inside the literature, where specific insights into the functional role of cellular components are subject to the semantic irregularities that come with their description by different authors. As a result, the details of a given pathway have traditionally been known only to a few experts in the field whose research is often focused on a single protein and its immediate interaction partners within that pathway. These pathways are understood more generally through their description in reviews and diagrams produced on an ad hoc basis.

To a certain degree the concept of a biological pathway is an artificial construct and in reality there is only one big integrated network of molecular interactions operating within a cell. However, it is still useful to think in terms of pathways as being connected modules of this network. As such, a pathway may be considered to consist of a specific biological input or event that initiates a series of directional interactions between the components of a system leading to an appropriate shift in cellular activity. In other words a biological pathway might be viewed as starting from the engagement of a ligand with its receptor to all the downstream consequences of that interaction. This is not to say that the cellular components utilised for such a pathway will be necessarily unique to it, only that they are connected in this context. As we begin to appreciate the complexity of these molecular networks, their topology and interconnectivity, there is increasing interest in moving away from the traditional gene-centric view of life to a systems- or pathway-level appreciation of biological function. To do this we need to create models of these pathways.

Pathway diagrams act as a visual representation of known networks of interaction between cellular components, and modelling these networks is fundamental to understanding them. At their best, formalised diagrams of biological pathways present the known interactions between cellular components clearly and concisely. However, the task of assimilating the large amounts of available data on a particular pathway and representing this information in an intuitive manner remains an ongoing challenge. Indeed, there are numerous ways to represent a pathway, and pathway diagrams are currently available in a plethora of different forms. Using the term in the broadest sense, they can be a picture that accompanies a review article, wall charts distributed by journals and companies, small schematic diagrams used to support mathematical modelling efforts or network graphs reflecting all known protein interactions based on the results of large-scale interaction studies or literature mining. As such, pathway models are an invaluable resource for interpreting the results of genomics studies (Antonov et al. 2008; Arakawa et al. 2005; Babur et al. 2008; Cavalieri and De Filippo 2005; Dahlquist et al. 2002; Ekins et al. 2007; Pandey et al. 2004) and for performing computational modelling of biological processes (Eungdamrong and Iyengar 2004; Kwiatkowska and Heath 2009; Ruths et al. 2008; van Riel 2006; Watterson et al. 2008), and are fundamentally important in defining the limits of our existing knowledge. To support these efforts there are also a growing number of databases that serve up a wide range of pathways which are either curated centrally (http://www.biopax.org/; http://www.ingenuity.com/; Kanehisa and Goto 2000; Thomas et al. 2003) or, increasingly, by the community (Joshi-Tope et al. 2005; Pico et al. 2008; Schaefer et al. 2009; Vastrik et al. 2007). These offer searchable access to pathway diagrams and interaction data derived from a combination of manual and automated (text mining) extraction of primary literature, reviews and large-scale molecular interaction studies. The sheer range of resources available (Bader et al. 2006) reflects the current interest in pathway science. Whilst these efforts are invaluable and in many ways the best we have, a major problem is that the information content of the diagrams is frequently limited and generic, and visualisations of these systems are of variable and often poor quality; pathways are drawn using informal and idiosyncratic notation systems with a variety of shapes (glyphs) to illustrate component ‘type’. There are variable degrees of accuracy and specificity in defining what pathway components are being depicted and the relationships between them. Resources are often fragmented, with some proteins or metabolites being members of numerous pathways, the concept of pathway membership being a highly subjective division. The pathways themselves are rarely available as a cohesive network and there are numerous pathway exchange formats in current use (Hucka et al. 2003; Lloyd et al. 2004; Luciano 2005). Finally, pathway diagrams are generally highly subjective, reflecting the curator’s bias, such that two diagrams depicting the ‘same’ pathway may share little in common. Together these factors commonly result in uncertainty as to what exactly is being shown. All in all, despite the huge efforts in time and resources that have been poured into pathway science, the state of the art leaves a lot to be desired.

As our appreciation of systems-level biology increases rapidly, there has been an increasing realisation of the need for comprehensive well-constructed maps of known pathways. Over the past 10 years a number of groups have suggested formalised notation schemes and syntactical rules for drawing ‘wiring diagrams’ of cellular pathways (Cook et al. 2001; Kitano et al. 2005; Kohn 1999; Moodie et al. 2006; Pirson et al. 2000). These have been used to construct a number of large pathway diagrams (Calzone et al. 2008; Oda et al. 2004; Oda and Kitano 2006). These pioneering efforts have all contributed to the field and more recently the Systems Biology Graphical Notation (SBGN) group has proposed a series of formalised pathway notation schemes to be adopted by all (Novere et al. 2009). Of course in principle this is an excellent idea but it remains to be seen whether the SBGN schemes are going to be widely taken up or indeed whether they are flexible enough to suit all purposes.

Our own efforts on pathway modelling stem from our interest in macrophage biology and in understanding pathways known to be activated in these cells during infectious and inflammatory disease. Over the last few years we have therefore been constructing large graphical models of macrophage-related pathways as a way of recording what is known about the signalling events controlling this cell’s immune biology (Raza et al. 2008; Raza et al. 2010). In so doing our main objectives have been to create models that
  1. support the detailed representation of a diverse range of biological entities, interactions and pathway concepts
  2. represent a consensus view of pathway knowledge in a semantically and visually unambiguous manner
  3. are easy to assemble and understandable by a biologist
  4. are useful in the interpretation of ‘omics’ data
  5. are sufficiently well defined that software tools can convert these graphical models into formal models, suitable for analysis and simulation

In attempting to achieve these goals we have faced one of the central challenges in pathway biology: how exactly does one construct clear, concise pathway diagrams of the known interactions between cellular components that can be understood by, and are useful to, a biologist? In the beginning our efforts were largely based on the principles of the process diagram notation (PDN) (Kitano et al. 2005) and the original Edinburgh pathway notation (EPN) scheme (Moodie et al. 2006). However, during the course of working with these notation schemes it became apparent that the available diagrams drawn using these systems were not always easy to interpret and that the schemes were a challenge to implement. Furthermore, we found that these notation schemes did not support all of the concepts that we wished to represent in order to reflect the full diversity of pathway components and the relationships between them. As a result of our efforts we have significantly modified these existing schemes and created what has now been named the ‘modified Edinburgh pathway notation’ (mEPN) scheme (Raza et al. 2008; Freeman et al. 2010). Below I describe the basic principles behind the mEPN scheme and illustrate how it can be used to depict a wide variety of biological pathways.

6.2 Definition of the Modified Edinburgh Pathway Notation (mEPN) Scheme

A pathway may be considered to be a directional network of molecular interactions between components of a biological system that act together to regulate a cellular event or process. In this context a component is any physical entity involved in a pathway that contributes or influences its activity, e.g. a protein, protein complex, nucleic acid (DNA, RNA), molecule. The mEPN scheme is a collection of formalised symbols that form the constituent parts of a graphical system for depicting the components of a biological pathway and the interactions between them. The mEPN scheme is based on the node and edge principles of depicting networks. This allows one to use ideas and tools previously developed in graph theory and applied more recently to computational systems biology. Cellular components are represented as nodes (vertices) in the network and specific glyphs (stylised graphical symbols) are used that impart information nonverbally on the class of biological entity portrayed, e.g. protein, gene, biochemical. The processes that connect components are also represented by nodes using different glyphs and the connectivity between them is defined by edges (lines/arcs). Edges represent interactions or relationships between one component and another usually where one component influences the activity of another, e.g. through its binding to, inhibition of, catalytic conversion of. The network of interactions between cellular components and processes thereby defines a pathway.
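
The node-and-edge principle described above can be illustrated with a minimal Python sketch. It is not part of the mEPN scheme or of any pathway-editing tool: the `Pathway` class, its method names and the one-letter code `B` for a binding process are all assumptions made for illustration. The point it shows is that components and processes are both stored as nodes, and that directed edges carry the connectivity between them.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    """A pathway as a directed graph: components and processes are both nodes."""
    nodes: dict = field(default_factory=dict)  # node id -> (kind, glyph, label)
    edges: list = field(default_factory=list)  # directed (source id, target id) pairs

    def add_node(self, node_id, kind, glyph, label):
        self.nodes[node_id] = (kind, glyph, label)

    def add_edge(self, source, target):
        # An edge can only join two nodes that are already in the pathway.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both ends of an edge must be existing nodes")
        self.edges.append((source, target))

    def inputs(self, node_id):
        return [s for s, t in self.edges if t == node_id]

    def outputs(self, node_id):
        return [t for s, t in self.edges if s == node_id]


# Hypothetical example: proteins X and Y bind to form the complex X:Y.
pathway = Pathway()
pathway.add_node("X", "component", "rounded rectangle", "X")
pathway.add_node("Y", "component", "rounded rectangle", "Y")
pathway.add_node("b1", "process", "circle", "B")  # "B" is an assumed code for binding
pathway.add_node("XY", "component", "rounded rectangle", "X:Y")
for source, target in [("X", "b1"), ("Y", "b1"), ("b1", "XY")]:
    pathway.add_edge(source, target)

print(pathway.inputs("b1"))   # ['X', 'Y']
print(pathway.outputs("b1"))  # ['XY']
```

Because the binding event is itself a node, the two proteins never connect to the complex directly; every interaction passes through a node that states what kind of event it is.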

Depiction of Pathway Components

When drawing pathways one has to decide on the level of biological detail to depict. It is not uncommon in pathway depiction to use component glyphs that imply structural or functional characteristics of the entities depicted. For instance, receptors may be shown using a glyph with a specific ligand binding site or possibly as a protein containing membrane-spanning domains. Whilst on one level this approach is appealing to the eye and imparts visual information on the nature of the molecular species depicted, it can lead to complications. After all, both depictions described above may be appropriate for any receptor, and a protein may also have other functional domains which could be graphically depicted. If one tries to impart all this information visually it leads to a notation system that is difficult to implement and to remember. Such a system also requires the development of specific pathway editing tools that support it. In contrast, we have used a set of standard shapes to represent different classes of components (molecular species) and in so doing created a notation scheme that is supported by generic network-editing/visualisation tools; in particular, the tool of choice for all our work has been the freely available yEd (yFiles, Tübingen). There is, however, a variety of other pathway and network-editing tools available (Pavlopoulos et al. 2008). It is worth remembering that the ability to graphically depict a wide variety of pathway concepts depends not only on the tool used to construct and display them but also on the pathway notation scheme employed.

The mEPN scheme as described here is based on the concepts first described for the process diagram notation (PDN) scheme (Kitano et al. 2005). However, our experience in building large-scale pathway models of a variety of biological systems has required us to depict concepts that were not supported by the original PDN scheme. Furthermore, the lack of available pathway editing tools when we began this work and the scale of our diagrams have both played their part in determining our approach to pathway depiction. As a result there are a number of important differences between the mEPN scheme described here and the other PDN schemes. First, in common with PDN, the mEPN uses simple shapes to define the class of a component, but it uses only a labelling system to define the exact identity of components (nodes). Other schemes use circles overlaid on nodes to depict protein modifications. We have found this a considerable overhead to implement, and one which can interfere with the clarity of what is depicted rather than enhance it. Furthermore, the PDN scheme is not supported by many of the general purpose network visualisation tools, e.g. yEd, Cytoscape, Biolayout Express 3D (Freeman et al. 2007; http://www.yworks.com; Yeung et al. 2008), requiring instead the use of dedicated pathway editing software, e.g. CellDesigner (Funahashi et al. 2008). Second, we have avoided the use of different styles of arrowheads to depict the nature of interactions (edges), an approach which limits the vocabulary of edges and can be challenging to remember. Instead, where appropriate, we have chosen to use inline annotation nodes to depict the meaning of edges; these carry a visual clue (a letter symbolising the meaning of the edge, e.g. A for activation, I for inhibition) and can potentially support a wider range of edge meanings. Again, the use of a wide variety of arrowheads is not supported by many pathway/network-editing software packages. Finally, we explicitly state the nature of interactions by the use of labelled process nodes. Under other PDN-based schemes process nodes are used, but generally not as a means to convey the nature of interactions except in the case of protein binding (association) and dissociation. When pathways are large and the distance between interacting species may be great, having a visual clue as to the nature of interactions is very important in our experience.

The full set of glyphs employed in the mEPN scheme is shown in Fig. 6.1. Under the scheme peptides, proteins and protein complexes are all represented by a rounded rectangle and genes are depicted using a rectangle. Parallelograms may be used to show a specific DNA sequence known to play a specific functional role, e.g. promoter sequence. This may be shown on its own or associated with a gene or other genomic feature. Simple biochemicals, e.g. sugars, amino acids, nucleic acids, metabolites, are represented using a hexagon. It is often the case that an interacting component of a pathway is not an exact molecular entity but rather a molecular class or complex entity such as a virus or other pathogen. In this case we use a flattened circle (ellipse) to depict any generic entity. A small molecule or biologic known to affect a biological system is shown using a trapezoid. These may be licensed as a drug or used for experimental manipulation of biological components, e.g. enzyme inhibitor, siRNA. Finally, ions, e.g. Ca2+, Na+, Cl−, or other simple molecules, e.g. H2O, NO, O2, CO2, are represented using a diamond-shaped glyph.
Fig. 6.1

List of the glyphs used by the modified Edinburgh pathway notation (mEPN) scheme. Unique shapes and identifiers are used to distinguish between each element of the notation scheme. The notation scheme essentially consists of the following categories of nodes representing cellular components, processes and Boolean logic operators. Edges are used to denote the interactions between components, the nature of the relationship between them being described using process nodes, Boolean operators and edge annotations. The cellular compartment in which these components reside is depicted by their spatial localisation in the network and background colour

Component Annotation

Multiple component names are often available to describe any given component. For example, the same protein may be called several different names in the literature. In other cases the same name has been used to describe different proteins and some protein names are quite different from the gene name. Other names sometimes used for labelling components in pathways do not represent any specific entity at all, e.g. NF-κB. Therefore, when non-standard nomenclature is used to name pathway components it frequently leads to ambiguity as to the exact identity of what is being depicted. Use of standard nomenclature to denote a component’s identity removes this uncertainty and also assists in the comparison and overlay of experimental data with pathway models. Under mEPN we recommend the use of standard gene nomenclature systems, e.g. the HUGO Gene Nomenclature Committee (HGNC) or Mouse Genome Database (MGD) systems, to name human or mouse genes/proteins, respectively. These nomenclature systems now provide a near-complete annotation of all human and mouse genes. Their use in the naming of proteins as well as genes provides a direct link between the two. Therefore, when a protein or gene is discussed within a paper, almost the first act is to search the databases in order to record the identity of the component according to standard nomenclature. Where other names (‘aliases’) are in common use, these may be added to the label on the glyph representing the protein, placed after the official gene symbol in rounded ( ) brackets. Protein complexes are named as a concatenation of the names of the proteins belonging to the complex, separated by colons. Again, if the complex is commonly referred to by a generic name this may be shown. There are no strict rules as to the order in which the protein names are shown in the complex. They are often shown in the order in which the proteins join the complex, in the position they are likely to hold relative to other members of the complex (where known) or in their position relative to cellular compartments, e.g. with receptor proteins in a membrane-bound protein complex protruding into the extracellular space. Where a specific protein is present multiple times within a complex, this may be represented by placing the number of times the protein is present within the complex in angular brackets < >. If the number of copies of a protein in the complex is unknown this may be represented by <n>. The particular ‘state’ of an individual protein or a protein within a complex may be altered as a consequence of a particular process. This change in the component’s state is marked using square [ ] brackets following the component’s name, each modification being placed in separate brackets. This notation may be used to describe the whole range of protein modifications, e.g. phosphorylation [P], truncation [t], ubiquitination [Ub]. Where details of the site of modification are known this may be represented as, e.g. [P-L232] = phosphorylation at leucine 232. Alternatively, the details of a particular modification may be placed as a note on the node, visible only during ‘mouse-over’ or when viewing a node’s properties. Unfortunately, there appears to be no universally recognised nomenclature system for many of the other classes of biologically active molecules, e.g. lipids, metabolites, drugs, and therefore when these are included in a pathway we have generally used names commonly recognised by biologists.
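
The labelling conventions above lend themselves to a small helper. The following Python sketch is illustrative only: the function names are hypothetical, and the relative order of the state brackets, copy number and alias within a label is an assumption, since the text fixes the bracket types but not their order. Showing a complex's generic name in rounded brackets is likewise assumed by analogy with protein aliases.

```python
def component_label(symbol, states=(), copies=None, alias=None):
    """Label one protein: SYMBOL[state]...<copies> (alias)."""
    label = symbol
    for state in states:
        label += f"[{state}]"   # one pair of square brackets per modification
    if copies is not None:
        label += f"<{copies}>"  # copy number in the complex, or "n" if unknown
    if alias:
        label += f" ({alias})"  # commonly used alias in rounded brackets
    return label


def complex_label(member_labels, generic_name=None):
    """Label a complex as its member labels joined by colons."""
    label = ":".join(member_labels)
    if generic_name:
        label += f" ({generic_name})"  # assumed placement of the generic name
    return label


p38 = component_label("MAPK14", states=["P"], alias="p38")
isgf3 = complex_label(
    [
        component_label("STAT1"),
        component_label("STAT2", states=["P"]),
        component_label("IRF9"),
    ],
    generic_name="ISGF3",
)
print(p38)    # MAPK14[P] (p38)
print(isgf3)  # STAT1:STAT2[P]:IRF9 (ISGF3)
```

Generating labels from official gene symbols in this way keeps each node name searchable against the nomenclature databases, which is what makes the later overlay of experimental data straightforward.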

Colour may also be added to the diagrams to assist in their interpretation. Components may be coloured to impart information on component’s type, location, or state, e.g. to visually differentiate between a protein and a complex, to denote cellular location or denote a component’s expression level. In addition process nodes, Boolean operators, compartments and edge annotations are generally coloured to improve the visual impact of the diagram. However, it must be stated that the exact choice of colours is down to individual taste and colour recognition capabilities and the mEPN scheme has been designed to work even in the absence of colour.

Depiction of Biological Processes

In the context of this notation system a process can be defined as a specific action, transformation or transition occurring between components or to a component, and is represented by a process node. Process nodes impart information on the type of process that is associated with the transformation of a component from one state to another or with its movement between cellular locations. They also act as junctions between components and as such may have multiple inputs or outputs to or from components. All process nodes are represented by a small circular glyph and the process they represent is indicated by a one-to-three letter code. Colour has been used as a visual clue to group processes into ‘type’ but is not necessary for inferring meaning. There are currently 31 process nodes recorded under the mEPN. Different process nodes generally have different network connectivity. For instance, a process node depicting a component’s translocation from one compartment to another will generally have only one input and one output edge (Fig. 6.2a). In contrast, a ‘binding’ node will have multiple inputs and one output (Fig. 6.2b); the opposite is true for a dissociation node (Fig. 6.2c). Process nodes also act as a way of collating information about a given event; for example, protein X may be converted from one state to another by a process activated by protein Y (Fig. 6.2d). However, this process may also be inhibited by such a protein (Fig. 6.2e).
Fig. 6.2

Depiction of basic concepts in pathway biology using the mEPN scheme. (a) Depiction of the transition of a component from one location or state to another, e.g. the translocation of a protein from the cytoplasm to the nucleus or transcription/translation of a gene to protein. (b) Binding (association) of two proteins to form a complex. (c) Dissociation of a complex into its constituent parts. (d) Activation of the transformation of one component by another. (e) Inhibition of the transformation of one component by another. (f) Activation of the transformation of two components by another. (g) Absolute requirement (co-dependency) of two components for the activation of a process. (h) Requirement of either of two components for the activation of a process. (i) Activation of the transformation of one component by another that requires ATP. (j) Depiction of a ‘conditional gate’ that indicates the start of potentially multiple alternative pathway outcomes which are dependent on other factors. The main octagon is labelled with the process name, e.g. G1 to S phase checkpoint, and the other smaller octagons are used to denote the factors that influence progression down one pathway or another
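
The connectivity patterns described for translocation, binding and dissociation nodes can be turned into a simple consistency check. This is a sketch under stated assumptions, not a rule set defined by the mEPN: the dictionary keys and function names are hypothetical, and because the text says these patterns hold "generally", a failed check is best read as a prompt to review the diagram rather than as an error.

```python
# Usual (inputs, outputs) pattern for three process types named in the text.
# "one" means exactly one edge; "many" means two or more.
USUAL_PATTERN = {
    "translocation": ("one", "one"),
    "binding": ("many", "one"),
    "dissociation": ("one", "many"),
}


def _fits(count, pattern):
    return count == 1 if pattern == "one" else count >= 2


def fits_usual_pattern(process_type, n_inputs, n_outputs):
    """Return True if the edge counts match the usual pattern for this process type."""
    want_in, want_out = USUAL_PATTERN[process_type]
    return _fits(n_inputs, want_in) and _fits(n_outputs, want_out)


print(fits_usual_pattern("binding", 2, 1))       # True
print(fits_usual_pattern("dissociation", 2, 1))  # False
```

Only edges to and from the components being transformed are counted here; regulatory inputs such as an activating or inhibiting protein would need to be excluded before applying the check.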

Boolean Logic Operators

Components in a pathway are dependent on each other. For example, if a process requires X and Y to be present for it to proceed, perhaps because they are independently acting cofactors in a given reaction, then the process will not proceed unless both are present. Alternatively, if a given process can be catalysed by either X or Y, then the process will proceed if either component is present. Such dependencies can be captured using Boolean logic operators, which are used to define the relationships between multiple inputs into a process. An ‘AND’ operator is used when two or more components are required to bring about a process, i.e. an event is dependent on more than one factor being present (Fig. 6.2f). In modelling of flow-through networks these act in a similar manner to ‘bind’ process nodes, i.e. all inputs must be present before a product is formed or a reaction proceeds. In contrast, an ‘OR’ operator is used when one component or another may orchestrate the same change in another component (Fig. 6.2g). For instance, multiple kinases, e.g. MAP2K3, MAP2K6, MAP2K7, may catalyse the phosphorylation of p38 (MAPK14) and are therefore shown connecting with p38 via an OR operator. OR operators have also occasionally been used to imply that a component(s) can potentially lead to multiple outcomes.
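
The behaviour of the two operators can be sketched in a few lines of Python. The function name and the dictionaries recording which components are present are hypothetical; the kinase example follows the one given in the text.

```python
def gate_is_satisfied(operator, inputs_present):
    """Evaluate an AND or OR operator over the presence of its input components."""
    values = list(inputs_present)
    if not values:
        raise ValueError("a logic operator needs at least one input")
    if operator == "AND":
        return all(values)
    if operator == "OR":
        return any(values)
    raise ValueError(f"unknown operator: {operator}")


# Any one of the three kinases is enough to phosphorylate p38 (MAPK14).
kinase_present = {"MAP2K3": False, "MAP2K6": True, "MAP2K7": False}
print(gate_is_satisfied("OR", kinase_present.values()))     # True

# Cofactors X and Y are both required, so a missing Y blocks the process.
cofactor_present = {"X": True, "Y": False}
print(gate_is_satisfied("AND", cofactor_present.values()))  # False
```

Treating presence as a simple true/false value is a deliberate simplification; it ignores concentration and timing, which the scheme handles separately through conditional gates.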

Depiction of Other Concepts

There are a number of glyphs that represent concepts that do not sit neatly under the headings of being a component, a process or logic operator. These include the following:

Energy/molecular transfer nodes are used to represent simple co-reactions associated with or required to drive certain processes (e.g. ATP→ADP, GTP→GDP, NADPH→NADP+). They are linked directly to the node representing the process in which they take part (Fig. 6.2h).

Conditional gates are used where there are potentially multiple fates of a component and the output is dependent on other factors such as the component’s concentration and time or is associated with a cellular state (Fig. 6.2i). These have been used to depict events such as the checkpoint controls in the cell cycle, where the decision to go on to the next phase of cell replication is under the control of a number of factors and two or more outcomes are possible. Another example is where cholesterol, depending on its intracellular concentration, may either be exported out of the cell or trigger the cholesterol biosynthetic pathway.

Pathway modules define complicated processes or events that are not otherwise fully described. Examples include signalling cascades, endocytosis and compartment fusion. They are a short-hand way of representing molecular events that are not known, not recorded or not shown.

Pathway outputs detail the cumulative output of a series of interactions or the function of an individual component at the end of a pathway. Pathway outputs are shown in order to describe the significance of those interactions in the context of a biological process or with respect to the cell. The input lines leading into a pathway output node have been coloured light blue to emphasise the end of the pathway description.

Depiction of Interactions Between Components and the Use of Edges

Interactions are depicted by edges, sometimes referred to as lines or arcs (a directional edge). They signify a relationship between components/processes in a pathway and convey the directionality of that interaction. The nature of an interaction is inferred through the use not only of process nodes and Boolean logic operators but also of edge annotation nodes. An edge annotation node is characterised as having only one input (with no arrowhead) and one output and functions to describe the type of activity implied by the line, e.g. activation, inhibition, catalysis (Fig. 6.2). A number of notation schemes use different arrowheads to indicate the ‘type’ of interaction but their use has been avoided in the mEPN scheme for several reasons. First, the number of distinguishable arrowhead types is limited and may fall below the number of biological concepts one needs to depict. Second, differentiating between different arrowheads is sometimes difficult when viewed at a distance. Third, few arrowheads are symbolic or indicative of the action they are designed to describe, requiring them to be committed to memory. Finally, multiple arrowhead types are not always supported by different network-editing/visualisation software. Interaction edges may be coloured for visual emphasis but, as with nodes, the definition of meaning is not reliant on colour. In certain instances edge annotation nodes can also be used as distribution nodes, e.g. where one component activates many others, such as the transcriptional activation of a number of genes by a transcription factor; this reduces the number of edges emanating from the transcription factor and therefore simplifies the representation (Figs. 6.2j and 6.3). Where separate depiction of modules belonging to the same component is desirable, an undirected edge (no arrowhead) is used to denote a physical connection (bond) between two or more components.
Fig. 6.3

Example of a small pathway depicted using the mEPN scheme. Interferon B (IFNB) is a cytokine released from many cell types in response to immune stimulation. It homodimerises and binds to a cell surface receptor complex composed of the receptor proteins IFNAR1 and IFNAR2 and the intracellular kinases TYK2 and JAK1. The complex is composed of two of each of these proteins. Binding causes a conformational change in the complex resulting in the autophosphorylation of JAK1. Once activated the complex catalyses the phosphorylation of STAT2, which forms a heterodimer with STAT1. This complex then binds interferon regulatory factor 9 (IRF9), forming the complex often referred to as ISGF3, and translocates to the nucleus. Here it binds to the ISGF3 element in the promoter of a number of genes including IRF2, IL12B, STAT1, IL15, TAP1, GBP1, PSMB9, initiating their transcription. For a more detailed view of this and other immune-related pathways, see Raza et al. (2008)
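
The use of an edge annotation node as a distribution node, as in the transcriptional activation step of this example, can be illustrated by counting edges. In this Python sketch the function names and the node identifier `A1` are hypothetical, and edges are reduced to plain source and target pairs without any arrowhead information.

```python
def direct_edges(source, targets):
    """One activation edge from the source to every target."""
    return [(source, target) for target in targets]


def distributed_edges(source, targets, annotation_id="A1"):
    """Route the activation through a single shared edge annotation node."""
    return [(source, annotation_id)] + [(annotation_id, t) for t in targets]


def edges_leaving(node, edges):
    return sum(1 for source, _ in edges if source == node)


# Genes listed in the Fig. 6.3 caption as transcriptional targets of ISGF3.
targets = ["IRF2", "IL12B", "STAT1", "IL15", "TAP1", "GBP1", "PSMB9"]

before = direct_edges("ISGF3", targets)
after = distributed_edges("ISGF3", targets)
print(edges_leaving("ISGF3", before))  # 7
print(edges_leaving("ISGF3", after))   # 1
```

The total number of edges rises by one, from seven to eight, but only a single line leaves the transcription factor complex, which is what keeps a crowded region of the diagram readable.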

Cellular Compartments

Pathway components exist in different cellular compartments. A cellular compartment can be a region of the cell, an organelle or cellular structure, dedicated to particular processes and/or hosting certain subsets of components, e.g. genes are found only in the nuclear compartment. In principle a subcellular compartment can be any size or shape. Compartments are defined by a labelled background to the pathway and arranged with spatial reference to cell structure. Compartments are coloured differently for emphasis. Similar or related compartments are shown to share the same fill colour but different coloured perimeters. This has been used to differentiate between different but related compartments, e.g. different classes of vesicles derived from the endoplasmic reticulum or plasma membrane.

6.3 Collation of Information and Pathway Assembly

The assembly of a pathway diagram is an extraordinarily interesting and informative exercise. The act of converting text-based information into a visual resource forces one to understand the information that is being presented to a level that the mere reading of an article never requires. When presented with a long textual description of a process involving numerous components all interacting through a complex series of events, it is easy to read about them but far more difficult to construct an accurate picture of them in the mind’s eye. Furthermore, the semantics of the written word do not always make sense when drawn, at least not when done in a logical fashion. The art of pathway construction therefore relies on the ability to convert numerous textual descriptions into a concise and unambiguous model of events. In those descriptions different words may be used to describe the same or similar processes between multiple components, which in turn may or may not be designated the same name.

When embarking on the construction of a pathway diagram there is a need to define the specific areas that are of interest to you. This sounds obvious but in reading the literature on one system, it is common to find that other systems are discussed (the one big network scenario) and it is easy to stray from the area of original interest. This in itself is not a problem and is indeed part of the learning exercise, as long as the area covered has been documented correctly before moving on. The danger is that after a mapping exercise has been ‘completed’ what results is a sketch covering many components in related systems, where the relationships between them have not been documented to a sufficient level of detail to render the diagram truly useful or informative. It is therefore better to aim for quality over quantity when engaging in this activity. It is also true that what makes sense to the pathway curator does not necessarily make sense to another individual. Great emphasis should therefore be placed on the need to discuss and justify the information represented to others. If the knowledge gained by the curator cannot be communicated clearly and effectively, then they have not done their job properly. Pathway content, adherence to the notation system and layout should always be assessed by others to ensure that the graphical depiction of pathway/interactions is intelligible and unambiguous to another individual familiar with the notation scheme. Ideally the work should also be inspected by those intimately familiar with the field of research that one is attempting to depict; this is always a good test of the accuracy and completeness of the information.

The best source of information about pathways is buried in the primary literature. However, the amount of pathway information that can be gleaned from any one paper is generally limited as a given piece of work will tend to focus only on one or a small number of components and their interacting partners. It is therefore advisable to spend some time gaining a high-level view of any given pathway or system of interest. Internet searches for images of the pathway or specific complexes within it provide a framework for understanding the pathway of interest. Pathway databases such as Reactome or KEGG (Kanehisa and Goto 2000; Vastrik et al. 2007) can be used to gain a high-level view of the pathway. Interaction databases, e.g. STRING, IntAct, Ingenuity, HPRD or BIND (Alfarano et al. 2005; Hermjakob et al. 2004; Jensen et al. 2009; Mishra et al. 2006), might also be used to gain a view of the molecular interactions of a given component. Our experience, however, has been that such resources present a generic network view of pathways and often capture seemingly erroneous interactions, thereby limiting their utility for this purpose. One of the best starting points is literature reviews. Whilst they frequently discuss information of limited use to pathway construction, e.g. concerning protein structure, evolution of protein families or high-level concepts, they often provide graphical depictions of subsystems and are an excellent portal into the primary literature. The point is not to get too involved too early but to take snapshots of the current understanding of the system and construct a framework of understanding and sources of available information prior to going into detail.

During the course of a pathway mapping exercise many papers will be read and snippets of information will be mentally recorded concerning all aspects of pathway biology. It is important to have mechanisms in place that allow the curator to record this information and its source, otherwise all this information will be lost. Evidence to support an interaction derived from the primary literature (and reviews) must be recorded in an interaction table. This must include the identity of the interacting partners, the direction of the interaction, e.g. HGNC1 → HGNC2, the type of interaction (phosphorylation, cleavage), the method by which the interaction was determined, the PubMed ID of the paper reporting the interaction and the site of any specific change of state, e.g. phosphorylation of serine 123. Of course more than one paper may be used to support the same interaction and arguably two or more references are preferable to a single work reporting an interaction. Indeed no interaction should be included within the pathway without published evidence to back it up. An example of a pathway interaction table is shown in Table 6.1. Additional notes and hyperlinks to external databases are also useful in linking additional information on the biology depicted. GraphML files support this activity and pathway diagrams may include URL links to Entrez Gene (or other database of choice) for each protein or gene component in the pathway. Furthermore, component descriptions obtained from databases, PubMed IDs and textual descriptions can be included and stored on appropriate edges or nodes. These can be accessed under the properties description tab for nodes or edges or appear when hovering over a node or edge, thereby supplementing what is shown graphically.
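As a rough sketch of how such annotations might be stored, the Python below writes a small GraphML document using only the standard library. The attribute names `url` and `pmid`, the function name `build_graphml` and the Entrez Gene URL pattern are assumptions chosen for the example; a given editor may store these properties under its own extension elements, so whether they appear in its properties tab is not guaranteed.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def build_graphml(nodes, edges):
    """Return a GraphML string with 'url' on nodes and 'pmid' on edges."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    # Declare the custom attributes before they are used.
    ET.SubElement(root, f"{{{GRAPHML_NS}}}key", {
        "id": "url", "for": "node", "attr.name": "url", "attr.type": "string"})
    ET.SubElement(root, f"{{{GRAPHML_NS}}}key", {
        "id": "pmid", "for": "edge", "attr.name": "pmid", "attr.type": "string"})
    graph = ET.SubElement(root, f"{{{GRAPHML_NS}}}graph",
                          {"id": "pathway", "edgedefault": "directed"})
    for symbol, gene_id in nodes.items():
        node = ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", {"id": symbol})
        data = ET.SubElement(node, f"{{{GRAPHML_NS}}}data", {"key": "url"})
        # Assumed URL pattern for an Entrez Gene record.
        data.text = f"https://www.ncbi.nlm.nih.gov/gene/{gene_id}"
    for source, target, pmid in edges:
        edge = ET.SubElement(graph, f"{{{GRAPHML_NS}}}edge",
                             {"source": source, "target": target})
        data = ET.SubElement(edge, f"{{{GRAPHML_NS}}}data", {"key": "pmid"})
        data.text = pmid
    return ET.tostring(root, encoding="unicode")


# Gene IDs and the PubMed ID are taken from interaction 1 of Table 6.1.
xml_text = build_graphml({"ATM": "472", "IKBKG": "4214"},
                         [("ATM", "IKBKG", "16497931")])
print(xml_text)
```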
Table 6.1

Example of the information that should be stored when recording an interaction associated with the construction of a pathway

| | Interaction no. | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Interacting partner 1 | Official gene symbol | ATM | ATM:IKBKG | ATM:IKBKG | BCL2 | CDC37 | CHUK |
| | Gene ID | 472 | 472:4214 | 472:4214 | 596 | 11140 | 1147 |
| | Interactant type | Protein | Complex | Complex | Gene | Protein | Protein |
| | Interactant as on map | ATM[P] | ATM[P]: IKBKG[P][Ub] | ATM[P]: IKBKG[P][Ub] | BCL2 | CDC37 | CHUK |
| Interacting partner 2 | Official gene symbol | IKBKG | CHUK | ERC1 | NFKB1(p50): NFKB1(p50) | HSP90AA1 | CHUK |
| | Gene ID | 4214 | 1147 | 23085 | N/A | 3320 | 1147 |
| | Interactant type | Protein | Protein | Protein | Complex | Protein | Protein |
| | Interactant as on map | IKBKG[SU] | CHUK | ERC1(ELKS) | NFKB1(p50): NFKB1(p50) | HSP90AA1 | CHUK |
| Interaction type | | Binding | Binding | Binding | Activation | Binding | Binding |
| Interaction location | | nucleus | cytoplasm | cytoplasm | nucleus | cytoplasm | cytoplasm |
| NCBI-PubMed ID | | 16497931 | 16497931 | 16497931 | 14668329 | 15371334 | 15145317 |
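The rule that no interaction enters the pathway without published evidence lends itself to a simple automated check. The Python sketch below assumes a hypothetical `Interaction` record whose fields mirror a subset of the rows in Table 6.1; the class, the function `missing_evidence` and the unsupported `GENE_A`/`GENE_B` entry are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One row of evidence; fields follow a subset of the rows of Table 6.1."""
    partner1: str
    partner2: str
    interaction_type: str
    location: str
    pubmed_ids: tuple


def missing_evidence(interactions):
    """Return the interactions that lack any PubMed ID."""
    return [i for i in interactions if not i.pubmed_ids]


table = [
    Interaction("ATM[P]", "IKBKG[SU]", "Binding", "nucleus", ("16497931",)),
    Interaction("CDC37", "HSP90AA1", "Binding", "cytoplasm", ("15371334",)),
    # Hypothetical unsupported entry, included only to exercise the check.
    Interaction("GENE_A", "GENE_B", "Binding", "cytoplasm", ()),
]

for interaction in missing_evidence(table):
    print(f"No published evidence: {interaction.partner1} -> {interaction.partner2}")
```

Storing the PubMed IDs as a tuple allows several papers to support the same interaction, in line with the preference for two or more references.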

As a final note on pathway construction, it should be emphasised that the visualisation of specific events as well as overall layout of a diagram is everything in ensuring the pathway’s usability. Under the mEPN system each step in a given process is explicitly depicted. For example, if the activation of a given signalling pathway requires the receptor complex to go through a series of changes, e.g. binding, phosphorylation or dissociation events following ligand binding, then each intermediate stage should ideally be shown (see Fig. 6.3). Whilst this can make the depiction of events long-winded it accurately reflects what is known and may ultimately be important in understanding the pathway’s regulation. Another important rule is that although a given pathway component can play a role in numerous different processes it may only be represented once in any given cellular compartment. Whilst this rule can potentially lead to a tangle of edges due to certain components possessing numerous connections to other components spread across the pathway, the benefits of the rule outweigh the issues in adhering to it. The number of edges leaving each node gives the reader an exact indication of a component’s interactions with other components and hence potential activity, without the need for scanning the entire diagram to find other instances where the component is described. A component may, however, be shown more than once in a given cellular compartment if it changes from one state to another, e.g. from an inactive form to an active form, in which case both forms are represented as separate components.
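The once-per-compartment rule can be checked mechanically if each node is recorded with its name, state and compartment. The Python below is a minimal sketch under that assumption; the function name `duplicated_components` and the state labels are illustrative and not part of the mEPN scheme.

```python
from collections import Counter


def duplicated_components(placements):
    """Return (name, state, compartment) triples that appear more than once.

    placements: iterable of (name, state, compartment) tuples, one per node.
    The same name may recur in one compartment only if its state differs.
    """
    counts = Counter(placements)
    return sorted(key for key, n in counts.items() if n > 1)


nodes = [
    ("STAT1", "unmodified", "cytoplasm"),
    ("STAT1", "phosphorylated", "cytoplasm"),  # allowed: different state
    ("STAT1", "unmodified", "nucleus"),        # allowed: different compartment
    ("STAT1", "unmodified", "cytoplasm"),      # violation: exact repeat
]
print(duplicated_components(nodes))
```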

As a general rule nodes (components, processes, operators) and edges (interactions) should be drawn in such a way as to make the diagram compact, with a minimum of edge crossings, changes in edge direction and edge length, i.e. edges should be easy to follow. Hierarchical relationships between components should be shown in the layout of interactions. In order to do this an orientation of pathway flow is chosen, e.g. left to right or top to bottom, and where possible should be maintained throughout the diagram. Ideally the direction of interactions should follow the ‘flow’ through the pathway, although it is appreciated that this becomes more difficult in larger diagrams. A certain degree of consistency should also be aimed for when depicting components and their interactions, e.g. components should be depicted using nodes of a similar size and similar pathway relationships should be drawn in a consistent manner. Visual clarity relies on a ‘clean’ layout of pathways and whilst there are a number of automated algorithms available for network layout, they are currently no substitute for a curator with an attention to detail and an artistic eye.
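Although layout remains a manual craft, simple checks can flag where a diagram departs from its chosen orientation. The Python sketch below lists directed edges that run against the flow, assuming node coordinates are available; the function name `edges_against_flow` and the node names are hypothetical.

```python
def edges_against_flow(positions, edges, axis=0):
    """Return directed edges whose target lies upstream of their source.

    positions: node -> (x, y); edges: (source, target) pairs.
    axis=0 checks left-to-right flow, axis=1 checks top-to-bottom
    (assuming y grows downwards, as in most drawing tools).
    """
    return [(s, t) for s, t in edges
            if positions[t][axis] < positions[s][axis]]


layout = {"ligand": (0, 0), "receptor": (100, 0), "kinase": (200, 40)}
links = [("ligand", "receptor"), ("receptor", "kinase"), ("kinase", "ligand")]
print(edges_against_flow(layout, links))
```

An edge reported here is not necessarily wrong, since feedback loops legitimately run against the flow, but a long list suggests the layout may be worth revisiting.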

6.4 Summary

The networks of molecular interactions that underpin cellular function are highly complex and dynamic. The topology, behaviour and logic of these systems, even on a relatively small scale, are far too complicated to understand intuitively. Formalised models provide a possible solution to the problem. However, the challenge of creating models that reflect our current understanding of these systems and display this information in an intuitive and logical manner is not trivial. The task of constructing pathway diagrams is time consuming and laborious, involving many hours of work. On the other hand, it summarises the results of investigations that may have taken many thousands of hours to perform and it is difficult to envisage how one could précis such a body of work in any other meaningful way. The act of creating a pathway model forces you to formalise what you know about a system and justify it using appropriate sources. It allows you to explore the nature of relationships that might previously have existed only as a mental picture, and the need to depict them graphically in a formalised way is in itself highly informative. As well as defining what you do know about a system, it is equally useful in defining what you do not.

The mEPN scheme described here provides a system where pathways can be represented in a logical, unambiguous and biologist-friendly fashion, whatever the system of interest. What we would like to see and believe is essential is the support of the wider community in assembling and editing such diagrams. Such efforts are underway (Pico et al. 2008; Schaefer et al. 2009; Vastrik et al. 2007) and are already providing a vital forum for debate on the known details of pathways in different cell systems. Ideally these efforts will result in detailed models of biological systems that can be shared and assimilated. However, in order to achieve this end pathway models clearly need to be assembled using standard rules and graphical languages. We therefore hope our work will contribute to the ongoing community effort to develop such standards (Le Novère et al. 2009).

To gain a systems-level view of these pathways is to gain an insight into the molecular networks that regulate normal function and whose malfunction underpins disease pathology. Greater understanding of the overall architecture of the pathways and their susceptibility to deregulation by disease-causing agents should ultimately lead to new strategies and targets for therapeutic intervention. For my group the creation of pathway models has provided a resource for training, literature/data interpretation, computational pathway modelling and hypothesis generation. As such the approach is now central to our ongoing investigations of macrophage biology and has transformed the way we think about these cells and our interpretation of results of investigations into their immune biology.

References

  1. Alfarano C, Andrade CE, Anthony K, Bahroos N, Bajec M, Bantoft K, Betel D, Bobechko B, Boutilier K, Burgess E, Buzadzija K, Cavero R, D'Abreo C, Donaldson I, Dorairajoo D, Dumontier MJ, Dumontier MR, Earles V, Farrall R, Feldman H et al (2005) The biomolecular interaction network database and related tools 2005 update. Nucleic Acids Res 33:D418–424
  2. Antonov AV, Dietmann S, Mewes HW (2008) KEGG spider: interpretation of genomics data in the context of the global gene metabolic network. Genome Biol 9:R179
  3. Arakawa K, Kono N, Yamada Y, Mori H, Tomita M (2005) KEGG-based pathway visualization tool for complex omics data. In Silico Biol 5:419–423
  4. Babur O, Colak R, Demir E, Dogrusoz U (2008) PATIKAmad: putting microarray data into pathway context. Proteomics 8:2196–2198
  5. Bader GD, Cary MP, Sander C (2006) Pathguide: a pathway resource list. Nucleic Acids Res 34:D504–506
  6. Calzone L, Gelay A, Zinovyev A, Radvanyi F, Barillot E (2008) A comprehensive modular map of molecular interactions in RB/E2F pathway. Mol Syst Biol 4:173
  7. Cassman M (2005) Barriers to progress in systems biology. Nature 438:1079
  8. Cavalieri D, De Filippo C (2005) Bioinformatic methods for integrating whole-genome expression results into cellular networks. Drug Discov Today 10:727–734
  9. Cook DL, Farley JF, Tapscott SJ (2001) A basis for a visual language for describing, archiving and analyzing functional models of complex biological systems. Genome Biol 2:RESEARCH0012
  10. Dahlquist KD, Salomonis N, Vranizan K, Lawlor SC, Conklin BR (2002) GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nat Genet 31:19–20
  11. Ekins S, Nikolsky Y, Bugrim A, Kirillov E, Nikolskaya T (2007) Pathway mapping tools for analysis of high content data. Methods Mol Biol 356:319–350
  12. Eungdamrong NJ, Iyengar R (2004) Modeling cell signaling networks. Biol Cell 96:355–362
  13. Freeman TC, Goldovsky L, Brosch M, van Dongen S, Maziere P, Grocock RJ, Freilich S, Thornton J, Enright AJ (2007) Construction, visualisation, and clustering of transcription networks from microarray expression data. PLoS Comput Biol 3:2032–2042
  14. Freeman TC, Raza S, Theocharidis A, Ghazal P (2010) The mEPN Scheme: an intuitive and flexible graphical system for rendering biological pathways. BMC Syst Biol 4:65
  15. Funahashi A, Matsuoka Y, Jouraku A, Morohashi M, Kikuchi N, Kitano H (2008) CellDesigner 3.5: A versatile modeling tool for biochemical networks. Proc IEEE 96:1254–1265
  16. Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, Margalit H, Armstrong J, Bairoch A, Cesareni G, Sherman D, Apweiler R (2004) IntAct: an open source molecular interaction database. Nucleic Acids Res 32:D452–455
  17. Accessed on June 1, 2010. http://www.biopax.org/. Biological Pathways Exchange.
  18. Accessed on June 1, 2010. http://www.ingenuity.com/. Ingenuity Pathway Analysis.
  19. Accessed on June 1, 2010. http://www.yworks.com. yEd Graph Editor – yWorks the diagramming company.
  20. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V, Goryanin, II, Hedley WJ, Hodgman TC, Hofmeyr JH, Hunter PJ et al (2003) The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19:524–531
  21. Janssen P, Audit B, Cases I, Darzentas N, Goldovsky L, Kunin V, Lopez-Bigas N, Peregrin-Alvarez JM, Pereira-Leal JB, Tsoka S, Ouzounis CA (2003) Beyond 100 genomes. Genome Biol 4:402
  22. Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C (2009) STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res 37:D412–416
  23. Joshi-Tope G, Gillespie M, Vastrik I, D’Eustachio P, Schmidt E, de Bono B, Jassal B, Gopinath GR, Wu GR, Matthews L, Lewis S, Birney E, Stein L (2005) Reactome: a knowledgebase of biological pathways. Nucleic Acids Res 33:D428–432
  24. Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28:27–30
  25. Kitano H (2002) Computational systems biology. Nature 420:206–210
  26. Kitano H, Funahashi A, Matsuoka Y, Oda K (2005) Using process diagrams for the graphical representation of biological networks. Nat Biotechnol 23:961–966
  27. Kohn KW (1999) Molecular interaction map of the mammalian cell cycle control and DNA repair systems. Mol Biol Cell 10:2703–2734
  28. Kwiatkowska MZ, Heath JK (2009) Biological pathways as communicating computer systems. J Cell Sci 122:2793–2800
  29. Le Novère N, Hucka M, Mi H, Moodie S, Schreiber F, Sorokin A, Demir E, Wegner K, Aladjem MI, Wimalaratne SM, Bergman FT, Gauges R, Ghazal P, Kawaji H, Li L, Matsuoka Y, Villéger A, Boyd SE, Calzone L, Courtot M, Dogrusoz U, Freeman TC, Funahashi A, Ghosh S, Jouraku A, Kim S, Kolpakov F, Luna A, Sahle S, Watterson S, Wu G, Goryanin I, Kell DB, Sander C, Sauro H, Snoep JL, Kohn K, Kitano H (2009) The systems biology graphical notation. Nat Biotechnol 27:735–741
  30. Lloyd CM, Halstead MD, Nielsen PF (2004) CellML: its future, present and past. Prog Biophys Mol Biol 85:433–450
  31. Luciano JS (2005) PAX of mind for pathway researchers. Drug Discov Today 10:937–942
  32. Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, Shivakumar K, Anuradha N, Reddy R, Raghavan TM, Menon S, Hanumanthu G, Gupta M, Upendran S, Gupta S, Mahesh M, Jacob B, Mathew P, Chatterjee P, Arun KS et al (2006) Human protein reference database—2006 update. Nucleic Acids Res 34:D411–414
  33. Moodie SL, Sorokin A, Goryanin I, Ghazal P (2006) A graphical notation to describe the logical interactions of biological pathways. J Integr Bioinform 3:11
  34. Novere NL, Hucka M, Mi H, Moodie S, Schreiber F, Sorokin A, Demir E, Wegner K, Aladjem MI, Wimalaratne SM, Bergman FT, Gauges R, Ghazal P, Kawaji H, Li L, Matsuoka Y, Villeger A, Boyd SE, Calzone L, Courtot M et al (2009) The systems biology graphical notation. Nat Biotechnol 27:735–741
  35. Nurse P (2003) Systems biology: understanding cells. Nature 424:883
  36. Oda K, Kimura T, Matsuoka Y, Funahashi A, M. M, Kitano H (2004) Molecular interaction map of a macrophage. The alliance for cellular signaling (AfCS) Research Reports, vol. 2, http://www.signaling-gateway.org/reports/v2/DA0014/DA0014.htm
  37. Oda K, Kitano H (2006) A comprehensive map of the toll-like receptor signaling network. Mol Syst Biol 2:2006 0015
  38. Pandey R, Guru RK, Mount DW (2004) Pathway Miner: extracting gene association networks from molecular pathways for predicting the biological significance of gene expression microarray data. Bioinformatics 20:2156–2158
  39. Pavlopoulos GA, Wegener AL, Schneider R (2008) A survey of visualization tools for biological network analysis. BioData Mining 1:12
  40. Pico AR, Kelder T, van Iersel MP, Hanspers K, Conklin BR, Evelo C (2008) WikiPathways: pathway editing for the people. PLoS Biol 6:e184
  41. Pirson I, Fortemaison N, Jacobs C, Dremier S, Dumont JE, Maenhaut C (2000) The visual display of regulatory information and networks. Trends Cell Biol 10:404–408
  42. Raza S, McDerment N, Lacaze PA, Robertson K, Watterson S, Chen Y, Chisholm M, Eleftheriadis G, Monk S, O’Sullivan M, Turnbull A, Roy D, Theocharidis A, Ghazal P, Freeman TC (2010) Construction of a large scale integrated map of macrophage pathogen recognition and effector systems. BMC Syst Biol 4:63
  43. Raza S, Robertson KA, Lacaze PA, Page D, Enright AJ, Ghazal P, Freeman TC (2008) A logic-based diagram of signalling pathways central to macrophage activation. BMC Syst Biol 2:36
  44. Reed JL, Famili I, Thiele I, Palsson BO (2006) Towards multidimensional genome annotation. Nat Rev Genet 7:130–141
  45. Ruths D, Muller M, Tseng JT, Nakhleh L, Ram PT (2008) The signaling petri net-based simulator: a non-parametric strategy for characterizing the dynamics of cell-specific signaling networks. PLoS Comput Biol 4:e1000005
  46. Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, Buetow KH (2009) PID: the Pathway Interaction Database. Nucleic Acids Res 37:D674–679
  47. Thomas PD, Kejariwal A, Campbell MJ, Mi H, Diemer K, Guo N, Ladunga I, Ulitsky-Lazareva B, Muruganujan A, Rabkin S, Vandergriff JA, Doremieux O (2003) PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification. Nucleic Acids Res 31:334–341
  48. van Riel NA (2006) Dynamic modelling and analysis of biochemical networks: mechanism-based models and model-based experiments. Brief Bioinform 7:364–374
  49. Vastrik I, D’Eustachio P, Schmidt E, Gopinath G, Croft D, de Bono B, Gillespie M, Jassal B, Lewis S, Matthews L, Wu G, Birney E, Stein L (2007) Reactome: a knowledge base of biologic pathways and processes. Genome Biol 8:R39
  50. Watterson S, Marshall S, Ghazal P (2008) Logic models of pathway biology. Drug Discov Today 13:447–456
  51. Yeung N, Cline MS, Kuchinsky A, Smoot ME, Bader GD (2008) Exploring biological networks with Cytoscape software. Curr Protoc Bioinformatics Chapter 8:Unit 8 13

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, UK
