The growing catalogue of biological data includes information discovered by methods that detect interactions between different biological molecules. Some of these techniques are direct and experimental (e.g., yeast two-hybrid and chromatin immunoprecipitation (ChIP)), while others are indirect, predictive and computational (e.g., phylogenetic profiling [1], protein binding prediction [2], cis-element detection [3] and gene expression profiling). Instances of such interactions are the observed or predicted relationships between genes and proteins, and for the purpose of computational storage and analysis they can be represented as networks of functional association. Tools are needed that gather, display and facilitate analysis of these large data structures.

Interaction discovery techniques continue to emerge and evolve. Although they vary in accuracy, the confidence in any particular association is highest when it is made by a combination of measures [4]. What is true for the individual link is also true for the network: the prediction of functional pathways and regulatory subunits in the cell is best accomplished by combining many measures of interaction, be they experimental or computational, between DNA, proteins, or any other molecules in the cell. The results of Ideker and Thorsson et al. [5], Jansen et al. [6], and Yanai and DeLisi [7] suggest the potential value of combining multiple interaction types in analyzing global systems. In contrast to biological sequence databases, for which uses and applications are well established, the development of databases and associated tools for organizing, mining and analyzing molecular systems has begun relatively recently. To date, the focus in this rapidly evolving field has been mainly on tools for describing and visualizing experimental interaction networks [8-10] derived from gene expression and protein interaction data sources. Research on new methods, both computational and experimental, that describe associations among genes and proteins will continue to necessitate flexible data models that can grow to fit the needs of analysis and visualization. The broader problem of integrating multiple data types will largely depend on the usefulness of these emerging data models.

Published databases such as BIND [11], KEGG [12], Predictome [13] and STRING[14] provide the conceptual platforms on which software for leveraging the full content of the interactome could operate. Early efforts in this area, such as the protein-protein binding databases DIP [15] and PathCalling [16], demonstrated usefulness in dynamic visualization of interaction networks, by allowing users to navigate among links in those particular data sets. Recent visualization and analysis tools such as Cytoscape [8], MintViewer [9] and Osprey [10] have expanded this concept. They include features for viewing and querying larger subsets of the interactome on a more global scale. The tools typically operate from the viewpoint of physical associations between proteins, or correlated gene expression, and include information that summarizes annotated functions, such as Gene Ontology (GO)[17] groupings, among subnetworks of linked genes or proteins. Missing from the current bioinformatics palette, though, is a generic interaction network tool capable of managing and analyzing the more abstract forms of interaction information that are available and regularly published. The VisANT tool, while not a complete solution in responding to this need, is nonetheless robust and useful for many different data types and analyses.

Important features that VisANT offers to the research community are (i) navigation of database-driven interaction and association networks, (ii) visual comparison, manipulation and storage of known networks and uploaded user-defined data, (iii) the ability to uncover orthologous networks, and (iv) the ability to perform exploratory data mining and basic graph operations on arbitrary networks and sub-networks, including loop detection, degree distribution (the distribution of edges per node) and shortest path identification between various component genes or proteins.
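Two of the graph operations named in (iv) can be illustrated with a minimal sketch. It is not taken from the VisANT source code; the function names and the toy network with placeholder node labels are assumptions made for illustration. Degree distribution is computed by counting the edges at each node, and a shortest path is found by breadth-first search on an undirected network.

```python
from collections import Counter, deque

def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def degree_distribution(adjacency):
    """Map each degree k to the number of nodes that have exactly k edges."""
    return dict(Counter(len(neighbors) for neighbors in adjacency.values()))

def shortest_path(adjacency, source, target):
    """Return one path with the fewest edges (breadth-first search), or None."""
    if source not in adjacency or target not in adjacency:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical toy network; the labels A to E are placeholders, not real genes
toy_network = build_adjacency(
    [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "D")]
)
print(degree_distribution(toy_network))
print(shortest_path(toy_network, "A", "E"))
```

Breadth-first search is sufficient here because all edges are treated as equally weighted; a weighted network would call for a different algorithm.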



One of the major design goals is flexibility, both with respect to the assimilation of new types of data and with respect to a graphical interface that can evolve to fit new techniques for describing biological networks. For example, if new computational methods are published for identifying cis-regulatory elements upstream of yeast genes, we want putative interactions derived from these methods to be easily compared with other interactions, such as those determined by chromatin immunoprecipitation (ChIP) assays. For some problems, users might be interested only in experimentally determined interactions, such as protein-protein or protein-DNA binding (the physical interactome). Where experimental data are limited, or biased towards genes with well-understood function, research on gene networks can benefit from the use of systematically derived interactions produced in silico [2]. The increasingly data-driven state of research biology suggests that analysis of high-throughput data is necessarily more exploratory than hypothesis-driven. VisANT is designed to allow a wide range of exploratory questions. A researcher interested in the uncharacterized Saccharomyces cerevisiae gene YLR089C, for example, could explore the network of known interactions in different species for conserved features. Investigations of biologically directed questions, such as finding transcription factors putatively linked downstream of known receptor proteins, could be used in generating hypotheses regarding new molecular functions or modes of gene regulation [8, 18, 19].

Regardless of the motivating problem, users should be able to identify potentially meaningful features of networks such as shortest paths, dense nodes (i.e., nodes with a large number of connected edges), highly connected subgraphs, or network motifs such as directed loops or feedforward loops, which appear to be biologically important [20-23].
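As an illustration of motif detection, the following sketch enumerates feedforward loops in a directed network, that is, triples in which a regulator x targets both y and z while y also targets z. It is a simple enumeration written for this text under that definition, not the VisANT implementation; the function name and the node labels are hypothetical.

```python
def find_feedforward_loops(directed_edges):
    """Return sorted (x, y, z) triples with x->y, y->z and x->z, all nodes distinct."""
    successors = {}
    for source, target in directed_edges:
        successors.setdefault(source, set()).add(target)
    loops = []
    for x, x_targets in successors.items():
        for y in x_targets:
            if y == x:
                continue  # ignore self-regulation
            for z in successors.get(y, ()):
                # z closes the motif only if x also regulates it directly
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)

# Hypothetical regulatory edges (regulator, target); labels are placeholders
toy_regulation = [("TF1", "TF2"), ("TF2", "G1"), ("TF1", "G1"), ("TF2", "G2")]
print(find_feedforward_loops(toy_regulation))
```

The nested loops visit each two-step path once, which is adequate for small subnetworks; genome-scale networks may need a more careful enumeration.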


We applied the Model-View-Controller (MVC) [24] design pattern for the architecture of VisANT. The tiered system separates the data abstraction and retrieval layers from the presentation schema, which improves data integrity and increases flexibility. (Data abstraction here means the process by which a particular data type is represented and stored; for example, a two-dimensional adjacency matrix represents interactions between pairs of proteins.) In particular, the tiered system allows us to put data control logic in the middle tier to protect the data. Because the presentation of the stored data is separated from the data itself, users can modify visual attributes (such as x, y coordinates, node size and labels) without modifying the data stored in the database. Likewise, changes can be made in any tier without affecting the others, which makes the system easily maintainable and extensible.
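The separation between stored data and visual attributes can be sketched as follows. This is a schematic illustration of the MVC idea only; the class and method names are hypothetical and do not correspond to the VisANT code base, and a set of node pairs stands in for the stored interaction data.

```python
class InteractionModel:
    """Model: the stored interactions, exposed read-only."""

    def __init__(self, interactions):
        self._interactions = frozenset(frozenset(pair) for pair in interactions)

    def interactions(self):
        return self._interactions

    def nodes(self):
        return sorted({name for pair in self._interactions for name in pair})


class NetworkView:
    """View: per-node display attributes, kept apart from the model."""

    def __init__(self, model):
        self.layout = {
            name: {"x": 0.0, "y": 0.0, "size": 10, "label": name}
            for name in model.nodes()
        }


class NetworkController:
    """Controller: applies user edits to the view and never writes to the model."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def move_node(self, name, x, y):
        if name not in self.view.layout:
            raise KeyError(f"unknown node: {name}")
        self.view.layout[name].update(x=float(x), y=float(y))

    def relabel_node(self, name, label):
        if name not in self.view.layout:
            raise KeyError(f"unknown node: {name}")
        self.view.layout[name]["label"] = label


# Placeholder protein names P1 to P3
model = InteractionModel([("P1", "P2"), ("P2", "P3")])
view = NetworkView(model)
controller = NetworkController(model, view)
controller.move_node("P1", 5, 7)
controller.relabel_node("P1", "my protein")
```

Moving or relabelling a node changes only the view's dictionary; the model's interaction set is the same before and after.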

The system is implemented in J2EE™ technology, with a web service layer driven by the freely available Tomcat server. This data layer technology is both server- and platform-independent. It can be readily adapted to different computer systems and, with additional effort, to other data sources. This enables other interaction databases to reuse VisANT as a visual analysis tool by implementing a web service layer with a database-specific Application Program Interface and the VisANT data transportation format. Technical details of this implementation can be found in the source code of VisANT and in its user manual, which is available at [25].

VisANT is accessed through its main web page [26] using a compatible web browser and Java Runtime Environment (JRE). The visual tool has been tested on Netscape, Mozilla and Internet Explorer browsers, running Java (JRE 1.1 or greater) on Windows 2000/XP, Linux and Mac OS X. For the most reliable performance of the software, we recommend using a newer version of Java (JRE ≥ 1.4), freely available from Sun at [27]. The source code for VisANT is available on request.

Visual exploration of biological interaction networks

The main interface of VisANT, the network visualization panel (Fig. 1), displays a set of nodes (vertices), which correspond to user-selected gene IDs, joined by connecting lines (edges), which correspond to the methods that uncovered the connections. Each vertex thus contains annotation information, and each edge stores the method used in assigning the link. Different methods are distinguished on the screen by edges of different colors; consequently, different edges can have different meanings. Some represent actual physical interactions between proteins (e.g., from yeast two-hybrid); some connect a transcription factor to the protein encoded by the gene downstream of the regulatory sequence to which it binds (ChIP); others represent correlated functions (e.g., those determined by phylogenetic profiling [28]). Edges between transcription factors and the products of the genes they regulate are drawn as arrows to indicate causal direction. All other edges are currently undirected.

Figure 1

Sample view of a VisANT application. Displayed are connections in a segment of the MAPK regulatory network constructed from data from Lee et al. [29] (brown lines with arrows, indicating binding of protein to DNA) and correlations in microarray experiments published by Hughes et al. [30] (green lines), as well as links established by protein-protein binding and other methods. Genes for membrane-bound receptors, related pathway proteins and transcription factors linked by physical interaction and gene expression relations are shown. Nodes represent proteins or DNA. Red nodes represent proteins that are annotated in at least one KEGG pathway (the quick-tip of node STE12 indicates that it maps to KEGG pathway 04010). A "-" indicates that the node is fully expanded (i.e., all connections are shown), while a "+" indicates that some links have not yet been displayed. Associations between nodes are indicated by connecting lines (edges), with different colors corresponding to different methods.

A network is constructed by entering ORF IDs, GI numbers, or even KEGG pathway IDs for an arbitrary number of genes, and using data obtained by one or any combination of the methods shown in the methods menu. Nodes corresponding to the selected genes then appear on the screen and, by left-clicking one or more times, can be expanded into an increasingly complex set of interactions. Figure 1 is a screenshot of VisANT showing the connections in a segment of the MAPK regulatory network constructed from data from Lee et al. [29] and correlations in microarray experiments published by Hughes et al. [30]. VisANT algorithms find paths between receptors (STE2, SHO1 and MID2) and transcription factors (STE12, SWI4) in the MAPK network, revealing complex feedback relationships that possibly contribute to regulatory control in these pathways.

Additional functionality is supported by the Predictome database, which maintains look-up tables that store and associate synonyms and annotations for the same protein/gene, and which also facilitates the integrative analysis of the network with function, structure and sequence annotation. VisANT also provides functions to load user-defined interaction data with a single mouse-click, enabling easy comparison between different data sets. The number of viewable genes, proteins and interactions can range from a few (as shown in Fig. 1) to thousands. To simplify and help filter the larger data sets, different layout algorithms, combined with built-in basic graph operations such as the detection of closed loops, help to isolate network topology features that have potential biological implications [20-22, 31, 32].

The relaxing layout algorithms implemented in VisANT are all based on a similar core heuristic algorithm [33], which models a two-dimensional network of physical objects with mechanical forces operating along the edges. The source code for these algorithms is based on modifications of a layout program distributed by Sun Microsystems [27]. Although the algorithms have no biological meaning, they successfully separate the graph by the density of the connections between subgroups of nodes, providing a visual method of identifying relatively dense subgraphs within larger networks. Additional graph operations are generally provided through the various filters, whose functions are detailed in the user manual on the VisANT website.
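The general idea of such a relaxing layout can be sketched with a generic spring-and-repulsion scheme. The code below is not the algorithm of [33] nor the layout program on which VisANT is based; it is an illustrative variant in which each edge acts as a spring with a preferred length and every pair of nodes repels. All parameter names and default values are assumptions chosen for the sketch.

```python
import math

def relax_layout(nodes, edges, iterations=200, rest_length=1.0,
                 spring=0.1, repulsion=0.05, max_step=0.1):
    """Return {node: (x, y)} after a simple spring-and-repulsion relaxation."""
    if not nodes:
        return {}
    count = len(nodes)
    # Deterministic start: nodes evenly spaced on the unit circle
    pos = {name: [math.cos(2 * math.pi * i / count),
                  math.sin(2 * math.pi * i / count)]
           for i, name in enumerate(nodes)}
    for _ in range(iterations):
        force = {name: [0.0, 0.0] for name in nodes}
        # Springs along edges move linked nodes toward the rest length
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = spring * (dist - rest_length) / dist
            force[a][0] += pull * dx
            force[a][1] += pull * dy
            force[b][0] -= pull * dx
            force[b][1] -= pull * dy
        # Every pair of nodes repels with an inverse-square force
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion / dist ** 3
                force[a][0] -= push * dx
                force[a][1] -= push * dy
                force[b][0] += push * dx
                force[b][1] += push * dy
        # Move each node along its net force, capping the step for stability
        for name in nodes:
            fx, fy = force[name]
            step = math.hypot(fx, fy)
            scale = min(1.0, max_step / step) if step > 0 else 0.0
            pos[name][0] += scale * fx
            pos[name][1] += scale * fy
    return {name: (x, y) for name, (x, y) in pos.items()}

# Two linked placeholder nodes settle where spring pull balances repulsion
layout = relax_layout(["A", "B"], [("A", "B")])
```

Because linked nodes are drawn together while all nodes push apart, densely connected groups tend to contract relative to sparsely connected ones, which is the visual effect described above.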

Data integration

Our public VisANT implementation currently draws information from the Predictome database, based on data from 66 fully sequenced microbial genomes. Higher eukaryotes, including worm, human and mouse, are not yet supported, although we do include parsed versions of their genomes so that networks orthologous to those in microbes can be mined. Computational methods in this database include phylogenetic profiling, gene fusion and gene proximity. Experimental data drawn from publicly available sources include protein-protein and protein-DNA interactions (S. cerevisiae), as well as gene expression correlation and association data. VisANT provides a general platform for integrative research on interaction networks in the context of pathway, sequence, structure and associated annotation. Pathway data are provided by the KEGG database through the KEGG Markup Language (KGML) [35], which is currently available only for metabolic pathways. The COG [36] database is used to provide homology information for relationships between species. Annotation information is drawn from KEGG and the Gene Ontology, and cross-referencing of genes and proteins to GenBank [37] and SwissProt is provided.

When an interaction/association is discovered by more than one method, the corresponding edge is divided into segments whose colors correspond to the methods. These colors can be customized using the built-in method table. An example can be found in Figure 2A, which shows that the interaction between DIG1 and FUS3 is recovered by three different experimental methods. Red nodes in Figure 2A are those that have been mapped to KEGG pathways. For example, quick-tips show that STE12 has been mapped to KEGG pathway 04010, while CHA1 is mapped to both KEGG pathways 00260 and 00272. KEGG pathways are directly referenced from within VisANT, with corresponding nodes highlighted as shown in Figure 2C. For S. cerevisiae proteins, annotation from SGD has also been referenced, as shown in Figure 2D. GenBank [37] sequence information has been referenced in similar fashion.
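The bookkeeping behind multi-colored edges can be sketched as follows: records from different methods are grouped by gene pair, and each supporting method contributes one colored segment. The function names, method labels and colors are hypothetical, and for simplicity the sketch treats every association as undirected.

```python
def merge_evidence(records):
    """Group (gene_a, gene_b, method) records into one edge per unordered gene pair."""
    edges = {}
    for gene_a, gene_b, method in records:
        key = tuple(sorted((gene_a, gene_b)))
        methods = edges.setdefault(key, [])
        if method not in methods:
            methods.append(method)  # keep first-seen order, drop repeats
    return edges

def edge_segments(methods, method_colors, default_color="gray"):
    """Return one (method, color) segment per supporting method."""
    return [(method, method_colors.get(method, default_color)) for method in methods]

# Placeholder proteins, method labels and colors
records = [
    ("P1", "P2", "method_a"),
    ("P2", "P1", "method_b"),
    ("P1", "P2", "method_a"),
    ("P2", "P3", "method_c"),
]
method_colors = {"method_a": "red", "method_b": "blue"}
merged = merge_evidence(records)
segments = edge_segments(merged[("P1", "P2")], method_colors)
```

The number of methods attached to an edge also gives a simple count of independent support for that association.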

Figure 2

Illustration of data integration in VisANT. (A) The MAPK-related network constructed from receptors and transcription factors in the pheromone-response pathway. Purple rectangles show the quick-tips obtained by mouse-overs of the edge between DIG1 and FUS3, and of the nodes CHA1 and STE12, respectively. Most integration data are available only after the node has been queried against the databases, and are found under the "Available Links" submenu of the node. (B) GenBank [37] record of the human homolog of CHA1, based on the COG database. The homology information is available after the corresponding filter has been processed. (C) STE12 is mapped to KEGG pathway 04010 (MAPK Signaling Pathway), and the pathway has been loaded with the corresponding nodes highlighted. (D) Functional annotation of STE5, loaded through the cross-reference to the SGD [49] database.

Interaction networks of arbitrary genes in microbes can be projected onto groups of orthologous human genes, providing hypothetical relationships between human genes. This projection is based on the COG ortholog database, coupled with filters provided by VisANT. Figure 2A shows that CHA1 has two orthologous proteins in human (GI numbers 19923959 and 5803161), and Figure 2B displays the GenBank [37] record of human protein 19923959, directly referenced in VisANT.
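The projection step can be sketched as a mapping through shared ortholog groups: each source gene is replaced by the target-species genes in the same group, and every edge is carried over to all resulting pairs. The function name, the dictionary layout and all identifiers below are hypothetical and do not reflect the actual COG file format or VisANT's filters.

```python
def project_network(edges, gene_to_groups, group_to_target_genes):
    """Project source-species edges onto target-species genes via ortholog groups."""

    def targets(gene):
        found = set()
        for group in gene_to_groups.get(gene, ()):
            found.update(group_to_target_genes.get(group, ()))
        return found

    projected = set()
    for gene_a, gene_b in edges:
        for target_a in targets(gene_a):
            for target_b in targets(gene_b):
                if target_a != target_b:  # skip self-links created by the mapping
                    projected.add(tuple(sorted((target_a, target_b))))
    return sorted(projected)

# Placeholder identifiers: y* are source genes, G* ortholog groups, h* target genes
source_edges = [("yA", "yB"), ("yB", "yC")]
gene_to_groups = {"yA": ["G1"], "yB": ["G2"]}
group_to_target_genes = {"G1": ["h1", "h2"], "G2": ["h3"]}
print(project_network(source_edges, gene_to_groups, group_to_target_genes))
```

Edges involving a gene with no ortholog group (yC in this toy input) are dropped, and a gene with several orthologs produces several hypothetical links, which should be read as candidates rather than established interactions.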

Network storage and sharing

Visualization and comparison of different interaction networks (networks obtained with separate methods) is an important means of validating the methods and of understanding their relative contributions to functional understanding. VisANT allows users to enter customized data sets through the control panel, as shown in Figure 1, and to overlay these data sets upon one another or upon published data sets. Where multiple data sets based on similar methods have been published (e.g., yeast two-hybrid screening in S. cerevisiae), the reference to each source is cited. The data format for user-specified data is simple tab-delimited text, and can represent either directed or undirected associations. VisANT also provides password-protected saving of each customized graphical workspace to allow further analysis of a particular network at any time from anywhere on the internet. In addition, these individual workspaces can be securely shared, to promote collaboration within and among research groups.
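Reading and overlaying tab-delimited data sets can be sketched as below. The exact column layout accepted by VisANT is described in its user manual and is not reproduced here; the sketch assumes a hypothetical layout of two node names followed by an optional "directed" flag, with "#" marking comment lines.

```python
import csv
import io

def parse_interactions(text):
    """Parse tab-delimited rows: node_a, node_b, optional 'directed' flag (assumed layout)."""
    edges = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        if len(row) < 2:
            raise ValueError(f"expected at least two columns, got: {row!r}")
        directed = len(row) > 2 and row[2].strip().lower() == "directed"
        edges.append((row[0].strip(), row[1].strip(), directed))
    return edges

def overlay(first, second):
    """Return (shared, only_first, only_second); undirected edges ignore node order."""

    def normalise(edge):
        node_a, node_b, directed = edge
        if directed:
            return (node_a, node_b, True)
        low, high = sorted((node_a, node_b))
        return (low, high, False)

    first_set = {normalise(edge) for edge in first}
    second_set = {normalise(edge) for edge in second}
    return first_set & second_set, first_set - second_set, second_set - first_set

# Placeholder data sets
user_data = parse_interactions("P1\tP2\nP3\tP4\tdirected\n# comment\n\n")
published = parse_interactions("P2\tP1\tundirected\nP4\tP3\tdirected\n")
shared, only_user, only_published = overlay(user_data, published)
```

Treating a directed edge as distinct from its reverse keeps regulatory direction intact when two data sets are compared.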


Although networks and pathways can be visualized and navigated using clickable images [15], the data mining process requires more than visualization. Visual data mining is mediated by a collection of interactive methods that support exploration of data sets by adjusting parameters to see how they affect the information being presented. The functionalities provided by VisANT reflect this approach, especially as it applies to biological networks.

Both genome-wide and conventional interaction data can be noisy and error-prone [38]. The integration of interaction data from various data sources is critical for improving the accuracy of these data [38-42]. Data integration also requires the unification of heterogeneous data (such as expression data, sub-cellular localization information and functional categories) into one general data model so that different analyses can be carried out easily. Clustering of gene expression, for example, may be guided by knowledge of protein localization, or by participation of genes in the interaction network [43].

Other visual integration tools, such as Cytoscape [8], Osprey [10] and GenMAPP [44], are able to display varying aspects of physical interaction and expression data and relate these to function and pathway annotation. VisANT differs conceptually from these tools in the notion that all such information (interaction, expression, function) can be represented and analyzed as a network. The dimensions of these networks can be very large, thus presenting a major and still incompletely met challenge for visual integration and computation. Table 1 summarizes the differences between VisANT, Cytoscape and Osprey.

Table 1 Comparison of VisANT against Cytoscape and Osprey

Future Directions

The goal of the VisANT project is to provide a general platform for visually mining process-level annotation [45]. This annotation, sometimes called functional annotation, relates the genome to cellular processes: growth, apoptosis, differentiation and so on. Our first step focuses on protein/gene interaction mining and visualization. As the focus shifts from individual interactions to functional modules, pathways and networks, corresponding functions will be implemented to support further analyses, including simulation of cellular activities.

Specifically, VisANT enhancements in the near future will include the following:

  1. Visualization and graph manipulation. The data model will be further generalized to represent different types of bio-objects and the interactions between them. Visual representation of nodes and edges will be enhanced and standardized [46, 47]. An immediate goal is to produce a data model and related functionalities that support abstract groupings and modularity based on function or experimental evidence, in order to facilitate full integration with groupings such as KEGG pathways, GO annotations, and diverse objects such as protein complexes [23]. These groupings enable a more modular analysis of structure within interaction networks.

  2. Inclusion of the full complement of KEGG pathways.

  3. Support for higher eukaryotes including worm, human and mouse. Analysis and comparison of interactions across species will continue to be improved. Specifically, we are interested in the concept of cross-species mapping to facilitate direct comparison of the conservation of networks between different organisms.

  4. The implementation of additional features for integrating data sources. For example, VisANT will be able to load microarray data either from standard databases (such as GEO [48]) or from a user's local file. Third-party open-source software, such as TM4 [34], will be integrated to enable direct analysis of expression data in the context of mined networks.

  5. Enhancement of VisANT's architecture to enable pluggable parsers and filters, providing flexible interfaces that facilitate the integration of heterogeneous data sources and third-party analyses. Correspondingly, VisANT will be able to run both as a signed online Java applet and as a standalone application.

We expect that these and other directions of VisANT will also be augmented and assisted by feedback from the research community.