Introduction

Quantitative Trait Loci (QTL) data are increasingly used to aid the discovery of candidate genes involved in phenotypic variation. Even a well-defined QTL, however, may contain tens to hundreds of genes. It is therefore vital that the identification, selection and functional testing of candidate Quantitative Trait genes (QTg) are carried out systematically and without bias [1]. With the advent of microarrays, researchers can directly examine the expression of all genes on a genome-wide scale, including those underlying QTL regions.

The scale of data generated by such high-throughput experiments has led some investigators to follow a hypothesis-driven approach [2]. Although these techniques for candidate gene identification are valid, selecting genes on the basis of prior assumptions about which processes are involved runs the risk of overlooking genes that have less obvious associations with the phenotype, including genes that may in fact be involved in it. A further complication is that ad hoc methods for candidate gene identification are inherently difficult to replicate, a problem compounded by poor documentation, in the published literature, of the methods used to generate and capture the data from such investigations.

With an ever-increasing number of institutes offering programmatic access to their resources in the form of web services, however, experiments previously conducted manually can now be replaced by automated experiments capable of processing a far greater volume of data. By reconstructing the original investigation methods in the form of workflows, we are able to pass data directly from one service to the next. This enables us to process the data in a much more systematic, unbiased, and explicit manner.

Methods

We propose a data-driven methodology that identifies the known pathways containing genes that lie within a QTL region, together with the pathways derived from a set of genes differentially expressed in a microarray study. This methodology is implemented systematically through the use of web services and workflows. To implement this systematic pathway-driven approach, we chose the Taverna workbench [3].
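The pathway comparison described above can be illustrated with a minimal Python sketch. It is not the Taverna workflow itself. It assumes that the two pathway sets are compared by simple set intersection, and that a gene-to-pathway mapping has already been retrieved (in practice this would come from a pathway database queried through web services). All gene and pathway identifiers, and the function names `pathways_for_genes`, `shared_pathways` and `candidate_genes`, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def pathways_for_genes(genes: Set[str],
                       gene_to_pathways: Dict[str, Set[str]]) -> Set[str]:
    """Collect every pathway annotated to at least one gene in `genes`."""
    pathways: Set[str] = set()
    for gene in genes:
        # Genes without any pathway annotation contribute nothing.
        pathways |= gene_to_pathways.get(gene, set())
    return pathways


def shared_pathways(qtl_genes: Set[str],
                    de_genes: Set[str],
                    gene_to_pathways: Dict[str, Set[str]]) -> Set[str]:
    """Pathways reached from both the QTL genes and the differentially
    expressed genes (assumption: the comparison is a set intersection)."""
    qtl_pathways = pathways_for_genes(qtl_genes, gene_to_pathways)
    de_pathways = pathways_for_genes(de_genes, gene_to_pathways)
    return qtl_pathways & de_pathways


def candidate_genes(qtl_genes: Set[str],
                    pathways: Set[str],
                    gene_to_pathways: Dict[str, Set[str]]) -> Set[str]:
    """QTL genes that take part in at least one of the given pathways."""
    return {
        gene for gene in qtl_genes
        if gene_to_pathways.get(gene, set()) & pathways
    }


# Hypothetical annotation data, for illustration only.
GENE_TO_PATHWAYS: Dict[str, Set[str]] = {
    "geneA": {"pathway1", "pathway2"},
    "geneB": {"pathway3"},
    "geneC": {"pathway2"},
    "geneD": {"pathway4"},
}
QTL_GENES: Set[str] = {"geneA", "geneB"}   # genes lying within the QTL
DE_GENES: Set[str] = {"geneC", "geneD"}    # differentially expressed genes

if __name__ == "__main__":
    common = shared_pathways(QTL_GENES, DE_GENES, GENE_TO_PATHWAYS)
    print(sorted(common))
    print(sorted(candidate_genes(QTL_GENES, common, GENE_TO_PATHWAYS)))
```

Because selection depends only on the annotation data and the two input gene lists, no prior assumption about which biological processes are relevant enters the procedure, which is the property the data-driven approach is intended to provide.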

Results and Discussion

Preliminary studies into the modes of resistance to African trypanosomiasis were carried out in the mouse model organism. These studies illustrated how the large-scale analysis of microarray gene expression and QTL data, investigated at the level of biological pathways, enables links between genotype and phenotype to be established [4]. This approach was implemented systematically through the use of explicitly defined workflows.