Introduction

Microarray chips used for diagnostic purposes contain oligonucleotides specific to pathogens and may carry thousands of oligonucleotide probes. Probes designed for diagnostic assays must be unique to a specific pathogen with respect to all other pathogen genomes, and also with respect to host and other non-specific genome sequences present in clinical samples. As a result, designing pathogen-specific probes requires computationally expensive comparison of target genomes with all known non-target sequences. Many methods have been developed for designing probes for pathogen diagnostic assays; some are intended for PCR-based assays, whereas others are intended for microarray-based assays [7].

Several groups have designed microarrays containing probes for microbial detection, discovery or a combination of both [2, 3, 5, 9]. The Virochip discovery array was one of the first to target a broad range of pathogens; it is best known for its role in characterizing SARS as a coronavirus [10]. Chou et al. [1] computationally designed conserved genus probes and species-specific probes covering 53 viral families and 214 genera, using a specifically designed algorithm. Palacios et al. [3] built the GreeneChipPm, an array of approximately 30,000 probes targeting vertebrate viruses and rRNA sequences of fungi, bacteria and protozoa. Its viral probes were designed to target a minimum of three genomic regions for each family or genus, including at least one highly conserved region coding for polymerase or structural proteins, and two or more variable regions. The Pan-Microbial Detection Array (MDA) [2] is the most comprehensive chip designed for the identification of viruses and other pathogens. We used the conserved probes designed by Chou et al. [1] and found that some of them did not work experimentally, which indicated the need for a new dataset.

Materials and Methods

Preparation of the List of Animal Viruses

In this study, we report a new dataset of microarray probes for the diagnosis of both viruses and virus genera. The list of viruses included in the database was compiled after an exhaustive search and personal discussions with virologists working at different institutes in India. Complete sequences of the listed viruses were extracted from the NCBI (National Center for Biotechnology Information) reference sequence viral database using a Perl script. For viruses with only partial genomes available, known structural gene sequences were downloaded from the NCBI nucleotide database.
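The extraction step described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the script used in this study. It assumes that the reference sequences are already available as FASTA text and that a record is selected when its header mentions a listed virus name together with the phrase "complete genome". The function names, the toy accessions and the toy sequences are all hypothetical.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

def select_listed_viruses(handle, virus_names):
    """Keep records whose header names a listed virus and says 'complete genome'.

    The header-matching rule is an assumption made for this illustration.
    """
    wanted = [name.lower() for name in virus_names]
    selected = {}
    for header, sequence in read_fasta(handle):
        lowered = header.lower()
        if "complete genome" not in lowered:
            continue
        if any(name in lowered for name in wanted):
            selected[header] = sequence
    return selected

# Toy input: accessions and sequences are made up, not real NCBI records.
TOY_FASTA = """>TOY_0001 Newcastle disease virus, complete genome
ACGTACGTAC
GTACGT
>TOY_0002 Bovine herpesvirus 1, partial cds
TTGACC
>TOY_0003 Bovine viral diarrhea virus 1, complete genome
ggcatt
"""

selected = select_listed_viruses(
    StringIO(TOY_FASTA),
    ["Newcastle disease virus", "Bovine viral diarrhea virus"],
)
for header, sequence in selected.items():
    print(header, len(sequence))
```

In this sketch the partial record is skipped; in the study, such viruses were instead represented by structural gene sequences taken from the NCBI nucleotide database.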

Microarray Probe Design

Our aim in designing the microarray chip was to identify a virus at the species level and to detect an unknown virus at the genus level. The probe design strategy is given in Flow diagram 1.
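The distinction between species-level (unique) and genus-level (conserved) probes can be illustrated with a toy sketch. It is not the design pipeline of this study. It assumes exact substring matching on very short sequences, and it ignores reverse complements, mismatch tolerance and melting temperature, all of which a real probe design would need to consider. All function names and sequences are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def unique_probes(target, non_targets, k):
    """Candidate k-mers of the target that occur in no non-target sequence."""
    candidates = kmers(target, k)
    for other in non_targets:
        candidates -= kmers(other, k)
    return candidates

def conserved_probes(genus_members, other_sequences, k):
    """Candidate k-mers shared by every genus member and absent elsewhere."""
    shared = set.intersection(*(kmers(seq, k) for seq in genus_members))
    for other in other_sequences:
        shared -= kmers(other, k)
    return shared

# Toy sequences (made up): two viruses of one genus and a host fragment.
virus_a = "AAACCCGGGT"
virus_b = "AAACCCTTTG"
host = "GGGTTTTACG"

species_level = unique_probes(virus_a, [virus_b, host], k=4)
genus_level = conserved_probes([virus_a, virus_b], [host], k=4)
print(sorted(species_level))
print(sorted(genus_level))
```

Here the candidate GGGT is dropped from the species-level set because it also occurs in the host fragment, which mirrors the requirement that probes be unique with respect to host sequences as well as other pathogens.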

Development of Dataset

In the first phase of dataset development, the probe sequences were collected, arranged alphabetically and structured into three tables so that they could be compiled into a common database: unique probes and conserved probes (both verified computationally as well as experimentally), and rejected probes (which passed computational verification but failed experimentally). The unique probes table has two fields, namely virus name and probe sequence. Similarly, the conserved probes table has genus, sub-group name and probe sequence. All this information was compiled in MS Excel. In its current form, the web database has a total of 20,619 unique and 3,988 conserved probes. In phase two, a structured relational database was designed for easy access by users of the database; MS Access 2007 was used to design the core relational database. In phase three, the databases in MS Access were integrated into HTML web pages so that the database could be accessed over the internet. The current appearance of the dataset is shown in Fig. 1.
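The three-table layout described above can be sketched as a small relational schema. The sketch uses Python's built-in sqlite3 module in place of MS Access and is only an illustration. The field names for the unique and conserved tables follow the text, whereas the single-column layout of the rejected table, the helper function and all inserted rows are assumptions and placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unique_probes (
    virus_name     TEXT NOT NULL,
    probe_sequence TEXT NOT NULL
);
CREATE TABLE conserved_probes (
    genus          TEXT NOT NULL,
    sub_group_name TEXT,
    probe_sequence TEXT NOT NULL
);
-- Assumed layout: the text does not list the fields of this table.
CREATE TABLE rejected_probes (
    probe_sequence TEXT NOT NULL
);
""")

# Placeholder rows; names and sequences are made up.
conn.executemany(
    "INSERT INTO unique_probes VALUES (?, ?)",
    [("Virus B", "ACGTACGT"), ("Virus A", "TTGGCCAA")],
)
conn.execute(
    "INSERT INTO conserved_probes VALUES (?, ?, ?)",
    ("Genus X", "Sub-group 1", "CCAATTGG"),
)
conn.execute("INSERT INTO rejected_probes VALUES (?)", ("GGGGCCCC",))

def filter_candidates(connection, candidates):
    """Drop candidate probes that appear in the rejected probe list."""
    rejected = {
        row[0]
        for row in connection.execute("SELECT probe_sequence FROM rejected_probes")
    }
    return [probe for probe in candidates if probe not in rejected]

# Probes arranged alphabetically by virus name, as in the dataset.
listing = conn.execute(
    "SELECT virus_name, probe_sequence FROM unique_probes ORDER BY virus_name"
).fetchall()
kept = filter_candidates(conn, ["GGGGCCCC", "ACGTACGT"])
print(listing)
print(kept)
```

Keeping the rejected probes in their own table lets a user screen an independent probe set against them without touching the verified unique and conserved tables.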

Flow diagram 1. Designing strategy of oligonucleotide probes

Fig. 1. The outlook of home page and dataset page

Result

The database contains probes for viruses and virus genera, along with a rejected probe list. The rejected probe list contains all those probes that are computationally correct but have been shown experimentally to cross-react across species. Such cross-reactive or sticky probes have also been reported by others [4]. We provide the list of cross-reactive probes so that others can avoid using them in their probe datasets. This dataset was used to identify an unexpected case of Newcastle disease virus in sheep [8] and a mixed infection of Bovine viral diarrhea virus and Bovine herpesvirus in cattle [6]. The dataset can be accessed through the link https://dl.dropboxusercontent.com/u/94060831/avpds/HOME.html.