Introduction

Microarray technology is a promising approach for studying the expression levels of many genes simultaneously, or for genotyping multiple regions of a genome, in a particular cell type of an organism at a particular time and under particular conditions. This allows comparison of gene expression between normal and infected cells. The method involves placing thousands of gene sequences at known locations on a glass slide called a gene chip. Because such chips make it easy to implement a huge number of tests for identification of viral agents affecting humans, these tests are helpful for blood screening; they can also be used as a diagnostic tool, albeit an imperfect one, since they provide only an indirect measure of infection and cannot advise clinicians on whether the infection is recent or ongoing, or on response to therapy [52, 102, 104]. Antibody-based testing can also fail to detect current infection, because it usually takes days to weeks for the immune system to raise an antibody response to an infectious agent [56, 79, 125]. In the past, tests have been developed based on direct quantification of the infectious agent in a sample taken from the patient [41]. Such tests detect the presence of nucleic acids (the genetic material) of infectious agents in blood or other samples. The most common method is the polymerase chain reaction (PCR), which can detect 100 copies or more of an infectious agent in a single sample [29, 79]. PCR employs an enzymatic reaction to amplify specific nucleic acid sequences of infectious agents that may be present in the sample [59, 114]. There are several problems with this technique [78]: viral agents can mutate very quickly, and PCR primers may therefore not recognize the infectious agent, producing false or weak results [22]. An alternative approach is based on direct hybridization of the infectious agent’s nucleic acid to a synthetic nucleic acid probe [23, 62]. The hybridized infectious agent is then detected by extension using nonenzymatic methods. The main disadvantage of this technique is that it requires a larger quantity of the patient’s sample (a minimum of 1 ml of blood, compared with 0.1 ml for PCR) [51, 70, 100]. Virus culture generally offers good specificity, but not all viruses can be cultured, and technical expertise is required to interpret the cytopathic effect (CPE) and to read stained preparations. Moreover, this method is time consuming and labor intensive owing to the long incubation period of some viruses, and it is very difficult to culture a variety of cell types at once [104, 110].

Over the last two decades, and still today, viral diseases have remained a major cause of human mortality. In recent times, the emergence of infectious diseases has become more serious, as represented by new and re-emerging viruses causing acquired immunodeficiency syndrome, acute encephalitis syndrome, Hendra and Nipah virus disease, severe acute respiratory syndrome (SARS), and avian influenza [27, 28, 73]. Many factors contribute to the emergence of viral infections, potentially including genetic exchange or mutation, adaptation to new hosts or vectors, rapid transport, trade, migration of people, and changing values or lifestyles [8]. The resulting rapid epidemiological changes in the community mean that both new and old viruses can emerge and cause outbreaks at unexpected times and locations. The fight against such emergent viral infections requires the development of a comprehensive strategy.

Using conventional techniques to analyze gene expression, researchers are able to survey only a relatively small number of genes at once [24]. Microarray profiling offers many potential advances in diagnostic and therapeutic interventions in human diseases because of its unparalleled capacity for high-throughput gene expression analysis. This technology provides powerful tools for the scientific community [15], and scientists are using microarray technology to try to understand fundamental aspects of growth and development as well as to explore the underlying genetic causes of many human diseases [48, 49]. However, the limitations of this technique relate in part to issues regarding the various methodologies and experimental designs, as well as difficulties in the interpretation of results. Despite these limitations, microarray technology has been used effectively in disease diagnosis.

Acute Encephalitis Syndrome

Encephalitis is inflammation of the brain parenchyma. More than 100 different infectious pathogens and several toxins have been identified as causative agents of encephalitis, although in many cases no pathogen can be detected. Accurate etiological diagnosis is required to increase the usefulness of surveillance of acute encephalitis, especially in view of concerns about new and re-emerging infections. Viruses that infect the central nervous system (CNS) may selectively involve the spinal cord (myelitis), brain stem (e.g., rhombencephalitis), cerebellum (cerebellitis), or cerebrum (encephalitis). Almost every acute viral CNS infection results in meningeal as well as parenchymal inflammation to varying degrees [12]. Fundamental clinical and laboratory findings are broadly similar despite the different causative agents, consisting of fever and headache together with altered cerebral function, frequently accompanied by seizures and focal neurologic abnormalities. Cerebrospinal fluid (CSF) is abnormal in >90 % of cases, characteristically showing lymphocytic pleocytosis, slightly elevated protein level, and normal glucose [113]. In some diseases, such as West Nile virus (WNV) meningoencephalitis or cytomegalovirus (CMV) radiculomyelitis, polymorphonuclear cells rather than lymphocytes may be the principal cell type, providing useful diagnostic evidence. Nevertheless, in spite of these variations, standard CSF study rarely leads to exact identification of the etiologic agent [11].

A clinician may be faced with a patient of any age, at any time of year, presenting with acute-onset fever and altered mental status, including symptoms such as confusion, disorientation, coma, and inability to talk, or with new-onset seizures (excluding simple febrile seizures) [61]. Other features may include increased irritability and abnormal behavior greater than that seen with the usual febrile illness; such features enable initial differentiation of encephalitis from noninfectious causes of brain dysfunction (encephalopathy) [11, 41, 97].

A huge number of tests to detect viral pathogens are available, but they have not been validated for identification of divergent viruses; with traditional assay methods, researchers can examine only a relatively small number of targets at once [24]. Cell culture using the traditional tube method can be used for isolation and detection of a wide variety of viruses, including unanticipated agents and mixed infections, as well as for antiviral susceptibility testing, serotyping, and epidemiologic studies. It offers increased sensitivity over rapid antigen tests, but requires a long incubation period for some viruses as well as in-house acquisition and maintenance of a variety of cell culture types. Shell vials with centrifugation can also be used, but reading of pre-CPE stained preparations is both time consuming and labor intensive; in addition, unanticipated agents may be missed when pre-CPE staining targets only one or a few viruses, and isolates from fixed/stained vials are not available. Nonculture antigen detection using immunofluorescence (IF) with monoclonal antibodies (mAbs) takes about 40 min per sample and generally offers good sensitivity (which varies with the virus detected) and excellent specificity; CMV antigenemia testing, for example, is more sensitive than traditional or shell-vial culture for detecting CMV in blood. In general, however, antigen detection is not as sensitive as cell culture, requires expertise for reading, and is not useful for all viruses; sensitivity for flaviviruses is especially poor. Non-IF antigen detection requires about 30 min per sample and offers generally good specificity for respiratory syncytial virus (RSV) and influenza A and B viruses. No special technical expertise is required, and results are available very rapidly, enabling application in point-of-care testing. However, it generally offers poor sensitivity compared with cell culture and is currently available only for RSV and influenza A and B viruses, so additional testing of negative samples by cell culture is recommended. The most common molecular method is PCR, which can detect 100 copies or more of an infectious agent in a single sample [79]. PCR uses an enzymatic reaction to amplify specific nucleic acid sequences from the infectious agent if they are present in the sample [114]. There are several problems with this method [78]. PCR uses specific nucleic acid sequences (primers) derived from a known sequence of the infectious agent [55, 88]; therefore, if the infectious agent has not been sequenced, PCR cannot be used. Similarly, if the infectious agent mutates very rapidly, the primers may not recognize it and a false-negative test will result [21, 22]. This is a major problem for detection of viruses, which undergo very rapid mutation, especially in response to drug treatment [14, 27]. Since PCR uses an enzymatic reaction, the enzyme can be inhibited by impurities in the patient sample, also leading to false-negative results [7]. In addition, the short primer sequences are specific to an infectious agent only at defined temperatures, making the test reliant on very strict conditions [117]. The specificity of the primers also makes it difficult to detect more than one agent simultaneously in a single PCR reaction [19, 46]. Multiplex PCR reactions exist, but generally they are not quantitative and can detect only two or at most three agents concurrently [6].

Detection of the multiple viruses causing acute encephalitis syndrome (AES) is important for global analysis of gene content and expression, opening prospects for new molecular and physiological approaches to pathogen diagnosis. Early diagnosis is crucial for disease treatment and control, as it reduces inappropriate use of antiviral therapy and focuses surveillance activity [85, 88]. This requires the ability to detect and accurately diagnose infection at or close to the source/outbreak with minimum delay, and calls for specific, accessible point-of-care diagnostics able to distinguish causative viruses and their subtypes. None of the available viral diagnostic assays combines a point-of-care format with the capability to identify a large range of viruses causing AES. Biomedical research evolves and advances not only through the compilation of knowledge but also through the development of new technologies [9]. Microarray detection provides a useful, labor-saving tool for detection of multiple viruses, offering several advantages such as convenience and prevention of cross-contamination. Microarray technology aims to monitor a whole genome on a single chip, providing researchers with a clearer picture of the interactions among thousands of genes simultaneously [25]. This represents a major methodological advance and illustrates how the advent of a new technology can provide powerful tools for research [15].

Microarrays can help answer increasingly complex questions and support more sophisticated experiments. Researchers may be able to infer possible functions of new genes based on similarities in expression profile with known genes [32, 115]. Such studies also increase the number of accessible gene families, providing a novel guide to coordinated gene expression across gene families as well as to entirely new groups of genes. In addition, since the expression of any gene usually interrelates with that of many others, knowledge of how these genes are organized can be improved through such analyses, and precise information on these interrelationships will emerge. Use of microarrays may also accelerate identification of genes involved in the development of various diseases by enabling scientists to examine a much larger number of genes [104]. This technology will also assist with assessment of gene expression and function at the cellular level, illuminating how multiple gene products work together to produce coordinated chemical responses to both static and changing cellular needs. Scientists may use microarray technology to try to understand fundamental aspects of growth and development as well as to explore the underlying genetic causes of many human diseases [114, 122].

Principles of Microarray Technology

Microarray technology is based on hybridization between two DNA strands, formed of complementary nucleic acids linked by tight noncovalent bonds. In the hybridization technique, nonspecifically bound sequences are removed by washing, while only strongly paired strands remain hybridized. Consequently, fluorescently labeled target sequences designed to bind a particular probe sequence create a signal whose strength depends on the stringency of the hybridization and the number of paired bases, the hybridization conditions (such as temperature), and washing after hybridization. The overall strength of the signal from a spot depends on the amount of test sample bound to the corresponding probes at that location [84]. Hybridized targets can then be detected using one of many reporter-molecule systems. Microarrays use relative quantification, in which the intensity of a feature is compared with the intensity of the same feature under a different condition, and the identity of the feature is known by its position [127].

Types of Microarray

Two types of DNA microarray are widely available for data analysis: complementary DNA (cDNA) arrays and oligonucleotide arrays (Fig. 1).

Fig. 1 Microarray assignment workflow

cDNA Arrays

This type of chip offers a high-density microarray, most often derived from cDNA (hence the name). Such chips are usually made by robotically spotting cDNA onto a large glass surface [76, 77]. Hybridization is carried out using fluorescently labeled messenger RNA (mRNA) corresponding to the cDNA, and hybridized duplexes are identified by two-color fluorescence detection. Thus, these arrays can be used to study gene expression patterns in time and space. If a gene is overexpressed in a particular disease state, then more sample cDNA, as compared with control cDNA, will hybridize to the spot representing that gene; in turn, the red fluorescence of this spot will have greater intensity than its green fluorescence [22]. Once the expression profile of different genes involved in a disease has been established, cDNA derived from a sample taken from any person can be hybridized to determine whether the expression profile of that individual’s genes corresponds to the expression profile of a known disease [113].

Oligonucleotide Arrays

In these arrays, the expression level of a gene is assessed by means of a probe set consisting of 11–20 individual probe pairs; in the most recent gene chips, the number of probe pairs has stabilized at 10–13 [27, 28, 96]. Every probe pair includes a perfect-match 22–25-mer oligonucleotide probe, designed to hybridize exclusively to a unique gene transcript, and a mismatch probe of the same length, which differs from the perfect-match probe by a single base at the center of the sequence [46, 55, 118]. The purpose of the mismatch probe is to estimate nonspecific hybridization. Probe-set algorithms created by Affymetrix read the signals from each probe set to derive expression values from the hybridization pattern of the 22–25 probes. Each perfect-match probe has a corresponding mismatch probe containing the same 22–25-base sequence, except that the middle base (position 11–13) is substituted with the complement of the corresponding base in the perfect-match probe. This provides an estimate of nonspecific binding, which occurs when nontargeted mRNA binds to the perfect-match probe [60, 94, 95]. Moreover, it has been shown that the signal observed at perfect-match probes is not entirely specific; in particular, when a transcript is present at very high levels, labeled target also hybridizes to the mismatch probe [113, 114]. However, observed expression levels also include variation introduced during the process of carrying out the experiment, which can be classified as obscuring variation [57]. Affymetrix has approached the normalization problem by proposing that intensities be scaled so that each array has the same average value, making the distribution of probe intensities the same across a set of arrays [80, 89, 96]. Both parametric and nonparametric methods have been proposed to achieve this. All of these approaches depend on the choice of a baseline array. Currently, it is up to the researcher to decide which result is most significant for their particular purpose: high sensitivity and low variability, or a low false-positive rate [66].
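
As an illustration of the perfect-match/mismatch logic and the global scaling normalization described above, the following minimal Python sketch (with invented intensity values) estimates a probe-set signal by subtracting mismatch from perfect-match intensities and scales a set of arrays to a common mean; it is a simplification for illustration, not the actual Affymetrix algorithm.

```python
import numpy as np

def probe_set_signal(pm, mm):
    # Subtract mismatch (MM) from perfect-match (PM) intensities to
    # remove the nonspecific component, then average the positive
    # differences across the probe set.
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    diff = pm - mm
    diff[diff <= 0] = np.nan          # ignore pairs where MM >= PM
    return float(np.nanmean(diff))

def scale_arrays(arrays, target_mean=500.0):
    # Global scaling: multiply each array so that all arrays share
    # the same mean intensity (the baseline/target value is arbitrary).
    return [a * (target_mean / np.mean(a)) for a in arrays]

# One hypothetical probe set of 11 probe pairs
pm = [820, 910, 640, 1200, 760, 980, 700, 860, 1100, 930, 810]
mm = [310, 280, 400, 350, 300, 290, 330, 310, 280, 320, 300]
print(probe_set_signal(pm, mm))       # average PM-MM signal

a1 = np.array(pm, dtype=float)
a2 = 1.6 * a1                          # same slide scanned "brighter"
print([round(a.mean()) for a in scale_arrays([a1, a2])])  # both ~500
```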

Methods

Briefly, a capture sequence is immobilized on the microarray surface and binds to the target RNA during hybridization. The captured target is labeled with an additional fluorophore-conjugated DNA oligonucleotide (the label sequence). Positive control spots, in which a capture sequence hybridizes directly to a complementary label sequence, are included to aid visual analysis. After hybridization and stringent washing, the microarray is scanned in a laser-based fluorescence scanner with 5-μm resolution [8].

Sequence Selection and AES Chip Microarray Design

AES virus-specific capture and label sequences may be selected [97]. The possibility of false-positive signals resulting from direct hybridization of label sequences to capture sequences is examined by incubation of the label sequences in the absence of any other nucleic acids, at room temperature for 2 h, in standard hybridization buffer. Capture sequences found to cross-react with label sequences are removed from the array layout, along with the corresponding label sequences, and the arrays are reprinted [33]. This process is repeated until the microarray shows no false-positive signals in the absence of viral RNA. The resulting array contains the capture sequences and their corresponding label sequences. Each capture sequence is spotted in triplicate, and a single capture sequence with a complementary fluorescently labeled sequence in solution is used as a positive control on each array. The positive control serves as a direct indication of whether the hybridization conditions are adequate and also as a spatial marker for ease of orientation [40, 53, 60, 67].

Microarray Slide Preparation

Arrays for these studies are printed with an OmniGrid microarray spotter using solid-core pins, with a 550-μm pitch between spots.

Samples

Viral samples can be purified from whole blood, plasma, serum, throat swabs, cerebrospinal fluid, virus-infected culture supernatants, and other cell-free body fluids [101].

Chip Processing

Figure 1 shows a schematic of the dispensing protocol. The details of each processing step are described below.

Nucleic Acid Extraction

Nucleic acids may be extracted from clinical samples using a nucleic acid purification kit (omitting RNase digestion) or manually by TRIzol-based methods, according to the manufacturer’s recommended protocols. RNA is bound to an advanced silica gel membrane under optimized buffering conditions [39, 97]. A simple two-step washing protocol ensures that PCR inhibitors such as proteins or divalent cations are completely removed, leaving high-quality RNA to be eluted in Milli-Q water [12]. RNA purification is generally performed with TRIzol or a kit-based method to eliminate the ubiquitous RNases and contaminating genomic DNA in the source material [79]. Purification from a viral source carries the possible additional challenge of low or varying viral titers. In addition to looking for a kit that can handle low titers as well as overcome the other traditional challenges, users must find a method that is easy to use and that ensures that the typical concentration of extracted RNA meets the requirements of downstream applications [61].

Primer Design, Probes, and Arrays to Confirm the Identity of Viral Strains

The most critical aspect of successful PCR is primer design; all other things being equal, a poorly designed primer can result in a PCR reaction that will not work [47, 48]. The primer sequence determines numerous parameters, such as the length of the product, its melting temperature, and ultimately the yield [4, 42]. A poorly designed primer can also yield little or no product owing to nonspecific amplification and/or primer–dimer formation, which can become competitive enough to suppress product formation [13, 19, 111]. This subsection provides rules that should be considered when designing PCR primers; more comprehensive coverage of this subject can be found elsewhere [23, 26]. Several variables must be taken into account when designing PCR primers [126, 127]. Among the most critical are primer length, melting temperature (Tm), specificity, complementary primer sequences, G/C content and polypyrimidine (T, C) or polypurine (A, G) stretches, and the 3′-end sequence; each of these critical elements is discussed in turn [121, 127, 129].
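
To make these design rules concrete, here is a small Python sketch that screens a candidate primer for length, approximate melting temperature, GC content, and a 3′ G/C clamp. The thresholds and the Tm formulas (Wallace rule for short primers, the common length-adjusted formula otherwise) are widely used rules of thumb, not values taken from the text, and assume a plain ACGT sequence.

```python
def gc_content(seq):
    # Percent G+C in the sequence.
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def melting_temp(seq):
    # Approximate Tm: Wallace rule (2*(A+T) + 4*(G+C)) for primers
    # shorter than 14 nt, otherwise the length-adjusted formula.
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    if len(seq) < 14:
        return 2 * (len(seq) - gc) + 4 * gc
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)

def check_primer(seq, min_len=18, max_len=24, tm_range=(55.0, 65.0)):
    # Flag common design problems; thresholds are illustrative.
    seq = seq.upper()
    problems = []
    if not (min_len <= len(seq) <= max_len):
        problems.append("length outside %d-%d nt" % (min_len, max_len))
    tm = melting_temp(seq)
    if not (tm_range[0] <= tm <= tm_range[1]):
        problems.append("Tm %.1f C outside %s" % (tm, tm_range))
    if not (40.0 <= gc_content(seq) <= 60.0):
        problems.append("GC content outside 40-60 %")
    if seq[-1] not in "GC":
        problems.append("no G/C clamp at the 3' end")
    return problems or ["OK"]

print(check_primer("GCGTCAGACCCTTTAAGGCG"))   # ['OK']
```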

Practically every viral strain carries a unique DNA sequence that differentiates it from other strains, and this sequence can be used for probe design [1, 35]. The probe DNA binds specifically to the target gene corresponding to the viral strain prepared from a clinical sample [31, 53, 120]. Using a variety of such probes, various pathogens in a clinical sample can be detected in a single assay. The DNA probe representing the selected target should be designed with consideration of various aspects such as probe length, GC content, molar concentration, self-hybridization potential, and a limit on the number of single-nucleotide repeats [60, 63]. The melting temperature, secondary structure, and binding position in the target DNA are factors that can affect the signal intensity, specificity, and sensitivity [66, 71]. Typical parameters for such DNA probes include the following: minimum length of 35 and maximum of 40 bases, melting temperature minimum (Tmin) of 70 °C and maximum (Tmax) of 75 °C, and GC content of 45–50 % [86, 92, 93].
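
A corresponding screen for the capture-probe windows quoted above (35–40 bases, Tm 70–75 °C, GC 45–50 %) might look as follows; the salt-adjusted Tm formula and the assumed 0.165 M Na+ (roughly 1× SSC) are illustrative assumptions rather than values from the text.

```python
import math

def probe_ok(seq, na_molar=0.165):
    # Screen a candidate probe against the design windows quoted in
    # the text, using the classic salt-adjusted GC Tm approximation:
    # Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 600/length.
    seq = seq.upper()
    n = len(seq)
    gc_pct = 100.0 * (seq.count("G") + seq.count("C")) / n
    tm = 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_pct - 600.0 / n
    return (35 <= n <= 40) and (70.0 <= tm <= 75.0) and (45.0 <= gc_pct <= 50.0)

print(probe_ok("ATGC" * 10))   # True: 40-mer, 50 % GC, Tm ~74 C
```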

Owing to the limitations of DNA–DNA hybridization models, determining the alignment corresponding to the most favorable DNA–DNA duplex on a microarray is difficult [69, 116]. Computationally, the best possible alignment between two DNA sequences can be clearly described in terms of a generalized edit distance [19, 94, 123]. Simply put, the edit distance between two sequences corresponds to the total number of insertions, deletions, and substitutions needed to transform one sequence into the other. From the standpoint of DNA cross-hybridization, a substitution corresponds to a mismatched pair of nucleotides, whereas insertions/deletions correspond to gaps in the DNA–DNA duplex [23, 26]. The lower the number of mismatches and gaps in the alignment, the smaller the edit distance. On the other hand, edit distance alone does not provide enough information about the effectiveness of hybridization [46, 127].
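
The edit distance described here can be computed with the standard dynamic-programming recurrence; the following minimal Python implementation counts substitutions (mismatches) and insertions/deletions (gaps) between two sequences.

```python
def edit_distance(a, b):
    # Levenshtein distance between sequences a and b, using a
    # rolling one-row dynamic-programming table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion (gap)
                           cur[j - 1] + 1,               # insertion (gap)
                           prev[j - 1] + (ca != cb)))    # substitution (mismatch)
        prev = cur
    return prev[-1]

print(edit_distance("ACGTACGT", "ACGAACGT"))  # one mismatch -> distance 1
```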

Oligonucleotide probes are usually much shorter (9–24-mer) and are often modified to incorporate an amine or thiol linker that allows covalent attachment of the oligonucleotide to a coated glass surface [71, 86, 120, 121]. Such probe modifications add substantial expense to array construction [86]. For example, unmodified oligonucleotide probes can be suspended in alkaline buffer (pH 12) and deposited directly onto acid-washed slides; they adhere to the slide surface via hydrogen bonds and electrostatic attraction and remain available to form duplexes with corresponding strands of target DNA [93]. This attachment scheme is robust across a broad range of temperatures (4–95 °C), pH values (1–10), and ionic buffers (e.g., 0–4 M NaCl) [116]. The sensitivity of detection can be improved if acid-washed slides are coated with epoxy-silane before probe deposition [19, 112]. Poor-quality slides have rough surfaces and may autofluoresce, producing background signal that interferes with spot finding and quantification [23, 124]. Autofluorescence can be particularly troublesome when signal intensity is low, as is the case with expression arrays. To avoid these problems, premium ready-made slides are available [19, 66].

RNA Quantification

The RNA concentration can be used to determine the amount of sample lost during kit-based cleanup. Transcribed viral nucleic acid may be purified using a kit or manually, and quantified by measurement of the optical absorbance at 260 nm [50, 61, 97]. The concentration of RNA in the crude transcription product is calculated beforehand [128, 129].
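
As a worked example of quantification by absorbance, the following sketch applies the standard conversion for single-stranded RNA (an A260 of 1.0 corresponds to roughly 40 μg/ml) together with the usual A260/A280 purity check; the sample values are invented.

```python
def rna_conc_ug_per_ml(a260, dilution_factor=1.0):
    # Single-stranded RNA: concentration = A260 x 40 ug/mL x dilution.
    return a260 * 40.0 * dilution_factor

def rna_purity(a260, a280):
    # A260/A280 ratio; ~2.0 indicates pure RNA, lower values suggest
    # protein contamination.
    return a260 / a280

print(rna_conc_ug_per_ml(0.25, dilution_factor=10))  # 100 ug/mL
print(round(rna_purity(0.25, 0.125), 2))             # 2.0 -> pure RNA
```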

Many post-PCR applications require removal of unincorporated primers, primer–dimers, and other reaction components from the PCR product. Traditional purification methods such as ethanol precipitation are difficult to automate. The clean-up system can be automated to purify 96 or 384 reactions in less time. PCR clean-up is especially amenable to automation because no vacuum or centrifugation steps are required, in contrast to many filter-based methods [5, 11, 26, 31].

Sample Amplification and Labeling

Reverse-transcription (RT) reactions may be prepared using a reasonable quantity (40–200 ng) of total RNA with random hexamers or gene-specific primers (Fig. 2). The mixture is heated and immediately cooled on ice before addition of dithiothreitol, dNTPs/aa-dUTP (a mixture of dGTP, dATP, dCTP, dTTP, and aminoallyl-dUTP), Superscript III reverse transcriptase, and first-strand buffer. The mixture can then be incubated under the specific reaction conditions required [3, 20, 24].

Fig. 2 Principle of microarray assay for gene expression. ORFs, open reading frames

cDNA from cells under two different conditions is extracted and labeled with two different fluorescent dyes, for example, a green dye (cyanine 3) for cells under condition 1 and a red dye (cyanine 5) for cells under condition 2 (more precisely, the labeling is typically done by synthesizing single-stranded DNAs complementary to the extracted mRNA using the enzyme reverse transcriptase) [34, 41, 48]. Both extracts are washed over the microarray. Labeled gene products from the extracts hybridize to their complementary sequences at the spots due to preferential binding: complementary single-stranded nucleic acid sequences attract each other, and the longer the complementary stretch, the stronger the attraction [61, 75, 79].

Hybridization

Samples for analysis typically include DNA from a number of species, at diverse concentrations depending on the relative species abundances. A microarray protocol for quantitative evaluation of species diversity has not yet been developed. However, a common application of microarray hybridization, in the study of genome-wide transcription levels, involves competitive hybridization of two differentially labeled cDNA samples to the same microarray slide [82, 114]. The main principle behind microarray technology is hybridization between two DNA strands, i.e., the property of complementary nucleic acid sequences to pair specifically with each other by forming hydrogen bonds between complementary nucleotide base pairs [79]. A greater number of complementary base pairs in a nucleotide sequence means tighter noncovalent bonding between the two strands. After washing off nonspecifically bound sequences, only strongly paired strands remain hybridized [82, 83]. Thus, fluorescently labeled target sequences that bind to a probe sequence generate a signal that depends on the strength of the hybridization, as determined by the number of paired bases, the hybridization conditions (such as temperature), and washing after hybridization. The total strength of the signal from a spot (feature) depends on the amount of target sample bound to the probes present at that spot [114, 115]. Microarrays use relative quantification, in which the intensity of a feature is compared with the intensity of the same feature under a different condition, while the identity of the feature is determined by its position [43]. An alternative to microarrays is serial analysis of gene expression, in which the transcriptome is sequenced, allowing absolute measurements [54]. Microarray hybridization allows quantitative discrimination between transcription levels in two samples when the relative variation is greater than about twofold [2, 20]. Using the same technique, it should be possible to quantitate >2-fold differences in species abundance between two samples, allowing rapid and sensitive examination of such differences [8, 10, 22].

Microarray Imaging

Microarray images are acquired by a laser scanner that performs a regional scan of the slide and creates a digital map, or image, of the fluorescence intensities at each pixel for each dye [24, 45]. For a particular microarray experiment, the scanner generates two 16-bit tagged image file format (TIFF) files, one for each fluorescent dye [57, 125]. Different dyes absorb and emit light at different wavelengths [115, 116]. To quantify the amount of each of the two fluorescent dyes at each spot, the scanner excites the array at the two excitation wavelengths and measures emission at the corresponding emission wavelengths [44, 48]. The dyes usually used are Cy3 and Cy5, with emission in the ranges of 510–550 nm and 630–660 nm, respectively [51, 54]. These dyes enable measurement of the amount of sample bound to a spot based on the level of fluorescence emitted when excited by the laser. If the RNA from the sample in condition 1 is abundant, the spot will be green, whereas if the RNA from the sample in condition 2 is abundant, the spot will be red. If both are equal, the spot will be yellow, while if neither is present it will not fluoresce and will appear black [58, 75]. Thus, from the fluorescence intensities and colors of each spot, the relative expression levels of the genes in both samples can be estimated.
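
The two-channel readout described above reduces, per spot, to a background-subtracted intensity ratio. A minimal sketch with hypothetical intensities: positive log2 ratios read as "red" (condition 2 abundant), negative as "green" (condition 1 abundant), and values near zero as "yellow".

```python
import numpy as np

def spot_log_ratios(cy5, cy3, bg5=0.0, bg3=0.0, floor=1.0):
    # Per-spot log2(Cy5/Cy3) from background-subtracted channel
    # intensities; a floor avoids log of zero or negative values.
    r = np.maximum(np.asarray(cy5, dtype=float) - bg5, floor)
    g = np.maximum(np.asarray(cy3, dtype=float) - bg3, floor)
    return np.log2(r / g)

cy5 = [5200, 400, 1500, 90]    # condition 2 (red) intensities
cy3 = [1300, 410, 1480, 85]    # condition 1 (green) intensities
print(spot_log_ratios(cy5, cy3))  # ~[2.0, -0.04, 0.02, 0.08]
```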

A number of settings [e.g., scan rate, laser power, photomultiplier tube (PMT) voltage] can be adjusted by the user at the time of scanning [24, 80, 103]. Newer, higher-power lasers provide additional photons for excitation, generating more signal but also more noise [58, 75]. Higher PMT voltage results in greater amplification of photons to electrons, generating more signal but also more detector noise [81, 129].

It is preferable to employ higher laser fluence rather than higher PMT voltage, as this stimulates more signal photons rather than producing more signal per photon [87, 127]. However, elevated laser power can destroy hybridized samples through photobleaching, so the laser power needs to be adjusted according to the number of scans to be performed on each sample [63, 64].

Image quantification settings (e.g., adaptive circle, fixed circle, spot diameter) should be carefully assessed and fixed for each project as a whole, depending also on the array design, slide type, and spot morphology [8, 89, 106]. It should be noted that the image quantification method should be identical for all slides in a project, whereas the image acquisition parameters, for instance, laser power and/or photomultiplier settings, can be optimized from slide to slide [9, 10].

Commonly, a fixed combination of laser power and PMT voltage is chosen for scanning each slide [23]. These two parameters are ultimately set so that almost all expression on the chip can be captured [37, 90, 105]. However, it has been observed that not all genes spotted onto a chip can be measured accurately at a single scanner setting [44, 107]: there may be genes with expression values of 50,000 or more alongside genes with expression as low as 200 or even less [78, 115]. Such a wide range of expression is impossible to capture accurately in a single scan with a fixed setting [116]. A single scan with particular PMT and laser settings is certainly suitable for most, but not all, of the intensity range [8, 10]. Thus, there is a need to capture various ranges of gene expression values and then combine the information from all scans before further analysis is carried out [37, 44, 78, 107].
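
One simple way to combine scans taken at different sensitivities, sketched below under the assumption of a linear detector response, is to estimate the gain between settings from spots unsaturated in both scans and use it to rescue saturated spots; this illustrates the multi-scan idea, not any specific published algorithm.

```python
import numpy as np

def merge_scans(low, high, saturation=65535):
    # low: low-sensitivity scan (nothing saturated, noisy at the bottom)
    # high: high-sensitivity scan (bright spots clipped at 16 bits)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    ok = high < saturation                  # spots trustworthy in both scans
    gain = np.median(high[ok] / low[ok])    # robust scan-to-scan gain estimate
    merged = high.copy()
    merged[~ok] = low[~ok] * gain           # replace clipped values
    return merged

low = np.array([100, 2000, 9000, 16000])
high = np.array([400, 8000, 36000, 65535])  # last spot saturated
print(merge_scans(low, high))                # -> [400, 8000, 36000, 64000]
```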

Data Analysis

The probes represent (partial) genomic sequences of a gene, positioned or fixed onto a glass slide, with the aim of surveying gene expression in a highly parallel and comprehensive manner [16, 18, 128]. Probe variants provide discrete, dissimilar nucleotide sequences corresponding to the same gene [10]. Signal strength is commonly sequence dependent; for this reason, averaging of the measured intensities is not appropriate, and probe variants should be investigated separately in the final data evaluation steps. Probe replicates, by contrast, are copies of the same probe at various positions on the chip and exhibit similar responses [23]. In theory, they should show the same expression, being included for signal amplification. Confidence in the reliability of gene investigation is based on validation research [17, 25]. Probe replicates are handled mainly by two important approaches, as described below [74].

Our preferred option, given a sufficient number of biological samples per condition, is to determine the median value for each replicated set of gene probes on a chip and use this as the “true value” for that gene probe [66]. The median provides a robust measure, and its determination inherently ignores outlier values within a replicate set, in contrast to the arithmetic mean [30, 58].
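
The median-based approach is straightforward to implement; a minimal sketch with invented triplicate spot values:

```python
import numpy as np

def replicate_medians(spot_values):
    # Collapse replicate spots per probe to the median, which ignores
    # outliers within a replicate set, unlike the arithmetic mean.
    return {probe: float(np.median(vals)) for probe, vals in spot_values.items()}

triplicates = {"probeA": [812, 845, 4100],   # one outlier spot
               "probeB": [230, 241, 228]}
print(replicate_medians(triplicates))        # probeA -> 845, not ~1919
```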

An alternative is to consider all replicate values in the dataset, e.g., by applying analysis of variance (ANOVA)-type methods afterwards [53, 54]. Experimental error is then pooled with biological variation. Microarray data may require further processing aimed at reducing the dimensionality of the data to aid comprehension and to allow more focused analysis for each gene [123, 124]. Other methods permit analysis of data consisting of a low number of biological or technical replicates [24, 37].

Data Normalization

Normalization is an important step in the analysis of DNA microarrays when evaluating data from diverse arrays or dye channels [36, 38]. Microarray measurements can be systematically influenced by various effects arising from nucleic acid extraction, cDNA preparation, sample labeling, hybridization, imaging, and spot detection. In addition, there are effects unique to individual arrays, such as probe effects, spotting effects, region effects, and pin effects. Normalization endeavors to compensate for such effects through use of internal controls [51–53].

The statistical analysis begins with the scanned file itself. Various parameters of the pixel distribution within a particular spot are given, such as the mean, mode, median, and standard deviation, most of which are retained to characterize the intensity estimate of a given spot in both channels [10, 109]. The scanned files provide the values for the central intensity of both channels and their background [20]. The background noise measures the intensity for the slide even where no material was spotted [23]. Using all this information, the next step is to flag spots of poor quality that should not be used for further analysis [34, 37]. The different incorporation properties of the dyes and their different physical characteristics make dye bias the most important source of systematic error in two-color microarrays [38, 42]. The difference in overall intensity between different arrays can be due to real biological variation from one condition to another or simply to experimental noise [84]. Likewise, a different expression level of a particular gene on a particular array can be due to biological variability of the gene or to noise [37]. If the overall intensity of the hybridized samples differs, this may also be due to experimental error or to real biological activity [48]; this consideration is therefore important in the choice of the within-array normalization method. Besides these factors, it must also be considered that some labeled sample will attach to the slide even where no material was spotted, thereby contributing to the foreground intensity [44]; however, a reliable estimator for background intensity has not yet been established. The data should be normalized sequentially to eliminate all nonbiological variation introduced by the experimental procedure and to facilitate comparison of intensity values across slides [24].
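
As an example of the within-array step, the following sketch median-centers the log2 ratios of one array, under the usual assumption that most genes are not differentially expressed; loess or quantile normalization are common, more refined alternatives.

```python
import numpy as np

def median_center(log_ratios):
    # Shift the array's log2(Cy5/Cy3) ratios so that their median is
    # zero, removing a constant dye/intensity offset.
    m = np.asarray(log_ratios, dtype=float)
    return m - np.nanmedian(m)

ratios = np.array([0.4, 0.5, 0.3, 2.4, 0.45])  # log2 ratios for one array
print(median_center(ratios))  # median moves to 0; the 2.4 spot still stands out
```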

Analysis

A brief outline of the initial analysis of absolute expression levels within an experiment is shown in Fig. 3. For microarray projects designed to study defined gene pathways and interactions, maximal annotation and statistical reliability are required [40, 65]. We suggest that the minimum result set for each gene should include fold-changes of mean expression level per condition and P values from significance testing [24].

Fig. 3 Schematic flow diagram of bioinformatics in microarray development

Data Visualization

Data points for biological chip replicates are usually the mean values of each gene for a given condition. Limited subsets of interesting genes can be plotted by means of simple vertical bar charts of log(ratio) or absolute values [53]. This can be complemented by custom, project-dependent graphs, often integrating annotation data with gene expression results. Expression profiles of genes within one condition, or of each gene across a number of conditions, can be subjected to cluster analysis [99].

Condition Means and Confidence Intervals

These parameters are required to present the expression level of a gene, and enable better interpretation of fold-changes or variation. In most dual-dye experiments, confidence intervals can be calculated for the fold-change itself [48, 51, 53].

Calculation of 95 % confidence intervals (CIs) for means can be done in various ways. If there are sufficient observations for each condition, the common formula based on a t-distribution can be used [91]. Alternatively, resampling can be applied: broadly speaking, the values making up both groups of observations are randomly reassigned to each group a large number of times, with the desired statistic (mean, CI95, t, etc. [97, 98]) being calculated for each such resampled run [107]. The accumulated set of newly generated statistics is then used to estimate the corresponding parameter for the original data [125].
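
A minimal percentile-bootstrap sketch for the 95 % CI of a condition mean, with invented log2 expression values:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci_mean(x, n_boot=10000, alpha=0.05):
    # Percentile bootstrap: resample the observations with replacement
    # many times, then take the 2.5th and 97.5th percentiles of the
    # resampled means as the CI bounds.
    x = np.asarray(x, dtype=float)
    means = [rng.choice(x, size=len(x), replace=True).mean()
             for _ in range(n_boot)]
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

expr = [7.9, 8.4, 8.1, 9.0, 7.6, 8.7]   # log2 expression, one condition
print(bootstrap_ci_mean(expr))
```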

Significance Tests

A simple two-sample t test or Welch t test is often the first tool of choice for statistical inference. Adjustment for multiple testing changes the obtained P values but not the order of the sorted significance values [108]. If two conditions can be assumed to be dependent (e.g., cell lines), then paired t tests can increase the statistical power [119]. Nonparametric testing (e.g., using the Mann–Whitney U test) is an alternative with less power that nonetheless works better when the underlying distributions are asymmetric. However, for the very small numbers of observations (i.e., 5–7) typically available in microarray studies, the resulting P values can be less useful as a filtering tool [72]. Statistical power can generally be improved by employing bootstrap versions of significance tests. For most microarray studies, P values resulting from significance testing must be interpreted with care when there are few biological replicates per group [8, 10, 98].
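
A sketch of per-gene Welch t tests followed by a Benjamini–Hochberg adjustment (one common multiple-testing correction; the text does not prescribe a specific method), run on simulated data. Note that the adjustment changes the P values but preserves their order, as stated above.

```python
import numpy as np
from scipy import stats

def welch_bh(group1, group2):
    # Welch t tests per gene (rows = genes, columns = replicates),
    # then Benjamini-Hochberg false-discovery-rate adjustment.
    t, p = stats.ttest_ind(group1, group2, axis=1, equal_var=False)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)          # p * n / rank
    adj = np.empty(n)
    adj[order] = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    return p, np.clip(adj, 0.0, 1.0)

rng = np.random.default_rng(1)
a = rng.normal(8.0, 0.5, size=(100, 5))   # 100 genes x 5 replicates, condition 1
b = rng.normal(8.0, 0.5, size=(100, 5))   # condition 2
b[:10] += 1.5                             # 10 truly up-regulated genes
p, q = welch_bh(a, b)
print(int((q < 0.05).sum()))              # roughly the 10 true positives
```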

ANOVA-type methods [53] are somewhat more involved, being appropriate where there is more than one experimental factor under investigation (e.g., treatment and dose, or biological replicates and hybridization replicates) [84]. It is important to note that the expression of individual genes of interest is usually backed up by verification using other techniques such as RT-PCR, in situ hybridization, and Northern blotting [31, 48, 66].

Explorative Methods

Explorative methods can be used to identify genes or samples with similar expression profiles, indicating coregulation or sample type, respectively [84, 91]. If coregulation or time effects are of interest, (graphical) principal component analysis can be used to assess the number of clusters that may be contained within the data, which can then serve as the input parameter for the number of expected clusters in a K-means or self-organizing map (SOM) clustering approach [127]. Because of the nature of explorative methods, we recommend using several combinations of algorithms and distance measures (SOM and hierarchical clustering, each with both Pearson correlation and Euclidean distance, as a minimum) to highlight different features in the data [68].
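
A sketch of hierarchical clustering of gene expression profiles with both correlation and Euclidean distances, as recommended above, using SciPy; the expression matrix is simulated. Correlation distance groups profiles that co-vary regardless of magnitude, whereas Euclidean distance groups profiles by absolute level, so the two runs may partition the genes differently.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_profiles(expr, n_clusters=3, metric="correlation"):
    # Average-linkage hierarchical clustering of gene profiles
    # (rows = genes, columns = conditions), cut into n_clusters groups.
    z = linkage(pdist(expr, metric=metric), method="average")
    return fcluster(z, t=n_clusters, criterion="maxclust")

rng = np.random.default_rng(2)
expr = rng.normal(size=(30, 6))                    # 30 genes, 6 conditions
print(cluster_profiles(expr, metric="correlation"))
print(cluster_profiles(expr, metric="euclidean"))  # may group genes differently
```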

Conclusions

Available techniques to screen a broad range of viruses are intrinsically biased and thereby constrained to detection of a restricted number of candidate viruses. To overcome this difficulty, an approach that widens viral recognition, based on a combination of viral genomics and long-oligonucleotide microarray technology, is required. To accomplish this objective, highly conserved nucleotide sequences within a viral family can be selected for representation on the microarray. By using these most conserved sequences, the hope is to maximize the possibility of detecting all members of each viral family, including unsequenced, unknown, or newly evolved family members. A secondary but complementary aim is to take advantage of the high resolution of microarray hybridization to distinguish among viral subtypes, which is a complex and difficult task with conventional methods.