The state of the art in the analysis of two-dimensional gel electrophoresis images
Berth, M., Moser, F.M., Kolbe, M. et al. Appl Microbiol Biotechnol (2007) 76:1223. doi:10.1007/s00253-007-1128-0
Software-based image analysis is a crucial step in the biological interpretation of two-dimensional gel electrophoresis experiments. Recent significant advances in image processing methods combined with powerful computing hardware have enabled the routine analysis of large experiments. We cover the process from the imaging of 2-D gels through spot quantitation and the creation of expression profiles to statistical expression analysis and the presentation of results. Challenges for analysis software as well as good practices are highlighted. We emphasize image warping and related methods that are able to overcome the difficulties caused by varying migration positions of spots between gels. Spot detection, quantitation, normalization, and the creation of expression profiles are described in detail. The recent development of consensus spot patterns and complete expression profiles enables one to take full advantage of statistical methods for expression analysis that are well established for DNA microarray experiments. We close with an overview of visualization and presentation methods (proteome maps) and current challenges in the field.
Keywords: 2-D gel electrophoresis · Image analysis · Proteome maps · Warping · Statistics
The last decade in life sciences was deeply influenced by the development of the “Omics” technologies (genomics, transcriptomics, proteomics, and metabolomics), which aim for a global view on biological systems. With these tools at hand, the scientific community is striving to build functional models to develop a global understanding of the living cell.
The analysis of the proteome as the final level of gene expression started out with techniques based on 2-D gel electrophoresis (O’Farrell 1975; Klose 1975) and extended its reach with semi-gel-free and shotgun gel-free liquid chromatography–mass spectrometry (LC–MS)-based techniques in recent years.
A comprehensive evaluation of the relative merits and weaknesses of gel-based and gel-free methods is beyond the scope of our review. Studies that compare the performance on similar samples (Wolff et al. 2006, 2007) indicate that the methods are complementary, i.e., their analytical windows overlap, but each has an exclusive set of proteins or modifications that were not identified with the other. Quantitative analysis based on LC–MS techniques is still in an early stage when considering available software and algorithms. Here, we focus on the computerized analysis of 2-D gels which are widely used in the scientific community. 2-D gels may separate up to 10,000 protein spots on one gel (Klose and Kobalz 1995). In a suitably equipped and experienced lab environment, 2-D gels are easy to handle, and they can be produced in a highly parallelized way. The software has meanwhile reached a level that allows for routine analysis of a large amount of samples with an investment of time that is much smaller than the efforts needed for the wet lab work.
The routine application of potent and easy-to-use software systems has been enabled by several improvements in 2-D gel image acquisition and analysis technology over the last decades (Aittokallio et al. 2005; Dowsey et al. 2003). Among these milestones, one should mention the introduction of the first computer-based analysis systems, still without a graphical user interface, in the late 1970s, among them ELSIE (Bossinger et al. 1979; Vo et al. 1981), GELLAB (Lemkin and Lipkin 1981a, b, c; Lemkin 1989), TYCHO (Anderson et al. 1981), HERMeS (Tarroux et al. 1989; Vincens and Tarroux 1987), GESA (Rowlands et al. 1988), and LIPS (Skolnick et al. 1982). This first generation of 2-D image analysis programs was followed by Elsie-4 (Olson and Miller 1988), Melanie (Appel et al. 1991; based on Elsie-4), and QUEST (Garrels 1989) since the mid-1980s. These programs used X-Windows-based graphical user interfaces on computer workstations. In the following years, the QUEST, TYCHO, and Melanie programs evolved to the commercially available PDQuest, Kepler, and Melanie III. While in the beginning such software needed exceptionally equipped workstations, in 1989, Nonlinear (Newcastle, UK) introduced Phoretix, the first 2-D gel analysis software running on desktop PCs. With the dropping hardware prices, the Melanie and PDQuest systems were also ported to PCs.
Current commercial software products for 2-D gel image analysis:
Bio-Rad, Hercules, CA, USA, www.biorad.com: PDQuest
Compugen, Tel Aviv, Israel, www.compugen.com: Z3
DECODON, Greifswald, Germany, www.decodon.com: Delta2D
GE Healthcare, www.gelifesciences.com: DeCyder 2D, ImageMaster Platinum*
Genebio, Geneva, Switzerland, www.genebio.com: Melanie*
Nonlinear Dynamics, Newcastle, UK, www.nonlinear.com: Phoretix, SameSpots
Syngene, Cambridge, UK, www.syngene.com
*ImageMaster Platinum is the edition of Melanie distributed by GE Healthcare.
With the ever-increasing capacity of available hardware, more advanced image processing methods became feasible. They use all of the available image information instead of condensing it to spot boundaries before processing. This simplifies image comparison and speeds up analysis dramatically but still produces expression profiles with gaps, which significantly impede reliable gene expression analysis. The most recent milestone introduced by us in 2003 addressed this problem: An algorithm to combine the information of several gels into a so-called fusion gel makes it possible to generate a proteome map that is representative of the whole experiment (Luhn et al. 2003). On a proteome map, one can detect all spots of a whole experiment in a single gel image, whereas the average images proposed earlier suffer from dilution effects for weak and rare spots. The spots detected there can serve as a spot consensus pattern that is valid for the whole gel set of the experiment. The consensus spot pattern is then transferred according to the warping transform and used on all gels. This allows for 100% matching spots and, in turn, complete expression profiles for reliable statistical analysis (Voigt et al. 2006; Höper et al. 2006).
Spot detection first: these are the classical packages where the image information is first condensed into a set of spot centers, boundaries, and possibly spot volumes for each image. Spot matching and subsequent creation of expression profiles are done based on the data about spot geometry and volumes.
Image warping first: these are packages where image warping is applied to remove running differences between gels, based on the whole image information. Spot detection is a separate and independent step. The creation of expression profiles is critically informed (and improved) by the data about positional differences between gels that were gained in the first step.
Historically, the “spot detection first” workflow was the only feasible way to proceed due to the limitations of available hardware. Software based on the second “image warping first” workflow, including Compugen’s Z3, DECODON’s Delta2D, and, most recently, Nonlinear’s SameSpots, is able to overcome many of the difficulties in spot matching that are hard to deal with when different migration positions add to the uncertainty of multiple separate spot detections. With the subsequent introduction of consensus spot patterns (in Delta2D 2003 and SameSpots 2006), one is able to virtually eliminate matching problems by using consistent spot patterns throughout the experiment. Recent comparisons between separate spot detection and using a consensus spot pattern (Eravci et al. 2007) show that the latter approach is able to find substantially more differentially expressed proteins, in a much shorter time. In other words, valuable information is lost due to spot matching problems that are inevitable when using the classical approach. We expect other vendors to evolve their products to use a workflow that is based on image warping and consensus spot patterns over the following years.
Performing a biological experiment or selecting a biological object of interest. The first sample preparation step is freezing the sample in the current state. This includes inactivation of all cellular processes that may change the proteome composition, preventing protease action, disintegration of the cell material, keeping or bringing the proteins into solution, removing or destroying macromolecules that may disturb the subsequent steps of the 2-D protocol (RNase and/or DNase treatment and centrifuging for cell debris removal). Alternatively or in combination with radiolabeling, covalent fluorescent labeling of proteins can be applied here.
Bringing the proteins into the gel and performing the 2-D separation by combining isoelectrofocussing in the first and sodium dodecyl sulfate (SDS) electrophoresis in the second dimension. An alternate 2-D approach uses the combination of two detergent treatments that resolve the protein molecules differently resulting in a scattered diagonal spot pattern (2D-16-BAC- or 2D-CTAB/SDS-polyacrylamide gel electrophoresis). A variety of staining techniques can be applied before or after separation to enable spot detection.
Capturing the gel images by using scanners, charge-coupled device (CCD) camera-based, or laser imaging devices. Depending on the protein labeling or staining techniques, a compatible imaging device has to be chosen. The capturing process results in one or more digitized computer images per gel that can be displayed with common image analysis software. The image capture step transforms the quantitative information of the gel into computer-readable data.
Correction of positional spot variations by image warping. 2-D electrophoresis results in spot patterns with variations in the spot positions between gels. Therefore, gel images are positionally corrected by a combination of global and local image transforms (image warping). The information about differences in spot positions that was gained in this step is reused later for image fusion and for the transfer of the consensus spot pattern.
Image fusion and proteome maps condense the image information of the whole experiment into one fusion image, also called a proteome map. The proteome map contains the information of all protein spots ever detected in the experiment.
Spot detection is performed on the proteome map. As a result, a consensus spot pattern is generated, which is valid for all gels in the experiment. It describes the position and the general shape of all protein spots from the experiment.
For spot quantitation and building expression profiles, the consensus spot pattern is applied to all gel images of the experiment (Fig. 2). The image transformation (step 4) assures that all spots of the consensus pattern arrive at their correct position. A remodeling step makes sure that the predetermined spot boundaries from the consensus are adapted to the real gray levels observed on the target image. All boundaries of the consensus pattern can be found on every gel.
Expression profile analysis identifies interesting spots which will be marked for further analysis, protein identification, and interpretation.
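In step 7, a spot's quantity ultimately reduces to integrating pixel intensities inside its transferred boundary. The following minimal sketch is our own simplification (function name, boolean-mask boundary, and constant per-pixel background are illustrative assumptions; real packages fit local backgrounds and remodel boundaries per gel):

```python
import numpy as np

def spot_volume(image, mask, background=0.0):
    """Sum pixel intensities inside a spot boundary (boolean mask),
    subtracting a constant per-pixel background estimate."""
    return float(np.sum(image[mask] - background))

# Toy "gel": a 3x3 spot of height 50 on a flat background of 10.
gel = np.full((8, 8), 10.0)
gel[2:5, 2:5] += 50.0
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 2:5] = True

vol = spot_volume(gel, mask, background=10.0)  # 9 pixels x 50 = 450
```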
Ideally, a dye should bind noncovalently to the protein and show a linear response curve. It should allow for the detection of very low protein amounts because protein concentrations in biological systems may vary by six or more orders of magnitude (Corthals et al. 2000). At the same time, saturation effects have to be avoided because they impede normalized quantitation. Common approaches for the detection of protein amounts use dyes that ideally bind much more strongly to proteins than to the gel matrix or to other compounds accompanying the 2-D electrophoresis process. In practice, depending on the stain that was used, a 2-D gel image analysis may give quantitative or only qualitative results, for most spots or only for a subfraction of the most intense ones.
The most commonly used dyes in 2-D gels
Coomassie Brilliant Blue
Colloidal Coomassie Blue
Ruthenium II tris (bathophenanthroline disulfonate)
The most striking advantage of applying covalently binding dyes before separation is the possibility to run multiple samples in one gel. Provided that the dyes induce the same shift in a protein’s position, samples separated in parallel will give exactly congruent 2-D patterns. As long as the number of samples does not exceed the number of available dyes, 2-D patterns with perfect positional identity can be generated. Difference gel electrophoresis (DIGE) was the first approach to use such sample multiplexing (Unlü et al. 1998), simultaneously separating samples labeled with Cy2, Cy3, and Cy5 within the same gel. If the number of samples exceeds the number of available dyes (3 dyes = max 3 samples per gel), more than one gel has to be used, giving rise to positional variation between gels just as in the traditional setups. Dye multiplexing allows for a quantitative normalization over several gels by using an internal standard, i.e., a mixture of equal aliquots of every sample under analysis (Alban et al. 2003). The internal standard is separated together with Cy3/Cy5-labeled sample pairs on every gel and serves as a quantitative reference.
Protein stains measure the protein amount present at the time of staining. They are well suited for knockout experiments or for analyses of long-term experiments in which treatment or stimulus effects have time to manifest themselves as observable changes in protein levels. Staining techniques that measure the accumulated quantity of proteins can overlook minor changes in protein quantities because these disappear within noise or systematic errors. CVs between 25 and 45% for the same spot/same sample separated on different gels were reported by different authors (e.g., Nishihara and Champion 2002; Eravci et al. 2007). This means that a 1.3-fold change of a protein species cannot be reliably detected. That is why, for short-term stimulus response experiments, in vivo labeling techniques can give a more detailed picture of a changing proteome.
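The reported CVs translate directly into a detection limit. The following back-of-envelope model is ours, not the cited authors' statistics: treating replicate scatter as roughly normal, a change is only distinguishable when it exceeds a few standard deviations of the measurement noise, and replicate averaging shrinks the effective CV:

```python
import math

def detectable_fold_change(cv, z=2.0, n=1):
    """Crude detection limit: a ratio is distinguishable from replicate
    scatter only if it exceeds z standard errors (normal approximation;
    averaging n replicates shrinks the CV by sqrt(n))."""
    return 1.0 + z * cv / math.sqrt(n)

limit_low  = detectable_fold_change(0.25)        # 1.5 at 25% CV
limit_high = detectable_fold_change(0.45)        # ~1.9 at 45% CV
limit_rep4 = detectable_fold_change(0.25, n=4)   # 1.25 with 4 replicates
```

This illustrates why a 1.3-fold change is lost in 25-45% single-gel noise, and why replication recovers it.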
Characterizing specific protein properties
There is a broad range of techniques available that are able to detect specific protein features (Fig. 3), such as phosphorylation. The general approach is to combine a conventional staining method that displays all separated proteins with a specialized stain that highlights protein features, giving separate images that can be recombined using software. The Diamond ProQ (Steinberg et al. 2003) and Emerald ProQ (Patton and Beechem 2002) stains bind specifically but noncovalently to phosphorylated and glycosylated proteins, respectively. Multiplexing the phosphoprotein or glycoprotein pattern with the total protein clearly indicates the modified protein species. For a quantitative analysis of phosphoproteins in a bacterial system, see Eymann et al. (2007). For the redox state detection of proteins, two assays for oxidized thiols based on a radiolabeled tag (Leichert and Jakob 2004) or a fluorescent tag (Hochgräfe et al. 2005) have been suggested. In addition to visualizing the protein’s thiol state, image multiplexing allows the calculation of the degree of oxidation. An assay for the detection of carbonyl groups, which are an indicator for irreversible protein oxidation, has been successfully applied by Dukan and Nyström (1999) and Mosterz and Hecker (2003). This approach uses immunodetection of 2,4-dinitrophenylhydrazone-derivatized carbonyl groups.
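Once the two channel images are aligned and their spot volumes quantified, the degree of oxidation is a per-spot ratio. A minimal sketch (the helper name and the assumption of already-matched spot volumes are ours):

```python
import numpy as np

def degree_of_oxidation(oxidized, total, eps=1e-9):
    """Per-spot degree of oxidation: oxidized-thiol signal divided by
    the total thiol signal from the multiplexed image pair."""
    oxidized = np.asarray(oxidized, dtype=float)
    total = np.asarray(total, dtype=float)
    return oxidized / np.maximum(total, eps)

# Two spots: 20% and 90% oxidized (made-up volumes).
ox = degree_of_oxidation([20.0, 90.0], [100.0, 100.0])
```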
The spot patterns may be very different when only special protein subsets are imaged, so they are hard to compare. Fortunately, the combination of autoradiography with total protein staining can help to solve this problem. Autoradiograph and densito-/fluorograph from the same gel can be aligned very easily (Fig. 4) because, normally, only minor gel distortions can occur due to staining and washing steps. Images with the total protein amount may then help in finding correspondences between gels because total protein patterns change only to a small extent, so they can be easily aligned.
Protein degradation has been measured by using in vivo pulse chase radiolabeling as well (Kock et al. 2004). The degradation of a radiolabeled subfraction of cellular proteins is observed in a time course experiment by using a series of 2-D gels and looking for proteins whose signals disappear with time.
2-D Western blots are used for the detection of immunogenic proteins as suggested by Haas et al. 2002. Western blots are also suitable for finding modified forms of known proteins (using protein-specific antibodies) or for the detection of protein features (phosphoaminoacid antibodies). Again, superimposing the stained gel and the Western blot can highlight the relevant spots. For a more detailed overview on protein detection techniques, see Patton (2002).
Recording and preparation of raw image data
Digital images of 2-D gels are acquired using scanners or CCD-camera-based systems. Scanners produce an image by moving a light source and a sensor element with CCDs over the 2-D gel. The resolution of the final image is defined by the density of sensors within the CCD element, the speed, and the frequency of the measurements. In CCD-camera-based systems, the 2-D gel is projected onto a CCD sensor array through a photographic lens, so the 2-D gel is measured as a whole. Because the photographic lens is an optical system based on refraction, removal of distortions is part of the system’s image preparation. Techniques compensating for light-field (illumination) variations are often applied because the central CCD sensors collect more light than those at the borders. The number of single CCD elements on the camera chip determines the resolution of the final gel image. Hybrid systems aim to combine the speed of a camera with the resolution of a scanner by moving a camera over the 2-D gel to produce image tiles that are later assembled using image processing. The challenge is to accurately remove illumination differences and distortions caused by the photographic lens to avoid discontinuities between adjacent tiles. Some scanners can measure white light, fluorescence, and radiation within one device, as well as scan simultaneously at different wavelengths. Scanners usually provide higher resolution than CCD cameras while consuming more processing time per image. For technical information, consult Miura (2003) and Tan et al. (2007).
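The illumination compensation mentioned above can be sketched as a flat-field correction: divide the gel image by a reference image of a uniform target, then rescale to preserve the mean intensity. The function name and the purely multiplicative illumination model are our assumptions:

```python
import numpy as np

def flatfield_correct(image, flat):
    """Divide out the illumination profile recorded from a uniform
    reference target, rescaling to preserve the mean intensity."""
    image = np.asarray(image, dtype=float)
    flat = np.asarray(flat, dtype=float)
    return image / flat * flat.mean()

# A uniform sample seen through vignetting (corners collect less light):
vignette = np.array([[1.0, 0.8], [0.8, 0.6]])
observed = 100.0 * vignette          # distorted gel image
flat = 200.0 * vignette              # image of a uniform target
corrected = flatfield_correct(observed, flat)   # uniform again
```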
The general rule in 2-D gel image analysis is that the quality of the raw data has a significant impact on the final result. Therefore, it is essential to avoid experimentally caused artifacts and to configure the scanning devices in the best possible way. Background, artifacts, and noise influence the spot detection and quantitation process: gel disruptions may truncate spots, speckles may mislead spot detection or distort quantitation, noise can cover low-intensity spots, and background inflates quantities and reduces dynamic range. Background may be caused by insufficiently erased imaging plates (phosphorimaging), insufficient destaining, fluorescing glass plates, gel coverings, and backings. Furthermore, using unsuitable optical filters for fluorescence imaging may cause background. Noise can be produced by high photomultiplier tube voltages, which amplify random signals. Phosphor screens that have not been used for a long time accumulate noise.
Many software packages allow for postscan image manipulations. One has to distinguish between image manipulations that do not change the quantitative information and those that do, incurring some loss of data in the process. All operations that leave pixel values intact do not change the measured data, e.g., rotations in 90° steps, mirrorings, and cropping (removing areas from the images that do not contain information of interest). Linear enhancements of resolution and gray levels can be undone without data loss and do not influence quantitative data because normalization is used in spot quantitation. On the other hand, many operations that are used for image enhancement cause minor changes in spot detection and spot quantitation and should be avoided if possible; e.g., free rotations or free scaling change the gray level distribution of the manipulated image.
Finally, there are operations that should definitely be avoided because they result in data loss: gamma correction changes the gray levels nonlinearly, and blurring or converting to JPEG format loses data. Another hazard lies in the application of general purpose image manipulation software to special purpose file formats. In the process of, for example, cropping a gel file in Photoshop, essential calibration information can be lost from the resulting file. Therefore, it is advisable to use specialized software (e.g., the software that came with the scanner, or a 2-D gel image analysis program) that understands the characteristics of the file format.
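The lossless/lossy distinction is easy to demonstrate numerically: a 90° rotation merely permutes pixels, preserving every intensity value, whereas a gamma curve remaps gray levels nonlinearly so relative intensities change. A small numpy illustration (the toy data are ours):

```python
import numpy as np

gel = np.arange(16, dtype=float).reshape(4, 4)

# Lossless: a 90-degree rotation only permutes pixels, so every
# intensity value (and hence every spot volume) is preserved.
rotated = np.rot90(gel)

# Lossy: gamma correction remaps gray levels nonlinearly, so
# relative intensities (e.g. spot ratios) are no longer preserved.
gamma_corrected = 15.0 * (gel / 15.0) ** 0.5
```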
Removing variations in spot positions—warping
It turns out that variations in spot positions are localized, i.e., spots that are close together on the gels will have similar gel-to-gel variations in their position (Fig. 7b). This suggests the possibility of eliminating the positional variations by applying an image processing technique called warping. In a more general context, the image processing task is known as image registration; it is, for example, used to combine satellite images from the same region that were taken at different times and angles. In a sense, image registration compensates for variations in the lab process that could not be controlled otherwise. During image registration, similar regions or corresponding spots are searched on both gels in a more or less automated way. This results in pairs of landmark coordinates that link corresponding regions of an image pair to each other. The actual transformation of one image to fit another is not linear (i.e., no simple rotation or scaling suffices), and it can vary considerably between regions. It is important to note that automatic image registration uses the entire available image information and can be done independently of spot detection. Artifacts like gel disruptions, fingerprints, and speckles may disturb the finding of the correct warp transform and should be experimentally avoided or removed by the analysis software. Image registration techniques were introduced by Compugen’s Z3 and DECODON’s Delta2D in 2000 and have meanwhile been adopted by other software packages. For surveys of image registration as applied to 2-D gels, see Aittokallio et al. (2005) and Dowsey et al. (2003).
Once suitable warpings between gel pairs have been produced and checked using, e.g., dual channel images (Fig. 7b,c), they can be combined to allow for the registration of every gel in the experiment to every other gel. By knowing the necessary transforms between the images, the software can essentially remove all differences in spot positions as needed for, e.g., precise spot matching or fusion images (see below). In rare cases, no suitable warp transform can be found, e.g., if spots switched their relative positions or if the patterns are so different that no obvious landmark pairs can be found (e.g., stimulus/in vivo labeling experiments, dye multiplex experiments). Under these circumstances, alternative experimental setups should be used. One example is the linking of gel images via comigration of differently labeled samples in the same gel (DIGE) or by using helper gels containing mixtures of two samples A and B to find an implicit warp transform between two gel images via a helper image (Eymann et al. 2007).
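To make the landmark idea concrete, the following sketch interpolates landmark displacements with inverse-distance weighting to move any coordinate from one gel's frame to another's. This is a deliberately simple stand-in for the proprietary smooth warp transforms real packages compute; all names and the weighting scheme are ours:

```python
import numpy as np

def warp_points(points, src_landmarks, dst_landmarks, p=2.0, eps=1e-12):
    """Map coordinates from one gel to another by interpolating the
    landmark displacements with inverse-distance weighting."""
    disp = dst_landmarks - src_landmarks        # per-landmark shift
    out = np.empty((len(points), 2))
    for i, pt in enumerate(points):
        d = np.linalg.norm(src_landmarks - pt, axis=1)
        if d.min() < eps:                       # point sits on a landmark
            out[i] = pt + disp[d.argmin()]
        else:
            w = 1.0 / d ** p                    # nearer landmarks dominate
            out[i] = pt + (w[:, None] * disp).sum(axis=0) / w.sum()
    return out

src = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
dst = src + np.array([2.0, 1.0])     # all landmarks shifted by (2, 1)
moved = warp_points(np.array([[5.0, 5.0]]), src, dst)
```

Because the weights fall off with distance, each region of the gel follows its local landmarks, which is exactly the localized behavior described above.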
Spot detection and quantitation
The particulars of the methods used for spot detection are often proprietary information of the software vendors, and thus not publicly available. For a survey of published work, see Dowsey et al. (2003).
The spot detection process can be controlled by setting software-specific parameters, such as expected size of a spot in pixels, or even expected number of spots. Due to ambiguities in the gel images (merged spots, weak spots, noise), automated spot detection can only be a heuristic process in some areas. The user will, therefore, sometimes want to change the spot pattern by removing spots, splitting spot clusters, or joining spots. Manual intervention has a downside as well: Individual users have different perceptions about the “correct” spot shapes, so reproducibility between different operators of the software suffers. It is, therefore, advisable to reduce the necessary manual interventions to a minimum, e.g., by defining points as “markers” for the creation of new/splitting of existing spots and letting the software determine the adapted boundaries.
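While the vendors' detection methods are proprietary, the seeding step can be illustrated as local-maximum search above an intensity threshold (real packages follow up with boundary modeling and cluster splitting). The function and the toy gel below are our own illustration:

```python
import numpy as np

def detect_spot_centers(image, min_intensity):
    """Mark pixels that are >= all 8 neighbours and above a threshold -
    a minimal seeding step for spot detection."""
    pad = np.pad(np.asarray(image, dtype=float), 1, constant_values=-np.inf)
    core = pad[1:-1, 1:-1]
    h, w = core.shape
    is_max = core > min_intensity
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            # compare each pixel against the neighbour shifted by (di, dj)
            is_max &= core >= pad[1 + di:1 + di + h, 1 + dj:1 + dj + w]
    return np.argwhere(is_max)

gel = np.zeros((9, 9))
gel[2, 2], gel[6, 6] = 5.0, 7.0
centers = detect_spot_centers(gel, min_intensity=1.0)
```

The threshold parameter plays the role of the software-specific settings mentioned above: lowering it admits weaker spots but also more noise peaks.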
In the simplest case, one assumes that gray values found in the image file are directly proportional to image intensities and, by extension, protein quantity in the small gel area corresponding to the pixel. However, more advanced imaging equipment utilizes calibration information that should be used to arrive at correct quantities:
Calibration of the device. Another type of calibration may have to be applied to eliminate variance between imaging devices of the same model. While most of the laser scanners in the market have a built-in autocalibration, many flatbed scanners need to be calibrated manually. This can be done by using calibration wedges which are offered by several resellers, e.g., Stouffer Industries (Mishawaka, IN, USA), Danes-Picta (Praha, Czech Republic), UVP (Upland, CA, USA). Secondary calibration can be applied if the user knows about the relationship of protein amount and measured signal.
Finally, one can take into account the dye-specific response curve for protein staining. Protein concentration wedges can be used to determine a stain-specific transfer function that derives the protein amount from the emitted light (fluorescent stains) or the measured absorption (absorptive stains). It is to be expected that even different protein species have different response curves resulting from their biochemical properties.
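Such a transfer function can be realized by inverting the measured wedge with piecewise-linear interpolation. The wedge values below are made up purely for illustration (note the saturation at high load):

```python
import numpy as np

# Made-up calibration wedge: known protein loads and the (saturating)
# signal each produces with a given stain.
known_amount = np.array([0.0, 1.0, 2.0, 4.0, 8.0])      # e.g. ng protein
measured_sig = np.array([0.0, 10.0, 20.0, 38.0, 60.0])  # arbitrary units

def signal_to_amount(signal):
    """Invert the monotone response curve by piecewise-linear
    interpolation; signals outside the wedge are clamped to its ends."""
    return np.interp(signal, measured_sig, known_amount)

amount = signal_to_amount(29.0)   # halfway between 20 and 38
```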
Normalization of spot quantities
A more general approach is to postulate that the spot intensity distributions (or some of the distribution’s parameters) are equal between gels/gel images. Therefore, e.g., DeCyder (GE Healthcare 2007) tries to match the distributions by adapting parameters of a normal distribution. Another approach is to use only a subset for normalization, e.g., by declaring that certain spots belong to “housekeeping” genes. It is hard to justify a subset like this, given that their being unaffected by the phenomenon under study is just a hypothesis. In principle, one could also spike the gel with a defined mixture of proteins with known quantities. We are, however, not aware of any published work that used this approach. More advanced normalization methods that were originally developed for the analysis of microarray data have been successfully applied in the context of 2-D gel analysis (Fodor et al. 2005). Spatial bias, i.e., a systematic increase or decrease in spot intensities in a gel region, can be observed in experiments and could be corrected by incorporating it in an error model analogous to those used in microarray analysis (Kreil et al. 2004).
The implicit assumption in all normalization procedures is that the majority of proteins are not affected by the phenomenon under study, or the technical variations in the process. Therefore, the global intensity distributions should be equal. The similarity of two distributions can be assessed using a quantile–quantile (QQ) plot (Fig. 12b). It is a two-dimensional plot where spots are compared based on their rank, i.e., the strongest spot on one image is plotted against the strongest spot on another image, etc. For identical distributions, the QQ plot is a diagonal line. Systematic deviations like an overall higher intensity are easily visible.
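The rank-matching behind a QQ plot is simple to express: sort both spot-volume lists and pair them position by position. A sketch (names and toy volumes are ours):

```python
import numpy as np

def qq_pairs(vol_a, vol_b):
    """Rank-match two spot-volume lists: the k-th strongest spot on gel A
    is paired with the k-th strongest on gel B. Plotting the pairs gives
    the QQ plot; identical distributions fall on the diagonal."""
    return np.sort(vol_a)[::-1], np.sort(vol_b)[::-1]

a = np.array([5.0, 1.0, 3.0])
b = np.array([6.0, 10.0, 2.0])   # same distribution shape, doubled
qa, qb = qq_pairs(a, b)          # qb == 2 * qa: a line steeper than the diagonal
```

A systematic multiplicative offset, as in this toy example, shows up as a straight line with slope 2 rather than the diagonal.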
Normalization procedures that take into account only intensities on one image might be called “vertical” (using columns in an imagined spot intensity table); they modify values based on values in the same column (i.e., image). A different class of normalizations uses intensities from other gels/images; these might be called “horizontal” normalizations. The most prominent example is the DIGE setup using a common internal standard on all gels, usually in the Cy2 channel (Alban et al. 2003). The internal standard is composed by mixing all samples from the experiment in equal amounts. Spot intensities are then normalized by dividing them by the intensity of the corresponding spot on the standard image (the Cy2 channel of the same gel). In addition to the gel-to-gel variations, these normalization methods have to cope with differing backgrounds and differing distributions of spot quantities between dyes. The resulting statistics are very similar to the relative expression levels that are produced in competitive hybridization microarray experiments. For a critique of the then-current DIGE normalization used in DeCyder and an alternative normalization method, see Karp et al. (2004).
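The internal-standard division can be sketched in a few lines, assuming spot volumes have already been matched between the sample and standard channels (function name and toy numbers are ours):

```python
import numpy as np

def dige_normalize(sample, standard, eps=1e-9):
    """Divide each spot volume by the matching spot on the internal
    standard image from the same gel; the resulting ratios are
    comparable across gels."""
    sample = np.asarray(sample, dtype=float)
    standard = np.asarray(standard, dtype=float)
    return sample / np.maximum(standard, eps)

# Same biology on both gels, but gel 2 was loaded/scanned twice as strongly:
gel1 = dige_normalize([10.0, 40.0], [20.0, 20.0])
gel2 = dige_normalize([20.0, 80.0], [40.0, 40.0])
```

Because the gel-specific scaling affects sample and standard alike, it cancels in the ratio, which is the point of the common internal standard.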
As a result of the spot detection and quantitation step, the user gets a variety of data on each gel spot, including normalized spot quantity, background, spot outline, position of the spot center, and spot quality measures.
Several factors limit the absolute quantitation of protein amounts, among them:
loss of sample during entry into the IEF gel,
efficiency of transfer from first to second dimension,
protein loss during staining,
a protein’s staining curve over time, and
staining curve over concentration.
Given the biochemical diversity of protein molecules, it is to be expected that there are some proteins with a nonlinear relation between concentration and intensity. As a result, one expects to obtain only quantitative results of a relative nature, referring to the same protein species, similar to “this protein is two times stronger in sample A than in sample B.” It is essential to track the protein across gels for this relative quantitation to work. Although the limiting factors mentioned above may seem a bit discouraging, recent controlled experiments (Eravci et al. 2007) show that even a 25% change in quantity can be reliably detected for a substantial fraction of the proteins provided that one controls experimental variation and software-related problems that have a diminishing effect on reproducibility (mainly spot matching, see below).
Building expression profiles
For comparing the spot intensities over a whole experiment, each spot on a certain gel has to be mapped to the corresponding spots on the other gels in a process called spot matching. The quality of the matching depends on the quality and the reproducibility of electrophoretic separation and spot detection as well as on the methods employed by the software. In the ideal analysis, spot matching would produce exactly one expression profile for every protein species that is visible on any gel.
The challenges in spot matching lie in the differences in migration positions between gels, changes in the spot pattern itself (i.e., proteins that are missing or very weak in one of the samples), and ambiguities in the gel data (e.g., more or less well-resolved spot clusters). Due to these difficulties, spot matching has been, alongside spot editing, the most time-consuming step in the analysis when using traditional software. The traditional approach consists of doing separate spot detections for each individual gel image of the experiment. This results in differing spot patterns, even for replicate samples. In a subsequent step, the user has to revise these results and, if necessary, to correct missing or false positive spots. Furthermore, regions that were detected as separate spots on one gel but as a single spot on another have to be split or joined by hand. Unfortunately, these problems grow with the number of gels (Voss and Haberl 2000), severely limiting the statistical benefits that come with larger sample numbers. For example, in a study that used plasma from five different individuals, taken at six time points (Fodor et al. 2005), fewer than 150 of the 2,385 detected spots were matched across at least 40 of the total 45 DIGE gels. Although it seems possible with enough manual work to improve expression profiles, both the inherent ambiguity in the images and the sheer amount of work necessary result in the acceptance of mismatches and gaps in expression profiles with classical packages. Spot-matching-related problems affect the statistical analysis: Gaps in expression profiles have to be treated as missing values. Mismatches are the equivalent of substituting a random value into the expression profile. Both effects decrease the statistical confidence substantially.
To overcome these problems, a different approach to constructing expression profiles has been suggested and implemented in recent years. The approach is based on a consensus spot pattern that is derived from all gel images of an experiment and then applied to each gel image of the analysis set (Figs. 1 and 2). The consensus pattern is produced by spot detection on a fused image that essentially contains all spots in the experiment (“union fusion,” Luhn et al. 2003). Image fusion is a method to combine multiple images into a new, synthetic image where each pixel is a function of the corresponding pixels in the input images. The resulting image looks like a real gel image, and, more importantly, all spots from the experiment are represented on it. Thus, spots that are present on only a few of the gels can be located in the fused image and properly separated from those surrounding them. When the detection and editing of the consensus spot pattern are finished, the spot boundaries are transferred back to the original images. This transfer uses the transformations that were produced in the warping step and used earlier to create the fused image. The spot sizes and intensities differ from gel to gel, so a spot remodeling step is applied to make the spot boundaries fit the gray-level distributions of the original gel images. The spot quantities are then calculated as usual by summing the pixel intensities within the spot boundaries. Because each gel has the same spot pattern (modified by the warping transform), matching of the spots is trivial and results in 100% spot matching and complete expression profiles. For spots that are absent on some gels, the method will nevertheless produce a spot quantity, which will be near zero after background subtraction. In essence, one could describe the spot transfer approach as quantitation of corresponding gel image regions.
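The fusion step can be illustrated with a minimal sketch. The exact combination function used by a given software package may differ; here we assume the images have already been warped to a common coordinate system and take a per-pixel maximum, which captures the "union" idea (every spot present on any gel survives in the fused image):

```python
import numpy as np

def union_fusion(aligned_images):
    """Combine pre-warped gel images into one synthetic image.

    Each pixel of the fused image is the most intense value observed
    at that position across all inputs, so every spot that appears on
    any gel is represented in the result.
    """
    stack = np.stack(aligned_images, axis=0)
    # Images are darkness-coded: higher value = more protein.
    return stack.max(axis=0)

# Three toy 4x4 "gels", already registered to a common coordinate frame.
gels = [np.zeros((4, 4)) for _ in range(3)]
gels[0][1, 1] = 0.9   # spot visible only on gel 0
gels[2][2, 3] = 0.5   # spot visible only on gel 2
fused = union_fusion(gels)
```

In the toy example, both spots appear in `fused` even though neither is present on all gels, which is exactly what allows the consensus spot detection to separate rarely occurring spots from their neighbors.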
The resulting expression profiles were shown to be superior to those produced by the classical method (separate spot detections on each gel) both in terms of quality as well as time needed for the analysis (Eravci et al. 2007).
Analyzing gene expression
Having expression profiles at hand, we can now proceed to address the important biological questions of the phenomenon under study. Overall, the task is to distinguish those proteins that are relevant or interesting from those that are relatively unchanged. It is often the case that 2-D gel studies are exploratory in nature, i.e., that one wants to generate hypotheses and find proteins whose role can later be validated using other methods. An example of this is the discovery of protein biomarkers, where one searches for proteins that are consistently associated with a phenotype. Another group of applications is concerned with elucidating and interpreting subsets of proteins that are involved in a biological process, e.g., coregulated proteins induced by an external stimulus (Bernhardt et al. 2003). Methods ranging from classical statistical tests to machine learning are applied to address these questions.
As in all experimental work, the design of the experiment is of crucial importance to its success. It is expedient to ask for statistical advice early to be able to make optimal use of resources and limited amounts of sample material. For all but the most exploratory experiments, one will want to use replicates: biological replicates to address genetic and environmental variability and possibly technical replicates (i.e., multiple separations of the same sample) to address the variability in the gel electrophoresis process itself. General guidelines concerning the use of replicates and statistical analyses accompanying published data have been put forward in Wilkins et al. (2006). Of course, a higher number of replicates increases the number of proteins that can be identified as being relevant with statistical confidence. A recommendation for the simplest possible case (two sample groups) is worked out in Hunt et al. (2005). Karp and Lilley (2005) present a power and cost-benefit analysis for DIGE experiments and DeCyder software that can be used as a model for other experimental setups. The type of replicate affects the statistical model that should be used: When applying statistical tests, technical replicates cannot necessarily be viewed as independent observations, so a more refined statistical analysis, e.g., nested analysis of variance (ANOVA), can give better results (Karp et al. 2005). Pooling of samples should be kept to a minimum because it diminishes (or prevents) the ability to assess biological variation within a sample type. In Molloy et al. (2003), the effects of technical and biological variance are investigated for samples of different types (bacteria, cell lines, primary cultures, human samples). For guidance on experiment design, see also Hunt et al. (2005). Of course, experimental variability is different between laboratories, and the software and analysis method used can have a crucial influence on the outcome’s quality.
The studies cited here use separate spot detections on each gel and often bias results by selecting only spots that were matched across a sufficient number of gel images. Compared with DNA microarray data, 2-D gel data differ in several respects:
- Running differences between gels add a source of errors for spot matching, whereas in microarray data, matching is trivial because every gene is spotted at a known row and column.
- Spot detection in 2-D gels is much harder because spots may not be distinct.
- Gene information is not readily available for spots, so it is harder to correlate or cross-validate expression profiles with gene annotations.
The first two characteristics have the greatest impact on the quality of statistical analysis. Running differences and spot shape ambiguities (e.g., distinct spots on one gel vs overlapping spots on another) create spot matching problems that can lead to faulty assembly of expression profiles and gaps (missing values). By using consensus spot patterns (see above) that are transferred onto all images, one obtains consistent expression profiles. Of course, the ambiguities inherent in electrophoresis, e.g., masking of weak spots by strong ones in a nearby position (Pietrogrande et al. 2002; Campostrini et al. 2005), as well as operator-related variance, are still potential sources of errors and variation.
In the simplest case, the experiment is a comparison of two samples A and B (e.g., treated vs control, mutant vs wild type, etc.). The task then is finding those proteins that show significant differences in expression levels. Normally, one will have several replicates per sample, and a statistical test is employed to find differentially expressed proteins. The most popular test in this area is certainly Student’s t test, where the null hypothesis is that the means of expression levels in samples A and B are the same. Rejecting the null hypothesis then means that the protein under test is differentially expressed. One has to keep in mind that the t test makes the assumption that spot quantities within replicates follow a normal distribution, which should in principle be tested separately. Additional tests are in use (ProteomWeaver, Bio-Rad, Hercules, CA, USA) that do not depend on the normality assumption (nonparametric tests), like the two-sample Wilcoxon test, also known as the Mann–Whitney U test (Conover 2001). In addition to statistical tests, often a fold-change criterion is imposed on the spots. The reasoning behind this is that even true changes in intensity are hard to verify independently if they are small, and that, at least intuitively, a high fold-change in a protein’s intensity is unlikely to be the result of mere chance. For small numbers of replicates, the fold-change criterion is sometimes the only one used.
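The two-sample comparison described above can be sketched with standard library routines; the spot quantities below are made up for illustration, and the 0.05/1.5-fold thresholds are common but arbitrary choices:

```python
import numpy as np
from scipy import stats

# Hypothetical normalized spot quantities for one protein,
# four replicates per sample group.
group_a = np.array([1.00, 1.10, 0.95, 1.05])
group_b = np.array([2.10, 1.90, 2.00, 2.20])

# Student's t test (assumes normally distributed quantities).
t_stat, p_t = stats.ttest_ind(group_a, group_b)

# Nonparametric alternative: two-sample Wilcoxon / Mann-Whitney U test.
u_stat, p_u = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Fold-change criterion on group means, often combined with the test.
fold_change = group_b.mean() / group_a.mean()
differential = (p_t < 0.05) and not (1 / 1.5 < fold_change < 1.5)
```

In a real analysis, this test is run once per expression profile, which leads directly to the multiple testing problem discussed next.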
When applying statistical tests to 2-D gel data, one is faced with the so-called multiple hypothesis testing problem: For each expression profile, a separate test is done. Each test has a certain probability of giving a false positive result, i.e., a protein spot is declared to be differentially expressed, while the difference was due to pure chance. The large number of tests can produce a high number of false positives. For example, in an experiment with 2,000 spots per gel, an accepted false-positive rate alpha of 5% will on average produce 100 proteins that are declared “differentially expressed” even when no protein actually changed. Various procedures try to overcome this problem by adjusting alpha in an adequate way. Bonferroni correction (Weisstein 2007a) controls the family-wise error rate, i.e., the probability that at least one false-positive protein occurs in the whole analysis. However, this method is usually too conservative because, in practice, it is acceptable to have some false positives, depending on the cost of repeating or validating the corresponding results. The proportion of false positives in the result set is controlled by the False Discovery Rate (FDR) approach (Benjamini and Hochberg 1995; Pawitan et al. 2005). The false discovery rate is the rate of false positive results among all profiles that were tested positive. While it is difficult to estimate the false discovery rate, the approach in Benjamini and Hochberg (1995) gives a simple procedure to control it, i.e., make sure the false discovery rate is below a given bound. Overall, the FDR approach allows one to strike a balance between the need to find statistically valid proteins of interest and the additional cost that is associated with following up on false positives. For background and applications of FDR and related methods, the reader is advised to consult the reviews in Manly et al. (2004) and Pounds (2006).
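The Benjamini–Hochberg step-up procedure is simple enough to state in a few lines. The sketch below uses made-up p values; production analyses would typically use a library implementation (e.g., `multipletests` in statsmodels):

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns the indices of the hypotheses that can be rejected while
    controlling the false discovery rate at level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * q,
    # then reject the k hypotheses with the smallest p values.
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k = rank
    return sorted(order[:k])

# Ten hypothetical per-spot p values from an expression comparison.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, q=0.05)  # → [0, 1]
```

Note that a plain per-test cutoff of 0.05 would have accepted the first five spots, while the FDR-controlled procedure keeps only the first two.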
For more complicated experiment designs involving multiple factors, ANOVA can be used (Weisstein 2007b; Karp et al. 2005). The basic idea of the method is to find out to what extent the observed variances between different samples can be explained by the experimental parameters, as opposed to biological or technical variation.
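As a minimal sketch with made-up quantities, a one-way ANOVA across three conditions looks as follows; a nested ANOVA accounting for technical replicates, as in Karp et al. (2005), would require a mixed-model framework (e.g., statsmodels) rather than this simple call:

```python
from scipy import stats

# Hypothetical normalized spot quantities for one protein under
# three conditions, four biological replicates each.
control   = [1.00, 1.10, 0.90, 1.00]
treated_1 = [1.20, 1.30, 1.10, 1.25]
treated_2 = [2.00, 2.10, 1.90, 2.05]

# One-way ANOVA: does the experimental condition explain more variance
# than the within-group (biological) variation?
f_stat, p_value = stats.f_oneway(control, treated_1, treated_2)
```

A small p value here indicates that at least one condition differs; follow-up (post hoc) tests are needed to determine which one.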
In the fields of data mining and machine learning, hypothesis-independent methods have been developed for the discovery of patterns in large quantities of possibly high-dimensional data. As we expect a small number of fundamental biological processes to be reflected in the expression patterns of a large number of proteins, it makes sense to apply these methods to the analysis of 2-D experiments. Again, much of the work on microarray analysis can be transferred easily because the fundamental unit of data is an expression profile. When using separate spot detections on every gel, the missing values will have to be dealt with by the statistical method, for example, by missing value imputation. A large percentage of missing values decreases the utility of all statistical methods, which is why we recommend using the consensus spot pattern approach described above.
Another method that has been applied with great success in a variety of areas like semantic text analysis (Deerwester et al. 1990) and face recognition (Turk and Pentland 1991) is principal component analysis (PCA) or the related singular value decomposition (SVD). The basic idea is to find a projection from a high-dimensional space into a low-dimensional space (e.g., the plane) such that the structure, especially the variation in the data, is preserved. The principal components are the directions along which the variation is maximal. They can be interpreted as “hidden parameters” of a process or experiment. The tutorial Shlens (2005) provides an accessible introduction; the book chapter Wall et al. (2003) gives background and motivation for microarray analysis.
When PCA is applied to the expression profiles of our example, we consider a point cloud of 1,500 vectors (one vector for each expression profile) with 48 dimensions (the expression levels on the 48 gels). The result is a display of the proteins where (hopefully) proteins with close positions are biologically related. Consider a time series experiment, where proteins are switched on and off in stages. If there is a “hidden parameter,” such as a stage in the cell cycle, it will have a systematic influence on the expression levels, and thus increase the variance for the genes taking part in it. This increased variance will then become part of the directions that are used for the projection (the principal components). The principal components have also been called “eigengenes”; they can be seen as “typical expression profiles,” see, for example, Alter et al. (2000) and Holter et al. (2000).
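The eigengene idea can be demonstrated on synthetic data matching the dimensions of our example. In the sketch below (all numbers invented), a sinusoidal "hidden parameter" drives 300 of the 1,500 profiles, and PCA computed via the SVD recovers it as the first principal direction:

```python
import numpy as np

rng = np.random.default_rng(42)
n_profiles, n_gels = 1500, 48

# Hypothetical expression matrix: one row per spot profile, one column
# per gel, with a hidden parameter driving a subset of the profiles.
hidden = np.sin(np.linspace(0, 2 * np.pi, n_gels))      # e.g., cell-cycle stage
data = rng.normal(0.0, 0.1, size=(n_profiles, n_gels))  # biological/technical noise
data[:300] += np.outer(rng.normal(1.0, 0.2, 300), hidden)  # coregulated spots

# PCA via SVD: center the columns, then take the leading right singular
# vectors as the principal directions ("eigengenes").
centered = data - data.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
eigengene = Vt[0]                   # typical expression profile across gels
projection = centered @ Vt[:2].T    # 2-D coordinates for each of the 1,500 spots

# The first eigengene should match the hidden parameter (up to sign).
corr = np.corrcoef(eigengene, hidden)[0, 1]
```

Plotting `projection` would show the 300 coregulated spots separated from the unaffected bulk along the first axis.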
Presentation and visualization: from spots to proteome maps
The next, more complex display is the comparison of a pair of gel images, e.g., treated vs untreated or wild type vs knockout. There are two possibilities: The gels can be shown side by side, or the gels are shown superimposed in a dual channel image (see Fig. 5). For both types, a histogram equalization is necessary to give the user a visual estimate of relative spot quantities. Background can be removed before display, depending on the method that is used for background detection. Even after equalization, the visual impression of spot volumes can be misleading, for example, when comparing a small dark spot with a spot that is less intense but larger. Before gel images are superimposed in false colors, histograms should be equalized, and the chosen colors should be of similar apparent intensity. This makes sure that one image does not dominate the others and negatively influence the estimation of quantities. Ideally, the chosen colors are of similar luminance and located on opposite sides of the color wheel. Triple channel images are sometimes used (red-green-blue channels; see Fig. 6) but are difficult to interpret for the untrained eye.
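A dual channel overlay can be sketched as follows. This is a simplified illustration: the rank-based equalization and the green/red channel assignment (the combination used for the time course movies mentioned below) stand in for whatever equalization method and color pair a given package actually offers:

```python
import numpy as np

def equalize(img):
    """Rank-based histogram equalization: map gray levels onto [0, 1]
    so that the value distribution becomes approximately uniform."""
    ranks = np.argsort(np.argsort(img.ravel()))
    return (ranks / (img.size - 1)).reshape(img.shape)

def dual_channel(gel_a, gel_b):
    """False-color overlay of two equalized gel images.

    Gel A fills the green channel and gel B the red channel, so matched
    spots appear yellow while spots unique to one gel keep that gel's
    color. Equalizing first keeps either image from dominating.
    """
    a, b = equalize(gel_a), equalize(gel_b)
    rgb = np.zeros(gel_a.shape + (3,))
    rgb[..., 0] = b   # red channel
    rgb[..., 1] = a   # green channel
    return rgb

overlay = dual_channel(np.array([[0.1, 0.9], [0.5, 0.3]]),
                       np.array([[0.2, 0.6], [0.8, 0.1]]))
```

Note that red and green are not of exactly equal luminance, which is why some packages offer alternative color pairs.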
For the display of multiple gels, one often shows tiles containing spots of interest in a side-by-side display. Here, a multiway equalization of histograms is necessary to give suitable visual impressions of spot intensities. Figure 16 shows the effect of multiway equalization on the display of a series of 2-D image tiles. With proper equalization, the visual impression of quantity closely corresponds to the normalized quantities shown in the expression profile chart (Fig. 16c).
Presentations of large gel series, for example, for time course experiments, or large projects containing a lot of replicates, will require too much screen space for a side-by-side display. Image warping can help in making an animation of 2-D gel series from time courses by assuring that spot positions do not jump between frames. Bernhardt et al. (2003) used a combination of dual channel images showing protein amount (stained proteins—false colored green) and current protein synthesis (35S methionine labeling—false colored red) to monitor the fate of each protein during the development of a bacterial culture along a time line (http://microbio1.biologie.uni-greifswald.de/starv/movie.htm).
Especially for the collection of protein identifications and the presentation of proteome maps, it is very useful to condense spot pattern information from a multitude of gels into a single reference image. This can be done by collecting spot identifications on a single representative gel (Eymann et al. 2004). When using image fusion with the union or max intensity combination function, a proteome map can be generated that shows all proteins that were observed in the experiment. The map shows realistic-looking spot shapes and does not ignore differentially expressed spots or dilute rarely occurring spots, as would be the case with average images (Tam le et al. 2006).
A related feature was implemented in ProteomWeaver (Bio-Rad) that allows for the combination of several narrow pH gradients into a global proteome map. This composite map exploits the much better resolution of narrow pH-gradient strips and supports image analysis as if the data came from a single, very wide gel.
A proteome map normally serves as the basis for further, especially physiologically oriented research and is comparable with a DNA array layout. The proteome map defines the positions at which protein spots were identified so that they can be recovered during gel analysis. A variety of proteome maps for many kinds of samples are available. Most of them show about a thousand different identified protein spots. Additional data can be attached to spots using labels, e.g., protein identification or functional information. Especially in gel regions with a very dense spot pattern, it is a big challenge to display the protein information without obscuring image information with spot labels.
Color coding also works for highlighting proteins that show elevated expression levels at certain points in a time course experiment. Also here, the information of a time-course-associated gel series is condensed in a proteome map. For each time point, a color is defined in the proteome map, which is then assigned to spots that have their maximal expression level at that time point (Fig. 17b).
In contrast to a tabular display of spot quantities, a proteome map retains the spatial relations between spots as well as typical spot shapes. It is, thus, easier to relate visually to newly produced gel images. The color coding of spots allows for easy visual identification of interesting subsets of the proteome.
Advances in software, algorithms, and experimental methods have kept 2-D gels competitive with and complementary to other methods for proteome analysis. While still not fully automatic, the software-based analysis is not the time-consuming bottleneck in a proteomics experiment anymore. With complete expression profiles, one is able to take full advantage of statistical methods (e.g., hierarchical clustering, PCA) that were established in the analysis of DNA microarray data. We see three main directions for future improvements of the available software: Firstly, the basic image processing algorithms, i.e., spot detection and image registration (warping), should be improved to a point where the need for a human operator is limited to checking the results. Ongoing research in automated registration and the potential for even greater processing power with grid computing (Dowsey et al. 2006) are promising steps towards this goal. The proteomics community can advance the state of the art by freely sharing raw data from a wide variety of experiments, and by establishing benchmarks. Standardization efforts of data formats and reporting requirements are being established for the gel electrophoresis process (MIAPE GEL, currently open for public feedback before publication at http://www.nature.com/nbt/consult); a similar standard for image processing and documentation of results is under development. Both are coordinated by the HUPO Proteomics Standards Initiative (www.psidev.info). The second direction for improvement is defined by the further adoption of advanced statistical methods for the analysis of large 2-D gel experiments. It seems plausible that large numbers of images from multiple studies could be aggregated to give richer “functional profiles” of proteins that show expression levels across a wider range of samples and conditions. 
The third direction is putting results from 2-D gel analysis into a larger context by combining spot data with functional annotations and data from other experimental techniques. Visualizing the result in biological terms (metabolic pathways, functional categories, etc.) will make it possible to gain new insights from the growing amounts of data accumulated by the “Omics” technologies.
We are thankful to Julia Bandow (Pfizer) for crucial input to the development of the 2-D gel analysis workflow based on consensus spot patterns (Bandow et al. 2007). We would like to thank the anonymous reviewers for their constructive comments.