Quantitative mass spectrometry in proteomics: a critical review
The quantification of differences between two or more physiological states of a biological system is among the most important but also most challenging technical tasks in proteomics. In addition to the classical methods of differential protein gel or blot staining by dyes and fluorophores, mass-spectrometry-based quantification methods have gained increasing popularity over the past five years. Most of these methods employ differential stable isotope labeling to create a specific mass tag that can be recognized by a mass spectrometer and at the same time provide the basis for quantification. These mass tags can be introduced into proteins or peptides (i) metabolically, (ii) by chemical means, (iii) enzymatically, or (iv) provided by spiked synthetic peptide standards. In contrast, label-free quantification approaches aim to correlate the mass spectrometric signal of intact proteolytic peptides or the number of peptide sequencing events with the relative or absolute protein quantity directly. In this review, we critically examine the more commonly used quantitative mass spectrometry methods for their individual merits and discuss challenges in arriving at meaningful interpretations of quantitative proteomic data.
Keywords: Quantitative proteomics; Mass spectrometry; Stable isotope labeling
Table 1 Characteristics and applications of quantitative mass spectrometry methods
Columns: Method; Application; Quantitative proteome coverage; Linear dynamic range (a). Only the Method and Application entries are reproduced here.
Metabolic protein labeling: complex biochemical workflows; comparison of 2–3 states; cell culture systems only
Chemical protein labeling (MS): medium to complex biochemical workflows; comparison of 2–3 states
Chemical peptide labeling (MS): medium complexity biochemical workflows; comparison of 2–3 states
Chemical peptide labeling (MS/MS): medium complexity biochemical workflows; comparison of 2–8 states
Enzymatic labeling (MS): medium complexity biochemical workflows; comparison of 2 states
Spiked synthetic peptide standards: medium complexity biochemical workflows; targeted analysis of few proteins
Label free (ion intensity): simple biochemical workflows; whole proteome analysis; comparison of multiple states
Label free (spectrum counting): simple biochemical workflows; whole proteome analysis; comparison of multiple states
The earliest possible point for introducing a stable isotope signature into proteins is by metabolic labeling during cell growth and division. Initially described for total labeling of bacteria using 15N-enriched cell culture medium, it has gained wider popularity in the form of the stable isotope labeling by amino acids in cell culture (SILAC) approach introduced by Mann and co-workers in 2002. In the most commonly used implementation of the method, the medium contains 13C6-arginine and 13C6-lysine, which ensures that all tryptic cleavage products of a protein (except for the very C-terminal peptide) carry at least one labeled amino acid, resulting in a constant mass increment over the non-labeled counterpart. Protein identification is based on fragmentation spectra of at least one of the co-eluting ‘heavy’ and ‘light’ peptides, and relative quantitation is performed by comparing the intensities of isotope clusters of the intact peptide in the survey spectrum. In contrast to full metabolic protein labeling by 15N, the number of incorporated labels in SILAC is defined and not dependent on the peptide sequence, thus facilitating data analysis. The main advantage of all metabolic labeling strategies is that the differentially treated samples can be combined at the level of intact cells. This excludes all sources of quantification error introduced by biochemical and mass spectrometric procedures, as these will affect both protein populations in the same way. Despite a number of cases that demonstrate the feasibility of total 15N metabolic protein labeling of higher organisms in vivo, such as C. elegans, Drosophila melanogaster, rat, or plants, it is neither possible nor practical to apply this strategy routinely. The cost and time required for creating and maintaining these systems is often incommensurate with the value of the information provided. As a result, the main application of metabolic labeling in higher eukaryotes to date is SILAC in immortalized cell lines.
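The constant mass increment described above translates into a predictable m/z offset between the ‘light’ and ‘heavy’ isotope clusters in the survey spectrum. The following is a minimal sketch, assuming 13C6-labeled arginine or lysine (six 13C-for-12C substitutions per labeled residue); the function name and the example peptide values are illustrative and do not come from the original text.

```python
# Mass difference between 13C and 12C in Da.
DELTA_13C = 1.0033548


def heavy_mz(light_mz: float, charge: int, labeled_residues: int = 1,
             carbons_per_label: int = 6) -> float:
    """Expected m/z of the heavy SILAC partner of a light peptide ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    mass_shift = labeled_residues * carbons_per_label * DELTA_13C
    return light_mz + mass_shift / charge


# Hypothetical doubly charged tryptic peptide with one labeled residue.
print(round(heavy_mz(600.0, charge=2), 4))
# Hypothetical triply charged peptide with a missed cleavage (two labels).
print(round(heavy_mz(500.0, charge=3, labeled_residues=2), 4))
```

Because the offset scales with the inverse of the charge state, quantification software has to know (or infer) the charge before it can pair the two clusters.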
Protein labeling in excess of 90% is often achieved by 6–8 passages in medium supplemented with heavy amino acids. While many cell lines can be converted quite readily, some do require special attention. For example, some cell lines require careful titration of the amount of arginine in the medium in order to prevent metabolic conversion of excess arginine into proline, which in turn complicates data analysis. Cell lines that are sensitive to changes in media composition or are otherwise difficult to grow or maintain in culture may not be amenable to metabolic labeling at all. A further limitation of metabolic labeling is the restricted number of available labels. For SILAC, a maximum of three conditions can be compared in one experiment (unlabeled, 13C6-, and 13C6,15N4-labeled amino acids) which, albeit possible, complicates the analysis of, e.g., time-course experiments. Because of the early combination of samples, metabolic labeling, and SILAC in particular, is probably the most accurate quantitative MS method in terms of the overall experimental process. This makes it particularly suitable for assessing relatively small changes in protein levels or in the levels of post-translational modifications [17, 18, 19]. For the latter, it should be noted, though, that quantification at the peptide level is far from trivial because all information is derived from a single or a few observations.
Protein and peptide labeling
Post-biosynthetic labeling of proteins and peptides is performed by chemical or enzymatic derivatization in vitro. An elegant and specific way to introduce an isotope label into peptides is the use of trypsin- or Glu-C-catalyzed incorporation of 18O during protein digestion [20, 21]. This was originally employed to aid de novo sequencing of peptides by mass spectrometry but has more recently also been applied to quantitative proteomics. Enzymatic labeling can be performed either during proteolytic digestion or, more commonly, after proteolysis in a second incubation step with the protease. Incorporation of 18O into C-termini of peptides results in a mass shift of 2 Da per 18O atom. While trypsin and Glu-C introduce two oxygen atoms, resulting in a 4 Da mass shift which is generally sufficient for differentiation of isotopomers, Lys-N and other enzymes incorporate only one 18O atom and should therefore be avoided. Acid- and base-catalyzed back-exchange with concomitant loss of the isotope label can occur at extreme pH values, but under the mild acidic conditions typically employed for ESI- and MALDI-MS, 18O-containing carboxyl groups of peptides are sufficiently stable. Because peptides are enzymatically labeled, artifacts (i.e., side reactions) common to chemical labeling can be avoided. A practical disadvantage is that full labeling is rarely achieved and that different peptides incorporate the label at different rates, which complicates data analysis [26, 27].
In principle, every reactive amino acid side chain can be used to incorporate an isotope-coded mass tag by chemical means (reviewed by Ong and Mann). In practice, however, side chains of lysine and cysteine are primarily used for this purpose. In their pioneering work, Gygi et al. developed the isotope-coded affinity tag (ICAT) approach in which cysteine residues are specifically derivatized with a reagent containing either zero or eight deuterium atoms as well as a biotin group for affinity purification of cysteine-derivatized peptides and subsequent MS analysis. Following the initial success of the ICAT approach, several variations on this chemical reagent class emerged to improve, e.g., recovery of labeled peptides or chromatographic properties [28, 29, 30, 31]. Other thiol-specific reagents typically contain halogen-substituted carboxylic acids or amides [32, 33, 34, 35] or employ the Michael-type addition reaction to carbonyl groups (e.g., maleimide esters and vinylpyridine) [36, 37]. As cysteine is a rare amino acid, ICAT and related methods significantly reduce the complexity of the peptide mixture, which can be advantageous when highly complex samples are analyzed. However, ICAT is obviously not suitable for quantifying the significant number of proteins that contain no (or only a few) cysteine residues and is of limited use for the analysis of post-translational modifications and splice isoforms. Despite these drawbacks, ICAT and similar approaches will continue to be useful in a number of broad (e.g., body fluid) or targeted (e.g., cysteine protease) analyses.
Another group of labeling reagents targets the peptide N-terminus and the epsilon-amino group of lysine residues. Most of the time, this is realized via the very specific N-hydroxysuccinimide (NHS) chemistry or other active esters and acid anhydrides as in, e.g., the isotope-coded protein label (ICPL), isobaric tags for relative and absolute quantification (iTRAQ), tandem mass tags (TMT), and acetic/succinic anhydride [41, 42, 43, 44]. Isocyanates or isothiocyanates have also been employed, albeit to a lesser extent [45, 46]. In recent studies, formaldehyde has been used for methylation of lysine residues via Schiff base formation and subsequent reduction by cyanoborohydride [47, 48, 49]. This reaction is very fast, very specific, and very cheap. However, a sufficiently large mass shift between ‘heavy’ and ‘light’ labeled peptides can only be achieved with deuterated formaldehyde, which in turn leads to partial LC separation of labeled and non-labeled peptides, thus complicating data analysis (discussed below).
In most of the aforementioned chemical modification techniques, relative quantification is achieved by integration of MS signal over isotopomers of ‘heavy’ and ‘light’ labeled peptides in survey spectra. Isobaric mass tagging, initially introduced by Thompson and co-workers, differs from this concept by introducing tags that initially produce isobaric labeled peptides which precisely co-migrate in liquid chromatography separations. Only upon peptide fragmentation are the different tags distinguished by the mass spectrometer. This permits the simultaneous determination of both identity and relative abundance of peptide pairs in tandem mass spectra. The commercially available iTRAQ reagent provides a further refinement of this approach, allowing multiplexed quantitation of up to eight samples. This has turned out to be particularly useful for following biological systems over multiple time points or, more generally, for comparing multiple treatments in the same experiment.
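Reading out relative abundances from a tandem mass spectrum of an isobarically tagged peptide can be sketched as follows. This is an illustration only: the nominal reporter m/z values (114.1 to 117.1, as commonly quoted for a four-channel iTRAQ reagent), the matching tolerance, and the example spectrum are assumptions and are not taken from the original text.

```python
def reporter_intensities(peaks, reporter_mzs, tol=0.05):
    """Sum fragment-ion intensity within +/- tol of each reporter m/z.

    peaks: list of (mz, intensity) pairs from one MS/MS spectrum.
    """
    totals = {target: 0.0 for target in reporter_mzs}
    for mz, intensity in peaks:
        for target in reporter_mzs:
            if abs(mz - target) <= tol:
                totals[target] += intensity
    return totals


def relative_abundance(totals):
    """Express each reporter channel as a fraction of the summed signal."""
    total = sum(totals.values())
    if total <= 0:
        raise ValueError("no reporter signal found")
    return {target: value / total for target, value in totals.items()}


# Assumed nominal reporter masses and a made-up spectrum.
REPORTERS = (114.1, 115.1, 116.1, 117.1)
spectrum = [(114.11, 1000.0), (115.11, 2000.0), (116.11, 1000.0),
            (117.11, 4000.0), (175.12, 500.0)]  # last peak: sequence ion
fractions = relative_abundance(reporter_intensities(spectrum, REPORTERS))
print(fractions)
```

The same spectrum also contains the sequence ions used for identification, which is why identity and relative abundance are obtained in a single scan.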
Carboxylic acids in side chains of glutamic and aspartic acid residues as well as the C-termini of polypeptide chains can be isotopically labeled by esterification using deuterated alcohols [50, 51]. This reaction is particularly attractive for the quantification of phosphopeptides because esterification has been shown to reduce binding of acidic peptides to immobilized metal affinity chromatography (IMAC) columns, thus improving the specificity of this enrichment procedure. Other, more tailored labeling techniques have been developed, e.g., for quantification of phosphorylated and glycosylated peptides. For the former, β-elimination of phosphoric acid followed by Michael addition using, e.g., ethanedithiol derivatives is typically employed [53, 54, 55, 56]. For glycopeptides, hydrazide chemistry replaces the carbohydrate moiety with a labeled chemical group.
Broadly speaking, the chemical properties of amino acid side chains in proteins and peptides are rather similar. Consequently, almost all chemical labeling methods may also be applied to intact proteins. For example, the ICPL reagent has been employed for N-terminal peptide labeling as well as lysine side chain labeling of intact proteins. A similar protocol has been described for iTRAQ. In most cases, full protein denaturation improves labeling results, but care has to be taken to avoid protein precipitation (by, e.g., the use of charged reagents). Labeling of intact proteins can be quite advantageous since it allows for further protein separation steps on the combined samples. This may facilitate characterization of protein isoforms by, e.g., 2D gel electrophoresis. However, there are two important caveats to protein labeling: first, trypsin does not cleave at modified lysine residues, which leads to significantly longer peptides that are generally more difficult to identify by MS; second, very high labeling efficiencies are required if further protein separation is desired prior to MS analysis, since incomplete labeling impairs the resolving power achievable with, e.g., 1D and 2D gel electrophoresis. A general drawback of all chemical labeling approaches is that they are prone to side reactions that can lead to unexpected products and may adversely influence quantification results.
Absolute quantification using internal standards
The use of isotope-labeled synthetic standards has a long history in quantitative mass spectrometry. Originally described in the early 1980s, it is now becoming more broadly applied as a method commonly known as AQUA (absolute quantification of proteins). In the simplest case, absolute quantification can be achieved by the addition of a known quantity of a stable isotope-labeled standard peptide to a protein digest and subsequent comparison of the mass spectrometric signal to the endogenous peptide in the sample. Unlike in metabolic labeling, where relative quantitative information is acquired for a large number of the proteins present in a mixture, the addition of synthetic peptides to a proteome digest focuses on the determination of the quantity of one or a few particular proteins of interest. This approach is attractive for studies aimed at, e.g., the analysis and validation of potential biomarkers in a large number of clinical samples or at measuring the levels of particular peptide modifications such as ubiquitinylation.
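In the simplest case described above, the calculation reduces to scaling the spiked amount by the ratio of endogenous (‘light’) to standard (‘heavy’) signal. A minimal sketch, assuming both peptides respond identically in the mass spectrometer and that the signals are already integrated peak areas; the function name, units, and numbers are hypothetical.

```python
def endogenous_amount(light_area: float, heavy_area: float,
                      spiked_fmol: float) -> float:
    """Estimate the endogenous peptide amount from a spiked heavy standard.

    Assumes identical ionization and detection response for the light
    (endogenous) and heavy (synthetic standard) peptide.
    """
    if heavy_area <= 0:
        raise ValueError("no signal detected for the spiked standard")
    return spiked_fmol * light_area / heavy_area


# Hypothetical peak areas with 50 fmol of standard spiked into the digest.
print(endogenous_amount(light_area=3.0e6, heavy_area=1.5e6, spiked_fmol=50.0))
```

The result refers to the amount present at the point where the standard was added, so losses earlier in sample handling are not accounted for.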
The approach has been refined by constructing synthetic genes that express concatenated standard peptides which, upon tryptic digestion, provide either multiple peptides of the same protein for quantification or quantification standards for a group of proteins of interest. Not only does the provision of multiple peptides increase confidence in quantification, the synthetic protein can also be added earlier in the process than individual peptides, thus controlling for any potential bias encountered during protein digestion. One notable application of the synthetic gene strategy is the determination of the stoichiometry of the eight-membered eIF2B-eIF2 protein complex.
Given that tryptic digests of entire proteomes are very complex mixtures, and that most mass spectrometers have a rather limited dynamic detection range, there are a number of limitations to the AQUA approach. One practical drawback is that one has to ‘guess’ how much of the labeled standard should be added to a sample. This amount may be different for all proteins of interest as their expression levels (used here in the sense of protein abundance rather than protein synthesis) may differ greatly within a sample. Another limitation is the specificity of the spiked standard, as there are likely multiple isobaric peptides present in the mixture. Both of these issues can be greatly improved by a method called multiple reaction monitoring (MRM), in which the (triple quadrupole) mass spectrometer monitors both the intact peptide mass and one or more specific fragment ions of that peptide over the course of an LC-MS experiment. The combination of retention time, peptide mass, and fragment mass practically eliminates ambiguities in peptide assignments and extends the quantification range to 4–5 orders of magnitude. Obviously, the choice of synthetic peptide standard is important and is mostly determined empirically. However, recent data suggest that it is possible to predict which of a protein’s tryptic peptides will be most frequently observed for a given proteomic platform and thus would be a suitable quantification standard. Despite the ability to calculate protein amounts from an AQUA experiment, there are still question marks as to how absolute these values are, as any sample manipulation prior to adding the synthetic standard may bias the results (losses or enrichment). Consequently, the amount of a protein in an experiment determined by AQUA may not reflect the true expression levels of this protein in a cell.
LC-MS/MS analysis of stable isotope labeled peptides
As described above, quantitation based on stable isotope labeling can be achieved by signal integration in survey MS spectra (e.g., SILAC) or tandem MS spectra (e.g., iTRAQ). For both approaches, several points have to be considered in the design and analysis of an experiment. Although the assumption that stable isotope labeling does not alter the physicochemical properties of a peptide is generally valid, it has been observed that deuterated peptides show small but significant retention time differences in reversed-phase HPLC compared to their non-deuterated counterparts. This complicates data analysis because the relative quantities of the two peptide species cannot be determined accurately from one spectrum but require integration across the chromatographic time scale. Retention time shifts are far less pronounced for labels such as 13C, 15N, or 18O isotopes, so that the additional signal integration step over retention time can generally be omitted.
A further parameter impacting accuracy and dynamic range of quantification is the mass spectrometric detection system itself. In survey MS spectra, the definition of very low and very strong signals can be problematic. At very low signal, peptide ions are often difficult to distinguish from background noise (Fig. 3b) and for very strong signals, the detector may become saturated (Fig. 3c). In practice, saturation is more often observed for quadrupole TOF instruments than ion traps because these latter devices can control the number of ions before detection. In any case, the relatively recent introduction of high-resolution/high mass accuracy mass spectrometers in proteomics has greatly facilitated the ability to quantify proteins in complex proteomes because the increased instrument performance enables the exact discrimination of peptide isotope clusters from interfering signals caused by, e.g., co-eluting and near-isobaric peptides and other chemical entities [71, 72, 73]. For quantification in tandem MS spectra, saturation effects are rarely a problem. Instead, low-intensity spectra are frequently obtained and may result in less robust quantitation values due to poor ion statistics. Unlike for quantification in survey spectra, the contribution of peptidic or chemical background noise to quantification does not depend on the mass resolution of the mass spectrometer but on the size of the m/z window chosen for isolation of peptides for sequencing (typically 2–6 m/z). All ions present in this window will contribute to the signal of the, e.g., iTRAQ reporter ions. As a result, it is not always clear to what extent quantification was contributed by the peptide of interest or by background. This can sometimes lead to a large underestimation of true changes, especially for very weak peptide signals.
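The underestimation of true changes caused by co-isolated background can be illustrated with a simple two-channel model. This is a sketch under stated assumptions, not a description of any particular software: the background is assumed to contribute equally (1:1) to both reporter channels, and `background_fraction` is a hypothetical parameter giving its share of the reference-channel signal.

```python
def observed_ratio(true_ratio: float, background_fraction: float) -> float:
    """Reporter-ion ratio measured when background ions are co-isolated.

    The reference channel signal is normalized to 1: a share
    `background_fraction` comes from background, the rest from the
    peptide of interest. The background is assumed unchanged between
    the two samples, so it adds the same amount to the other channel.
    """
    if not 0.0 <= background_fraction < 1.0:
        raise ValueError("background_fraction must be in [0, 1)")
    peptide_share = 1.0 - background_fraction
    return peptide_share * true_ratio + background_fraction


# A true tenfold change, measured with increasing background contribution.
for fraction in (0.0, 0.1, 0.3, 0.5):
    print(fraction, round(observed_ratio(10.0, fraction), 2))
```

In this model the observed ratio always moves toward 1 as the background share grows, which is why weak peptide signals, where background makes up a larger part of the reporter intensity, are affected most.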
Taken together, the limit to quantification of complex proteomes by stable isotopes is first and foremost an issue of signal interference caused by co-eluting components of similar mass. Therefore, the most straightforward way of optimizing quantitative analyses is to decrease sample complexity by increasing HPLC gradient times or by biochemical fractionation prior to LC-MS analysis.
Currently, two widely used but fundamentally different label-free quantification strategies can be distinguished: (a) measuring and comparing the mass spectrometric signal intensity of peptide precursor ions belonging to a particular protein and (b) counting and comparing the number of fragment spectra identifying peptides of a given protein. In the former approach, the ion chromatograms for every peptide are extracted from an LC-MS/MS run and their mass spectrometric peak areas are integrated over the chromatographic time scale. For low-resolution mass spectra this is typically done by creating extracted ion chromatograms (XICs) for the mass to charge ratios determined for each peptide. More recently, this concept has been extended to high-resolution data to include contributions of 13C isotopes to the overall signal intensities. The intensity value for each peptide in one experiment can then be compared to the respective signals in one or more other experiments to yield relative quantitative information [74, 76, 77, 78, 79, 80]. For proteomic analysis of very complex peptide mixtures, three important experimental parameters affect the analytical accuracy of quantification by ion intensities. (i) It is advantageous to employ a high mass accuracy mass spectrometer because the influence of interfering signals of similar but distinct mass can be minimized. (ii) The peptide chromatographic profile should be optimized for reproducibility to ease finding corresponding peptides between different experiments. This is not a trivial task and special software has been developed to align LC-runs prior to identifying corresponding peptides [81, 82, 83, 84]. (iii) The right balance between acquisition of survey and fragment spectra has to be found.
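Extracting an ion chromatogram and integrating its peak area can be sketched as follows. The data layout (a list of survey scans, each a retention time plus a list of m/z and intensity pairs), the ppm tolerance, and the trapezoidal integration are illustrative assumptions rather than the procedure of any specific software package.

```python
def extract_xic(scans, target_mz, ppm=10.0):
    """Build an extracted ion chromatogram for one precursor m/z.

    scans: list of (retention_time_s, [(mz, intensity), ...]) survey spectra,
    ordered by retention time.
    """
    tol = target_mz * ppm * 1e-6
    xic = []
    for rt, peaks in scans:
        signal = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        xic.append((rt, signal))
    return xic


def peak_area(xic):
    """Trapezoidal integration of intensity over retention time."""
    area = 0.0
    for (t0, i0), (t1, i1) in zip(xic, xic[1:]):
        area += 0.5 * (i0 + i1) * (t1 - t0)
    return area


# Made-up survey scans: target peptide at m/z 500.2500, interference at 500.30.
scans = [
    (100.0, [(500.30, 50.0)]),
    (102.0, [(500.2501, 100.0), (500.30, 80.0)]),
    (104.0, [(500.2499, 300.0), (500.30, 60.0)]),
    (106.0, [(500.2500, 100.0)]),
    (108.0, []),
]
xic = extract_xic(scans, target_mz=500.25)
print(xic)
print(peak_area(xic))
```

A narrow tolerance excludes the near-isobaric signal in this example, which mirrors point (i) above on the benefit of high mass accuracy.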
While extensive peptide sequencing by tandem MS is required to identify as many proteins as possible in complex mixtures, a robust quantitative reading by ion intensities requires multiple sampling of the chromatographic peak by survey mass spectra. Typically, multiple fragment spectra are acquired for every survey spectrum at acquisition rates ranging from 0.2 s/spectrum (ion traps) to 1–3 s/spectrum (quadrupole-TOF instruments). Given that chromatographic peak widths are in the order of 10–30 s for nano-LC separations, ion traps have an inherent advantage over QTOFs because many more MS to MS/MS cycles can be performed within the available chromatographic time. Still, even for fast sampling instruments, better quantification accuracy will inevitably mean poorer proteome coverage and vice versa. This dilemma has led some laboratories to conduct two separate experiments for each sample: one which focuses on identifying as many peptides as possible by MS/MS and a second performed in MS-only mode in order to optimize sampling of intact peptide signals. In these approaches, matching of integrated peak intensities to identified peptides is performed by using a combination of accurate mass and retention time [84, 85, 86]. An alternative has been proposed in which the mass spectrometer no longer cycles between MS and MS/MS mode but aims to detect and fragment all peptides in a chromatographic window simultaneously by rapidly alternating between high- and low-energy conditions in the mass spectrometer [87, 88, 89, 90]. Obviously, there are challenges with analyzing such data from complex samples as many fragmentation spectra will be populated with sequence ions from multiple peptides each contributing differently to the overall spectral content.
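The trade-off between sequencing depth and sampling of the chromatographic peak can be made concrete with a short calculation. This is a sketch with assumed numbers: the per-spectrum times are taken from the ranges quoted above, whereas the survey scan duration and the number of MS/MS spectra per cycle are hypothetical settings.

```python
def survey_scans_per_peak(peak_width_s: float, survey_time_s: float,
                          msms_time_s: float, msms_per_cycle: int) -> float:
    """Average number of survey spectra acquired across one LC peak."""
    cycle_time = survey_time_s + msms_per_cycle * msms_time_s
    return peak_width_s / cycle_time


# 20 s wide peak, five MS/MS spectra per survey spectrum (assumed settings).
fast = survey_scans_per_peak(20.0, survey_time_s=0.2, msms_time_s=0.2,
                             msms_per_cycle=5)
slow = survey_scans_per_peak(20.0, survey_time_s=1.0, msms_time_s=1.0,
                             msms_per_cycle=5)
print(round(fast, 1), round(slow, 1))
```

Lowering `msms_per_cycle` increases the number of quantitative readings per peak but leaves fewer fragment spectra for identification, which is the dilemma described in the text.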
The peptide counting or, more recently introduced, spectral counting approach [91, 92, 93] is based on the empirical observation that the more of a particular protein is present in a sample, the more tandem MS spectra are collected for peptides of that protein. Hence, relative quantification can be achieved by comparing the number of such spectra between a set of experiments. In contrast to quantification by peptide ion intensities, spectral counting benefits from extensive MS/MS data acquisition across the chromatographic time scale both for protein identification and for protein quantification. However, the commonly employed dynamic exclusion of ions that have already been selected for fragmentation is detrimental to accurate quantification. Although very intuitive and attractive in practical terms, the spectrum counting approach is still controversial because it does not measure any direct physical property of a peptide. It further assumes that the linearity of response is the same for every protein. In fact, the spectrum count response is different for every peptide because, e.g., the chromatographic behavior (retention time, peak width) varies for every peptide. Therefore, even reasonable quantification requires the observation of many spectra for a given protein. Old et al. have shown that although it is possible to detect threefold protein changes with as few as four spectra, this number increases exponentially for smaller changes (ca. 15 spectra for twofold). At the same time, saturation effects will be observed at higher spectral counts and saturation levels will be different for all proteins, which renders the assessment of the dynamic range of observed changes difficult.
Nevertheless, the correlation between amount of protein and number of tandem mass spectra does hold and has led researchers to extend the concept to the estimation of absolute protein expression levels. In the first of a series of papers, Rappsilber et al. computed a protein abundance index (PAI) by dividing the number of observed peptides by the number of all possible tryptic peptides from a particular protein that are within the mass range of the employed mass spectrometer. In a subsequent refinement, the same group transformed the PAI into an exponentially modified form (emPAI) which showed a better correlation to known protein amounts. Further advances have been made by using computational models that predict which peptides of a given protein are likely to be detected by the mass spectrometer in the first place and thus would form a better basis for quantification [97, 98, 99, 66]. For example, results obtained by the absolute protein expression profiling (APEX) method suggest that absolute protein expression can be determined to within the correct order of magnitude.
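The PAI and its exponentially modified form can be computed in a few lines. The PAI definition follows the description above; the emPAI formula (10 to the power of PAI, minus 1) and the conversion to molar percentages are given here as they are commonly stated for the method and are not spelled out in this text, so they should be read as assumptions. Protein names and peptide counts are made up.

```python
def pai(observed_peptides: int, observable_peptides: int) -> float:
    """Protein abundance index: observed / observable tryptic peptides."""
    if observable_peptides <= 0:
        raise ValueError("observable_peptides must be positive")
    return observed_peptides / observable_peptides


def empai(observed_peptides: int, observable_peptides: int) -> float:
    """Exponentially modified PAI, assumed here to be 10**PAI - 1."""
    return 10 ** pai(observed_peptides, observable_peptides) - 1.0


def molar_percent(empai_by_protein: dict) -> dict:
    """Share of each protein in the summed emPAI, in percent."""
    total = sum(empai_by_protein.values())
    return {name: 100.0 * value / total
            for name, value in empai_by_protein.items()}


# Hypothetical proteins: (observed, observable) peptide counts.
counts = {"protein_A": (5, 10), "protein_B": (2, 20)}
values = {name: empai(obs, total) for name, (obs, total) in counts.items()}
print(values)
print(molar_percent(values))
```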
Label-free approaches are certainly the least accurate among the mass spectrometric quantification techniques when considering the overall experimental process because all the systematic and non-systematic variations between experiments are reflected in the obtained data (Fig. 2). Consequently, the number of experimental steps should be kept to a minimum and every effort should be made to control reproducibility at each step. Nonetheless, label-free quantification is worth considering for a number of reasons. In simple practical terms, the time-consuming steps of introducing a label into proteins or peptides can be omitted and there are no costs for labeling reagents. In terms of analytical strategy, the following points may also be important: (i) There is, in principle, no limit to the number of experiments that can be compared. This is certainly an advantage over stable isotope labeling techniques, which are typically limited to 2–8 experiments that can be directly compared. (ii) Unlike for most stable isotope labeling techniques, mass spectral complexity (in terms of detected peptide species within a particular chromatographic time window) is not increased which, in turn, might provide for more analytical depth (i.e., number of detected peptides/proteins in an experiment) because the mass spectrometer is not occupied with fragmenting all forms of the labeled peptide. (iii) There is evidence that label-free methods provide a higher dynamic range of quantification than stable isotope labeling (Table 1) and therefore may be advantageous when large and global protein changes between experiments are observed. However, particularly for spectral counting, this comes at the cost of unclear linearity and relatively poor accuracy.
Analysis of quantitative MS data
When contemplating a data analysis strategy for proteomic data generated by quantitative mass spectrometry, it is worth reconsidering a couple of fundamental points. Quantitative proteomic data are typically very complex, and often of variable quality. This is in part because the data are incomplete: even the most advanced mass spectrometers, which can acquire several tandem MS spectra per second, are often overwhelmed by the number of peptides present in a sample. As a consequence, only a subset of all proteins present can be identified in any one analysis. For protein quantification, it is further mandatory to detect a protein in all experiments that should be compared. As a result, often only a subset of identified proteins can actually be quantified (Fig. 1). Identification and quantification rates are direct functions of sample complexity. While a large fraction of proteins present in, e.g., affinity purifications can be identified and quantified using a reasonable number of acquired spectra, a much smaller fraction of the content of whole proteome shotgun experiments will be covered and with fewer spectra for each protein. This clearly limits the confidence in quantification results.
For protein quantification based on spectrum counting, the data processing steps are basically identical to the general protein identification workflow in proteomics, which is one of the reasons why this approach has become so popular. Researchers can choose from a variety of methods available for automated protein identification and subsequent (probabilistic) validation of spectrum-to-peptide matches. It should be emphasized that for any quantification method it is mandatory to consider only those spectrum-to-peptide matches that are unique for a particular protein.
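Restricting spectral counts to peptides that are unique for one protein can be sketched as below. The input format (one pair of peptide sequence and matching protein identifiers per validated spectrum) and all identifiers are hypothetical; real search engine output would need to be parsed into this shape first.

```python
from collections import Counter


def spectral_counts(psms):
    """Count spectra per protein, using only protein-unique peptides.

    psms: iterable of (peptide_sequence, protein_ids) pairs, one per
    validated spectrum-to-peptide match; protein_ids lists every protein
    that contains the peptide.
    """
    counts = Counter()
    for _peptide, protein_ids in psms:
        proteins = set(protein_ids)
        if len(proteins) == 1:  # peptide maps to exactly one protein
            counts[next(iter(proteins))] += 1
    return dict(counts)


# Made-up matches; the shared peptide is ignored for quantification.
psms = [
    ("LVNELTEFAK", ["P1"]),
    ("LVNELTEFAK", ["P1"]),
    ("AEFVEVTK", ["P1", "P2"]),
    ("YLYEIAR", ["P2"]),
]
print(spectral_counts(psms))
```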
Extracting quantitative information from MS and MS/MS spectra
Quantification methods based on ion intensities, regardless of whether employing stable isotope labeling or not, require a number of additional steps prior to protein quantification (boxed area in Fig. 4). Two particular elements are important to mention here: intensity integration (i) within the mass spectrum (centroiding) and (ii) across the chromatographic peak. For low-resolution MS data, both aspects are carried out in one operation by extracting the ion chromatograms from the LC-MS data. For high-resolution MS data, the procedure is more complex and typically performed in two steps. Signal intensity integration within the mass spectrum can either utilize the intensity/area of the monoisotopic peak or the sum of the intensities/areas of all isotopomers of a peptide. Each method has its merits and detractions: monoisotopic peak integration is relatively straightforward to implement but not very sensitive, particularly for larger peptides for which the monoisotopic peaks only constitute a minority of the total signal intensity. In addition, the use of heavy isotopes distorts the relative isotope distribution of peptides, which leads to inaccuracies. In contrast, the summed area of the entire isotope cluster is the most sensitive and accurate method as it utilizes all of the data but is more difficult to implement computationally. As discussed in a previous section, signal intensity integration over the chromatographic time scale is primarily required for label-free quantification as well as those stable isotope reagents that lead to significant differences in chromatographic behavior. For methods which do not suffer from this shortcoming, time integration can be performed but is not required. Instead, collection of several spectra for each peptide is generally useful in order to obtain several quantitative readings.
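The difference between monoisotopic and summed isotope cluster integration within a single spectrum can be illustrated as follows. This is a simplified sketch: isotope peaks are assumed to be spaced by the 13C-12C mass difference divided by the charge, the matching tolerance is arbitrary, and the centroided example spectrum is invented.

```python
# Approximate spacing between isotope peaks of a peptide, in Da.
ISOTOPE_SPACING = 1.0033548


def cluster_intensity(peaks, mono_mz, charge, n_isotopes=4, tol=0.01):
    """Sum the intensity of the first n_isotopes peaks of an isotope cluster.

    peaks: list of (mz, intensity) pairs from one centroided survey spectrum.
    With n_isotopes=1 only the monoisotopic peak is used.
    """
    total = 0.0
    for k in range(n_isotopes):
        target = mono_mz + k * ISOTOPE_SPACING / charge
        total += sum(i for mz, i in peaks if abs(mz - target) <= tol)
    return total


# Invented doubly charged peptide at m/z 800.400 plus an unrelated peak.
spectrum = [(800.400, 100.0), (800.650, 500.0), (800.902, 90.0),
            (801.403, 45.0), (801.905, 15.0)]
mono_only = cluster_intensity(spectrum, 800.400, charge=2, n_isotopes=1)
full_cluster = cluster_intensity(spectrum, 800.400, charge=2)
print(mono_only, full_cluster)
```

In this example the monoisotopic peak carries less than half of the cluster signal, which illustrates why summing the cluster is the more sensitive option for larger peptides.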
Quality control of raw MS data
There are several sources of potential error in the mass spectrometric readout of an LC-MS experiment that can negatively affect the results of peptide quantification. Spectra for which these errors are detected should be filtered out prior to computing quantification values. The first of these issues is the presence and variability of spectral background noise (Fig. 3b) which can be filtered out by most if not all available commercial and academic data processing packages. A second common issue is the presence of interfering signals other than background noise (Fig. 3b). For very complex peptide mixtures, these often constitute co-eluting peptides of very similar m/z values which in turn will render the correct assignment of signal intensities to particular peptide ions difficult. This is true for quantification in both MS and MS/MS spectra and such spectra should be removed from the analysis. Third, strong signal intensities can lead to detector saturation for some mass spectrometers (particularly quadrupole TOF instruments, Fig. 3c) which distorts the natural isotope intensity distribution and thus leads to false quantitative readings.
For stable isotope labeling, further quality criteria must be considered. One very simple and frequently encountered problem is systematic bias introduced by imperfect mixing of the two protein populations. Mixing errors can usually be determined experimentally; because they apply uniformly to all protein quantification values, they are easily corrected for. A second systematic error arises from the isotope purity of the labeling reagent, which rarely exceeds 95–98%. Although this may not appear to be a significant source of uncertainty and, again, can easily be corrected for, isotope impurities lead to increased spectral interferences and, more importantly, limit the dynamic range of detectable differences between samples. A similar argument applies to incomplete incorporation of the isotope label into proteins and peptides. Again, while isotope incorporation can be measured and correction factors can be applied, the combination of the above items limits the dynamic range of detectable differences between samples to approximately 20–30:1. Consequently, measured changes are often smaller than their true values. It is important to keep in mind that this effect can be much more pronounced when spectral background contributes significantly to overall spectral intensity.
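The compression of measured ratios can be made concrete with a deliberately simplified model. The sketch below assumes, purely for illustration, that the impure fraction of the heavy-labeled peptide appears entirely in the light channel and that a constant background adds equally to both channels; real isotope patterns are more complicated, and the function name and parameters are hypothetical.

```python
def observed_ratio(true_ratio, purity=0.97, background=0.0):
    """Heavy/light ratio observed under a simplified impurity model.

    true_ratio : true heavy/light abundance ratio
    purity     : isotope purity of the heavy label (assumed 0..1)
    background : constant signal added to both channels, in units of the
                 light signal (assumed model, for illustration only)
    """
    light = 1.0
    heavy = true_ratio * light
    # The pure fraction of the heavy species stays in the heavy channel.
    heavy_obs = purity * heavy + background
    # The impure fraction is assumed to fall into the light channel.
    light_obs = light + (1.0 - purity) * heavy + background
    return heavy_obs / light_obs
```

Under this model the observed ratio approaches purity / (1 - purity) for very large true ratios, i.e., about 19:1 at 95% purity and about 32:1 at 97% purity, which is of the same order as the 20–30:1 limit quoted above. Any non-zero background lowers the observed ratio further.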
From spectra to relative protein quantification
Statistical analysis of experimental data
Statistical methods for proteomics
Protein change between conditions
Does a protein behave significantly differently between two samples? Method: Multiple hypothesis testing
Does a protein exhibit time-dependent change? Method: Analysis of variance (ANOVA)
Is the sample a member of a defined class of samples? Method: Classification methods (e.g., linear discriminant analysis, support vector machines)
Dependencies between proteins
Which proteins behave similarly in the experiment?
Raw data from quantitative MS experiments are generally not suitable for statistical analysis, so a number of preparative steps are required. First, raw data are typically not normally distributed, although many statistical tests assume normality. Therefore, data are frequently log-transformed on the assumption that they are lognormally distributed. This operation typically also harmonizes the variance of the data (otherwise high values would have large variances and vice versa). If replicates of the experiment have been generated, normalization of their data is mandatory because technical bias may overshadow the underlying biological effects (for details on normalization techniques, see Refs. [106, 107, 108]). As discussed above, technical effects include sample mixing errors, incomplete isotope incorporation, and isotope impurity. In many cases, systematic technical bias can be measured directly, but in some cases dedicated experimentation (e.g., a label swap experiment [109, 110]) is required to determine its source. The resulting information is used to build correction functions that are subsequently applied to the data. It should be noted that, very likely, not all sources of systematic error have been described yet and that some are not readily amenable to determination (e.g., background contribution in iTRAQ experiments). It can be expected, though, that with the rapid evolution of proteomic technologies, many of these as yet unknown sources of error will be uncovered and that the lessons learned will be used to correct the data and thereby increase data quality.
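As a minimal illustration of these preparative steps, the following Python sketch log-transforms a list of protein ratios and centers them on their median. Median centering is only one of several possible normalization schemes and is chosen here as an assumption; it presumes that most proteins do not change, so that a uniform mixing error shows up as a shift of the median log-ratio away from zero. The function name is hypothetical.

```python
import math
from statistics import median


def log2_median_normalize(ratios):
    """Log2-transform ratios and subtract the median log-ratio.

    Assumes that the majority of proteins are unchanged, so that the
    median log-ratio estimates a uniform bias such as a mixing error.
    """
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be strictly positive")
    logs = [math.log2(r) for r in ratios]
    center = median(logs)
    return [x - center for x in logs]
```

For example, ratios of 2, 4, and 8 become normalized log2 values of -1, 0, and 1.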
Another challenge to a statistical treatment of proteomic data is the mostly random sequencing of peptides by the mass spectrometer. As a result, not every available peptide is identified in every experiment. This effect is more pronounced for peptides of low abundance and poor detectability, resulting in many missing values in an experiment. However, statistical methods often require complete data. In such cases, missing values may be estimated by, e.g., averaging available values of the protein from other replicates or using related values from other proteins from the same experiment. It should be noted though that estimating values inevitably results in decreased statistical power [111, 112].
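The simplest of the estimation strategies mentioned above, replacing a missing value by the average of the available values of the same protein in other replicates, might be sketched as follows. The use of None to mark a missing value and the function name are assumptions for illustration.

```python
def impute_replicate_mean(values):
    """Replace missing values (None) by the mean of the observed replicates.

    values : quantification values of one protein across replicates
    Returns a new list; if nothing was observed, the input is returned
    unchanged because there is nothing to estimate from.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]
```

Because the imputed value carries no independent information, it should not be counted as an additional observation when the number of replicates enters a subsequent test.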
Values that are grossly different from comparable observations (outliers) require special attention. They can either indicate a true observation of a particular peptide species, e.g., a regulated post-translational modification, or a false reading. In both cases, these data points should initially be excluded from the calculation of protein quantities but not categorically rejected. A common way to spot outliers is visual inspection by the investigator, leaving considerable room for subjective judgement. During calculation of protein values from individual spectra by linear regression (see above) outlier detection on the spectrum level is possible using established methods [113, 114] but may result in loss of valuable data. For data correction at the protein level, methods for multivariate data can also be adapted [115, 116].
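As an alternative to visual inspection, outliers can be flagged by a simple robust rule. The sketch below uses the median absolute deviation (MAD) with a modified z-score; this is a generic technique chosen here for illustration and is not necessarily the method of the references cited above. The threshold of 3.5 is a conventional but assumed value. Values are flagged rather than deleted, in keeping with the recommendation not to reject them categorically.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)
    if mad == 0:
        # No spread in the bulk of the data; nothing can be flagged robustly.
        return [False] * len(values)
    return [0.6745 * d / mad > threshold for d in deviations]
```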
Detection of differential protein expression
Characteristics and applications of statistical tests
Tests for experiments with replicates
Replications, n > 3
All quantitation methods
Data normally distributed
Replications, n > 1
All quantitation methods
Tests for experiments without replicates
(Very) large number of peptide spectra
Fisher’s exact test
(Very) large number of peptide spectra
(Very) large number of peptide spectra
In the proteomics case, where many proteins are tested simultaneously, the probability of committing an error often increases dramatically. For example, when considering a list of hundreds of proteins at a defined error rate of, e.g., 0.01, it is likely that several false positives will occur by chance. However, setting the thresholds too conservatively in order to minimize the false positive rate (i.e., the rate at which truly null features are called significant) often leads to an unacceptable increase in the false negative rate (i.e., the rate at which truly significant features are called null). Commonly used alternative measures of error rates in multiple testing procedures are the family-wise error rate (FWER; i.e., the probability that at least one truly null feature is called significant among all tests) and the false discovery rate (FDR; i.e., the expected proportion of features called significant that are truly null), which break up the direct dependency between false positive and false negative rates. Instead of simply reporting rejection or acceptance of the specified hypothesis using these methods, a p-value connected to the test can be defined, which describes the significance of a test as the smallest possible significance level at which the null hypothesis would be rejected. Various procedures for deriving adjusted p-values for multiple hypothesis testing have been suggested, e.g., the Bonferroni adjusted p-value for FWER and the q-value for FDR. q-Values have since also been adopted in proteomics research [123, 124]. A detailed overview of multiple hypothesis testing has been given by Dudoit and co-workers.
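The two families of adjustment can be illustrated with a short Python sketch. The Bonferroni adjustment controls the FWER. For the FDR, the sketch uses the Benjamini-Hochberg step-up procedure, which is named here explicitly because it is related to, but not identical with, the q-value mentioned above (the q-value additionally estimates the proportion of truly null features). Function names are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni adjusted p-values (controls the family-wise error rate)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate).

    Each p-value is scaled by m / rank, and monotonicity is enforced by
    taking a running minimum from the largest p-value downwards.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A protein would then be called significant if its adjusted p-value falls below the chosen FWER or FDR level. The Bonferroni values are never smaller than the Benjamini-Hochberg values, which reflects the more conservative nature of FWER control.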
For a number of proteomic applications, sampling statistics (e.g., spectrum count, peptide count, sequence coverage) show increasing potential. Zhang and co-workers recently compared the aforementioned three approaches and found that the spectrum counting approach offered the greatest reproducibility. This is probably not surprising given that this approach generates many more data points than peptide counting or measuring sequence coverage. In addition, this paper explores a number of statistical methods for data analysis. For experiments that feature three or more replicates of each condition, statistical difference can be assessed by the t-test as described above. However, if repetitions are not available, other statistical options have to be considered. To that end, tests may be applicable that attempt to mimic replicates by pooling certain features. For example, for each detected protein, spectral counts from a pair-wise experiment can be arranged in a two-way table (proteins vs. conditions). A protein is then called differentially expressed if its proportion of spectrum counts relative to the total spectrum count in the experiment differs significantly between the two conditions. There are a number of possible statistical tests using different hypotheses for this approach (Table 3, bottom). The authors of the aforementioned paper conclude that Fisher’s exact test, the AC-test, and the G-test return comparable results. However, the G-test is computationally simpler and can be generalized to multi-condition experiments and thus may be the more versatile approach. Results typically improve with increased sampling (total number of spectrum counts in an experiment). Despite the fact that the commonly used dynamic exclusion option during LC-MS analysis violates random sampling, Zhang et al. showed that the approach can be generally useful.
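A G-test on such a two-way table can be sketched in a few lines of Python. The layout of the table (the protein of interest versus all other spectra, in two conditions), the function name, and the absence of any continuity correction are assumptions made for this illustration and are not taken from the cited paper. The p-value uses the chi-square distribution with one degree of freedom, for which the survival function equals erfc(sqrt(G/2)).

```python
import math


def g_test_spectral_counts(count_a, count_b, total_a, total_b):
    """G-test for one protein in a pair-wise spectral counting experiment.

    count_a, count_b : spectral counts of the protein in conditions A and B
    total_a, total_b : total spectral counts of the two experiments
    Returns (G statistic, p-value from chi-square with 1 degree of freedom).
    """
    observed = [
        [count_a, count_b],                      # spectra of this protein
        [total_a - count_a, total_b - count_b],  # all other spectra
    ]
    col_sums = [total_a, total_b]
    row_sums = [sum(row) for row in observed]
    grand = total_a + total_b
    g = 0.0
    for r in range(2):
        for c in range(2):
            o = observed[r][c]
            if o > 0:  # empty cells contribute zero to the statistic
                expected = row_sums[r] * col_sums[c] / grand
                g += 2.0 * o * math.log(o / expected)
    p_value = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p_value
```

When many proteins are tested in this way, the resulting p-values are subject to the multiple testing considerations discussed above.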
In contrast to statistical estimation, the performance of a chosen statistical test can often also be assessed experimentally by means other than multiple repetitions. One way of measuring errors directly and under the same analytical conditions is to compare the measurement of a particular sample with that of a dilution of the very same sample. Also, spiked proteins have been used to generate reference data for a set of proteins with known behavior that can be utilized for ‘calibrating’ an experiment type. Once the statistical parameters have been learned, they may be applied to subsequent experiments without the need for repetition. Although the statistical power of such approaches is lower than that of approaches based on multiple repetitions of the same experiment, it may be sufficient, particularly for samples of low protein complexity (e.g., affinity purifications). Further assessment of data significance may be provided by curve fitting methods (e.g., the LOWESS fit), which can reveal regions of random experimental error in the observed dataset.
A multitude of methods has emerged for the analysis of simple and complex (sub-)proteomes using quantitative mass spectrometry, and the field is beginning to learn for which type of study these methods can be meaningfully applied. However, significant further improvements to experimental strategies are required particularly for the quantitative analysis of post-translational modifications. It is probably fair to say that the field is still far from being able to generate quantitative proteomic data at a scale which would allow the comprehensive investigation of a biological phenomenon. At the same time, the recent exponential increase in data volume and complexity demands the development of appropriate statistical approaches in order to arrive at meaningful interpretations of the results. This can only be achieved if the influence of the employed technologies on the results obtained is well understood and by ensuring that experimental design follows the biological context so that the ‘right statistics’ can be developed for the problem at hand in order to generate scientific insight.
The authors wish to thank David Simmons and Ulrich Kruse for critically reading the manuscript and Frank Weisbrodt for help with preparing the figures. We are grateful to Nature Publishing Group for granting permission to reproduce and adapt previously published material.