Protein quantification by mass spectrometry (MS) is considered to make an essential contribution to strategies toward precision diagnostics [1]. Basically, uncertainties of 5%, or less, can be achieved with proteins if isotope labeled internal standards are employed (ID-MS) [2]. However, a lack of information about the impurity fraction in the calibrator material increases the overall uncertainty and may contribute to the lack of reproducibility and/or comparability of measurement results. Methods and approaches have just recently been reviewed for impurity determination in organic compounds intended for use as primary calibrators in quantitative analysis [3].

Rather than looking at small organic molecules, the present work is motivated by the additional need for well-characterized reference materials (RMs) for the targeted quantification of proteins. Depending on the measurement strategy involved, either proteotypic peptides or full length proteins are used for calibration [4,5,6]. For peptides, different approaches to impurity measurement have been studied, as was reviewed in Josephs et al. [7]. Direct quantification by amino acid analysis (AAA), quantitative nuclear resonance spectroscopy (qNMR), or elemental analysis was found to work best in many situations. To obtain accurate results, rigorous detection, quantification, and correction for interfering compounds are required [8,9,10]. A complementary approach consists of the one-by-one detection, identification, and quantification of individual contaminants as separate analytes to obtain the mass fraction of the impurity. Such mass-balance approaches, in spite of being labor-intensive, are a viable and common option for short peptides. Typically, solid-phase synthesis (SPSS) is used for their production. The main routes and causes of deviation from the intended amino acid sequence are well known for SPSS [11, 12]. In such a setting, therefore, the number of contaminants to be taken into account may be small and manageable.

In contrast to this, the practicality of both mentioned approaches is complicated when used for determining the purity of protein materials, if possible at all. Indeed, effective methods are available for removing host cell-related proteins (HCPs) from the target after expression. Still, there are circumstances that may cause significant amounts of HCPs to remain in the product. For example, in E. coli, this has been pointed out to typically happen if the expression yield is low [13]. Overexpression of the target might also induce expression of a number of bacterial proteins due to pleiotropism and/or stress conditions. There are recurring basic patterns of such proteins, as identified in Bolanos-Garcia and Davies [13]. These are confined to a much smaller subset compared to the original proteome. Although this reduces their number, the presence and individual abundances of HCPs may vary between preparations. In many instances, residing HCPs will still be many in number, thus limiting the practicability of the one-by-one approach.

Here, we systematically evaluate the use of label-free quantification (LFQ) of proteins by precursor-ion (MS1) intensities to reliably obtain the mass fraction of HCPs in a given sample. Central to the functioning of LFQ is that an amount of any peptide produces a specific amount of MS1 intensity per mass of protein regardless of what the individual peptides (proteins) are. Unlike in most applications of LFQ, [5, 6, 14, 15] in our context, peptide intensities are not collected separately per protein, but are rather integrated over all proteins identified from the host cell proteome. We demonstrate that this quantity can be calibrated against known amounts of the host cell proteome. Beyond this, it will be shown that any other proteome, or even a protein mixture of just a few common proteins, could likewise be used for calibration. The example presented here is hemoglobin-A2 (HbA2), a pure-substance material obtained by overexpression in E. coli. The material was produced as a primary calibrator for the quantification of the mass fraction of HbA2 in whole blood by isotope dilution mass spectrometry (ID-MS) [16, 17]. Besides the natural form, a U-15N-labeled version was also engineered, thus providing an internal standard. In either case, purification by immobilized metal affinity chromatography (IMAC) had left an estimated 5–10% mass fraction of E. coli proteins. In addition to the (natural) HbA2 material and the (labeled) U-15N-material, a third material was included in the study. This was obtained through further purification of the natural HbA2 material (referred to as ultra-purified HbA2). This was then used to demonstrate the applicability of the approach to low-level impurity materials as well.

Experimental section

Study materials

Recombinant HbA2 (α2δ2) in natural and U-15N-labeled form was kindly provided by the European Commission, Joint Research Centre, Geel, Belgium. The materials were produced by Trenzyme GmbH as previously described [17] using P69905 and P02042 (UniProtKB) as templates for the co-expression in E. coli of the α and δ subunits, respectively. Both materials were obtained as solutions of about 0.42 mg/g (natural form) and 0.47 mg/g (labeled form) in 50 mmol/L 2-Amino-2-(hydroxymethyl)propane-1,3-diol (Tris), pH 7.5, and 100 mmol/L NaCl. The sample volumes were 28 mL (HbA2) and 26 mL (U-15N-HbA2). The recombinant HbA2 (α2δ2) of natural isotopic composition was ultra-purified by semi-preparative strong anion exchange chromatography using a MONO-Q 4.6/100 PE column, yielding 8 mL of ultrapurified HbA2 at a concentration of 0.43 mg/g. This was then used as the third study material. All study materials were stored in a fridge at 4–8 °C until used.

Calibrators

HbA2 used for preparation of the calibrators

The material was obtained from SIGMA-Aldrich, cat. No.: H0266; lot: SLBK8749V, as a neat substance. The “protein-purity” was 99.0% using the LFQ method described herein. For value-assignment, a stock solution was prepared from this solid material by dissolving ∼2 mg in 1 g of Tris (10 mmol/L, pH 7.8). The mass fraction of HbA2 in the solid material was 482.7 mg/g as determined by AAA. A stock solution of the HbA2 material in Tris (10 mmol/L, pH 7.8) was prepared. An aliquot of this was used as a constant component present in each of the calibrators (red in Fig. 1A).

Fig. 1
figure 1

LFQ-based measurement of HCPs (E. coli) mass fraction in recombinant HbA2. A Calibrators were obtained by spiking aliquots of HbA2 stock solution (red) with increasing amounts of E. coli lysate (green). Amounts per mass (mg) of both components (HbA2 and E. coli) and mass fractions of E. coli were hence known for each calibrator through amino acid analysis. B Quantitative information was acquired by shotgun proteomics. C MS1 intensities were integrated over all features associated with peptides identified from either E. coli or HbA2. D The sum of all MS1 peak intensities from peptides associated with E. coli per sum of all peak intensities associated with HbA2 peptides for the calibrators were plotted vs. mass fractions. The fitting linear function was then used to calculate the E. coli fraction in the investigated materials from sample measurements (red line and arrows)

E. coli proteome sample

Lyophilized E. coli protein material was obtained from BIO-RAD (ReadyPrep, Catalog 163 2110, L9703999, Control 310,004,134). For a stock solution, the material was reconstituted in water (30% acetonitrile, 0.1% formic acid); the mass fraction of E.coli proteins as determined by AAA in that stock solution was 0.345 ± 0.012 mg/g. A series of calibrator samples was prepared by mixing an aliquot of HbA2 stock solution with the appropriate amount of E. coli stock solution (red and green, respectively, in Fig. 1A). The mass fractions of HbA2 in these sample solutions were 0.390, 0.383, 0.372, 0.361, and 0.350 mg/g (solution). The corresponding mass fractions of E. coli protein, relative to HbA2 in the calibrator samples, was 10.3, 25.0, 52.7, 80.1, and 111.6 mg (E. coli)/g(HbA2).

Yeast proteome sample

Yeast-based calibrators were prepared in the same way as described for E. coli. A whole-cell protein extract of Saccharomyces cerevisiae (Promega, V7341, lot 434,786) was deployed. The material came as a solution in 50 mmol/L Tris and 6.5 mol/L urea. Solutions containing mass fractions of HbA2 (0.390, 0.387, 0.383, 0.378, and 0.372 mg/g) were prepared. Mass fractions of yeast proteins in these solutions were 12.7, 26.5, 52.4, 76.6, and 113.6 mg/g (relative to HbA2).

Human K562 proteome sample

A whole-cell protein extract from human K562 cells (Promega, V6941, lot 444,583) was dissolved in 50 mmol/L Tris and 6.5 mol/L urea, as above. Solutions containing mass fractions of HbA2 (0.390, 0.387, 0.383, 0.378, and 0.372 mg/g) were prepared, comprising also K562-proteins at 15.0, 32.7, 63.9, 94.3, and 136.2 mg/g as mass fractions relative to HbA2.

Protein mix

Human C-reactive protein (CRM GBW09228, National Institute of Metrology, China), human insulin analog (insulin aspart, NovoLog) and human β2-microglobulin (kindly provided by the European Commission, Joint Research Centre, Geel, Belgium) were obtained as solutions. Bovine serum albumin (Sigma-Aldrich, cat. No. 05470, lot No. 1099572), myoglobin from horse skeletal muscle (Sigma-Aldrich, cat. No. 70025, lot No. 381848/1), cytochrome-c from bovine heart (Sigma-Aldrich, cat. No. C3131, lot SLBZ0555), somatotropin (NIBSC, WHO International Standard 98/574), human ceruloplasmin (Athens Research & Technology, cat. No. 16–16-030518), and serotransferrin (Sigma-Aldrich, cat. No. T3309, lot BCBR1763V) were obtained as solids and had to be dissolved to known concentrations in water prior to use. The mass fractions of somatotropin, ceruloplasmin, serotransferrin, β2-microglobulin, and insulin were determined by mass spectrometry based AAA, while certified values were used, as provided by the supplier, for C-reactive protein, albumin, cytochrome-c, and myoglobin. Aliquots of these solutions were mixed to yield a stock solution containing somatotropin, ceruloplasmin, serotransferrin, β2-microglobulin, insulin, C-reactive protein, albumin, cytochrome-c, and myoglobin in the mass-ratio of 0.1055:0.1176:0.124:0.1251:0.1247:0.0317:0.1215:0.1260:0.1233. Aliquots of this mixed solution were spiked with aliquots of the HbA2 stock solution, resulting in HbA2 mass fractions of 0.393, 0.388, 0.383, 0.378, 0.372, and 0.367 mg/g, and protein mass fractions, relative to HbA2, of 22.7, 47.4, 69.9, 92.8, 117.1, and 140.8 mg/g. To additionally cover the low HCPs fraction range as needed for the ultra-purified HbA2 material, a second series of calibrator samples was prepared. These calibrators were of 0.427 mg/g HbA2 mass fraction and 0.2, 0.3, 0.4, 0.7, 1.1, and 1.4 mg/g protein mass fractions relative to HbA2.

Determination of protein mass fractions in the calibrators

For the stock solutions used to prepare the calibrators, the mass fractions of amino acids were determined by mass spectrometry based AAA, as detailed in Arsene et al. [17]. These mass fractions were then combined with the known mass fractions (or relative amounts) of these amino acids in the protein or proteome to yield the protein mass fraction in that stock solution. In the cases of E. coli, yeast and K562, relative amounts (by mass) of amino acids were used, as was previously published [18,19,20]. The contribution to the overall estimated measurement uncertainty from AAA (in our laboratory) and uncertainties published with literature data were combined to yield expanded (95%) uncertainties of 3.5%, 2.7%, 3.0%, and 3.2%, respectively, for the E. coli, yeast, K562, and protein-mix calibrator stock solutions.

Proteolysis

To a 30 µL aliquot of sample (recombinant HbA2) or calibrator, 70 µL of Tris solution (35 mg Tris base, 46 mg Tris HCl, dissolved in 1 mL water) were added. Proteolysis (37 °C) was started by the addition of 10 µL of trypsin solution (1 mg/mL in 50 mM acetic acid). Trypsin from porcine pancreas was obtained from Sigma-Aldrich, St Louis, USA; cat. No.: T0303. After 10, 70, 130, 190, and 250 min, further 10 µL aliquots of trypsin solution were added. In parallel, 40 µL aliquots of acetonitrile were added after 10, 30, 60, 90, 120, and 150 min, respectively. The sample or calibrator was further incubated at 37 °C overnight. For reduction, 0.8 mg of dithiothreitol (DTT) was added. After incubation (37 °C) for 1 h, 3 mg of 2-iodoacetamide were added for alkylation (30 min at room temperature). The excess of 2-iodoacetamide was quenched with 3 mg of DTT. The reaction was stopped by the addition of 10 µL of formic acid (10 vol.-%). The sample or calibrator was desalted using solid-phase extraction (SPE) C18 ec cartridges (Chromabond, 100 mg, Macherey Nagel, Düren, Germany). After lyophilization, residues were redissolved in 40 µL of water (0.1% formic acid) and subjected to nLC-MS/MS analysis.

Liquid chromatography-mass spectrometry

An UltiMate 3000 RSLCnano HPLC system (Thermo Fisher Scientific) coupled to a timsTOF Pro mass spectrometer (Bruker Daltonics) was used for the analysis of the proteolysed samples and calibrators. Peptides were trapped on a pre-column (Acclaim PepMap C18, 5 µm, 0.3 × 5 mm) and then separated on a Bruker Fifteen nanoFlow column (15 cm × 75 µm, C18, 1.9 µm, 120 Å) using a linear water-acetonitrile gradient from 1 to 60% B in 210 min and then from 60 to 80% B in 20 min (with solvent A: water, 0.1 vol.-% formic acid and B: acetonitrile, 0.1 vol.-% formic acid) at 40 °C. The flow rate was 300 nl/min. The timsTOF Pro mass spectrometer was equipped with a CaptiveSpray ion source. The mass spectrometer was run using the DDA-PASEF-standard-1.1 s-cycle time method, as provided by Bruker. Briefly, the settings were 10 PASEF MS/MS scans per acquisition cycle with a trapped ion mobility accumulation and elution time of 100 ms. Spectra were acquired in a m/z range of 100 to 1700 and in an (inverse) ion mobility range (1/K0) of 0.60 to 1.60 Vs/cm2. The collision energy was set up as a linear function of ion mobility starting from 20 eV for 1/K0 of 0.6 to 59 eV for 1/K0 of 1.6.

Protein database search

PEAKS Studio Xpro (Bioinformatics Solutions Inc.) was used for feature detection/database searching and precursor ion (MS1) quantification. Databases for E. coli, Saccharomyces cerevisiae, human proteome, and the mixture of nine proteins were obtained as FASTA files (uniprot.org, accessed: 27. Aug. 2021). FASTA files of human hemoglobin subunit alpha and delta (Uniprot: P69905 and P02042) were added to the databases of non-human proteomes. The following settings were applied for data analysis: carbamidomethylation of cysteine as fixed modification, methionine oxidation, and glutamine or asparagine deamidation as variable modifications. A maximum of two modifications per peptide were allowed. With the human proteome, glycosylation was set as an additional variable modification using the built-in glycosylation list. Trypsin/P was set as the enzyme, and no more than two missed cleavages per peptide were allowed. The mass tolerance for the monoisotopic mass of precursor ions and fragment ions was 15 ppm and 0.05 Da, respectively. For the retention time and ion-mobility of an identified peptide, the shift tolerance between different runs was 3 min and 5%, respectively. Mass correction was enabled for precursor ions. The false discovery rate (FDR) was 1% at the peptide and protein level. The minimum length of identified peptides was seven amino acids. Results of quantification were obtained as peak areas at the protein-level.

Mass fraction of impurity in the labeled HbA2 material

The mass fraction of impurity in the labeled HbA2 material was 78.1 ± 8.6 mg/g. The individual results from n = 6 repetitions of label-free quantification were 70.8, 72.4, 84.2, 92.8, 73.2, and 75.0 mg/g.

Fractions of co-purifying proteins

The fraction of E. coli proteins known to frequently be co-purified [13] was calculated as the ratio of the (MS1) intensity of these proteins to the intensity of all E. coli proteins identified in the HbA2 material or in the E. coli proteome sample. For Fig. 3, fractions were calculated for the two HbA2 materials and compared to the fraction for an E. coli proteome sample containing a similar amount of E. coli proteins (80.1 mg/g).

Downstream data analysis

Further analysis of the data exported from PEAKS was based on Python 3.8 with the modules pandas, numpy, numpy.linalg, and Matplotlib imported as needed. For datafit and cross-validation, Scikit-learn [21] 1.0.2 was used.

Data availability

The mass spectrometry data and the tables of identified and quantified proteins have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD041736.

Results and discussion

Calibration and value assignment to the HbA2 materials

The mass fractions of E. coli in the HbA2 (raw) products (natural and labeled HbA2 materials), as well as in the ultra-purified HbA2 material, were quantified based on a standard shotgun proteomics approach, as illustrated in Fig. 1. MS1 intensity ratios (E.coli ÷ HbA2) were calibrated against known mass fractions of E. coli proteins relative to HbA2 (Fig. 1A and D). In this way, HbA2 technically also provides a pseudo internal standard (PIS), [5] improving the reproducibility of the measurements. For calibration, a linear model was established for the dependence of the instrumental response (MS1 intensity ratio) on the mass fraction of proteins. In turn, as is common in quantitative chemistry, this model function was resolved for the fraction as a dependent variable, thus allowing the fraction to be predicted from an intensity ratio as input (red arrows in Fig. 1D). The result of such a calibration with n = 3 repeats at each level, and subsequent value assignment to both HbA2 materials (natural and labeled) is shown in Fig. 2A.

Fig. 2
figure 2

Calibrating MS1 intensity ratios (proteome ÷ HbA2) vs. mass fractions of HCPs impurity in HbA2. A Calibration using E. coli lysate. B Joint calibration using a set of different proteomes in addition to E. coli: E. coli lysate (green), yeast (yellow), human K562 cell line (blue), and a mix of nine neat proteins (red). Dashed lines: linear regression fit. Annotations: results for the HbA2 material and the U-15N-labeled HbA2 material using these calibrations

Representation of the studied materials by the E. coli lysate

The impurity protein profiles in the studied materials differ significantly from those in the whole cell lysate. This is illustrated by different aspects in Fig. 3. First, as expected, the set of identified E. coli proteins in the materials is significantly reduced relative to the whole cell lysate (Fig. 3A). Many of these are known to typically be co-purified, if using IMAC for cleanup. Particularly, YfbG (P77398), YodA (P76344), GlmS (P17169), and ArgE (P23908) correspond to proteins previously reported in this context [13]. They make up a fraction of 60–70% (orange in Fig. 3B) in the studied materials, but represent only about 4% in the whole cell lysate. The difference in protein profiles is further substantiated by a principal component plot of results, as shown in Fig. 3C. Two series of samples of systematically changed E. coli mass fractions are shown. The first one is simply the same data as was acquired with the E. coli lysate calibration (green in Figs. 2A, 3A, and C). The second one was generated by dilution of the labeled HbA2 material (153 proteins, blue in Fig. 3A and C). Unlike with most applications of principal component analysis (PCA), no data scaling was applied for the results in Fig. 3C. Object scores (samples at different levels of mass fraction) and feature loadings (E. coli proteins quantified) are jointly shown in the space of the first two components (PC1 and PC2). In this kind of presentation, the proximity of a protein (small black crosses) to a series (blue or green circles) corresponds with the involvement of that protein in the MS1 signal ratio for that series. At the same time, the distance from the origin quantitatively reflects the degree of this involvement. Visibly, the majority of proteins are significantly involved in just one of both materials. This particularly holds for the top abundant ones, such as YfbG. As an exception, on the other hand, YodA is one of the few markedly involved in both materials, though not to exactly the same extent.

Fig. 3
figure 3

Comparison of the E. coli protein profile in the lysate with the HbA2 materials. A Identifications. B Relative amounts we found in these samples of frequently being co-purified [13] E. coli proteins (orange). C Interrelation of E. coli protein fractions in E. coli lysate (green circles) and in labeled HbA2 material (blue) with MS1 intensities of individual proteins (x), illustrated by a principal components plot. The series of protein fractions for the lysate is based on the same data source as the calibration data shown in Fig. 2A and B (green subset), whereas the HbA2-series was obtained by dilution of the labeled HbA2. Areas of the circles are in proportion with the fractions in the pertaining samples. The plot is a projection of the data on the plane of the first two principal components, PC 1 and PC 2, of the joint dataset (lysate plus labeled HbA2). Variance coverage: 61.6% (PC 1) and 36.1% (PC 2)

The demonstrated difference in protein profiles obviates prediction of impurity in an unknown material by linear regression models using as inputs the individual proteins from another material (as e.g., E. coli). As previously mentioned, LFQ works around the problem by integrating signals over all proteins for E. coli on one hand and in relation to HbA2 on the other, in order to map the mass fraction. This notion, indeed, has been the consensus in the literature for some time, [22,23,24] but still was put to the test for the present purpose. Moreover, additional proteomes, beyond E. coli lysate, were used for calibration, but were otherwise subjected to the same workflow as before. These proteomes were yeast, K562, and a mix of nine neat proteins. The eventual reduction to the simple protein mix was on purpose to provide an artificial proteome with a minimum number of components. Results of the respective calibration runs are plotted in Fig. 2B; HCPs mass fractions obtained for the two HbA2 materials by application of these calibrations are annotated in Figs. 4A and B. Apparently, there is good agreement on the whole between the individual plots. This supports the assumption that the individual linear calibration models (per proteome) are samples from a common statistical population. This in turn suggests that calibration based on the E. coli lysate (Fig. 2A) should essentially be valid for predicting HCPs fractions in the studied materials, too. Finally, pooling all individual calibrations into a common one is possible, as shown in the top trace (black) in Fig. 4, which may enhance the statistical robustness of value assignments.

Fig. 4
figure 4

HCPs fractions determined by LFQ for both HbA2 materials, and degree of equivalence between different calibrator sources: E. coli lysate (green), yeast (yellow), human K562 cell line (blue), protein mix (red), and all of these series merged into one (black). A Natural isotopic HbA2 material and B isotope labeled version; C distribution of prediction error obtained by leave-one-out cross-validation of the calibration data; orange lines and numbers provide the medians. Circles: fliers. Error bars in (A) and (B) correspond to the medians in (C)

Overall uncertainty

The reasonably identifiable major sources of uncertainty are associated with (i) mass fractions assigned to the calibrators (Fig. 1A) and (ii) repeatability of sample preparation, calibration, and measurement (Fig. 1B and D). Treating the underlying random variates as independent of one another, and referring to Ucal, Umeas as expanded relative uncertainties, [25] the resulting overall uncertainty on the impurity is U 2imp = U 2 cal + U 2meas.

The calibrator uncertainties (i), Ucal, are likely to be dominated by value assignment of mass fractions to the calibrator stock solutions (HbA2, E. coli lysate, yeast, K562, and protein mix). ID-MS-based amino acid analysis (AAA) was employed for this, with 3.5% uncertainty or less at 95% confidence (see “Experimental section”, “Determination of protein mass fractions in the calibrators”).

The uncertainty contribution by sample preparation, sample measurement plus establishment of the calibration was estimated from the results of calibration measurements according to a common approximation; see, e.g., [26], chapter 5. For the present purpose, the set of n = 18 calibration results for the protein mix was used. The reference to the protein-mix calibration is motivated by the fact that the prediction error was highest with the protein mix compared to the other calibrator sources, consequently providing us with a conservative estimate. As the outcome does not only depend on the calibration results, but also on the number of sample measurements that were averaged to calculate the result, two different standard uncertainties were obtained: umeas = 16.7% for the natural HbA2 -material (with just one measurement), while umeas = 8.3% for the labeled one, as based on six measurements. With k = 2.12 (using the student’s t value for 16 degrees of freedom), the expanded uncertainties, Umeas, are 35.4% for the natural and 17.6% for the labeled material, resp. Combining uncertainty components Ucal and Umeas, eventually yields Uimp = 35.6% and 18.0%, resp. Based on this, the impurity mass fractions are 84 ± 30 mg (impurity)/g(HbA2) in the natural and 78 ± 14 mg (impurity)/g(HbA2) in the labeled material. In terms of purity, this makes 923 mg(HbA2)/g(total protein) with a confidence interval of 898–949 mg/g and 928 mg(HbA2)/g(total protein) with an interval of 916–940 mg/g. This corresponds to about ± 2.8% and ± 1.3%, resp., in terms of relative uncertainty on the purity values.

Result for the ultra-purified HbA2 material

To demonstrate the scalability of the method, a separate calibration was performed in the range of 0.1–1.5 mg/g fraction of protein mix. Based on duplicate measurements of calibrators at 0.15, 0.3, 0.43, 0.71, 1.08, and 1.42 mg/g, a linear fit was obtained at a coefficient of determination of r2 = 0.964, comparable to the previous broader-range calibrations (as, e.g., 0.932 with E. coli, cf. Figure 2A).

By a single sample measurement based on the established calibration, in the same way as above, umeas = 11.6% and Umeas = 25.8% (k = 2.23) were obtained and combined with Ucal (3.5%) to the overall uncertainty Uimp = 26.1%. This results in 0.86 ± 0.22 mg(impurity)/g(HbA2), corresponding to 999.1 mg(HbA2)/g(total protein) for the HbA2 fraction, or purity, resp., with a confidence interval of 0.9989–0.9994 mg/g. This is equivalent to 0.02% uncertainty.

Sample size and associated error

The previous results suggest that calibration can in practice be performed with a small number of well-characterized proteins just as well as with complex biological materials. However, LFQ depends on the assumption that the peptides captured by their MS1 intensities are models from the same population, for sample and calibrator, as regards molar sensitivities. Consequently, on significantly reducing the number of peptide species involved, an associated sampling error will become apparent, thus increasing the overall measurement uncertainty. In Fig. 5, results of a simulation are shown that seeks to estimate the size of that potential error. Calibration based on the protein mix was used and the deviation of obtained HCPs mass fraction for the (unlabeled) HbA2 material calculated, assuming a reduction in the number of proteins used for calibration down to n = 8 − 1 proteins, randomly selected from the nine. To generate a distribution of possible outcomes, 100 random drafts of this number of proteins were acquired at each level, and the respective results for the HCPs mass fraction were calculated. The data shown in Fig. 5 cannot exactly map reality, of course, since, even if using all of the nine proteins, the sampling error will be less than with just one, but cannot completely disappear at n = 9. As such, Fig. 5 does not exactly reflect the ground truth, but it should be close. Accepting this, the example suggests that a number of five proteins may suffice on average to keep the sampling error at 15% or less.

Fig. 5
figure 5

Estimating the sampling error caused by the finiteness of the number of proteins/peptides used for calibration or present in the sample. The data shown are results for the HbA2 material after stepwise reduction of the number of proteins included in calibration with the protein mix. Solid line: median obtained at n = 9 (87.7 mg/g), dashed: ± 15%. Scatterpoints: median results after recalculating the calibration function using random drafts of n = 1–8 out of the originally nine proteins. Dark grey area: corresponding standard deviations (shown here relative to the solid line, rather than to the medians)

Top-N protein quantification strategy as an alternative

In the introduction, we claimed an exhaustive one-by-one quantification of individual HCPs to be non-practical if the aim is to find the total amount (or fraction) of HCPs in cell-expressed isolated proteins. Revision of the data shown in Fig. 3C indeed suggests the option of individually quantifying the top 5 E. coli proteins (YfbG and YodA for the most abundant ones) and taking the sum as an estimate.

However, integrating the MS1 signals over these five proteins results in only 69% of what was obtained if integrating over all proteins (as proposed in this study) for the natural material and 88% for the labeled one. In terms of impurity fraction, this would be a systematic error of 31% and 12%, resp., caused by non-captured proteins, resulting in overestimation of material purity by about 2.5% for the natural HbA2, and about 0.9% for the labeled one. Depending on the intended use, this added systematic uncertainty may still be acceptable. However, a further argument in favor of LFQ is that the one-by-one approach is likely to be more expensive, compared to a series of simple shotgun-experiments, as required in LFQ.

Conclusions

LFQ is applicable to the quantification of host cell-derived impurity in bioengineered proteins. Calibrating the integrated MS1-intensity for all HCPs against the same quantity obtained for samples of known mass-fractions is a straightforward solution to the problem of quantitatively capturing a composite set of individual proteins ultimately to be expressed as a gross-measurand. The viability of this process is not hampered by the fact that the profile and identities of HCPs do not normally coincide with those of the calibrator material. This opens up the option of using proteomes for calibration other than those suggested by the expression system. This commutability of materials means that simple mixtures of well-characterized proteins are also viable candidates.

For the natural HbA2 material, we estimate 84 ± 30 mg(impurity)/g(HbA2), for the isotope-labeled HbA2 material 78 ± 14 mg/g and for the ultra-purified (natural) HbA2 material 0.86 ± 0.22 mg/g. This translates to 923 ± 2.8%, 928 ± 1.3%, and 999.1 ± 0.02% mg(HbA2)/g(total protein), respectively, fractions of HbA2 in the materials. The latter provide the correction factors to be applied to a quantitative result, if using these materials as a reference. For the first two (IMAC-purified) materials, an uncertainty result of ≈1.3–3.0% contributed to the overall budget for the analytical result, while ≈0.02% was obtained for the ultra-purified material. Considering the expense in terms of material loss, and assuming a target of 5% uncertainty as acceptable for the protein as a measurand in a biological sample, immediate use of the IMAC-purified material would have been optimal when compared to the efforts required for further purification.

Although discussed here in the context of value-assignment to materials to be used as primary calibrators with protein quantification, LFQ is increasingly also being used in areas, such as process optimization and quality control of pharmaceutical products [15, 27,28,29,30,31]. Typically in most of these applications, it is about quantification of individual proteins, rather than aiming at a mass fraction as a whole for HCPs. However, capturing a mass fraction of HCPs as a gross quantity, as discussed in this paper, or selectively for protein-subclasses of particular interest, could gain importance in these industries for reasons of particular toxicity of such classes, or other legal requirements.