Introduction

Protein-based biotherapeutics comprise a large and promising market within the United States. Biologic drugs, including protein-based biotherapeutics and monoclonal antibody drug products, made up 23% of global pharmaceutical sales in 2014 and are expected to rise to 27% by 2020 [1]. Furthermore, protein-based biotherapeutics comprise 25 of the 50 top-selling drugs around the world [1]. Unlike synthetically produced small molecule drugs, biotherapeutics are primarily produced through recombinant approaches in mammalian cells or other expression systems (e.g., E. coli). The complex nature of these bioprocessing techniques results in inherently heterogeneous products that require elaborate technical analysis using advanced analytical technologies to ensure product quality [2, 3]. Thorough characterization of biotherapeutics must be conducted because even moderate changes in various product attributes, such as glycosylation, oxidation, and deamidation, may alter efficacy and/or prove to be immunogenic [4, 5].

One of the major techniques used for the characterization of biotherapeutics is mass spectrometry (MS) [6]. MS can be used to analyze many product quality attributes, including molecular weight [7], amino acid sequence [8], post-translational and chemical modifications [9], as well as both product-related [10, 11] and process-related impurities. Recent technological advances in MS instrumentation have significantly increased the number of areas where MS is useful; for example, MS has been increasingly applied to the analysis of protein higher order structure (HOS) [12]. There are multiple general workflows that rely heavily on the use of MS, including intact mass analysis, peptide mapping, and released glycan analysis. For each of these workflows, multiple parameters throughout the analytical process can be varied, including the ion introduction technique, the scan types, and the instrument platform. Combining these workflows with varied instrument and methodological parameters can yield a highly detailed characterization of many quality attributes and thus a detailed picture of the biotherapeutic.

A recent FDA data mining study surveyed the methods used for glycosylation analysis in monoclonal antibody drug applications and found that the use of MS was highly prevalent in the biopharmaceutical industry [13]. That study inspired this more inclusive analysis of MS use in protein biologics license applications (BLAs). The goal of this study was to monitor the usage and progress of MS implementation in the drug substance characterization sections of protein-based biotherapeutic BLAs. The purpose of this study was to inform the public and industry of the trends in MS use in approved BLAs. Biotherapeutics were classified into four categories: antibodies, fusion proteins, antibody-drug conjugates (ADCs), and other proteins. BLAs for vaccines were outside the scope of this study. The BLAs for 79 biotherapeutics approved between 2000 and 2015 were investigated for the extent of MS utilization in their drug substance characterization sections.

Methods

Information related to the use of MS was extracted from the quality sections of 80 BLAs that were approved by the FDA between 2000 and 2015. Ten additional BLAs approved during this time period were excluded from the study because they were not submitted electronically. The information was entered into a database template. Extracted data included basic product information, including product type and approval date, as well as more detailed MS usage information. Thirty-two specific MS attributes were analyzed. For each of the attributes, the general workflow (intact mass, peptide mapping, glycan profiling, other), the instrument introduction method (ESI, LC-ESI, MALDI), the scan type (MS, MS/MS, etc.), and the instrument were noted.
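The database template described above can be pictured as one record per BLA holding a list of per-attribute entries. The following is a minimal sketch only; the class and field names (`AttributeEntry`, `BlaRecord`, and so on) are hypothetical and are not taken from the study, which does not describe the template's actual layout.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AttributeEntry:
    """One MS attribute recorded for a BLA (hypothetical field names)."""
    attribute: str     # e.g. "amino acid sequence"
    workflow: str      # "intact mass", "peptide mapping", "glycan profiling", "other"
    introduction: str  # "ESI", "LC-ESI", "MALDI"
    scan_type: str     # "MS", "MS/MS", ...
    instrument: str    # instrument name, or "not disclosed"


@dataclass
class BlaRecord:
    """Basic product information plus all MS attribute entries for one BLA."""
    product_type: str   # "antibody", "fusion protein", "ADC", "other protein"
    approval_year: int
    entries: List[AttributeEntry] = field(default_factory=list)

    def uses_ms(self) -> bool:
        # A BLA counts as using MS if at least one attribute entry was recorded.
        return len(self.entries) > 0

    def attributes(self) -> Set[str]:
        # Distinct attributes characterized by MS in this BLA.
        return {entry.attribute for entry in self.entries}
```

Keeping the workflow, introduction method, scan type, and instrument on each attribute entry, rather than on the BLA as a whole, allows the same record to be summarized later by attribute, by workflow, or by product type.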

After the database template was completed, the data were evaluated based on several classifying factors. Only one of the 80 BLAs analyzed did not use MS. This BLA was excluded from further analyses. The analysis of the remaining 79 applications identified that a variety of MS techniques and instrumentation were used across the BLAs employing MS. The data were further mined, examining the change in use of MS over time and any differences in the utilization of MS between different types of biotherapeutics. Specific MS usage details were evaluated within the areas of amino acid sequence determination, determination of molecular mass, disulfide bond localization, and characterization of glycosylation and other protein modifications.

Data were analyzed based on a variety of binning criteria (including drug class, attribute, instrumentation type, approval year, and instrument introduction method), and graphs were prepared using Microsoft Excel for visualization of usage trends. For time course analyses, BLAs were binned by year with four-year increments.
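The four-year binning used for the time course analyses can be illustrated with a short sketch. The study itself used Microsoft Excel; the Python below is an assumed equivalent, and the function names and toy input are hypothetical.

```python
from collections import defaultdict


def four_year_bin(year, start=2000, width=4):
    """Return the label of the four-year bin containing `year`."""
    low = start + ((year - start) // width) * width
    return f"{low}-{low + width - 1}"


def percent_by_bin(records, attribute):
    """Percentage of BLAs per bin that analyzed `attribute` by MS.

    `records` is an iterable of (approval_year, set_of_attributes) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, attrs in records:
        label = four_year_bin(year)
        totals[label] += 1
        if attribute in attrs:
            hits[label] += 1
    return {label: 100.0 * hits[label] / totals[label] for label in sorted(totals)}


# Hypothetical toy input, not data from the study.
toy = [(2000, {"oxidation"}), (2001, set()), (2013, {"oxidation"})]
print(percent_by_bin(toy, "oxidation"))
```

With a start year of 2000 and a width of four, the bins are 2000-2003, 2004-2007, 2008-2011, and 2012-2015. Expressing usage as a percentage within each bin keeps bins with different numbers of BLAs comparable.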

Results and Discussion

Between 2000 and 2015, the FDA approved 90 protein-based biotherapeutic BLAs. Over the time course studied, particularly in the last two years, the number of approved BLAs per year increased (Figure 1a). Of these applications, 80 were submitted electronically after the implementation of the electronic common technical document (eCTD) submission module. The characterization sections of these 80 electronically submitted BLAs, including both the drug substance structural elucidation and impurity sections, were examined for the use of MS. Among the 79 approved electronic BLAs that used MS, the most common type of biotherapeutic was antibodies, representing 50% of the total number of applications. Additionally, other (non-fusion) proteins represented 36%, fusion proteins represented 11%, and antibody-drug conjugates (ADCs) represented 3% of the dataset (Figure 1b). Within this study, the workflows, methods, and instrumentation used within those 79 BLAs were analyzed in detail, and the results were related back, when possible, to the particular drug type being characterized.

Figure 1

Data overview. (a) Protein-based biotherapeutic BLA approvals over time. (b) Distribution of analyzed biotherapeutic product types. Four major categories of biotherapeutic BLA were analyzed: monoclonal antibodies (mAbs), other proteins, fusion proteins, and antibody drug conjugates (ADCs)

The analysis of these BLAs focused on four major components: quality attributes, MS workflow, MS instrumentation, and MS methodology. The quality attributes were the specific attributes that were characterized through MS, the MS workflow was the general analytical method, the MS instrumentation included specific instruments, and the MS methodology included information about the introduction techniques and the scan types used. These details were monitored using a comprehensive spreadsheet. Data sorted in accord with the specific workflow used were further classified within the context of particular attributes and drug types.

Quality Attributes

Thirty-two specific quality attributes were monitored to determine if MS was used to identify or characterize those attributes within the BLAs (Table 1). The two attributes for which MS was most commonly used were amino acid sequence analysis and molecular mass analysis; 99% of the BLAs analyzed used MS in the analysis of one or both of these attributes, indicating the importance of MS in the basic characterization of these products. These two attributes are so often analyzed because they provide the most basic structural information of the protein: its size and its sequence. In addition to these top two attributes, the next six most analyzed attributes were for the characterization of various protein modifications. The characterization of these key attributes, especially when combined, can provide enough information to create a representative map of a protein, including its major heterogeneous species.

Table 1 Major MS Attributes for Analysis. Thirty-two Specific MS Attributes were Found to be Analyzed at Varying Levels Across BLAs

MS usage for the analysis of these top eight attributes was analyzed over time (Figure 2a). For this analysis and all of the time courses within this study, data were binned in four-year increments to normalize for variability in the number of BLAs per year. Amino acid sequence analysis was conducted via MS at a consistently high level (greater than 91%) over the BLA approval year time course. MS-based molecular mass analysis was widely used at the start of the time period, in 83% of the BLAs, and has continued to increase in use over time; 97% of the most recent bin of BLAs used MS for molecular mass determination. While approximately 50% of the applications approved early in our time range used MS for disulfide bond mapping, the utilization increased above 90% between 2004 and 2011 before decreasing in the 2012–2015 bin. This pattern may be indicative of the versatility of the method; the decrease may be a reflection of alternative methodologies that can be used to identify disulfide bond patterns or it may indicate a change in the type of products where such methods would not be necessary or suitable. Glycosylation analysis through MS was consistent over the time period studied, with a low of 62% and a high of 76%. N-terminal and C-terminal sequence variants, most commonly pyroglutamate formation and lysine clipping, respectively, have had similar patterns of MS analysis over the time period. Specifically, both modifications were analyzed through MS at a level of approximately 40% in the earliest time period and have been implemented to a greater degree to what appears to be a plateau at around 70%. Finally, usage of MS for deamidation and oxidation characterization was low at the early end of the time period studied (17%–25%) and has risen steadily to a much higher level in recent years (76%–80%). The increased application of MS for oxidation and deamidation analysis may be a reflection of the increased availability and capabilities of high resolution mass spectrometers that are capable of consistently performing these analyses and have become more available in the marketplace in recent years.

Figure 2

MS attribute analysis. (a) Top eight MS attributes over time. BLAs were binned by year with four-year increments. The percentages of BLAs that examined the top eight MS attributes are shown for each bin. Percentages are based on the total number of electronic BLAs that used MS. (b) The mean number of MS attributes analyzed per BLA per year. (c) The mean number of MS attributes per BLA is shown based on product type

In addition to increased usage of MS for the analysis of individual attributes, the number of attributes analyzed by MS within each BLA also increased (Figure 2b). In 2000, the average number of MS attributes per BLA was two. By 2015, this average had risen to 11 attributes per BLA; in one BLA approved in 2015, MS was used to analyze 18 different attributes. The number of MS attributes per application was also broken down by type of biopharmaceutical product (Figure 2c). Across all biopharmaceutical products, the average number of MS attributes per BLA was just over eight. For antibodies, this number was greater than 9.5, whereas for general proteins it was just over six. The differences in number of attributes per type of biopharmaceutical drug product may reflect the complexity of the drug product. Products that are generally more complex require additional levels of characterization, and many of these products were found to be characterized through the use of additional MS-based workflows.
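The per-product-type averages in Figure 2c amount to a grouped mean over the BLAs. A minimal sketch of that calculation follows; the function name and the toy input are hypothetical and do not reproduce the study's data.

```python
from collections import defaultdict


def mean_attributes_by_type(records):
    """Mean number of MS attributes per BLA, grouped by product type.

    `records` is an iterable of (product_type, number_of_ms_attributes) pairs,
    one pair per BLA.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for product_type, n_attributes in records:
        sums[product_type] += n_attributes
        counts[product_type] += 1
    return {ptype: sums[ptype] / counts[ptype] for ptype in sums}


# Hypothetical toy input, not data from the study.
toy = [("antibody", 10), ("antibody", 8), ("other protein", 6)]
print(mean_attributes_by_type(toy))
```

Groups containing very few BLAs, such as the ADCs in this dataset, will give averages that are sensitive to a single application.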

MS Workflow

The three major MS workflows monitored were peptide mapping, intact mass analysis, and glycan profiling. Here, peptide mapping analysis is a bottom-up workflow, where the protein is digested before MS analysis. Intact mass is a top-down workflow where the intact protein is analyzed through MS either with or without reduction. Glycan profiling includes the removal of glycans from the protein, generally through digestion with PNGase F, and subsequent MS analysis of the cleaved glycans. Additional workflows used include gas chromatography coupled with MS (GC-MS) and inductively coupled plasma MS (ICP-MS). Peptide mapping was used in all of the BLAs analyzed, intact mass analysis was used in 92% of the applications, glycan profiling was used in 44% of the BLAs, and additional workflows were present in 29% of the applications. Usage of MS workflows was analyzed over time, and again a binned method was applied (Figure 3a). Intact mass analysis increased during the studied time period, from 83% to 97%. Glycan profiling was more variable, with usage levels ranging from 17% to 58%; most recently, for the last 8 years, this level has remained at 47%. Other workflows, including GC-MS and ICP-MS, were present at variable levels. The total number of MS-based workflows within each BLA was also analyzed (Figure 3b); 95% of the applications used at least two different MS-based workflows, with an average of 2.6 workflows used per BLA. These findings indicate that peptide mapping, intact mass, and glycan profiling analyses were often used in conjunction for more complete product characterization. Notably, the same workflows were often used to characterize multiple attributes, while certain attributes were sometimes analyzed using multiple workflows. For example, peptide mapping was regularly used for amino acid sequence analysis in addition to the characterization of multiple modifications, whereas intact mass analysis was often used for the characterization of those same modifications within the same BLA.
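The per-BLA workflow counts summarized in Figure 3b (mean number of workflows and the share of BLAs using at least two) can be sketched as follows. This is an illustration under assumed inputs: the function name and the toy data are hypothetical.

```python
def workflow_summary(workflows_per_bla):
    """Summarize how many distinct MS workflows each BLA used.

    `workflows_per_bla` is a non-empty list with one collection of workflow
    names per BLA. Returns the mean number of distinct workflows per BLA and
    the percentage of BLAs that used at least two.
    """
    counts = [len(set(workflows)) for workflows in workflows_per_bla]
    n = len(counts)
    at_least_two = sum(1 for c in counts if c >= 2)
    return {
        "mean_workflows": sum(counts) / n,
        "pct_at_least_two": 100.0 * at_least_two / n,
    }


# Hypothetical toy input, not data from the study.
toy = [
    ["peptide mapping", "intact mass"],
    ["peptide mapping"],
    ["peptide mapping", "intact mass", "glycan profiling"],
    ["peptide mapping", "intact mass"],
]
print(workflow_summary(toy))
```

Counting distinct workflow names per BLA ensures that a workflow applied to several attributes within the same application is counted only once.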

Figure 3

MS workflows. (a) MS workflows over time. Three major MS workflows were found within the analyzed BLAs: intact mass analysis, peptide mapping, and glycan profiling. Percentages are based on the total number of electronic BLAs that used MS. (b) Number of MS workflows per BLA. The total number of MS workflows used per BLA is shown, indicating that in 95% of the BLAs at least two workflows were used (e.g., intact mass was often used alongside peptide mapping)

MS Instrumentation

The specific MS instrumentation used within the BLAs was analyzed, both as a whole and over time (Figure 4a). MALDI-TOF usage has been decreasing over time, from over 80% down to 30%, with an overall usage of 58%. QTOF usage has been increasing over time from 0% to nearly 60%, with an overall usage of 46%. TOF and Orbitrap usage have also been increasing, while ion trap, triple quadrupole, and single quadrupole usage have been decreasing. These findings likely represent an increase in the usage (and availability) of high resolution instrumentation, such as QTOFs and Orbitraps, within the biopharmaceutical industry. However, one drawback to using BLAs as a data source for studies such as this one is that applicants are not always required to disclose all details pertaining to the characterization methods used. In particular, applicants do not always indicate the specific instrumentation used for each experiment. Thus, although MALDI-TOF was the most commonly found instrument, it was not necessarily the most used, as 47% of the BLAs contained at least some instrumentation with an unidentified mass analyzer.

Figure 4

MS instrumentation usage over time. Percentages are based on the total number of electronic BLAs that used MS

MS Methodology

MS methodology was broken down into two categories: instrument introduction methods and scan type. The major introduction methods used in the analyzed BLAs included LC coupled with ESI (95%), ESI alone (35%), and MALDI (63%). Other introduction methods including GC and ICP were used at lower levels (15%). Intact mass analyses were conducted more than 50% of the time using LC-ESI, whereas MALDI and ESI alone accounted for approximately 35% of the BLAs. Peptide mapping analyses used LC-ESI in over 90% of the BLAs with MALDI in 29% and negligible usage of ESI alone. LC-ESI and MALDI were used at similar levels (22%) for glycan profiling (Figure 5a). Instrument introduction methods were also analyzed over the course of the studied time period (Figure 5b). LC-ESI usage has increased slightly over time, from 92% to 97%. MALDI usage has declined over the last several years, from 83% to 38%. Usage of ESI alone has remained fairly constant, between 33% and 41% over the time period analyzed.

Figure 5

MS methodology. (a) and (b) Introduction methods. (c) and (d) Scan types. Both methodology analyses were conducted by MS workflow [(a) and (c)] and over time [(b) and (d)]. Percentages are based on the total number of electronic BLAs that used MS

The most widely used scan types in the electronic BLAs that used MS were full MS scans (100%) and MS/MS with data-dependent acquisition (DDA, 71%). Additional methods included MS3/MSn (6%), selected ion monitoring (SIM, 6%), and MS/MS with data-independent acquisition (DIA, 1%). Scan type usage was also broken down by workflow (Figure 5c). Full MS was used for all intact mass experiments, whereas only one BLA used top-down MS/MS. Both full MS and MS/MS (DDA) were used regularly for peptide mapping experiments. Glycan profiling was largely conducted through MS, with some amount of MS/MS (DDA) and MS3/MSn being used. Full MS analysis was used consistently across all BLAs, while MS/MS (DDA) usage generally increased over time (Figure 5d). Usage of other techniques remained relatively low. Few BLAs included information on what type of fragmentation was used, such that it was not feasible to study fragmentation approaches in detail. When fragmentation was mentioned, collision-induced dissociation (CID) was generally used, with some usage of higher-energy collisional dissociation (HCD) in more recent years. Additionally, some usage of modern fragmentation approaches, such as electron-transfer dissociation (ETD), was observed. In particular, ETD was used to characterize deamidation, an approach that has been well characterized in the literature [14, 15]. Within the literature, ETD has been utilized for disulfide bond analysis as well [16, 17]; however, whether ETD was specifically used in this manner in the analyzed BLAs was unclear.

Methodology Analysis by Quality Attribute

In addition to the workflow analysis, the specific MS methodology used for several quality attributes was also analyzed. Specifically, the top four attributes, amino acid sequence analysis, molecular mass analysis, disulfide bond analysis, and glycosylation analysis, were monitored. As previously mentioned, these top attributes are critical to ensemble characterization of a biopharmaceutical drug. Furthermore, several additional attributes of interest, such as HOS and HCPs, which have been increasingly analyzed via MS in the literature, were monitored.

Amino acid sequence analysis through MS was conducted in 97% of the BLAs that used MS. This attribute was analyzed entirely through peptide mapping; 82% of the BLAs that analyzed amino acid sequence through MS used a single introduction technique, most commonly LC-ESI; 52% of these BLAs used more than one scan type, generally including MS and MS/MS (DDA). In the literature, top-down MS/MS has been increasingly used for protein sequence analysis [18, 19]; however, this method had not yet made its way into approved BLAs by 2015; 74% of these BLAs used a single instrument; most often the instrument identity was not disclosed by the applicant (34%). The most commonly identified instruments used for amino acid sequence analysis were QTOFs (25%). Additional instrument usage information can be found in Table 2. For all product types, MS was used for amino acid sequence analysis in at least 95% of the BLAs (Figure 6a). The high incidence of use of MS for amino acid sequence analysis over time and across all biopharmaceutical product types indicates the ubiquitous importance of this particular characterization technique within BLAs.

Table 2 Percent of Instrumentation Usage by Attribute. The Total Percentage of Applications that Analyzed Each of the Top Four Attributes Using Each Type of Instrument is Shown. Percentages are Based on the Total Number of Electronic BLAs that Used MS for Each Attribute
Figure 6

Top four quality attribute analyses by product type. The top four MS attributes, including amino acid sequence (a), molecular mass (b), disulfide bond (c), and glycosylation (d), are shown by product type. Percentages are based on the total number of electronic BLAs that used MS for each product type

Analysis of the protein molecular mass through MS was performed in 92% of the analyzed BLAs. As was previously mentioned, analysis of this attribute by MS has increased over time, which may reflect the need for high resolution instruments to make these measurements; such instruments became more widely available in the later years of our study range. This attribute was analyzed exclusively through intact mass analysis. No MS/MS was observed for the characterization of this attribute; 74% of these BLAs used a single introduction technique, most commonly LC-ESI; 78% of these BLAs used a single instrument; most often the instrument used was a QTOF (38%, Table 2). By product type, usage levels of MS-based molecular mass analysis for antibody and ADC characterization were slightly higher than the average usage level across product types, whereas those for fusion protein and other protein characterization were below the average (Figure 6b). The analysis patterns of this attribute show how increasingly important MS has become in molecular mass determination. The increase in use of MS for molecular mass determination may be attributed to the ability of MS to provide a more accurate and precise mass value compared with more traditional gel electrophoretic workflows.

Analysis of disulfide bonds through MS, including identification and localization, was performed in 77% of the analyzed BLAs. This attribute was analyzed through peptide mapping in all of these BLAs, with 3% of these BLAs performing intact mass analysis as well. Both workflows were generally performed with and without reduction in order to determine the presence and location of bridges; 92% of these BLAs used a single introduction technique, most commonly LC-ESI; 57% of these analyses used a single scan type, most commonly full MS; 90% of these BLAs used a single instrument; most often the instrument identity was not disclosed (33%). When disclosed, the most commonly used instruments were QTOFs (26%, Table 2). By product type, MS-based disulfide bond analysis usage levels for antibody and fusion protein characterization were higher than the average usage level across product types, whereas those for ADC and other protein characterization were below the average (Figure 6c). This finding is likely due to the disulfide bonds inherent to antibodies that may or may not be present in other proteins; however, as only two ADCs were surveyed, ADC levels may be skewed.

Glycosylation analysis through MS was performed in 71% of the analyzed BLAs. These analyses were predominantly focused on N-linked glycans. This attribute was analyzed through peptide mapping (80%), intact mass analysis (75%), and released glycan profiling (63%); 75% of these BLAs used at least two distinct workflows, including peptide mapping (80%), intact mass (75%) analyses, and cleaved glycan analysis (44%); 53% of these BLAs used at least two introduction techniques; most commonly both LC-ESI (86%) and MALDI (50%) were used; 55% of these analyses used a single scan type, which was most commonly a full MS scan (98%); 59% of these BLAs used two or more instruments; most often, this included MALDI-TOF (50%) and QTOF (46%) platforms (Table 2). By product type, the MS-based glycosylation analysis usage level for antibody characterization was higher than the average usage level across product types, whereas all other analyzed product types were below the average (Figure 6d). This finding is likely due to the inherent glycosylation on antibodies as well as the relative ease of glycan characterization for antibodies. As most therapeutic antibodies have a single glycosylation site per heavy chain, the complications associated with glycosylation analysis are more limited than with more heavily glycosylated proteins.

Additional attributes of particular interest include HOS, HCPs, and PEGylation. Three BLAs analyzed HOS via MS, all of which were approved in 2014. These products included one mAb, one protein, and one fusion protein. These studies were conducted using hydrogen-deuterium exchange MS (HDX) with both intact mass and peptide mapping workflows. HDX has been used increasingly in both academia and industry for protein HOS characterization, including epitope mapping and comparability studies [20–22]. However, there appears to be a substantial lag time between establishment of this technique, as demonstrated in the literature and at academic conferences, and its implementation within BLAs.

Three BLAs used MS to analyze HCPs. These applications were approved in 2003, 2005, and 2014. All three of these BLAs combined an in-gel digest with LC-MS or LC-MS/MS of the major bands. Within the field, the use of MS for the characterization of HCPs has been increasing steadily in recent years. Particularly, advanced separation and fragmentation methods, such as two-dimensional LC and data-independent acquisition, have been implemented in order to determine HCP abundance in the presence of API, which requires analysis spanning multiple orders of magnitude [23]. At this point in time, such technological advances have not been translated into approved protein BLAs.

Three BLAs analyzed PEGylation via MS. These applications were approved in 2008, 2010, and 2014. PEGylation studies consisted generally of peptide mapping and/or intact mass analysis to pinpoint the PEGylation site and/or MW. These studies did not use MS/MS for more detailed characterization. In the literature, PEGylation has been analyzed using intact mass analysis for heterogeneity characterization [24], top-down MS/MS and MS3 for improved conjugation site characterization [25], and HDX for PEG-related changes in conformation [26]. Such differences between the findings of this study and the current state of the literature indicate that there is room for growth in biotherapeutic characterization.

Overall, these attribute analyses provide insight into the importance of MS within the BLAs. However, compared with the literature, these findings generally indicate that there is room for more improved, modern MS methodologies to be used within the characterization section of BLAs.

The Future of MS in Biotherapeutic BLAs

We anticipate that as the MS field continues to progress, so will the usage of MS within biotherapeutic BLAs. As instruments are developed with higher resolution and mass accuracy, we expect that they will subsequently be implemented for these analyses and more. Additionally, as new techniques, particularly those for structural characterization, emerge and become more established, we anticipate that they will be implemented in BLAs as well. This trend can be seen with the recent implementation of HDX-based analyses in BLAs. HDX of proteins has become increasingly popular in recent years as MS instrumentation has improved and automated instrumentation and software platforms well-suited for the regulatory environment have become commercially available. These analyses have progressed enough to have become commonplace at this point in time, such that over 100 papers were published on the topic in 2010 and over 150 in 2015 (based on a PubMed search). As a result of this acceptance of a recently emerged technique, there has been an appearance of HDX studies within biotherapeutic BLAs. Similarly advanced emerging techniques will likely follow a similar pattern. Based on recent trends in the field of MS, we believe that we will see further top-down and middle-down MS/MS experiments as well as perhaps ion mobility MS in these types of BLAs.

As these methods become more established and accepted, we expect that they will be applied more broadly throughout the application. Specifically, we expect to see additional MS methodology within the quality control and comparability sections. This expectation is based not on changing regulatory requirements but on a review of the current literature, where biopharmaceutical companies have made great strides in purposing MS for analysis of multiple quality attributes during development. Furthermore, as biotherapeutic products expand to include new, more complex classes, such as biosimilars and other emerging bioengineered protein groups (e.g., bispecific antibodies and antibody-drug conjugates), we envision that advanced MS methodology will be used for their analysis.

Conclusions

Within this set of approved BLAs, MS was found to be fundamental in the characterization of protein-based biotherapeutics. As the number of biotherapeutic BLAs has increased, particularly over the last few years, the usage of MS within these applications has increased proportionally, with MS being used in some way in all but one of the analyzed BLAs. Not only is MS being used consistently within these applications, the level of its usage has been growing, such that there has been a steady increase in the number of attributes analyzed per BLA over the last 16 years. Furthermore, the MS-based characterization assays conducted within these BLAs have been increasing in complexity and sophistication as the technology in the field has improved to include high resolution and high mass accuracy instrumentation. This trend leads us to believe that the complexity of MS assays within these applications will continue to progress as the MS field advances.