Principal component analysis for automated classification of 2D spectra and interferograms of protein therapeutics: influence of noise, reconstruction details, and data preparation

  • Article
  • Journal of Biomolecular NMR

Abstract

Protein therapeutics have numerous critical quality attributes (CQA) that must be evaluated to ensure safety and efficacy, including the requirement to adopt and retain the correct three-dimensional fold without forming unintended aggregates. Therefore, the ability to monitor protein higher order structure (HOS) can be valuable throughout the lifecycle of a protein therapeutic, from development to manufacture. 2D NMR has been introduced as a robust and precise tool to assess the HOS of a protein biotherapeutic. A common use case is to decide whether two groups of spectra are substantially different, as an indicator of difference in HOS. We demonstrate a quantitative use of principal component analysis (PCA) scores to perform this decision-making, and demonstrate the effect of acquisition and processing details on class separation using samples of NISTmAb monoclonal antibody Reference Material subjected to two different oxidative stress protocols. The work introduces an approach to computing similarity from PCA scores based upon the technique of histogram intersection, a method originally developed for retrieval of images from large databases. Results show that class separation can be robust with respect to random noise, reconstruction method, and analysis region selection. By contrast, details such as baseline distortion can have a pronounced effect, and so must be controlled carefully. Since the classification approach can be performed without the need to identify peaks, results suggest that it is possible to use even more efficient measurement strategies that do not produce spectra that can be analyzed visually, but nevertheless allow useful decision-making that is objective and automated.
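The idea of comparing two groups of spectra through PCA scores and histogram intersection can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the function names `pca_scores` and `histogram_intersection`, the use of only the first principal component, and the choice of 10 bins are all assumptions made for illustration. Each "spectrum" is a flattened vector of intensities, so no peak identification is involved.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """Return PCA scores for the rows of `data` (one flattened spectrum per row)."""
    centered = data - data.mean(axis=0)
    # SVD of the mean-centered matrix: the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def histogram_intersection(a, b, bins=10):
    """Overlap in [0, 1] of two 1D samples, using normalized histograms on shared bins."""
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)
    ha, _ = np.histogram(a, bins=edges)
    hb, _ = np.histogram(b, bins=edges)
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    # Histogram intersection: sum of the bin-wise minima.
    return float(np.minimum(ha, hb).sum())

# Synthetic stand-ins for two groups of spectra (assumed data, not from the paper).
rng = np.random.default_rng(0)
n_points = 256
base = rng.normal(size=n_points)
shift = np.zeros(n_points)
shift[:32] = 2.0  # group B differs from group A in a small region
group_a = base + 0.1 * rng.normal(size=(8, n_points))
group_b = base + shift + 0.1 * rng.normal(size=(8, n_points))

scores = pca_scores(np.vstack([group_a, group_b]))
pc1_a, pc1_b = scores[:8, 0], scores[8:, 0]

similarity = histogram_intersection(pc1_a, pc1_b)       # between-group overlap
self_similarity = histogram_intersection(pc1_a, pc1_a)  # overlap of a group with itself
print(similarity, self_similarity)
```

In this sketch, a value near 0 suggests well-separated classes and a value near 1 suggests indistinguishable ones. Because the input is just a vector of numbers, the same calculation could in principle be applied to interferograms as well as to spectra.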

Availability of data and materials

Example processing and analysis scripts, together with the spectral data used in the work, will be posted at the NISTmAb NMR home page: https://www.ibbr.umd.edu/groups/nistmab-nmr. The NISTmAb Reference Material is available for purchase from NIST: https://www-s.nist.gov/srmors/view_detail.cfm?srm=8671.

Software availability

The work makes use of the following software, all of which is also available on the NMRbox cloud computing platform. NMRbox: https://www.nmrbox.org. NMRPipe: https://www.ibbr.umd.edu/nmrpipe/install.html. SMILE: https://spin.niddk.nih.gov/bax/software/smile. hmsIST: http://gwagner.med.harvard.edu/intranet/hmsIST (download by request). NESTA: http://nestanmr.com (download by request).

Abbreviations

1D: One-dimensional
2D: Two-dimensional
CQA: Critical quality attribute
DSS: 3-(Trimethylsilyl)propane-1-sulfonate
gHSQC: Gradient-selected heteronuclear single quantum coherence spectrum
HOS: Higher order structure
mAb: Monoclonal antibody
NMR: Nuclear magnetic resonance spectroscopy
NUS: Non-uniformly sampled
PCA: Principal component analysis

NIST disclaimer

Certain commercial equipment, instruments, and materials are identified in this presentation in order to specify the experimental procedure. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the material or equipment identified is necessarily the best available for the purpose.

Funding

Work was internally funded.

Author information

Corresponding author

Correspondence to Frank Delaglio.

Ethics declarations

Conflict of interest

Authors have no conflicting or competing interests to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 2001 kb)

Supplementary file2 (PNG 666 kb)

About this article

Cite this article

Brinson, R.G., Elliott, K.W., Arbogast, L.W. et al. Principal component analysis for automated classification of 2D spectra and interferograms of protein therapeutics: influence of noise, reconstruction details, and data preparation. J Biomol NMR 74, 643–656 (2020). https://doi.org/10.1007/s10858-020-00332-y

  • DOI: https://doi.org/10.1007/s10858-020-00332-y
