Top-Down Proteomics and Direct Surface Sampling of Neonatal Dried Blood Spots: Diagnosis of Unknown Hemoglobin Variants
We have previously shown that liquid microjunction surface sampling of dried blood spots coupled with high resolution top-down mass spectrometry may be used for screening of the common hemoglobin variants HbS, HbC, and HbD. To test the robustness of the approach, we have applied it to unknown hemoglobin variants. Six neonatal dried blood spot samples that had been identified as variants, but which could not be diagnosed by current screening methods, were analyzed by direct surface sampling top-down mass spectrometry. Both collision-induced dissociation and electron transfer dissociation mass spectrometry were employed. Four of the samples were identified as β-chain variants: two were heterozygous Hb D-Iran, one was heterozygous Hb Headington, and one was heterozygous Hb J-Baltimore. The fifth sample was identified as the α-chain variant heterozygous Hb Phnom Penh. Analysis of the sixth sample suggested that it did not in fact contain a variant. Adoption of the approach in the clinic would require speed in both data collection and interpretation. To address that issue, we have compared manual data analysis with freely available data analysis software (ProSightPTM). The results demonstrate the power of top-down proteomics for hemoglobin variant analysis in newborn samples.
Key words: Top-down mass spectrometry, Hemoglobin variants, ETD, CID, Direct surface sampling, Dried blood spots
Hemoglobin (Hb) is a tetramer consisting of four polypeptide chains. Normal adult hemoglobin (HbA), dominant in the blood from 3 to 6 months after birth, comprises two α and two β globin chains (α2β2) of 141 and 146 amino acid residues, respectively, with a total molecular weight of ~68,000 Da. The fetus’ reliance on an oxygen supply from the mother accounts for the presence of the higher oxygen affinity fetal hemoglobin (HbF) α2γ2 in newborns. The γ chains are homologous to the β chain, differing by 39 out of 146 residues. Hemoglobinopathies, variants of Hb, are the most common single gene recessive disorders, with approximately 269 million carriers worldwide. Hemoglobinopathies are classified into two types: thalassemias, which are characterized by a reduction in the synthesis of the globin chains, and structural hemoglobin variants, which are caused by a point mutation in the globin gene typically leading to a single amino acid substitution in the globin chain. There are over 1000 Hb variants classified, and in the majority of cases the variants are not clinically significant. The most well-known clinically significant structural Hb variant is the sickle variant (HbS) (β6 Glu→Val Δm –29.97 Da). Whereas heterozygous inheritance of the sickle variant (HbAS) results in mild complications, the phenotypic consequence of inheriting the HbS variant in homozygous form is sickle cell disease (SCD), the symptoms of which are severe. Patients with SCD experience reduced lifespans. Rapid diagnosis during newborn blood screening programs, which allows for the implementation of suitable healthcare strategies, has been credited with the ability to virtually eradicate childhood mortality in SCD and SCD-related disorders [7, 8]. Newborn screening programs typically also screen for the structural variants HbC (β6 Glu→Lys Δm –0.9476 Da), HbD (β121 Glu→Gln Δm –0.9840 Da), HbE (β26 Glu→Lys Δm –0.9476 Da), and HbO-Arab (β121 Glu→Lys Δm –0.9476 Da).
In the heterozygous or homozygous state these variants are benign but become clinically significant when co-inherited with HbS, resulting in sickling disorders [9, 10]. In addition, screening may detect the presence of other variants that cannot be identified. Generally, these variants will be benign but if the variant is dominant, it will be clinically significant in the heterozygous state. Additionally, these variants may be significant if co-inherited with HbS.
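The mass shifts quoted above for HbS, HbC, and HbD follow directly from the difference in monoisotopic residue mass between the substituted and original amino acids. The short sketch below reproduces that arithmetic; the residue masses are standard monoisotopic values, and the function and dictionary names are illustrative only and are not part of the original workflow.

```python
# Standard monoisotopic residue masses (Da) for the amino acids involved.
RESIDUE_MASS = {
    "Glu": 129.04259,
    "Val": 99.06841,
    "Lys": 128.09496,
    "Gln": 128.05858,
}


def substitution_delta(old: str, new: str) -> float:
    """Return the monoisotopic mass shift (Da) when `old` is replaced by `new`."""
    return RESIDUE_MASS[new] - RESIDUE_MASS[old]


# Variant name, original residue, substituted residue (as listed in the text).
VARIANTS = [
    ("HbS", "Glu", "Val"),
    ("HbC", "Glu", "Lys"),
    ("HbD", "Glu", "Gln"),
]

for name, old, new in VARIANTS:
    delta = substitution_delta(old, new)
    print(f"{name}: {old}->{new} shift = {delta:+.4f} Da")
```

The Glu→Lys and Glu→Gln shifts are both just under 1 Da, which is why high resolution measurement is needed to separate such variants from the normal chain.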
In the UK, the newborn screening program screens ~700,000 neonates each year for five inherited disorders. Current screening methods for hemoglobinopathies involve collection of blood samples via heel prick onto standard NHS dried blood spot (DBS) cards from neonates between 5 and 8 days old. Samples are left to dry at room temperature before being sent to a newborn screening laboratory. The first step in the screening process involves punching out small discs (~3 mm in diameter) from the dried blood card and eluting the sample for around 30 min. The resolubilized samples are analyzed by cation exchange high-performance liquid chromatography (ceHPLC) and/or isoelectric focusing (IEF). Both methods involve lengthy sample preparation and are presumptive rather than absolute (i.e., diagnosis of Hb variants is not possible by these methods). Absolute determination of an Hb variant can only be achieved by second-line testing, via DNA analysis (sequencing) or mass spectrometry, carried out in a separate laboratory.
Mass spectrometry-based approaches for identifying hemoglobin variants have been developed both for whole blood and for samples resolubilized from dried blood spots [15–21]. Typically, mass analysis of tryptic peptides of the hemoglobin is performed. That is, a bottom-up approach is utilized. For example, Daniel et al. developed a multiple reaction monitoring method for the analysis of tryptic digests of whole blood and later applied it to blood eluted from dried blood spots [15, 18]. An alternative approach involves mass analysis of intact globin chains followed by MS/MS of tryptic peptides [20, 21].
In contrast, the top-down approach [22, 23], in which intact proteins are analyzed, negates the need for lengthy proteolysis. Top-down mass spectrometry also addresses the problems of ‘information loss’ inherent in bottom-up approaches. That is, there is no loss of regions of the protein due to poor ionization, no loss of connectivity between substitutions and/or modifications, and no confusion due to the presence of isomeric peptides. We contend, therefore, that for identifying unknown hemoglobin variants, top-down analysis of the globin chains is the most suitable approach.
We recently demonstrated that liquid microjunction (LMJ) surface sampling of newborn dried blood spots coupled with high resolution mass spectrometry can be applied to the screening of the Hb variants HbS, HbC, and HbD. The method involves no sample preparation and the results are unambiguous. Here, we optimize the approach to maximize protein sequence coverage following collision-induced dissociation (CID) and electron transfer dissociation (ETD) mass spectrometry with the aim of diagnosing unknown variants. The method was applied to six variant samples, which could not be diagnosed by the standard screening methods. Five of the six were diagnosed unambiguously. The sixth was either misassigned as a variant by the standard screening methods, or was an isomeric substitution (Ile→Leu or vice versa).
Our results demonstrate the robustness of the direct surface sampling top-down proteomics approach and confirm that the approach is applicable both for screening and diagnosis. All data were analyzed manually, with the attendant time costs, constituting a bottleneck in the workflow and highlighting the need for data analysis software specifically for the diagnosis of Hb variants from top-down MS/MS data. Therefore, the freely available web-based top-down protein characterization software ProSightPTM was investigated as an alternative to manual data analysis.
Normal adult (HbA) DBS specimens were collected via finger prick onto standard NHS blood spot (Guthrie) cards, Ahlstrom grade 226 filter paper (ID Biological Systems, Greenville, SC, USA). Anonymized dried blood spots from normal neonates (FA) and unknown variants (FAV), acquired from newborn patients between 5 and 8 days old, were supplied by Birmingham Children’s Hospital in accordance with the Code of Practice for the Retention and Storage of Residual Spots. All samples had previously undergone analysis by the standard NHS screening protocols using ceHPLC and IEF. For the FAV samples, screening suggested the presence of an Hb variant but could not determine its identity. Samples were stored at -20 °C, with 1 g desiccant packets (Whatman International Ltd., Kent, UK). The electrospray solution consisted of methanol (J.T. Baker, Deventer, The Netherlands) and water (J.T. Baker) (48.5:48.5), with 3 % formic acid (Sigma-Aldrich Company Ltd., Dorset, UK).
2.2 Surface Sampling
DBS were placed onto the Advion LESA (Liquid Extraction Surface Analysis) universal plate adapter (Advion, Ithaca, NY, USA) and an image of the DBS was acquired using an Epson Perfection V300 photo scanner. Using the LESA Points software (Advion), the precise location of the DBS to be sampled was selected using the scanned image. The universal plate adapter was placed into the Triversa Nanomate chip-based electrospray device (Advion, Ithaca, NY, USA), coupled to the ThermoFisher Scientific Orbitrap Velos ETD (ThermoFisher Scientific, Bremen, Germany).
Sample analysis was performed using the LESA feature of the Triversa Nanomate ChipSoft Manager software. This platform was used to set the parameters for the surface sampling routine based on robotic arm movements (X,Y,Z) of the Nanomate probe. The routine begins with the Nanomate probe collecting a conductive tip, which is used for sample recovery and delivery, from the tip rack. The tip is then moved to a solvent well containing the electrospray solution and collects 7 μL of the solution. The robotic arm relocates to a predetermined position above the surface of the DBS and the tip is lowered towards the surface of the DBS, leaving a gap of 1.6 mm between the sample surface and the tip. Six μL of the electrospray solution is dispensed onto the DBS, forming a liquid microjunction (LMJ) between the sample surface and the tip. The LMJ is maintained for 5 s, and then 5 μL is reaspirated and introduced via nanospray to the mass spectrometer.
2.3 Mass Spectrometry
The sample was introduced at a flow rate of ~80 nL/min, with a gas pressure of 0.5 psi, a tip voltage of 1.75 kV and a capillary temperature of 250 °C. MS data were collected in selected ion monitoring (SIM) mode (m/z 1055–1090) at a resolution of 100,000 at m/z 400 in the Orbitrap. Each scan comprises 30 co-added microscans. SIM-mode mass spectra shown comprise between 1 and 3 scans (acquired for ~2 min). The automatic gain control (AGC) target was 1 × 10^6 with a maximum fill time of 2 s. CID was performed in the ion trap and the fragment ions were detected in the Orbitrap with a resolution of 100,000 at m/z 400. The isolation width was 8–10 Th. The AGC target for CID was 1 × 10^6 with a maximum fill time of 2 s, and CID experiments were performed with helium gas at a normalized collision energy between 20 % and 40 %, as discussed in the text. Each CID scan comprises 30 co-added microscans. CID MS/MS spectra shown comprise 3 co-added scans (acquired for ~3 min). ETD was performed in the ion trap and the fragments were detected in the Orbitrap with a resolution of 100,000 at m/z 400. The isolation width was 10 Th. The AGC target for ETD was 1 × 10^6 with a maximum fill time of 2 s. The reagent ion (fluoranthene) AGC target was 1 × 10^5 with a maximum fill time of 1 s. ETD activation time was 20 ms with a supplemental activation energy of 20 %. Each ETD scan comprises 30 co-added microscans. ETD MS/MS spectra shown comprise ~3 co-added scans (acquired for ~3 min).
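As an illustration of why the SIM window of m/z 1055–1090 captures the globin chains, the following sketch converts a neutral monoisotopic mass to m/z for a range of charge states and reports those that fall inside the window. It is a minimal sketch only: the chain masses are the calculated values quoted in the results, the proton mass is the standard value, and the function names and the upper charge limit of 30 are assumptions made for the example.

```python
PROTON_MASS = 1.007276  # Da, standard value

SIM_WINDOW = (1055.0, 1090.0)  # m/z range used for SIM acquisition


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a neutral monoisotopic mass."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def charge_states_in_window(neutral_mass, window=SIM_WINDOW, max_charge=30):
    """Return (charge, m/z) pairs whose monoisotopic m/z lies inside `window`."""
    low, high = window
    hits = []
    for z in range(1, max_charge + 1):
        mz = mz_from_mass(neutral_mass, z)
        if low <= mz <= high:
            hits.append((z, mz))
    return hits


# Calculated monoisotopic masses (Da) quoted in the text.
CHAIN_MASS = {"alpha": 15117.8924, "beta": 15858.2570}

for chain, mass in CHAIN_MASS.items():
    for z, mz in charge_states_in_window(mass):
        print(f"{chain}: {z}+ at monoisotopic m/z {mz:.3f}")
```

Under these assumptions the β-chain appears in the window as the 15+ ion and the α-chain as the 14+ ion. The monoisotopic m/z lies slightly below the center of the observed isotope envelope, which is consistent with the peak positions of approximately m/z 1059 and m/z 1081 reported for these chains.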
Data were analyzed using Xcalibur 2.10 software (ThermoFisher Scientific). For the SIM mode mass spectra, the Xtract program was used to calculate monoisotopic masses (44 % fit factor, 25 % remainder, S/N threshold 2.1). CID and ETD MS/MS spectra were analyzed manually resulting in de novo identification of the site and nature of amino acid substitution/insertion. That information was manually searched against the HbVar database (http://globin.bx.psu.edu/hbvar/menu.html) enabling Hb variant identification. Experimentally measured fragment m/z values were compared with theoretical m/z values for the variants. Theoretical m/z values for comparison were calculated in Protein Prospector (http://prospector.ucsf.edu/prospector/mshome.htm). Manual fragment assignments for all the hemoglobin species studied here are given in Supplemental File 1. For data analysis using ProSightPTM 1.0, CID and ETD MS/MS spectra were deconvoluted using the Xtract program (44 % fit factor, 25 % remainder, S/N threshold 2.1). Xtracted fragment masses were searched against the (manually determined) globin/variant sequence, mass error = 10 ppm, in ProSightPTM 1.0.
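The comparison of deconvoluted fragment masses with theoretical values at a 10 ppm tolerance can be sketched as below. This is not the ProSightPTM algorithm; it is a simple nearest-mass lookup written for illustration, and the fragment labels and masses in the example are hypothetical values, not data from this study.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_fragments(observed, theoretical, tol_ppm=10.0):
    """Match theoretical fragment masses to the closest observed mass.

    `observed` is a list of deconvoluted monoisotopic masses (Da).
    `theoretical` maps a fragment label to its theoretical mass (Da).
    Returns {label: (observed_mass, ppm_error)} for matches within `tol_ppm`.
    """
    matches = {}
    for label, theo in theoretical.items():
        best = min(observed, key=lambda m: abs(m - theo), default=None)
        if best is None:
            continue
        err = ppm_error(best, theo)
        if abs(err) <= tol_ppm:
            matches[label] = (best, err)
    return matches


# Hypothetical values for illustration only.
theoretical = {"b10": 1000.0000, "b11": 1100.0000, "y5": 600.0000}
observed = [1000.0050, 1100.0200, 750.0000]

for label, (mass, err) in match_fragments(observed, theoretical).items():
    print(f"{label}: observed {mass:.4f} Da, error {err:+.1f} ppm")
```

In this toy example only b10 is matched: the b11 candidate lies about 18 ppm from its theoretical mass and no observed mass is close to y5.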
3.1 Protein Sequence Coverage
Previously we showed, in the analysis of a neonatal heterozygous HbS dried blood spot (DBS) sample, that collision-induced dissociation of [M + 15H]15+ ions resulted in a sequence coverage of 39 % for the β-chain and 32 % for the HbS variant. Clearly, top-down diagnosis of unknown variants requires the maximum sequence coverage possible. CID of [M + 15H]15+ ions of the β-chain from normal adult samples routinely gives a mean sequence coverage of 44 %; however, we found that reducing the normalized collision energy by 5 % (from 35 % to 30 %) dramatically improves sequence coverage to a highly reproducible 68 % (see Supplemental Figure 1). If electron transfer dissociation (ETD) is also performed, the combined sequence coverage (CID + ETD) is 81 %. Neonate DBS samples analyzed with the optimized CID parameters gave an average protein sequence coverage of 63 %. This slightly lower value may be due to the lower abundance of β-chain in neonate samples (HbA comprises 25 % of the total Hb content).
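For readers unfamiliar with how such percentages are obtained, the sketch below shows one common way of computing sequence coverage and of combining CID and ETD results. It assumes that coverage is defined as the fraction of inter-residue bonds for which at least one fragment ion is observed; that definition, the function names, and the fragment lists are assumptions made for illustration and are not taken from the analysis reported here.

```python
N_TERMINAL_IONS = {"b", "c"}  # fragments containing the N-terminus
C_TERMINAL_IONS = {"y", "z"}  # fragments containing the C-terminus


def cleavage_sites(fragments, length):
    """Map (ion_type, index) fragments to the set of backbone bonds cleaved.

    Bond k lies between residues k and k + 1 (1-based), so a chain of
    `length` residues has bonds 1 .. length - 1.
    """
    sites = set()
    for ion_type, index in fragments:
        if not 1 <= index < length:
            raise ValueError(f"fragment index {index} out of range")
        if ion_type in N_TERMINAL_IONS:
            sites.add(index)
        elif ion_type in C_TERMINAL_IONS:
            sites.add(length - index)
        else:
            raise ValueError(f"unknown ion type {ion_type!r}")
    return sites


def coverage_percent(sites, length):
    """Percentage of backbone bonds with at least one observed fragment."""
    return 100.0 * len(sites) / (length - 1)


BETA_LENGTH = 146  # residues in the beta chain

# Hypothetical fragment lists for illustration only.
cid = [("b", 32), ("b", 33), ("y", 113), ("y", 50)]
etd = [("c", 32), ("z", 40)]

cid_sites = cleavage_sites(cid, BETA_LENGTH)
etd_sites = cleavage_sites(etd, BETA_LENGTH)
combined = cid_sites | etd_sites  # a bond counts once, whichever method found it

print(f"CID: {coverage_percent(cid_sites, BETA_LENGTH):.1f} %")
print(f"CID + ETD: {coverage_percent(combined, BETA_LENGTH):.1f} %")
```

Because the combined figure is a set union, complementary fragments that report the same bond (here b33 and y113) do not increase coverage, whereas ETD fragments at bonds not cleaved by CID do.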
3.2 Diagnosis of Unknown Variants
Clearly, manual analysis of top-down mass spectrometry data for the diagnosis of hemoglobin variants is time-consuming and is a potential barrier to adoption in the clinical laboratory. Ideally, data collection would be followed by automated searching against a hemoglobin variant database. To test that approach, we searched the CID MS/MS data for FAV1 against the Hb D-Iran sequence using ProSightPTM 1.0. The aim was to mimic a search against a dedicated Hb variant database. The protein sequence coverage for the Hb D-Iran variant chain generated using ProSightPTM was 43 % (see Supplemental Figure 2a). The source of the discrepancy in sequence coverage was investigated further. For example, fragment ions b32^4+ and b33^4+ were assigned manually but were not identified by ProSightPTM (see Supplemental Figure 2b and c). For these fragments, the Xtract program had failed to distinguish the presence of two overlapping isotope distributions, one from the β-chain and one from the Hb D-Iran variant. Despite the difference in sequence coverage, the site of the substitution was identified and the variant could be unequivocally diagnosed.
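The difficulty of separating the two isotope distributions can be illustrated with a short calculation. The sketch assumes that the Hb D-Iran substitution is Glu→Gln, with the mass shift of –0.9840 Da quoted earlier for that substitution; this assignment is not stated in this section and should be checked against the HbVar entry. The 13C/12C mass difference is the standard value, and the function names are illustrative.

```python
C13_C12_SPACING = 1.003355  # Da, mass difference between 13C and 12C
DELTA_GLU_TO_GLN = -0.9840  # Da, Glu->Gln shift (assumed to apply to Hb D-Iran)


def isotope_spacing(charge: int) -> float:
    """Spacing in m/z between adjacent isotope peaks at a given charge."""
    return C13_C12_SPACING / charge


def offset_from_isotope_peak(delta_mass: float, charge: int) -> float:
    """Distance in m/z between a variant fragment peak and the nearest
    isotope-peak position of the corresponding normal fragment."""
    shift = abs(delta_mass) / charge
    spacing = isotope_spacing(charge)
    remainder = shift % spacing
    return min(remainder, spacing - remainder)


charge = 4  # charge state of the b32 and b33 fragments discussed in the text
print(f"isotope spacing at {charge}+: {isotope_spacing(charge):.4f} m/z")
print(f"variant shift at {charge}+: {abs(DELTA_GLU_TO_GLN) / charge:.4f} m/z")
print(f"offset from nearest isotope position: "
      f"{offset_from_isotope_peak(DELTA_GLU_TO_GLN, charge):.4f} m/z")
```

Under these assumptions, each variant isotope peak lies only about 0.005 m/z from an isotope-peak position of the normal fragment, so the two distributions interleave almost exactly. This is consistent with a deconvolution routine treating them as a single distribution.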
A second unknown variant, FAV2, was also identified as heterozygous Hb D-Iran. Again, because of the small mass difference, both the variant and the β-chain were selected for CID. The fragment peaks observed in the CID mass spectrum of FAV2 and their assignments are shown in Supplemental Table 2. The protein sequence coverage obtained was 57 %. Again, the data were searched against the Hb D-Iran sequence using ProSightPTM and gave a coverage of 56 %, and unequivocal diagnosis. A study reviewing screening results over a 10-year period in the North Thames health region showed that Hb D variants are the third most common variants, with a carrier incidence rate of 1 in 631. Hb D-Iran is the second most prevalent of the Hb D variants after Hb D Punjab (Los Angeles). Heterozygous Hb D-Iran is clinically benign.
The SIM-mode mass spectrum for the final unknown variant FAV6 is shown in Supplemental Figure 3. The mass spectrum does not reveal the presence of any additional peaks. The measured masses of the α, β, Gγ and Aγ chains were MWmeas 15117.8211 (MWcalc 15117.8924, Δ -4.7 ppm), MWmeas 15858.3217 (MWcalc 15858.2570, Δ 4.1 ppm), MWmeas 15986.3329 (MWcalc 15986.2626, Δ 4.4 ppm), and MWmeas 16000.2849 (MWcalc 16000.2782, Δ 0.4 ppm), respectively. We therefore performed CID of the peaks centered at m/z 1059 (β-chains), m/z 1068 (γ-chains) and m/z 1081 (α-chains). The CID protein sequence coverages were 46 % (β-chain), 45 % (α-chain), 43 % (Gγ-chain) and 32 % (Aγ-chain). Fragments observed are detailed in Supplemental Tables 8–11. The ETD protein sequence coverages were 17 % (β-chain), 44 % (α-chain), 28 % (Gγ-chain) and 15 % (Aγ-chain). Fragments observed are detailed in Supplemental Tables 12–15. No erroneous fragment peaks were observed for any of the globin chains, suggesting that either the sample had been incorrectly identified as a variant in the screening process or it contains an isomeric substitution (i.e., Leu→Ile or vice versa). (It is worth noting that no such substitution is currently described in the HbVar database.) It was not possible to reconfirm the screening results because the sample was anonymized prior to mass spectrometry analysis. The estimated false positive rate in screening is ~1 %, and documented sources of error include administration errors or unfocused/merged bands in IEF.
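The mass errors quoted above for the four globin chains of FAV6 can be reproduced from the measured and calculated masses. The sketch below does so using the values given in the text; the dictionary and function names are illustrative.

```python
# (measured, calculated) monoisotopic masses in Da, as quoted in the text.
CHAINS = {
    "alpha": (15117.8211, 15117.8924),
    "beta": (15858.3217, 15858.2570),
    "G-gamma": (15986.3329, 15986.2626),
    "A-gamma": (16000.2849, 16000.2782),
}


def ppm_error(measured: float, calculated: float) -> float:
    """Signed mass error of a measurement in parts per million."""
    return (measured - calculated) / calculated * 1e6


for chain, (measured, calculated) in CHAINS.items():
    print(f"{chain}: {ppm_error(measured, calculated):+.1f} ppm")
```

All four chains agree with their calculated masses to within 5 ppm, which supports the observation that no additional, mass-shifted species is present in the spectrum.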
[Table: Summary of Identified Hb Variants. Columns include HbVar ID (UniProt); entries include heterozygous Hb Phnom Penh.]
We postulate that the top-down proteomics approach could be employed within a newborn screening program, combining screening and diagnosis on a single platform. Efficient application of the approach would require improvements to the Xtract program and the use or development of software capable of searching top-down MS/MS data (CID, ETD) against a dedicated hemoglobin variant database. We have used ProSightPTM to search against specific variant sequences to illustrate such an approach. In all cases, the automated search identified the unique fragment ions, allowing variant diagnosis, albeit with reduced overall protein sequence coverage.
R.L.E. received an EPSRC-funded studentship. The Advion Triversa Nanomate and ThermoFisher Orbitrap mass spectrometer used in this research were funded through the Birmingham Science City Translational Medicine: Experimental Medicine Network of Excellence Project, with support from Advantage West Midlands (AWM). The authors thank Mark Baumert for technical discussions.