Interactions of DNA with proteins are essential for life processes such as DNA replication and repair. The complexity of DNA-protein supramolecular assemblies, as revealed by X-ray crystallography [1], makes it challenging to study the components of DNA-protein interactions at high resolution and to investigate the interaction dynamics on a time scale relevant for biochemical processes. Mass spectrometry has been used to aid identification of DNA-protein cross-links formed in solution by specific chemical reactions [2,3,4,5,6,7], as well as by ionizing radiation [8] and UV light [9,10,11,12,13,14,15,16], and the various approaches to deal with the complexity of the resulting mixtures have been reviewed [17,18,19,20]. The complexity problem can be simplified by bringing the DNA and protein components into the gas phase as non-covalent ion complexes and studying their structure and dynamics in the absence of solvent, additives, reagents, and other intervening effects. Previous studies demonstrated the usefulness of such an approach, showing that oligonucleotide-peptide ion complexes can be generated in the gas phase [21,22,23], isolated by mass-to-charge ratios, and studied by tandem mass spectrometry [24,25,26]. However, the previous gas-phase studies were limited to using collision-induced dissociation (CID) to probe the interactions in the complexes, and structural details of nucleotide-peptide binding were not obtained.

We have shown recently that related peptide-peptide interactions in non-covalent ion complexes can be probed by combining site-specific photodissociative cross-linking in the gas phase with Born-Oppenheimer computational analysis of the molecular dynamics, revealing the interactions in the complexes and indicating the formation of covalent cross-links. By combining experiment and theory, we have been able to assign interactions in gas-phase complexes of the small peptide Cys-Ala-Gln-Lys (CAQK) with selected peptide motifs [27].

CAQK has recently been discovered to mitigate adverse effects of brain trauma in mice [28], possibly because of specific interactions with yet-to-be identified proteins. Because of the biological importance of CAQK, we now set out to investigate its interactions with several dinucleotides as surrogates of nuclear DNA. This small lysine-containing peptide can be viewed as a simplified surrogate of a histone interacting with DNA. Interaction with DNA and RNA of histone lysine side chains is the foundation of nucleic acids coiling in chromatin [29]. Modifications of histone lysines by alkylation and acylation are important tools for chromatin regulation [30]. However, the pattern of these modifications, the so-called histone code [31], has not been fully elucidated in spite of numerous studies [32]. In our approach, we use modified CAQK peptides which are specifically tagged with a 4,4-azipentyl group at the N-terminus (*CAQK) [33], cysteine thiol group (C*AQK) [27], and lysine side chain (CAQK*) [33], introducing a diazirine ring that can be selectively photodissociated at 355 nm (Fig. 1). Diazirine photodissociation generates a transient carbene [34] that can undergo non-selective insertion into the proximate C–H, N–H, or O–H bonds of the partner molecule, forming a covalent bond and converting the complex into a molecular ion. The photodissociation products, both covalent and non-covalent, are further analyzed by CID tandem mass spectrometry (CID-MS3) to probe the formation and location of the covalent bonds connecting the peptide and the DNA. Since CID-MS3 often results in limited sequence coverage [35], the experimental data do not allow us to achieve atomic-level resolution in determining the cross-links and identifying all interactions in the complexes. 
To improve resolution, we complement experimental cross-linking data with Born-Oppenheimer molecular dynamics (BOMD) calculations that involve all valence-electron interactions along with nuclear motion at the experimental temperature. BOMD provides time-resolved trajectories that reveal contacts between the peptide and DNA atoms that may result in cross-linking in photochemically produced carbene intermediates. We now apply this approach to the study of CAQK complexes with DNA dinucleotides dAA, dAT, dGG, dGC, and dCG. We wish to show that in interactions with CAQK ions, even simple dinucleotides differ in their binding efficiency and stereochemistry, and we provide structures of selected complexes obtained by electronic structure theory calculations using density functional theory (DFT).

Fig. 1 Diazirine-labeled Cys-Ala-Gln-Lys peptides

Experimental Section

Materials and Methods

Peptides C*AQK, *CAQK, and CAQK* were synthesized according to previously reported procedures [27]. Mass spectra were measured on LTQ-XL-ETD (Thermo, San Jose, USA) and amaZon Speed (Bruker, Bremen, Germany) ion trap mass spectrometers equipped with an EKSPLA Nd:YAG laser [36]. Ions were produced by electrospray ionization of ca. 50 μM solutions in methanol-water-acetic acid, selected by m/z, and stored in the ion trap.

Calculations

Born-Oppenheimer molecular dynamics (BOMD) trajectories were run with semiempirical all-valence-electron quantum chemistry calculations using the Berendsen thermostat algorithm [37]. Temperature was set at 310 K to represent the experimental conditions in the ion trap. For each complex type, several (> 5) initial structures were constructed from PM6-optimized peptide subunits in which the diazirine tag and dGG were in different spatial orientations. These complexes were subjected to a preliminary BOMD using PM6 [38] that was augmented by including dispersion interactions (D3H4 [39]). These calculations were run with MOPAC [40] coupled to the Cuby4 framework [41, 42]. Running trajectories with 1 fs steps for 20 ps furnished 20,000 snapshots for each initial structure, from which 200 snapshots were extracted at 100 fs intervals. The extracted snapshot structures were fully gradient-optimized with PM6-D3H4 and sorted by their secondary and supersecondary structural similarities to merge duplicates and reduce the size of the selection. This yielded 60–70 distinct structures whose geometries were fully optimized by DFT gradient calculations. B3LYP/6-31G(d,p) [43] optimized structures were used for frequency calculations to provide vibrational enthalpies and entropies whereas ωB97X-D/6-31+G(d,p) [44] optimized structures were used to evaluate electronic energies including dispersion interactions. The electronic, vibrational, and rotational terms were combined to yield relative free energies for the gas-phase complexes. In addition, solvent effects were included by single-point energy self-consistent reaction field calculations using the polarizable continuum model [45] with water as the dielectric and the gas-phase optimized structures. All the thermochemical calculations were performed using the Gaussian 16 (Revision A.03) [46] suite of programs.
The final optimized structures with relative free energies within 40 kJ mol−1 of the global energy minimum were then subjected to a full BOMD with PM6-D3H4 for 100 ps at 310 K. Cuby4 was also used to extract close contacts between the carbon atom of the diazirine ring and the dinucleotide atoms. Close contacts were limited to X–H…C distance of less than 4.5 Å on the basis of the van der Waals radii of the diazirine and X–H atoms [36].
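The close-contact criterion described above can be illustrated with a minimal Python sketch. It is not the Cuby4 procedure used in this work; the function name, the atom labels, and the coordinates are hypothetical, and distances are assumed here to be measured from the diazirine carbon to the heavy atom X of each X–H group.

```python
import numpy as np

CUTOFF = 4.5  # Å, the close-contact threshold quoted in the text


def close_contacts(carbene_xyz, target_xyz, target_labels, cutoff=CUTOFF):
    """Return (label, distance) pairs for target atoms closer than `cutoff`
    to the diazirine (incipient carbene) carbon in a single snapshot."""
    carbene = np.asarray(carbene_xyz, dtype=float)
    targets = np.asarray(target_xyz, dtype=float)
    # Euclidean distance from the carbene carbon to every target atom
    dists = np.linalg.norm(targets - carbene, axis=1)
    return [(lab, float(d)) for lab, d in zip(target_labels, dists) if d < cutoff]


# Hypothetical coordinates (Å) for one snapshot; not taken from the calculations
carbene = [0.0, 0.0, 0.0]
targets = [[3.0, 0.0, 0.0], [0.0, 4.0, 3.0], [1.0, 2.0, 2.0]]
labels = ["N55", "N62", "C52"]

hits = close_contacts(carbene, targets, labels)
print(hits)
```

In this made-up snapshot, the second atom lies 5.0 Å from the carbene carbon and is therefore not counted as a contact.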

Results and Discussion

Complex Formation and Photodissociation

Electrospray ionization of equimolar mixtures of the labeled CAQK peptides (abbreviated as M*) and dinucleotides (dXX, abbreviated as m) produced the pertinent singly charged (M* + dXX + H)+ complexes in ca. 1% yield relative to the dominant singly charged (M* + H)+ and (m + H)+ ions. This is illustrated by the spectra of (C*AQK + dXX) complexes in Fig. S1a-e (Supporting Information). The complexes were isolated by mass (m/z 1109 for dAA, 1100 for dAT, 1141 for dGG, and 1101 for the isomeric dGC and dCG), stored in the ion trap, and subjected to several laser pulses at 355 nm. This resulted in loss of N2 and partial dissociation as shown by the UVPD-MS2 mass spectra (Fig. 2a–e). With dAA and dAT complexes, UVPD resulted in substantial dissociation following the loss of N2, producing (M* − N2 + H)+ peptide ions at m/z 517 whereas the complementary dinucleotide (m + H)+ ions gave only minor peaks (m/z 565 and 556, Fig. 2a, b). In contrast, UVPD of the (M* + m + H)+ ions from dGG, dGC, and dCG, following the loss of N2, produced (M* − N2 + m + H)+ complexes with only minor dissociation to peptide and dXX monomers (Fig. 2c–e). The UVPD-MS2 results for C*AQK, *CAQK, and CAQK* complexes are summarized in Table 1. A comparison of the UVPD-MS2 spectra across the three photopeptides showed that the photolyzed C*AQK and *CAQK complexes of dGG, dGC, and dCG had comparably high resistance to dissociation. This was expressed as a percent fraction R(MS2) of surviving (M* − N2 + m + H)+ ions (Table 1) [35, 36]. The complexes of CAQK* were somewhat less stable, giving 82% survivor ions for dGG. The complexes of dAA displayed more variable stabilities, giving 36%, 27%, and 19% survivor complexes for C*AQK, *CAQK, and CAQK*, respectively.

Fig. 2 UVPD-MS2 spectra of C*AQK complexes with (a) dAA, (b) dAT, (c) dGG, (d) dGC, and (e) dCG

Table 1 Cross-linking percent efficiencies

The dissociations of the photolyzed complexes are in part driven by the highly exothermic isomerization of the carbene intermediate to an olefin (ΔHisom ≈ 200 kJ mol−1) [47] that occurs in the absence of fast X–H insertion [48,49,50,51]. The UVPD-MS2 dissociations were compared with CID-MS2 of (M* + m + H)+ ions that were not exposed to diazirine photolysis. The spectra in Fig. S2a-e (Supporting Information) indicated that among the C*AQK complexes, that with dAA was most sensitive to CID, primarily forming the (M* + H)+ ion at m/z 545. CID of the dAT, dGG, and dGC complexes produced protonated dinucleotide and peptide at comparable relative intensities. The complex of dCG was most resistant to CID, forming small but nearly equal fractions of protonated dCG and C*AQK. We note that CID did not result in elimination of N2, the only significant dissociation besides cleavage to the monomers being a minor loss of water (Fig. S2a-e).

CID-MS3 of Photolyzed Conjugates of C*AQK

The (M* − N2 + m + H)+ ions formed by photodissociation were selected by mass and subjected to CID. In the absence of covalent cross-linking, the MS3 fragmentation was expected to result in the formation of monomeric units, competing for the single available proton according to their relative basicities. In contrast, the covalently bound ions, which we call conjugates, can presumably undergo a combination of peptide and nucleotide backbone fragmentations, as well as water, ammonia, side chain, and nucleobase losses. To describe the fragment ions, we used the standard DNA ion nomenclature [52, 53], which, reading from the 5′ to the 3′ terminus, assigns the fragments as a, b, c, d, w, x, y, z, and those by loss of the neutral 5′ and 3′ base as (-B1H(XH)) and (-B2H(XH)), respectively. The protonated 5′ and 3′ bases are denoted as (B1H2+) and (B2H2+), respectively (Scheme 1). Backbone cleavage in the peptide was described by the standard Biemann-Roepstorff-Fohlman nomenclature for the C-terminal (xn, yn) and N-terminal (am, bm) fragments [54, 55]. For combined dissociations in both the dXX and peptide units, we used a combined notation, e.g., w1_B1 refers to an ion containing the w1 nucleobase fragment linked to the B1 peptide fragment. A table of all theoretical fragment ion combinations was assembled and compared against the fragment ions observed in the CID-MS3 spectra. The identified fragment ions from all CID-MS3 spectra of photo-cross-linked complexes are shown in Table S1.

Scheme 1 Nomenclature of fragment ions from dinucleotide-peptide cross-links

CID of the (dAA + C*AQK − N2 + H)+ ion (m/z 1081) resulted in the formation of the peptide monomer (M* − N2 + H)+ (m/z 517) and its secondary fragment (M* − N2 ‑ S-tag + H)+ (m/z 415, Fig. 3a). In addition, the CID-MS3 spectrum showed numerous fragment ions indicating covalently linked species. We succeeded in assigning most CID-MS3 fragment ions from (dAA + C*AQK ‑ N2 + H)+ to backbone cleavages in dAA combined with loss of a nucleobase and water. Consecutive loss of water is indicated by green arrows in Fig. 3a–e. A few fragment ions, such as m/z 659, did not correspond to logical backbone cleavages and were unassigned. The ion series starting at m/z 946 was formed by loss of adenine (B1H or B2H) and followed by successive eliminations of water (m/z 928 and m/z 910). Another prominent fragment ion series started at m/z 848 (w1/d1_M), indicating backbone cleavage in dAA and loss of a mononucleoside unit. In this case, as with dGG, the w1 and d1 fragments were isomers and thus indistinguishable by mass. Two low-intensity fragment ion series were particularly diagnostic for locating covalent cross-links. The m/z 811 ion was formed by losses of both adenine nucleobases, indicating peptide cross-linking to the deoxyribose-phosphate framework. Consecutive eliminations of water from the m/z 811 ion produced secondary fragment ions at m/z 793, 775, and 757. Likewise, the m/z 713 fragment ion and its secondary fragments by loss of water (m/z 695, 677, and 659) indicated loss of both nucleobases in precursors cross-linked to the deoxyribose-phosphate framework. The formation of the m/z 713 ion can be viewed as converging from multiple pathways, for example, loss of H3PO4 from m/z 811, loss of adenine from m/z 848, or loss of adenine-dideoxyribose (233 Da) from m/z 946. These are standard dissociations of nucleotide ions [52, 53, 56, 57]. 
In contrast to the above fragment ions, the m/z 652 and its loss-of-water secondary fragments were assigned to adenine-peptide cross-links. This indicated that the population of (dAA + C*AQK ‑ N2 + H)+ ions was heterogeneous, containing isomers cross-linked to the nucleobases as well as to the deoxyribose-phosphate framework. The cross-linking efficiency for (dAA + C*AQK + H)+ was R(MS3) = 58% (Table 1), as expressed by a ratio of the intensities of the identified cross-linked fragment ions to the sum of all fragment ion intensities in the CID-MS3 spectrum [35, 36]. Combined with the 36% fraction of (dAA + C*AQK ‑ N2 + H)+ ions surviving UVPD-MS2, the overall efficiency R(total) was 21%.
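The efficiency figures quoted above follow from simple arithmetic, sketched below in Python. The function names and the list of fragment intensities are hypothetical; only the 36% and 58% values are taken from the text, and the product rule for R(total) is inferred from the quoted numbers.

```python
def r_ms3(crosslinked_intensities, all_fragment_intensities):
    """Percent ratio of identified cross-linked fragment ion intensities
    to the sum of all fragment ion intensities in a CID-MS3 spectrum."""
    return 100.0 * sum(crosslinked_intensities) / sum(all_fragment_intensities)


def r_total(r_ms2_percent, r_ms3_percent):
    """Overall efficiency as the product of the UVPD-MS2 survivor fraction
    and the CID-MS3 cross-linked fraction, both given in percent."""
    return r_ms2_percent * r_ms3_percent / 100.0


# Hypothetical intensities chosen only to illustrate the ratio
example_r_ms3 = r_ms3([30, 20, 8], [30, 20, 8, 42])

# Values quoted in the text for (dAA + C*AQK + H)+
daa_total = r_total(36, 58)
print(example_r_ms3, round(daa_total, 2))
```

For dAA, the product 0.36 × 0.58 = 0.2088 rounds to the 21% overall efficiency given in the text.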

Fig. 3 CID-MS3 spectra of (C*AQK − N2) complexes with (a) dAA, (b) dAT, (c) dGG, (d) dGC, and (e) dCG. Green arrows indicate consecutive dissociations by loss of water

The CID-MS3 spectrum of the (dAT + C*AQK − N2 + H)+ ion (Fig. 3b, m/z 1072) was analyzed in a similar fashion, as briefly discussed below. A comparison of the monomer (M* − N2 + H)+ ion relative intensity with those of the cross-linked fragment ions in the m/z 530–1000 region indicated a significant fraction, R(MS3) = 67%, of covalently cross-linked conjugates of dAT. The structurally most significant fragment ions were at m/z 937 (loss of adenine) and m/z 839 (w1_M) that both indicated peptide cross-linking to the 3′-dT moiety.

A significant fraction of cross-linked products, R(MS3) = 47% (Table 1) was also indicated by the CID-MS3 spectrum of the (dGG + C*AQK ‑ N2 + H)+ ion (Fig. 3c, m/z 1113). Major dissociations were a loss of guanine (m/z 962) and a backbone cleavage giving the w1/d1_M ion at m/z 864. The less intense ions at m/z 811 and m/z 713 and their secondary fragments by loss of water pointed to cross-links within the deoxyribose-phosphate framework, as discussed above for dAA. Specific cross-linking to the nucleobases was indicated by minor fragment ions at m/z 668, pertinent to G_M.
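The neutral-loss bookkeeping behind the assignments for dAA and dGG can be checked with a short Python sketch. It uses nominal (integer) masses of the neutral species, which is an assumption adequate for unit-resolution ion trap data; the helper name is hypothetical.

```python
# Nominal masses (Da) of the neutral losses discussed in the text
WATER = 18
H3PO4 = 98
BASE = {"A": 135, "G": 151}  # adenine and guanine


def water_ladder(mz, n_losses):
    """m/z values after 1..n_losses consecutive eliminations of water."""
    return [mz - WATER * k for k in range(1, n_losses + 1)]


# (dAA + C*AQK - N2 + H)+ precursor
daa = 1081
loss_one_base = daa - BASE["A"]          # loss of adenine
loss_two_bases = daa - 2 * BASE["A"]     # loss of both adenines
after_h3po4 = loss_two_bases - H3PO4     # one route to the second framework ion

# (dGG + C*AQK - N2 + H)+ precursor
dgg = 1113
dgg_loss_base = dgg - BASE["G"]          # loss of guanine

print(loss_one_base, water_ladder(loss_one_base, 2))
print(loss_two_bases, water_ladder(loss_two_bases, 3))
print(after_h3po4, water_ladder(after_h3po4, 3))
print(dgg_loss_base)
```

The computed values reproduce the m/z 946, 811, and 713 series of dAA and the m/z 962 ion of dGG quoted in the text.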

The CID-MS3 spectra of the isomeric (dGC + C*AQK ‑ N2 + H)+ and (dCG + C*AQK − N2 + H)+ ions showed striking differences (Fig. 3d, e) despite similar cross-linking efficiencies, R(MS3) = 39% and 38%, respectively (Table 1). Starting with dGC (m/z 1073, Fig. 3d), the CID-MS3 spectrum displayed fragment ions by loss of both cytosine (m/z 962) and guanine (m/z 922) and their loss-of-water satellites. The loss-of-G peak was substantially more intense than that for the loss of C. Another significant fragment ion series (m/z 824, w1_M, and its loss-of-water satellites) indicated peptide cross-linking within the 3′-dC moiety. In addition, the m/z 811 and m/z 713 fragment ions revealed the presence of isomers in which the peptide was cross-linked to the deoxyribose-phosphate framework. Peptide cross-linking to the nucleobases was indicated by minor fragment ions at m/z 668 and m/z 628 that were assigned to (G_M) and (C_M), respectively.

In contrast to the spectrum of the dGC conjugate, the CID-MS3 spectrum of (dCG + C*AQK − N2 + H)+ showed fragment ions by loss of cytosine (m/z 962) whereas loss of guanine (m/z 922) was undetectable (Fig. 3e). The other important fragment ions were represented by m/z 864 that was assigned as w1_M and indicated peptide cross-linking to the 3′-dG moiety. The m/z 713 and 677 ions pointed to loss of guanine and water from the m/z 864 ion, indicating that the peptide was cross-linked to the 3′-phosphodeoxyribose moiety. However, the weaker fragment ion at m/z 668 also indicated a minor fraction of isomers where the peptide was cross-linked to the guanine residue.

Conjugates of *CAQK and CAQK*

Photodissociation of dinucleotide complexes with *CAQK gave results that were analogous to those for C*AQK. Upon photodissociative loss of N2, the complexes of dAA, dAT, and dTA underwent substantial dissociation to monomers, chiefly represented by the peptide ion M+ (m/z 517, Fig. S3a-c). The survivor (dXX + *CAQK − N2 + H)+ conjugates within this group amounted to R(MS2) = 26–36% of the ion intensities (Table 1). In contrast, conjugates of dGG, dGC, and dCG dissociated much less (Fig. S3d-f), generating 93–98% of survivor ions (Table 1). Photodissociation of complexes with CAQK* was studied for dAA and dGG only and produced results that were analogous to those for the other tagged peptides. Table 1 shows 19% and 82% survivor ion yields in the UVPD-MS2 spectra of conjugates with dAA and dGG, respectively. The UVPD spectra of the *CAQK complexes were compared with CID spectra that resulted in dissociation to monomers (Fig. S4a-f).

Table 2 Relative free energies of (dGG + C*AQK + H)+ complexes

The CID-MS3 spectrum of (dAA + *CAQK − N2 + H)+ (m/z 1081, Fig. S5a) showed major cross-linked fragment ions by loss of adenine (m/z 946), backbone cleavage (w1/d1_M, m/z 848), and BH_M (m/z 652). The quality of this spectrum and that in Fig. S5b was affected by the presence of near-isobaric contaminant ions from the *CAQK disulfide, which we found to form readily in solutions of peptides with free cysteine. These ions contained two diazirine groups and underwent loss of the second N2 molecule upon CID-MS3, giving rise to fragment ions at m/z 1053 and 1044 in Figs. S5a and S5b, respectively. The CID-MS3 spectrum of (dTA + *CAQK − N2 + H)+ (m/z 1072, Fig. S5c) showed fragment ions by loss of adenine (m/z 937), but also an adenine-peptide adduct (m/z 652), indicating non-specific cross-linking in the 5′ and 3′ parts of the dinucleotide. The CID-MS3 spectrum of (dGG + *CAQK − N2 + H)+ (m/z 1113, Fig. S5d) was dominated by fragment ions formed by loss of guanine (m/z 962), the w1/d1_M backbone ion (m/z 864), and their dehydrated satellites. Also in this series, the CID-MS3 spectra of the dGC and dCG complexes displayed marked differences in the fragment ion intensities (Fig. S5e, f). The dinucleotides had different propensities for competing for the proton against *CAQK − N2, with (dCG + H)+ showing higher ion relative intensity than (dGC + H)+, implying a higher gas-phase basicity for dCG. Regarding the cross-linked fragments, loss of the 5′ nucleobase was prevalent, as evidenced by the relative intensities of the m/z 922 and m/z 962 ions from the dGC and dCG conjugates, respectively. Dinucleotide backbone cleavage formed the cross-linked w1_M ions, retaining the 3′-portion of the dinucleotide for both dGC and dCG. These results indicated that the peptide N-terminal tag in the complexes was in the vicinity of the 3′-residue regardless of whether it was guanine or cytosine.

The CID-MS3 spectrum of the (dAA + CAQK* − N2 + H)+ complex was similar to that of the C*AQK complex, again showing fragment ions by loss of adenine (m/z 946) and formation of the w1/d1_M (m/z 848), m/z 811, and m/z 713 ions along with their dehydrated satellites (Fig. S6a). These results indicated that in the dAA complexes, the position of the peptide diazirine tag did not play a critical role in cross-linking.

Reference CID of Dinucleotide Ions

The CID-MS3 of peptide-dinucleotide conjugates revealed that most dissociations occurred in the dinucleotide moieties whereas peptide backbone dissociations were scarce or absent. To assess the effect of peptide covalent binding on the CID spectra of the conjugates, we obtained reference CID-MS2 of protonated dinucleotides under the same conditions of slow heating in the ion trap, typically achieving > 80% precursor ion depletion following excitation. The reference spectra (Supplementary Figs. S7a-f) are now briefly described, and the observed fragment ions are compared with their analogues from the peptide conjugates.

CID-MS2 of (dAA + H)+ (Fig. S7a) showed a (m-BH)+ > w1/d1+ pattern of fragment ion intensities that was analogous to that for CID-MS3 of the C*AQK, *CAQK, and CAQK* conjugates. The [(m-BH)+]/[w1/d1+] intensity ratio for (dAA + H)+ (1.4) was slightly higher than that shown for (dAA + C*AQK − N2 + H)+ (1.2) in Fig. 3a. CID-MS2 of (dGG + H)+ (Fig. S7b) showed a dominant loss of guanine but relatively much less backbone cleavage to the w1/d1+ ions than was observed in CID-MS3 of the C*AQK and *CAQK conjugates. The differences for dAA and dGG may be caused by peptide cross-linking to the nucleobases, as indicated by the CID-MS3 spectra, which can be expected to hamper the loss of a free nucleobase from the conjugate. CID-MS2 of (dAT + H)+ (Fig. S7c) chiefly produced the w1+ fragment ion, which was also prominent in the CID-MS3 of the dAT conjugate (Fig. 3b). The CID-MS2 spectrum of (dGC + H)+ (Fig. S7d) displayed fragment ions by loss of G and C in a 1.1:1 ratio. This was different from the CID-MS3 spectrum of the dGC conjugate (Fig. 3d) where the loss of G was fourfold more prominent. In addition, the conjugate showed a higher propensity for backbone cleavage forming the w1_M+ fragment ion compared to the formation of the analogous w1+ ion from (dGC + H)+. Regarding dCG, both (CG + H)+ (Fig. S7e) and the conjugate ion with C*AQK (Fig. 3e) preferred loss of cytosine. The main difference was in the formation of the w1+ fragment ion which was more prominent in the CID-MS3 spectrum of the conjugate (m/z 713, Fig. 3e). Summarizing these comparisons, covalent attachment of CAQK peptides did not change the nature of CID of the dinucleotide conjugates and indicated the prevalence of 5′-base losses and formation of w ions. The differences in the relative intensities of analogous nucleotide and conjugate fragment ions can be related to the presence of the peptide moiety, in particular in blocking the loss of free nucleobases.

Complex Ion Structures

The experimental data indicated a high overall efficiency of gas-phase cross-linking in CAQK-dXX complexes that reached 46% for (dGG + C*AQK + H)+ (Table 1). At the same time, the CID-MS3 data provided only limited information on the location of covalent cross-links, which, for dAA and dGG, was hampered by their near-symmetrical nature, leading to indistinguishable isomeric fragments. To improve resolution and relate the observed cross-links to the gas-phase ion structures, we carried out extensive computational analysis of the (dGG + C*AQK + H)+ system using Born-Oppenheimer molecular dynamics (BOMD) at the semi-empirical level of quantum theory that included all valence-electron interactions. Thermodynamic calculations were also carried out that relied on density functional theory including dispersion interactions and solvation effects in water.

Initial complex ion structures were built for two different (dGG + C*AQK + H)+ protomers that were protonated at the Cys N-terminus and Lys side chain, in combination with peptide carboxylate or dGG phosphate anions to produce the overall charge of +1. Several initial structures of each protomer type were built at different relative orientations of the peptide and dGG counterparts. In total, 11 initial structures were considered as starting points for the BOMD trajectory studies (Fig. S8). BOMD was run at 310 K for 20 ps to produce a total of 11 × 20,000 = 220,000 snapshots out of which 2200 structures were extracted at 100 fs intervals, optimized by PM6-D3H4, and sorted by their secondary and supersecondary structural similarities to merge duplicates and reduce the size of the selection. In the course of BOMD runs, the conformation sampling developed the initial complexes into five groups of canonical and zwitterionic structures out of which 60–70 distinct structures were selected for full geometry optimization by DFT gradient calculations. B3LYP/6-31G(d,p) optimized structures were used for frequency calculations to provide vibrational enthalpies and entropies whereas ωB97X-D/6-31+G(d,p) optimized structures provided electronic energies. The electronic, vibrational, and rotational terms were then combined to yield relative free energies for the complexes. In addition, solvent effects were included in single-point energy calculations via the polarizable continuum model [45] using water as the dielectric. The calculated relative free energies are compiled in Table 2, and the optimized geometries of complexes a-d, m, and j are given in supplementary Tables S3-S8.
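The snapshot counts quoted above can be reproduced with a few lines of Python. This is only a bookkeeping sketch; the variable and function names are hypothetical, and the choice of which frame within each 100 fs interval is kept is an assumption.

```python
STEP_FS = 1        # integration step, fs
LENGTH_PS = 20     # trajectory length, ps
STRIDE_FS = 100    # sampling interval, fs
N_INITIAL = 11     # number of initial structures


def snapshot_indices(n_steps, stride):
    """Indices of the frames kept when one frame per `stride` steps is saved
    (assumed here to be the first frame of each interval)."""
    return list(range(0, n_steps, stride))


n_steps = LENGTH_PS * 1000 // STEP_FS               # steps per trajectory
kept = snapshot_indices(n_steps, STRIDE_FS // STEP_FS)

total_snapshots = N_INITIAL * n_steps
total_extracted = N_INITIAL * len(kept)
print(n_steps, len(kept), total_snapshots, total_extracted)
```

With these settings, each trajectory contributes 20,000 snapshots and 200 extracted structures, giving 220,000 and 2200 in total for the 11 initial structures.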

The distribution of free energies for the gas-phase complexes (ΔGg,310K) is shown in a box-plot format (Fig. S9) for the five protomer types. The Cys-protonated canonical structures encompassed 10 conformers within a 22–237 kJ mol−1 energy range and a median of 106 kJ mol−1. The Lys-protonated canonical structures had 8 conformers ranging from 21 to 150 kJ mol−1, with a 63 kJ mol−1 median. The three zwitterionic protomer types showed distinct free-energy distributions. The (Cys+, Lys+, COO)+ combination had 20 structures within a 5.8–170 kJ mol−1 range and a 63 kJ mol−1 median. The (Cys+, Lys+, phosphate)+ protomers included ~ 16 structures within a narrow range of 0–69 kJ mol−1 and a median of ~ 19 kJ mol−1. The (3′-G+, Cys+, phosphate)+ combination had 7 structures within a 35–146 kJ mol−1 range with a 107 kJ mol−1 median. The energies of the gas-phase complexes indicated overall preference for zwitterionic protomers of the (Cys+, Lys+, COO)+ and (Cys+, Lys+, phosphate)+ types.

Since the complexes were formed in solution before or during electrospray, we also strived to assess their free energies in aqueous solution. To this end, we selected structures within 66 kJ mol−1 of the lowest-energy gas-phase structure for single-point energy calculations that included solvent effects. These showed that most low-energy solvated structures belonged to the zwitterionic protomer types. Solvation changed the ranking of some complexes as illustrated by the free-energy data for 16 low-energy structures a-p in Table 2. In particular, complex d, which is a (Cys+, Lys+, phosphate)+ zwitterion, was stabilized by 19 kJ mol−1 in water relative to the other complexes, becoming the global free-energy minimum. On the contrary, complexes l and p which were low free-energy zwitterions of the (Cys+, Lys+, COO)+ type in the gas phase were relatively destabilized by 35–40 kJ mol−1 in water (Table 2).

The binding between the peptide and dGG in complexes a-p was realized in a few general types (Fig. 4). Structures b, c, d, and m had strong hydrogen bonds between the charged phosphate and both peptide ammonium groups. In addition, the Lys NH3+ developed a stabilizing hydrogen bond to the 5′-hydroxyl in these low-energy complexes (b-d, m, Fig. 4) that were particularly favored by solvation. The hydrogen bonding to 5′-OH enforced the orientation of the peptide with respect to dGG in which the N-terminal tag was in the vicinity of the 3′-guanine. Another group of (Cys+, Lys+, phosphate)+ zwitterionic complexes, e, f, g, h, and n, were characterized by hydrogen bonds between the N-terminal NH3+ and the 3′-guanine whereas the Lys NH3+ developed hydrogen bonds to the 5′-guanine and OH group (Fig. S10). The Table 2 data indicate that this H-bonding arrangement was less energetically favorable, leading to higher free energies in the gas phase and upon solvation. The low-energy structure a (Fig. 4) represented a hybrid in which the N-terminal NH3+ forms a hydrogen bond to the phosphate whereas the Lys NH3+ bound to the 3′-guanine.

Fig. 4 ωB97X-D/6-31+G(d,p) optimized structures of zwitterionic (C*AQK + dGG + H)+ complexes (a), (b), (c), (d), (m), and (j). Atom color coding is as follows: C in peptide = light green; C in dGG = cyan; O = red; N = blue; S = yellow; H = gray. Only exchangeable hydrogens are shown. Double-headed arrows indicate hydrogen bonds between the peptide and dGG. The diazirine ring is labeled with an asterisk

The (Cys+, Lys+, COO)+ zwitterionic complexes, i, j, k, l, o, and p offered a variety of hydrogen-bonding patterns (Fig. 5). In some (i, j, k, and o), the COO anion participated in H-bonding to dGG, and in the others, it did not (Fig. 5). There was no obvious effect of COO hydrogen bonding on the complex free energy, as structures of both types belonged to low-energy gas-phase complexes (j, k, l, and p, Table 2). However, only complex j was favored by solvation to compete with a-d and m. It appears from the energy data that complexes a-d, j, and m were thermodynamically most favored to be formed in aqueous solution and transferred to the gas phase. Conformational changes caused by thermal motion in gas-phase ions can favor low-energy gas-phase structures (p) and disfavor others (m). These effects are discussed later in the paper in connection with BOMD trajectory analysis.

Fig. 5 ωB97X-D/6-31+G(d,p) optimized structures of zwitterionic (C*AQK + dGG + H)+ complexes (i), (k), (l), (o), and (p) of the (Cys+, Lys+, COO)+ type. Atom color coding is as in Fig. 4

To summarize the structure analysis, the predominant interactions determining most of the low-energy dGG complex structures were due to ion-ion hydrogen bonding between the charged groups. Another important feature was hydrogen bonding of the Lys NH3+ to the dGG 5′-hydroxyl that assisted the alignment of the dGG and peptide ion moieties where the N-terminus carrying the diazirine tag was close to the 3′-nucleobase. Peptide hydrogen bonding to the nucleobases, which could be considered important for DNA sequence recognition, was important in several structures e, f, g, h, i, j, k, l, n, o, and p. However, according to the calculated free energies, these complexes, except j, were disfavored by solvation with water. We presumed that these results can be generalized to the other dinucleotides and used to interpret their cross-linking.

Born-Oppenheimer Molecular Dynamics and Contact Analysis

Complexes a-p were analyzed for close contacts between the diazirine carbon atom (C136) and the guanine, 2′-deoxyribose, and phosphate oxygen atoms (Fig. 6). dGG atoms carrying hydrogen (X–H) were considered as potential targets for carbene insertion, while the basic N-7 and N-3 positions in the guanines could potentially react with the transient carbene. First, we inspected close contacts within 4.5 Å in optimized structures a-p representing 0 K geometries. Structures b, f, n, o, and p displayed no 0 K contacts.

Fig. 6 Close contacts of the incipient carbene atom in optimized (0 K) structures of (dGG + C*AQK + H)+. Bold: contacts within 4.5 Å; red italics: contacts within 4.0 Å. There were no 0 K contacts in complexes (b), (i), (n), (o), and (p)

The majority of close contacts in a, c, d, e, h, j, k, l, and m were realized in the 3′-guanine positions N-1 (N61) and NH2 (N62) for potential insertion into the N–H bonds, and at C-5 (C56), N-7 (N55), and C-8 (C52) for carbene addition. The 5′-nucleoside showed only two 0 K contacts, one at the NH2 (N21 in g) and the other with the deoxyribose C-2 (C28 in c).

BOMD trajectories were run for 100 ps at 310 K for complexes a-p. The distribution of contacts between the incipient carbene (C136) and the guanine and deoxyribose units is shown in Fig. 7 and further specified for the positions in the frequently visited 3′-guanine base, namely C-8 (C52), C-4 (C53), C-5 (C56), C-2 (C59), N-7 (N55), N-3 (N57), N-1 (N61), and NH2 (N62), which are ordered left to right in the distribution bar graphs in Fig. 7. The overall contact frequency varied from 0.2% in complex f to 150% in complex a. The high figure exceeding 100% was due to multiple simultaneous contacts in the course of the 100,000-step trajectory. We focus the discussion of the contact data on the thermodynamically most stable complexes a-d, j, k, l, m, and p. Complexes b, c, d, and m showed a high proportion of contacts at the 3′-guanine N-7 position that correlated with the proximity of N-7 and the diazirine C136 in the 0 K structures (Fig. 6).
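The contact bookkeeping described above can be illustrated with a minimal Python sketch. It assumes, as one reading of the text, that a contact is counted whenever the carbene carbon lies within 4.5 Å of a target atom in a given frame, and that the overall frequency is the sum of the per-atom frequencies, which is why it can exceed 100% when several atoms are in contact at once. The function name `contact_frequency`, the inclusive cutoff, and the toy coordinates are hypothetical and are not taken from the original analysis.

```python
from math import dist

CUTOFF = 4.5  # Å; close-contact threshold quoted in the text (inclusive here by assumption)


def contact_frequency(carbene_xyz, target_xyz, cutoff=CUTOFF):
    """Return (per-target, overall) contact frequencies in percent of frames.

    carbene_xyz: list of (x, y, z) positions of the carbene carbon, one per frame.
    target_xyz:  dict mapping an atom label to a list of (x, y, z), one per frame.
    """
    n_frames = len(carbene_xyz)
    counts = {label: 0 for label in target_xyz}
    for i, carbene in enumerate(carbene_xyz):
        for label, coords in target_xyz.items():
            if dist(carbene, coords[i]) <= cutoff:
                counts[label] += 1
    per_target = {label: 100.0 * n / n_frames for label, n in counts.items()}
    # Summing over atoms counts simultaneous contacts separately,
    # so the overall value is not capped at 100%.
    overall = sum(per_target.values())
    return per_target, overall


# Toy 4-frame trajectory with made-up coordinates (Å), for illustration only.
carbene = [(0.0, 0.0, 0.0)] * 4
targets = {
    "N7": [(3.0, 0.0, 0.0), (3.5, 0.0, 0.0), (4.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    "C8": [(4.2, 0.0, 0.0), (4.4, 0.0, 0.0), (5.0, 0.0, 0.0), (7.0, 0.0, 0.0)],
}
per_target, overall = contact_frequency(carbene, targets)
print(per_target, overall)
```

In this toy case two atoms are within the cutoff in the same frames, so the summed frequency is larger than the fraction of frames showing any contact at all.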

Fig. 7

Distribution of close contacts in complexes (a-p) from BOMD trajectories of (dGG + C*AQK + H)+ complexes. Positions C-8, C-4, C-5, C-2, N-7, N-1, and NH2 refer to the 3′-guanine base

The other positions developing contacts in b, c, d, and m were C-8 and C-5 (Fig. 7). The total contact counts in b, c, d, and m were similar, ranging between 25 and 37% (Fig. 7). The contacts in complexes c and m showed similar distributions indicating similar ranges of thermal motion of the 4,4-azipentyl substituent in these complexes. Thermal motion in complexes a, j, k, l, and p created contacts with the N-1 and NH2 positions in 3′-guanine that were inaccessible in b, c, d, and m. In addition, a was the only low-energy complex that developed contacts in the 5′-nucleoside (Fig. 7).

Reaction Mechanisms for Covalent Bond Formation

The high overall cross-linking efficiency in (dGG + C*AQK + H)+ (46%, Table 1) indicated that a substantial fraction of the peptide carbenes that were formed at close contact with dGG reacted by forming covalent bonds. The trajectory data indicated that the majority of contacts occurred at 3′-guanine. This result was also consistent with the cross-linking data in the GC and CG complexes that showed preferential covalent bond formation in the 3′-nucleoside. The high incidence of close contacts at the 3′-nucleobase raised the question of the reaction mechanism for the covalent bond formation. Carbene reactions at the 3′-guanine N-1–H and NH2 groups contacted in complexes j, k, l, and p can proceed by the standard insertion mechanism [58] involving the N–H bonds. These insertions were expected to be 350–380 kJ mol−1 exothermic and thus thermodynamically driven. The exothermicity of the carbene insertion was estimated from the calculated reaction enthalpies for a model system, namely the insertion of propane-2-carbene (2) into the N-1–H and NH2 bonds in 9-methylguanine (1) forming 2-propylguanines 3 and 4 (Scheme S1, Supporting Information). In contrast, carbene reactions with heterocycles in general [59], and nucleosides in particular [60,61,62], have been studied only sparsely. Our calculations of a model propane-2-carbene addition to N-7 in 9-methylguanine revealed that the reaction was substantially exothermic, ΔH0,rxn = −152 to −184 kJ mol−1 at different levels of theory (Scheme S1, Table S2), and therefore thermodynamically favorable. The addition initially formed a dipolar intermediate in the imidazole part of the nucleobase that can be characterized as an N-7-C-8-C-9 ylide (5). A similar intermediate has been observed in the dichlorocarbene reaction with pyridine [59]. Intermediate 5 can further react by closing a three-membered ring. The ring closure forming an N-7–C-8 fused azirine was calculated to be 66–81 kJ mol−1 exothermic, leading to a stable cross-linked product 6. The activation energy for the ring closure, 112–119 kJ mol−1 for TS1 relative to 5 (Scheme S1), can be readily provided by the exothermic addition.
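The statement that the addition can supply the ring-closure barrier can be checked with a simple energy balance. The sketch below assumes that the enthalpy released in the addition step is retained as internal energy of intermediate 5 (an isolated gas-phase ion with no collisional cooling on the reaction time scale), neglects the thermal energy of the reactants, and combines the quoted ranges in the least favorable way. The symbol $E_{\mathrm{excess}}$ is introduced here for illustration only.

```latex
\begin{align}
E_{\mathrm{excess}}^{\min}
  &= \left|\Delta H_{0,\mathrm{rxn}}\right|_{\min} - E_{\mathrm{TS1}}^{\max} \\
  &= (152 - 119)\ \mathrm{kJ\,mol^{-1}} \\
  &= 33\ \mathrm{kJ\,mol^{-1}} > 0
\end{align}
```

Even in this least favorable pairing of the values, the energy released on forming 5 exceeds the barrier to TS1, consistent with the conclusion drawn in the text.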

We envisaged an analogous mechanism for the carbene attack on guanine in the (dGG + C*AQK − N2 + H)+ complex as sketched in Scheme 2. Because of the substantial exothermicity of these reactions, we did not expect the 3′- and 5′-guanine rings to display distinct reactivity toward the carbene intermediate. Rather, we argue that the preferential attack at the 3′-nucleobase was guided by the conformations of the complexes that stemmed from the hydrogen bonding interactions between the polar phosphate, ammonium, and 5′-OH groups. At the same time, the tight hydrogen bonding in the complexes allowed for only a limited conformational motion of the diazirine-carrying side chain whereas the core of the complex was not affected. This interpretation was consistent with the contact analysis of the 0 K structures and 310 K trajectories. For example, the low-energy complex d showed 0 K contacts with C-5, N-7, and C-8 in 3′-guanine (Fig. 6), which developed into the most frequent contacts with 3′-N-7 at 310 K whereas the other positions were visited much less (Fig. 7).

Scheme 2

Reaction of carbene with 3′-guanine in the (dGG + C*AQK − N2 + H)+ complex

Effects of Peptide Complexation on the Dinucleotide Structure

The optimized structures of low-energy complexes indicated strong dipolar interactions between dGG and C*AQK that were likely to also exist in complexes involving the other combinations of peptides and dinucleotides. To specify the effect of the peptide ion binding, we undertook a conformational search of dGG anions and compared their low-energy structures with those in the complexes. Three initial monomer conformations, of the stacked, extended, and S-shaped types, were selected as starting points. After conformation sorting, 13 non-degenerate structures were submitted to DFT geometry optimizations and frequency calculations to obtain the relative free energies. Based on this analysis, the lowest free-energy structures were of the stacked type (dGG1 and dGG2, Fig. S11), whereas the semi-folded S-shaped dGG3 and extended dGG4 were 28 and 31 kJ mol−1 higher in energy, respectively. The low-energy dGG structures displayed a stacked motif that was similar to that found in complexes c, d, and m (Fig. 4) where the exposed phosphate anion was solvated by the peptide dication. The hydrogen bond between the phosphate and 5′-OH in dGG1 and dGG2 was replaced by an H-bond to the lysine NH3+ group in complexes c, d, and m, which endowed them with additional stability and directed the lysine C-terminus toward the 5′-nucleobase. However, the energy data suggested that coordination with the peptide ion can also be realized with other low-energy complex structures (a, b, j) in which guanine stacking was absent. This led us to the conclusion that non-bonding interactions in these peptide-dinucleotide complexes could produce low free-energy protomers and conformers displaying substantial structural variability while conserving or disrupting nucleobase stacking interactions in the dinucleotides.
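To give a sense of scale for the 28 and 31 kJ mol−1 free-energy gaps, the following Python sketch converts a relative free energy into a Boltzmann population ratio with respect to the lowest-energy conformer. The temperature of 310 K is an assumption borrowed from the BOMD runs, because the temperature of the free-energy calculations is not stated in this section. The function name `boltzmann_factor` is hypothetical.

```python
from math import exp

R = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1


def boltzmann_factor(delta_g_kj_mol, temperature_k=310.0):
    """Equilibrium population ratio of a conformer relative to the lowest-energy one.

    delta_g_kj_mol: relative free energy in kJ/mol (positive = less stable).
    temperature_k:  temperature in K; 310 K is an assumed value, see lead-in.
    """
    return exp(-delta_g_kj_mol / (R * temperature_k))


# Relative free energies quoted in the text for the free dGG anion conformers.
for name, dg in (("dGG3", 28.0), ("dGG4", 31.0)):
    print(f"{name}: population ratio {boltzmann_factor(dg):.1e}")
```

Under these assumptions both ratios fall well below 10^-4, so the S-shaped and extended conformers of the free dGG anion would be negligibly populated at equilibrium compared with the stacked ones.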

Conclusions

Singly charged complexes of diazirine-tagged CAQK peptides with several DNA dinucleotides were generated in the gas phase. Peptide carbene intermediates formed by photodissociation were found to undergo efficient covalent cross-linking that largely occurred in the 3′-nucleoside. Electronic structure and Born-Oppenheimer molecular dynamics calculations of the (dGG + C*AQK + H)+ complex revealed a variety of low-energy conformers of zwitterionic types having doubly protonated peptides along with deprotonated dinucleotide phosphate or peptide carboxyl groups. A common feature revealed by the cross-linking data, in accord with the structure analysis, was that the peptide and dinucleotide preferred orientations allowing close approach of the diazirine tag at the N-terminal cysteine to the 3′-nucleoside. Thermal motion at 310 K in the complexes resulted in an extended range of close contacts between the tag and the nucleotide, but the core structure of the complex, as determined by hydrogen bonding, was not disrupted.