## Abstract

ASCAN is a new algorithm for automatic sequence-specific NMR assignment of amino acid side-chains in proteins, which uses as input the primary structure of the protein, chemical shift lists of ^{1}H^{N}, ^{15}N, ^{13}C^{α}, ^{13}C^{β} and possibly ^{1}H^{α} from the previous polypeptide backbone assignment, and one or several 3D ^{13}C- or ^{15}N-resolved [^{1}H,^{1}H]-NOESY spectra. ASCAN has also been laid out for the use of TOCSY-type data sets as supplementary input. The program assigns new resonances based on comparison of the NMR signals expected from the chemical structure with the experimentally observed NOESY peak patterns. The core parts of the algorithm are a procedure for generating expected peak positions, which is based on variable combinations of assigned and unassigned resonances that arise for the different amino acid types during the assignment procedure, and a corresponding set of acceptance criteria for assignments based on the NMR experiments used. Expected patterns of NOESY cross peaks involving unassigned resonances are generated using the list of previously assigned resonances and tentative chemical shift values for the unassigned signals taken from the BMRB statistics for globular proteins. Use of this approach with the 101-amino acid residue protein FimD(25–125) resulted in 84% of the hydrogen atoms and their covalently bound heavy atoms being assigned with a correctness rate of 90%. Use of these side-chain assignments as input for automated NOE assignment and structure calculation with the ATNOS/CANDID/DYANA program suite yielded structure bundles comparable in quality, in terms of precision and accuracy of the atomic coordinates, to those of a reference structure determined with interactive assignment procedures.
A rationale for the high quality of the ASCAN-based structure determination results from an analysis of the distribution of the assigned side chains, which revealed near-complete assignments in the core of the protein, with most of the incompletely assigned residues located at or near the protein surface.

## Notes

ASCAN is also laid out to operate with TOCSY data sets, which provide similar information to that exploited in the NOESY spectra and could be used as supplementary input. Whereas the main text describes exclusively the use of ASCAN with NOESY, its use with TOCSY-type spectra is further discussed in Appendix 2, which also provides a description of the expected TOCSY peak patterns.

## Acknowledgments

We thank Dr. B. Pedrini for sharing his experience with applications of ASCAN for side-chain resonance assignments in a variety of proteins selected as targets in a structural genomics project. Financial support by the Schweizerischer Nationalfonds (project 3100-AO-113838) is gratefully acknowledged. Kurt Wüthrich is the Cecil H. and Ida M. Green Professor of Structural Biology at the Scripps Research Institute, and a member of the Skaggs Institute of Chemical Biology.

## Appendices

### Appendix 1: Generation of expected peak pattern

The evaluation of the sets of expected peak patterns (Eq. 7) and the evaluation of the scoring function (Eq. 12) are the core elements of the ASCAN approach. In each ASCAN iteration, these two quantities are derived from the set of already known resonance frequencies (Eq. 1). In the treatment so far, the three frequency dimensions in Eqs. 7 and 8 were selected with the sole criterion that two of the three frequencies of an expected peak must be restrained by previous resonance assignments; they did not necessarily have to correspond to the frequency dimensions of the NMR experiment used. In the following, however, the notation is more specific, with the three frequency coordinates Δ*ω*_{1}, Δ*ω*_{2} and Δ*ω*_{3} of a cross peak in a 3D spectrum corresponding, in this order, to the indirect proton frequency, the heavy atom frequency, and the direct proton frequency. For the actual computation of *E*(*u*) and \( F(\Upomega _{{u_{k} }} ) \) with Eqs. 7 and 12, respectively, we distinguish three situations, depending on the extent of the previous assignments and the chemical structure of the amino acid side chain to be assigned.

Firstly, if the resonance frequency, \( \Upomega _{h(u)} , \) of the ^{13}C or ^{15}N atom, \( h\left( u \right) \in A, \) that is covalently bound to an unassigned ^{1}H atom, *u*, with \( u \in P\backslash A, \) is known, then the set of assigned atom pairs, *D*(*u*), used to determine the expected peak pattern for *u* is derived from all so far assigned ^{1}H atoms, \( p_{i} \in A, \) that satisfy Eq. 6. Since \( h\left( u \right) \in A, \) the set of atom pairs *D*(*u*) can then be generated with either Eq. 4 or Eq. 5. Consequently, the set of expected peaks for *u* is composed of two subsets, *E*^{1}(*u*) and *E*^{2}(*u*), which represent, respectively, the situations where either the frequency of the heavy atom bound to the previously assigned hydrogen atom, *h*(*p*_{i}), or of the heavy atom bound to the unassigned hydrogen atom, *h*(*u*), is known:

In Eqs. 16 and 17, *M*^{1} and *M*^{2} denote the numbers of peaks in the two subsets (see also Eq. 7). The scoring function, *F* (Eq. 12), yields a scoring value for each potential resonance frequency, \( \Upomega _{{u_{k} }} \in R(u), \) which is calculated as follows:

In Eq. 18 the sums run over all the elements of the two subsets of expected peaks, \( \overrightarrow {{e^{1} (u)_{i} }} \in E^{1} (u) \) and \( \overrightarrow {{e^{2} (u)_{i} }} \in E^{2} (u). \) In addition to the three acceptance criteria formulated with Eqs. 13 and 14, each potential resonance frequency, \( \Upomega _{{u_{k} }} \in R(u), \) must result from at least one confirmed match between observed peaks and expected peaks from the subset *E*^{2}(*u*); otherwise the value of the scoring function \( F(\Upomega _{{u_{k} }} ) \) is reset to zero. This additional requirement ensures that at least one observed peak used to identify the new resonance frequency of a previously unassigned hydrogen atom, *u*, is compatible with the known resonance frequency of its covalently bound heavy atom, Ω_{h(u)}.
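The additional requirement on the subset *E*^{2}(*u*) can be illustrated with a minimal Python sketch. It is not the ASCAN implementation: the scoring function of Eqs. 12 and 18 is not reproduced in this text, so a plain sum of hypothetical per-match contributions stands in for it, and the names `Match` and `score_candidate` are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Match:
    """One confirmed match between an expected and an observed peak."""
    subset: int          # 1: expected peak from E^1(u); 2: from E^2(u)
    contribution: float  # hypothetical per-match score contribution


def score_candidate(matches: Iterable[Match]) -> float:
    """Score one candidate frequency of the unassigned proton u.

    Assumption: a plain sum stands in for the scoring function F.
    The score is reset to zero unless at least one match involves
    an expected peak from E^2(u), i.e. a peak that is compatible
    with the known frequency of the heavy atom h(u).
    """
    matches = list(matches)
    if not any(m.subset == 2 for m in matches):
        return 0.0
    return sum(m.contribution for m in matches)


# A candidate supported only by E^1(u) peaks is rejected outright.
only_e1 = [Match(1, 0.5), Match(1, 0.4)]
# A candidate with at least one E^2(u) match keeps its full score.
with_e2 = [Match(1, 0.5), Match(2, 0.25)]
```

The design point of the sketch is that the check on *E*^{2}(*u*) acts as a gate applied before the score is used, and not as an extra term in the score itself.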

Secondly, if the resonance assignments for a ^{1}H atom, *u*, and its covalently bound heavy atom, *h*(*u*), are both unknown, with \( u,h(u) \in P\backslash A, \) then a two-step procedure is employed, whereby initially a set of potential resonance frequencies, *R*(*u*), for the unassigned proton, *u*, is obtained, and subsequently a set of potential resonance assignments for *h*(*u*), \( R(\Upomega _{{u_{k} }} ;h(u)), \) is identified, with \( \Upomega _{{u_{k} }} \in R(u). \) In the initial step, a procedure analogous to the one described above is applied, where the set of expected peaks involving the resonance frequency of the so far unassigned ^{1}H atom, *u*, is defined by

The mapping function, *Q*, between the sets of expected peaks, *E*(*u*), and the sets of observed peaks, *O*(*u*) ⊂ *S*, for a ^{1}H atom, *u*, yields a set of potential resonance frequencies, *R*(*u*) (Eqs. 9 and 10), and for each potential resonance frequency, \( \Upomega _{{u_{k} }} \in R(u), \) a value of the scoring function, \( F(\Upomega _{{u_{k} }} ), \) is calculated with Eq. 12. Subsequently, a set of potential resonance assignments for the covalently bound heavy atom of the proton *u*, *h*(*u*), is determined, with \( \Upomega _{{u_{k} }} \in R(u). \) The set of expected peaks correlating with *h*(*u*) is then defined by

A set of observed peaks, \( O(\Upomega _{{u_{k} }} ;h(u)) \subset S, \) is extracted from the updated list of the local extrema identified by a two-dimensional grid spanned by the potential resonance frequencies of *u*, \( \Upomega _{{u_{k} }} \in R(u), \) and all the chemical shifts of the previously assigned hydrogen atoms, \( p_{j} \in A. \) The mapping between the set of expected peaks, \( E(\Upomega _{{u_{k} }} ;h(u)), \) and the set of observed peaks, \( O(\Upomega _{{u_{k} }} ;h(u)), \) with Eq. 9 then yields a set of potential resonance frequencies, \( R(\Upomega _{{u_{k} }} ;h(u)) \) (Eq. 10). For each of these potential resonance frequencies, \( \Upomega _{{h(u)_{j} }} \in R(\Upomega _{{u_{k} }} ;h(u)), \) a value for the scoring function \( F(\Upomega _{{h(u)_{j} }} ) \) is then calculated with Eq. 12. At the end of the two-step procedure for this combined hydrogen and heavy atom assignment, a set of potential resonance frequency pairs,

is obtained for the unassigned ^{1}H atom, *u*, and its covalently bound heavy atom, *h*(*u*). For each potential pair of resonance frequencies, \( \left( {\Upomega _{{u_{k} }} ,\Upomega _{{h(u)_{k} }} } \right) \in R(u,h(u)), \) two scoring values, \( F(\Upomega _{{u_{k} }} ) \) and \( F(\Upomega _{{h(u)_{j} }} ), \) are calculated with Eq. 12, and the atom pair *u*, *h*(*u*), is added to the list of assigned atoms if the acceptance criteria of Eqs. 13 and 14 are met simultaneously for both potential resonance frequencies of the atom pair.
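The two-step procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the acceptance criteria of Eqs. 13 and 14 are not reproduced in this text, so a hypothetical predicate `is_acceptable` (here a simple score threshold) stands in for them, and the candidate shifts and scores are invented toy values.

```python
from typing import Callable, Dict, List, Tuple


def accept_pairs(
    proton_candidates: Dict[float, float],
    heavy_candidates_for: Callable[[float], Dict[float, float]],
    is_acceptable: Callable[[float], bool],
) -> List[Tuple[float, float]]:
    """Return (1H shift, heavy-atom shift) pairs that pass for both atoms.

    proton_candidates    -- step 1: candidate 1H shift -> score F
    heavy_candidates_for -- step 2: for a given 1H candidate, the
                            candidate heavy-atom shifts -> score F
    is_acceptable        -- hypothetical stand-in for Eqs. 13 and 14
    """
    accepted = []
    for h_shift, h_score in proton_candidates.items():
        for x_shift, x_score in heavy_candidates_for(h_shift).items():
            # The pair is kept only if both scores pass simultaneously.
            if is_acceptable(h_score) and is_acceptable(x_score):
                accepted.append((h_shift, x_shift))
    return accepted


# Toy values (ppm -> score); all numbers are illustrative assumptions.
protons = {1.25: 0.9, 2.10: 0.3}
heavy = {1.25: {18.5: 0.8, 22.0: 0.2}, 2.10: {30.0: 0.95}}

pairs = accept_pairs(protons, lambda h: heavy[h], lambda s: s >= 0.5)
```

In this toy input the pair (2.10, 30.0) is rejected although its heavy-atom score is high, because the proton score fails the stand-in criterion; this mirrors the requirement that both frequencies of the pair be acceptable at the same time.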

Thirdly, for ^{13}C–^{1}H fragments in aromatic rings, with \( u,h(u) \in P\backslash A, \) the resonance assignments for the two atoms in the ^{13}C–^{1}H moiety can be obtained in a single step if the nearest-neighbor ^{13}C–^{1}H group in the aromatic ring has previously been assigned.

The set of assigned atom pairs used to derive the expected peak pattern for the so far unassigned aromatic ^{13}C–^{1}H moiety then consists of a single element, which is composed of the two assigned atoms H^{δ} and C^{δ} of the same aromatic ring:

The set of expected peaks for the unassigned aromatic ^{1}H atom, \( u \in P\backslash A, \) and its covalently bound heavy atom, \( h(u) \in P\backslash A, \) is then defined by

where the frequencies \( \Upomega _{{(C^{\delta } ,H^{\delta } )_{i} }} \) are determined using the local extrema that match the frequency coordinates given by Eq. 24 within a tolerance window \( \Updelta \vec{\omega } = (\Updelta \omega ^{p} ,\Updelta \omega ^{h} ,\Updelta \omega ^{p} ) \) (see Table 1):

The mapping function, *Q*, between the set of expected peaks in Eq. 23, *E*(*u*, *h*(*u*)), and the set of observed peaks, *O*(*u*, *h*(*u*)) (Eq. 8), then yields a set of potential resonance frequencies, *R*(*u*, *h*(*u*)) (Eq. 10), and for each pair of potential resonance frequencies, \( \left( {\Upomega _{{u_{k} }} ,\Upomega _{{h(u)_{k} }} } \right) \in R(u,h(u)), \) a single value for the scoring function \( F(\Upomega _{{u_{k} }} ,\Upomega _{{h(u)_{k} }} ) \) is evaluated (Eq. 12). Finally, the aromatic ring ^{13}C–^{1}H moiety with \( \left( {\Upomega _{{u_{k} }} ,\Upomega _{{h(u)_{k} }} } \right) \in R(u,h(u)) \) is added to the list of assigned atoms if the acceptance criteria of Eqs. 13 and 14 are met.
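Matching local extrema to expected frequency coordinates within a tolerance window \( (\Updelta \omega ^{p} ,\Updelta \omega ^{h} ,\Updelta \omega ^{p} ) \) can be sketched as below. The coordinate order follows the convention stated at the start of this appendix (indirect proton, heavy atom, direct proton). The tolerance values and peak positions are placeholders chosen for illustration; the values actually used are those of Table 1, which is not reproduced here, and the names `Peak`, `within_window` and `matching_extrema` are assumptions.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    """Frequency coordinates of a 3D cross peak, in ppm."""
    w1: float  # indirect proton dimension
    w2: float  # heavy atom (13C or 15N) dimension
    w3: float  # direct proton dimension


def within_window(expected: Peak, observed: Peak, tol: Peak) -> bool:
    """True if every coordinate deviates by no more than its tolerance."""
    return all(abs(e - o) <= t for e, o, t in zip(expected, observed, tol))


def matching_extrema(expected: Peak, extrema: List[Peak], tol: Peak) -> List[Peak]:
    """Select the local extrema that fall inside the tolerance window."""
    return [p for p in extrema if within_window(expected, p, tol)]


# Placeholder tolerances: the same value for both proton dimensions,
# a wider one for the heavy atom dimension.
tolerance = Peak(0.02, 0.4, 0.02)
expected_peak = Peak(7.10, 131.5, 3.05)
local_extrema = [
    Peak(7.11, 131.3, 3.04),  # inside the window in all three dimensions
    Peak(7.10, 132.5, 3.05),  # heavy atom dimension off by 1.0 ppm
    Peak(7.20, 131.5, 3.05),  # indirect proton dimension off by 0.1 ppm
]

hits = matching_extrema(expected_peak, local_extrema, tolerance)
```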

### Appendix 2: Use of ASCAN with supplementary input of TOCSY data

In addition to accepting an input of 3D ^{13}C- or ^{15}N-resolved [^{1}H,^{1}H]-NOESY data, ASCAN is also laid out to operate on 3D heteronuclear-resolved [^{1}H,^{1}H]-TOCSY data sets. The correlation of an unassigned atom, \( u \in P\backslash A, \) to the set of assigned atom pairs, *D*(*u*), is then based on the TOCSY magnetization transfer pathways. Sets of assigned atom pairs, *D*(*u*), arising from scalar (“through-bond”) coupling of an unassigned hydrogen atom, \( u \in P\backslash A, \) with assigned hydrogen atoms, \( p_{i} \in A, \) are derived from the covalent structure based on the fact that only pairs of hydrogen atoms, *u* and *p*_{i}, which are separated by a given number, *n*, of covalent bonds, \( n_{{{\text{up}}_{i} }}^{\text{cov}} , \) give rise to a scalar coupling in the same spin system. All correlated pairs of hydrogen atoms determined with Eq. 25,

are used to generate the elements of *D*(*u*) (Eq. 3) with Eqs. 4 and/or 5. Overall, the treatment of [^{1}H,^{1}H]-NOESY and [^{1}H,^{1}H]-TOCSY data by ASCAN differs only in the consideration given to the different coherence transfer pathways in the two experiments, as reflected in Eqs. 6 and 25. In practice, TOCSY data sets can be a useful supplement to the NOESY input data, but they should be used only in conjunction with NOESY data.
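The selection of TOCSY-correlated proton pairs from the covalent structure can be illustrated with a breadth-first search that counts the covalent bonds between two atoms. This is a sketch under assumptions: Eq. 25 is not reproduced in this text, so the condition "at most `max_bonds` bonds apart" is a guess at its form, and the atom names and the small alanine-like fragment are chosen purely for illustration.

```python
from collections import deque
from typing import Dict, List, Optional


def bond_distance(bonds: Dict[str, List[str]], start: str, goal: str) -> Optional[int]:
    """Number of covalent bonds on the shortest path, or None if unconnected."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        atom, dist = queue.popleft()
        if atom == goal:
            return dist
        for neighbor in bonds.get(atom, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def tocsy_partners(bonds, u, assigned_protons, max_bonds):
    """Assigned protons within max_bonds covalent bonds of the proton u.

    Assumption: 'at most max_bonds' stands in for the condition of Eq. 25.
    """
    partners = []
    for p in assigned_protons:
        dist = bond_distance(bonds, u, p)
        if dist is not None and dist <= max_bonds:
            partners.append(p)
    return partners


# Alanine-like fragment as an undirected adjacency list (illustrative).
fragment = {
    "HN": ["N"], "N": ["HN", "CA"], "CA": ["N", "HA", "CB"],
    "HA": ["CA"], "CB": ["CA", "HB"], "HB": ["CB"],
}

partners_of_hb = tocsy_partners(fragment, "HB", ["HA", "HN"], max_bonds=3)
```

In this fragment HB and HA are three bonds apart (HB–CB–CA–HA), whereas HB and HN are four bonds apart, so only HA is returned for a three-bond limit.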

## About this article

### Cite this article

Fiorito, F., Herrmann, T., Damberger, F.F. *et al.* Automated amino acid side-chain NMR assignment of proteins using ^{13}C- and ^{15}N-resolved 3D [^{1}H,^{1}H]-NOESY. *J Biomol NMR* **42**, 23–33 (2008). https://doi.org/10.1007/s10858-008-9259-x

DOI: https://doi.org/10.1007/s10858-008-9259-x