Introduction

Mycoplasmas belong to the class Mollicutes and comprise approximately 200 species, including obligate parasites of humans and of commercially important mammals such as pigs [1]. Mycoplasmas are wall-less bacteria distinguished by small genomes of low G+C content. Their parasitic lifestyle, reduced genomes and close association with their hosts have contributed to the absence of enzymes of important biosynthetic pathways in mycoplasmas [2].

Enzootic pneumonia caused by Mycoplasma hyopneumoniae is a major constraint to efficient pork production worldwide. The M. hyopneumoniae genome contains 920,079 base pairs and 716 protein-coding genes, of which 418 encode proteins homologous to proteins of known function. Currently, there are nearly 1,500 complete genome sequences in GenBank, and half of all predicted genes in them encode proteins with no inferable function. Similarly, almost 42% of the predicted M. hyopneumoniae genes encode proteins annotated as hypothetical [3]. This lack of annotation is a particularly intriguing and unresolved issue because, as mentioned above, components of important and essential metabolic pathways present in other organisms have not been identified in mycoplasmas [4, 5].

The BLAST program [6] has contributed significantly to the analysis of nucleotide and amino acid sequences, allowing the prediction of biological functions and evolutionary relationships of genes and proteins [7]. However, this tool can be used with a high degree of confidence only when the sequences are evolutionarily close and share more than 50% identity. To overcome this limitation, alternative methodologies such as threading and homology modeling have been used to answer questions about protein properties. These methods work because processes such as gene duplication and evolutionary divergence, which occur in many distantly related organisms [8], give rise to families of structurally and functionally similar proteins. When one or more proteins in a family have experimentally determined structures, the structures of many other members can be modeled with reasonable accuracy. This is particularly true when the protein domains share ≥30% sequence identity over more than 100 residues.
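
The identity-over-length rule of thumb above can be expressed as a short Python sketch. It assumes a pairwise alignment supplied as two gapped strings of equal length and defines percent identity as identical positions divided by gap-free aligned columns; other common definitions (for example, dividing by the length of the shorter sequence) give different values. The function names are illustrative and do not belong to any tool used in this study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> tuple[float, int]:
    """Return (percent identity, number of gap-free aligned columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap ('-').
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs), len(pairs)


def in_reliable_zone(aligned_a: str, aligned_b: str,
                     min_identity: float = 30.0, min_columns: int = 100) -> bool:
    """Apply the rule of thumb from the text: >=30% identity over >100 residues."""
    identity, columns = percent_identity(aligned_a, aligned_b)
    return identity >= min_identity and columns > min_columns


if __name__ == "__main__":
    # 3 identical residues out of 4 gap-free columns.
    print(percent_identity("AC-GT", "ACTGA"))  # (75.0, 4)
```

A short toy alignment like the one in the example fails the length criterion even at high identity, which is exactly the situation in which a model should be treated with caution.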

Threading and homology modeling can identify domains and active sites and locate them within a 3D structure (i.e., on the surface or buried). Because determining a crystal structure is arduous, and for some proteins impractical, homology modeling is a helpful approach that can guide experimental assays to investigate protein function [9–11]. The rapid growth of structural genomics is producing a considerable number of templates for homology modeling. The availability of more templates increases the quality of new models, thereby narrowing the gap between computationally derived models and experimental results.

Thus far, mycoplasma genome sequences have not been annotated for activities related to the utilization of ATP, NAD and NADH or to the synthesis of amino acids from pyruvate. However, genes encoding these activities must exist, because the corresponding enzymatic activities have been detected [12]. This discrepancy suggests that, in some cases, sequence-based methodologies for identifying protein function may not be suitable for mycoplasmas.

In this study, using structure-based approaches, we predicted the functions of seven proteins annotated as hypothetical in the M. hyopneumoniae genome. Three of the proteins are predicted to be involved in metabolic processes, a finding that may support further studies of the metabolism of this bacterium. Another two are predicted to be involved in transcription, controlling gene expression in response to cellular or environmental signals, an important characteristic of pathogenic bacteria such as M. hyopneumoniae. Functions could not be assigned to the remaining two proteins, but their modeled structures suggest experiments that could clarify their function.

Materials and methods

The sequences of 298 proteins of M. hyopneumoniae strain 7448 currently annotated as hypothetical in the GeneSul database (http://www.genesul.lncc.br/finalMP/) were submitted to two threading programs, GenThreader [13] and Prospect-PSPP [14]. The sequences were also analyzed with InterProScan [15] and COG [16], and the functional predictions of these four programs were compared. Thirty-four sequences that received the same functional prediction from at least two of these programs were selected for manual analysis, which led to the selection of seven targets for structural investigation. First, the sequences of these seven proteins were submitted to a PSI-BLAST search (http://blast.ncbi.nlm.nih.gov/Blast.cgi) against the Protein Data Bank (PDB). To guide the functional inference of these uncharacterized proteins, other bioinformatics tools were used as described elsewhere [17]. This protocol comprises scans against sequence pattern, domain, family classification and structural family databases to identify conserved functional residues and to retrieve homologs for subsequent comparative modeling.
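
The consensus step (keeping sequences for which at least two of the four programs agree) can be sketched in Python as follows. This is a minimal sketch under a strong assumption: predictions are compared as trimmed, lower-cased strings, whereas the real outputs (fold names, InterPro entries, COG categories) use different vocabularies, which is one reason the 34 candidates still required manual analysis. The program outputs and protein identifiers in the example are hypothetical.

```python
from collections import Counter


def consensus_calls(predictions: dict[str, dict[str, str]],
                    min_agree: int = 2) -> dict[str, str]:
    """Select proteins whose function is predicted identically by >= min_agree programs.

    `predictions` maps a program name to {protein_id: predicted function}.
    Predictions are compared after trimming and lower-casing (a naive normalization).
    """
    votes: dict[str, Counter] = {}
    for calls in predictions.values():
        for protein, function in calls.items():
            if function and function.strip():
                votes.setdefault(protein, Counter())[function.strip().lower()] += 1

    selected = {}
    for protein, counter in votes.items():
        top = counter.most_common(2)
        best_function, best_count = top[0]
        # Require enough agreement and no tie with a different prediction.
        if best_count >= min_agree and (len(top) == 1 or best_count > top[1][1]):
            selected[protein] = best_function
    return selected


if __name__ == "__main__":
    example = {  # hypothetical program outputs
        "GenThreader": {"prot1": "NadD", "prot2": "kinase"},
        "Prospect-PSPP": {"prot1": "nadd", "prot2": "transporter"},
        "InterProScan": {"prot2": ""},
        "COG": {"prot1": "NusB"},
    }
    print(consensus_calls(example))  # {'prot1': 'nadd'}
```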

Local alignments between the sequences of the seven selected proteins and the templates identified by threading were computed with MAFFT [18] (EMBL-EBI), with little manual editing. Sequences were retrieved from NCBI and GeneSul. The BLOSUM30 matrix was used with gap opening and gap extension penalties of 1.0 and 0.123, respectively. The alignments were then used to model the selected proteins with Modeller [19] (version 9v8). The overall geometric and stereochemical quality of each model was assessed with PROCHECK through the PDBsum server [20] and with ProSA-web [21]; the results are listed in Table 1.

Table 1 Sequence and structure information of the selected proteins and their templates

Results and discussion

Threading is based on sequence-to-structure alignment. The target sequence is “threaded” through each template in a library of known protein folds. The fit is scored using fitness measures for each amino acid type in the local structural environments of the template, defined in terms of solvent accessibility and secondary structure. If a sequence fits a given fold well, the two proteins are likely to share conserved residues and, therefore, similar functions [22].
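
To make the idea of environment-based fitness concrete, the following toy Python sketch threads a query without gaps onto template "environment strings" (burial plus secondary structure) and ranks templates by total fitness. The scoring rules are hypothetical simplifications; GenThreader and Prospect-PSPP use gapped alignments and statistically derived potentials, so this illustrates the principle only, not either program.

```python
# Toy environment classes: (burial, secondary structure), e.g. ("b", "H") = buried helix.
HYDROPHOBIC = set("AVILMFWC")


def fitness(residue: str, env: tuple[str, str]) -> float:
    """Hypothetical fitness of one residue type for one structural environment."""
    burial, ss = env
    buried = burial == "b"
    # Reward hydrophobic residues at buried positions and polar ones at the surface.
    score = 1.0 if (residue in HYDROPHOBIC) == buried else -1.0
    # Toy penalty for helix/strand breakers inside regular secondary structure.
    if residue in "GP" and ss in ("H", "E"):
        score -= 1.0
    return score


def best_gapless_fit(query: str, envs: list[tuple[str, str]]) -> tuple[int, float]:
    """Slide the template environments along the query (no gaps); return (offset, score)."""
    if len(query) < len(envs):
        raise ValueError("query must be at least as long as the template")
    best = None
    for offset in range(len(query) - len(envs) + 1):
        score = sum(fitness(query[offset + i], env) for i, env in enumerate(envs))
        if best is None or score > best[1]:
            best = (offset, score)
    return best


def rank_templates(query: str, templates: dict[str, list[tuple[str, str]]]):
    """Rank templates by their best gapless threading score, highest first."""
    scored = [(name, *best_gapless_fit(query, envs)) for name, envs in templates.items()]
    return sorted(scored, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    templates = {"buried_helix": [("b", "H")] * 4, "exposed_coil": [("e", "C")] * 4}
    print(rank_templates("KKLLLLKK", templates))
    # [('buried_helix', 2, 4.0), ('exposed_coil', 0, 0.0)]
```

The hydrophobic LLLL stretch fits the buried-helix environment best, which is the kind of sequence-environment compatibility that threading programs quantify far more rigorously.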

The Prospect-PSPP threading pipeline showed that 27 (9.06%) of the 298 target proteins had PSI-BLAST hits against the PDB with an E-value < 0.0001, indicating the existence of homologs. In addition, 83 (27.85%) of the proteins had hits against the PDB with a Z-score > 20, corresponding to a fold-recognition confidence level > 99%; the remaining proteins had hits with confidence levels between 85 and 99%. GenThreader assigned high confidence (“certain”) to 84 (32.43%) of 259 proteins (the total number of hypothetical sequences available in 2005). Detailed analysis of the threading results gave consistent predictions and allowed us to select seven proteins that received the same prediction from both programs. In addition, we followed the protocol suggested by Mazumder and Vasudevan [17], as described in Materials and methods. This protocol identified homologs with available 3D structures, which served as templates for comparative modeling.
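
The reported percentages can be recomputed from the counts with a few lines of Python. Note that the Prospect-PSPP figures use 298 sequences as the denominator, whereas the GenThreader figure uses the 259 sequences available in 2005, so the two sets of percentages are not strictly comparable.

```python
def percent(count: int, total: int, ndigits: int = 2) -> float:
    """Percentage of `count` in `total`, rounded to two decimals as in the text."""
    return round(100.0 * count / total, ndigits)


# Counts reported above: (hits, number of sequences searched).
reported = {
    "Prospect-PSPP, PSI-BLAST E-value < 0.0001": (27, 298),
    "Prospect-PSPP, Z-score > 20": (83, 298),
    "GenThreader, 'certain'": (84, 259),
}

if __name__ == "__main__":
    for label, (hits, total) in reported.items():
        print(f"{label}: {hits}/{total} = {percent(hits, total)}%")
    # 9.06%, 27.85% and 32.43%, matching the values in the text
```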

In the following sections, we will discuss the 3D structures and functions predicted for the seven proteins (YP_287866, YP_287786, YP_287675, YP_287559, YP_288024, YP_287971 and YP_288034). Table 1 lists the templates used to obtain the 3D structures and information about the selected protein models.

Completing the NAD biosynthesis pathway

The 3D structure of the hypothetical protein YP_287866 resembles two different proteins: its N-terminal region resembles nicotinate-nucleotide adenylyltransferase (NadD), and its C-terminal region resembles an uncharacterized histidine-aspartate (HD) domain protein. Although the steps in NAD biosynthesis and recycling can vary between species, the enzymes involved in these pathways are generally the following: 1) nicotinate phosphoribosyltransferase (NAPRTase) (EC 2.4.2.11), 2) nicotinate mononucleotide adenylyltransferase (NaMNAT or NadD) (EC 2.7.7.18), and 3) NAD synthetase (NadE) (EC 6.3.1.5) (Fig. 1). These enzymes are encoded by the conserved genes pncB, nadD and nadE, respectively. Enzymes involved in NAD biosynthesis are considered promising drug targets because they are essential for the viability of most bacteria [23, 24]; however, only nadE is annotated in M. hyopneumoniae. Because NadD is likely essential, its structure-based characterization in M. hyopneumoniae should improve the genome annotation and add this enzyme to the list of potential therapeutic targets.

Fig. 1

Simplified NAD biosynthesis pathway proposed for M. hyopneumoniae. Highlighted in blue circles are the EC numbers of the enzymes whose 3D structure was predicted in this study. YP_287786 is proposed to be EC 2.4.2.11, a nicotinate phosphoribosyltransferase. YP_287866 (N-terminal region) is suggested to be a nicotinate-nucleotide adenylyltransferase, EC 2.7.7.18. EC 6.3.5.1 is the enzyme NadE, already annotated in M. hyopneumoniae. The 3D structures were obtained using comparative modeling methodology, and the structures were rendered with Pymol (www.pymol.org)

The sequence similarity between the YP_287866 N-terminal region and other nicotinate-nucleotide adenylyltransferases is low (approximately 30%); however, the proteins share two highly conserved ATP-binding motifs, GXXXPX(T/H)XX and SX(T/S)XXR. The crystal structures of many NaMNAT proteins [25–29] reveal the residues involved in their function, including 1) the essential His17 together with His20, Ser162 and Arg167, located in the ATP-binding site of the enzymes from Pseudomonas aeruginosa [30], Escherichia coli [31] and Bacillus subtilis [28]; 2) Thr87 and Trp117, which interact with the nicotinic acid moiety of the substrate; and 3) Arg134, which interacts with the adenosine moiety.
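
Motif notations such as GXXXPX(T/H)XX and SX(T/S)XXR translate directly into regular expressions, which makes scanning a sequence for them straightforward. The Python sketch below assumes that X means any residue and (A/B) means either residue; the helper names and the example sequence are hypothetical.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate motif notation used in the text into a Python regular expression.

    'X' stands for any residue and '(A/B)' for either A or B; other letters are literal.
    """
    parts = []
    i = 0
    while i < len(motif):
        char = motif[i]
        if char == "X":
            parts.append(".")
            i += 1
        elif char == "(":
            end = motif.index(")", i)
            options = motif[i + 1:end].split("/")
            parts.append("[" + "".join(options) + "]")
            i = end + 1
        else:
            parts.append(char)
            i += 1
    return "".join(parts)


def find_motif(sequence: str, motif: str):
    """Return (1-based start, matched text) for every match, including overlapping ones."""
    # A lookahead with a capturing group lets finditer report overlapping matches.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(motif_to_regex("GXXXPX(T/H)XX"))  # G...P.[TH]..
    print(motif_to_regex("SX(T/S)XXR"))     # S.[TS]..R
    # Hypothetical sequence fragment, not taken from YP_287866.
    print(find_motif("MASATKLRG", "SX(T/S)XXR"))  # [(3, 'SATKLR')]
```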

The template selected to model the YP_287866 N-terminal region was the crystal structure of nicotinic acid mononucleotide adenylyltransferase from Staphylococcus aureus [26] (PDB ID: 2H29). Although the sequence identity between these two proteins is only 35%, they share similar topologies, consisting of eight α-helices, a six-stranded parallel β-sheet and an additional β-strand.

The model obtained for the YP_287866 C-terminal region adopted a conformation similar to those of proteins of the metal-dependent phosphohydrolase superfamily. This superfamily includes a variety of uncharacterized domains associated with nucleotidyltransferases from bacteria, archaea and eukaryotes, and YP_287866 appears to have one of these domain architectures. The low sequence identity (∼25%) between YP_287866 and these proteins was offset by the presence of a metal-binding HD motif [32] in YP_287866. Crystal structures of HD-domain proteins have been solved for Bacillus halodurans (PDB ID: 2O08) and Streptococcus agalactiae (PDB ID: 2OGI); however, many HD-domain proteins remain uncharacterized [33].

Concerning the C-terminal region of YP_287866 (YP_287866C), the template used was the crystal structure of the putative metal-dependent phosphohydrolase from S. agalactiae (PDB ID: 2OGI). The resulting model consisted of an all-alpha structure formed by 13 helices.

Although YP_287866 is encoded by a single gene, it comprises two distinct domains with different functions. In the complete model, the two domains are linked by a disulfide bond between Cys74 in the N-terminal region and Cys275 in the C-terminal region. A similar domain architecture has been found in another HD-domain protein fused to a nucleotidyltransferase domain [32]. Because the binding sites of the two domains do not overlap spatially, and both templates (2H29 and 2OGI) form dimers, we consider this architecture plausible. Moreover, 97.9% of the model residues lie in preferred and allowed regions of the Ramachandran plot, indicating good stereochemical quality.
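
The proposed Cys74–Cys275 disulfide can be checked in a model file by measuring the distance between the two cysteine SG atoms, which is about 2.05 Å in a disulfide bond. This Python sketch reads coordinates from the fixed-column PDB ATOM record layout; the 1.9–2.2 Å acceptance window, the chain identifier and the coordinates in the example are assumptions for illustration, not values from the YP_287866 model.

```python
import math


def parse_atoms(pdb_lines):
    """Map (chain, residue number, residue name, atom name) -> (x, y, z).

    Uses the fixed-column layout of PDB ATOM/HETATM records.
    """
    atoms = {}
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            key = (line[21], int(line[22:26]), line[17:20].strip(), line[12:16].strip())
            atoms[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms


def sg_distance(atoms, chain, res_a, res_b):
    """Distance in angstroms between the SG atoms of two cysteines."""
    sg_a = atoms[(chain, res_a, "CYS", "SG")]
    sg_b = atoms[(chain, res_b, "CYS", "SG")]
    return math.dist(sg_a, sg_b)


def plausible_disulfide(distance, low=1.9, high=2.2):
    """Assumed acceptance window around the ~2.05 A S-S bond length."""
    return low <= distance <= high


def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Build a minimal synthetic ATOM record (demonstration only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


if __name__ == "__main__":
    # Synthetic coordinates, not taken from the YP_287866 model.
    lines = [atom_line(1, "SG", "CYS", "A", 74, 0.0, 0.0, 0.0),
             atom_line(2, "SG", "CYS", "A", 275, 2.05, 0.0, 0.0)]
    d = sg_distance(parse_atoms(lines), "A", 74, 275)
    print(f"{d:.2f}", plausible_disulfide(d))  # 2.05 True
```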

As mentioned above, some enzymes of the NAD biosynthesis and recycling pathways have not been identified in M. hyopneumoniae. Based on structural information, however, we propose that the N-terminal domain of YP_287866 is NadD and that YP_287786 is the NAPRTase of this pathway, which would complete the NAD biosynthetic pathway.
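
The gap-filling argument can be tracked with a small bookkeeping sketch in Python: list the genes of the generic NAD pathway given above (pncB, nadD, nadE) and check which have an assigned protein before and after the structure-based proposals. The data layout is an illustrative assumption.

```python
# Steps of the generic NAD pathway described in the text, identified by gene name.
NAD_PATHWAY = ["pncB", "nadD", "nadE"]


def missing_steps(pathway, assignments):
    """Return the pathway genes that have no protein assigned."""
    return [gene for gene in pathway if not assignments.get(gene)]


# Before this study, only nadE was annotated in M. hyopneumoniae.
before = {"nadE": "annotated NadE"}
# Structure-based proposals described in this section.
after = dict(before, pncB="YP_287786", nadD="YP_287866 (N-terminal domain)")

if __name__ == "__main__":
    print(missing_steps(NAD_PATHWAY, before))  # ['pncB', 'nadD']
    print(missing_steps(NAD_PATHWAY, after))   # []
```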

The threading programs identified the crystal structure of nicotinate phosphoribosyltransferase from Thermoplasma acidophilum (TmNAPRTase) [34] (PDB ID: 1YTK) as the best hit for the YP_287786 sequence. Further structural analysis identified another homolog with a solved 3D structure, NAPRTase (EC 2.4.2.11) from Enterococcus faecalis (EfNAPRTase) (PDB ID: 2F7F). This enzyme catalyzes the synthesis of nicotinic acid mononucleotide (NAMN) from nicotinic acid and phosphoribosyl pyrophosphate (PRPP), with or without ATP.

Although YP_287786 has low overall sequence identity (∼25%) with its structural homologs TmNAPRTase and EfNAPRTase, many residues are conserved, including TmNAPRTase residues Arg224, Asp226, Glu273 and Glu292, which are involved in NAMN binding [34]. Two other TmNAPRTase residues implicated in NAMN binding are substituted in YP_287786: Thr179/Ser166 and Thr293/Val294. The first substitution, between physicochemically similar amino acids, may not affect the function of YP_287786, because NAMN binds TmNAPRTase through a hydroxyl group, which both threonine and serine possess.

To transfer the phosphoribosyl group, PRPP must bind to NAPRTase. Two conserved motifs, 275hSGGh279 (h denotes a hydrophobic residue) and 298GVG301, accommodate the phosphate group of PRPP. Both motifs are conserved in YP_287786, except that a glycine is replaced by serine at position 277. The stereochemical quality of the YP_287786 model was verified with a Ramachandran plot calculated by PROCHECK, which showed 97% of the residues in preferred or allowed regions.
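
The statement that the hSGGh motif is conserved except at position 277 amounts to a position-by-position comparison in which h matches any hydrophobic residue. The Python sketch below assumes one common set of hydrophobic residues (definitions vary), and the example segment is hypothetical apart from the serine at position 277.

```python
# Assumed set of hydrophobic residues for the 'h' symbol; definitions vary.
HYDROPHOBIC = set("AVLIMFWC")


def motif_mismatches(segment: str, motif: str, start: int):
    """Compare an aligned target segment with a motif, position by position.

    'h' in the motif matches any hydrophobic residue; other letters must match exactly.
    `start` is the residue number of the first motif position.
    Returns (position, expected, observed) for every mismatch.
    """
    if len(segment) != len(motif):
        raise ValueError("segment and motif must have the same length")
    mismatches = []
    for offset, (expected, observed) in enumerate(zip(motif, segment.upper())):
        ok = observed in HYDROPHOBIC if expected == "h" else observed == expected
        if not ok:
            mismatches.append((start + offset, expected, observed))
    return mismatches


if __name__ == "__main__":
    # Hypothetical segment: only Ser277 comes from the text; flanking residues are placeholders.
    print(motif_mismatches("VSSGL", "hSGGh", start=275))  # [(277, 'G', 'S')]
```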

Filling gaps in M. hyopneumoniae pathways

In prokaryotes, the biosynthesis of flavin adenine dinucleotide (FAD) involves bifunctional proteins of the FAD synthetase family that catalyze both riboflavin (RF) phosphorylation and flavin mononucleotide (FMN) adenylylation. In our study, the sequence of YP_287675 showed similarity to the crystal structure of FAD synthetase (TM379) from T. maritima [35] (PDB ID: 1S4M) and to the in silico model of FAD synthetase from Corynebacterium ammoniagenes [36] (CaFADS) (PDB ID: 2X0K). Using the comparative genomics tool of GeneSul, we found that FAD synthetase is annotated in other mycoplasma genomes and that YP_287675 belongs to this cluster.

The 3D structure obtained for YP_287675 showed an overall topology similar to that of its template, 1S4M. As expected, these proteins fold into two domains. The N-terminal domain carries the FMN adenylylation function, catalyzing the reaction between ATP and FMN to form pyrophosphate and FAD (EC 2.7.7.2). Structurally, this domain has a typical nucleotide-binding (Rossmann) fold containing an ATP-binding site. The motif (V/I)XGX(1–2)GXXGXXX(G/A), associated with the Rossmann fold and FMN binding, is present in YP_287675 with a few amino acid substitutions, i.e., VX(3)GGX(2)AX(3)GX(7)A. This motif has been useful for assigning biological functions to proteins of unknown function from fully sequenced genomes [37]. Moreover, these residues occupy conserved positions that allow substrate binding. Similarly, the residues believed to be involved in ATP binding are conserved between YP_287675 and its template, except for Glu25 and Phe100 (replaced by aspartate and tyrosine, respectively, in 1S4M and 2X0K).

The second, C-terminal domain of YP_287675 folds into a six-stranded antiparallel β-barrel implicated in RF binding. This interaction also involves a long α-helix and a conserved histidine at position 233. RF phosphorylation by CaFADS involves three important residues, Thr208, Asn210 and Asp268 [36]. None of these residues occupies the equivalent sequence position in YP_287675; however, an asparagine is found at the equivalent structural location. Although structural information was lacking for some regions, 96.5% of the residues of the YP_287675 model lie in favored and allowed regions of the Ramachandran plot.

Understanding mycoplasma metabolism requires adequate annotation of its proteome. Our structure-based annotation of YP_287866 and YP_287786, involved in NAD biosynthesis, and of YP_287675, implicated in FAD biosynthesis, fills gaps in this annotation. Furthermore, proteins required in these biosynthetic pathways are being considered as antimicrobial drug targets.

Two important proteins implicated in transcription may not be absent from M. hyopneumoniae

The hypothetical protein YP_287559 exhibited structural similarity to the prokaryotic transcription factor NusB. NusB participates in antitermination, a process in which RNA polymerase is modified so that it reads through RNA secondary structures that would otherwise terminate transcription. In E. coli, antitermination involves at least three Nus proteins: NusB, NusE (identical to ribosomal protein S10) and NusG [38]. NusB, in association with these proteins, is believed to bind an RNA motif, boxA, present in the E. coli rrn operons. Mutations in NusB lower the growth rate, which is evidence for its role in rRNA synthesis [39]. E. coli has seven rrn operons, whereas M. tuberculosis [40] and M. hyopneumoniae have only one. Therefore, an efficient antitermination mechanism is particularly important in these pathogenic bacteria to ensure expression of the entire single rrn operon [41]. Except for NusB, all other proteins required for efficient antitermination, such as NusA, NusG and S10, have been annotated in M. hyopneumoniae.

Only 133 of the 216 residues of YP_287559 align with NusB sequences annotated in other bacterial genomes, including those of other mycoplasma species. The remaining sequence (residues 1–82) resembles a transposase. Because no suitable template was found to model this part of the protein, only its C-terminal region was modeled.

The three-dimensional structures of E. coli NusB [42] (PDB ID: 1EY1) and Aquifex aeolicus NusB [43] (PDB ID: 2JR0), determined by NMR, and the crystal structures of NusB from Thermotoga maritima [44] (PDB ID: 1TZT), M. genitalium [45] (PDB ID: 1Q8C) and M. tuberculosis [46] (PDB ID: 1EYV) were used as templates to model YP_287559.

The C-terminal portion of YP_287559 has an all-α topology. Its structure can be divided into two subdomains: α1–α3 form the N-terminal subdomain and α4–α7 the C-terminal subdomain. In the N-terminal subdomain, YP_287559 contains the conserved, positively charged residues Lys83, Arg84, Arg85 and Arg88, which form an arginine-rich motif that is likely to be the RNA-binding site of this protein. In addition, interactions between nucleic acid bases and RNA-binding proteins often involve stacking with aromatic residues [47]. As in other NusB proteins, the YP_287559 sequence contains the aromatic residues Tyr96, Trp98, Phe101, Tyr114, Phe115, Phe127, Tyr132, Phe134, Trp147, Trp149, Phe168, Phe169, Phe176, Phe186, Phe194, Phe196, Tyr207, Tyr208 and Phe214 (Fig. 2). Those located on the protein surface are likely to participate in recognition, whereas the buried ones probably stabilize the protein fold.
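
A simple way to flag arginine-rich, positively charged clusters such as Lys83–Arg88 is a sliding window that counts Lys and Arg residues. The window size, threshold and example fragment in this Python sketch are assumptions for illustration; only the positions of Lys83, Arg84, Arg85 and Arg88 come from the text.

```python
def basic_clusters(sequence: str, first_residue: int = 1,
                   window: int = 6, min_basic: int = 4):
    """Report windows containing at least `min_basic` Lys/Arg residues.

    Returns (number of the first residue in the window, window, Lys/Arg count).
    """
    hits = []
    seq = sequence.upper()
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        n_basic = sum(aa in "KR" for aa in segment)
        if n_basic >= min_basic:
            hits.append((first_residue + i, segment, n_basic))
    return hits


if __name__ == "__main__":
    # Hypothetical fragment numbered from residue 83: Lys83, Arg84, Arg85 and Arg88
    # follow the text; residues 86, 87, 89 and 90 are placeholders.
    print(basic_clusters("KRRAVRLE", first_residue=83))  # [(83, 'KRRAVR', 4)]
```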

Fig. 2

The 3D structure of YP_287559. Highlighted in green are α-helices and loops; sticks represent aromatic residues likely involved in substrate recognition

Previous studies have shown that NusB is a homodimer in M. tuberculosis (mtuNusB) [46]; a monomer in E. coli (ecoNusB) [42], M. genitalium (mgeNusB) [45] and A. aeolicus (aqNusB) [43]; and in a monomer/dimer equilibrium favoring the monomer in Thermotoga maritima (tmaNusB) [44]. We searched the YP_287559 structure for amino acids important for mtuNusB dimerization. However, two key residues in mtuNusB, an alanine and a phenylalanine, are replaced by serine and tyrosine, respectively, in both M. hyopneumoniae and E. coli, suggesting that YP_287559, like ecoNusB, may be monomeric. In mtuNusB, the dimer interface overlaps the RNA-binding region, which may keep mtuNusB inactive until it is needed for transcriptional regulation [46].

We conclude that YP_287559 is composed of two domains, one similar to a transposase and the other to NusB. Ramachandran plot analysis of the model of the NusB-like region showed that 95.1% of the residues are in favored and allowed regions.

M. hyopneumoniae lives on the porcine mucosal surface, where it acquires amino acids, purines and pyrimidines to compensate for its missing metabolic pathways. Studies have suggested that, in mycoplasmas, genes involved in replication, transcription and translation are constitutively expressed in constant environments, eliminating the need for sophisticated genetic control mechanisms [1]. Moreover, M. hyopneumoniae has only one annotated sigma factor, RpoD [3], a key regulator of bacterial transcription initiation that is responsible for promoter recognition and melting [48]. However, the −35 regions of M. hyopneumoniae promoters show low sequence conservation, suggesting that more than one sigma factor may exist, enabling rapid responses to environmental changes.

In our structure-based analysis, we found similarities between the YP_288024 structure and the crystal structures of Rhodobacter sphaeroides SigE [49] (PDB ID: 2Z2S) and the flagellar sigma factor σ28 of A. aeolicus [50] (PDB ID: 1RP3). These similarities could indicate that mycoplasmas have a regulatory system not yet identified by traditional tools. Although gene expression in mycoplasmas is not well characterized, recent transcriptional studies have shown that M. hyopneumoniae regulates its genes in response to environmental changes [51–54] and that 93% of its intergenic regions are transcribed [55].

Sequence alignments of the sigma-70 family reveal four conserved regions, which are divided into subregions. Regions 2 and 4, which are highly conserved among all members of the family, bind the −10 and −35 promoter elements, respectively [56]. Region 1, conserved only among closely related sigma factors, apparently antagonizes DNA binding. Region 3 is absent from YP_288024 and from extracytoplasmic function (ECF) sigma factors, which allow bacteria to adapt rapidly to environmental changes. In sigma factors that possess it, region 3 interacts with the −10 element of promoters lacking a −35 element.
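
The −10 and −35 elements recognized by sigma-70-family factors are short hexamers separated by a spacer of variable length, so candidate promoters can be located with a mismatch-tolerant scan. This Python sketch uses the E. coli sigma-70 consensus hexamers (TTGACA and TATAAT) and a 15–19 bp spacer purely as an illustration; M. hyopneumoniae promoters may follow a different consensus, and the more permissive −35 threshold only mirrors the low −35 conservation noted above. All thresholds are assumptions.

```python
# E. coli sigma-70 consensus hexamers, used here only as an illustration;
# M. hyopneumoniae promoters may follow a different consensus.
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_promoters(dna: str, max_mm35: int = 2, max_mm10: int = 1,
                   spacers=range(15, 20)):
    """Find -35/-10 hexamer pairs separated by an allowed spacer length.

    Returns tuples (start of -35, spacer length, start of -10,
    mismatches in -35, mismatches in -10), using 0-based positions.
    `spacers` must be increasing, because the inner loop stops at the sequence end.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - len(MINUS35) + 1):
        mm35 = mismatches(dna[i:i + len(MINUS35)], MINUS35)
        if mm35 > max_mm35:
            continue
        for spacer in spacers:
            j = i + len(MINUS35) + spacer
            if j + len(MINUS10) > len(dna):
                break
            mm10 = mismatches(dna[j:j + len(MINUS10)], MINUS10)
            if mm10 <= max_mm10:
                hits.append((i, spacer, j, mm35, mm10))
    return hits


if __name__ == "__main__":
    # Synthetic sequence: perfect -35 box, 17-bp spacer, perfect -10 box.
    demo = "GG" + MINUS35 + "A" * 17 + MINUS10 + "CC"
    print((2, 17, 25, 0, 0) in scan_promoters(demo))  # True
```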

The structural alignment showed that YP_288024 completely lacks α-helices 4 and 5 and part of α-helix 6, which correspond to region 3. All the other α-helices are conserved in YP_288024, suggesting that they interact with the −10 and −35 promoter elements. This functional prediction is based on a model in which 96.2% of the residues lie in the most favored and allowed regions.

High homology to protein with unknown function

The hypothetical protein YP_287971 showed structural homology to YlxR from S. pneumoniae [57] (PDB ID: 1G2R), a small protein of unknown function whose gene probably lies in an operon with the well-studied genes nusA, infB and rbfA. The rbfA product (RbfA) binds to the 30S ribosomal subunit, perhaps promoting subunit maturation [58]. IF2 (the infB product), which is crucial for translation initiation, also functions by binding the 30S subunit [59]. NusA is a highly conserved, essential elongation factor that binds RNA polymerase as part of the transcriptional antitermination complex in many organisms [60]. The YlxR-containing operon has also been studied in E. coli and B. subtilis [61]. In B. subtilis, two additional genes (ylxR and ylxQ) lie between nusA and infB; this arrangement is not found in E. coli or M. hyopneumoniae, in which these genes are adjacent.

The 3D structure of YP_287971 showed a topology similar to that of YlxR of S. pneumoniae. Apart from a short 3₁₀-helix, no regular secondary structure was found in the N-terminal region. The central core of the model comprises three antiparallel β-strands followed by two α-helices, one of which bends at Lys61. The YP_287971 sequence also has highly conserved residues, such as the GRGA(Y/W) motif, which lies in the hydrophobic core together with Val10, Leu20, Leu24, Ile32, Ile47, Phe63 and Leu79. On the protein surface, several conserved, predominantly positively charged residues (Arg6, Arg22, Asp27, Arg43, Lys60, Lys61 and Arg65) form a patch typical of nucleic acid-binding proteins, as shown in Fig. 3. This region is proposed to be related to YlxR function, which may involve an RNA-binding activity found in proteins encoded by the genes of the nusA/infB operon [57].
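
Because the list of conserved surface residues includes one acidic residue (Asp27), a quick tally of approximate side-chain charges shows that the patch remains net positive. This Python sketch assumes formal charges of +1 for Arg and Lys and −1 for Asp and Glu, and treats His and all other residues as neutral.

```python
import re

# Approximate side-chain charges at neutral pH (His and other residues treated as neutral).
CHARGE = {"Arg": 1, "Lys": 1, "Asp": -1, "Glu": -1}


def net_charge(residue_labels):
    """Sum approximate side-chain charges for labels such as 'Arg6' or 'Asp27'."""
    total = 0
    for label in residue_labels:
        match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)", label)
        if match is None:
            raise ValueError(f"unrecognized residue label: {label}")
        total += CHARGE.get(match.group(1), 0)
    return total


if __name__ == "__main__":
    patch = ["Arg6", "Arg22", "Asp27", "Arg43", "Lys60", "Lys61", "Arg65"]
    print(net_charge(patch))  # 5
```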

Fig. 3

Probable nucleotide binding site of YP_287971. The electrostatic potential surface distribution shows an extensive positively charged region (blue) typical of nucleic acid-binding proteins

YP_287971 is probably a member of a highly conserved family of unknown function (DUF448) that is distributed in many organisms, including 14 mycoplasma species with complete genome sequences. In the Ramachandran plot of the YP_287971 model, 93.3% of the residues lie in favored regions and 6.7% in additional allowed regions. Because the model is of high quality and shows significant structural resemblance to YlxR of S. pneumoniae, it suggests that YP_287971 and YlxR share the same, still unknown, function, and it will aid in the design of future experiments to determine that function.

Finally, YP_288034 showed structural similarity to the crystal structure of YrdC from E. coli [62] (PDB ID: 1HRU). Genes of the yrdC family encode proteins that either fold into a single domain, as in 1HRU, or form a domain of larger proteins implicated in regulatory processes. YP_288034 is probably an example of the latter, because its alignment with E. coli YrdC covers only 164 of its 287 residues. Searching for homologs within mycoplasmas, we found that this protein clusters with a Sua5-like translation factor present in six other species. Thus, YP_288034 is likely a two-domain protein containing a YrdC domain, as found in E. coli YrdC and in Sua5 family members such as that of Saccharomyces cerevisiae.
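
The multidomain argument rests on alignment coverage: 164 aligned residues out of 287 leave 123 residues unaccounted for, which could accommodate an additional domain. The Python sketch below lists the segments of a protein that no template alignment covers. The aligned interval in the example is hypothetical, because only the number of aligned residues, not their coordinates, is given here; a `min_len` cutoff could be set to a typical minimum domain size, which would itself be an assumption.

```python
def uncovered_segments(length: int, intervals, min_len: int = 1):
    """Return 1-based inclusive segments of a protein not covered by any interval."""
    covered = [False] * (length + 1)  # index 0 unused; residues are 1-based
    for start, end in intervals:
        for residue in range(max(1, start), min(length, end) + 1):
            covered[residue] = True
    segments = []
    residue = 1
    while residue <= length:
        if covered[residue]:
            residue += 1
            continue
        seg_start = residue
        while residue <= length and not covered[residue]:
            residue += 1
        if residue - seg_start >= min_len:
            segments.append((seg_start, residue - 1))
    return segments


if __name__ == "__main__":
    # Hypothetical placement of the 164 aligned residues at the N-terminus of the
    # 287-residue YP_288034; the real alignment coordinates are not given here.
    gaps = uncovered_segments(287, [(1, 164)])
    print(gaps, [end - start + 1 for start, end in gaps])  # [(165, 287)] [123]
```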

The function of E. coli YrdC is unknown, but its crystal structure suggests a double-stranded RNA-binding capacity [62]. The yeast Sua5 protein, which contains a YrdC-like domain, has been implicated in translation re-initiation [63]. This function is consistent with the large concave surface of Sua5, whose positive electrostatic potential resembles that of the YrdC binding surface and those of other nucleic acid-binding proteins. In the Ramachandran plot of our model, 96.6% of the residues lie in the most favored and additional allowed regions.

Conclusions

One of the key challenges in the post-genomic era is predicting the function of proteins annotated as hypothetical. A combination of bioinformatics tools, focused not only on sequence analysis but also on structural information, allowed us to suggest functions for seven hypothetical proteins in the M. hyopneumoniae genome: NadD, NAPRTase and FAD synthetase, involved in metabolic processes, and NusB and SigE, involved in transcription. For the YrdC- and YlxR-like proteins, no conclusive functions could be assigned; however, the results helped us design rational experimental strategies for future work. Our results suggest that this structure-based approach can substantially improve domain and function prediction, especially for minimal genomes with poorly annotated metabolic pathways. Understanding mycoplasma metabolism requires adequate annotation of its proteome, and our results fill significant gaps in this annotation. Each target protein was approached from a unique perspective, taking into account the genomic location and organization of its open reading frame, its conserved structural features and any biological evidence available in the literature, even when such evidence concerned remote homologs. Although annotating each target required intensive effort, the results are valuable for both structural and biochemical genomics.