Comparative analysis of amino acid sequences from mesophiles and thermophiles with respect to the carbon–nitrogen hydrolase family

A comparative study of amino acid sequences and physicochemical properties was carried out for proteins of the nitrilase/cyanide hydratase family. This family contains nitrilases that break carbon–nitrogen bonds and appear to be involved in the reduction of organic nitrogen compounds and in ammonia production. Its members have distinct substrate specificities and include nitrilases, cyanide hydratases, aliphatic amidases, beta-alanine synthase and a few other proteins of unknown molecular function. In this work, amino acid sequences of mesophilic (15) and thermophilic (archaeal, 15; bacterial, 15) proteins were analyzed in silico for various physical and chemical properties, the observed differences were related to thermostability, and a phylogenetic tree was constructed to examine the evolutionary relationships among them. The three groups of nitrilase/cyanide hydratase sequences differ in number of amino acids, molecular weight, pI, positively charged residues (Arg + Lys), aliphatic index and grand average of hydropathicity (GRAVY). Ala was higher (1.37-fold) in mesophilic bacteria than in thermophilic bacteria, whereas Lys and Phe were significantly higher (1.43- and 1.39-fold, respectively) in thermophilic bacteria. Ala, Cys, Gln, His and Thr were significantly higher (1.41-, 1.6-, 1.77-, 1.44- and 1.29-fold, respectively) in mesophilic bacteria than in thermophilic archaea, in which Glu, Leu and Val were significantly higher (1.22-, 1.19- and 1.26-fold, respectively).


Introduction
On the basis of structure and sequence analysis, a new enzyme family, termed nitrilase/cyanide hydratase, was defined (Brenner 2002); it includes nitrilase, cyanide hydratase and cyanide dihydratase, and also incorporates the less closely related aliphatic amidases (Novo et al. 1995). This family is part of a larger group of related proteins that have been termed CN-hydrolases (Bork and Koonin 1994) or, more recently, the nitrilase superfamily (Pace and Brenner 2001). Plants, animals and fungi perform a wide variety of non-peptide carbon–nitrogen hydrolysis reactions using enzymes of the nitrilase superfamily (Pace and Brenner 2001). These nitrilase and amidase reactions (Ambler et al. 1987; Bork and Koonin 1994; Pace and Brenner 2001), which produce auxin, biotin, β-alanine and other natural products and which deaminate protein and amino acid substrates, all involve attack of a cyano or carbonyl carbon by a conserved cysteine (Stevenson et al. 1990; Pace and Brenner 2001). Many bacteria and archaea, particularly those with an ecological relationship to plants and animals, harbor members of the nitrilase superfamily and use these enzymes for chemically similar nitrile or amide hydrolysis reactions or for condensation of acyl chains to polypeptide amino termini (Pace and Brenner 2001). The nitrilase superfamily consists of thiol enzymes involved in natural product biosynthesis and posttranslational modification in plants, animals, fungi and certain prokaryotes. On the basis of sequence similarity and the presence of additional domains, the superfamily is classified into 13 branches, although substrate specificity is known for only nine of them (Brenner 2002). Only branch 1 has nitrilase or cyanide hydratase activity, and eight of the remaining branches have amidase or amide condensation activities (Brenner 2002).
Genetic and biochemical analysis of the family members and their associated domains helps in predicting the localization, specificity and cell biology of hundreds of uncharacterized proteins (Pace and Brenner 2001).
The proteins show significant similarity at the amino acid sequence and structural levels, but the enzymes differ considerably in catalytic capability. Nitrilases, which catalyze the hydrolysis of nitriles to the corresponding acids, vary widely in substrate specificity. Cyanide dihydratase and cyanide hydratase use inorganic cyanide as their only efficient substrate but produce acid and amide products, respectively. The similarity of these enzymes at the amino acid level, together with their functional differences, provides a platform for studying structure/function relationships in this industrially important group of enzymes (O'Reilly and Turner 2003).
Cyanide- and nitrile-hydrolyzing enzymes have been studied in a wide range of microbial species, plants and animals. The enzymatic conversion of inorganic cyanide or nitriles to the corresponding acid can take place in one step, as with nitrilases and cyanide dihydratases, or in two steps via an amide intermediate, as is the case with nitrile hydratases and cyanide hydratases. Cyanide hydratase, although functionally similar to nitrile hydratase (both form an amide product), shows no sequence relationship to it (Wang and VanEtten 1992; Cluness et al. 1993). The cyanide-hydrating enzymes differ in the product formed or in substrate specificity. Cyanide dihydratase and cyanide hydratase are highly specific for inorganic cyanide and show very little activity with nitriles, whereas nitrilases in general act on a broad range of nitrile substrates. Nitrilases and cyanide dihydratase produce mainly an acid product, while cyanide hydratase produces an amide from inorganic cyanide. Nitrilases are important for their potential application in biotransformation, particularly for the production of fine chemicals for the pharmaceutical industry (Kobayashi and Shimizu 2000; Banerjee et al. 2002), while inorganic cyanide-hydrating enzymes have applications in the bioremediation of cyanide-bearing waste (Dubey and Holmes 1995; O'Reilly and Turner 2003). Nitrilase-related sequences are also found in phylogenetically isolated prokaryotes that appear to have an ecological relationship to plants and animals. The nitrilase superfamily therefore probably emerged before the separation of plants, animals and fungi, radiated into families, and then spread laterally to bacteria and archaea. Some branches of the nitrilase superfamily are found only in prokaryotes; members of these branches may constitute rational antibiotic targets (Pace and Brenner 2001).
A number of physicochemical properties of enzymes, e.g. number of amino acid residues, molecular mass, theoretical pI, amino acid composition, negatively charged residues (Asp + Glu), positively charged residues (Arg + Lys), atomic composition, total number of atoms, extinction coefficient (M⁻¹ cm⁻¹) at 280 nm, instability index, aliphatic index and grand average of hydropathicity (GRAVY), greatly influence their applications and need to be studied carefully. These properties can either be determined experimentally or be deduced by in silico analysis of enzyme amino acid sequences available in databases. The latter approach is attractive for comparing large numbers of proteins, provided their amino acid sequences are available. In the present study, we report physicochemical properties of proteins from the nitrilase/cyanide hydratase family deduced from in silico analysis of their amino acid sequences, and we construct a phylogenetic tree to examine their evolutionary relationships.
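Two of the sequence-derived quantities listed above reduce to simple residue counts. The sketch below is illustrative only: it assumes uppercase one-letter sequences and uses the Trp, Tyr and cystine molar absorptivities at 280 nm (5500, 1490 and 125 M⁻¹ cm⁻¹) that ProtParam documents; the function names are hypothetical and not part of any tool used in this study.

```python
def charged_residues(seq: str) -> tuple[int, int]:
    """Return (negatively charged Asp + Glu, positively charged Arg + Lys) counts."""
    s = seq.upper()
    return s.count("D") + s.count("E"), s.count("R") + s.count("K")


def extinction_coefficient_280(seq: str) -> tuple[int, int]:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1), given as two estimates.

    The first value assumes every pair of Cys forms a cystine; the second
    assumes all Cys are reduced, so cystines contribute nothing.
    """
    s = seq.upper()
    n_trp, n_tyr, n_cys = s.count("W"), s.count("Y"), s.count("C")
    reduced = 5500 * n_trp + 1490 * n_tyr
    return reduced + 125 * (n_cys // 2), reduced


if __name__ == "__main__":
    demo = "MKWYCDECRK"  # hypothetical toy sequence, not from the study
    print(charged_residues(demo))            # (2, 3)
    print(extinction_coefficient_280(demo))  # (7115, 6990)
```

Reporting both extinction values mirrors the convention of stating results with and without cystine formation, since the oxidation state of Cys cannot be inferred from sequence alone.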

Data collection and analysis
Information about proteins of the nitrilase/cyanide hydratase family from various microorganisms was obtained from the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/protein) and from NCBI BioProject (http://www.ncbi.nlm.nih.gov/bioproject/). Amino acid sequences were selected for forty-five microorganisms with experimentally demonstrated substrate specificity; only complete protein sequences that were not fragmented, pseudo, putative or hypothetical were included (Tables 1, 2, 3). The amino acid sequences of the nitrilase/cyanide hydratase family were downloaded from the ExPASy proteomics server. Physicochemical data were generated using SwissProt and the Expert Protein Analysis System (ExPASy), the proteomics server of the Swiss Institute of Bioinformatics (SIB). Sequences in FASTA format were used for the analysis.
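The sequence screening step can be partly automated once records are downloaded in FASTA format. The following is a minimal sketch, assuming that the header line carries the protein description; the keyword screen (`passes_screen`) is a hypothetical heuristic that mirrors the exclusion criteria above, and it cannot check the requirement for experimentally demonstrated substrate specificity, which still needs manual curation.

```python
import re

# Hypothetical keyword screen mirroring the exclusion criteria in the text.
# \b word boundaries keep genus names such as "Pseudomonas" from matching "pseudo".
EXCLUDED = re.compile(
    r"\b(partial|fragment(?:ed)?|pseudo(?:gene)?|putative|hypothetical)\b",
    re.IGNORECASE,
)


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (description, sequence) pairs; sequences are uppercased."""
    records: list[tuple[str, str]] = []
    description, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:].strip(), []
        elif description is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


def passes_screen(description: str) -> bool:
    """True if the header contains none of the excluded keywords."""
    return EXCLUDED.search(description) is None


if __name__ == "__main__":
    fasta = ">seq1 nitrilase [Pseudomonas fluorescens]\nMSKVLL\nAE\n>seq2 hypothetical protein\nMKKL\n"
    kept = [rec for rec in read_fasta(fasta) if passes_screen(rec[0])]
    print(kept)  # [('seq1 nitrilase [Pseudomonas fluorescens]', 'MSKVLLAE')]
```

Matching on whole words rather than substrings matters here, because a naive substring test for "pseudo" would silently discard every Pseudomonas sequence.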

Sequence alignment and dendrogram construction
Clustal X (Larkin et al. 2007) was used for multiple sequence alignment, and Phylip-69 was used to construct a dendrogram by the neighbor-joining (NJ) method. The dendrogram was edited with Dendroscope (Huson et al. 2007).
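To make the neighbor-joining step concrete, the sketch below implements the standard NJ agglomeration (Q-criterion, branch-length estimates and distance update) on a precomputed distance matrix and returns a Newick string. It is not the Phylip implementation; uncorrected p-distance is assumed as the distance measure, whereas model-based protein distances (for example from PHYLIP's protdist) would normally be preferred, and all function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences, skipping gap columns."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no gap-free sites to compare")
    return sum(x != y for x, y in sites) / len(sites)


def neighbor_joining(names: list[str], dist: dict[tuple[str, str], float]) -> str:
    """Build an unrooted NJ tree and return it as a Newick string.

    `dist` maps each unordered pair of names (in either order) to a distance.
    """
    D: dict[str, dict[str, float]] = {x: {} for x in names}
    for x, y in combinations(names, 2):
        D[x][y] = D[y][x] = dist[(x, y)] if (x, y) in dist else dist[(y, x)]
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(D[x][y] for y in nodes if y != x) for x in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to the new internal node u
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"
        D[u] = {}
        for k in nodes:
            if k not in (i, j):
                D[u][k] = D[k][u] = 0.5 * (D[i][k] + D[j][k] - D[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    # The last edge gets its full length on one side; the tree is unrooted.
    return f"({a}:{D[a][b]:.4f},{b}:0.0000);"


if __name__ == "__main__":
    # Textbook five-taxon distance matrix (toy data, not from this study)
    pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
             ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
             ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
    print(neighbor_joining(list("abcde"), pairs))
    # ((c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000,(d:2.0000,e:1.0000):0.0000);
```

Ties in the Q-criterion are broken by input order here, so a different taxon order can give a topologically equivalent tree written differently.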

Deduction of physicochemical parameters using online tools
Various tools on the ExPASy proteomics server (ProtParam, Protein calculator, Compute pI/Mw, ProtScale) were applied to calculate or deduce different physicochemical properties of the nitrilase/cyanide hydratase sequences (Kyte and Doolittle 1982). The molecular weights (kDa) of these sequences were calculated by adding the average isotopic masses of the amino acid residues in the protein and the average isotopic mass of one water molecule. The pI was calculated using the pK values of amino acids (Bjellqvist et al. 1993). The atomic composition of these sequences was derived using the ProtParam tool available at ExPASy. The extinction … The aliphatic index values of the sequences were obtained using the ProtParam (ExPASy) tool (Kyte and Doolittle 1982). The instability index and grand average of hydropathicity (GRAVY) were estimated following the methods of Guruprasad et al. (1990) and Kyte and Doolittle (1982), respectively. The number of amino acids was calculated with the ProtParam tool of the ExPASy proteomics server, with each sequence submitted as a raw sequence in FASTA format.
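The ProtParam-style quantities described above can be approximated directly from sequence. The sketch below is a rough re-implementation under stated assumptions, not the ExPASy code: residue masses are the average values as tabulated by ExPASy (verify before use), GRAVY uses the Kyte and Doolittle (1982) scale, the aliphatic index uses the standard formula with coefficients 2.9 and 3.9, and the pI uses assumed textbook pKa values rather than the position-dependent Bjellqvist et al. (1993) set, so pI values will differ somewhat from those reported by Compute pI/Mw.

```python
# Average residue masses in Da (free amino acid minus water), as tabulated by
# ExPASy; treat these values as an assumption and check them against ProtParam docs.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Kyte-Doolittle (1982) hydropathy scale used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed textbook pKa values, NOT the Bjellqvist et al. (1993) set used by ExPASy.
PKA = {"Nterm": 9.69, "Cterm": 2.34, "K": 10.5, "R": 12.4, "H": 6.0,
       "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def molecular_weight(seq: str) -> float:
    """Average molecular weight in Da: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    s = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in s) / len(s)


def aliphatic_index(seq: str) -> float:
    """X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), with X in mole percent."""
    s = seq.upper()
    pct = {aa: 100.0 * s.count(aa) / len(s) for aa in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge, counting one N- and one C-terminus."""
    s = seq.upper()
    basic = [PKA["Nterm"]] + [PKA[aa] for aa in s if aa in "KRH"]
    acidic = [PKA["Cterm"]] + [PKA[aa] for aa in s if aa in "DECY"]
    positive = sum(1 / (1 + 10 ** (ph - pk)) for pk in basic)
    negative = sum(1 / (1 + 10 ** (pk - ph)) for pk in acidic)
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH of zero net charge (net charge falls as pH rises)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    demo = "MSKVLLAEDRK"  # hypothetical toy sequence, not from the study
    print(molecular_weight(demo) / 1000, "kDa")
    print(gravy(demo), aliphatic_index(demo), isoelectric_point(demo))
```

Bisection works for the pI because the net charge decreases monotonically with pH, so exactly one zero crossing lies between pH 0 and 14.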

Statistical analysis
P values for the various parameters were calculated using the statistical package Assistat (version 7.6 beta, 2012). One-way analysis of variance (ANOVA) was used to test the null hypothesis that the group means are equal, assuming homogeneous data. The F test, which compares the variance between group means with the variance within groups, was used to determine statistical significance. When significant effects were detected, Tukey's test was applied for all pairwise comparisons of mean responses; it covers two or more groups and assumes homogeneous data, ideally with an equal number of samples per group.
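The calculations behind the F test and Tukey's pairwise comparisons can be written out in a few lines. This is a minimal sketch, not the Assistat procedure: it computes the one-way ANOVA F statistic and the Tukey-Kramer studentized range statistic q for each pair of groups, and leaves the conversion to p values to a statistics library; the group labels and values in the example are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def one_way_anova(groups: dict[str, list[float]]) -> tuple[float, int, int, float]:
    """Return (F, df_between, df_within, MS_within) for a one-way ANOVA."""
    means = {name: fmean(vals) for name, vals in groups.items()}
    values = [x for vals in groups.values() for x in vals]
    grand = fmean(values)
    k, n = len(groups), len(values)
    # Between-group and within-group sums of squares
    ss_between = sum(len(vals) * (means[name] - grand) ** 2 for name, vals in groups.items())
    ss_within = sum((x - means[name]) ** 2 for name, vals in groups.items() for x in vals)
    df_between, df_within = k - 1, n - k
    ms_within = ss_within / df_within
    return (ss_between / df_between) / ms_within, df_between, df_within, ms_within


def tukey_q(groups: dict[str, list[float]], ms_within: float) -> dict[tuple[str, str], float]:
    """Studentized range statistic q for every pair of groups (Tukey-Kramer form)."""
    out = {}
    for a, b in combinations(groups, 2):
        se = (ms_within / 2 * (1 / len(groups[a]) + 1 / len(groups[b]))) ** 0.5
        out[(a, b)] = abs(fmean(groups[a]) - fmean(groups[b])) / se
    return out


if __name__ == "__main__":
    # Hypothetical values for one parameter in three groups (not the study's data)
    data = {"meso_bact": [1.0, 2.0, 3.0], "thermo_bact": [4.0, 5.0, 6.0], "thermo_arch": [7.0, 8.0, 9.0]}
    F, dfb, dfw, msw = one_way_anova(data)
    print(F, dfb, dfw)  # 27.0 2 6
    print(tukey_q(data, msw))
```

If SciPy is available, p values would follow from `scipy.stats.f.sf(F, dfb, dfw)` and `scipy.stats.studentized_range.sf(q, k, dfw)`, with k the number of groups.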

Phylogenetic tree construction
To visualize the evolutionary relationships between bacterial and archaeal proteins of the nitrilase/cyanide hydratase family, a total of forty-three protein sequences from bacterial and archaeal source organisms were subjected to phylogenetic tree construction, which revealed three major clusters (Fig. 1): one containing only bacteria, one containing both bacterial and archaeal species, and one dominated by archaeal strains. In the second cluster, nearly all bacterial and archaeal species grouped together, which indicates functional as well as structural similarity among these genera at the protein sequence level. Within this cluster, the bacterial species showed some structural similarity to the archaeal species, but the bootstrap values indicated a clearly distinct relationship between them. The archaea are presently recognized as one of the two main domains of prokaryotes (Woese 1987). The majority of genes that distinguish archaea from eubacteria are involved in information transfer processes such as DNA replication, transcription and translation (Olsen et al. 1994; Rivera et al. 1999), which are of fundamental importance. It has been assumed that these differences arose in the universal ancestor before the separation of the two domains. Woese (1998) and Kandler (1998) have suggested that these two domains, as well as eukaryotic cells, evolved from a pre-cellular community containing different types of genes, through a process that led to the fixation of specific subsets of genes in the ancestors of each domain. These pre-cellular entities are postulated to have had no stable genealogy or chromosome and to have lacked a typical cell membrane, thus allowing unrestricted lateral gene transfer (Woese 1998; Kandler 1998).
According to these proposals, all differences between archaea and bacteria originated at a pre-cellular stage by non-Darwinian means, but the proposals offer no rationale for how or why the observed differences between the two groups arose or evolved. Cavalier-Smith (2002) has suggested that archaea may have evolved from Gram-positive bacteria as an adaptation to hyperthermophily or hyperacidity, but this does not explain how the various differences in the information transfer genes that distinguish archaea from bacteria arose.
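The bootstrap values used to judge the bacterial–archaeal relationships in the tree come from repeatedly resampling alignment columns and rebuilding the tree. The sketch below shows only the resampling and tallying logic of the nonparametric bootstrap; `tree_clades` is a hypothetical user-supplied function (any tree-building routine that reports its clades as sets of sequence names), and the replicate count and seed are arbitrary assumptions rather than the settings used for Fig. 1.

```python
import random
from collections import Counter
from typing import Callable


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement (nonparametric bootstrap)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    (length,) = lengths
    columns = [rng.randrange(length) for _ in range(length)]
    # The same resampled columns are applied to every sequence
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def clade_support(
    alignment: dict[str, str],
    tree_clades: Callable[[dict[str, str]], set[frozenset[str]]],
    replicates: int = 100,
    seed: int = 1,
) -> dict[frozenset[str], float]:
    """Fraction of replicate trees that contain each clade.

    `tree_clades` is a hypothetical function that builds a tree from an
    alignment and returns its clades as frozensets of sequence names.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(replicates):
        counts.update(tree_clades(bootstrap_replicate(alignment, rng)))
    return {clade: n / replicates for clade, n in counts.items()}
```

Because whole columns are resampled, each replicate preserves the residue pairing across sequences at every site, which is what makes the support value a measure of how consistently the data favor a grouping.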

Physiochemical parameter analysis
After establishing the evolutionary relationships among these sequences, we compared the physicochemical properties of forty-five amino acid sequences of the nitrilase/cyanide hydratase family from mesophilic bacteria (15), thermophilic bacteria (15) and thermophilic archaea (15) (Tables 4, 5, 6). In the comparison of mesophilic and thermophilic bacteria, the total number of amino acid residues, molecular weight, theoretical pI and negatively charged residues (Asp + Glu) differed substantially: mesophilic bacterial sequences were longer, ranging from 240 to 345 amino acids, whereas thermophilic bacterial sequences ranged from 229 to 291. The molecular weight and the number of negatively charged residues (Asp + Glu) of the mesophilic bacterial nitrilase/cyanide hydratase sequences were higher than those of thermophilic bacteria (1.08- and 1.12-fold, respectively), although not significantly. Theoretical pI ranged from 5.06 to 8.75 for the mesophilic bacterial nitrilases/cyanide hydratases and from 5.44 to 9.68 for the thermophilic bacterial sequences. The average pI of thermophilic bacterial nitrilases/cyanide hydratases was significantly higher (1.2-fold) than that of the mesophilic bacterial enzymes. The aliphatic index has the significant … (Russell et al. 1997; Jaenicke and Bohm 1998; Ladenstein and Antranikian 1998). Electrostatic interactions (Goldman 1995; Hennig et al. 1995; Xiao and Honig 1999), increased compactness, shortening of loops, increased hydrophobicity and decreased flexibility of α-helical segments and subunit interfaces (Kelly et al. 1993; Russell et al. 1997) have been proposed as important factors conferring thermal stability.
All these studies suggest that in thermophilic proteins, stability is achieved through cooperative optimization of several subtle factors rather than through any one predominant interaction. Significant differences were found between the nitrilase/cyanide hydratase sequences of mesophilic bacteria and thermophilic archaea in physicochemical parameters such as number of amino acid residues, molecular weight, positively charged residues (Arg + Lys), aliphatic index and GRAVY. The mesophilic bacterial nitrilase/cyanide hydratase sequences have significantly more amino acid residues than those of thermophilic archaea (1.09-fold). The molecular weight of mesophilic bacterial nitrilases/cyanide hydratases was higher than that of the thermophilic archaeal enzymes (1.06-fold), although not significantly. The GRAVY of nitrilase/cyanide hydratase sequences from mesophilic bacteria was significantly higher (1.64-fold) than that of thermophilic archaea. Positively charged residues (Arg + Lys) and aliphatic index values were higher (1.13- and 1.16-fold, respectively) in thermophilic archaeal nitrilases/cyanide hydratases than in mesophilic bacteria. A statistical analysis has shown that the aliphatic index, defined as the relative volume of a protein occupied by aliphatic side chains (alanine, valine, isoleucine and leucine), is significantly higher in proteins of thermophilic bacteria than in ordinary proteins, and the index may be regarded as a positive factor for the increased thermostability of globular proteins (Atsushi 1980). Owing to the diversity of the 20 amino acids and the enormous number of combinations they offer, proteins differ widely in physicochemical properties as well as in substrate specificity (Sharma et al. 2009).
The results of this study also confirm that the number of amino acids and their percent composition in sequences of the nitrilase/cyanide hydratase family significantly affect substrate specificity. Several investigators have focused on the molecular basis of protein thermostability. A number of physicochemical properties have been credited with the greater stability of thermophilic proteins (Jaenicke and Bohm 1998; Ladenstein and Antranikian 1998). These families span an entire spectrum, containing proteins from moderately thermophilic to hyperthermophilic organisms as well as their mesophilic homologs. Not all differences observed between thermophilic and mesophilic proteins are due to thermostability. Results of the amino acid analysis of the three groups of sequences from the nitrilase/cyanide hydratase family are shown in Tables 7, 8 and 9. These enzymes contain all 20 common amino acids. Comparison of the amino acid composition of mesophilic and thermophilic bacterial nitrilases/cyanide hydratases showed that Ala, one of the simplest amino acids, was the predominant residue in mesophilic bacteria, and Lys and Phe in thermophilic bacteria. Gln (1.4-fold) was significantly higher in thermophilic bacterial nitrilases/cyanide hydratases, and Val (1.29-fold) was higher in thermophilic archaeal nitrilases/cyanide hydratases. In the comparison of mesophilic bacteria and thermophilic archaea, Cys, which is considered an important parameter in calculating the extinction coefficient of proteins (Sharma et al. 2009), was 1.6-fold higher in mesophilic bacteria.
Ala, Gln, His and Thr were significantly higher (1.41-, 1.77-, 1.44- and 1.29-fold, respectively) in mesophilic bacteria, while Glu, Leu and Val were higher (1.22-, 1.19- and 1.26-fold, respectively) in thermophilic archaea.
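The fold differences quoted above compare group-average amino acid compositions. The sketch below assumes that composition is expressed as percent per sequence, averaged over the sequences of a group with equal weight, and that fold difference is the ratio of the two group means; these are assumptions about the calculation, and the function names and toy sequences are hypothetical.

```python
from statistics import fmean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> dict[str, float]:
    """Percent composition of the 20 standard amino acids in one sequence."""
    s = seq.upper()
    return {aa: 100.0 * s.count(aa) / len(s) for aa in AMINO_ACIDS}


def mean_composition(seqs: list[str]) -> dict[str, float]:
    """Average percent composition over a group (each sequence weighted equally)."""
    comps = [composition(s) for s in seqs]
    return {aa: fmean(c[aa] for c in comps) for aa in AMINO_ACIDS}


def fold_ratios(group_a: list[str], group_b: list[str]) -> dict[str, float]:
    """Ratio of mean percent composition, group A over group B.

    A ratio of 1.37 reads as "1.37-fold higher in A"; a ratio below 1 means the
    residue is higher in B, by 1/ratio-fold. Residues absent from B are skipped.
    """
    ma, mb = mean_composition(group_a), mean_composition(group_b)
    return {aa: ma[aa] / mb[aa] for aa in AMINO_ACIDS if mb[aa] > 0}


if __name__ == "__main__":
    meso = ["AAAC"]               # hypothetical toy sequences
    thermo = ["AACC", "AACC"]
    print(fold_ratios(meso, thermo))  # {'A': 1.5, 'C': 0.5}
```

Averaging per-sequence percentages, rather than pooling all residues, keeps long sequences from dominating a group's composition.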
Analysis of the amino acid composition of helices in thermophilic proteins appears to indicate that the number of Gly residues is increased compared with mesophilic proteins (Warren and Petsko 1995). Some workers have found that a decreased Gln content may minimize deamidation and thereby increase the thermostability of proteins. It has also been suggested that Lys → Arg and Ser → Ala are the most frequent substitutions from mesophilic to thermophilic proteins (Arias and Argos 1989). Ala is the best helix-forming residue (Kumar and Bansal 1998; Best et al. 2012); however, the reason for the decreased Ala content in thermophilic proteins is still unknown. The most significant observation in the present analysis was that the number of Glu and Lys residues was increased in thermophiles compared with mesophiles. The juxtaposition of these residues is perhaps important in imparting thermal stability (Parthasarathy and Murthy 2000). These residues may be appropriate candidates for site-specific mutations leading to enhanced stability.
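Directional substitution frequencies such as Lys → Arg can be tallied from aligned homolog pairs. The sketch below assumes that each mesophilic sequence has already been paired with a thermophilic homolog and aligned; neither the pairing scheme nor the function name comes from this study, and the example pairs are hypothetical.

```python
from collections import Counter


def directional_substitutions(pairs: list[tuple[str, str]]) -> Counter:
    """Tally (mesophilic residue, thermophilic residue) substitutions.

    `pairs` holds aligned (mesophile, thermophile) sequence pairs of equal length;
    identical columns and columns with a gap in either sequence are skipped.
    """
    counts: Counter = Counter()
    for meso, thermo in pairs:
        if len(meso) != len(thermo):
            raise ValueError("aligned sequences must have equal length")
        counts.update((m, t) for m, t in zip(meso, thermo)
                      if m != t and "-" not in (m, t))
    return counts


if __name__ == "__main__":
    aligned = [("KSAG-", "RAAGK"), ("KS", "RA")]  # hypothetical aligned pairs
    print(directional_substitutions(aligned).most_common())
```

Keeping the tuple order (mesophile first) preserves the direction of change, so Lys → Arg and Arg → Lys are counted separately.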

Conclusion
A number of physicochemical properties of amino acid sequences of the nitrilase/cyanide hydratase family from mesophiles and thermophiles have been deduced. The sequences differ mainly in the total number of amino acids, molecular weight, pI, negatively and positively charged residues, aliphatic index, GRAVY and amino acid composition. The higher content of Ala, Gln, His and Thr in mesophilic organisms, and of Glu, Leu and Val in thermophilic organisms, clearly distinguishes the mesophilic and thermophilic sequences, respectively. Because discriminating thermophilic proteins from their mesophilic counterparts is a challenging task, the results of the present work should be useful for predicting and selecting nitrilases/cyanide hydratases for further basic and applied research, and for designing stable proteins.