Structure-based function analysis of putative conserved proteins with isomerase activity from Haemophilus influenzae

Haemophilus influenzae, a Gram-negative bacterium and a member of the family Pasteurellaceae, causes chronic bronchitis, bacteremia, meningitis and other infections. H. influenzae was the first free-living organism whose genome was completely sequenced and annotated. Here, we have extensively analyzed the genome of H. influenzae using available protein structure and function analysis tools. The objective of this analysis is to assign precise functions to hypothetical proteins (HPs) whose functions have not been determined so far. Predicting the functions of these proteins helps to clarify the mechanisms of pathogenesis and the biochemical pathways that are important for selecting novel therapeutic targets. After an extensive analysis of the H. influenzae genome, we found 13 HPs showing a high level of sequence and structural similarity to isomerase enzymes. We then modeled and analyzed the structures of these HPs to determine their precise functions. We found that these HPs are alanine racemase, lysine 2,3-aminomutase, a topoisomerase DNA-binding C4 zinc finger protein, pseudouridine synthases B, C and E (RluB, RluC and RluE), hydroxypyruvate isomerase, nucleoside-diphosphate-sugar epimerase, amidophosphoribosyltransferase, aldose 1-epimerase, a tautomerase/MIF, a xylose isomerase-like protein with a TIM barrel domain, and a protein with sedoheptulose 7-phosphate isomerase-like activity, indicating their corresponding functions in H. influenzae. This work provides a better understanding of the role of HPs with isomerase activities in the survival and pathogenesis of H. influenzae.

Electronic supplementary material: The online version of this article (doi:10.1007/s13205-014-0274-1) contains supplementary material, which is available to authorized users.


Introduction
Haemophilus influenzae, a member of the family Pasteurellaceae, is a non-motile Gram-negative bacterium (Kuhnert, 2008). It is an obligate human parasite that causes meningitis, sinusitis, epiglottitis, chronic bronchitis and community-acquired pneumonia (Apisarnthanarak and Mundy 2005; Eldika and Sethi 2006). Sequencing of the H. influenzae genome (Fleischmann et al. 1995) revealed 1,740 protein-coding genes, 2 transfer RNA genes and 18 other RNA genes on a single 1.83 Mb circular chromosome. H. influenzae requires β-nicotinamide adenine dinucleotide and heme-related compounds for growth (Markel et al. 2007; Morton et al. 2004a). It therefore uses numerous mechanisms to obtain heme (Stojiljkovic and Perkins-Balding 2002), involving heme acquisition proteins such as the Hup protein (Morton et al. 2004b) and the HbpA lipoprotein (Morton et al. 2005). The periplasmic iron-binding protein FbpA (ferric-ion-binding protein A) also plays an essential role in the acquisition of iron from transferrin in H. influenzae (Khun et al. 1998; Kirby et al. 1997), showing that iron is important for its survival and virulence (Morton et al. 2004a). Furthermore, the mechanisms of heme acquisition indicate that iron homeostasis is strictly regulated in H. influenzae.
H. influenzae strains show high levels of antibiotic resistance, including multidrug resistance to ampicillin and chloramphenicol, which complicates the treatment of meningitis and chronic pneumonia (Campos 2001; Pfeifer et al. 2013; Saha et al. 2008). Antibiotic resistance in H. influenzae has been strongly associated with the presence of large conjugative plasmids (Leaves et al. 2000). Resistance arises through various mechanisms that can compromise the empirical treatment of infections, and its prevalence is increasing for antibiotics such as aminopenicillins, macrolides, tetracyclines and fluoroquinolones, which is a major clinical problem (Jorgensen, 1991; Kostyanev and Sechanova, 2012; Tristram et al. 2007). An extensive genome analysis of the organism may help to identify novel drug targets against multidrug-resistant strains.
Analysis of 102 sequenced bacterial genomes showed that 45,110 proteins fall into 7,853 orthologous groups of unknown function (Doerks et al. 2004). These proteins are termed "conserved hypothetical proteins" (HPs), i.e., proteins that have not been functionally characterized at the biochemical and physiological level (Galperin and Koonin 2004). HPs are presumed to be products of pseudogenes in the majority of organisms and make up a large fraction of their proteomes (Desler et al. 2012; Galperin 2001). Species-specific phenotypic properties, such as pathogenicity, can be determined by analyzing the unique sequences of HPs, and these determinants are considered potential drug targets in pathogenic strains (Tsoka and Ouzounis 2000). The value of functionally characterizing HPs is illustrated by the recent annotation of formerly uncharacterized tRNA modification enzymes (Alexandrov et al. 2002; Jackman et al. 2003; Soma et al. 2003), enzymes of the deoxyxylulose pathway (Eisenreich et al. 2001) and proteins that play a central role in cyclic diguanylate bacterial signaling (Galperin 2004; Jenal 2004). Because we work in the area of structure-based rational drug design, we are searching for novel therapeutic targets in pathogenic organisms (Hassan et al. 2007a, b; Thakur et al. 2013a). We have previously annotated the functions of HPs from pathogenic organisms at both the sequence and structure levels (Kumar et al. 2014a, b; Shahbaaz et al. 2014; Sinha et al. 2014).
Biological function cannot be predicted reliably from sequence similarity alone (Illergard et al. 2009). Structure-based function prediction is often considered a better tool than sequence-based methods, because in most cases evolution retains a conserved fold despite very low sequence similarity (Hassan and Ahmad 2011; Hassan et al. 2008, 2013; Illergard et al. 2009). Furthermore, binding motifs and catalytic sites, which are critical for protein function, can readily be predicted from an available protein structure (Shapiro and Harris 2000; Singh et al. 2014). Moreover, structure-based rational drug design relies entirely on the structural features of the target protein (Capdeville et al. 2002; Klebe 2000; Tasleem et al. 2014; Thakur et al. 2013b). Structure analysis of HPs is therefore central both to function prediction and to the development of better therapeutic interventions against diseases caused by the pathogen.
We have previously predicted lyases from the same organism (Shahbaaz et al. 2014). Here, through extensive sequence analysis of H. influenzae, we identified 13 HPs that possess isomerase-like activity; they are listed in Table 1. Isomerases are directly associated with virulence (Reffuveille et al. 2012; Ren et al. 2005) because they help to create a favorable local environment for pathogen growth in the host (Bjornson 1984). For example, the prolyl isomerase Ess1 plays an important role in the pathogenesis of the fungus Cryptococcus neoformans (Ren et al. 2005), isomerases contribute to resistance against β-lactam antibiotics (Reffuveille et al. 2012), and phosphomannose isomerase is involved in Leishmania pathogenesis. Together, these findings suggest that sequence and structure analysis of isomerases will help to clarify the precise functions of these enzymes and may reveal promising new targets for structure-based rational drug design.

Sequence retrieval
The H. influenzae genome encodes 1,657 proteins (http://www.ncbi.nlm.nih.gov/genome/?term=haemophilus?influenzae). We previously characterized 429 of these proteins as HPs and retrieved their FASTA sequences from UniProt (http://www.uniprot.org/) using their "Gene ID" (Shahbaaz et al. 2013). After sequence analysis, we classified all 429 HPs into various classes using information from publicly available databases such as PDB and Pfam (Shahbaaz et al. 2013). Here, we selected the HPs with isomerase activity for further structure analysis. All tools used in this study are listed in Table S1.
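The retrieval step above amounts to reading FASTA records and keeping those annotated as hypothetical. The Python sketch below assumes the sequences have been saved as FASTA text and that HPs can be recognized by keywords in the header line; the keyword list (`HP_KEYWORDS`) and the toy records are illustrative assumptions, not the classification criteria of Shahbaaz et al. (2013).

```python
from typing import List, Tuple

# Hypothetical keywords; real HP classification also used PDB and Pfam evidence.
HP_KEYWORDS = ("hypothetical", "uncharacterized")

def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; the header excludes '>'."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def hypothetical_entries(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep records whose header contains any HP keyword (case-insensitive)."""
    return [(h, s) for h, s in records if any(k in h.lower() for k in HP_KEYWORDS)]
```

A keyword filter of this kind is only a first pass; it cannot replace the database-driven classification described above.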

Sequence analysis
We used several bioinformatics tools, namely PSORTb (Yu et al. 2010b), PSLpred (Bhasin et al. 2005) and CELLO (Yu et al. 2006), to predict the subcellular localization of HPs. We also checked for the presence of a signal peptide using SignalP 4.1 (Emanuelsson et al. 2007) and used SecretomeP (Bendtsen et al. 2005) to identify proteins secreted by non-classical pathways. The TMHMM (Krogh et al. 2001) and HMMTOP (Tusnady and Simon 2001) servers were used to identify membrane proteins. Conserved sequence patterns in protein families were used to predict the functions of HPs (Chen and Jeong 2000). BLASTp (Altschul et al. 1990) and HHpred (Soding et al. 2005) were used for remote homology detection against protein databases such as PDB (Bernstein et al. 1978), SCOP (Hubbard et al. 1999) and CATH (Sillitoe et al. 2013). We further performed domain analysis for more precise function prediction of HPs [47], using Pfam (Punta et al. 2011), PANTHER (Mi et al. 2005), SMART (Letunic et al. 2012), SUPERFAMILY (Gough et al. 2001), CATH (Sillitoe et al. 2012), CDART (Geer et al. 2002), SYSTERS (Meinel et al. 2005), ProtoNet (Rappoport et al. 2011) and SVMProt (Cai et al. 2003). In addition to direct sequence similarity, we also used domain architecture and profile-based methods such as CDART and SMART for similarity searches. Signature protein sequences were annotated using MOTIF (Kanehisa 1997) and InterProScan (Quevillon et al. 2005), and motifs were identified with the MEME suite (Bailey et al. 2009). Finally, because virulence factors are considered potential drug and vaccine targets (Baron and Coombes 2007), we predicted virulence using VICMpred (Saha and Raghava 2006) and VirulentPred (Garg and Gupta 2008).
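Because PSORTb, PSLpred and CELLO can disagree (as they do for HP P46494 below), their outputs have to be reconciled somehow. One simple option, sketched here as an assumption rather than the rule used in this study, is a majority vote that ignores "unknown" calls and reports no consensus when fewer than `min_votes` tools agree. The tool names serve only as dictionary keys; the function does not query the servers.

```python
from collections import Counter
from typing import Dict, Optional

def consensus_localization(predictions: Dict[str, str], min_votes: int = 2) -> Optional[str]:
    """Majority-vote compartment across tools, ignoring 'unknown' calls.

    Returns None when no compartment is supported by at least `min_votes` tools.
    """
    calls = [loc.strip().lower() for loc in predictions.values()
             if loc.strip().lower() != "unknown"]
    if not calls:
        return None
    compartment, votes = Counter(calls).most_common(1)[0]
    return compartment if votes >= min_votes else None
```

With three tools and `min_votes=2`, a two-way tie cannot occur, so the vote is always unambiguous.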
Protein-protein interactions also provide information about protein function. We therefore predicted the interaction partners of HPs using STRING (version 9.05; Szklarczyk et al. 2011a, b).
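STRING reports a combined score for each predicted partner by integrating several evidence channels. As a hedged illustration, the sketch below combines channel scores with a noisy-OR rule, 1 - Π(1 - s_i), and keeps partners at or above 0.7, the cutoff STRING labels as high confidence. This is a simplification: STRING's own scoring also corrects each channel for the prior probability of a random association, which is omitted here, and the partner names and scores in the test are hypothetical.

```python
from math import prod
from typing import Dict, Iterable, List

HIGH_CONFIDENCE = 0.7  # STRING's "high confidence" score cutoff

def combine_evidence(channel_scores: Iterable[float]) -> float:
    """Noisy-OR combination of evidence scores in [0, 1] (no prior correction)."""
    scores = list(channel_scores)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("channel scores must lie in [0, 1]")
    # Probability that at least one channel is correct, assuming independence.
    return 1.0 - prod(1.0 - s for s in scores)

def confident_partners(partner_scores: Dict[str, float],
                       cutoff: float = HIGH_CONFIDENCE) -> List[str]:
    """Return partners whose combined score meets the cutoff, best first."""
    kept = [(name, score) for name, score in partner_scores.items() if score >= cutoff]
    return [name for name, _ in sorted(kept, key=lambda item: item[1], reverse=True)]
```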

Structure prediction
To model the three-dimensional structures of HPs, we used two classes of structure prediction methods (Baker and Sali, 2001): threading/comparative modeling and de novo (ab initio) methods. The MODELLER module (Eswar et al. 2006) of Discovery Studio 3.5 (Accelrys 2013), I-TASSER (Roy et al. 2010) and the ROBETTA server (Kim et al. 2004) were used to predict reliable structures of HPs. We used homology modeling (Marti-Renom et al. 2000) for HPs with >30 % sequence identity between the target and template sequences. We first identified potential templates in the Protein Data Bank (PDB) using sequence similarity search methods such as PSI-BLAST (Altschul et al. 1997), as implemented in Discovery Studio 3.5 (Accelrys 2013), and fold recognition methods such as HHpred (Soding et al. 2005). The template and query sequences were then aligned and used to model the HP structures in MODELLER (Eswar et al. 2006).
For sequence identity <30 %, we used ab initio modeling protocols. The I-TASSER server (Roy et al. 2010) combines threading with ab initio structural assembly: it first generates three-dimensional (3D) atomic models from multiple threading alignments and iterative structural assembly simulations. It then infers HP function by structurally matching the 3D models with known proteins, and its output includes full-length tertiary and secondary structure predictions, ligand-binding sites and Enzyme Commission (EC) numbers (Roy et al. 2010).
Similarly, the ROBETTA server (Kim et al. 2004) uses de novo (ab initio) methods to predict the structures of proteins with no structural analogs in the PDB. When a parent structure is available, it first aligns the query sequence onto that structure with the K*Sync alignment method and then models the variable regions by letting them explore conformational space with fragments, in a fashion similar to the de novo protocol, in the context of the template. When no structural homolog is available, the server models the domains with the Rosetta de novo protocol (Misura et al. 2006), which lets the full length of each domain explore conformational space through fragment insertion, generating a large set of decoys from which the final models are selected.
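The choice between the two modeling routes hinges on the 30 % identity cutoff stated above. A minimal Python sketch follows. It assumes identity is counted over alignment columns where neither sequence has a gap (other conventions divide by the shorter sequence or the full alignment length and give different numbers), and it sends the boundary case of exactly 30 %, which the text does not cover, to the ab initio route. Both function names are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment ('-' = gap)."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [(q, t) for q, t in zip(aligned_query, aligned_template)
               if q != "-" and t != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for q, t in columns if q.upper() == t.upper())
    return 100.0 * identical / len(columns)

def modeling_route(identity: float, cutoff: float = 30.0) -> str:
    """'homology' (MODELLER) above the cutoff, otherwise 'ab_initio' (I-TASSER/ROBETTA)."""
    return "homology" if identity > cutoff else "ab_initio"
```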
The resulting models were optimized, and energy minimization was then carried out using the CHARMM22 force field in Accelrys Discovery Studio 3.5 and the steepest descent algorithm with the GROMOS force field in DeepView (Kaplan and Littlejohn 2001). We further refined the predicted models with the side-chain refinement protocol of Discovery Studio 3.5, which uses force fields such as CHARMM (Brooks et al. 2009), and with the backbone-dependent rotamer library of SCWRL4 (Krivov et al. 2009), which predicts side-chain positions. The loop refinement protocol of MODELLER (Eswar et al. 2006) was also used to improve the quality of the predicted models.
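Steepest descent, used above for the DeepView minimization, moves every coordinate a small step against the energy gradient until the gradient vanishes. The Python sketch below shows the idea on a toy energy, a single harmonic bond E = k(d - r0)^2 between two atoms. The force constant, step size and tolerance are arbitrary assumptions, and the toy bond stands in for, but does not reproduce, the GROMOS or CHARMM force fields.

```python
import math
from typing import Callable, List

Vector = List[float]

def steepest_descent(x0: Vector, grad: Callable[[Vector], Vector],
                     step: float = 0.01, max_iter: int = 10000,
                     tol: float = 1e-6) -> Vector:
    """Move along -gradient with a fixed step until the gradient norm < tol."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break  # converged: (numerically) at a stationary point
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def bond_gradient(x: Vector, k: float = 1.0, r0: float = 1.0) -> Vector:
    """Gradient of E = k * (d - r0)**2 for two atoms packed as [x1, y1, z1, x2, y2, z2]."""
    diff = [x[i] - x[i + 3] for i in range(3)]
    d = math.sqrt(sum(c * c for c in diff))
    # dE/dd = 2k(d - r0); dividing by d turns diff into the unit bond vector.
    coef = 2.0 * k * (d - r0) / d
    g_atom1 = [coef * c for c in diff]
    return g_atom1 + [-c for c in g_atom1]  # equal and opposite on atom 2
```

Starting from a stretched bond of 2.0 at `[0, 0, 0, 2, 0, 0]`, the two atoms move toward each other until the bond length relaxes to `r0`. Fixed-step descent is simple but slow near the minimum, which is why production minimizers often switch to conjugate-gradient methods after an initial steepest-descent phase.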

Structure validation
The quality of the predicted HP models was analyzed on the SAVES server (Structural Analysis and Verification Server), using its PROCHECK (Laskowski et al. 1996), WHAT_CHECK (Hooft et al. 1996; Vriend 1990), ERRAT (Colovos and Yeates 1993), VERIFY_3D (Eisenberg et al. 1997; Luthy et al. 1992) and PROVE (Pontius et al. 1996) services. PROCHECK validates the stereochemical quality of a protein structure by analyzing its overall and residue-by-residue geometry. WHAT_CHECK also analyzes the stereochemical parameters of the residues in HPs. ERRAT, from UCLA (University of California, Los Angeles), verifies HP structures by statistical analysis of the patterns of non-bonded atomic interactions. VERIFY_3D assesses the quality of HP structures by determining the compatibility of each predicted 3D model with its own primary structure. The molecular graphics system PyMOL (DeLano 2002) was used to visualize protein structures and to calculate the root mean square deviation (RMSD) between each target HP and its template.
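For reference, the RMSD between N matched atoms (usually Cα atoms) of two superposed structures is sqrt((1/N) Σ |a_i - b_i|^2), so it has units of Å, not Å². The sketch below assumes the coordinates are already superposed and paired one to one. PyMOL's `align` additionally performs the superposition and, by default, discards outlier atoms over several refinement cycles, so its reported values can differ from a plain calculation over all pairs.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """RMSD (in the input units, e.g. Å) between paired, pre-superposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```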

Structure analysis
Structural similarity is a more reliable indicator of relationship than sequence similarity (Taylor and Orengo 1989), since the structures of homologous proteins are more conserved than their sequences (Chothia and Lesk 1986). We used a variety of protein structure analysis tools to predict the functions of HPs. Catalytic and ligand-binding residues were predicted with binding-site prediction servers associated with CASP (Critical Assessment of protein Structure Prediction), namely firestar (Lopez et al. 2011), COACH (Yang et al. 2013), COFACTOR (Roy et al. 2012), 3DLigandSite (Wass et al. 2010), TM-SITE (Yang et al. 2013) and S-SITE (Yang et al. 2013). We also used information available in the literature about the modeling templates to identify catalytic residues in HPs. Active pockets in the predicted structures were identified using the POCASA (Yu et al. 2010) and Pocket-Finder (Laurie and Jackson, 2005) servers. The PPM server (Lomize et al. 2012) was used to calculate the spatial positions of HPs in membranes. The ProFunc web server (Laskowski et al. 2005) was used for structure-based function annotation and for predicting structural motifs associated with catalytic functions. Function predictions were complemented by the DALI server, which compares the target structure with known structures deposited in the PDB. Secondary structure elements were assigned from the atomic coordinates of the HP structures using the STRIDE web server (Heinig and Frishman 2004).

HP P44506
HP P44506 is localized in the cytoplasm and lacks a signal peptide and transmembrane helices (Table S2). Sequence analysis indicates that this HP has alanine racemase activity (Tables S3 and S4). The MEME suite identified three sequence motifs in the HP, namely 151′-ENLPHLCLRGLM, 209′-PSAIKCGSTMV and 76′-EWHFIG (Table 2). Virulence factor analysis classifies HP P44506 as a virulent protein according to VirulentPred and as a metabolism molecule according to VICMpred (Table S3). The functional protein association network predicted by STRING (Szklarczyk et al. 2011a, b) indicates that HP P44506 interacts closely with a Holliday junction resolvase-like protein, pyrroline-5-carboxylate reductase, coproporphyrinogen III oxidase, cell division protein FtsZ, a putative deoxyribonucleotide triphosphate pyrophosphatase, homoserine O-acetyltransferase, a phosphatase and a cell division protein.
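The motifs in Table 2 are written as a residue number followed by the motif sequence. Reading that number as the 1-based position of the motif's first residue (our interpretation of the notation), a reported site can be checked against the full sequence with a simple exact search, as in the Python sketch below. MEME itself describes motifs with position-specific probability matrices and tolerates mismatches, which this sketch does not attempt; the sequence in the test is a made-up toy, not HP P44506.

```python
from typing import List

def motif_starts(sequence: str, motif: str) -> List[int]:
    """1-based start positions of all exact (possibly overlapping) occurrences."""
    if not motif:
        raise ValueError("motif must be non-empty")
    starts: List[int] = []
    i = sequence.find(motif)
    while i != -1:
        starts.append(i + 1)  # convert 0-based index to residue number
        i = sequence.find(motif, i + 1)
    return starts

def check_reported_site(sequence: str, position: int, motif: str) -> bool:
    """True if `motif` begins exactly at residue `position` (1-based)."""
    return sequence[position - 1 : position - 1 + len(motif)] == motif
```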
The sequence of HP P44506 is also annotated in the UniProt database, which lists pyridoxal 5′-phosphate (PLP) as a cofactor, indicating a role in PLP binding. Interestingly, sequence similarity searches showed that HP P44506 belongs to the uncharacterized protein family UPF0001, which is primarily involved in the biosynthesis of amino acids and amino acid-derived metabolites. Finally, family and domain database searches indicate that HP P44506 contains an N-terminal alanine racemase domain (a PLP-binding barrel) and belongs to the racemases and epimerases acting on amino acids and derivatives.
The three-dimensional structure of P44506 was predicted by MODELLER using the templates PLP-binding protein (PDB ID 1W8G) and pyridoxal phosphate-binding protein (PDB ID 3SY1), with which HP P44506 shares 61 and 57 % sequence similarity, respectively. Validation of the energy-minimized structure showed 99.5 % of residues in the allowed regions of the Ramachandran plot (Ramachandran et al. 1963) (Table 3). The root mean square deviations (RMSD) of the predicted model from templates 1W8G, 3SY1 and 4A3Q were 0.223, 0.243 and 3.997 Å, respectively (Table 3), indicating close structural similarity to the first two templates. The TM scores of the HP model against 1W8G, 3SY1 and 4A3Q are 0.6229, 0.5130 and 0.2651, respectively, showing that 1W8G and 3SY1 share the fold of P44506 (Table 3). Structure comparison revealed that P44506 contains an (α/β)8 TIM barrel at the N terminus (Fig. 1a), a fold that characteristically carries a phosphate-binding site. The overall structure of P44506 contains ten α-helices, three 3₁₀ helices and eight β-strands forming the characteristic TIM barrel. This prediction is complemented by various binding-site prediction servers. The structure also contains an isolated β-bridge at Ile138 (Fig. 1b). The TIM barrel domain of P44506 contains eight β-strands (β1-β8) with the characteristic PLP (pyridoxal 5′-phosphate) binding site at Lys35, identified by structural similarity with the templates (Table S5). Pocket-Finder analysis further suggests that the active-site cavity may contain Lys35, Asn56, Tyr57, Gln235 and Asn236 (Fig. 1c).
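The TM scores quoted above follow the standard definition TM = (1/L_target) Σ_i 1/(1 + (d_i/d0)^2), with d0 = 1.24 (L_target - 15)^(1/3) - 1.8, maximized over superpositions; values above about 0.5 generally indicate the same fold, which is the basis for the fold assignment above. The Python sketch below evaluates the score for one fixed superposition only, so it gives a lower bound on the maximized value, and it assumes the per-residue distances d_i (in Å) have already been computed for the aligned residue pairs.

```python
from typing import Sequence

def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: Å distances between aligned residue pairs after superposition.
    target_length: length of the target (reference) protein; must exceed 15.
    """
    if target_length <= 15:
        raise ValueError("the d0 formula requires target_length > 15")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    # Unaligned residues contribute nothing, but still count in the denominator.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Because the sum is divided by the target length, a model that aligns perfectly over only half of the target scores 0.5, which is why TM-score is less sensitive than RMSD to locally perfect but partial matches.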
The DALI server shows high structural similarity between P44506 and proteins with alanine racemase function (Table S6). We observed significant matches with lysine-preferred racemases (Z score = 20.9), alanine racemase (Z score = 20.8) and others. The number of aligned residues usually lies in the range 221-628, with RMSD values of 0.3-3.1 Å and similarity of 12-62 %. We also observed close structural similarity to D-serine dehydratase. Furthermore, ProFunc (Table S6) identified eight InterPro (Mulder et al. 2002) motifs, including a pyridoxal 5′-phosphate-dependent enzyme motif. Together, the sequence and structure analyses strongly suggest that HP P44506 is a PLP-dependent alanine racemase, an enzyme important for bacterial cell wall biosynthesis that catalyzes the interconversion of alanine enantiomers (Noda et al. 2004).

HP P44641
The sequence of HP P44641 is also annotated in the UniProt database as L-lysine 2,3-aminomutase, an enzyme that converts (S)-alpha-lysine (L-lysine) into (R)-beta-lysine. This protein has several cofactor-binding sites, including a [4Fe-4S] cluster and a PLP-binding motif. Family and domain database searches indicate that HP P44641 belongs to the radical SAM superfamily (Frey et al. 2008), whose members participate in more than 40 distinct biochemical transformations and are mostly not yet characterized biochemically. GO analysis suggests that this protein is involved in metabolic processes, has isomerase-like catalytic activity and specifically binds a 4Fe-4S cluster. The structure of P44641 was predicted by MODELLER using lysine 2,3-aminomutase (PDB ID 2A5H) as the template. P44641 shares 34 % sequence similarity with 2A5H, with a TM score of 0.3718. The RMSD between target and template after alignment was 0.241 Å, indicating close structural similarity (Table 3). The predicted model of HP P44641 adopts an (α/β)8 TIM barrel fold (Fig. 2a) with eight β-strands in the barrel. The overall structure contains twelve α-helices, six 3₁₀ helices and ten β-strands. P44641 also contains isolated β-bridges at Ile24, Val55, Ser90 and Val291 (Fig. 2b). We observed three SAM-binding sites in this HP, at Cys121, Cys125 and Cys128 (Table S5). We predict that the active site of P44641 comprises Cys121, Val123, Cys125, Cys128, Arg130, Arg131 and Ser164 (Fig. 2c).

HP P46494
HP P46494 is predicted to be localized in the cytoplasm by PSLpred and in the periplasm by CELLO (Table S2). This protein is secretory in nature but lacks a signal peptide and transmembrane helix. Function analysis indicates that HP P46494 has DNA topoisomerase activity (Tables S3 and S4). InterProScan and MOTIF identified a DNA topoisomerase (type IA, Zn finger) domain. This prediction is further supported by the MEME suite, which identified three signature sequences in P46494, namely 76′-FGMFIGCSHYPECDFVV, 1′-MNQSLFHH and 115′-RRGRQGKIFY, consistent with DNA topoisomerase I, a zinc metalloprotein with three repeated zinc-binding domains (Tse-Dinh and Beran-Steed 1988). This protein is non-virulent and involved in cellular processes (Table S3). The STRING database suggests several interaction partners, including DNA topoisomerase III, shikimate 5-dehydrogenase, an ABC transporter ATP-binding protein, DNA-3-methyladenine glycosylase, DNA processing chain A, recombination regulator RecX, peptide deformylase, methionyl-tRNA formyltransferase and recombinase A. Gene ontology analysis suggests that HP P46494 binds DNA and causes topological changes in it, consistent with type I DNA topoisomerase-like activity.
Figure caption (fragment): ... represented in red, green, yellow, blue and pink, respectively (this color scheme applies to all figures). c Residues present in the active-site pocket are shown as sticks.
Because no reliable template was available in the PDB, we could not predict the structure of HP P46494 by homology modeling. We therefore used the Robetta server with the Rosetta de novo protocol. The predicted model has most residues in the allowed regions of the Ramachandran plot (Table 3). The overall structure is similar to domain II of type I DNA topoisomerase (Champoux 2001) (Fig. 3a). Secondary structure assignment shows that HP P46494 consists of 13 β-strands and a single α-helix (Fig. 3b) of seven residues (Leu47, Gln48, Arg49, Ser50, Glu51, His52 and Lys53). Isolated β-bridges are present at Asp42, Cys145, Phe150 and Phe176 (Figure S4). We observed zinc-binding sites at Cys15, Cys18, Cys35, Cys41, Cys104, Cys107, Cys145 and Cys148 (Table S5). Extensive analysis of P46494 suggests that the active site may consist of Cys15, Cys18, Cys35 and Cys41 (Fig. 3c).
The PPM server, which calculates the rotational and translational positions of proteins in membranes, identified Pro151 as a membrane-embedded residue. Twisting of topoisomerases is essential for their biological activity, and Pro151 is one of the residues essential for such conformational changes during catalysis. The DALI structure similarity search found only one similar structure, 2GAI (Z score = 0.4, RMSD = 6.0 Å) (Table S6). ProFunc identified three motifs, including zf-C4_Topoisom, and recognized six ligand-binding templates for P46494. These analyses suggest that P46494 is a DNA topoisomerase IA (Zn finger)-like protein. Type IA DNA topoisomerases have a unique mechanism of strand passage through an enzyme-bridged single-stranded DNA gate, which allows them to carry out varied reactions in processing structures crucial for replication, recombination and repair.

HP P44827
HP P44827 is localized in the cytoplasm, lacks transmembrane helices and is not involved in any secretory pathway (Table S2). Sequence analysis indicates that HP P44827 has ribosomal large subunit pseudouridine synthase E activity (Tables S3 and S4). The MEME suite also predicted a similar function, identifying three motifs: 84′-VYAAGRLDRDSEGLLILTNNGELQHRLADPKFKTEKTYWVQVEGI, 51′-TKVVLFNKPFDVLTQFTDEQGRATLKD and 178′-WLEIKISEGRNRQVRRMTAHIGFP (Table 2). UniProt also annotates this HP as ribosomal large subunit pseudouridine synthase E (RluE), which is responsible for the synthesis of pseudouridine from uracil-2457 in 23S ribosomal RNA. Such enzymes catalyze the isomerization of specific uridines in an RNA molecule to pseudouridines (5-ribosyluracil, Ψ). The domain surface is populated by conserved, charged residues that define a likely RNA-binding site. Further, P44827 is involved in metabolism and is non-virulent (Table S3). The STRING database suggests that HP P44827 interacts with lipoprotein E, β-hexosaminidase, 23S rRNA pseudouridylate synthase C, adenylosuccinate lyase, a transport protein and tRNA-specific 2-thiouridylase MnmA. The three-dimensional structure of P44827 was predicted by MODELLER (Fig. 4a) using pseudouridine synthase RluE (PDB ID 2OLW), pseudouridine synthase RluE (PDB ID 2OML) and ribosomal small subunit pseudouridine synthase A (PDB ID 1KSK) as templates, with sequence identities of 66, 66 and 31 %, respectively (Table 3). The refined model shows RMSD values of 0.233, 0.604 and 1.662 Å from templates 2OLW, 2OML and 1KSK, respectively, indicating close structural and functional similarity. The TM scores between the templates and the target are 0.73306, 0.74206 and 0.71557, respectively, further supporting functional similarity. The overall structure of HP P44827 adopts an α/β fold with the bifurcated, mostly antiparallel β-sheet present in all Ψ synthases.
It also contains four conserved helices, three α-helices and one 3₁₀ helix, packed next to the β-sheet (Fig. 4b), plus an additional α-helix. We found only three central strands in the β-sheet, namely β2, β3 and β6, instead of four (Fig. 4a). These strands are highly conserved in Ψ synthases, and the cleft contains the active site of the pseudouridine synthase. Isolated β-bridges were observed at Met2, Pro24, Ser29, Thr36 and Gly217 (Fig. 4c). Residues Ile231, Leu234, Gln236, Thr237 and Leu240 are predicted to be membrane-embedded. Active-site analysis suggests that Asp91 is essential for the function of this enzyme, and further analysis indicates that the active site of HP P44827 contains Leu90, Asp91, Ser94, Asn188, Arg189, Arg192 and Leu205 (Fig. 4c).
Structure similarity searches indicate that HP P44827 is structurally close to the ribosomal small subunit pseudouridine synthase (Z score = 23.7, RMSD = 3.3 Å), so this protein may possess pseudouridine synthase-like activity. ProFunc analysis revealed a similar structural pattern in six pseudouridine synthases. These observations suggest that HP P44827 may be pseudouridine synthase E. Prokaryotic Ψ synthases fall into five characterized subfamilies on the basis of sequence conservation (Gustafsson et al. 1996). The pseudouridine synthase RluE is classified as a member of the RsuA family (Del Campo et al. 2001) and modifies the single site Ψ2457 in a stem of 23S rRNA.

HP Q57151
Sequence analysis showed that HP Q57151 is localized in the cytoplasm and is not involved in secretory mechanisms (Table S2). Sequence-based function analysis indicates that HP Q57151 is a hydroxypyruvate isomerase and a non-virulent protein (Tables S3 and S4). We identified three motifs in HP Q57151: 99′-CPNVHIM, 71′-WGGSAI and 78′-DYFHAQ (Table 2). The predicted functional partners of Q57151 are 3-hydroxyisobutyrate dehydrogenase, a putative aldolase, glycerate dehydrogenase, the glycerol-3-phosphate regulon repressor, gluconate permease, D-xylose transporter subunit XylF and cAMP-regulatory protein, suggesting that it is important for the survival of the organism.
UniProt annotation suggests that HP Q57151 is a putative hydroxypyruvate isomerase, which catalyzes the reversible isomerization between hydroxypyruvate and 2-hydroxy-3-oxopropanoate. Domain annotation indicates that HP Q57151 contains a β/α TIM barrel structural motif found in several protein families, including xylose isomerase. Family analysis places HP Q57151 in the hydroxypyruvate isomerase (Hyi) family with hydroxypyruvate isomerase activity. (Table 3). The overall structure comprises a TIM barrel fold (Table 3; Fig. 5a) (Gerlt and Raushel 2003; Wierenga 2001), which canonically consists of eight βα units, with eight parallel β-strands in the interior and eight α-helices on the exterior of the barrel. Instead of this (β/α)8 arrangement, we observed seven β-strands in the TIM barrel. Two isolated β-bridges are observed at Ser207 and His212 (Fig. 5b). The active site is located at the C-terminal ends of the β-strands, in the βα loops of the TIM barrel (Fig. 5a), where the manganese-binding sites are also located. We predict Glu143, Asp178, Gln204 and Glu240 to be important binding residues (Fig. 5c). These predictions are supported by structure-based active-site prediction servers (Table S5).
The predicted structure of HP Q57151 is quite similar to those of D-tagatose 3-epimerase (Z score = 26.1, RMSD = 2.3 Å) and L-ribulose 3-epimerase (Z score = 26.1, RMSD = 2.4 Å), among others, indicating that this HP may act as an epimerase. Moreover, structure-based function prediction using ProFunc suggests that this protein may act as a hydroxypyruvate isomerase or a xylose isomerase-like protein. Together, these findings suggest that HP Q57151 is a hydroxypyruvate isomerase, which catalyzes the reversible interconversion of hydroxypyruvate and tartronate semialdehyde (de Windt and van der Drift 1980).

HP P44094
HP P44094 is a cytoplasmic, non-virulent and non-secretory protein (Table S2). We observed that HP P44094 contains a nucleoside-diphosphate-sugar epimerase domain (Tables S3 and S4). MEME suite analysis indicates three significant motifs in the sequence of P44094, namely 149′-MCELLINDYSRKGFVDGIVVRLPTICIRPGKPNKAASSFVSSIMREPLHG, 55′-CPVSEE and 291′-QALALGFKV (Table 2). STRING analysis suggests that gluconate permease, a putative aldolase, 3-hydroxyisobutyrate dehydrogenase and the glycerol-3-phosphate regulon repressor are functional network partners of HP P44094. Sequence similarity searches suggest that HP P44094 belongs to the NAD(P)-dependent epimerase/dehydratase family. However, a detailed annotation of this HP is not available in UniProt.
The structure of HP P44094 was modeled using nucleoside-diphosphate-sugar epimerase (PDB ID-2HRZ) as a template (Table 3). The overall structure of HP P44094 contains 12 β-strands, 13 α-helices and two 3₁₀-helices (Fig. 6a). There are two isolated β-bridges, at Ile131 and Ile287 (Fig. 6b). We observed an N-terminal NAD-binding Rossmann-fold domain spanning β1–β7 and α1–α8. Active site prediction analysis shows that Tyr143 is responsible for the activity of HP P44094 (Table S5). The active site may contain Val79, Ser80, Ser119, Leu120, Tyr143, Leu170, Pro171, Thr172, Ser185 and Trp283 (Fig. 6c). Leu232 and Pro233 are predicted to be membrane-embedded residues. Structure similarity analysis shows high similarity with NDP-sugar epimerases, with Z scores in the range 32.8–33.4 and an RMSD of 2.6 Å. Further analysis shows the presence of NAD(P)-binding Rossmann-fold domains and NAD-dependent epimerase/dehydratase activity. On the basis of sequence and structure analyses, we annotated the function of P44094 as a nucleoside-diphosphate-sugar epimerase (UDP-glucose 4-epimerase). UDP-glucose 4-epimerase catalyzes the reversible interconversion of UDP-glucose and UDP-galactose, which results in the formation of glucose- and galactose-containing exopolysaccharides (Dormann and Benning 1998).
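Secondary-structure counts such as those above (β-strands, α-helices, 3₁₀-helices and isolated β-bridges) are obtained by assigning a state to every residue and then counting contiguous runs; STRIDE is the assignment tool named elsewhere in this article. A minimal sketch, assuming a per-residue string that uses STRIDE's one-letter codes (H α-helix, G 3₁₀-helix, E strand, B/b isolated bridge) and 1-based numbering from the start of the given sequence. The assignment and sequence in the example are hypothetical, and two strands that touch with no intervening residue would be merged by this simple run count.

```python
from itertools import groupby
from typing import Dict, List

# STRIDE one-letter codes used here: H alpha-helix, G 3-10 helix, E strand, B/b isolated bridge.
ELEMENT_NAMES = {"H": "alpha_helix", "G": "helix_3_10", "E": "strand"}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def count_elements(ss: str) -> Dict[str, int]:
    """Count contiguous runs of helix, 3-10 helix and strand codes."""
    counts = {name: 0 for name in ELEMENT_NAMES.values()}
    for code, _run in groupby(ss):
        if code in ELEMENT_NAMES:
            counts[ELEMENT_NAMES[code]] += 1
    return counts


def isolated_bridges(ss: str, sequence: str) -> List[str]:
    """Label residues assigned as isolated bridges, e.g. 'Ile14' (1-based numbering)."""
    if len(ss) != len(sequence):
        raise ValueError("assignment and sequence must have the same length")
    return [
        f"{THREE_LETTER[aa]}{pos}"
        for pos, (code, aa) in enumerate(zip(ss, sequence), start=1)
        if code in "Bb"
    ]


ss = "CHHHCEEECGGGCBCEEC"   # hypothetical assignment
seq = "MAAAGIVIGPPPGIGVVK"  # hypothetical sequence of the same length
print(count_elements(ss))         # {'alpha_helix': 1, 'helix_3_10': 1, 'strand': 2}
print(isolated_bridges(ss, seq))  # ['Ile14']
```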

HP P45104
HP P45104 is localized in the cytoplasm and lacks a signal peptide (Table S2). It contains a domain with ribosomal large subunit pseudouridine synthase activity (Table S3 and S4). MEME suite analysis shows the presence of three significant motifs, namely 176′-WIAVGRLDINTSGLLLFTTDGELANRLMHPSREVEREYSVRVFGQ, 140′-CRVLMYYKPEGELCTRSDPEGRATVFD and 256′-WYDVTLMEGRNREVRRLWESQGIQ, indicating a functional resemblance to ribosomal large subunit pseudouridine synthase B (Table 2). This protein is also annotated as rluB in the Uniprot database and belongs to the pseudouridine synthase RsuA family. The interaction network partners of HP P45104 are 23S rRNA pseudouridine synthase D, transcriptional regulator CysB, 23S rRNA pseudouridylate synthase C, tRNA pseudouridine synthase B, GTP-binding protein EngA, 30S ribosomal protein S1 and cytidylate kinase, which is consistent with its predicted function.
Fig. 6 Representation of model structure of HP P44094. (a) Overall structure of P44094 shown as a cartoon model, with the membrane represented as non-bonded spheres. (b) Secondary structure of HP P44094. (c) Active site residues of P44094 shown in stick representation.

Here, we used the I-TASSER server to predict the structure of HP P45104. We found that 97.8 % of the residues of P45104 lie in the allowed region of the Ramachandran plot. The TM score was 0.66428, indicating that the predicted structure shares the fold of ribosomal large subunit pseudouridine synthase F (PDB ID-3DH3). Structure analysis shows 11 α-helices, 13 β-strands and two 3₁₀-helices in HP P45104 (Fig. 7a). Isolated β-bridges are found at Thr36, Leu152, Thr163, Ala178, Lys292 and Arg299 (Fig. 7b). The structure contains an N-terminal S4 domain, or α-L RNA-binding motif (77–171), which connects through a linker to the catalytic domain (142–309). The active site region of P45104 adopts a mixed α/β fold, which is common to all Ψ synthases. It contains an eight-stranded antiparallel bifurcated β-sheet flanked by loops, and the active-site cleft is located in the center of this β-sheet. The active site contains the conserved residue Asp183, which is essential for enzyme activity (Table S5). We predicted Gly180, Leu182, Asp183, Tyr213, Arg270, Leu283 and Arg285 as active site residues of HP P45104 (Fig. 7c). Ala99 of P45104 is predicted to be a membrane-embedded residue.
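The TM-score quoted above measures fold similarity on a 0–1 scale, and values above about 0.5 are generally taken to indicate the same fold. The sketch below implements the standard TM-score sum with the length-dependent distance scale d0 = 1.24·(L − 15)^(1/3) − 1.8 for a single, already-computed superposition. Dedicated TM-score tools additionally search for the superposition that maximizes the score, so this function only scores the superposition it is given. The 0.5 Å lower bound on d0 mirrors common implementations; the handling of chains of 15 residues or fewer is a simplifying assumption.

```python
from typing import Sequence


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score of one fixed superposition.

    distances: distances (Å) between aligned residue pairs after superposition.
    target_length: number of residues in the protein used for normalisation.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # simplifying assumption for very short chains
    d0 = max(d0, 0.5)  # lower bound on d0, as in common implementations
    # Each aligned pair contributes between 0 and 1; unaligned residues contribute 0.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


print(tm_score([0.0] * 100, 100))  # 1.0 (perfect superposition of all residues)
print(tm_score([0.0] * 50, 100))   # 0.5 (only half the target is aligned)
```

Because the sum is divided by the target length rather than the number of aligned pairs, a partial alignment is penalized even when the aligned part superposes perfectly.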
HP P45104 closely resembles ribosomal large subunit pseudouridine synthases B and F. These findings indicate that HP P45104 may function as ribosomal large subunit pseudouridine synthase B. This enzyme catalyzes the conversion of U2605 to pseudouridine in 23S rRNA.

HP P71373
HP P71373 was predicted to be a virulent protein localized in the cytoplasm (Table S2). It is also non-secretory and lacks a transmembrane helix. Function prediction shows that HP P71373 may be an epimerase/amidophosphoribosyltransferase (Table S3 and S4). Motif analysis also suggests the presence of epimerase activity in HP P71373 (Table 2). HP P71373 is also annotated as epimerase family protein HI_1208 in the Uniprot database and belongs to the NAD(P)-dependent epimerase/dehydratase family. STRING predicts arginine repressor, malate dehydrogenase, ferrochelatase, lipoyltransferase, the 2-oxoglutarate dehydrogenase E2 component dihydrolipoamide succinyltransferase and dihydrolipoamide acetyltransferase as functional network partners.

HP P44160
HP P44160 is a secretory protein present in the cytoplasm (Table S2). No transmembrane helix is present in the sequence of P44160. Motif and domain analysis suggests that HP P44160 is an aldose 1-epimerase, an enzyme important for metabolic pathways such as glycolysis and gluconeogenesis (Chittori et al. 2007) (Table S3 and S4). Uniprot annotation also indicates that HP P44160 is a putative glucose-6-phosphate 1-epimerase, which converts α-D-glucose 6-phosphate to β-D-glucose 6-phosphate. Furthermore, GO analysis indicates that this protein is involved in carbohydrate metabolic processes, and sequence similarity search suggests that this HP belongs to the glucose-6-phosphate 1-epimerase family. HP P44160 is a virulent protein involved in cellular processes. It contains three motifs predicted by the MEME suite, namely 86′-QPAHGT, 75′-PICYPW and 29′-CGWNTKNFPC (Table 2). The predicted partners of P44160 are glucose-6-phosphate isomerase, glucose-specific PTS system component, keto-hydroxyglutarate-aldolase/keto-deoxyphosphogluconate aldolase, transaldolase B, deoxyribose-phosphate aldolase, transketolase, fructose-bisphosphate aldolase, aldose 1-epimerase and UDP-glucose 4-epimerase, indicating a role for HP P44160 in carbohydrate metabolism.
We used MODELLER to predict the structure of HP P44160, with putative mutarotase (PDB ID-2HTA) and hexose-6-phosphate mutarotase (PDB ID-2CIR) as templates. The predicted model has 99.6 % of residues in the allowed region and very high fold similarity with the templates. The structure of HP P44160 adopts a β-sandwich fold made up of 21 β-strands, one α-helix and three 3₁₀-helices (Fig. 9a, b). The β-strands are arranged in three antiparallel β-sheets organized in two layers: the first layer consists of sheets S1 (β1–β5) and S3 (β13–β20), while the other layer contains S2 (β6–β12). Helices α3 and α4 lie on the same side, connecting β18 to β19, while α1 connects β5 to β6 and α2 connects β11 to β12 (Fig. 9a). The active site is a β-D-galactose-binding pocket that contains Arg71, Phe81, His89, His151, Tyr153, Asp193, Trp227 and Glu249 (Fig. 9c). A DALI search shows that the structure of HP P44160 is highly similar to those of epimerases such as hexose-6-phosphate mutarotase (Z score = 31.6, RMSD = 2.0 Å) and glucose-6-phosphate 1-epimerase (Z score = 31.5, RMSD = 2.0 Å). Similarly, ProFunc indicates that the HP may have epimerase activity. Aldose 1-epimerases catalyze the anomeric interconversion of aldose sugars such as D-glucose between their α and β forms (Graille et al. 2006).

Fig. 8 Representation of model structure of HP P71373. (a) Cartoon representation of the overall structure, with the membrane represented by non-bonded atoms. (b) Predicted secondary structure. (c) Active site residues of HP P71373 shown in stick representation.

HP O86237
HP O86237 is a cytoplasmic protein showing tautomerase/MIF activity (Table S2 and Table 3). GO annotation also indicates that HP O86237 is involved in the cellular aromatic compound metabolic process and possesses isomerase activity. Family and domain database searches also indicate that this HP belongs to the 4-oxalocrotonate_tautomerase family. These predictions are further supported by the interaction network of O86237 from the STRING database, which shows that HP O86237 interacts with anthranilate phosphoribosyl transferase, bifunctional indole-3-glycerol phosphate synthase/phosphoribosylanthranilate isomerase, and anthranilate synthase components I and II.
The crystal structure of HP O86237 has been determined (PDB ID-1MWW) and adopts a tautomerase/MIF fold. HP O86237 shows close similarity to putative 4-oxalocrotonate tautomerase (PDB ID-4LKB), malonate semialdehyde decarboxylase (PDB ID-3MLC and PDB ID-4LHP) and macrophage migration inhibitory factor (PDB ID-4DH4). O86237 contains three α-helices, four β-strands and three 3₁₀-helices (Fig. 10a). We observed a β-α-β fold in the predicted model of HP O86237 (Fig. 10b). This fold is characteristic of the tautomerase superfamily, which includes members such as macrophage migration inhibitory factor (MIF) and D-dopachrome tautomerase (Almrud et al. 2002). The active site of HP O86237 contains Met1, Ile32, Lys36, Met67, Trp109 and Phe111 (Fig. 10c). The DALI server further indicates that HP O86237 is structurally similar to malonate semialdehyde decarboxylase (Z score = 18.0, RMSD = 1.6 Å) and a putative tautomerase (Z score = 15.3, RMSD = 1.9 Å), among others. Moreover, ProFunc analysis suggests that HP O86237 may have tautomerase/MIF function. These findings lead us to propose that HP O86237 functions as a tautomerase/MIF, a key regulatory cytokine of innate and adaptive immune responses (Donn and Ray 2004).

Fig. 9 Representation of model structure of HP P44160. (a) Characteristic β-sandwich topology. (b) Detailed description of secondary structure using STRIDE. (c) The active site pocket, illustrated in stick representation.

HP Q57152
The PSLpred server predicts that HP Q57152 is localized in the periplasm, while CELLO suggests cytoplasmic localization (Table S2). It is a virulent protein involved in cellular processes, with tRNA pseudouridine synthase C activity (Table S3 and S4). Uniprot annotation also indicates that HP Q57152 is similar to the N-terminal region of E. carotovora exoenzyme regulation regulon ORF1, and that the C-terminal part is colinear with YqcB. The YqcC-like structural domain is found in the N-terminal region of some tRNA pseudouridine synthase C proteins, as well as in other uncharacterized proteins.
These results were validated by STRING, which shows an interaction network of Q57152 containing putative sulfate transport protein CysZ, penicillin-binding protein 1B, N-acetylmuramic acid-6-phosphate etherase, opacity protein, anhydro-N-acetylmuramic acid kinase and an ATP-dependent helicase. These predictions are further supported by the MEME suite, which identified three sequence motifs, namely 51′-WVFIPRM, 72′-AISPYI and 38′-FSIDTM. The I-TASSER server was used to predict the structure of HP Q57152 using the solution NMR structure of protein YqcC (PDB ID-2HGK) as a template. Both structures share a similar fold and show close structural similarity (Table 3). We observed 96.9 % of residues in the allowed region of the Ramachandran plot. The overall structure adopts a bromodomain-like fold with a characteristic all-α topology (Fig. 11a). The structure of HP Q57152 contains four α-helices and two 3₁₀-helices (Fig. 11b). 3DLigandSite predicts that the active site of Q57152 contains Leu21, Trp22, Gln23, Ser44, Ala45, Glu46, Glu47, Ala80 and Met81 (Fig. 11c). Further structure analysis shows that the HP may have β-fructofuranosidase-like activity. Function prediction gives variable results, indicating that HP Q57152 may have multiple functional sites.

HP P44268
Subcellular localization analysis suggests that HP P44268 is localized in the cytoplasm, has no transmembrane helix and is not involved in any secretory pathway (Table S2). Sequence-based function predictions show that HP P44268 may possess xylose isomerase activity (Table S3 and S4). Uniprot annotation also indicates that HP P44268 contains a Xyl_isomerase-like TIM barrel domain and belongs to the UPF0276 family, which is functionally uncharacterized. We further validated our prediction by analyzing the interaction network of P44268, which primarily includes RNA polymerase sigma factor and phosphate transport regulator.
We identified three sequence motifs in HP P44268, namely 249′-KGTVWD, 99′-CEC-EGH and 35′-ENWSKM, which help validate the annotation results. We obtained a DUF692 family protein (domain of unknown function; PDB ID-3BWW) and L-ribulose 3-epimerase (PDB ID-3VYL) as templates for homology modeling of HP P44268, with sequence identities of 73 and 50 %, respectively. The model has 98.9 % of residues in the allowed region and shows high fold similarity with the templates (Table 3). The STRIDE assignment of secondary structure shows that HP P44268 contains 10 β-strands, 12 α-helices and two 3₁₀-helices, with isolated β-bridges at Met111 and His146 (Fig. 12a). The structure of HP P44268 folds into a TIM α/β barrel, but with an (α/β)7 topology comprising seven β-strands instead of the canonical (α/β)8 (Fig. 12b). The active site pocket of HP P44268 contains manganese-binding sites at Glu139, Asp172, Asn175, His204 and Glu272 (Fig. 12c). The protein structure is similar to those of epimerases such as L-ribulose 3-epimerase, xylose isomerase domain protein TIM barrel and D-tagatose 3-epimerase (Table S6), while ProFunc indicates that the protein has xylose isomerase-like activity. Xylose isomerase is responsible for the isomerization of pentose sugars such as methyl pentose, and even glucose, in bacterial cells (Sanchez and Smiley 1975).
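Sequence identity values such as the 73 and 50 % quoted for the templates depend on how the percentage is normalized, and different tools divide by different quantities (aligned positions, alignment length or the length of the shorter sequence). A minimal sketch, assuming a gapped pairwise alignment is already available and identity is counted over columns where neither sequence has a gap; the aligned strings in the example are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


# Hypothetical alignment: 5 gap-free columns, 4 of them identical.
print(percent_identity("MKV-LA", "MRVGLA"))  # 80.0
```

Dividing by the full alignment length instead would give 4/6, or about 67 %, for the same alignment, which is why identity figures from different servers are not always directly comparable.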

HP P52606
Sequence analysis of HP P52606 indicates that this protein is localized in the cytoplasm (Table S2). Sequence similarity search and domain analysis show that P52606 may have sedoheptulose 7-phosphate isomerase-like activity (Table S3 and S4). Uniprot annotation also indicates that HP P52606 is involved in carbohydrate metabolic processes and acts as an isomerase. Furthermore, sequence similarity search suggests that HP P52606 contains a SIS (phosphosugar-binding) domain and belongs to the DiaA subfamily; DiaA (DnaA initiator-associating protein) is required for the timely initiation of chromosomal replication through direct interactions with the DnaA initiator protein. We further validated the result using protein-protein interaction analysis, which shows that this protein interacts with bifunctional heptose 7-phosphate kinase/heptose 1-phosphate adenyltransferase, antigen, chromosomal replication initiation protein, imidazole glycerol-phosphate dehydratase/histidinol phosphatase and D-heptose 1,7-bisphosphate phosphatase.

Fig. 13 Representation of model structure of HP P52606. (a) β-sandwich topology of P52606. (b) Predicted secondary structure elements of P52606. (c) Active site residues shown in stick representation.
Because HP P52606 shares only moderate similarity (>30 %) with the crystal structure of Escherichia coli DiaA (PDB ID-2YVA) and with phosphoheptose isomerases (Table 3), we used MODELLER for structure prediction. The predicted model shows a high TM score (>0.85), indicating close fold similarity with the templates. Similarly, the low RMSD value (<0.550) shows high structural similarity between the target and the templates. The refined model has 99.4 % of residues in the allowed region of the Ramachandran plot. The overall structure of HP P52606 contains seven α-helices and five β-strands (Fig. 13a). The predicted structure of P52606 reveals a SIS domain with a central five-stranded parallel β-sheet flanked by seven α-helices, forming a three-layered α/β/α sandwich. Helices α1, α2 and α7 lie on one side of this sandwich, and α3, α4, α5 and α6 on the other (Fig. 13b). The active site comprises Val49, Ser50, Arg51, Ser52, Pro118, Leu119 and Glu168 (Fig. 13c). The function of P52606 as a sedoheptulose 7-phosphate isomerase was further supported by the DALI and ProFunc servers (Table S6). Sedoheptulose 7-phosphate isomerase catalyzes the isomerization of D-sedoheptulose 7-phosphate into D-glycero-D-manno-heptose 7-phosphate, the first step in the formation of ADP-heptose (Taylor et al. 2008).
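Percentages of residues in the allowed region, such as the 99.4 % reported here, come from a validation tool that assigns each residue's φ/ψ pair to a region of the Ramachandran plot (PROCHECK, for example, uses most favoured, additional allowed, generously allowed and disallowed regions, and leaves glycine and proline out of these statistics). The sketch below only summarizes such labels once they exist; it does not classify φ/ψ angles itself. The shorthand labels core/allowed/generous/disallowed are our own, and treating every non-disallowed residue as "allowed" is an assumption, since the text does not state which regions were counted.

```python
from collections import Counter
from typing import Dict, Iterable

# Shorthand for PROCHECK-style regions: most favoured, additional allowed,
# generously allowed and disallowed.
CATEGORIES = ("core", "allowed", "generous", "disallowed")


def ramachandran_summary(regions: Iterable[str]) -> Dict[str, float]:
    """Percentage of residues in each region, plus all residues outside 'disallowed'."""
    counts = Counter(regions)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown region labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues given")
    summary = {name: 100.0 * counts[name] / total for name in CATEGORIES}
    # Assumption: the 'allowed region' figure counts every non-disallowed residue.
    summary["not_disallowed"] = 100.0 - summary["disallowed"]
    return summary


labels = ["core"] * 90 + ["allowed"] * 8 + ["generous"] + ["disallowed"]  # hypothetical
print(ramachandran_summary(labels)["not_disallowed"])  # 99.0
```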

Conclusions
Isomerases have an important role in the virulence of pathogens. For example, UDP-N-acetylgalactosamine 4-epimerase is involved in the formation of smooth lipopolysaccharide and is essential for the virulence of mesophilic Aeromonas hydrophila serotype O34 (Canals et al. 2006). Similarly, UDP-glucose 4-epimerase, encoded by the galE gene, plays an important part in the biosynthesis of lipopolysaccharide, one of the main virulence factors of bacterial pathogens (Fry et al. 2000). Our extensive structural analysis of 13 isomerases characterized from 429 HPs of H. influenzae is helpful for identifying putative drug targets for better drug design. HP P71373 was annotated as a nucleoside-diphosphate-sugar epimerase. Four HPs were identified as virulent proteins, which can be used to better understand the virulence mechanism of H. influenzae and to search for potential targets for therapeutic intervention. Isomerases clearly play a central role in the relationship between bacteria and the host. Our structure-based function elucidation provides insight into how microbes interact with their hosts and will contribute to our understanding of both isomerases and bacterial pathogenesis.