In silico characterization of hypothetical proteins obtained from Mycobacterium tuberculosis H37Rv

  • Utkarsh Raj
  • Aman Kumar Sharma
  • Imlimaong Aier
  • Pritish Kumar Varadwaj
Original Article


Tuberculosis is one of the oldest known diseases and causes about 1.5 million deaths per year. It is caused by Mycobacterium tuberculosis, which is transmitted from one person to another. This bacterium belongs to the family Mycobacteriaceae and the genus Mycobacterium, and is a member of the tuberculosis complex. Mycobacterium tuberculosis is an acid-fast, aerobic, rod-shaped bacterium, 2 to 4 µm in length and 0.2 to 0.5 µm in width. Infected people spread the disease by sneezing, coughing, and similar means, with humans acting as the host for the bacterium. The genome of Mycobacterium tuberculosis H37Rv encodes 3906 proteins, of which 1055 are hypothetical proteins (HPs), that is, proteins whose functions are unknown. The sequences of these 1055 HPs were analyzed, and the functions of 578 HPs were subsequently predicted with a high level of confidence. Several enzymes, transporters, and binding proteins were identified among the 1055 HPs, and potential targets that contribute to the overall survival of the bacterium were discovered. The analysis is relevant to understanding the mechanisms of the bacterium and may prove beneficial in the discovery of new drugs.
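The counts quoted above can be put in proportion with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the constant names and the `percent` helper are illustrative assumptions, and only the three counts (3906, 1055, 578) come from the abstract.

```python
# Counts taken from the abstract; names are illustrative, not the authors'.
TOTAL_PROTEINS = 3906   # proteins encoded by the H37Rv genome
HYPOTHETICAL = 1055     # hypothetical proteins (HPs)
ANNOTATED = 578         # HPs with a function predicted at high confidence


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    print(f"HPs as share of proteome: {percent(HYPOTHETICAL, TOTAL_PROTEINS)} %")
    print(f"HPs annotated with high confidence: {percent(ANNOTATED, HYPOTHETICAL)} %")
    print(f"HPs still unannotated: {HYPOTHETICAL - ANNOTATED}")
```

Under these figures, roughly a quarter of the proteome is hypothetical, and a little over half of those HPs received a high-confidence functional prediction.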


Mycobacterium; Hypothetical proteins; Functional annotation; ROC analysis



The authors acknowledge the Department of Applied Sciences, Indian Institute of Information Technology, Allahabad, for providing the computing facility.

Supplementary material

13721_2017_147_MOESM1_ESM.docx (141 kb)
S1 Table: List of predicted physicochemical properties of 1055 HPs from Mycobacterium tuberculosis (DOCX 140 kb)
13721_2017_147_MOESM2_ESM.docx (87 kb)
S2 Table: List of predicted subcellular localizations of 1055 HPs from Mycobacterium tuberculosis (DOCX 86 kb)
13721_2017_147_MOESM3_ESM.docx (93 kb)
S3 Table: List of predicted results of HMMER, BLAST and InterProScan for 1055 HPs from Mycobacterium tuberculosis (DOCX 93 kb)
13721_2017_147_MOESM4_ESM.docx (71 kb)
S4 Table: List of predicted results of SUPERFAMILY and Pfam for 1055 HPs from Mycobacterium tuberculosis (DOCX 70 kb)
13721_2017_147_MOESM5_ESM.docx (55 kb)
S5 Table: List of virulence factors predicted among 1055 HPs from Mycobacterium tuberculosis using VICMpred and VirulentPred (DOCX 54 kb)
13721_2017_147_MOESM6_ESM.docx (45 kb)
S6 Table: List of annotated functions of 100 proteins with known function from Mycobacterium tuberculosis using BLASTp, HMMER, InterProScan, SUPERFAMILY and Pfam for ROC analysis (DOCX 44 kb)
13721_2017_147_MOESM7_ESM.docx (42 kb)
S7 Table: Functionally annotated HPs from Mycobacterium tuberculosis with a high level of confidence (DOCX 41 kb)
13721_2017_147_MOESM8_ESM.docx (16 kb)
S8 Table: List of the more virulent HPs among the 578 HPs of Mycobacterium tuberculosis annotated with a high level of confidence (DOCX 16 kb)



Copyright information

© Springer-Verlag Wien 2017

Authors and Affiliations

  • Utkarsh Raj (1)
  • Aman Kumar Sharma (2)
  • Imlimaong Aier (1)
  • Pritish Kumar Varadwaj (1, corresponding author)
  1. Department of Applied Sciences, Indian Institute of Information Technology-Allahabad, Allahabad, India
  2. Department of Applied Chemistry, Sardar Vallabhbhai National Institute of Technology, Surat, India
