Investigation of DNA discontinuity for detecting tuberculosis

  • Sonia Farhana Nimmy
  • Md. Golam Sarowar
  • Nilanjan Dey
  • Amira S. Ashour
  • K. C. Santosh
Original Research


Discontinuity in long deoxyribonucleic acid (DNA) sequences can give rise to harmful diseases. Changes in the DNA structure correspond to changes in the human immune system. Tuberculosis is a critical disease that causes coughing, fatigue, unintentional weight loss and fever in elderly people due to disorder in the DNA. Breaks or mutations over long DNA sequences are the pivotal reasons for this fatal disease. This study developed an automated machine learning technique to assess the total number of such breaks in long DNA sequences. Data cleansing and deep neural network techniques are applied to handle this big data. The National Center for Biotechnology Information (NCBI) database was used to extract the amino acid sequences for tuberculosis from the big DNA datasets. Results reveal that the proposed automated approach is significantly effective for determining DNA sequence breaks for tuberculosis, owing to the high sensitivity of the Markov chain as well as the effective normalization techniques. The approach fixes the size of the training datasets and recursively divides the whole dataset into segments of a certain length. The study also adopts multiple prediction approaches, namely the hidden Markov chain, Box-Cox transformation and linear transformation, to forecast the breaks at any position in the long sequences of the training and testing datasets. The results demonstrated that the hidden Markov chain model provided faster analysis with more accurate and reliable results.
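The segmentation and normalization steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`split_fixed`, `box_cox`, `min_max`), the window size, the fixed Box-Cox parameter `lam`, and the per-window feature (a count of adjacent symbol changes) are all assumptions chosen for illustration, since the abstract does not define how a break is measured. The Box-Cox formula itself follows Box and Cox (1964).

```python
import math
from typing import List


def split_fixed(seq: str, size: int) -> List[str]:
    """Split a sequence into consecutive windows of at most `size` symbols."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def box_cox(values: List[float], lam: float) -> List[float]:
    """Box-Cox transform with a fixed lambda; inputs must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def min_max(values: List[float]) -> List[float]:
    """Linear (min-max) rescaling of the values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def change_count(window: str) -> float:
    """Hypothetical feature: adjacent symbol changes, plus 1 to stay positive."""
    return 1.0 + sum(1 for a, b in zip(window, window[1:]) if a != b)


if __name__ == "__main__":
    sequence = "AAACCGTTTA"            # toy input, not NCBI data
    windows = split_fixed(sequence, 5)  # assumed window size
    features = [change_count(w) for w in windows]
    print(windows)
    print(min_max(box_cox(features, 0.5)))  # assumed lambda
```

In practice the Box-Cox parameter is usually estimated from the data rather than fixed, and the windowed features would feed a sequence model such as a hidden Markov model; both are outside the scope of this sketch.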


Hypothetical proteins, Deep neural network, Hidden Markov model, Tuberculosis, Normalization




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Notre Dame University, Dhaka, Bangladesh
  2. Department of Computer Science and Engineering, East West University, Dhaka, Bangladesh
  3. Department of Information Technology, Techno India College of Technology, Kolkata, India
  4. Department of Electronics and Electrical Communications Engineering, Faculty of Engineering, Tanta University, Tanta, Egypt
  5. Department of Computer Science, University of South Dakota (USD), Vermillion, USA
