Disease Detection and Identification Using Sequence Data and Information Retrieval Methods

  • Sankranti Joshi
  • Pai M. Radhika
  • Pai M. M. Manohara
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 43)


Current clinical methods base disease detection and identification heavily on the patient's description of symptoms. This leads to inaccuracy, because errors can arise when the symptoms are quantified, and it does not give a complete picture of whether a particular disease is present. Predicting cellular diseases is even more challenging, because there is no measure of their exact quantity, quality, or severity. The typical symptoms of these diseases become visible only at a later stage, which allows the disease to progress silently. This paper presents a novel and efficient way of detecting and identifying pancreatitis and breast cancer. It combines sequence data with information retrieval algorithms to produce accurate results. The developed system maintains a knowledge base of mutations that cause breast cancer and pancreatitis. It uses protein sequence scoring and information retrieval techniques to find the best match between a patient's protein sequence and the stored mutations. The system has been tested with mutations available online and gives results that are 98 % accurate.
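The matching step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that "protein sequence scoring" means Needleman-Wunsch global alignment (named in the keywords) and uses a hypothetical scoring scheme of +1 for a match, -1 for a mismatch, and -1 per gap. The knowledge base labels and sequences (`variant_A`, `variant_B`, `variant_C`) are toy placeholders, not real disease mutations, and the "best match" is simply the stored sequence with the highest alignment score.

```python
from typing import Dict, Tuple

# Hypothetical scoring scheme (assumption): simple match/mismatch/gap values.
# A real protein pipeline might use a substitution matrix such as BLOSUM62.
MATCH, MISMATCH, GAP = 1, -1, -1


def needleman_wunsch_score(a: str, b: str) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * GAP  # a[:i] aligned against gaps only
    for j in range(1, cols):
        dp[0][j] = j * GAP  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            dp[i][j] = max(diag, dp[i - 1][j] + GAP, dp[i][j - 1] + GAP)
    return dp[-1][-1]


def best_match(patient: str, knowledge_base: Dict[str, str]) -> Tuple[str, int]:
    """Return (label, score) of the stored sequence that aligns best with patient."""
    return max(
        ((label, needleman_wunsch_score(patient, seq)) for label, seq in knowledge_base.items()),
        key=lambda item: item[1],
    )


# Toy, hypothetical knowledge base: labels and sequences are placeholders,
# not real pancreatitis or breast cancer mutations.
KB = {
    "variant_A": "MKTAYIAKQR",
    "variant_B": "MKTAHIAKQR",
    "variant_C": "GGSSGGSSGG",
}

if __name__ == "__main__":
    print(best_match("MKTAHIAKQR", KB))  # variant_B is an exact match
```

Global alignment scores the whole patient sequence against each stored sequence end to end, which suits comparing full-length variants of the same protein. A local method such as Smith-Waterman would be the usual alternative if only a short mutated region were stored.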


Keywords: Levenshtein edit distance · Needleman-Wunsch · Parallel programming · Sequence data · Disease detection
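The keywords also name Levenshtein edit distance and parallel programming. The sketch below shows one plausible way these could fit together, under stated assumptions: each stored sequence is ranked by its edit distance to the patient's sequence, and the comparisons are spread across a worker pool. The function `rank_by_distance`, its `workers` parameter, and the toy knowledge base are hypothetical. `ThreadPoolExecutor` only illustrates the structure; in CPython, a CPU-bound workload would usually need `ProcessPoolExecutor` to gain a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                    # delete ca
                curr[j - 1] + 1,                # insert cb
                prev[j - 1] + int(ca != cb),    # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def rank_by_distance(patient: str, knowledge_base: Dict[str, str],
                     workers: int = 4) -> List[Tuple[str, int]]:
    """Hypothetical helper: return (label, distance) pairs, closest first."""
    labels = list(knowledge_base)
    # Each comparison is independent, so the pool can score them concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        distances = list(pool.map(lambda label: levenshtein(patient, knowledge_base[label]), labels))
    return sorted(zip(labels, distances), key=lambda item: item[1])


# Toy, hypothetical knowledge base (placeholders, not real mutations).
KB = {
    "variant_A": "MKTAYIAKQR",
    "variant_B": "MKTAHIAKQR",
    "variant_C": "GGSSGGSSGG",
}

if __name__ == "__main__":
    print(rank_by_distance("MKTAHIAKQR", KB))
```

Unlike an alignment score, edit distance treats every substitution equally, so it ignores how chemically similar two amino acids are. It is cheap and easy to parallelise, however, because each stored sequence is compared independently.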



Copyright information

© Springer India 2016

Authors and Affiliations

  • Sankranti Joshi, Pai M. Radhika, and Pai M. M. Manohara: Department of Information and Communication Technology, Manipal Institute of Technology, Manipal, India
