An Efficient Framework for Prediction in Healthcare Data Using Soft Computing Techniques

  • Veena H. Bhat
  • Prasanth G. Rao
  • S. Krishna
  • P. Deepa Shenoy
  • K. R. Venugopal
  • L. M. Patnaik
Part of the Communications in Computer and Information Science book series (CCIS, volume 192)


Healthcare organizations aim to derive valuable insights by applying data mining and soft computing techniques to the vast data stores they have accumulated over the years. These data, however, may contain missing, incorrect and, most often, incomplete instances that can degrade predictive analytics on healthcare data. Preprocessing of such data, specifically the imputation of missing values, is a challenge for reliable modeling. This work presents a novel preprocessing phase with missing value imputation for both numerical and categorical data. A hybrid combination of Classification and Regression Trees (CART) and Genetic Algorithms is adopted to impute missing continuous values, and Self Organizing Feature Maps (SOFM) are adopted to impute categorical values. Further, an Artificial Neural Network (ANN) is used to validate the improved prediction accuracy after imputation. To evaluate this model, we use the PIMA Indians Diabetes Data set (PIDD) and the Mammographic Mass Data (MMD). The accuracy of the proposed model, which emphasizes the preprocessing phase, is shown to be superior to that of existing techniques. This approach is simple, easy to implement and practically reliable.
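As a rough illustration of the tree-based part of this idea, the following minimal sketch imputes a missing continuous value with a small CART-style regression tree grown on the complete records. It is not the authors' implementation: the abstract does not say how the Genetic Algorithm is combined with CART, so that step is omitted, and the function names (build_tree, predict, impute_column), the tree settings, and the toy records are all assumptions made for this example. The sketch also assumes that every column other than the one being imputed is complete.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (regression impurity)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, depth=0, max_depth=3, min_leaf=2):
    """Grow a small regression tree; each leaf stores the mean target."""
    if depth >= max_depth or len(rows) < 2 * min_leaf or len(set(targets)) == 1:
        return {"leaf": mean(targets)}
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[j] <= threshold]
            right = [i for i, r in enumerate(rows) if r[j] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, threshold, left, right)
    if best is None:  # no split satisfies min_leaf
        return {"leaf": mean(targets)}
    _, j, threshold, left, right = best
    return {
        "feature": j,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [targets[i] for i in left],
                           depth + 1, max_depth, min_leaf),
        "right": build_tree([rows[i] for i in right], [targets[i] for i in right],
                            depth + 1, max_depth, min_leaf),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the stored mean."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


def impute_column(data, col):
    """Fill None entries in column `col` using a tree fit on complete rows.

    Assumes all other columns have no missing values.
    """
    others = [j for j in range(len(data[0])) if j != col]
    known = [r for r in data if r[col] is not None]
    tree = build_tree([[r[j] for j in others] for r in known],
                      [r[col] for r in known])
    filled = []
    for r in data:
        row = list(r)
        if row[col] is None:
            row[col] = predict(tree, [row[j] for j in others])
        filled.append(row)
    return filled


# Toy records (made-up values, not taken from PIDD or MMD).
data = [[1, 10.0], [2, 10.0], [3, 10.0], [7, 50.0], [8, 50.0], [9, 50.0],
        [2.5, None], [8.5, None]]
filled = impute_column(data, 1)
print(filled[6], filled[7])
```

Each missing entry receives the mean of the leaf its record falls into, so the imputed value depends on the other attributes of that record rather than on a single global mean for the column.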


Keywords: imputation; soft computing; categorical data; continuous data





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Veena H. Bhat (1, 2)
  • Prasanth G. Rao (3)
  • S. Krishna (1)
  • P. Deepa Shenoy (1)
  • K. R. Venugopal (1)
  • L. M. Patnaik (4)
  1. Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore, India
  2. IBS-Bangalore, Bangalore, India
  3. Bangalore, India
  4. Defense Institute of Advanced Technology, Pune, India
