Cluster Computing, Volume 22, Supplement 1, pp 2437–2460

A parameter based growing ensemble of self-organizing maps for outlier detection in healthcare

  • Samir Elmougy
  • M. Shamim Hossain
  • Ahmed S. Tolba
  • Mohammed F. Alhamid
  • Ghulam Muhammad

Abstract

Outlier detection is critical in many applications, including healthcare, health insurance, medical diagnosis, predictive analytics, pattern recognition, intrusion detection, anomaly or defect detection, video surveillance, credit card fraud detection, and text mining. Outlier detection techniques can be statistical, distance-based, or model-based. Techniques that rely on a single detection method have characteristic strengths and weaknesses and are often unstable. Outlier detection ensembles harness the strengths of the individual detectors and result in stable performance. This paper presents a new parameter-based growing self-organizing map ensemble (GSOME) for outlier detection in multivariate patterns. To detect outliers, the proposed GSOME transforms the non-linear relationships between high-dimensional patterns into a simple 1D geometric relationship: whatever its dimensionality, each pattern is mapped to a single point on a line. The dispersion of the mapped points is then used to locate the outliers and to measure their degree of outlyingness. Several experiments on both real and synthetic data sets show promising performance for the proposed GSOME.
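
The idea of collapsing each multivariate pattern to one point on a line and then examining the dispersion of those points can be illustrated with a small sketch. This is not the authors' GSOME algorithm, whose details the abstract does not give. It assumes that the 1D value is a pattern's mean quantization error over an ensemble of 1D self-organizing maps of increasing size, that the maps are trained on a reference sample, and that dispersion is judged with a median/MAD robust z-score. The function names, map sizes, decay schedules, and the 3.5 threshold are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def train_1d_som(data, n_units, epochs=20, seed=0):
    """Train a 1D SOM: a chain of n_units weight vectors fitted to data."""
    rng = random.Random(seed)
    # Start each unit at a randomly chosen training pattern.
    units = [list(rng.choice(data)) for _ in range(n_units)]
    order = list(range(len(data)))
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac) + 0.01                   # learning rate decays
        radius = max(0.5 * n_units * (1.0 - frac), 0.5)  # neighbourhood shrinks
        rng.shuffle(order)
        for i in order:
            x = data[i]
            # Best-matching unit: the unit closest to the pattern.
            bmu = min(range(n_units), key=lambda j: math.dist(units[j], x))
            for j, w in enumerate(units):
                # Gaussian neighbourhood along the chain, centred on the BMU.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return units


def ensemble_scores(soms, patterns):
    """Map each pattern to one scalar: its mean quantization error over all maps."""
    return [
        statistics.fmean(min(math.dist(w, x) for w in units) for units in soms)
        for x in patterns
    ]


def flag_outliers(scores, threshold=3.5):
    """Flag scores whose robust z-score (median/MAD) exceeds the threshold."""
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    if mad == 0.0:
        # Degenerate case: no spread, so flag anything above the median.
        return [s > med for s in scores]
    return [0.6745 * (s - med) / mad > threshold for s in scores]


# Reference patterns: a 3x3x3 grid in the unit cube (illustrative data only).
grid = [(a / 2, b / 2, c / 2) for a in range(3) for b in range(3) for c in range(3)]
# Ensemble of maps of growing size, each with its own seed for diversity.
soms = [train_1d_som(grid, n, seed=n) for n in (2, 4, 8)]
patterns = grid + [(10.0, 10.0, 10.0)]  # append one far-away pattern
scores = ensemble_scores(soms, patterns)
flags = flag_outliers(scores)
print(flags[-1], max(range(len(scores)), key=scores.__getitem__))
```

Because every unit stays inside the convex hull of the reference sample, the appended far-away pattern necessarily receives the largest score in this toy setting. Real data would need feature scaling and a threshold chosen for the application.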

Keywords

  • Self-organizing map
  • Outlier detection
  • Anomaly detection
  • Ensembles
  • Diversity
  • Healthcare

Notes

Acknowledgements

This work was supported by the Deanship of Scientific Research at King Saud University, Riyadh, Saudi Arabia, through the Research Group Project under Grant RG-1436-023.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  • Samir Elmougy (1)
  • M. Shamim Hossain (2), corresponding author
  • Ahmed S. Tolba (1)
  • Mohammed F. Alhamid (2)
  • Ghulam Muhammad (3)
  1. Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
  2. Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
  3. Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
