
A soft computing model based on asymmetric Gaussian mixtures and Bayesian inference

  • Methodologies and Application

Abstract

A novel unsupervised Bayesian learning framework based on the asymmetric Gaussian mixture (AGM) statistical model is proposed, since the AGM has been shown to be more effective than the classic Gaussian mixture model. The Bayesian learning framework is developed by adopting a sampling-based Markov chain Monte Carlo (MCMC) methodology. More precisely, the core learning algorithm is a hybrid Metropolis–Hastings within Gibbs sampler integrated into a reversible jump MCMC (RJMCMC) framework, a self-adapting sampling-based implementation that allows moves between models of different complexity during mixture parameter learning and therefore converges automatically to the optimal number of data groups. Furthermore, to handle high-dimensional feature vectors, a dimensionality reduction algorithm based on mixtures of distributions is included to deal with irrelevant and extraneous features. A performance comparison between the AGM and other popular models is given, and both synthetic and real datasets drawn from challenging applications such as intrusion detection, spam filtering and image categorization are used to demonstrate the merits of the proposed approach.
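To make the model concrete, the minimal Python sketch below illustrates the kind of density an AGM component uses and one random-walk Metropolis–Hastings move on a component mean, of the sort that would be embedded in a Gibbs sweep. It assumes the common asymmetric Gaussian parameterization with separate left and right standard deviations per dimension, independent dimensions, and a flat prior on the mean; the function names (asym_gauss_logpdf, mixture_loglik, mh_update_mu) and all implementation details are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def asym_gauss_logpdf(x, mu, sigma_l, sigma_r):
    """Log-density of a d-dimensional asymmetric Gaussian with per-dimension
    left/right standard deviations; dimensions treated as independent (assumed)."""
    x, mu = np.atleast_1d(x).astype(float), np.atleast_1d(mu).astype(float)
    sigma_l = np.atleast_1d(sigma_l).astype(float)
    sigma_r = np.atleast_1d(sigma_r).astype(float)
    sigma = np.where(x < mu, sigma_l, sigma_r)             # side-dependent scale
    log_norm = 0.5 * np.log(2.0 / np.pi) - np.log(sigma_l + sigma_r)
    return float(np.sum(log_norm - 0.5 * ((x - mu) / sigma) ** 2))

def mixture_loglik(X, weights, mus, sigmas_l, sigmas_r):
    """Log-likelihood of the dataset X under a K-component AGM."""
    K = len(weights)
    comp = np.array([[np.log(weights[k]) +
                      asym_gauss_logpdf(x, mus[k], sigmas_l[k], sigmas_r[k])
                      for k in range(K)] for x in X])       # shape (N, K)
    m = comp.max(axis=1, keepdims=True)                      # log-sum-exp trick
    return float(np.sum(m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))))

def mh_update_mu(X_k, mu, sigma_l, sigma_r, rng, step=0.1):
    """One random-walk Metropolis-Hastings move on a component mean, given the
    observations X_k currently assigned to that component (flat prior assumed)."""
    mu = np.atleast_1d(mu).astype(float)
    proposal = mu + step * rng.standard_normal(mu.shape)
    log_ratio = (sum(asym_gauss_logpdf(x, proposal, sigma_l, sigma_r) for x in X_k)
                 - sum(asym_gauss_logpdf(x, mu, sigma_l, sigma_r) for x in X_k))
    return proposal if np.log(rng.uniform()) < log_ratio else mu

# Toy usage: two 1-D components and a handful of points.
rng = np.random.default_rng(0)
X = np.array([[-2.1], [-1.9], [0.2], [3.8], [4.3]])
print(mixture_loglik(X, weights=[0.6, 0.4],
                     mus=[np.array([-2.0]), np.array([4.0])],
                     sigmas_l=[np.array([0.5]), np.array([1.0])],
                     sigmas_r=[np.array([1.0]), np.array([0.5])]))
```

A full sampler along the lines described in the abstract would add updates for the mixing weights, the left and right scales, the component indicator variables, and the reversible jump birth/death moves that change the number of components.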


References

  • Al-Janabi S (2017) Pragmatic miner to risk analysis for intrusion detection (PMRA-ID). In: International conference on soft computing in data science. Springer, pp 263–277

  • Al-Janabi S, Alkaim AF (2019) A nifty collaborative analysis to predicting a novel tool (DRFLLS) for missing values estimation. Soft Comput. https://doi.org/10.1007/s00500-019-03972-x

  • Al-Janabi S, AlShourbaji I (2016) A study of cyber security awareness in educational environment in the middle east. JIKM. https://doi.org/10.1142/S0219649216500076

    Article  Google Scholar 

  • Al-Janabi S, Alwan E (2017) Soft mathematical system to solve black box problem through development the FARB based on hyperbolic and polynomial functions. In: 10th International conference on developments in eSystems engineering, DeSE 2017, Paris, France, June 14–16, 2017. IEEE, pp 37–42. https://doi.org/10.1109/DeSE.2017.23

  • Al-Janabi S, Mahdi M (2019) Evaluation prediction techniques to achievement an optimal biomedical analysis. Int J Grid Util Comput. https://doi.org/10.1007/s00500-019-03959-8

    Article  Google Scholar 

  • Al-Janabi S, Rawat S, Patel A, Al-Shourbaji I (2015) Design and evaluation of a hybrid system for detection and prediction of faults in electrical transformers. Int J Electr Power Energy Syst 67:324–335

    Article  Google Scholar 

  • Al-Janabi S, Salman MA, Fanfakh A (2018a) Recommendation system to improve time management for people in education environments. J Eng Appl Sci 13(24):10182–10193

    Google Scholar 

  • Al-Janabi S, Al-Shourbaji I, Salman MA (2018b) Assessing the suitability of soft computing approaches for forest fires prediction. Appl Comput Inf 14(2):214–224

    Google Scholar 

  • Ashfaq RAR, Wang XZ, Huang JZ, Abbas H, He YL (2017) Fuzziness based semi-supervised learning approach for intrusion detection system. Inf Sci 378:484–497

    Article  Google Scholar 

  • Azam M, Bouguila N (2015) Unsupervised keyword spotting using bounded generalized gaussian mixture model with ICA. In: 2015 IEEE global conference on signal and information processing, GlobalSIP 2015, Orlando, FL, USA, December 14–16, 2015. IEEE, pp 1150–1154

  • Bouguila N (2007) Spatial color image databases summarization. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing, ICASSP 2007, Honolulu, Hawaii, USA, April 15–20, 2007. IEEE, pp 953–956

  • Bouguila N (2011a) Bayesian hybrid generative discriminative learning based on finite liouville mixture models. Pattern Recognit 44(6):1183–1200

    Article  Google Scholar 

  • Bouguila N (2011b) Count data modeling and classification using finite mixtures of distributions. IEEE Trans Neural Netw 22(2):186–198

    Article  Google Scholar 

  • Bouguila N, Elguebaly T (2012) A fully bayesian model based on reversible jump MCMC and finite beta mixtures for clustering. Expert Syst Appl 39(5):5946–5959

    Article  Google Scholar 

  • Bouguila N, Ziou D (2004a) Dirichlet-based probability model applied to human skin detection [image skin detection]. In: 2004 IEEE international conference on acoustics, speech, and signal processing, ICASSP 2004, Montreal, Quebec, Canada, May 17–21, 2004. IEEE, pp 521–524

  • Bouguila N, Ziou D (2004b) Improving content based image retrieval systems using finite multinomial dirichlet mixture. In: Proceedings of the 2004 14th IEEE signal processing society workshop on machine learning for signal processing. IEEE, pp 23–32

  • Bouguila N, Ziou D (2004c) A powreful finite mixture model based on the generalized dirichlet distribution: Unsupervised learning and applications. In: 17th International conference on pattern recognition, ICPR 2004, Cambridge, UK, August 23–26, 2004. IEEE Computer Society, pp 280–283

  • Bouguila N, Ziou D, Monga E (2006) Practical bayesian estimation of a finite beta mixture through gibbs sampling and its applications. Stat Comput 16(2):215–225

    Article  MathSciNet  Google Scholar 

  • Bouguila N, Ziou D, Hammoud RI (2009) On bayesian analysis of a finite generalized dirichlet mixture via a metropolis-within-gibbs sampling. Pattern Anal Appl 12(2):151–166

    Article  MathSciNet  Google Scholar 

  • Bouguila N, Ziou D, Boutemedjet S (2011) Simultaneous non-gaussian data clustering, feature selection and outliers rejection. In: Kuznetsov SO, Mandal DP, Kundu MK, Pal SK (eds) Pattern Recognition and Machine Intelligence-4th International Conference, PReMI 2011, Moscow, Russia, June 27–July 1, 2011. Proceedings, Lecture Notes in Computer Science, vol 6744. Springer, pp 364–369

  • Bourouis S, Mashrgy MA, Bouguila N (2014) Bayesian learning of finite generalized inverted dirichlet mixtures: application to object classification and forgery detection. Expert Syst Appl 41(5):2329–2336

    Article  Google Scholar 

  • Boutemedjet S, Ziou D, Bouguila N (2007) Unsupervised feature selection for accurate recommendation of high-dimensional image data. In: Platt JC, Koller D, Singer Y, Roweis ST (eds) Advances in Neural information processing systems 20, proceedings of the twenty-first annual conference on neural information processing systems, Vancouver, British Columbia, Canada, December 3–6, 2007. Curran Associates, Inc., Red Hook, pp 177–184

    Google Scholar 

  • Boutemedjet S, Bouguila N, Ziou D (2009) A hybrid feature extraction selection approach for high-dimensional non-gaussian data clustering. IEEE Trans Pattern Anal Mach Intel 31(8):1429–1443

    Article  Google Scholar 

  • Buczak AL, Guven E (2016) A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun Surv Tutor 18(2):1153–1176

    Article  Google Scholar 

  • Casella G, Robert CP, Wells MT (2004) Mixture models, latent variables and partitioned importance sampling. Stat Methodol 1(1–2):1–18

    Article  MathSciNet  Google Scholar 

  • Csurka G, Dance C, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: Workshop on statistical learning in computer vision, ECCV, Prague, vol 1, pp 1–2

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (methodological) 39:1–38

    MathSciNet  MATH  Google Scholar 

  • Dy JG, Brodley CE (2004) Feature selection for unsupervised learning. J Mach Learn Res 5:845–889

    MathSciNet  MATH  Google Scholar 

  • Elguebaly T, Bouguila N (2011) Bayesian learning of finite generalized gaussian mixture models on images. Signal Process 91(4):801–820

    Article  Google Scholar 

  • Elguebaly T, Bouguila N (2013) Simultaneous bayesian clustering and feature selection using RJMCMC-based learning of finite generalized dirichlet mixture models. Signal Process 93(6):1531–1546

    Article  Google Scholar 

  • Elguebaly T, Bouguila N (2014) Background subtraction using finite mixtures of asymmetric gaussian distributions and shadow detection. Mach Vis Appl 25(5):1145–1162

    Article  Google Scholar 

  • Elguebaly T, Bouguila N (2015) Simultaneous high-dimensional clustering and feature selection using asymmetric gaussian mixture models. Image Vis Comput 34:27–41

    Article  Google Scholar 

  • Fan W, Bouguila N (2011) Infinite dirichlet mixture model and its application via variational bayes. In: 2011 10th international conference on machine learning and applications and workshops (ICMLA), vol 1. IEEE, pp 129–132

  • Geman S, Donald G (1987) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. In: Readings in computer vision. Morgan Kaufmann, pp 564–584

  • Hartigan JA, Wong MA (1979) Algorithm as 136: a k-means clustering algorithm. J R Stat Soc Ser C (Applied Statistics) 28(1):100–108

    MATH  Google Scholar 

  • Hastings WK (1970) Monte carlo sampling methods using markov chains and their applications. Biometrika 57(1):97–109

    Article  MathSciNet  Google Scholar 

  • Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intel 97(1–2):273–324

    Article  Google Scholar 

  • Lab K (2018) Spam: share of global email traffic 2014–2017. https://www.statista.com/statistics/420391/spam-email-traffic-share/

  • Law MH, Figueiredo MA, Jain AK (2004) Simultaneous feature selection and clustering using mixture models. IEEE Trans Pattern Anal Mach Intel 26(9):1154–1166

    Article  Google Scholar 

  • Lee W, Stolfo SJ (1998) Data mining approaches for intrusion detection. In: Rubin AD (ed) Proceedings of the 7th USENIX security symposium, San Antonio, TX, USA, January 26–29 1998. USENIX Association, Berkeley

    Google Scholar 

  • Li L-J, Fei-Fei L (2007) What, where and who? classifying events by scene and object recognition. Iccv 2(5):6

    Google Scholar 

  • Lowe DG (1999) Object recognition from local scale-invariant features. In: The proceedings of the seventh IEEE international conference on Computer vision, vol 2. IEEE, pp 1150–1157

  • Luengo D, Martino L (2013) Fully adaptive gaussian mixture metropolis-hastings algorithm. In: IEEE international conference on acoustics, speech and signal processing, ICASSP 2013, Vancouver, BC, Canada, May 26–31, 2013. IEEE, pp 6148–6152

  • Mao KZ (2005) Identifying critical variables of principal components for unsupervised feature selection. IEEE Trans Syst Man Cybern Part B 35(2):339–344

    Article  MathSciNet  Google Scholar 

  • Mark H, Erik R, George F, Jaap Suermondt of Hewlett-Packard Labs (2018) UCI machine learning repository: spambase data set. http://archive.ics.uci.edu/ml/datasets/Spambase

  • Najar F, Bourouis S, Zaguia A, Bouguila N, Belghith S (2018) Unsupervised human action categorization using a riemannian averaged fixed-point learning of multivariate GGMM. In: International conference image analysis and recognition. Springer, pp 408–415

  • Patel A, Al-Janabi S, AlShourbaji I, Pedersen JM (2015) A novel methodology towards a trusted environment in mashup web applications. Comput Secur 49:107–122. https://doi.org/10.1016/j.cose.2014.10.009

    Article  Google Scholar 

  • Raudys S, Jain AK (1991) Small sample size effects in statistical pattern recognition: recommendations for practitioners. IEEE Trans Pattern Anal Mach Intell 13(3):252–264

    Article  Google Scholar 

  • Richardson S, Green PJ (1997) On bayesian analysis of mixtures with an unknown number of components (with discussion). J R Stat Soc Ser B (Statistical Methodology) 59(4):731–792

    Article  Google Scholar 

  • Stephens M (2000) Bayesian analysis of mixture models with an unknown number of components-an alternative to reversible jump methods. Annals Stat 28:40–74

    Article  MathSciNet  Google Scholar 

  • Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP 99 data set. In: 2009 IEEE symposium on computational intelligence for security and defense applications, CISDA 2009, Ottawa, Canada, July 8–10, 2009. IEEE, pp 1–6

  • Tsai C, Chiu C (2008) Developing a feature weight self-adjustment mechanism for a k-means clustering algorithm. Comput Stat Data Anal 52(10):4658–4672

    Article  MathSciNet  Google Scholar 

  • Wen CK, Jin S, Wong KK, Chen JC, Ting P (2015) Channel estimation for massive mimo using gaussian-mixture bayesian learning. IEEE Trans Wirel Commun 14(3):1356–1368

    Article  Google Scholar 

  • Yang J, Liao X, Yuan X, Llull P, Brady DJ, Sapiro G, Carin L (2015) Compressive sensing by learning a gaussian mixture model from measurements. IEEE Trans Image Process 24(1):106–119

    Article  MathSciNet  Google Scholar 

Download references

Acknowledgements

The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC) (Grant No. 6656) and to Concordia University through a Tier II Research Chair.

Author information


Corresponding author

Correspondence to Shuai Fu.

Ethics declarations

Conflicts of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Fu, S., Bouguila, N. A soft computing model based on asymmetric Gaussian mixtures and Bayesian inference. Soft Comput 24, 4841–4853 (2020). https://doi.org/10.1007/s00500-019-04238-2

