SPMoE: a novel subspace-projected mixture of experts model for multi-target regression problems

Hadavandi, Esmaeil; Shahrabi, Jamal; Hayashi, Yoichi

doi:10.1007/s00500-015-1623-7

Methodologies and Application
Published: 03 March 2015

Volume 20, pages 2047–2065, (2016)

Soft Computing

Esmaeil Hadavandi¹,
Jamal Shahrabi¹ &
Yoichi Hayashi²

Abstract

In this paper, we focus on modeling multi-target regression problems with high-dimensional feature spaces and a small number of instances, a setting common in many real-life predictive modeling problems. With the aim of designing an accurate prediction tool, we present a novel mixture of experts (MoE) model called subspace-projected MoE (SPMoE). The experts of the SPMoE are trained in a boosting-like manner that combines ideas from the subspace projection method and the negative correlation learning (NCL) algorithm. Instead of using the whole original input space for training the experts, we develop a new cluster-based subspace projection method that, at each step of the boosting approach, yields a projected subspace focused on the difficult instances, so that diverse experts are trained. The experts of the SPMoE are trained on the obtained subspaces using a new NCL algorithm called sequential NCL. The SPMoE is compared with other ensemble models on three real cases of high-dimensional multi-target regression: electrical discharge machining, energy efficiency, and an important problem in the field of operations strategy called the practice–performance problem. The experimental results show that the prediction accuracy of the SPMoE is significantly better than that of the other ensemble and single models, and that it can be considered a promising alternative for modeling high-dimensional multi-target regression problems.

Notes

To avoid confusion, we emphasize that the subspace projection at each step of the boosting approach ($S_t$) is constructed using only the difficult instances; the whole training set is then transformed with this projection to train the next NN expert.
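The distinction drawn in this note (fit the projection on the difficult instances only, then apply it to every training instance) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' cluster-based projection: a one-dimensional PCA direction found by power iteration stands in for $S_t$, "difficult" is assumed to mean the `m` instances with the largest absolute error, and the helper names `difficult_indices`, `leading_direction` and `project` are hypothetical.

```python
import math

def difficult_indices(abs_errors, m):
    """Indices of the m instances with the largest absolute error
    (an assumed notion of 'difficult', not taken from the paper)."""
    order = sorted(range(len(abs_errors)), key=lambda i: abs_errors[i], reverse=True)
    return sorted(order[:m])

def leading_direction(rows, iters=100):
    """Mean and leading principal direction of `rows`, via power iteration.
    A PCA stand-in for the projection S_t; assumes the start vector is not
    orthogonal to the leading direction."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)] for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v

def project(rows, mean, direction):
    """Apply the fitted projection to any set of instances."""
    return [sum((r[j] - mean[j]) * direction[j] for j in range(len(direction)))
            for r in rows]

# Toy data: five training instances and their current absolute errors.
X = [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0], [1.0, 5.0], [3.0, -5.0]]
abs_errors = [3.0, 2.5, 2.8, 0.1, 0.2]

hard = difficult_indices(abs_errors, m=3)                   # projection is FIT on these only
mean, direction = leading_direction([X[i] for i in hard])
Z = project(X, mean, direction)                             # but APPLIED to the whole training set
```

Here the projection is estimated from three instances, yet `Z` has one entry per training instance, which is the point the note makes.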

Acknowledgments

The authors express deep gratitude to Prof. Moattar Husseini and Dr. Hajirezaei for their many constructive suggestions and support. The authors also wish to thank two anonymous referees for their helpful comments, which greatly improved the paper.

Author information

Authors and Affiliations

Department of Industrial Engineering, Amirkabir University of Technology, P.O. Box 15875-4413, Tehran, Iran
Esmaeil Hadavandi & Jamal Shahrabi
Department of Computer Science, Meiji University Tama-ku, Kawasaki, 214-8571, Japan
Yoichi Hayashi

Corresponding author

Correspondence to Jamal Shahrabi.

Additional information

Communicated by V. Loia.

Appendix

The Friedman test ranks the models so that the best-performing model gets rank 1, the second best rank 2, and so on. Let $r_i^j$ be the rank of the $j$th of $k$ models on the $i$th of $N$ observations (treatments). The Friedman test compares the average ranks of the models, $R_j =\frac{1}{N}\sum _{i=1}^N r_i^j$. Under the null hypothesis, which states that all the models are equivalent and so their average ranks $R_j$ should be equal, the Friedman statistic is:
$$\begin{aligned} \chi _F^2 =\frac{12N}{k(k+1)}\left[ \sum _{j=1}^k R_j^2 -\frac{k(k+1)^2}{4} \right] \end{aligned}$$
(16)
$\chi _F^2$ is distributed according to $\chi ^2$ with $k-1$ degrees of freedom.
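As a minimal sketch of this computation, the following Python ranks the models within each observation, averages the ranks and evaluates the statistic with the bracket read as $\sum_j R_j^2 - k(k+1)^2/4$. The error table is made-up toy data, lower error is assumed to be better, tied values are assumed to share their average rank, and the function names are hypothetical.

```python
def rank_row(errors):
    """Ranks within one observation: lowest error gets rank 1, ties share the average rank."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    ranks = [0.0] * len(errors)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and errors[order[end + 1]] == errors[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0      # average of the tied 1-based positions
        for t in range(pos, end + 1):
            ranks[order[t]] = shared
        pos = end + 1
    return ranks

def friedman_statistic(error_table):
    """Average ranks R_j and the Friedman chi-square for an N x k table of errors."""
    n, k = len(error_table), len(error_table[0])
    rank_rows = [rank_row(row) for row in error_table]
    avg_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12.0 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0)
    return avg_ranks, chi2

# Toy table: N = 4 observations (rows), k = 3 models (columns).
errors = [
    [0.1, 0.2, 0.3],
    [0.1, 0.3, 0.2],
    [0.1, 0.2, 0.3],
    [0.1, 0.2, 0.3],
]
avg_ranks, chi2_f = friedman_statistic(errors)
```

For this toy table the average ranks are 1, 2.25 and 2.75 and the statistic is 6.5, which would be compared against the $\chi^2$ critical value with $k-1=2$ degrees of freedom (5.991 at the 0.05 level).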
Bonferroni–Dunn is a post hoc test that can be used after the Friedman test rejects the null hypothesis. It considers the performance of two models significantly different if their average ranks differ by at least the critical difference:
$$\begin{aligned} \mathrm{CD}=q_\alpha \sqrt{\frac{k(k+1)}{6N}} \end{aligned}$$
(17)
where $q_\alpha$ is the critical value $Q'$ for a multiple non-parametric comparison with a control.
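A small numeric sketch of the critical difference follows, using the form in which it is usually stated, $\mathrm{CD}=q_\alpha \sqrt{k(k+1)/(6N)}$. One assumption goes beyond the text: $q_\alpha$ is computed as the two-sided normal quantile with a Bonferroni adjustment over the $k-1$ comparisons with the control, rather than read from a table. The average ranks are made-up toy values and `bonferroni_dunn_cd` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def bonferroni_dunn_cd(k, n, alpha=0.05):
    """Critical difference for k models on n observations.
    Assumption: q_alpha is the two-sided normal quantile at alpha / (k - 1)."""
    q_alpha = NormalDist().inv_cdf(1.0 - alpha / (2.0 * (k - 1)))
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))

# Toy average ranks for k = 3 models on N = 4 observations; model 0 is the control.
avg_ranks = [1.0, 2.25, 2.75]
cd = bonferroni_dunn_cd(k=3, n=4)

# A model differs significantly from the control if its rank gap reaches CD.
differs_from_control = [abs(r - avg_ranks[0]) >= cd for r in avg_ranks[1:]]
```

With these values the critical difference is roughly 1.58, so only the third model (rank gap 1.75) is declared different from the control; the second (gap 1.25) is not.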
Holm’s test is a multiple comparison procedure that compares a control model with each of the remaining methods. The test statistic for comparing the $i$th and $j$th methods is:
$$\begin{aligned} z=\frac{R_i -R_j }{\sqrt{\frac{k(k+1)}{6N}}}. \end{aligned}$$
(18)
The $z$ value is used to find the corresponding probability ($p$ value) from the table of the normal distribution, which is then compared with an appropriate significance level $\alpha$. In the Bonferroni–Dunn comparison, this level is always $\alpha /(k-1)$, whereas Holm’s test adjusts it step by step to compensate for multiple comparisons. Holm’s test is a step-down procedure that sequentially tests the hypotheses ordered by their significance. We denote the ordered $p$ values by $p_1, p_2, \ldots$, so that $p_1 \le p_2 \le \cdots \le p_{k-1}$. Holm’s test compares each $p_i$ with $\alpha /(k-i)$, starting from the most significant $p$ value. If $p_1$ is below $\alpha /(k-1)$, the corresponding hypothesis is rejected and we are allowed to compare $p_2$ with $\alpha /(k-2)$. If the second hypothesis is rejected, the test proceeds with the third, and so on. As soon as a certain null hypothesis cannot be rejected, all the remaining hypotheses are retained as well.
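The chain from average ranks to $z$, to $p$ values, to Holm's sequential decisions can be sketched as follows. Two assumptions are not fixed by the text: the $p$ value is taken as two-sided, and a hypothesis is rejected when its $p$ value is at or below the threshold. The ranks are toy values and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def p_values_vs_control(avg_ranks, n, control=0):
    """Two-sided p value (assumed) of each model against the control, from the z statistic."""
    k = len(avg_ranks)
    se = math.sqrt(k * (k + 1) / (6.0 * n))
    normal = NormalDist()
    p_by_model = {}
    for j, rank in enumerate(avg_ranks):
        if j == control:
            continue
        z = (rank - avg_ranks[control]) / se
        p_by_model[j] = 2.0 * (1.0 - normal.cdf(abs(z)))
    return p_by_model

def holm(p_by_model, alpha=0.05):
    """Holm's procedure: test the smallest p value first and stop at the first failure."""
    m = len(p_by_model)                       # m = k - 1 comparisons with the control
    rejected = {j: False for j in p_by_model}
    ordered = sorted(p_by_model, key=p_by_model.get)
    for i, j in enumerate(ordered, start=1):
        if p_by_model[j] <= alpha / (m - i + 1):   # equals alpha / (k - i)
            rejected[j] = True
        else:
            break                             # all remaining hypotheses are retained
    return rejected

# Toy average ranks for k = 3 models on N = 4 observations; model 0 is the control.
p = p_values_vs_control([1.0, 2.25, 2.75], n=4)
rejected = holm(p, alpha=0.05)
```

In this toy case the smallest $p$ value (model 2) falls below $\alpha/2$ and is rejected, while the next one (model 1) exceeds $\alpha$ and is retained, at which point the procedure stops.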
Hochberg’s procedure is a step-up procedure that works in the opposite direction from Holm’s method: it compares the largest $p$ value with $\alpha$, the next largest with $\alpha /2$, and so forth, until it encounters a hypothesis that it can reject. All hypotheses with smaller $p$ values are then rejected as well.
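A minimal sketch of this procedure, working from the largest $p$ value downwards, is given below. The $p$ values are made up for illustration, the function name `hochberg` is hypothetical, and rejection at or below the threshold is an assumption.

```python
def hochberg(p_values, alpha=0.05):
    """Hochberg's procedure: scan from the largest p value; the first one that
    can be rejected causes it and every smaller p value to be rejected."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i], reverse=True)
    rejected = [False] * len(p_values)
    for step, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha / step:     # alpha, alpha/2, alpha/3, ...
            for j in order[step - 1:]:        # this hypothesis and all with smaller p
                rejected[j] = True
            break
    return rejected

all_rejected = hochberg([0.01, 0.04, 0.03], alpha=0.05)
only_smallest = hochberg([0.01, 0.20, 0.03], alpha=0.05)
```

In the first call the largest $p$ value (0.04) is already at or below $\alpha$, so all three hypotheses are rejected at once. In the second, 0.20 and 0.03 fail their thresholds ($\alpha$ and $\alpha/2$) and only the hypothesis with $p=0.01$ is rejected.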

About this article

Cite this article

Hadavandi, E., Shahrabi, J. & Hayashi, Y. SPMoE: a novel subspace-projected mixture of experts model for multi-target regression problems. Soft Comput 20, 2047–2065 (2016). https://doi.org/10.1007/s00500-015-1623-7

Issue Date: May 2016
DOI: https://doi.org/10.1007/s00500-015-1623-7
