
Novel Automatic Filter-Class Feature Selection for Machine Learning Regression

Conference paper: Advances in Big Data (INNS 2016)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 529)


Abstract

With the increased focus on applications of Big Data across all sectors of society, the performance of machine learning becomes essential. Efficient machine learning depends on efficient feature selection algorithms. Filter feature selection algorithms are model-free and therefore very fast, but they require a threshold to function. We have created a novel meta-filter feature selection algorithm, the Ranked Distinct Elitism Selection Filter (RDESF), which is fully automatic and composed of five common filters and a distinct selection process.
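The abstract does not spell out which five filters RDESF uses or how its distinct selection step works, but the general shape of such a meta-filter can be sketched. Below is a minimal illustration, assuming (hypothetically) Pearson correlation and mutual information as two stand-in filters and a fixed elite fraction; names such as rdesf_like_select are inventions for this sketch, not the paper's implementation.

```python
# Hypothetical sketch of a ranked-distinct-elitism meta-filter.
# Each filter ranks all features against the target; the top fraction
# ("elite") of each ranking is kept, and the distinct union of those
# elites becomes the selected feature set. Filter choice, elite fraction
# and tie handling are assumptions, not the paper's specification.
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import mutual_info_regression

def rank_by_correlation(X, y):
    # Rank feature indices by absolute Pearson correlation with y (best first).
    scores = np.array([abs(pearsonr(X[:, j], y)[0]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1]

def rank_by_mutual_info(X, y):
    # Rank feature indices by estimated mutual information with y (best first).
    return np.argsort(mutual_info_regression(X, y))[::-1]

def rdesf_like_select(X, y, filters, elite_fraction=0.2):
    # Keep the distinct union of each filter's top-ranked ("elite") features,
    # removing the need for a per-filter score threshold.
    n_elite = max(1, int(elite_fraction * X.shape[1]))
    selected = set()
    for rank in filters:
        selected.update(rank(X, y)[:n_elite].tolist())
    return sorted(selected)

# Usage: indices = rdesf_like_select(X, y, [rank_by_correlation, rank_by_mutual_info])
```

Because the union is over ranks rather than raw scores, the scheme stays threshold-free in the sense the abstract describes: no filter needs a hand-tuned cutoff value.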

To test the performance and speed of RDESF, it will be benchmarked against four other common automatic feature selection algorithms: backward selection, forward selection, NLPCA and PCA, as well as a baseline that uses no feature selection at all. The benchmarking will be performed through two experiments on two different data sets, both time-series regression problems. The prediction will be performed by a Multilayer Perceptron (MLP).
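As one illustration of a single benchmark arm, the sketch below chains PCA feature extraction into an MLP regressor with scikit-learn. The paper's actual network architecture, training procedure, component counts, and data splits are not given in this abstract, so every concrete setting here (synthetic data, 10 components, one 32-unit hidden layer) is an assumption for demonstration only.

```python
# Hypothetical benchmark arm: PCA feature extraction feeding an MLP regressor.
# Synthetic data stands in for the paper's two time-series data sets.
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=30, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale, project onto the leading principal components, then fit the MLP.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```

Swapping the PCA step for RDESF-selected feature indices (or removing it entirely for the no-selection baseline) yields the other arms of such a comparison, with prediction error and wall-clock time as the two reported measures.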

Our results show that RDESF is a strong competitor and enables a fully automatic feature selection system using filters. RDESF was outperformed only by forward selection, which was expected since forward selection is a wrapper method that includes the prediction model in the feature selection process. PCA is widely used in the machine learning literature and can be considered the default feature selection method; RDESF outperformed PCA in both experiments, in both prediction error and computational speed. RDESF is a new step toward filter-based automatic feature selection algorithms that can be used in many different applications.



Author information

Corresponding author

Correspondence to Morten Gill Wollsen.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wollsen, M.G., Hallam, J., Jørgensen, B.N. (2017). Novel Automatic Filter-Class Feature Selection for Machine Learning Regression. In: Angelov, P., Manolopoulos, Y., Iliadis, L., Roy, A., Vellasco, M. (eds) Advances in Big Data. INNS 2016. Advances in Intelligent Systems and Computing, vol 529. Springer, Cham. https://doi.org/10.1007/978-3-319-47898-2_8


  • DOI: https://doi.org/10.1007/978-3-319-47898-2_8


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47897-5

  • Online ISBN: 978-3-319-47898-2

  • eBook Packages: Engineering (R0)
