Abstract
Dimension Estimation (DE) and Dimension Reduction (DR) are two closely related topics with quite different goals. In DR, one attempts to project a random vector, either linearly or nonlinearly, onto a lower-dimensional space that preserves the information contained in the original higher-dimensional space. In DE, by contrast, one attempts to estimate the intrinsic dimensionality, or number of latent variables, in a set of measurements of a random vector. DE and DR are closely linked because reducing the dimension below the value suggested by DE will likely lead to information loss. In particular, for linear methods such as Principal Component Analysis (PCA), DE and DR are often accomplished simultaneously. In this paper, however, we focus on a particular class of deep neural networks called autoencoders (AEs), which are used extensively for DR but are less well studied for DE. We show that several important questions arise when using AEs for DE, above and beyond those that arise for more classic DR/DE techniques such as PCA. We address AE architectural choices and regularization techniques that allow one to transform AE latent-layer representations into estimates of intrinsic dimension. We demonstrate the effectiveness of our techniques on synthetic data, on image processing benchmark problems, and, most importantly, on diverse applications such as the analysis of financial markets and network security.
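To make the connection between linear DE and DR concrete, the following sketch estimates intrinsic dimension with PCA via the singular value decomposition. The function name and the 95% explained-variance threshold are illustrative assumptions, not the chapter's method:

```python
import numpy as np

def estimate_dimension_pca(X, var_threshold=0.95):
    """Estimate intrinsic dimension as the number of principal
    components needed to explain `var_threshold` of the variance.
    (Illustrative heuristic; the threshold is an assumption.)"""
    Xc = X - X.mean(axis=0)                    # center the data
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values
    var = s**2 / np.sum(s**2)                  # explained-variance ratios
    return int(np.searchsorted(np.cumsum(var), var_threshold) + 1)

# Example: 3 latent variables embedded linearly in 20 dimensions
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3))                  # latent variables
A = rng.normal(size=(3, 20))                   # linear embedding
X = Z @ A + 0.01 * rng.normal(size=(500, 20))  # small measurement noise
print(estimate_dimension_pca(X))               # -> 3
```

Here DE and DR coincide: the same singular values that suggest the intrinsic dimension also define the PCA projection onto that many components.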
This chapter is an extension of the conference paper, with additional results. (This document does not contain technology or Technical Data controlled under either the U.S. International Traffic in Arms Regulations or the U.S. Export Administration Regulations.)
Notes
1. We use the notation \(\theta \) for the nonlinear activation function instead of the more standard \(\sigma \), as \(\sigma \) is also used to denote singular values.
2. S&P 500 is an equity index that measures the stock performance of 500 large companies listed on stock exchanges in the United States.
References
Abdulhammed, R., Faezipour, M., Musafer, H., Abuzneid, A.: Efficient network intrusion detection using PCA-based dimensionality reduction of features. In: 2019 International Symposium on Networks, Computers and Communications (ISNCC). IEEE (2019). https://doi.org/10.1109/isncc.2019.8909140
Abramowitz, M., Stegun, I.A., Romer, R.H.: Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables (1988)
Albanese, C., Jackson, K., Wiberg, P.: Dimension reduction in the computation of value-at-risk. The Journal of Risk Finance 3(4), 41–53 (2002)
Alexeev, V., Tapon, F.: Equity portfolio diversification: how many stocks are enough? Evidence from five developed markets. FIRN Research Paper, 28 November 2012 (2012)
Bahadur, N., Paffenroth, R., Gajamannage, K.: Dimension estimation of equity markets. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 5491–5498. IEEE (2019)
Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K.: Network anomaly detection: methods, systems and tools. IEEE Commun. Surv. & Tutor. 16(1), 303–336 (2014). https://doi.org/10.1109/surv.2013.052213.00046
Buczak, A.L., Guven, E.: A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun. Surv. & Tutor. 18(2), 1153–1176 (2016). https://doi.org/10.1109/COMST.2015.2494502
Elsayed, M.S., Le-Khac, N.A., Dev, S., Jurcut, A.D.: DDoSNet: a deep learning model for detecting network attacks (2020)
Evans, J.L., Archer, S.H.: Diversification and the reduction of dispersion: An empirical analysis. J. Financ. 23(5), 761–767 (1968)
Golub, G., Reinsch, C.: Singular value decomposition and least squares solutions. Numer. Math. 14, 403–420 (1970)
Hotelling, H.: Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 24(6), 417 (1933)
Hoyle, B., Rau, M.M., Paech, K., Bonnett, C., Seitz, S., Weller, J.: Anomaly detection for machine learning redshifts applied to SDSS galaxies. Mon. Not. R. Astron. Soc. 452(4), 4183–4194 (2015)
Jolliffe, I.: Principal Component Analysis. Wiley Online Library (2002)
LeCun, Y.: The MNIST database of handwritten digits (1998). http://yann.lecun.com/exdb/mnist/
Lee, I.: Big data: Dimensions, evolution, impacts, and challenges. Bus. Horiz. 60(3), 293–303 (2017)
Lee, J.A., Verleysen, M.: Nonlinear Dimensionality Reduction. Springer Science & Business Media (2007)
Li, C., Farkhoor, H., Liu, R., Yosinski, J.: Measuring the intrinsic dimension of objective landscapes (2018). arXiv:1804.08838
Mira, J., Sandoval, F.: From Natural to Artificial Neural Computation: International Workshop on Artificial Neural Networks, Malaga-Torremolinos, Spain, 7–9 June 1995: Proceedings, vol. 930. Springer Science & Business Media (1995)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814 (2010)
Nassirtoussi, A.K., Aghabozorgi, S., Wah, T.Y., Ngo, D.C.L.: Text mining of news-headlines for forex market prediction: A multi-layer dimension reduction algorithm with semantics and sentiment. Expert Syst. Appl. 42(1), 306–324 (2015)
Ng, A., et al.: Sparse autoencoder. CS294A Lecture Notes 72(2011), 1–19 (2011)
Parsons, T.L., Rogers, T.: Dimension reduction for stochastic dynamical systems forced onto a manifold by large drift: a constructive approach with examples from theoretical biology. J. Phys. A: Math. Theor. 50(41), 415601 (2017)
Plaut, E.: From principal subspaces to principal components with linear autoencoders (2018). arXiv:1804.10253
Rathnayaka, R., Wang, Z., Seneviratna, D., Nagahawatta, S.: An econometric evaluation of Colombo stock exchange: evidence from ARMA & PCA approach. In: Proceedings of the 2nd International Conference on Management and Economics, p. 10 (2013)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. Technical report, California Univ San Diego La Jolla Inst for Cognitive Science (1985)
Samaria, F.S., Harter, A.C.: Parameterisation of a stochastic model for human face identification. In: Proceedings of 1994 IEEE Workshop on Applications of Computer Vision, pp. 138–142. IEEE (1994)
Sharafaldin, I., Lashkari, A.H., Ghorbani, A.A.: Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: Proceedings of the 4th International Conference on Information Systems Security and Privacy. SCITEPRESS - Science and Technology Publications (2018). https://doi.org/10.5220/0006639801080116
Statman, M.: How many stocks make a diversified portfolio? Journal of financial and quantitative analysis 22(3), 353–363 (1987)
Tang, G.Y.: How efficient is naive portfolio diversification? an educational note. Omega 32(2), 155–160 (2004)
Van Der Maaten, L., Postma, E., Van den Herik, J.: Dimensionality reduction: a comparative review. J. Mach. Learn. Res. 10(66–71), 13 (2009)
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103. ACM (2008)
Wang, X.: On the effects of dimension reduction techniques on some high-dimensional problems in finance. Operations Research 54(6), 1063–1078 (2006)
Wang, Y., Yao, H., Zhao, S.: Auto-encoder based dimensionality reduction. Neurocomputing 184, 232–242 (2016)
Wani, M.A., Bhat, F.A., Afzal, S., Khan, A.I.: Advances in Deep Learning. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-6794-6
Wani, M.A., Kantardzic, M., Sayed-Mouchaweh, M. (eds.): Deep Learning Applications. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-1816-4
Wani, M.A., Khoshgoftaar, T.M., Palade, V. (eds.): Deep Learning Applications, vol. 2. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-6759-9
Wylie, C.R., Barrett, L.C.: Advanced Engineering Mathematics (1960)
Zhao, W., Du, S.: Spectral-spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach. IEEE Trans. Geosci. Remote Sens. 54(8), 4544–4554 (2016)
Zhou, C., Paffenroth, R.C.: Anomaly detection with robust deep autoencoders. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 665–674 (2017)
Acknowledgements
Results in this paper were obtained in part using a high-performance computing system acquired through NSF MRI grant DMS-1337943 to WPI.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Bahadur, N., Lewandowski, B., Paffenroth, R. (2022). Dimension Estimation Using Autoencoders and Application. In: Wani, M.A., Raj, B., Luo, F., Dou, D. (eds) Deep Learning Applications, Volume 3. Advances in Intelligent Systems and Computing, vol 1395. Springer, Singapore. https://doi.org/10.1007/978-981-16-3357-7_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3356-0
Online ISBN: 978-981-16-3357-7
eBook Packages: Intelligent Technologies and Robotics (R0)