Abstract
High Data Dimensionality Reduction (HDDR) removes irrelevant features from a complex dataset and comprises a range of techniques that support the prediction of outcomes in a predictive model. The main objective of this paper is to analyse and survey models based on HDDR, together with the feature pre-processing methods, that have been applied to prediction on financial datasets. Numerous data mining techniques and strategies are discussed and assessed to ascertain how much they improve classifier performance on financial datasets. The pre-processing techniques applied in various research works, and their outcomes, are highlighted. Existing HDDR frameworks and models proposed by different authors for the financial prediction of Small and Medium Enterprises (SMEs) are studied. The paper summarises the models, frameworks and algorithms that effectively eliminate irrelevant features and extract the most informative ones, so as to obtain the best predictions on financial datasets.
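To make the pipeline described in the abstract concrete (pre-processing, elimination of irrelevant features, selection of the most informative ones, then classification), here is a minimal sketch in Python using only the standard library. It is not the method of this paper or of any surveyed work: the synthetic SME dataset, its feature names (`debt`, `debt_dup`, `noise`, `const`, `liquidity`), the thresholds `relevance_min=0.2` and `redundancy_max=0.9`, and every function name are hypothetical. It assumes median imputation and z-scoring as the pre-processing steps, a correlation-based filter (drop constant columns, drop columns weakly correlated with the distress label, drop columns nearly collinear with an already-kept column) as the reduction step, and a nearest-centroid classifier as a stand-in for the classifiers discussed in the literature.

```python
import math
import random
import statistics


def fit_preprocessor(rows):
    """Learn per-column medians (for imputation) and means/stds (for z-scoring)."""
    cols = list(zip(*rows))
    medians = [statistics.median([v for v in c if v is not None]) for c in cols]
    filled = [[m if v is None else v for v in c] for c, m in zip(cols, medians)]
    means = [statistics.fmean(c) for c in filled]
    stds = [statistics.pstdev(c) for c in filled]
    return medians, means, stds


def transform(rows, params):
    """Median-impute missing values, then z-score; constant columns become 0.0."""
    medians, means, stds = params
    out = []
    for r in rows:
        filled = [m if v is None else v for v, m in zip(r, medians)]
        out.append([(v - mu) / s if s > 0 else 0.0
                    for v, mu, s in zip(filled, means, stds)])
    return out


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def select_features(rows, labels, relevance_min=0.2, redundancy_max=0.9):
    """Correlation-based filter: drop constant, irrelevant, then redundant columns."""
    cols = [list(c) for c in zip(*rows)]
    # 1. Constant columns carry no information.
    candidates = [j for j, c in enumerate(cols) if statistics.pvariance(c) > 0]
    # 2. Keep columns sufficiently correlated with the label, most relevant first.
    relevance = {j: abs(pearson(cols[j], labels)) for j in candidates}
    ranked = sorted((j for j in candidates if relevance[j] >= relevance_min),
                    key=lambda j: relevance[j], reverse=True)
    # 3. Greedily skip columns nearly collinear with an already-kept column.
    selected = []
    for j in ranked:
        if all(abs(pearson(cols[j], cols[k])) < redundancy_max for k in selected):
            selected.append(j)
    return selected


def nearest_centroid_accuracy(train, y_train, test, y_test, features):
    """Assign each test row to the closer class centroid on the chosen features."""
    centroids = {}
    for label in (0, 1):
        members = [r for r, y in zip(train, y_train) if y == label]
        centroids[label] = [statistics.fmean([r[j] for r in members]) for j in features]

    def predict(r):
        point = [r[j] for j in features]
        return min((0, 1), key=lambda c: math.dist(point, centroids[c]))

    return sum(predict(r) == y for r, y in zip(test, y_test)) / len(test)


def make_synthetic_smes(n, seed=0):
    """Hypothetical SME records; label 1 means financial distress."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        debt = y + rng.gauss(0, 0.5)              # informative
        debt_dup = 2 * debt + rng.gauss(0, 0.05)  # redundant near-copy of debt
        noise = rng.gauss(0, 1)                   # irrelevant
        const = 1.0                               # constant
        liquidity = -y + rng.gauss(0, 0.7)        # informative
        if rng.random() < 0.05:
            liquidity = None                      # simulated missing value
        rows.append([debt, debt_dup, noise, const, liquidity])
        labels.append(y)
    return rows, labels


rows, labels = make_synthetic_smes(400)
train, test, y_train, y_test = rows[:300], rows[300:], labels[:300], labels[300:]
params = fit_preprocessor(train)  # fitted on the training split only
train_z, test_z = transform(train, params), transform(test, params)

kept = select_features(train_z, y_train)
acc_all = nearest_centroid_accuracy(train_z, y_train, test_z, y_test, list(range(5)))
acc_kept = nearest_centroid_accuracy(train_z, y_train, test_z, y_test, kept)
print("kept feature indices:", kept)
print(f"accuracy with all 5 features: {acc_all:.2f}, with {len(kept)} kept: {acc_kept:.2f}")
```

The imputation and scaling statistics are fitted on the training split only, so test rows do not leak into pre-processing. A filter method is used here because it is cheap and independent of the classifier; wrapper or swarm-based selectors, or extraction methods such as PCA, would occupy the same slot in the pipeline.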
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mahalingam, R., Jayanthi, K. (2023). Impact of High Dimensionality Reduction in Financial Datasets of SMEs with Feature Pre-processing in Data Mining. In: Bindhu, V., Tavares, J.M.R.S., Vuppalapati, C. (eds) Proceedings of Fourth International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol 977. Springer, Singapore. https://doi.org/10.1007/978-981-19-7753-4_29
DOI: https://doi.org/10.1007/978-981-19-7753-4_29
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7752-7
Online ISBN: 978-981-19-7753-4
eBook Packages: Engineering, Engineering (R0)