Impact of High Dimensionality Reduction in Financial Datasets of SMEs with Feature Pre-processing in Data Mining

  • Conference paper
Proceedings of Fourth International Conference on Communication, Computing and Electronics Systems

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 977)

Abstract

High Data Dimensionality Reduction (HDDR) removes irrelevant features from a complex dataset and encompasses various techniques that can be used to forecast outcomes in a predictive model. The main objective of this paper is to survey and analyse models based on HDDR and the feature pre-processing methods applied in financial dataset prediction. Numerous data mining techniques and strategies are discussed and assessed to establish their importance in improving classifier performance on financial datasets. The pre-processing techniques applied in various research works, and their outcomes, are highlighted. The HDDR methods used in financial prediction for Small and Medium Enterprises (SMEs) are studied across existing frameworks and models by different authors. The paper summarises the models, frameworks and algorithms involved in effectively eliminating irrelevant features and extracting the best features for prediction on financial datasets.
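
To make the idea of eliminating irrelevant features concrete, the following is a minimal sketch of a filter-style feature selection step. It is not taken from the paper or from any of the surveyed models. The feature names, the toy data and the two thresholds (`min_var`, `min_abs_corr`) are assumptions chosen only for illustration. The sketch drops columns that are nearly constant and columns whose Pearson correlation with the label is weak.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(rows, labels, names, min_var=1e-9, min_abs_corr=0.3):
    """Keep features that vary and correlate with the label (filter method).

    The thresholds are illustrative assumptions, not recommended values.
    """
    kept = []
    for j, name in enumerate(names):
        col = [r[j] for r in rows]
        if pvariance(col) <= min_var:
            continue  # near-constant column carries no signal
        if abs(pearson(col, labels)) < min_abs_corr:
            continue  # only weakly related to the target
        kept.append(name)
    return kept


# Hypothetical toy data: each row is one SME; label 1 = financial distress.
names = ["debt_ratio", "current_ratio", "country_code", "noise"]
rows = [
    [0.9, 0.5, 1.0, 0.2],
    [0.8, 0.7, 1.0, 0.9],
    [0.3, 1.8, 1.0, 0.9],
    [0.2, 2.1, 1.0, 0.2],
]
labels = [1, 1, 0, 0]

selected = select_features(rows, labels, names)
print(selected)
```

On this toy data the constant `country_code` column and the uncorrelated `noise` column are discarded, leaving the two ratio features for a downstream classifier. A real pipeline would typically validate such thresholds on held-out data rather than fix them in advance.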

Corresponding author

Correspondence to R. Mahalingam.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mahalingam, R., Jayanthi, K. (2023). Impact of High Dimensionality Reduction in Financial Datasets of SMEs with Feature Pre-processing in Data Mining. In: Bindhu, V., Tavares, J.M.R.S., Vuppalapati, C. (eds) Proceedings of Fourth International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol 977. Springer, Singapore. https://doi.org/10.1007/978-981-19-7753-4_29

  • DOI: https://doi.org/10.1007/978-981-19-7753-4_29

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-7752-7

  • Online ISBN: 978-981-19-7753-4

  • eBook Packages: Engineering, Engineering (R0)
