Abstract
Convolutional neural networks (CNNs) have been widely used in image classification tasks and have achieved remarkable results compared with traditional methods. Their main advantage is the ability to extract hidden features automatically by exploiting local connectivity and spatial locality. However, CNNs cannot be applied directly to tabular data, mainly because the structure of tabular data is unsuitable as CNN input. In this paper, we propose a new generic method for representing multidimensional tabular data as color-encoded images that can be used both for data visualization and for classification with CNNs. Our approach, named FC-Viz (Feature Clustering-Visualization), is based on user-oriented data visualization ideas, such as pixel-oriented techniques, feature clustering, and feature interactions. The approach transforms each instance of the tabular data into a 2D pixel-based representation in which pixels representing features with strong correlation and interaction are adjacent to each other. We applied FC-Viz to ten multidimensional tabular datasets with dozens to thousands of features and compared its classification and visualization performance with that of a state-of-the-art tabular data transformation method. The evaluation experiments show that our approach is as accurate as the state-of-the-art method but produces much smaller images, resulting in much more compact and faster CNN models.
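To make the general idea of a tabular-to-image transformation concrete, the following is a minimal sketch and not the FC-Viz algorithm itself. It assumes a plain numeric feature matrix, replaces feature clustering and interaction detection with a simple greedy ordering by absolute Pearson correlation, and fills a square grayscale grid row by row. The function names `order_features_by_correlation` and `rows_to_images` are hypothetical and chosen for illustration only.

```python
import math

import numpy as np


def order_features_by_correlation(X):
    """Greedy ordering (an assumption, not the paper's method): start at
    feature 0 and repeatedly append the unvisited feature that has the
    highest absolute Pearson correlation with the last one added."""
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.atleast_2d(np.corrcoef(X, rowvar=False)))
    corr = np.nan_to_num(corr)  # constant columns yield NaN correlations
    order = [0]
    remaining = set(range(1, X.shape[1]))
    while remaining:
        last = order[-1]
        # Break ties by the lower feature index to keep the result deterministic.
        nxt = max(remaining, key=lambda j: (corr[last, j], -j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def rows_to_images(X, order):
    """Min-max scale each feature to [0, 1], reorder the columns, and reshape
    every row into a square single-channel image (zero-padded at the end)."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant features
    scaled = (X - lo) / span
    side = math.ceil(math.sqrt(X.shape[1]))
    padded = np.zeros((X.shape[0], side * side))
    padded[:, : X.shape[1]] = scaled[:, order]
    return padded.reshape(X.shape[0], side, side)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(100, 10))
    images = rows_to_images(data, order_features_by_correlation(data))
    print(images.shape)
```

In this sketch only consecutive features in the ordering are guaranteed to be horizontal neighbors; a layout that also accounts for vertical adjacency, feature interactions, and color encoding, as the abstract describes, would need a more elaborate placement step.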
Data Availability
All datasets used in the paper are publicly available.
Funding
No funding was received for this work.
Author information
Contributions
AD: Conceptualization, Methodology, Investigation, Software, Validation, Visualization, Writing. ML: Conceptualization, Methodology, Investigation, Supervision, Writing. NC: Conceptualization, Methodology, Investigation.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Code availability
The code will be made publicly available on GitHub upon the acceptance of this manuscript.
About this article
Cite this article
Damri, A., Last, M. & Cohen, N. Towards efficient image-based representation of tabular data. Neural Comput & Applic 36, 1023–1043 (2024). https://doi.org/10.1007/s00521-023-09074-y
DOI: https://doi.org/10.1007/s00521-023-09074-y