
Towards efficient image-based representation of tabular data

  • Original Article
  • Neural Computing and Applications

Abstract

Convolutional neural networks (CNNs) have been widely used in image classification tasks and have achieved remarkable results compared with traditional methods. Their main advantage is the ability to extract hidden features automatically by exploiting local connectivity and spatial locality. However, CNNs cannot be applied directly to tabular data, mainly because the structure of tabular data does not suit the CNN input. In this paper, we propose a new generic method for representing multidimensional tabular data as color-encoded images that can be used both for data visualization and for classification with CNNs. Our approach, named FC-Viz (Feature Clustering-Visualization), builds on user-oriented data visualization ideas such as pixel-oriented techniques, feature clustering, and feature interactions. It transforms each instance of the tabular data into a 2D pixel-based representation in which pixels representing features with strong correlation and interaction are adjacent to each other. We applied FC-Viz to ten multidimensional tabular datasets with dozens to thousands of features and compared its classification and visualization performance with that of a state-of-the-art tabular data transformation method. The evaluation experiments show that our approach is as accurate as the state-of-the-art method while producing much smaller images, which yield much more compact and faster CNN models.
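The core idea of the abstract, turning each tabular row into a small 2D grid in which strongly related features land next to each other, can be illustrated with the minimal sketch below. It is not the FC-Viz algorithm: FC-Viz also uses feature clustering and feature interactions, whereas this sketch assumes a much simpler stand-in, a greedy nearest-neighbour ordering of features by absolute Pearson correlation. The helper names `correlation_order` and `to_images`, the min-max scaling, and the zero-padded "snake" grid layout are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def correlation_order(X):
    """Hypothetical helper: order features so strongly correlated ones are consecutive.

    Greedy nearest-neighbour heuristic (an assumption, not FC-Viz): start at
    feature 0 and repeatedly append the unvisited feature with the highest
    |Pearson correlation| to the feature placed last.
    """
    n_features = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    corr = np.nan_to_num(corr)  # constant columns yield NaN correlations
    order = [0]
    visited = np.zeros(n_features, dtype=bool)
    visited[0] = True
    for _ in range(n_features - 1):
        # Mask already-placed features so argmax picks an unvisited one.
        scores = np.where(visited, -np.inf, corr[order[-1]])
        nxt = int(np.argmax(scores))
        order.append(nxt)
        visited[nxt] = True
    return np.array(order)


def to_images(X, order):
    """Hypothetical helper: map each row of X to a square single-channel image.

    Features are min-max scaled to [0, 1], placed in the given order, zero-padded
    to the smallest square grid, and laid out in a snake (boustrophedon) pattern
    so consecutive features in `order` are always horizontal or vertical neighbours.
    """
    X = np.asarray(X, dtype=float)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)  # avoid division by zero
    scaled = (X - mins) / span
    n_samples, n_features = X.shape
    side = int(np.ceil(np.sqrt(n_features)))
    padded = np.zeros((n_samples, side * side))
    padded[:, :n_features] = scaled[:, order]
    imgs = padded.reshape(n_samples, side, side)
    # Reverse every other row to obtain the snake layout.
    imgs[:, 1::2, :] = imgs[:, 1::2, ::-1].copy()
    return imgs
```

For example, with five features where feature 2 is a linear function of feature 0, `correlation_order` places feature 2 immediately after feature 0, and `to_images` returns an array of shape `(n_samples, 3, 3)` that could be fed to a CNN as one-channel images. The snake layout is a design choice made here so that every pair of consecutive features stays spatially adjacent, which is the property the abstract emphasizes.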




Data Availability

All datasets used in the paper are publicly available.

Notes

  1. https://scikit-learn.org/stable/modules/generated/sklearn.cluster.FeatureAgglomeration.html.

  2. https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering.

  3. https://www.cancer.gov/tcga.

  4. https://www.kaggle.com/c/bioresponse/data.

  5. https://jundongl.github.io/scikit-feature/datasets.html.

  6. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html.

  7. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html.

  8. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html.

  9. https://github.com/alok-ai-lab/DeepInsight.


Funding

No funding was received for this work.

Author information


Contributions

AD: Conceptualization, Methodology, Investigation, Software, Validation, Visualization, Writing. ML: Conceptualization, Methodology, Investigation, Supervision, Writing. NC: Conceptualization, Methodology, Investigation.

Corresponding author

Correspondence to Mark Last.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Code availability

The code will be made publicly available on GitHub upon acceptance of this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Damri, A., Last, M. & Cohen, N. Towards efficient image-based representation of tabular data. Neural Comput & Applic 36, 1023–1043 (2024). https://doi.org/10.1007/s00521-023-09074-y


  • DOI: https://doi.org/10.1007/s00521-023-09074-y
