
Feature selection based on star coordinates plots associated with eigenvalue problems

  • Alberto Sanchez
  • Laura Raya
  • Miguel A. Mohedano-Munoz
  • Manuel Rubio-Sánchez
Original Article

Abstract

Feature selection consists of choosing a smaller number of variables to work with when analyzing high-dimensional data sets. Recently, several visualization tools, techniques, and feature relevance measures have been developed to help users carry out feature selection. Some of these approaches are based on radial axes methods, where analysts perform backward feature elimination by discarding features that have a low impact on the visualizations. Following a similar strategy, in this paper we propose a new feature relevance measure for star coordinates plots associated with the class of linear dimensionality reduction mappings defined through the solutions of eigenvalue problems, such as linear discriminant analysis or principal component analysis. We show that the approach leads to enhanced feature subsets for class separation or variance maximization in the plots for numerous data sets from the UCI repository. In practice, the tool allows analysts to decide which features to discard by examining their relevance and by taking previous domain knowledge into account.
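To make the workflow concrete, the following is a minimal sketch, not the authors' relevance measure, of backward feature elimination guided by star coordinates axes obtained from a PCA eigenvalue problem. It assumes that each feature's relevance is approximated by the length of its axis vector (the corresponding row of the matrix of top eigenvectors), which is a simple proxy for how strongly that feature influences the plot; the function names and the axis-length criterion are illustrative choices, not taken from the paper.

```python
import numpy as np

def pca_star_coordinate_axes(X, k=2):
    """Axis vectors of a star coordinates plot derived from PCA.

    Feature j is assigned the j-th row of the matrix whose columns are the
    top-k eigenvectors of the sample covariance matrix.
    """
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :k]             # top-k eigenvectors (d x k)
    return V                                # row j = 2-D axis vector of feature j

def backward_elimination(X, n_keep):
    """Illustrative backward feature elimination: repeatedly discard the
    feature whose axis vector is shortest (a proxy for low impact on the
    plot; NOT the relevance measure proposed in the paper)."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        V = pca_star_coordinate_axes(X[:, remaining])
        lengths = np.linalg.norm(V, axis=1)          # one length per feature
        remaining.remove(remaining[int(np.argmin(lengths))])
    return remaining

# Usage: keep the 4 seemingly most relevant of 10 synthetic features
X = np.random.rand(200, 10)
print(backward_elimination(X, n_keep=4))
```

In this sketch the axis-length proxy plays the role that the proposed relevance measure plays in the paper, which additionally covers other eigenvalue-based mappings such as linear discriminant analysis.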

Keywords

Feature selection · Eigenvalue problems · Linear projections · Multidimensional visualization · Star coordinates · Principal component analysis · Linear discriminant analysis

Notes

Acknowledgements

This work has been supported by the Spanish Ministry of Economy (Grant RTI2018-098694-B-I00). The authors would like to thank Diego Rojo for constructive criticism of the manuscript.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Authors and Affiliations

  1. Universidad Rey Juan Carlos, Madrid, Spain
  2. Research Center for Computational Simulation, Madrid, Spain
  3. U-tad, Madrid, Spain
