VizRank: Data Visualization Guided by Machine Learning

Leban, Gregor; Zupan, Blaž; Vidmar, Gaj; Bratko, Ivan

doi:10.1007/s10618-005-0031-5

Published: 16 May 2006

Data Mining and Knowledge Discovery, Volume 13, pages 119–136 (2006)

Abstract

Data visualization plays a crucial role in identifying interesting patterns in exploratory data analysis. Its use is, however, made difficult by the large number of possible data projections, each showing a different subset of attributes, that the data analyst must evaluate. In this paper, we introduce a method called VizRank, which is applied to classified (class-labeled) data to automatically select the most useful data projections. VizRank can be used with any visualization method that maps attribute values to points in a two-dimensional visualization space. It assesses possible data projections and ranks them by their ability to visually discriminate between classes. The quality of class separation is estimated by computing the predictive accuracy of a k-nearest neighbor classifier on the data set consisting of the x and y positions of the projected data points and their class information. The paper introduces the method and presents experimental results showing that VizRank's ranking of projections agrees closely with subjective rankings by data analysts. The practical use of VizRank is also demonstrated by an application in the field of functional genomics.
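
The ranking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scatterplots of attribute pairs as the visualization method, plain classification accuracy as the score, leave-one-out evaluation, and k = 3; the function names (`knn_loo_accuracy`, `rank_scatterplots`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a k-NN classifier on projected 2-D points.

    Assumption: leave-one-out evaluation and majority voting; the abstract
    only states that k-NN predictive accuracy is used.
    """
    correct = 0
    for i, (xi, yi) in enumerate(points):
        # Squared Euclidean distance from point i to every other point.
        dists = sorted(
            ((xj - xi) ** 2 + (yj - yi) ** 2, j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        # Majority vote among the k nearest neighbours in the projection.
        votes = Counter(labels[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(points)


def rank_scatterplots(rows, labels, names, k=3):
    """Score every attribute pair as a scatterplot and sort best-first."""
    scored = []
    for a, b in combinations(range(len(names)), 2):
        # The "projection" here is simply the pair of attribute values.
        points = [(row[a], row[b]) for row in rows]
        scored.append((knn_loo_accuracy(points, labels, k), names[a], names[b]))
    return sorted(scored, key=lambda item: -item[0])


# Hypothetical toy data: two attributes that separate the classes, one that does not.
names = ["signal1", "signal2", "noise"]
rows = [
    (0.0, 0.1, 5), (0.1, 0.0, 1), (0.2, 0.2, 4), (0.1, 0.3, 2),  # class "A"
    (1.0, 1.1, 5), (1.1, 1.0, 1), (1.2, 1.2, 4), (1.1, 1.3, 2),  # class "B"
]
labels = ["A"] * 4 + ["B"] * 4

ranking = rank_scatterplots(rows, labels, names)
for score, x_name, y_name in ranking:
    print(f"{x_name} vs {y_name}: {score:.2f}")
```

In this toy example the projection onto the two informative attributes is ranked first, because every point's nearest neighbours in that plot share its class. Any other mapping of attribute values to 2-D positions could replace the attribute-pair projection without changing the scoring step.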

Acknowledgments

The authors wish to thank Uros Petrovic for help with the analysis of the yeast gene expression data set, and the twelve post-graduate students of the University of Ljubljana for participating in the experiments. We would also like to acknowledge the support of a Program Grant (P2-0209) from the Slovenian Research Agency.

Author information

Authors and Affiliations

Faculty of Computer and Information Science, University of Ljubljana, Tržaška 25, Ljubljana, Slovenia
Gregor Leban, Blaž Zupan & Ivan Bratko
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
Blaž Zupan
Institute of Biomedical Informatics, University of Ljubljana, Vrazov trg 2, Ljubljana, Slovenia
Gaj Vidmar
Jozef Stefan Institute, Ljubljana, Slovenia
Ivan Bratko

Corresponding author

Correspondence to Gregor Leban.

About this article

Cite this article

Leban, G., Zupan, B., Vidmar, G. et al. VizRank: Data Visualization Guided by Machine Learning. Data Min Knowl Disc 13, 119–136 (2006). https://doi.org/10.1007/s10618-005-0031-5

Received: 30 April 2004
Accepted: 07 November 2005
Published: 16 May 2006
Issue Date: September 2006
DOI: https://doi.org/10.1007/s10618-005-0031-5
