Visual Data Mining and Discovery with Binarized Vectors

Kovalerchuk, Boris; Delizy, Florian; Riggs, Logan; Vityaev, Evgenii

doi:10.1007/978-3-642-23241-1_7

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 24)

Abstract

The emerging field of Visual Analytics combines several fields in which Data Mining and Visualization play leading roles. Visual analytics departs fundamentally from other approaches in its extensive use of visual analytical tools to discover patterns, not only to visualize patterns that have already been discovered by traditional data mining methods. High-complexity data mining tasks often require a multi-level, top-down approach, in which a qualitative analysis of the complex situation is first conducted at the top levels and top-level patterns are discovered. This paper presents the concept of Monotone Boolean Function Visual Analytics (MBFVA) for such top-level pattern discovery. The approach employs binarization and monotonization of quantitative attributes to obtain a top-level data representation. The top-level discoveries form a foundation for the subsequent, more detailed data mining levels, where patterns are refined. The approach is illustrated with applications in the medical, law enforcement, and security domains. The medical application is concerned with discovering breast cancer diagnostic rules (i) interactively with a radiologist, (ii) analytically with data mining algorithms, and (iii) visually. Coordinated visualization of these rules makes it possible to reconcile the multi-source rules and to arrive at rules that are meaningful to the expert in the field and confirmed by the database. Experts and data mining algorithms often operate at very different levels of detail and produce incomparable patterns. The proposed MBFVA approach addresses this problem. This paper shows how to represent and visualize binary multivariate data in 2-D and 3-D. The representation preserves the structural relations that exist in multivariate data and creates a new opportunity to guide the visual discovery of unknown patterns in the data.
In particular, the structural representation allows us to convert a complex border between the patterns in multidimensional space into visual 2-D and 3-D forms, which decreases the information overload on the user. The visualization shows not only the border between classes but also the location of the case of interest relative to that border. A user does not need to see the thousands of previous cases that were used to build the border between the patterns. If an abnormal case lies deep inside the abnormal area, far from the border between the “normal” and “abnormal” classes, then the case is very abnormal and needs immediate attention. The paper concludes with an outline of how to scale the algorithm to large data sets and how to extend the approach to non-monotone data.
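
The ideas in the abstract (binarization, monotonicity, a border between classes, and the depth of a case relative to that border) can be illustrated with a minimal Python sketch. This is not the chapter's implementation. The function names (`binarize`, `is_monotone`, `lower_units`, `depth_inside_abnormal`), the thresholds, and the three-attribute `rule` are all hypothetical. The brute-force enumeration is only practical for a small number of attributes, and the depth measure (the fewest 1-to-0 flips needed to reach a "normal" vector) is one possible choice, assumed here for illustration.

```python
from itertools import product


def binarize(values, thresholds, higher_is_abnormal):
    """Code each quantitative attribute as 0/1 so that 1 always points
    toward the "abnormal" class (an assumed form of monotonization)."""
    return tuple(
        int((v >= t) == up)
        for v, t, up in zip(values, thresholds, higher_is_abnormal)
    )


def dominates(a, b):
    """True when a >= b componentwise (partial order on binary vectors)."""
    return all(x >= y for x, y in zip(a, b))


def is_monotone(f, n):
    """Brute-force check that f(a) >= f(b) whenever a dominates b."""
    cube = list(product((0, 1), repeat=n))
    return all(f(a) >= f(b) for a in cube for b in cube if dominates(a, b))


def lower_units(f, n):
    """Minimal vectors with f = 1. For a monotone f they describe the
    border: every vector that dominates one of them is also in class 1."""
    ones = [v for v in product((0, 1), repeat=n) if f(v)]
    return [
        v for v in ones
        if not any(u != v and dominates(v, u) for u in ones)
    ]


def depth_inside_abnormal(f, case):
    """Fewest 1-to-0 flips that move `case` into class 0.
    Returns 0 for a class-0 case and None if no class-0 vector lies below."""
    if not f(case):
        return 0
    gaps = [
        sum(case) - sum(b)
        for b in product((0, 1), repeat=len(case))
        if dominates(case, b) and not f(b)
    ]
    return min(gaps) if gaps else None


def rule(x):
    """Hypothetical monotone rule: abnormal if x1 OR (x2 AND x3)."""
    return int(x[0] or (x[1] and x[2]))


# Hypothetical thresholds; for the second attribute a LOW value is abnormal.
case = binarize((4.2, 0.3, 7.0), (3.0, 0.5, 5.0), (True, False, True))
border = lower_units(rule, 3)
depth = depth_inside_abnormal(rule, case)
```

In this sketch a user would only need `border` and `depth` to judge a new case, rather than the full set of previously seen vectors, which mirrors the reduction in information overload described above.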

Author information

Authors and Affiliations

Dept. of Computer Science, Central Washington University, Ellensburg, WA, 9896-7520, USA
Boris Kovalerchuk, Florian Delizy & Logan Riggs
Institute of Mathematics, Russian Academy of Sciences, Novosibirsk, 630090, Russia
Evgenii Vityaev

Editor information

Editors and Affiliations

Department of Statistics and Applied Probability, University of California, Santa Barbara, CA, 93106, USA
Dawn E. Holmes
Knowledge-Based Engineering, University of South Australia, Mawson Lakes, Adelaide, SA, 5095, Australia
Lakhmi C. Jain

About this chapter

Cite this chapter

Kovalerchuk, B., Delizy, F., Riggs, L., Vityaev, E. (2012). Visual Data Mining and Discovery with Binarized Vectors. In: Holmes, D.E., Jain, L.C. (eds) Data Mining: Foundations and Intelligent Paradigms. Intelligent Systems Reference Library, vol 24. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23241-1_7

DOI: https://doi.org/10.1007/978-3-642-23241-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23240-4
Online ISBN: 978-3-642-23241-1
eBook Packages: Engineering, Engineering (R0)