Abstract
Data exploration has proven to be an efficient way to gain new insights from a dataset intuitively. However, discovering interesting patterns and objects in high-dimensional datasets is often very difficult because of the large search space. In this paper, we develop a data exploration method named Decision Analysis of Cross Clustering (DACC), based on subspace clustering. DACC characterizes the data objects as decision trees over the partitioned clustering subspaces, which helps users quickly understand patterns in the data and makes interactive exploration easier. We conducted a series of experiments on real-world datasets. The results show that DACC outperforms a representative data exploration approach in terms of efficiency and accuracy, and that it is applicable to interactive exploration of high-dimensional datasets.
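The abstract describes the pipeline only at a high level. The following Python sketch illustrates the general idea of pairing subspace clustering with decision-tree descriptions of the resulting clusters; it is not the DACC algorithm. It assumes the subspace (the hypothetical `dims` argument) is chosen by the user, substitutes plain k-means for the subspace clustering step, and uses a depth-1 decision tree (a single threshold rule, or "stump") as each cluster's description. All function names are hypothetical.

```python
# Hypothetical sketch, not DACC: cluster one feature subspace with k-means,
# then describe each cluster with a depth-1 decision tree (one threshold rule).

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def best_stump(points, labels, target, n_dims):
    """Depth-1 decision tree: the single threshold split that best
    separates cluster `target` from all other points (by accuracy)."""
    y = [lab == target for lab in labels]
    best = None
    for d in range(n_dims):
        values = sorted(set(p[d] for p in points))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            for side in ("<=", ">"):
                pred = [(p[d] <= t) == (side == "<=") for p in points]
                acc = sum(a == b for a, b in zip(pred, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, d, side, t)
    return best


def explain_subspace(data, dims, k):
    """Cluster `data` projected onto `dims`; return labels and one rule per cluster."""
    projected = [[row[d] for d in dims] for row in data]
    labels = kmeans(projected, k)
    rules = {}
    for c in sorted(set(labels)):
        stump = best_stump(projected, labels, c, len(dims))
        if stump is None:  # all projected values identical: no split exists
            rules[c] = (None, 0.0)
            continue
        acc, j, side, t = stump
        rules[c] = (f"x{dims[j]} {side} {t:g}", acc)
    return labels, rules


# Toy data: attributes 0 and 1 carry two groups; attribute 2 is noise.
data = [
    [1.0, 1.2, 7.0],
    [1.1, 0.9, 2.0],
    [0.9, 1.0, 5.0],
    [5.0, 5.1, 3.0],
    [5.2, 4.9, 6.0],
    [4.8, 5.0, 1.0],
]
labels, rules = explain_subspace(data, dims=(0, 1), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(rules)   # {0: ('x0 <= 2.95', 1.0), 1: ('x0 > 2.95', 1.0)}
```

Each cluster found in the subspace (0, 1) is summarized by one readable threshold rule together with its accuracy. Running the same function on other subspaces would yield alternative groupings whose rules can be compared side by side; a real system would use deeper trees and a genuine subspace clustering algorithm.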
Acknowledgment
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61462012, 61562010, and U1531246), the Guizhou University Graduate Innovation Fund (Grant No. 2017081), the Innovation Team of the Data Analysis and Cloud Service of Guizhou Province (Grant No. [2015]53), and the Science and Technology Project of the Department of Science and Technology in Guizhou Province (Grant No. LH [2016]7427).
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Zhao, Q., Li, H., Chen, M., Dai, Z., Zhu, M. (2019). DACC: A Data Exploration Method for High-Dimensional Data Sets. In: Silhavy, R. (eds) Artificial Intelligence and Algorithms in Intelligent Systems. CSOC2018 2018. Advances in Intelligent Systems and Computing, vol 764. Springer, Cham. https://doi.org/10.1007/978-3-319-91189-2_22
DOI: https://doi.org/10.1007/978-3-319-91189-2_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91188-5
Online ISBN: 978-3-319-91189-2
eBook Packages: Intelligent Technologies and Robotics (R0)