DACC: A Data Exploration Method for High-Dimensional Data Sets

  • Conference paper
Artificial Intelligence and Algorithms in Intelligent Systems (CSOC2018 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 764)

Abstract

Data exploration has proved to be an efficient way to gain new insights from a dataset intuitively. Discovering interesting patterns and objects in a high-dimensional dataset is often very difficult because of its large search space. In this paper, we develop a data exploration method named Decision Analysis of Cross Clustering (DACC), based on subspace clustering. It characterizes the data objects as decision trees over the divided clustering subspaces, which helps users quickly understand the patterns in the data and makes interactive exploration easier. We conducted a series of experiments on real-world datasets, and the results show that DACC is superior to the representative data exploration approach in terms of efficiency and accuracy, and that it is applicable to interactive exploratory analysis of high-dimensional data sets.
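
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it describes: cluster the objects inside a chosen subspace, then describe the resulting clusters with a decision-tree rule that a user can read. It is not the DACC algorithm. The toy data, the fixed subspace, the tiny 2-means routine, the depth-1 tree, and all function names are assumptions made for illustration.

```python
def project(points, dims):
    """Keep only the coordinates listed in dims (the subspace)."""
    return [tuple(p[d] for d in dims) for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iterations=10):
    """Tiny deterministic 2-means; returns a 0/1 cluster label per point."""
    # Seed with the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    centers = [c0, c1]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def best_stump(points, labels, dims):
    """Depth-1 decision tree: the (dim, threshold) that best separates labels.

    Returns (dim, threshold, errors), where errors is the number of points
    on the wrong side of the split under the better of the two polarities.
    """
    best = None
    for d in dims:
        values = sorted({p[d] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbours
            pred = [0 if p[d] <= t else 1 for p in points]
            errors = sum(a != b for a, b in zip(pred, labels))
            errors = min(errors, len(points) - errors)
            if best is None or errors < best[2]:
                best = (d, t, errors)
    return best


# Hypothetical toy data: attribute 1 is noise, attributes 0 and 2 form the subspace.
points = [(1.0, 9.0, 0.3), (1.2, 1.0, 0.1), (0.9, 5.0, 0.2),
          (8.0, 2.0, 0.4), (8.3, 7.0, 0.2), (7.9, 4.0, 0.1)]
subspace = (0, 2)

labels = two_means(project(points, subspace))
rule = best_stump(points, labels, subspace)
dim, threshold, errors = rule
print(f"split on attribute {dim} at {threshold:.2f}, {errors} misassigned")
```

In this sketch the rule is the part a user would read: a single threshold on one attribute summarizes how the two clusters differ within the subspace. A deeper tree, or one tree per subspace, would follow the same pattern.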

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61462012, 61562010, and U1531246), the Guizhou University Graduate Innovation Fund (Grant No. 2017081), the Innovation Team of the Data Analysis and Cloud Service of Guizhou Province (Grant No. [2015]53), and the Science and Technology Project of the Department of Science and Technology in Guizhou Province (Grant No. LH [2016]7427).

Author information

Corresponding author

Correspondence to Hui Li.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Zhao, Q., Li, H., Chen, M., Dai, Z., Zhu, M. (2019). DACC: A Data Exploration Method for High-Dimensional Data Sets. In: Silhavy, R. (eds) Artificial Intelligence and Algorithms in Intelligent Systems. CSOC2018 2018. Advances in Intelligent Systems and Computing, vol 764. Springer, Cham. https://doi.org/10.1007/978-3-319-91189-2_22
