
Clustering and Dimensionality Reduction to Discover Interesting Patterns in Binary Data

  • Conference paper
Advances in Data Analysis, Data Handling and Business Intelligence

Abstract

Interest in binary data coding has grown steadily over the last decade, for several reasons. Binary data arise in many fields of application, such as market basket analysis, DNA microarray data, image mining, text mining and web clickstream mining. The paper illustrates two different approaches that exploit a combination of clustering and dimensionality reduction to identify non-trivial association structures in binary data. An application in the association rules framework provides empirical evidence supporting the theory.
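The abstract does not spell out either of the two approaches. As a rough, hypothetical illustration of why combining dimensionality reduction and clustering can expose associations that a single global support threshold hides, the sketch below projects a 0/1 transaction matrix onto its first principal axis, splits the rows with k-means, and then looks for frequent item pairs inside each cluster. Everything here is an assumption made for the example, not the authors' procedure: plain PCA computed by power iteration stands in for the factorial step, one-dimensional k-means stands in for the clustering step, and the toy data, the 0.5 support threshold and the function names `first_axis_scores`, `kmeans_1d`, `support` and `frequent_pairs` are invented.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple

Rows = Sequence[Sequence[int]]


def first_axis_scores(X: Rows, iters: int = 200, seed: int = 0) -> List[float]:
    """Coordinates of each row on the leading principal axis of the
    column-centred 0/1 matrix (plain PCA, an assumed stand-in)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    # Cross-product matrix Z^T Z (p x p); its top eigenvector is the first axis.
    C = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # generic, deterministic start
    for _ in range(iters):  # power iteration towards the top eigenvector
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [sum(zj * vj for zj, vj in zip(z, v)) for z in Z]


def kmeans_1d(scores: Sequence[float], k: int = 2, iters: int = 100) -> List[int]:
    """Lloyd's k-means on one-dimensional scores; returns one label per row."""
    s = sorted(scores)
    # Spread the initial centres over the sorted scores (min ... max).
    centers = [s[i * (len(s) - 1) // max(k - 1, 1)] for i in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assign each score to its nearest centre.
        labels = [min(range(k), key=lambda c, x=x: abs(x - centers[c]))
                  for x in scores]
        new = []
        for c in range(k):
            members = [x for x, lab in zip(scores, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centers[c])
        if new == centers:  # converged
            break
        centers = new
    return labels


def support(rows: Rows, itemset: Tuple[int, ...]) -> float:
    """Share of transactions that contain every item in `itemset`."""
    return sum(all(r[i] for i in itemset) for r in rows) / len(rows)


def frequent_pairs(rows: Rows, min_support: float) -> List[Tuple[int, int]]:
    """Item pairs whose support within `rows` reaches `min_support`."""
    return [pair for pair in combinations(range(len(rows[0])), 2)
            if support(rows, pair) >= min_support]


# Hypothetical toy basket data: 8 transactions x 4 items.
X = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0],
]
labels = kmeans_1d(first_axis_scores(X), k=2)
print("global:", frequent_pairs(X, 0.5))
for c in sorted(set(labels)):
    members = [r for r, lab in zip(X, labels) if lab == c]
    print(f"cluster {c}:", frequent_pairs(members, 0.5))
```

On this toy matrix no item pair reaches 50% support over all eight transactions (both {0, 1} and {2, 3} sit at 37.5%), while each of the two clusters contains one of those pairs with 75% local support. Reducing first lets the clustering work on a few continuous coordinates instead of the raw binary columns; which cluster receives label 0 depends on the sign of the computed axis, so only the grouping, not the label number, is meaningful.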



Acknowledgements

We would like to express our gratitude to the referees, whose comments and suggestions improved the quality of the paper.

Author information

Correspondence to Francesco Palumbo.


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Palumbo, F., D’Enza, A.I. (2009). Clustering and Dimensionality Reduction to Discover Interesting Patterns in Binary Data. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_4
