Abstract
Interest in binary data coding has grown considerably over the last decade. Binary data arise in many fields of application, such as market basket analysis, DNA microarray data, image mining, text mining and web-clickstream mining. The paper illustrates two approaches that profitably combine clustering and dimensionality reduction to identify non-trivial association structures in binary data. An application in the Association Rules framework supports the theory with empirical evidence.
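The abstract does not spell out the two approaches, so what follows is only a minimal sketch, in Python with NumPy, of one generic way to chain the ingredients it names. It runs a simple correspondence analysis (CA) of the transactions-by-items matrix for dimensionality reduction, then k-means on the resulting row coordinates for clustering, and finally computes association-rule support and confidence inside each cluster. The function names (`ca_row_coordinates`, `kmeans`, `support`, `confidence`), the farthest-first initialisation and the toy 8 × 4 basket matrix are illustrative assumptions. They are not the authors' method or data.

```python
import numpy as np


def ca_row_coordinates(X, n_dims=2):
    """Row principal coordinates from a simple correspondence analysis of X."""
    X = np.asarray(X, dtype=float)
    # Empty rows or columns would have zero mass and break the scaling below.
    if (X.sum(axis=1) == 0).any() or (X.sum(axis=0) == 0).any():
        raise ValueError("drop empty transactions/items before running CA")
    P = X / X.sum()                      # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)  # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    # F = D_r^{-1/2} U Sigma, truncated to the first n_dims dimensions.
    return U[:, :n_dims] * sigma[:n_dims] / np.sqrt(r)[:, None]


def kmeans(Z, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialisation (an assumption)."""
    # Start from the first row, then repeatedly add the row farthest from all chosen centres.
    centres = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(Z[d.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        # Assignment step: nearest centre in squared Euclidean distance.
        d = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: each centre moves to the mean of its points (kept if the cluster is empty).
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres


def support(X, items):
    """Fraction of transactions (rows) containing every item in `items`."""
    X = np.asarray(X, dtype=bool)
    return float(X[:, list(items)].all(axis=1).mean())


def confidence(X, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: supp(A u B) / supp(A)."""
    s_a = support(X, antecedent)
    return support(X, list(antecedent) + list(consequent)) / s_a if s_a > 0 else 0.0


# Hypothetical toy basket data: rows are transactions, columns are items 0..3.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
])

Z = ca_row_coordinates(X, n_dims=2)   # dimensionality reduction
labels, _ = kmeans(Z, k=2)            # clustering in the reduced space
for j in range(2):
    Xj = X[labels == j]
    print(f"cluster {j}: {len(Xj)} baskets, "
          f"supp{{2,3}}={support(Xj, [2, 3]):.3f}, conf(2->3)={confidence(Xj, [2], [3]):.3f}")
print("global supp{2,3} =", support(X, [2, 3]))
```

On this toy matrix the first CA dimension separates the two groups of baskets. The itemset {2, 3} has support 0.375 in the whole data set but 0.75 inside its own cluster, which shows how rules examined within clusters can bring out associations that look weak at the global level.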
Acknowledgements
We would like to express our gratitude to the referees whose comments and suggestions improved the quality of the paper.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Palumbo, F., D’Enza, A.I. (2009). Clustering and Dimensionality Reduction to Discover Interesting Patterns in Binary Data. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_4
DOI: https://doi.org/10.1007/978-3-642-01044-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01043-9
Online ISBN: 978-3-642-01044-6
eBook Packages: Mathematics and Statistics (R0)