“Padding” bitmaps to support similarity and mining

Gelbard, Roy

doi:10.1007/s10796-011-9318-9


Published: 26 July 2011

Volume 15, pages 99–110, (2013)

Information Systems Frontiers

Roy Gelbard¹


Abstract

This paper presents a novel approach to bitmap indexing for data mining. Bitmap indexing currently enables efficient data storage and retrieval, but it offers limited support for similarity measurement, and hence for classification, clustering, and data mining. Bitmap indexes mainly fit nominal discrete attributes and are thus unattractive for widespread use, which requires the ability to handle continuous data in raw format. The research describes a scheme for representing ordinal and continuous data by applying the concept of “padding”, in which each discrete nominal data value is transformed into a range of nominal-discrete values. This padding is done by adding adjacent bits “around” the original value (bin). The padding factor, i.e., the number of adjacent bits added, is calculated from the first and second derivative degrees of each attribute’s domain-distribution. The padded representation better supports similarity measures and therefore improves the accuracy of clustering and mining. The advantages of padding bitmaps are demonstrated on Fisher’s Iris dataset.
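The padding idea in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the function names `pad_bitmap` and `dice` are hypothetical, the padding factor is passed in as a fixed number rather than derived from the attribute's distribution, padding is applied symmetrically and clipped at the domain edges, and the Dice coefficient is used only as one plausible binary similarity measure.

```python
def pad_bitmap(bin_index, n_bins, pad):
    """Encode one value as a bitmap of n_bins bits.

    The bit at bin_index is set, plus up to `pad` adjacent bits on each
    side (clipped at the edges). pad=0 gives a plain one-hot bitmap.
    `pad` is a fixed illustrative parameter here, not the paper's
    derivative-based padding factor.
    """
    lo = max(0, bin_index - pad)
    hi = min(n_bins - 1, bin_index + pad)
    return [1 if lo <= i <= hi else 0 for i in range(n_bins)]


def dice(a, b):
    """Dice coefficient of two equal-length 0/1 vectors (assumed measure)."""
    both = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * both / total if total else 1.0


n_bins = 10
# Two values that fall into neighbouring bins (4 and 5).
plain_sim = dice(pad_bitmap(4, n_bins, 0), pad_bitmap(5, n_bins, 0))
padded_sim = dice(pad_bitmap(4, n_bins, 1), pad_bitmap(5, n_bins, 1))
print(plain_sim, padded_sim)
```

Without padding, the two neighbouring values share no set bit and look completely dissimilar. With one bit of padding on each side their bitmaps overlap, so the similarity measure reflects that the underlying values are close.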


References

Chan, C. Y., & Ioannidis, Y. E. (1998). Bitmap index design and evaluation. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, Seattle, Washington, pp. 355–366.
Dice, L. R. (1945). Measures of the amount of ecological association between species. Ecology, 26, 297–302.
Erlich, Z., Gelbard, R., & Spiegler, I. (2002). Data mining by means of binary representation: a model for similarity and clustering. Information Systems Frontiers, 4(2), 187–197.
Estivill-Castro, V., & Yang, J. (2004). Fast and robust general purpose clustering algorithms. Data Mining and Knowledge Discovery, 8, 127–150.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179–188.
Gelbard, R., & Spiegler, I. (2000). Hempel’s raven paradox: a positive approach to cluster analysis. Computers and Operations Research, 27(4), 305–320.
Gelbard, R., Goldman, O., & Spiegler, I. (2007). Investigating diversity of clustering methods: an empirical comparison. Data and Knowledge Engineering, 63, 155–166.
Jain, A. K., & Dubes, R. C. (1988). Algorithms for Clustering Data. Prentice Hall.
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys, 31, 264–323.
Johnson, T. (1999). Performance Measurements of Compressed Bitmap Indices. VLDB-1999, 25th International Conference on Very Large Data Bases, September 7–10, 1999, Edinburgh, Scotland, pp. 278–289.
Lim, T. S., Loh, W. Y., & Shih, Y. S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40(3), 203–228.
O’Neil, P. E. (1987). Model 204 Architecture and Performance. Lecture Notes in Computer Science, Vol. 359, Proceedings of the 2nd International Workshop on High Performance Transaction Systems, pp. 40–59.
Oracle Corp. (1993). Database concept—overview of indexes—bitmap index. Retrieved July 2010, from Oracle site: http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/schema.htm#sthref1008
Oracle Corp. (2001). Data warehousing guide—using bitmap index in data warehousing. Retrieved July 2010, from Oracle site: http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/indexes.htm#sthref349
Perlich, C., & Provost, F. (2006). Distribution-based aggregation for relational learning with identifier attributes. Machine Learning, 62, 65–105.
Spiegler, I., & Maayan, R. (1985). Storage and retrieval considerations of binary data bases. Information Processing and Management, 21(3), 233–254.
Zhang, B., & Srihari, S. N. (2003). Properties of binary vector dissimilarity measures. In JCIS CVPRIP 2003, Cary, North Carolina, pp. 26–30.
Zhang, B., & Srihari, S. N. (2004). Fast k-nearest neighbor classification using cluster-based trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(4), 525–528.


Author information

Authors and Affiliations

Information System Program, Graduate School of Business Administration, Bar-Ilan University, Ramat-Gan, 52900, Israel
Roy Gelbard


Corresponding author

Correspondence to Roy Gelbard.

Appendices

Appendix A

Table 8 Fisher’s Iris dataset


Appendix B

Table 9 Padded bitmap format



About this article

Cite this article

Gelbard, R. “Padding” bitmaps to support similarity and mining. Inf Syst Front 15, 99–110 (2013). https://doi.org/10.1007/s10796-011-9318-9


Issue Date: March 2013
DOI: https://doi.org/10.1007/s10796-011-9318-9

