Abstract
This paper presents a novel approach to bitmap-indexing for data mining purposes. Bitmap-indexing enables efficient data storage and retrieval, but is limited in terms of similarity measurement, and hence as regards classification, clustering, and data mining. Bitmap indexes mainly fit nominal discrete attributes and are thus unattractive for widespread use, which requires the ability to handle continuous data in a raw format. The current research describes a scheme for representing ordinal and continuous data by applying the concept of “padding”, where each discrete nominal data value is transformed into a range of nominal-discrete values. This “padding” is done by adding adjacent bits “around” the original value (bin). The padding factor, i.e., the number of adjacent bits added, is calculated from the first and second derivative degrees of each attribute’s domain distribution. The padded representation better supports similarity measures, and therefore improves the accuracy of clustering and mining. The advantages of padded bitmaps are demonstrated on Fisher’s Iris dataset.
Keywords
Bitmap-index · Data representation · Similarity index · Cluster analysis · Classification · Data mining

1 Introduction
Classification and clustering, as well as other data mining techniques, usually set up hypotheses to assign different objects to groups and classes based on the similarity/distance between them (Estivill-Castro and Yang 2004; Jain and Dubes 1988; Jain et al. 1999; Lim et al. 2000; Zhang and Srihari 2004). These techniques are widely used in numerous fields such as medical diagnosis, technical support, direct marketing, customer segmentation, fraud detection, bioinformatics, etc.
Bitmap-indexing enables efficient data storage and retrieval (Spiegler and Maayan 1985; O’Neil 1987; Chan and Ioannidis 1998; Johnson 1999), as well as clustering and mining (Erlich et al. 2002; Gelbard et al. 2007). However, bitmap-indexing by means of a binary data representation (assigning a ‘1’ or ‘0’ to each possible value of each attribute) is restricted to nominal discrete attributes, which severely limits its use, since widespread use requires the ability to handle continuous data in a raw format (Zhang and Srihari 2003). A bitmap-index representation does not preserve the natural numeric ability to “bind” close numerical values, which is fundamental to similarity-distance calculations and hence to classification and data mining techniques. A similar problem exists for high-dimensional data attributes such as identifiers (IDs) or product numbers; Perlich and Provost (2006) suggest an interesting solution for such attributes.
The current research describes and verifies a scheme for representing ordinal and continuous data by applying the concept of “padding” bins: each data value is first transformed into one of a range of nominal-discrete values (binning), and adjacent bits are then added "around" the original value (bin). The padding factor, i.e., the number of adjacent bits added, is based on the first and second derivative degrees of each attribute’s domain distribution. The main benefit of the padded bitmap format is the improved accuracy of similarity measures, which leads to better clustering and mining.
The impact of the padded bitmap format on the accuracy of clustering and data mining is tested on Fisher’s Iris dataset (Fisher 1936), which is often referenced as a baseline in the field of classification and data mining. In the experiment, nine clustering algorithms were executed on the regular representation of the Iris dataset. Then the dataset was transformed into a padded bitmap format, and reclustered using the same algorithms. Comparison of the results demonstrates the advantages of the new approach for handling ordinal and continuous data attributes in classification, clustering and mining.
2 Theoretical background
2.1 Bitmap-index
Bitmap-indexing is well known and widely used in database technologies such as DB2 and Oracle (O’Neil 1987; Oracle 1993), as well as in data warehouse technologies such as Sybase IQ and others (Chan and Ioannidis 1998; Oracle 2001). A bitmap-index creates a storage scheme in which data appear in binary form rather than the common numeric and alphanumeric formats. The dataset is viewed as a two-dimensional matrix that relates entities to all attribute values these entities may assume. The rows represent entities and the columns represent possible values, such that entries in the matrix are either ‘1’ or ‘0’, indicating that a given entity (e.g., record, object) has or does not have a given value, respectively (Spiegler and Maayan 1985).
A formal definition
Suppose we have \(n\) entities. For each entity, we construct a binary vector that represents the values of its attributes in binary form, as follows. Suppose that for each entity \(i\) (\(i = 1, 2, \ldots, n\)) we have \(m\) attributes, \(a_1, a_2, \ldots, a_m\). The domain of each attribute \(a_j\) is the set of all its possible values, where \(p_j\) is the domain size. We assume that for each attribute \(a_j\) (\(j = 1, 2, \ldots, m\)), its domain consists of \(p_j\) mutually exclusive possible values; i.e., for each attribute \(a_j\), an entity can attain exactly one of its \(p_j\) domain values. Denoting the \(k\)-th value of attribute \(a_j\) (\(j = 1, 2, \ldots, m\); \(k = 1, 2, \ldots, p_j\)) by \(a_{jk}\), we can represent the domain attributes vector of all possible values of all \(m\) attributes as:

\( (a_{11}, a_{12}, \ldots, a_{1p_1}, a_{21}, a_{22}, \ldots, a_{2p_2}, \ldots, a_{m1}, a_{m2}, \ldots, a_{mp_m}) \)

Denoting the length of the domain attributes vector by \(p\), we have: \( p = \sum\limits_{j = 1}^{m} p_j \)

\( x_{ijk} = \begin{cases} 1 & \text{if for entity } i \text{, the value of attribute } j \text{ is } a_{jk} \\ 0 & \text{otherwise} \end{cases} \)

(i = 1, 2, ..., n;  j = 1, 2, ..., m;  k = 1, 2, ..., \(p_j\))

Thus \(x_{ijk}\) is the entry corresponding to the \(k\)-th value of attribute \(j\) (\(a_{jk}\)) for entity \(i\); it is either ‘1’ or ‘0’, indicating that the entity does or does not have value \(a_{jk}\) for attribute \(j\), respectively.
The binary vector, of length \(p\), for entity \(i\), is given by: \( (x_{i11}, x_{i12}, \ldots, x_{imp_m}) \)

We can express the mutual exclusivity assumption, for each entity and for each attribute over its domain, as: \( \sum\limits_{k = 1}^{p_j} x_{ijk} = 1 \) (i = 1, 2, ..., n; j = 1, 2, ..., m)

This yields the sum of all the 1’s in each binary vector as the number of attributes, \(m\); i.e., for each \(i\): \( \sum\limits_{j = 1}^{m} \sum\limits_{k = 1}^{p_j} x_{ijk} = m \) (i = 1, 2, ..., n)
For example:

- Attribute 1: Gender, with two (\(p_1 = 2\)) mutually exclusive values: M (male), F (female).
- Attribute 2: Marital status, with four (\(p_2 = 4\)) mutually exclusive values: S (single), M (married), D (divorced), W (widowed).
- Attribute 3: Education, with five (\(p_3 = 5\)) mutually exclusive values: 1 (elementary), 2 (high school), 3 (college), 4 (undergraduate), 5 (graduate).
Bitmap representation

       Gender    Marital status    Education
       M   F     S   M   D   W     1   2   3   4   5
i = 1  1   0     0   1   0   0     0   0   0   0   1
i = 2  1   0     0   0   1   0     0   0   1   0   0
i = 3  0   1     1   0   0   0     0   0   0   0   1
i = 4  1   0     0   0   0   1     0   0   0   0   1
i = 5  1   0     1   0   0   0     1   0   0   0   0
i = 6  1   0     0   0   1   0     0   0   1   0   0
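The encoding in the table above can be reproduced with a short sketch (the attribute domains and the helper name `to_bitmap` are illustrative; the paper itself gives no code):

```python
# Build the bitmap vector of one entity from mutually exclusive
# attribute domains (Gender, Marital status, Education).
DOMAINS = [
    ["M", "F"],                 # Gender (p1 = 2)
    ["S", "M", "D", "W"],       # Marital status (p2 = 4)
    ["1", "2", "3", "4", "5"],  # Education (p3 = 5)
]

def to_bitmap(values, domains=DOMAINS):
    """Concatenate one-hot encodings: exactly one '1' per attribute,
    so the vector length is p = p1 + p2 + p3 = 11 and it sums to m = 3."""
    vector = []
    for value, domain in zip(values, domains):
        vector.extend(1 if value == v else 0 for v in domain)
    return vector

# Entity i = 1: male, married, graduate education.
print(to_bitmap(["M", "M", "5"]))  # [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
```

The printed vector matches row i = 1 of the table above.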
2.2 Similarity measures
Calculating similarity among data records is a fundamental function in data mining and particularly in clustering (Gelbard and Spiegler 2000).
Values assigned to distance-similarity variables

Algorithm           Ax                            Ay                            B                         C
Group average       Nx / (Nx + Ny)                Ny / (Nx + Ny)                0                         0
Nearest neighbor    0.5                           0.5                           0                         −0.5
Furthest neighbor   0.5                           0.5                           0                         0.5
Median              0.5                           0.5                           −0.25                     0
Centroid            Nx / (Nx + Ny)                Ny / (Nx + Ny)                −Ax · Ay                  0
Ward’s method       (Nz + Nx) / (Nx + Ny + Nz)    (Nz + Ny) / (Nx + Ny + Nz)    −Nz / (Nx + Ny + Nz)      0
However, these likelihood-similarity measures are applicable only to ordinal/continuous attributes and cannot be used to classify nominal, discrete, or categorical attributes.
For binary sequences, the following counts are used:

- Na: the number of ‘1’s in sequence a.
- Nb: the number of ‘1’s in sequence b.
- Nab: the number of ‘1’s common to both a and b.

The Dice index (Dice 1945), for example, is defined from these counts as \( S_{Dice} = 2N_{ab} / (N_a + N_b) \).
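A minimal sketch of the Dice index built from the counts above (the function name is illustrative; the formula follows the standard Dice 1945 definition):

```python
def dice(a, b):
    """Dice index: 2 * Nab / (Na + Nb), where Na and Nb count the '1's
    in sequences a and b, and Nab counts positions where both are '1'."""
    na, nb = sum(a), sum(b)
    nab = sum(x & y for x, y in zip(a, b))
    return 2 * nab / (na + nb) if (na + nb) else 0.0

# Two mutually exclusive one-hot vectors never overlap, so their
# Dice similarity is 0 regardless of how "close" the original values are.
print(dice([0, 0, 1, 0], [0, 1, 0, 0]))  # 0.0
```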
3 The padding model
As seen, the bitmap representation is limited to nominal discrete attributes. To enable handling ordinal and continuous data in a binary format, we introduce a new concept called padding, which creates “padded” bitmaps. Adding padded bits does not add “noise” to the binary data, since the bits are not randomly generated. Rather, padding recovers some of the implicit similarity inherent in the numerical scale that “got lost” in the bitmap-indexing. A similar way of representing numeric features is temperature coding, as discussed in Perlich and Provost (2006). The conversion of ordinal/continuous data into a padded bitmap format, i.e., into a range of nominal-discrete values, has three stages: (a) the binning stage, (b) the padding stage, and (c) determining the padding factor.
3.1 The binning stage
Illustration of the Binning stage (mutually exclusive representation)

Value   0.0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
0.0      1    0    0    0    0    0    0    0    0    0
0.2      0    0    1    0    0    0    0    0    0    0
0.3      0    0    0    1    0    0    0    0    0    0
0.7      0    0    0    0    0    0    0    1    0    0
0.9      0    0    0    0    0    0    0    0    0    1
Since the described binning process always produces a finite number of bins, one can also regard it as a special case suitable for situations where the candidate variable assumes a finite number of relevant possible categories. Such a representation is precise and preserves data accuracy; i.e., there is neither loss of information nor rounding off of any value. However, the mutual exclusivity of the binning representation causes the “isolation” of each value. Normally and intuitively, we assume that 0.2 is closer (more similar) to 0.3 than to 0.7; but in converting such values into a bitmap representation we lose these basic numerical relations. Under the Dice similarity measure, the similarity between any pair of such values is always 0; the same holds for the HD similarity measure, for which the similarity is always 0.8 for {0.2, 0.3}, as it is for any other pair.
Losing these basic similarity “intuitions” is the main drawback of the bitmap representation.
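The binning stage above can be sketched as follows, assuming bins of width 0.1 over the range 0.0–0.9 as in the illustration (the function name and rounding scheme are illustrative):

```python
def to_bin_vector(value, low=0.0, high=0.9, step=0.1):
    """Binning stage: map a continuous value onto a mutually exclusive
    one-hot vector over the bins low, low + step, ..., high."""
    n_bins = round((high - low) / step) + 1
    index = round((value - low) / step)  # which bin the value falls into
    return [1 if i == index else 0 for i in range(n_bins)]

print(to_bin_vector(0.2))  # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(to_bin_vector(0.7))  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
```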
3.2 The padding stage
Reformulating the intuition that 0.2 is closer (more similar) to 0.3 than to 0.7 is done by padding each bin according to the first and second derivative degrees of the attribute domain, as it appears in the dataset or, if known, in the entire population.
Similarity values between pairs of padded vectors

#   Padding factor   Bins: 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9   Dice
1   P.F. = 0               0   0   1   0   0   0   0   0   0   0
                           0   0   0   0   0   0   0   1   0   0     0.00
2   P.F. = 1               0   1   1   1   0   0   0   0   0   0
                           0   0   0   0   0   0   1   1   1   0     0.00
3   P.F. = 2               1   1   1   1   1   0   0   0   0   0
                           0   0   0   0   0   1   1   1   1   1     0.00
4   P.F. = 3               1   1   1   1   1   1   0   0   0   0
                           0   0   0   0   1   1   1   1   1   1     0.33
5   P.F. = 4               1   1   1   1   1   1   1   0   0   0
                           0   0   0   1   1   1   1   1   1   1     0.57
6   P.F. = 5               1   1   1   1   1   1   1   1   0   0
                           0   0   1   1   1   1   1   1   1   1     0.75
7   P.F. = 6               1   1   1   1   1   1   1   1   1   0
                           0   1   1   1   1   1   1   1   1   1     0.89
8   P.F. = 7               1   1   1   1   1   1   1   1   1   1
                           1   1   1   1   1   1   1   1   1   1     1.00
Each line illustrates a different padding factor. The padding factor in line No. 2 is 1, i.e., one additional ‘1’ bit on the right and left sides of the original bin. Similarly, the padding factor in line No. 3 is 2, i.e., two additional ‘1’s on each side of the original bin; this continues up to line No. 8, in which the padding factor is 7. The original binary vectors (presented in line No. 1) have a padding factor of 0.
The right column in Table 4 presents the calculated similarity between the two vectors in each pair according to the Dice similarity index. The obvious issue is to determine the right padding factor to yield the best information for clustering.
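The Dice column of Table 4 can be reproduced with a short sketch (assuming, as the table suggests, that padded bits are simply clipped at the vector boundaries; the function names are illustrative):

```python
def pad(vector, factor):
    """Padding stage: add `factor` adjacent '1' bits on each side of
    every '1' bin, clipping at the vector boundaries."""
    n = len(vector)
    padded = [0] * n
    for i, bit in enumerate(vector):
        if bit:
            for j in range(max(0, i - factor), min(n, i + factor + 1)):
                padded[j] = 1
    return padded

def dice(a, b):
    """Dice index: 2 * Nab / (Na + Nb)."""
    na, nb = sum(a), sum(b)
    nab = sum(x & y for x, y in zip(a, b))
    return 2 * nab / (na + nb) if (na + nb) else 0.0

bin_02 = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # binned value 0.2
bin_07 = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]  # binned value 0.7
for pf in range(8):
    print(pf, round(dice(pad(bin_02, pf), pad(bin_07, pf)), 2))
```

The loop prints 0.0 for padding factors 0–2, then 0.33, 0.57, 0.75, 0.89, and 1.0, matching the Dice column of Table 4.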
3.3 Illustration of the model
Attributes of Fisher’s Iris dataset

                  PW    SL    SW    PL
Min. value        0.1   1     2     4.3
Max. value        2.5   6.9   4.4   7.9
Min. interval     0.1   0.1   0.1   0.1
Number of bins    25    60    25    37
The histograms in Fig. 1 show that at a padding factor of 8 the probability function of the Sepal Length (SL) attribute can be regarded as a normal distribution, whereas at a padding factor of 3 we can observe several midpoints. Therefore, there is a need for a mechanism or rule to determine the right padding factor to yield the best clustering.
Figure 2 shows that, for almost all padding factors, the probability function of the Sepal Width (SW) attribute resembles a normal distribution. The definition of the right padding factor will not affect the probability function of the SW attribute, but may affect mutual relations with the other three attributes.
The histograms in Fig. 3 show that at a padding factor of 2 the probability function of the Petal Width (PW) attribute indicates three noticeable values, whereas at higher factors the distinction becomes vague.
Figure 4 illustrates a different phenomenon, in which for all padding factors the probability function of the Petal Length (PL) attribute cannot be regarded as a normal distribution. However, we can identify at least two noticeable values, which probably indicate different Iris species. Such a phenomenon probably suggests two clusters; however, at this point, given one attribute (the PL), we have no way to determine whether there are two or more clusters.

- The block of low-range values, designated by the letter “A”, matches Iris Setosa.
- The block of mid-range values, designated by the letter “B”, matches Iris Versicolor.
- The block of high-range values, designated by the letter “C”, matches Iris Verginica.
4 Determining the padding factor

The padding factor is determined from:

- The first-order derivative degree of the attribute domain, i.e., growth rates of the probability function related to changes in the padding factor. The term “growth rate” refers to the maximal percentage of records with a common bin value; i.e., the value of 1.00 is achieved when all 150 entities have a common bin value. For example, the PW attribute, which has 24 possible padding factors (since there are 25 possible values in this attribute), reaches the value of 1.00 at a padding factor of 12. The term “growth derivative” refers to the first derivative of the growth rate.
- The second-order derivative degree of the attribute domain, i.e., inflection points of the first-order derivative of the probability function.
- The required padding factor is the first local minimum, i.e., the first inflection point.

The resulting padding factors are:

- PW: padding factor of 2, i.e., each value is represented by 5 bins.
- PL: padding factor of 5, i.e., each value is represented by 11 bins.
- SW: padding factor of 6, i.e., each value is represented by 13 bins.
- SL: padding factor of 3, i.e., each value is represented by 7 bins.
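One plausible reading of the growth-rate and first-local-minimum criteria can be sketched as follows; the function names, the synthetic data, and the exact derivative test are assumptions of this sketch, not the paper's MatLab implementation:

```python
def growth_rate(values, low, step, n_bins, factor):
    """Maximal fraction of records sharing a common (padded) bin:
    reaches 1.00 once some bin is covered by every record's padded range."""
    counts = [0] * n_bins
    for v in values:
        center = round((v - low) / step)
        for i in range(max(0, center - factor), min(n_bins, center + factor + 1)):
            counts[i] += 1
    return max(counts) / len(values)

def padding_factor(values, low, step, n_bins):
    """Choose the first local minimum of the discrete first derivative
    of the growth-rate curve (a simplified reading of the rule above)."""
    rates = [growth_rate(values, low, step, n_bins, f) for f in range(n_bins)]
    deriv = [b - a for a, b in zip(rates, rates[1:])]
    for f in range(1, len(deriv) - 1):
        if deriv[f] <= deriv[f - 1] and deriv[f] <= deriv[f + 1]:
            return f
    return 0

# Synthetic bimodal attribute: two groups of values at the range edges.
print(padding_factor([0.0, 0.1, 0.8, 0.9], low=0.0, step=0.1, n_bins=10))  # 1
```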
5 Clustering continuous data vs. padded bitmap data
The proposed model was evaluated using Fisher’s Iris dataset (Fisher 1936). Nine clustering algorithms were executed on the regular Iris dataset (continuous data). Then the dataset was transformed into a padded bitmap format, applying the relevant padding factor to each attribute (as discussed in the previous section). The padded bitmap was re-clustered using the same nine algorithms. It is worth mentioning that the objective of the evaluation is not to search for the best classification algorithm, but to show the merit of the proposed representation for the accuracy of clustering/classification produced by diverse algorithms.
5.1 Tools and research process
1. Fisher’s Iris dataset is available in all SPSS versions (see also Appendix A). It consists of 3 clusters of 50 samples (entities) each. Each cluster corresponds to one species of the Iris flower: Iris Setosa (cluster C1), Iris Versicolor (cluster C2), and Iris Verginica (cluster C3). Each sample (entity) has four features (attributes): Petal Width (PW), Petal Length (PL), Sepal Width (SW), and Sepal Length (SL); all attributes are expressed in centimeters (i.e., continuous attributes). Appendix A presents Fisher’s entire Iris dataset.
2. Converting the Iris dataset from continuous values into a bitmap representation was done using a MatLab application developed for this purpose.
3. Extracting the padding factor, and then padding the bitmap accordingly, was done using another MatLab application.
4. Nine clustering algorithms (discussed in Section 5.2) were executed using SPSS version 13.0.
5. Results and graphs were edited using Microsoft Excel for display purposes.
5.2 Clustering algorithms
This section provides a brief description of the nine clustering algorithms used in this study.
Two-step
This algorithm is applicable to both ordinal (continuous) and nominal discrete (categorical) attributes. It is based, as its name implies, on two passes over the dataset. The first pass divides the dataset into a coarse set of sub-clusters, and the second pass groups the sub-clusters into the desired number of clusters. The algorithm depends on the order of the samples and may produce different results for different initial orders. The desired number of clusters can be determined automatically, or it can be a predetermined fixed number. We used the fixed-number-of-clusters option in our analysis, so as to be able to use this algorithm in conjunction with the other algorithms chosen for this study.
K-means
This algorithm is applicable to both ordinal (continuous) and nominal discrete (categorical) attributes. One of its requirements is that the number of clusters used to classify the dataset be predetermined. It is based on determining arbitrary centers for the desired clusters, associating the samples with the clusters using a predetermined distance measure, then iteratively recomputing the cluster centers and re-associating the samples. The length of the process is highly dependent on the initial setting of the cluster centers and can be improved when there is prior knowledge of their location.
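A minimal one-dimensional sketch of this iteration (illustrative only; the study itself used the SPSS implementation, and real runs operate on all four Iris attributes):

```python
def kmeans_1d(points, centers, iters=100):
    """Minimal K-means sketch: assign each point to its nearest center,
    recompute centers as cluster means, repeat until the centers stabilize."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # converged: assignments no longer change
            return clusters
        centers = new_centers
    return clusters

print(kmeans_1d([1, 2, 3, 10, 11, 12], centers=[1.0, 10.0]))
# [[1, 2, 3], [10, 11, 12]]
```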
Hierarchical methods
This set of algorithms works in a similar manner. These algorithms start by classifying the dataset such that each sample represents a cluster. Next, they merge the clusters in steps: each step merges two clusters into a single cluster, until only one cluster (the entire dataset) remains. The algorithms differ in the way distance is measured between clusters, mainly through two parameters: the distance (likelihood) measure, e.g., Euclidean, Dice, etc., and the cluster method, e.g., between-groups linkage, nearest neighbor, etc.

- Within Groups Average: This method calculates the distance between two clusters by applying the likelihood measure to all the samples in the two clusters. The clusters with the highest average likelihood measure are then united.
- Between Groups Average: This method calculates the distance between two clusters by applying the likelihood measure to all the samples of one cluster and comparing them with all the samples of the other cluster. Again, the two clusters with the highest likelihood measure are then united.
- Nearest Neighbor: This method, as in the Between Groups Average method, applies the likelihood measure to all the samples of one cluster and compares them with all the samples of the other cluster. The two clusters with the highest likelihood measure, from a pair of samples, are then united.
- Furthest Neighbor: This method, like the previous methods, applies the likelihood measure between the samples of one cluster and those of another. For each pair of clusters, the sample pair with the lowest likelihood measure is taken; the two clusters with the highest likelihood measure among those pairs are then united.
- Centroid: This method calculates the centroid of each cluster as the mean of all the properties over all the samples of the cluster. The likelihood measure is then applied to the centroids, and the clusters with the highest likelihood measure between their centroids are united.
- Median: This method calculates the median of each cluster. The likelihood measure is applied to the medians of the clusters, and the clusters with the highest median likelihood are united.
- Ward’s Method: This method calculates the centroid of each cluster and the square of the likelihood measure between each sample in the cluster and the centroid. The two clusters that, when united, have the smallest (negative) effect on the sum of likelihood measures are the clusters to be united.
5.3 Impact on clustering/classification accuracy

- Cluster C1 contains specimens 1–50, which relate to the Iris Setosa species.
- Cluster C2 contains specimens 51–100, which relate to the Iris Versicolor species.
- Cluster C3 contains specimens 101–150, which relate to the Iris Verginica species.
5.3.1 Matching evaluation
Crosstab table for results of the K-means algorithm

Common clusters   Group A   Group B   Group C   Total members
C1                50        0         0         50
C2                0         47        3         50
C3                0         14        36        50
Total members     50        61        39        150
Score I           1         0.94      0.72      0.89
Score II          1         0.66      0.66      0.77

- Matching of Group A and C1 = 100% identity (50/50).
- Matching of Group B and C2 = 94% identity (47/50).
- Matching of Group C and C3 = 72% identity (36/50).

The average matching factor, presented in row Score I, is therefore: (1 + 0.94 + 0.72)/3 = 0.89

- Matching of Group A and C1 = 100% (50/50).
- Irrelevant members in Group A = 0%.
- Total score of Group A = 100% − 0% = 100%.
- Matching of Group B and C2 = 94% (47/50).
- Irrelevant members in Group B = 28% (14/50).
- Total score of Group B = 94% − 28% = 66%.
- Matching of Group C and C3 = 72% (36/50).
- Irrelevant members in Group C = 6% (3/50).
- Total score of Group C = 72% − 6% = 66%.

The average matching factor presented in row Score II is: (1 + 0.66 + 0.66)/3 = 0.77
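The Score I and Score II calculations above can be checked with a short sketch (variable names are illustrative; the crosstab values are those of the K-means result above):

```python
# Score I and Score II from the K-means crosstab.
# Rows: true clusters C1..C3; columns: produced groups A..C.
crosstab = [
    [50, 0, 0],   # C1
    [0, 47, 3],   # C2
    [0, 14, 36],  # C3
]
cluster_size = 50

score_i = []   # fraction of each cluster recovered in its matching group
score_ii = []  # the same, penalized by irrelevant members in that group
for k in range(3):
    matched = crosstab[k][k]
    irrelevant = sum(crosstab[r][k] for r in range(3)) - matched
    score_i.append(matched / cluster_size)
    score_ii.append(matched / cluster_size - irrelevant / cluster_size)

print([round(s, 2) for s in score_i])   # [1.0, 0.94, 0.72]
print([round(s, 2) for s in score_ii])  # [1.0, 0.66, 0.66]
print(round(sum(score_i) / 3, 2), round(sum(score_ii) / 3, 2))  # 0.89 0.77
```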
5.3.2 Scoring comparison
Scoring the classification results

Algorithm                Standard Iris dataset   Padded bitmap format & binary similarity measure
Nearest neighbor         0.68                    0.72
Between groups average   0.75                    0.79
Median                   0.75                    0.80
Within groups average    0.84                    0.90
Furthest neighbor        0.84                    0.91
Two-step                 0.86                    0.92
K-means                  0.89                    0.94
Centroid                 0.89                    0.95
Ward’s method            0.91                    0.97
The results show strong evidence for the efficiency of combining a padded bitmap representation with a binary similarity measure. The results of each algorithm improved by about 6–8% simply by using this combination of representation and similarity. Table 7 presents the results of all nine algorithms in ascending order.
6 Summary and conclusions
Data representation is an essential part of similarity calculations, as well as of classification and data mining techniques. Various representation forms, measures, and algorithms have been developed, one of which is the bitmap index. A bitmap format representation requires a binary similarity measure that focuses on the positive aspect of the data (‘1’ values), such as the Dice measure. Earlier research suggested that coupling a binary similarity measure with a bitmap format representation could yield better clustering results than similarity indexes based upon the regular data representation. However, the bitmap format representation is currently limited to nominal and discrete attributes, which makes this approach unattractive for widespread use.
This paper describes a new approach for representing and handling ordinal and continuous data in the form of a padded bitmap. Each ordinal or continuous value is converted into a range of bins, and then the ‘1’ values (which represent the original values) are “padded” with adjacent ‘1’s (around the original value), according to the first and second derivative degrees of the attribute domain. The main benefit of the padded representation is the improved accuracy of classification and data mining products.
The suggested methodology, which involves converting ordinal and continuous data into a padded bitmap, is general and wide-ranging, enabling the conversion of any numerical dataset into a padded bitmap representation.
The impact of the proposed model on the accuracy of classification was tested and verified using Fisher’s Iris dataset and nine common clustering algorithms. The accuracy of the resulting clusters was evaluated using crosstab tables, which indicate the percentage of correct assignments for each cluster. Strong evidence was found for the efficiency of the combination of the padded bitmap format and the binary similarity measure: each algorithm improved by about 7% simply as a result of using this combination. Although there are domains in which such improvements are insignificant, in other domains, such as medical diagnostics, such improvements are crucial.
6.1 Limitations and future research
There are diverse classification techniques: Bayes nets, neural nets, regressions, clustering, decision trees, decision rules, etc. The current study focuses on supervised problems (i.e., problems in which the classification is known and agreed upon) using clustering techniques.
Further research should test: (a) additional classification techniques; (b) unsupervised clustering problems (where the classification is unsettled), which are quite common in the business world, hence the importance of their being formalized and evaluated; (c) extending the padding model by assigning probabilities to the padded bins, rather than ‘1’s, to better represent the probability function of the entire population; and (d) an adjusted, Dice-like similarity index to support the extended padding model.
References
Chan, C. Y., & Ioannidis, Y. E. (1998). Bitmap index design and evaluation. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, Seattle, Washington, pp. 355–366.
Dice, L. R. (1945). Measures of the amount of ecological association between species. Ecology, 26, 297–302.
Erlich, Z., Gelbard, R., & Spiegler, I. (2002). Data mining by means of binary representation: a model for similarity and clustering. Information Systems Frontiers, 4(2), 187–197.
Estivill-Castro, V., & Yang, J. (2004). Fast and robust general purpose clustering algorithms. Data Mining and Knowledge Discovery, 8, 127–150.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179–188.
Gelbard, R., & Spiegler, I. (2000). Hempel’s raven paradox: a positive approach to cluster analysis. Computers and Operations Research, 27(4), 305–320.
Gelbard, R., Goldman, O., & Spiegler, I. (2007). Investigating diversity of clustering methods: an empirical comparison. Data and Knowledge Engineering, 63, 155–166.
Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Prentice Hall.
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys, 31, 264–323.
Johnson, T. (1999). Performance measurements of compressed bitmap indices. VLDB 1999, 25th International Conference on Very Large Data Bases, September 7–10, 1999, Edinburgh, Scotland, pp. 278–289.
Lim, T. S., Loh, W. Y., & Shih, Y. S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40(3), 203–228.
O’Neil, P. E. (1987). Model 204 architecture and performance. Lecture Notes in Computer Science, Vol. 359, Proceedings of the 2nd International Workshop on High Performance Transaction Systems, pp. 40–59.
Oracle Corp. (1993). Database concepts—overview of indexes—bitmap index. Retrieved July 2010, from Oracle site: http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/schema.htm#sthref1008
Oracle Corp. (2001). Data warehousing guide—using bitmap indexes in data warehousing. Retrieved July 2010, from Oracle site: http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/indexes.htm#sthref349
Perlich, C., & Provost, F. (2006). Distribution-based aggregation for relational learning with identifier attributes. Machine Learning, 62, 65–105.
Spiegler, I., & Maayan, R. (1985). Storage and retrieval considerations of binary data bases. Information Processing and Management, 21(3), 233–254.
Zhang, B., & Srihari, S. N. (2003). Properties of binary vector dissimilarity measures. In JCIS CVPRIP 2003, Cary, North Carolina, pp. 26–30.
Zhang, B., & Srihari, S. N. (2004). Fast k-nearest neighbor classification using cluster-based trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(4), 525–528.