rCOSA: A Software Package for Clustering Objects on Subsets of Attributes

rCOSA is a software package interfaced to the R language. It implements statistical techniques for clustering objects on subsets of attributes in multivariate data. The main output of COSA is a dissimilarity matrix that one can subsequently analyze with a variety of proximity analysis methods. Our package extends the original COSA software (Friedman and Meulman, 2004) by adding functions for hierarchical clustering methods, least squares multidimensional scaling, partitional clustering, and data visualization. In the many publications that cite the COSA paper by Friedman and Meulman (2004), the COSA program is actually used only a small number of times. This can be attributed to the fact that the original implementation is not very easy to install and use; moreover, the available software is out-of-date. Here, we introduce an up-to-date software package and clear guidance for this advanced technique. The software package and related links are available for free at: https://github.com/mkampert/rCOSA.


INTRODUCTION
Visual representations of dissimilarities (proximities, distances) are advantageous for discovery, identification and recognition of structure in many fields that apply statistical methods. Clustering objects in multivariate (attribute-value) data is a highly popular data analysis objective.
Distance-based methods define a measure of similarity, e.g., a composite based on a distance derived from each attribute separately. Let an object i be defined as o_i = x_i = (x_i1, x_i2, ..., x_iP), where {x_ik}_{k=1}^{P} denotes the P attributes measured on each object i. X denotes the data matrix of size N × P, with N objects and P attributes (or variables). For each numeric attribute k, we calculate the distance d_ijk between a pair of objects i and j as

    d_ijk = |x_ik - x_jk| / s_k,

with s_k a scale factor, a measure of dispersion. If s_k is set as σ√n, with σ the standard deviation of x_k, then d_ijk is the distance between objects i and j in the standardized variable x_k. For categorical attributes, we calculate the d_ijk of object pair i and j as

    d_ijk = 0 if x_ik = x_jk, and 1/s_k otherwise,

with s_k a suitable scale factor for categorical variables. When all attributes are numeric and we set s_k equal to σ√n, the sum of all attribute distances for objects i and j defines the L1 distance for standardized variables. The squared Euclidean distance would be obtained by summing the squared attribute distances d²_ijk instead.
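As an illustration, the attribute distances above can be sketched in a few lines of Python (hypothetical helper names; rCOSA itself performs these computations in compiled code called from R):

```python
# Sketch of the per-attribute distances d_ijk described above.

def dist_numeric(x_ik, x_jk, s_k):
    """d_ijk = |x_ik - x_jk| / s_k for a numeric attribute k."""
    return abs(x_ik - x_jk) / s_k

def dist_categorical(x_ik, x_jk, s_k):
    """d_ijk = 0 if the categories match, 1/s_k otherwise."""
    return 0.0 if x_ik == x_jk else 1.0 / s_k

def l1_dissimilarity(x_i, x_j, scales):
    """L1 dissimilarity between objects i and j: the sum of the
    (numeric) attribute distances."""
    return sum(dist_numeric(a, b, s) for a, b, s in zip(x_i, x_j, scales))
```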

CLUSTERING ON SUBSETS OF ATTRIBUTES
The focus on clustering objects on subsets of attributes was motivated by the presence of high-dimensional data emerging from fields like genomics (e.g., gene expression micro-array data) and metabolomics (e.g., LC-MS data), where the data consist of a very large number of attributes/variables compared to a relatively small number of objects. Ordinary clustering techniques, based on (3) or (4), use equal weights for each attribute, and this may mask an existing clustering: with a large number of attributes, it is very unlikely that objects cluster on all attributes. Instead, objects might be preferentially close on some attributes and far apart on others. This situation calls for feature selection, or for assigning a different weight to each attribute; applying a clustering procedure to Euclidean or L1 distances does not perform well in general when only a few attributes contain signal and most others contain noise. In such situations, clustering applied to dissimilarities that incorporate variable weighting is much more likely to succeed in finding groups in the data. Figure 1 shows a display of a toy-example data set for which it is unlikely that clustering of either Euclidean or L1 distances would capture the signal.

Figure 1: A Monte Carlo data set X with 60 objects (vertical) and 500 attributes (horizontal; not all of them are shown since P >> N). There are three groups of 20 objects each (red, green, and blue) clustering on 50 attributes; the remaining entries are N(0, 1) noise. Note that i and k are ordered into i′ and k′, respectively, to show the cluster blocks.
For data as displayed in Figure 1, we can expect the clustering procedure to be successful when the dissimilarity measure incorporates variable selection/weighting. In partitional clustering, the weighting of attributes has received considerable attention (for example, De Sarbo, Carroll, Clark and Green 1984; Steinley and Brusco 2008; Jain 2010; Andrews and McNicholas 2014), but not so for dissimilarity and distance functions. There are studies where attribute weighting is applied, but these methods are either not capable of capturing signal in high-dimensional data settings where P >> N, or have as their sole purpose to fit a tree in hierarchical clustering (Sebestyen 1962; De Soete, De Sarbo and Carroll 1985; De Soete 1985; Amorim 2015).
Sparse clustering (SPARCL) by Witten and Tibshirani (2010) can output an attribute weighted dissimilarity measure for the objects.
Denote by w_k the weight for attribute distance d_ijk; the composite dissimilarity measure that incorporates variable weights is then given in (5):

    D_ij[w] = Σ_{k=1}^{P} w_k d_ijk.   (5)

As we shall see below, restrictions are needed on the w = {w_k} to prevent degenerate solutions (see also Witten and Tibshirani (2010)). We will start our discussion with the case where only one subset of attributes is important for all groups of objects, and where the groups differ only in their means. This particular case was displayed in Figure 1.
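The weighted composite in (5) is a convex combination of the attribute distances; a minimal sketch, with the simplex constraint on w checked explicitly:

```python
def weighted_dissimilarity(d_ij, w):
    """D_ij[w] = sum_k w_k * d_ijk, with w_k >= 0 and sum_k w_k = 1."""
    assert all(wk >= 0.0 for wk in w) and abs(sum(w) - 1.0) < 1e-9
    return sum(wk * dk for wk, dk in zip(w, d_ij))
```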
It is important to realize that in this example, all objects are assumed to be in clusters; there are no objects in the data that do not belong to one of the clusters. This is a very particular structure, and it is unlikely to be present in many high-dimensional settings. In many data sets, one can hope to find one or more clusters of objects, while the remaining objects are not close to any of the other objects. Moreover, it could very well be true that one cluster of objects is present in one subset of attributes, while another cluster is present in another subset of attributes. In that case, the subsets of attributes are different for each cluster of objects. In general, the subsets may be overlapping or partially overlapping, but they may also be disjoint.
An example is shown in Figure 2; the display shows a typical structure in which the groups of objects cluster on their own subsets of attributes.

COSA DISSIMILARITIES
The approach used in COSA is to modify the original cluster criterion, defined on L1 distances (3), by using a very particular distance instead, for which the equal-weights starting point is not detrimental. During the search for the optimal weights, this particular distance transitions into an ordinary weighted L1 distance. A penalty is used to avoid subsets that are trivially small (e.g., consisting of a single attribute). In this section, we briefly give the technical details. Friedman and Meulman (2004) propose an algorithm that uses the weighted inverse exponential distance, defined as

    D_ij[w] = -λ log Σ_{k=1}^{P} w_k exp(-d_ijk / λ),

where λ is a scale parameter defining "closeness" between objects. This distance gives emphasis to all attribute distances smaller than λ, for any value of the weights, including w_k = 1/P. If we introduce a parameter η and define

    D_ij^(η)[w] = -η log Σ_{k=1}^{P} w_k exp(-d_ijk / η),

then when η increases from λ to ∞, we obtain a transition from the weighted inverse exponential distance to the weighted L1 distance, because

    lim_{η→∞} D_ij^(η)[w] = Σ_{k=1}^{P} w_k d_ijk.

By using this so-called homotopy strategy, COSA attempts to avoid local minima by starting the iterative process with inverse exponential distances (where equal weighting is not detrimental) that change into ordinary weighted L1 distances during the process.
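The transition behind the homotopy strategy is easy to verify numerically; the following Python sketch (hypothetical function names, not the rCOSA code) implements both distances, so that the two can be compared for small and large η:

```python
import math

def inv_exp_dist(d_ij, w, eta):
    """Weighted inverse exponential distance:
    D_ij = -eta * log( sum_k w_k * exp(-d_ijk / eta) )."""
    return -eta * math.log(sum(wk * math.exp(-dk / eta)
                               for wk, dk in zip(w, d_ij)))

def weighted_l1(d_ij, w):
    """Weighted L1 distance: sum_k w_k * d_ijk."""
    return sum(wk * dk for wk, dk in zip(w, d_ij))

# For small eta the inverse exponential distance emphasizes small attribute
# distances; for large eta it approaches the weighted L1 distance.
d, w = [0.5, 2.0, 1.0], [1/3, 1/3, 1/3]
```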
The weight for a pair of objects i and j on attribute k is then defined as

    w_ijk = max(w_ik, w_jk),

subject to 0 ≤ w_ik ≤ 1 and Σ_{k=1}^{P} w_ik = 1. Object pairs that belong to the same cluster will obtain weights that are more similar than object pairs that do not belong to the same cluster. The COSA dissimilarity

    D_ij[W] = Σ_{k=1}^{P} max(w_ik, w_jk) d_ijk   (10)

can uncover groups that cluster on their own set of attributes: the larger the difference between w_ik and w_jk, the larger the dissimilarity D_ij[W]. The COSA weights and the associated dissimilarities are found by minimizing the criterion

    Q(W) = Σ_{i=1}^{N} (1/K) Σ_{j ∈ KNN(i)} Σ_{k=1}^{P} w_ik d_ijk.   (11)

Here, K is a pre-set number of nearest neighbours, by default set to K = floor(√N), and j ∈ KNN(i) denotes the j = 1, ..., K nearest neighbour objects of object i. The vector w_i, the i-th row of W, makes the minimization problem linear, since the term max(w_ik, w_jk) is now absent in (11). Equation (11) can equivalently be written in Lagrangian form,

    Q(W) = Σ_{i=1}^{N} [ (1/K) Σ_{j ∈ KNN(i)} Σ_{k=1}^{P} w_ik d_ijk + λ Σ_{k=1}^{P} w_ik log w_ik ],   (12)

subject to 0 ≤ w_ik ≤ 1 and Σ_k w_ik = 1. The Lagrangian term t_i = Σ_k w_ik log w_ik, also called the penalty regularized by λ, ensures that subsets of attributes will not be trivially small. The larger λ, the smaller the penalty t_i, and the more equal the weights for the attributes; vice versa, the smaller λ, the larger the penalty t_i, and hence the stronger one subset of attributes is favored over others.
For known j ∈ KNN(i) and λ, there is an analytical solution for the W that minimizes Q(W) in (11) and (12); specifically, an element w_ik is obtained as

    w_ik = exp(-S_ik / λ) / Σ_{k'=1}^{P} exp(-S_ik' / λ),  with  S_ik = (1/K) Σ_{j ∈ KNN(i)} d_ijk.   (13)

Since the dissimilarities D_ij[W], on which we base the j ∈ KNN(i), are not known beforehand, we have to minimize the criterion iteratively. Summarizing, COSA uses a homotopy strategy, starting with the inverse exponential distance, with η the homotopy parameter. During the iteration process, the inverse exponential distance transitions into the L1 distance as the value of η is slowly increased. As mentioned in Friedman and Meulman (2004), the correlation between the two sets of distances is already .91 for η = 1, and .97 for η = 2, for distances derived from normally distributed attribute values x_ik and equal weights w_k = 1/P.
Having defined the necessary ingredients, we can now summarize the COSA algorithm in the following six steps:

COSA Algorithm
1. Initialize: η = λ, W = {w_ik = 1/P}
2. Compute the distances D_ij[W]
3. Determine the K nearest neighbours KNN(i) for each object i
4. Compute the weights W (13)
5. Increase η by 0.1 · λ
6. Repeat steps 2-5 until W stabilizes
Output: {D_ij[W]} and W
We refer to Friedman and Meulman (2004) for more details and properties of the algorithm.
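For concreteness, the six steps can be sketched as a small didactic re-implementation in Python — a toy under simplifying assumptions (numeric attributes only, dense loops, η increased by 0.1λ per outer iteration), not the rCOSA code:

```python
import math

def cosa_sketch(X, lam=0.2, n_outer=10, K=None):
    """Toy sketch of the COSA loop: equal starting weights, inverse
    exponential distances with growing eta, K nearest neighbours, and the
    analytical weight update (13)."""
    N, P = len(X), len(X[0])
    K = K or max(1, math.isqrt(N))          # default K = floor(sqrt(N))
    W = [[1.0 / P] * P for _ in range(N)]   # start with equal weights 1/P
    eta = lam
    D = [[0.0] * N for _ in range(N)]
    for _ in range(n_outer):
        # pairwise dissimilarities with max(w_ik, w_jk) weighting
        for i in range(N):
            for j in range(N):
                if i == j:
                    continue
                s = sum(max(W[i][k], W[j][k]) *
                        math.exp(-abs(X[i][k] - X[j][k]) / eta)
                        for k in range(P))
                D[i][j] = -eta * math.log(max(s, 1e-300))
        # nearest neighbours and weight update (13) for each object
        for i in range(N):
            knn = sorted((j for j in range(N) if j != i),
                         key=lambda j: D[i][j])[:K]
            e = []
            for k in range(P):
                S_ik = sum(abs(X[i][k] - X[j][k]) for j in knn) / K
                e.append(math.exp(-S_ik / lam))
            tot = sum(e) or 1.0
            W[i] = [v / tot for v in e]
        eta += 0.1 * lam   # homotopy schedule: eta = lam + #oit * 0.1 * lam
    return W, D
```

On a tiny data set with two clusters that differ only on the first attribute, the weights concentrate on that attribute and the within-cluster dissimilarities become much smaller than the between-cluster ones.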

TARGETING
Until now, the COSA clustering could be based on any possible joint values on subsets of attributes. Alternatively, we might wish to look for clusters that group only on particular values, say t_k, which are possibly different for each attribute k. The {t_k} are chosen to be of special interest; we reduce the search space, and hope to be more likely to recover clusters. Examples are groups of consumers (objects) that spend relatively large amounts on products (attributes), while we wish to ignore consumers who spend relatively small or average amounts (or the other way around). If we focus on one particular value, we call this single targeting. We modify the original distance between objects o_i and o_j on attribute k, d_ijk = d_k(x_ik, x_jk), into a targeted distance, and require objects o_i and o_j to be close to each other and to the particular target.
The so-called single target distance is defined as

    d_ijk = max[ d_k(x_ik, t_k), d_k(x_jk, t_k) ],   (15)

where t_k is the target value, e.g., a high, low, or even average value. This distance is small only if both objects o_i and o_j are close to the target value t_k on attribute k. In addition to single targeting, we can also focus on two different targets, naturally either high or low values. An example is micro-array data, where we could search for clusters of samples with either high or low (but not moderate) expression levels on subsets of genes (attributes). In dual targeting, we define two targets t_k and u_k, and we use the dual target distance

    d_ijk = min{ max[ d_k(x_ik, t_k), d_k(x_jk, t_k) ], max[ d_k(x_ik, u_k), d_k(x_jk, u_k) ] }
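Both target distances are simple compositions of max and min; a Python sketch, assuming d_k(a, b) = |a - b|/s_k:

```python
def single_target_dist(x_ik, x_jk, t_k, s_k=1.0):
    """Single target distance (15): small only when BOTH values are close
    to the target t_k."""
    return max(abs(x_ik - t_k), abs(x_jk - t_k)) / s_k

def dual_target_dist(x_ik, x_jk, t_k, u_k, s_k=1.0):
    """Dual target distance: small when both values are close to t_k OR
    both are close to u_k (e.g. t_k high, u_k low)."""
    return min(single_target_dist(x_ik, x_jk, t_k, s_k),
               single_target_dist(x_ik, x_jk, u_k, s_k))
```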

INSTALLING AND USING COSA
In the sequel of this paper, we present the new version of COSA, implemented as a package for the statistical computing language R (R Core Team 2014). Compared to the old software, the current installation is much simpler, and it is extended with functions for multidimensional scaling, graphics, and M-groups clustering methods. For every function in rCOSA there is a help file with example code that users can run. The software is available for up-to-date Windows, Mac OS X and Linux platforms. The rCOSA package, the user manual (vignette) and related links are available for free on the web at: https://github.com/mkampert/rCOSA. To install rCOSA, run the following code in R (for example, using the devtools package):

> install.packages("devtools")
> devtools::install_github("mkampert/rCOSA")

USING COSA
We illustrate the rCOSA package using a data set based on the simulation model shown in Figure 2. The data set X (of size N × P, with N = 100 and P = 1000) contains two groups and background noise. The two groups share a subset of 15 attributes, and each group also has its own unique subset of 15 attributes. On its clustered attributes, each group was generated as x_ik ∼ N(µ = +1.5, σ = 0.2). All the remaining data in X were generated from a standard normal distribution. After creating the groups, the pooled sample was standardized to have zero mean and unit variance on all attributes. The data set thus contains two small groups that exhibit clustering on only a few partially overlapping attributes, together with a large non-clustered background.
Possible R code for simulating such a data set, and reproducing our tutorial results, is as follows:

> set.seed(123); N <- 100; P <- 1000
> X <- matrix(rnorm(N*P), nrow = N, ncol = P)
> X[1:15, 1:30] <- rnorm(15*30, mean = 1.5, sd = 0.2)   # group 1
> X[16:30, 16:45] <- rnorm(15*30, mean = 1.5, sd = 0.2) # group 2
> X <- scale(X)

(The last three lines, which insert the two clusters and standardize the pooled sample, are one possible way to implement the design of Figure 2.) We run COSA using its default settings, and store the result in the object cosa.rslt, in the following way:

> cosa.rslt <- cosa2(X)

On Linux and Mac OS X based operating systems, this will start the console to display the progress of the iterations. The first column indicates the change in the weights ∆W after each iteration (#it), defined as the sum of the absolute differences between the weights in W^(itrs) and the weights in the previous iteration W^(itrs-1). The #iit column gives the number of inner iterations, and the #oit column the number of outer iterations. The eta column shows the value of the homotopy parameter η, which starts low and is defined as η = λ + #oit × 0.1 × λ. Gradually increasing the homotopy parameter tries to avoid local minima for the criterion (which is of course not guaranteed). The Mean of the Squared Differences (MSD) column gives the average of the squared differences between the L1 distances and the inverse exponential distances. The last column gives the value of the criterion as displayed in equation (11). The R function str() shows the contents of the output object cosa.rslt.

Fitting dendrograms to COSA dissimilarities
To display the possible clustering structure contained in the COSA dissimilarities (cosa.rslt$D), we can first plot a dendrogram using the hierclust function. By default, the dendrogram is built using average linkage. Other options, such as 'single', 'complete', and 'ward' linkage, are available; the command for Ward clustering would be hierclust(cosa.rslt$D, method = 'ward') (Ward Jr 1963). To ensure that this dendrogram has a scale that is comparable with future dendrograms, the COSA dissimilarities are by default normalized to have a sum of squares equal to N. To plot a dendrogram, use

> hclst.cosa <- hierclust(cosa.rslt$D)
From the dendrogram produced by the hierclust command, we can clearly see that the grouping structure conforms to the design that was used. There are two groups (each with 15 objects) and a large remaining group whose objects are not similar to each other. We can select the observed clusters, and obtain the index numbers of the objects in each cluster, by using the getclust function:

> grps.cosa <- getclust(hclst.cosa)

This function reads the position of the pointer; with a click we can cut the tree at the vertical position of the pointer and draw a colored rectangle around the cluster. The index numbers of the objects in the corresponding groups are then stored in the object grps.cosa.
When finished, press 'Esc' or choose Stop from the options using the right-click of the mouse.
Figure 3 shows the two groups we selected. The content of grps.cosa can be inspected by printing it: the first line indicates whether an object belongs to a particular group, and if so, which group label is attached; if an object has not been allocated to a group, it gets a 0. The subsequent lines give the indices of the objects in the selected groups.

Fitting multidimensional scaling solutions to COSA dissimilarities
In addition to hierarchical clustering producing a dendrogram, we can also use the COSA dissimilarity matrix to display the objects in low-dimensional space by multidimensional scaling (MDS). This is done preferably by using an algorithm that minimizes a least squares loss function, usually called STRESS, defined on dissimilarities and distances. This loss function (in its raw, squared form) is written as

    STRESS(Z) = ||∆ - D(Z)||²,

where ||·||² denotes the squared Euclidean norm. Here, ∆ is the N × N COSA dissimilarity matrix with elements D_ij[W], and D(Z) is the Euclidean distance matrix derived from the N × p configuration matrix Z that contains the coordinates of the objects in a p-dimensional representation space. An example of an algorithm that minimizes such a metric least squares loss function is the so-called SMACOF algorithm. The original SMACOF (Scaling by Maximizing a Convex Function) algorithm is described in De Leeuw and Heiser (1982). Later, the meaning of the acronym was changed to Scaling by Majorizing a Complicated Function in Heiser (1995).
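Purely as an illustration of the Guttman-transform iterations used by SMACOF-type algorithms (this is not the package's smacof function), a minimal unweighted sketch in Python with numpy:

```python
import numpy as np

def smacof_sketch(Delta, p=2, n_iter=100, seed=0):
    """Toy SMACOF sketch: minimize raw STRESS ||Delta - D(Z)||^2 by
    repeated Guttman transforms, assuming equal weights for all
    dissimilarities."""
    N = Delta.shape[0]
    Z = np.random.default_rng(seed).standard_normal((N, p))
    for _ in range(n_iter):
        # Euclidean distances of the current configuration
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(D > 0, Delta / D, 0.0)
        B = -ratio
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))  # rows of B sum to zero
        Z = B @ Z / N   # Guttman transform: Z <- (1/N) B(Z) Z
    return Z
```

Each Guttman transform is guaranteed not to increase the STRESS, which is what the majorization ("SMACOF") argument establishes.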
The Classical Scaling approach, also known as Torgerson-Gower scaling (Young and Householder 1938; Torgerson 1952; Gower 1966), minimizes a loss function (called STRAIN in Meulman 1986) defined on scalar products (ZZ′) and not on distances D(Z); it is written as

    STRAIN(Z) = || -½ J ∆² J - ZZ′ ||²,

where J = I - N⁻¹11′ is a centering operator that is applied to the squared dissimilarities in ∆².
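Classical scaling has a direct spectral solution; a compact numpy sketch (again an illustration, not the rCOSA implementation):

```python
import numpy as np

def classical_mds(Delta, p=2):
    """Torgerson-Gower scaling: double-centre the squared dissimilarities,
    B = -0.5 * J @ Delta^2 @ J, and use the top-p eigenpairs of B as
    coordinates."""
    N = Delta.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N      # centering operator J = I - 11'/N
    B = -0.5 * J @ (Delta ** 2) @ J
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    vals, vecs = vals[::-1][:p], vecs[:, ::-1][:, :p]
    return vecs * np.sqrt(np.clip(vals, 0.0, None))
```

For dissimilarities that are exactly Euclidean in p dimensions, this projection reproduces the distances exactly; otherwise it is the linear projection whose drawbacks are discussed next.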
The drawback of minimizing the STRAIN loss function is that the resulting configuration Z is obtained by a projection of the objects into a low-dimensional space. Due to this projection, objects having distances that are large in the data may be displayed close together in the representation space, giving a false impression of similarity. By contrast, a least squares metric MDS approach (such as SMACOF) gives a nonlinear mapping instead of a linear projection, and will usually preserve large distances in low-dimensional space. See Meulman (1986, 1992) for more details. In the following MDS applications, we will display objects in two-dimensional space, showing both the classical solution and the least squares solution, in Figure 4 and Figure 5, respectively.
For the argument groups in the smacof function, we can use grps.cosa obtained from getclust to give different colors to the points in the two groups; we thus obtain Figure 5. We observe that the large cloud of gray points, representing objects that are not similar to any of the other objects, seems to form a cluster as well; this is undesirable, since they are noise objects. Their closeness is due to the linear projection characteristic of classical MDS. Therefore, the representation given by the smacof function, shown in Figure 6, is to be preferred, since it shows that the noise objects are not closely related. At this point, we should address the possibility of an additive constant being present in the standard dissimilarity output of COSA in cosa.rslt$D. We can account for such a constant by fitting an interval transformation to the COSA dissimilarities, taking care that the transformed dissimilarities d̂_ij do not become negative. We do this by using the intercept option (by default set to 1) in the smacof function:

> smacof.rslt <- smacof(cosa.rslt$D, groupnr = grps.cosa$grps, interc = 1)

When we iteratively minimize this least squares loss function over the d̂_ij and Z, we obtain the representation of the object points in Figure 6.

Attribute Importance
After having found clusters of objects in the data, we wish to know which attributes are important for the different clusters. The importance I_kl of attribute k for cluster l (C_l) is inversely proportional to the dispersion S_kl of the data on attribute k for the objects in cluster C_l of size N_l, and is defined as

    I_kl = 1 / S_kl.

If the dispersion of the data on an attribute is small for a particular group of objects, then the attribute is important for that particular group. Because the importance value is inversely proportional to the within-group dispersion, it is biased towards variables with small within-group variability, and not towards large between-group separation.
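A sketch of the importance measure and the resampling check described next (here the dispersion S_kl is taken to be the within-cluster standard deviation, which is an assumption; the attimp function may define dispersion differently):

```python
import random
import statistics

def importance(X, cluster, k):
    """I_kl = 1 / S_kl: inverse of the dispersion of attribute k within
    cluster l (list of row indices)."""
    values = [X[i][k] for i in cluster]
    s = statistics.pstdev(values)
    return 1.0 / s if s > 0 else float("inf")

def null_importances(X, n_l, k, times=10, seed=1):
    """Importance of attribute k for random pseudo-clusters of size n_l,
    used as a reference distribution for 'importance by chance'."""
    rng = random.Random(seed)
    return [importance(X, rng.sample(range(len(X)), n_l), k)
            for _ in range(times)]
```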
To see whether the value of a particular attribute importance is higher than could be expected by chance, a simple resampling method can be used. First, to determine how many attributes are important for a particular cluster, e.g., cluster l of size N_l, we execute the commands

> attimp1.cosa <- attimp(X, group = grps.cosa$index$grp1, range = 1:1000)
> str(attimp1.cosa)

The indices of the ordered attributes are given in attimp1.cosa$att, and the corresponding descending attribute importance values in attimp1.cosa$imp. To get a complete overview of the attribute importances, use the attimp function for the other groups as well.

> par(mfrow = c(1,1))
Note the differences in scale on the vertical axes for the groups in Figure 9. Having a good overview of the number of important attributes per group, we can obtain the maximum of the importance values and select the number of attributes that should be inspected according to their importance. Based on the above overview, we would select the first 50 attributes. Next, execute attimp again, now with

> attimp1.cosa <- attimp(X, ylim = c(0,7), group = grps.cosa$index$grp1,
+                        range = 1:50, times = 10, main = "Group 1 (Red)")

By using these options, attimp will plot the 50 highest attribute importance values for cluster l, and will also take a random sample of size N_l from the data and compute the attribute importance values for the first 50 ordered attributes on the basis of this random group. This is repeated 10 times. Also, note that we know the maximum of the importance values at this point, so we can set the limits of the vertical axes equal to each other for each group.
> lmts <- range(cosa.rslt$W[, k[1:50]])

In Figure 9, the black line indicates the attribute importance values of the attributes for each cluster.
The green lines are the attribute importance lines for groups of the same size, randomly sampled from the data. The red line is the average of the green lines. Thus, the larger the difference between the black line and the red line, the more evidence that the attribute importance values are not just due to chance. Note the sudden drop of the black attribute importance line after 30 attributes; this is in line with the simulated data, in which each group clustered on only 30 attributes. There are no attributes that can be considered important for the remaining objects.
It is clear that the COSA weights display the same structure as was found for the attribute importances: group 1 has large weights for attributes 1-30, group 2 has large weights for attributes 16-45, groups 1 and 2 both have large weights on the overlapping attributes 16-30, and all weights for the remaining objects are small. COSA clearly separates the signal from the noise in our data.
Although the structure in the data was especially designed to demonstrate COSA, it is not particularly complicated. Nevertheless, very common approaches in cluster analysis, such as hierarchical clustering of either squared Euclidean distances or L1 distances, are not able to cope with it. This is also true for the more sophisticated SPARCL approach. Results are shown in Figure 11, where we give the dendrograms obtained for the COSA dissimilarities, the L1 distances, the squared Euclidean distances, and the SPARCL dissimilarities, as defined in Equations (10), (3), (4), and (5), respectively. To obtain the COSA and the SPARCL dissimilarities, we used the default settings, which amounts to weighted L1 distances in COSA and weighted squared Euclidean dissimilarities in SPARCL.


ANALYSIS OF THE LEIDEN APOE3 DATA
The data in the following example are from an experiment with two types of mice: normal mice (called 'wildtype') and transgenic mice.The latter type contains the Human Leiden ApoE3 variety.The biological background is briefly summarized as follows.ApoE3 stands for Apolipoprotein E3; it is one of many apolipoproteins that, together with lipids, form lipoproteins (cholesterol particles), for example, LDL, VLDL, and HDL.The E3 "Leiden" is a human variant of ApoE3.
When the lipoprotein is no longer recognized by special receptors in the liver, uptake of LDL cholesterol by the liver is prevented, and this results in strongly increased lipoprotein levels in the plasma. Eventually, the latter condition results in atherosclerosis, which is hardening of the arteries. This may lead to blocked blood vessels and a stroke or a heart attack. The experiment has two important features. Mice of this type would usually develop severe atherosclerosis when on a high-fat diet; in the current experiment, however, the mice were on a low-fat diet. Also, atherosclerosis would normally be manifest after 20 weeks, but the samples were collected when the mice were only 9 weeks of age.

Data Description
The 1550 attributes in the study are LC-MS (liquid chromatography-mass spectrometry) measurements of plasma lipids.The objects consist of 38 cases, with two observations for each mouse.The original experiment was performed with 10 wildtype and 10 transgenic mice, but only 9 transgenic mice survived the experiment (Damian, Oresics, Verheij, Meulman, Friedman, Adourian, Morel, Smilde and van der Greef 2007).

COSA analysis
The COSA analysis consists of first computing the dissimilarity matrix based on the COSA weights, and then subjecting this matrix to hierarchical clustering (using hierclust) and multidimensional scaling (using smacof), resulting in a dendrogram and a two-dimensional configuration, respectively (shown in Figure 12). We used the average linkage option (the default) for the hierarchical cluster analysis, but this choice was not essential for the separation between the transgenic and the wildtype mice, which is perfect.
Again, we use the attimp() function to inspect the importance values for the variables in each of the two clusters; in this way, we can determine which variables are more important than can be attributed to chance. We also perform this test for the transgenic cluster. Here, the values for the 85 most important variables (out of 1550) are displayed. The black curve gives the observed importance values, the ten green curves are for the randomly generated samples, and the red curve is again the average of the ten green curves. The difference between the importance values for the wildtype cluster and those for the 10 random groups is large; about 60 attributes appear to be important for the clustering of the wildtype group. The importance values for the transgenic cluster are somewhat less distinct. It is clear, however, that not more than 100 variables are truly important for the clustering of the transgenic mice.
We obtain boxplots (Figure 14) for the weights of the first 85 attributes, ordered from most to least important within each group, for the wildtype mice and for the transgenic mice separately.
When we take a look at the boxplots of the attribute weights within each group, we can conclude that the medians of the weights of the ordered attributes in the wildtype group are much more distinct compared to those in the transgenic group.
In Figures 15 and 16, we inspect the attribute values for the 100 most and least important attributes. Values for the wildtype group are ordered according to attribute importance, and are contrasted with the corresponding attribute median values for the transgenic group, and vice versa. Those with some extended R programming skills can use the output from the cosa2 function for further analyses. Analysis of the COSA dissimilarity matrix is not limited to the hierarchical clustering and multidimensional scaling presented in this tutorial. Other linear or non-linear projection methods that use dissimilarity matrices, such as self-organizing maps (Kohonen 2001), Sammon's mapping (Sammon 1969), or curvilinear distance analysis (Lee, Lendasse and Verleysen 2004), may also be considered. Compositional data analysis (Aitchison 1986) of the COSA weight matrix may also lead to additional insights into the cluster structure of the objects.
In Figure 2, the groups of objects cluster on their own subsets of attributes. The first group (with objects 1-15) clusters on attributes 1-30, and the second group (with objects 16-30) clusters on attributes 16-45. The two groups are thus similar with respect to attributes 16-30, and different with respect to attributes 1-15 and 31-45, respectively. The two subsets of attributes, 1-30 and 16-45, are partially overlapping. The remaining 70 objects in the data form an unclusterable background (noise), and the remaining 955 attributes do not contain any clusters at all. The data structure displayed in Figure 2 is a typical example of what COSA was designed for. In such a situation, heuristic, greedy cluster algorithms are very likely to converge to suboptimal solutions. To avoid such solutions as much as possible, we need a clever search strategy, together with a good starting point. The latter is crucial: when we start the search with equal weighting of attributes, in combination with a usual definition of 'closeness' such as the Euclidean or L1 distance, our search will almost surely end up in a distinctly suboptimal local minimum.

Figure 2: A Monte Carlo model for 100 objects with 1,000 attributes (not all are shown since P >> N). There are two small 15-object groups (red and blue), each clustering on 30 out of the 1,000 attributes, with partial overlap, nested within an unclustered background of 70 objects (gray).
The dual target distance is the minimum of the two single target distances (15) on the selected attribute x_k. It is small whenever x_ik and x_jk are either both close to t_k or both close to u_k. Thus, in the gene expression and consumer spending examples, one might set t_k and u_k to values near the maximum and minimum data values of the attribute, respectively; COSA will then seek clusters based on extreme attribute values, ignoring (perhaps dominant) clusters with moderate attribute values.

Figure 3: Selecting two groups from the dendrogram of the COSA dissimilarities using the function getclust.

Figure 6: Multidimensional scaling solution and dendrogram, after eliminating an additive constant.

Figure 8: Display of the attribute importances of group 1, group 2, and the remaining objects in barplots.

Figure 11: Dendrograms obtained from hierarchical clustering for six different dissimilarity matrices derived from the simulated data in Figure 2: L1 distances in the first row, squared Euclidean distances in the second row; unweighted dissimilarities in the first column, SPARCL dissimilarities in the second column, COSA dissimilarities in the third column.

Figure 12: Hierarchical cluster analysis and smacof solution of the COSA dissimilarities of the ApoE3 Leiden data. Wildtype mice are in red, and transgenic mice are in blue. Average linkage was used in the clustering.

Figure 13: The two black curves in the upper and lower graphs display the 85 largest (out of 1550) importance values for the group of transgenic mice 1-18 in the ApoE3 Leiden data (top) and for the wildtype mice (bottom). In each graph, the ten green curves indicate the 85 largest importance values for ten random groups of size 18 and 20, respectively. The two red curves are the averages of each set of ten green curves.

Figure 14: Boxplots for the weights of the median ordered attributes for the wildtype mice (top) and for the transgenic mice (bottom).

Figure 15: In the top panel, the values of the 100 most important attributes for the wildtype group are shown in red boxplots; in blue, the median values of the transgenic group are added for these attributes. In the bottom panel, the values of the 100 most important attributes for the transgenic group are shown in blue boxplots; in red, the median values of the wildtype group are added for these attributes.

Figure 16: In the top panel, the values of the 100 least important attributes for the wildtype group are shown in red boxplots; in blue, the median values of the transgenic group are added for these attributes. In the bottom panel, the values of the 100 least important attributes for the transgenic group are shown in blue boxplots; in red, the median values of the wildtype group are added for these attributes.