Sparse Component Analysis: a New Tool for Data Mining


In many practical data mining problems, the data X under consideration (given as an m × N matrix) is of the form X = AS, where the matrices A and S, with dimensions m × n and n × N respectively (often called the mixing matrix or dictionary and the source matrix), are unknown (m ≤ n < N). We formulate conditions (SCA-conditions) under which we can recover A and S uniquely (up to scaling and permutation), such that S is sparse in the sense that each column of S has at least one zero element. We call this the Sparse Component Analysis (SCA) problem. We present new algorithms for identification of the mixing matrix (under SCA-conditions) and for source recovery (under identifiability conditions). The methods are illustrated with examples showing good performance of the algorithms. Typical examples are EEG and fMRI data sets, in which the SCA algorithm allows us to detect some features of the brain signals. Special attention is given to the application of our method to the transposed system X^T = S^T A^T, utilizing the sparseness of the mixing matrix A in appropriate situations. We note that the sparseness conditions can be obtained with some preprocessing methods, and that no independence conditions are imposed on the source signals (in contrast to Independent Component Analysis). We applied our method to fMRI data sets with dimension 128 × 128 × 98 and to EEG data sets from a 256-channel EEG machine.
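The model X = AS with sparse columns in S can be illustrated with a small numerical sketch. The code below is not the authors' algorithm: it assumes the mixing matrix A is already known, and it assumes a stronger sparsity level than the abstract states, namely that each column of S has at most k = m − 1 nonzero entries. Under those assumptions, each data column lies in the span of k columns of A, so a brute-force search over supports recovers the sources. The function name `recover_sources` and all parameter values are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np


def recover_sources(X, A, k):
    """Brute-force sparse recovery with a known mixing matrix A (illustrative only).

    For every column x of X, try each set of k columns of A, solve a
    least-squares problem on that set, and keep the set with the smallest
    residual. Assumes each true source column has at most k nonzeros.
    """
    n = A.shape[1]
    num_samples = X.shape[1]
    S = np.zeros((n, num_samples))
    supports = list(itertools.combinations(range(n), k))
    for j in range(num_samples):
        x = X[:, j]
        best_res, best_idx, best_coef = np.inf, None, None
        for idx in supports:
            sub = A[:, list(idx)]                       # m x k submatrix
            coef, *_ = np.linalg.lstsq(sub, x, rcond=None)
            res = np.linalg.norm(sub @ coef - x)        # distance of x to span(sub)
            if res < best_res:
                best_res, best_idx, best_coef = res, idx, coef
        S[list(best_idx), j] = best_coef
    return S


# Synthetic example: m = 3 mixtures, n = 4 sources, N = 50 samples (m <= n < N).
rng = np.random.default_rng(0)
m, n, N = 3, 4, 50
k = m - 1                                               # assumed sparsity level
A = rng.standard_normal((m, n))                         # generic mixing matrix
S_true = np.zeros((n, N))
for j in range(N):
    support = rng.choice(n, size=k, replace=False)      # random support of size k
    S_true[support, j] = rng.standard_normal(k)
X = A @ S_true                                          # observed data

S_hat = recover_sources(X, A, k)
max_error = np.max(np.abs(S_hat - S_true))
```

The search cost grows combinatorially with n and k, so this sketch is only practical for small problems. It also sidesteps the harder half of the problem described above, identifying A from X alone.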