Exploratory Data Mining is the process of using data mining methods to gain novel insights into data without having a specific goal in mind. To convey large amounts of complex information, it is a logical choice to present this information visually, as the information bandwidth of the eye is much larger than that of the other senses, and humans excel at spotting visual patterns. Surprisingly, visual interactive data mining tools are still rare.
The few tools that exist are designed for specific problems and domains (e.g., itemset and subgroup discovery [1, 4, 7], information retrieval, or analysis of networks), aim to present information that aligns with the user’s beliefs (e.g., semi-supervised PCA), or both. However, users are typically interested in finding structures in the data that contrast with their current knowledge.
In this paper, we present a generic tool, SIDE (see footnote 1), that enables users to efficiently explore data via a sequence of 2D scatter plots, i.e., projections. SIDE models the user’s beliefs about the data by iteratively incorporating their feedback, which is then used to compute an updated data projection. Each iteration consists of three steps (see Fig. 1). In step 1, SIDE presents the user with a ‘surprising’ data projection. In step 2, the user provides feedback about the projection. Finally, in step 3, the background model is updated to reflect the user’s current belief state. The process then repeats from step 1, showing a data projection that takes the updated background model into account.
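To make the three-step loop concrete, the following is a minimal sketch under strong simplifying assumptions, not the model used by the tool. It assumes the background model is a zero-mean Gaussian described only by a covariance matrix, that a projection is ‘surprising’ when the data variance along it differs most from what the model expects, and that the user's feedback in step 2 is simulated as simply accepting the variance seen along the two displayed axes. All function names (`surprising_projection`, `update_background`, `explore`) are hypothetical.

```python
import numpy as np

def surprising_projection(X, cov):
    """Step 1 (assumed criterion): whiten the data with the background
    covariance and pick the two directions whose variance deviates most
    from the expected value of 1."""
    L = np.linalg.cholesky(cov)
    Z = np.linalg.solve(L, X.T).T                     # whitened data
    evals, evecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    surprise = np.abs(np.log(evals))                  # 0 means "as expected"
    top = np.argsort(surprise)[::-1][:2]
    W = evecs[:, top]
    return Z @ W, W, evals[top], surprise[top]

def update_background(cov, W, variances):
    """Step 3 (assumed update): make the model's variance along the shown
    directions equal to the variance the user has just seen."""
    L = np.linalg.cholesky(cov)
    M = np.eye(cov.shape[0]) + (W * (variances - 1.0)) @ W.T
    return L @ M @ L.T

def explore(X, n_iterations):
    X = X - X.mean(axis=0)
    cov = np.eye(X.shape[1])      # initial belief: unit variance, uncorrelated
    history = []
    for _ in range(n_iterations):
        points_2d, W, variances, surprise = surprising_projection(X, cov)
        history.append(float(surprise.max()))
        # Step 2 is simulated here: the "user" accepts what the 2D
        # scatter plot (points_2d) shows along both axes.
        cov = update_background(cov, W, variances)
    return cov, history

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4)) * np.array([5.0, 3.0, 1.0, 0.2])
    cov, history = explore(X, n_iterations=3)
    print(history)  # the maximum surprise shrinks as beliefs are updated
```

In this sketch, each iteration removes the shown structure from what counts as surprising, so the next projection is forced to reveal something the simulated user has not yet seen. Real feedback (for example, marking a cluster of points) would require a richer background model than a single covariance matrix.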