A Visual and Interactive Data Exploration Method for Large Data Sets and Clustering

  • David Da Costa
  • Gilles Venturini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4632)


We present in this paper a new method for the visual exploration of large data sets with up to one million objects. We highlight some limitations of existing visual methods in this context. Our approach builds on earlier systems such as Vibe, Sqwid and Radviz, which have been used in information retrieval: several data items, called points of interest (POIs), are placed on a circle, and the remaining (large) set of data is displayed inside the circle at locations that depend on the similarity between each item and the POIs. Several user interactions are possible and ease the exploration of the data. We highlight the visual and computational properties of this representation: it displays the similarities between data in linear time, and it allows the user to explore the data set and obtain useful information. We show how it can be applied to standard ’small’ databases, both benchmarks and real-world data. We then provide results on several large data sets, real or artificial, with up to one million items, and describe both the successes and the limits of our method.
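To make the circle-based layout concrete, here is a minimal sketch in the spirit of Vibe/Radviz. It assumes that each item is drawn at the similarity-weighted barycenter of the POI anchors. It also assumes a hypothetical similarity `1 / (1 + Euclidean distance)`. Neither choice is taken from the paper, so this is not necessarily the authors' exact formula. The names `poi_positions`, `similarity` and `project` are illustrative only. The sketch shows why the cost is linear in the number of items: each item needs only k similarity evaluations, one per POI.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def poi_positions(k: int, radius: float = 1.0) -> List[Tuple[float, float]]:
    """Place k points of interest evenly on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / k),
             radius * math.sin(2 * math.pi * i / k)) for i in range(k)]


def similarity(a: Vector, b: Vector) -> float:
    """Hypothetical similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def project(data: List[Vector], pois: List[Vector],
            radius: float = 1.0) -> List[Tuple[float, float]]:
    """Map each item to the similarity-weighted barycenter of the POI anchors.

    Assumes at least one POI. One pass over the data with k similarity
    evaluations per item, so the cost is O(n * k): linear in n for fixed k.
    """
    anchors = poi_positions(len(pois), radius)
    out = []
    for d in data:
        w = [similarity(d, p) for p in pois]
        total = sum(w)  # strictly positive, since every similarity is > 0
        x = sum(wi * ax for wi, (ax, _) in zip(w, anchors)) / total
        y = sum(wi * ay for wi, (_, ay) in zip(w, anchors)) / total
        out.append((x, y))
    return out


# Usage: three POIs in a 2-D feature space and a few items to place.
pois = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(0.0, 0.0), (5.0, 5.0), (3.0, -2.0)]
layout = project(data, pois)
```

All weights are positive and normalized, so every item lands in the convex hull of the anchors, that is, inside the circle. An item equally similar to all POIs falls at the center. Moving or replacing a POI only changes `pois`, and re-running `project` recomputes the whole layout in one linear pass.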


Keywords (added by machine, not by the authors): Information Retrieval · Information Visualization · Interactive Cluster · Visual Data Mining · Graphic Request

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • David Da Costa (1, 2)
  • Gilles Venturini (1)
  1. Laboratoire d’Informatique de l’Université de Tours, France
  2. Cohesium, France