Using visualization to support data mining of large existing databases

  • Daniel A. Keim
  • Hans-Peter Kriegel
Papers: Interaction, User Interfaces and Presentation
Part of the Lecture Notes in Computer Science book series (LNCS, volume 871)

Abstract

In this paper, we present ideas on how visualization technology can be used to improve the difficult process of querying very large databases. With our VisDB system, we try to provide visual support not only for the query specification process, but also for evaluating query results and, thereafter, refining the query accordingly. The main idea of our system is to represent as many data items as possible by the pixels of the display device. Because the pixels are arranged and colored according to their relevance for the query, the user gets a visual impression of the resulting data set and of its relevance for the query. Using an interactive query interface, the user may change the query dynamically and receives immediate feedback through the visual representation of the resulting data set. Because multiple windows are used for different parts of the query, the user gets visual feedback for each part of the query and may therefore understand the overall result more easily. To support complex queries, we introduce the notion of ‘approximate joins’, which allow the user to find data items that only approximately fulfill join conditions. We also present ideas on how our technique may be extended to support the interoperation of heterogeneous databases. Finally, we discuss the performance problems that are caused by interfacing to existing database systems and present ideas for solving these problems by using data structures that support a multidimensional search of the database.
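
The core idea, one pixel per data item with position and color driven by relevance, can be illustrated with a minimal sketch. The abstract does not specify the distance function or the pixel arrangement, so everything here is an assumption chosen for illustration: relevance is taken as the distance of a single numeric attribute from a query range, and items are placed along a rectangular spiral so that the most relevant ones sit at the center. All function names are hypothetical and are not taken from VisDB.

```python
from typing import Dict, List, Tuple


def relevance_distances(values: List[float], lo: float, hi: float) -> List[float]:
    """Distance of each value from the query range [lo, hi] (assumed metric).

    A distance of 0 means the item fulfills the query exactly.
    """
    return [max(lo - v, 0.0, v - hi) for v in values]


def spiral_coords(n: int) -> List[Tuple[int, int]]:
    """Return the first n grid cells of a rectangular spiral starting at (0, 0)."""
    coords: List[Tuple[int, int]] = []
    x = y = 0
    dx, dy = 1, 0
    step_len = 1
    while len(coords) < n:
        for _ in range(2):  # two legs are walked for each leg length
            for _ in range(step_len):
                if len(coords) == n:
                    return coords
                coords.append((x, y))
                x, y = x + dx, y + dy
            dx, dy = -dy, dx  # turn by 90 degrees
        step_len += 1
    return coords


def arrange_by_relevance(
    values: List[float], lo: float, hi: float
) -> Dict[Tuple[int, int], Tuple[int, float]]:
    """Map each pixel cell to (item index, relevance in [0, 1]).

    Items are sorted by distance, so the best matches land at the spiral center.
    The relevance value could then be fed into any color scale.
    """
    dists = relevance_distances(values, lo, hi)
    order = sorted(range(len(values)), key=lambda i: dists[i])
    max_d = max(dists, default=0.0) or 1.0  # avoid division by zero
    cells = spiral_coords(len(values))
    return {
        cells[rank]: (i, 1.0 - dists[i] / max_d)
        for rank, i in enumerate(order)
    }


if __name__ == "__main__":
    layout = arrange_by_relevance([5, 12, 20, 7, 0], lo=5, hi=10)
    for cell, (item, relevance) in layout.items():
        print(cell, item, round(relevance, 2))
```

In this sketch, changing `lo` or `hi` and recomputing the layout plays the role of the interactive query refinement described above; a real system would have to combine distances over several attributes and handle far larger data sets.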

Keywords

Visualizing Large Data Sets; Visualizing Multidimensional Multivariate Data; Data Mining; Visual Query Systems; Visual Relevance Feedback; Interfaces to Database Systems

Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • Daniel A. Keim (1)
  • Hans-Peter Kriegel (1)
  1. Institute for Computer Science, University of Munich, Munich, Germany
