Eureka!: A Tool for Interactive Knowledge Discovery

  • Giuseppe Manco
  • Clara Pizzuti
  • Domenico Talia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2453)

Abstract

In this paper we describe an interactive, visual knowledge discovery tool for analyzing numerical data sets. The tool combines a visual clustering method, used to hypothesize meaningful structures in the data, with a classification algorithm from machine learning, used to validate the hypothesized structures. A two-dimensional representation of the available data allows the user to partition the search space by shape or density, according to whatever criteria the user deems optimal. A partition can be composed of regions of arbitrary form, not necessarily spherical. The accuracy of the clustering results can be validated with a decision tree classifier included in the mining tool.
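The pipeline outlined in the abstract (two-dimensional representation, user-chosen partition, classifier-based validation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the tool's actual implementation: the projection uses the leading singular directions (suggested only by the "Singular Value Decomposition" keyword), the user's visual selection is replaced by a hypothetical fixed rule, the data are synthetic, and a depth-one decision tree (a stump) stands in for the decision tree classifier.

```python
import random

def center(rows):
    """Subtract each column's mean so the projection is taken about the centroid."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def top_directions(rows, k=2, iters=200):
    """Approximate the k leading right singular vectors of the centered data
    by power iteration on X^T X, deflating directions already found."""
    d = len(rows[0])
    scatter = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    rng = random.Random(0)
    dirs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(scatter[i][j] * v[j] for j in range(d)) for i in range(d)]
            for u in dirs:  # remove components along earlier directions
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = sum(a * a for a in w) ** 0.5
            v = [a / norm for a in w]
        dirs.append(v)
    return dirs

def fit_stump(X, y):
    """Depth-one decision tree: the single feature/threshold split with fewest errors."""
    best = None
    for j in range(len(X[0])):
        vals = sorted({x[j] for x in X})
        for low, high in zip(vals, vals[1:]):
            t = (low + high) / 2  # candidate threshold between neighbouring values
            for left in (0, 1):
                errs = sum((left if x[j] <= t else 1 - left) != c
                           for x, c in zip(X, y))
                if best is None or errs < best[0]:
                    best = (errs, j, t, left)
    return best[1:]

def predict_stump(stump, x):
    """Return the class on the side of the threshold where x falls."""
    j, t, left = stump
    return left if x[j] <= t else 1 - left

# Synthetic 3-D data: two well-separated groups (illustrative only).
rng = random.Random(42)
data = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(30)]
data += [[rng.gauss(5, 0.5), rng.gauss(5, 0.5), rng.gauss(0, 0.5)] for _ in range(30)]

# Step 1: two-dimensional representation of the data.
centered = center(data)
dirs = top_directions(centered)
points_2d = [[sum(a * b for a, b in zip(r, u)) for u in dirs] for r in centered]

# Step 2: stand-in for the user's visual selection (hypothetical rule):
# everything to the right of the vertical axis forms one cluster.
labels = [1 if p[0] > 0 else 0 for p in points_2d]

# Step 3: validate the hypothesized clusters with a classifier trained on
# the original features and scored on held-out points.
train_X, train_y = data[0::2], labels[0::2]
test_X, test_y = data[1::2], labels[1::2]
stump = fit_stump(train_X, train_y)
accuracy = sum(predict_stump(stump, x) == c for x, c in zip(test_X, test_y)) / len(test_y)
print("held-out accuracy of the hypothesized partition:", accuracy)
```

High held-out accuracy suggests that the regions chosen in the two-dimensional view correspond to structure a classifier can recover from the original attributes; low accuracy suggests the partition should be revised. In an interactive setting, the rule in step 2 would be replaced by regions the user draws on the plot.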

Keywords

Data Mining; Singular Value Decomposition; Knowledge Discovery; Cluster Result; Cluster Scheme

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Giuseppe Manco (1)
  • Clara Pizzuti (1)
  • Domenico Talia (2)
  1. ISI-CNR, c/o DEIS, Università della Calabria, Rende (CS), Italy
  2. DEIS, Università della Calabria, Rende (CS), Italy
