DEXA 2002: Database and Expert Systems Applications pp 381-391
Eureka!: A Tool for Interactive Knowledge Discovery
Abstract
In this paper we describe an interactive, visual knowledge discovery tool for analyzing numerical data sets. The tool combines a visual clustering method, used to hypothesize meaningful structures in the data, with a classification algorithm, used to validate the hypothesized structures. A two-dimensional representation of the available data lets the user partition the search space by shape or density, according to whatever criteria the user deems optimal. A partition can be composed of regions of arbitrary shape, not necessarily spherical. The accuracy of the clustering results can be validated with a decision tree classifier included in the tool.
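The abstract does not say how the two-dimensional representation is computed. Since the keyword list mentions singular value decomposition, the following minimal sketch assumes the data is centered and projected onto its two leading singular directions. The function name `project_to_2d` and the toy data are hypothetical and only illustrate the idea; they are not taken from the tool itself.

```python
import numpy as np


def project_to_2d(data):
    """Project the rows of `data` onto the two leading singular directions.

    Hypothetical helper: an assumed way to obtain a 2-D view of a
    numerical data set, not the tool's actual implementation.
    """
    X = np.asarray(data, dtype=float)
    # Center each column so the projection reflects spread, not the mean offset.
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(s) @ Vt, with singular values in descending order.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Coordinates of every row in the plane spanned by the first two
    # right singular vectors.
    coords = Xc @ Vt[:2].T
    return coords, s


# Hypothetical toy data: two well-separated groups in four dimensions.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 4))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 4))
data = np.vstack([group_a, group_b])

coords, singular_values = project_to_2d(data)
```

Plotting `coords[:, 0]` against `coords[:, 1]` would give the kind of scatter view in which a user could outline regions by eye. The sign of each singular direction is arbitrary, so the picture may appear mirrored from one run or library to another.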
Keywords
Data Mining, Singular Value Decomposition, Knowledge Discovery, Cluster Result, Cluster Scheme