A procedure to compute prototypes for data mining in non-structured domains
This paper describes a technique for associating a set of symbols with an event in the context of knowledge discovery in databases, or data mining. The set of symbols is related to the keywords in a database, which is used as an implicit knowledge source. The aim of this approach is to discover the significant keyword groups that best represent the event. A significant contribution of this work is a procedure that obtains the representative prototype of a group of symbolic data. It can be used both for unsupervised learning, to describe classes, and for supervised learning, to compute prototypes. The procedure involves defining an objective function and then a hypothesis-exploring system, which yields a procedure with an advantageous computational cost.
Key words: learning, data mining, knowledge discovery, symbolic clustering
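The abstract does not state the objective function or the search strategy, so the following is only an illustrative sketch of what computing a prototype for a group of symbolic data can look like. It assumes each record is a set of keywords and takes, as a stand-in objective, the sum of symmetric-difference sizes between a candidate prototype and every record. The names `cost`, `majority_prototype`, and `exhaustive_prototype` are hypothetical and do not come from the paper. Under this assumed objective, keeping the keywords that occur in more than half of the records reaches the minimum cost without enumerating every candidate keyword set.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set


def cost(prototype: Set[str], records: List[Set[str]]) -> int:
    """Assumed objective: total symmetric-difference size to all records."""
    return sum(len(prototype ^ record) for record in records)


def majority_prototype(records: List[Set[str]]) -> Set[str]:
    """Keep each keyword that appears in more than half of the records.

    Each keyword contributes to the assumed cost independently, so this
    per-keyword majority rule attains the minimum of `cost`.
    """
    counts = Counter(keyword for record in records for keyword in record)
    half = len(records) / 2
    return {keyword for keyword, n in counts.items() if n > half}


def exhaustive_prototype(records: List[Set[str]]) -> Set[str]:
    """Baseline for comparison: try every subset of the vocabulary.

    Exponential in the vocabulary size; usable only on tiny examples.
    """
    vocabulary = sorted(set().union(*records))
    candidates = (
        set(subset)
        for size in range(len(vocabulary) + 1)
        for subset in combinations(vocabulary, size)
    )
    return min(candidates, key=lambda p: cost(p, records))


if __name__ == "__main__":
    # Toy keyword sets, invented for illustration only.
    records = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
    proto = majority_prototype(records)
    print(sorted(proto), cost(proto, records))
    print(cost(exhaustive_prototype(records), records))
```

On the toy records, both functions reach the same minimum cost. The majority rule does so in a single pass over the data, while the exhaustive baseline examines all 16 subsets of the four keywords.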