Abstract
To prepare and model data successfully, the data miner needs to be aware of the properties of the data manifold. This paper outlines a tool that automatically generates data survey reports for this purpose. The report combines linguistic descriptions (rules) and statistical measures with visualizations. Together these provide both quantitative and qualitative information and help the user form a mental model of the data. The main focus is on describing the cluster structure and the contents of the clusters. The data is clustered using a novel algorithm based on the Self-Organizing Map. The rules describing the clusters are selected using a significance measure based on the confidence in their characterizing and discriminating properties.
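The rule selection idea can be illustrated with a minimal sketch. The abstract does not give the significance formula, so the code below makes two loud assumptions: the characterizing confidence is taken as the fraction of a cluster's samples that satisfy a rule, the discriminating confidence as the fraction of rule-satisfying samples that belong to the cluster, and the two are combined by a simple product. The function names, the toy data, and the cluster labels (which here stand in for the output of the SOM-based clustering) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]          # hypothetical record type: variable name -> value
Rule = Callable[[Row], bool]    # a rule is a predicate over one record


def rule_confidences(rows: Sequence[Row], labels: Sequence[int],
                     cluster: int, rule: Rule) -> Tuple[float, float]:
    """Return (characterizing, discriminating) confidence of `rule` for `cluster`."""
    matches: List[bool] = [rule(r) for r in rows]
    in_cluster = sum(1 for lab in labels if lab == cluster)
    in_rule = sum(matches)
    both = sum(1 for m, lab in zip(matches, labels) if m and lab == cluster)
    # Assumed definitions, not taken from the paper:
    characterizing = both / in_cluster if in_cluster else 0.0  # share of cluster covered by rule
    discriminating = both / in_rule if in_rule else 0.0        # share of rule hits inside cluster
    return characterizing, discriminating


def significance(characterizing: float, discriminating: float) -> float:
    # Assumption: a plain product, so a rule scores high only when it
    # both covers the cluster and rarely fires outside it.
    return characterizing * discriminating


def best_rule(rows: Sequence[Row], labels: Sequence[int], cluster: int,
              rules: Dict[str, Rule]) -> Tuple[float, str]:
    """Return (score, rule name) of the highest-scoring candidate rule."""
    scored = [(significance(*rule_confidences(rows, labels, cluster, fn)), name)
              for name, fn in rules.items()]
    return max(scored)


# Toy data: one variable, two clusters (labels assumed to come from a prior clustering step).
rows = [{"x": 0.1}, {"x": 0.2}, {"x": 0.3}, {"x": 0.8}, {"x": 0.9}, {"x": 0.25}]
labels = [0, 0, 0, 1, 1, 1]
rules = {
    "x < 0.5": lambda r: r["x"] < 0.5,
    "x < 0.22": lambda r: r["x"] < 0.22,
}

print(best_rule(rows, labels, 0, rules))
```

In this toy case the broader rule covers all of cluster 0 but also fires once outside it, while the narrower rule never fires outside the cluster but misses one of its members; the product favors the broader rule. A different combination, such as a weighted mean, would shift that trade-off.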
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vesanto, J., Hollmén, J. (2002). An Automated Report Generation Tool for the Data Understanding Phase. In: Abraham, A., Köppen, M. (eds) Hybrid Information Systems. Advances in Soft Computing, vol 14. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1782-9_44
DOI: https://doi.org/10.1007/978-3-7908-1782-9_44
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-7908-1480-4
Online ISBN: 978-3-7908-1782-9
eBook Packages: Springer Book Archive