Analyzing XploRe Download Profiles with Intelligent Miner

Sofyan, Hizir; Werwatz, Axel

doi:10.1007/s001800100079

Published: 04 November 2019

Volume 16, pages 465–479 (2001)
Computational Statistics

Hizir Sofyan¹ &
Axel Werwatz¹

Summary

This paper is an example of data mining in action. The database we are mining contains 1085 profiles of individuals who have downloaded the statistical software XploRe. Each profile contains the responses to an online questionnaire comprising questions about such things as an individual’s computing preferences (operating system, favorite statistical software) or professional affiliation. After formatting and cleaning the raw data using MS Excel, we use IBM’s Intelligent Miner to perform a cluster analysis of the download profiles. We try to identify a small number of “types” of users by employing a clustering algorithm based on the New Condorcet Criterion, which is particularly well-suited for our all-categorical data. We identify three clusters in the mining run, which we refer to as Academia, Unix/Linux users and Researchers, respectively. Based on the characteristics of the cluster members, we briefly outline how the results of the data analysis may be used for targeted marketing of XploRe.
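
To illustrate why a Condorcet-type criterion suits all-categorical data, the following is a minimal sketch and not Intelligent Miner's implementation. It assumes one common formulation of the criterion: a pair of profiles in the same cluster contributes its number of agreeing variables minus its number of disagreeing variables, and a pair in different clusters contributes the reverse. The function names and the toy profiles are hypothetical and are not taken from the paper's data.

```python
from itertools import combinations


def agreements(a, b):
    """Count the categorical variables on which two profiles agree."""
    return sum(x == y for x, y in zip(a, b))


def condorcet_score(records, labels):
    """Condorcet-style score of a partition (assumed formulation).

    Same-cluster pairs add (agreements - disagreements);
    different-cluster pairs add (disagreements - agreements).
    A higher score indicates a better partition.
    """
    m = len(records[0])  # number of categorical variables per profile
    score = 0
    for i, j in combinations(range(len(records)), 2):
        agree = agreements(records[i], records[j])
        disagree = m - agree
        if labels[i] == labels[j]:
            score += agree - disagree
        else:
            score += disagree - agree
    return score


# Hypothetical toy profiles: (operating system, affiliation)
profiles = [
    ("Windows", "University"),
    ("Windows", "University"),
    ("Linux", "Research institute"),
    ("Linux", "Research institute"),
]

good = condorcet_score(profiles, [0, 0, 1, 1])  # identical profiles grouped together
bad = condorcet_score(profiles, [0, 1, 0, 1])   # identical profiles split apart
print(good, bad)
```

Because the score only needs to know whether two profiles agree on a variable, no numeric coding or distance metric has to be imposed on the categories.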

Notes

¹To make it easy to relate the questions to the variables used below, we already indicate the variable names in bold Typewriter font at this point whenever possible.
²See, for instance, Gordon (1999) for a comprehensive treatment of cluster analysis. Alternative clustering methods in a data mining context are CLARANS (Ng & Han (1994)), DBSCAN (Ester, Kriegel, Sander, & Xu (1996)), BIRCH (Zhang, Ramakrishnan, & Livny (1996)) and CURE (Guha, Rastogi, & Shim (1998)).
³The remaining variables in the data set proved to be less useful in the clustering algorithm, essentially because they have either too few (Server, Mailing List) or too many possible values (Statistical Software, Country).
⁴In fact, Intelligent Miner provides a standardized χ² statistic for each variable in each cluster. This statistic, reported in Table 2, indicates how much the intracluster distribution differs from the distribution of the variable in the entire sample. The closer χ² is to 1 (and the farther it is from 0), the more the intracluster distribution of the variable differs from its distribution at large. See Grabmeier & Rudolph (1998) for details.
⁵Indeed, in an earlier analysis with less recent profiles, the world wide web was less important in clusters I and III. The increased importance is probably due both to the general increase in internet usage and to the enhanced internet representation of XploRe.
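
The idea behind the statistic in note 4 can be sketched as follows. This is an illustration under stated assumptions, not Intelligent Miner's formula: it computes a Pearson χ² of the cluster's category counts against the counts expected from the whole-sample proportions, and then maps it into [0, 1) with χ²/(χ² + n), where n is the cluster size. That normalization is an assumption made here for illustration; the standardization actually used is described in Grabmeier & Rudolph (1998) and may differ. The function name and the toy data are hypothetical.

```python
from collections import Counter


def standardized_chi2(cluster_values, all_values):
    """Compare one variable's distribution in a cluster with the whole sample.

    Returns chi2 / (chi2 + n), an assumed normalization into [0, 1).
    Values near 0 mean the cluster resembles the sample at large.
    Assumes cluster_values is a subset of all_values, so every category
    seen in the cluster also occurs in the whole sample.
    """
    n = len(cluster_values)
    total = len(all_values)
    overall = Counter(all_values)
    observed = Counter(cluster_values)
    chi2 = 0.0
    for category, count in overall.items():
        expected = n * count / total  # count expected under the overall proportions
        chi2 += (observed.get(category, 0) - expected) ** 2 / expected
    return chi2 / (chi2 + n)


# Hypothetical toy variable: operating system of ten download profiles
sample = ["Windows"] * 6 + ["Linux"] * 4

linux_only = standardized_chi2(["Linux"] * 4, sample)
same_mix = standardized_chi2(["Windows"] * 3 + ["Linux"] * 2, sample)
print(linux_only, same_mix)
```

A cluster made up entirely of one category scores well above zero, while a cluster that mirrors the overall mix scores zero, which matches the reading of the statistic given in note 4.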

References

Chen, M. S., Han, J., & Yu, P. S. (1996). Data Mining: an Overview from a Database Perspective, IEEE Trans. on Knowledge and Data Engineering, 8:866–883.
Ester, M., Kriegel, H., Sander, J., & Xu, X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, Proc. of Int’l Conf. on Knowledge Discovery and Data Mining, Portland, Oregon.
Gordon, A. D. (1999). Classification, Chapman and Hall, 2nd ed., London.
Grabmeier, J. & Rudolph, A. (1998). Techniques of Cluster Algorithms in Data Mining, Technical Report IBM, http://www.ibm.com/software/data/iminer/fordata/clusttechn.pdf.
Guha, S., Rastogi, R., & Shim, K. (1998). CURE: An efficient clustering algorithm for large databases, Proc. of ACM SIGMOD Int’l Conf. on Management of Data, New York, pp. 73–84.
Ha, S. H. & Park, S. C. (1998). Application of data mining tools to hotel data mart on the Intranet for database marketing, Expert Systems with Applications, 15:1–31.
Härdle, W., Klinke, S., & Müller, M. (1999). XploRe Learning Guide, Springer Verlag, Heidelberg.
Michaud, P. (1987). Condorcet — A man of the Avant-garde, Applied Stochastic Models and Data Analysis, 3:173–189.
Michaud, P. (1997). Clustering Techniques, Future Generation Computer Systems, 13:135–147.
Ng, R. T., & Han, J. (1994). Efficient and Effective Clustering Methods for Spatial Data Mining, Proc. of the 20th Int’l Conf. on Very Large Databases, Santiago, Chile, pp. 144–155.
Zhang, T., Ramakrishnan, R., & Livny, M. (1996). BIRCH: An Efficient Data Clustering Method for Very Large Databases, Proc. of the 1996 ACM SIGMOD Int’l Conf. on Management of Data, Montreal, Canada, pp. 103–114.

Author information

Authors and Affiliations

Institut für Statistik und Ökonometrie, Humboldt Universität zu Berlin, Spandauer Str. 1, 10178, Berlin, Germany
Hizir Sofyan & Axel Werwatz

Additional information

An extended version of this paper is available at http://sfb.wiwi.hu-berlin.de/. Financial support from Deutscher Akademischer Austauschdienst and Deutsche Forschungsgemeinschaft (SFB 373, “Quantifikation und Simulation Ökonomischer Prozesse”) is gratefully acknowledged. We are very grateful for the helpful comments of two anonymous referees, which led to improvements in the paper. All remaining errors are our own.

About this article

Cite this article

Sofyan, H., Werwatz, A. Analyzing XploRe Download Profiles with Intelligent Miner. Computational Statistics 16, 465–479 (2001). https://doi.org/10.1007/s001800100079

Issue Date: September 2001