Algorithms can play two essential roles in the data mining endeavour. Firstly, in the form of procedures, they may determine how profiling is conducted by controlling the profiling process itself. For example, methodologies such as CRISP-DM have been designed to control the process of extracting information from the large quantities of data that have become readily available in our modern, data-rich society. In this situation, algorithms can be tuned to assist in the capture, verification and validation of data, as discussed in the reply to Chapter 3.
Secondly, algorithms, predominantly in the form of mathematical procedures, can be used as the profiling engine to identify trends, relationships and hidden patterns in disparate groups of data. Used in this way, algorithms can often compute more effective profiles than would be possible manually. In this chapter we show how algorithms find a natural home at the very heart of the profiling process and how machine learning can be used to address the task of knowledge discovery.
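To make the notion of a profiling engine concrete, the following is a minimal sketch of one well-known family of techniques, association rule mining, restricted here to pairs of items. It is an illustration under stated assumptions rather than the method developed in the chapter: the function name pair_rules, the thresholds and the toy baskets are all hypothetical. Support is the fraction of records containing both items, and confidence is the fraction of records containing the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    Hypothetical helper for illustration only; thresholds are assumptions.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # sort so each pair has one canonical order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair occurs too rarely to be of interest
        # Test the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Toy data, invented for this sketch.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]
rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Even on four records the sketch surfaces an asymmetry that is easy to miss by eye: every basket with eggs also contains milk, whereas only half of the baskets with milk contain eggs, so only the first direction passes the confidence threshold.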
© 2008 Springer Science + Business Media B.V
About this chapter
Cite this chapter
Anrig, B., Browne, W., Gasson, M. (2008). The Role of Algorithms in Profiling. In: Hildebrandt, M., Gutwirth, S. (eds) Profiling the European Citizen. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6914-7_4
DOI: https://doi.org/10.1007/978-1-4020-6914-7_4
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-6913-0
Online ISBN: 978-1-4020-6914-7
eBook Packages: Computer Science (R0)