Metrics for Mining Multisets

Kosters, Walter A.; Laros, Jeroen F. J.

doi:10.1007/978-1-84800-094-0_22

Included in the following conference series:

International Conference on Innovative Techniques and Applications of Artificial Intelligence

Abstract

We propose a new class of distance measures (metrics) designed for multisets; both metrics and multisets are recurrent themes in many data mining applications. One particular instance of this class originated from the need to cluster criminal behaviours.

These distance measures are parameterized by a function f; provided f satisfies a few simple restrictions, the result is always a valid metric. This flexibility allows the measures to be tailored to many domain-specific applications.

In this paper, the metrics are applied in bio-informatics (genomics), criminal behaviour clustering and text mining. The proposed metric also generalizes some known measures, e.g., the Jaccard distance and the Canberra distance. We discuss several options and compare the behaviour of different instances.
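
As a rough illustration of how a single function f can yield different measures, the sketch below assumes a particular form that the abstract itself does not spell out: f is applied to the two multiplicities of each element, and the sum is divided by the number of distinct elements occurring in either multiset. The names multiset_distance, f_indicator and f_relative are hypothetical and chosen only for this example; the restrictions on f that the paper requires are not checked here.

```python
from collections import Counter
from typing import Callable


def multiset_distance(x: Counter, y: Counter,
                      f: Callable[[int, int], float]) -> float:
    """Assumed form: sum f over per-element multiplicities, divided by
    the number of distinct elements present in x or y."""
    support = ({k for k, v in x.items() if v > 0}
               | {k for k, v in y.items() if v > 0})
    if not support:
        return 0.0  # two empty multisets are treated as identical
    # Counter returns 0 for elements it does not contain.
    return sum(f(x[k], y[k]) for k in support) / len(support)


def f_indicator(a: int, b: int) -> float:
    """1 if the multiplicities differ, else 0."""
    return 0.0 if a == b else 1.0


def f_relative(a: int, b: int) -> float:
    """Canberra-style term |a - b| / (a + b), with 0/0 taken as 0."""
    return abs(a - b) / (a + b) if a + b > 0 else 0.0


if __name__ == "__main__":
    x = Counter("aab")   # multiset {a, a, b}
    y = Counter("abc")   # multiset {a, b, c}
    print(multiset_distance(x, y, f_indicator))
    print(multiset_distance(x, y, f_relative))
```

Under this assumed form, f_indicator applied to ordinary sets (all multiplicities 0 or 1) gives the size of the symmetric difference divided by the size of the union, which is the Jaccard distance. f_relative gives the terms of the Canberra distance, here averaged over the combined support rather than summed.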

Author information

Authors and Affiliations

Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands
Walter A. Kosters & Jeroen F. J. Laros

Editor information

Editors and Affiliations

Faculty of Technology, University of Portsmouth, Portsmouth, UK
Max Bramer BSc, PhD, CEng, CITP, FBCS, FIET, FRSA, FHEA
Department of Computer Science, University of Liverpool, Liverpool, UK
Frans Coenen BSc, PhD
University of Greenwich, UK
Miltos Petridis DipEng, MBA, PhD, MBCS, AMBA

About this paper

Cite this paper

Kosters, W.A., Laros, J.F.J. (2008). Metrics for Mining Multisets. In: Bramer, M., Coenen, F., Petridis, M. (eds) Research and Development in Intelligent Systems XXIV. SGAI 2007. Springer, London. https://doi.org/10.1007/978-1-84800-094-0_22

DOI: https://doi.org/10.1007/978-1-84800-094-0_22
Publisher Name: Springer, London
Print ISBN: 978-1-84800-093-3
Online ISBN: 978-1-84800-094-0
eBook Packages: Computer Science, Computer Science (R0)