Metrics for Mining Multisets

  • Conference paper
Research and Development in Intelligent Systems XXIV (SGAI 2007)

Abstract

We propose a new class of distance measures (metrics) designed for multisets. Both distance measures and multisets are recurrent themes in many data mining applications. One particular instance of this class originated from the need to cluster criminal behaviours.

These distance measures are parameterized by a function f which, given a few simple restrictions, will always produce a valid metric. This flexibility allows these measures to be tailored for many domain-specific applications.

In this paper, the metrics are applied in bio-informatics (genomics), criminal behaviour clustering, and text mining. The proposed metric also generalizes some known measures, e.g., the Jaccard distance and the Canberra distance. We discuss several options and compare the behaviour of different instances.
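The abstract does not state the formula, so the following is only a minimal sketch of what a distance parameterized by a function f could look like. It assumes, as an illustration and not as the authors' definition, that f is applied to the two multiplicities of each element and averaged over the elements occurring in at least one multiset. The names `multiset_distance`, `f_presence` and `f_canberra` are hypothetical. Under this assumption, `f_presence` reproduces the Jaccard distance on the underlying sets, and `f_canberra` gives a normalized variant of the Canberra distance.

```python
from collections import Counter
from typing import Callable


def multiset_distance(x: Counter, y: Counter,
                      f: Callable[[int, int], float]) -> float:
    """Average f(multiplicity in x, multiplicity in y) over all elements
    that occur in at least one of the two multisets (assumed form)."""
    support = {e for e in set(x) | set(y) if x[e] > 0 or y[e] > 0}
    if not support:
        return 0.0  # two empty multisets are at distance 0
    return sum(f(x[e], y[e]) for e in support) / len(support)


def f_presence(a: int, b: int) -> float:
    # 1 if the element occurs in exactly one multiset, else 0.
    # Averaged over the support this is |symmetric difference| / |union|,
    # i.e. the Jaccard distance of the underlying sets.
    return 1.0 if (a == 0) != (b == 0) else 0.0


def f_canberra(a: int, b: int) -> float:
    # Canberra term |a - b| / (a + b); a + b > 0 for every support element.
    return abs(a - b) / (a + b)


x = Counter("aab")  # multiset {a, a, b}
y = Counter("abc")  # multiset {a, b, c}
print(multiset_distance(x, y, f_presence))
print(multiset_distance(x, y, f_canberra))
```

Swapping in a different f changes how strongly differences in multiplicity are penalized, which is the kind of domain-specific tailoring the abstract refers to. Whether a given f yields a valid metric depends on the restrictions set out in the paper, which this sketch does not check.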

© 2008 Springer-Verlag London Limited

About this paper

Cite this paper

Kosters, W.A., Laros, J.F.J. (2008). Metrics for Mining Multisets. In: Bramer, M., Coenen, F., Petridis, M. (eds) Research and Development in Intelligent Systems XXIV. SGAI 2007. Springer, London. https://doi.org/10.1007/978-1-84800-094-0_22

  • DOI: https://doi.org/10.1007/978-1-84800-094-0_22

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-093-3

  • Online ISBN: 978-1-84800-094-0

  • eBook Packages: Computer Science, Computer Science (R0)
