Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2012: Machine Learning and Knowledge Discovery in Databases, pp 42–57

Smoothing Categorical Data

  • Arno Siebes &
  • René Kersten

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7523)

Abstract

Global models of a dataset reflect not only the large scale structure of the data distribution but also its small(er) scale structure. Hence, if one wants to see the large scale structure, one should somehow subtract this smaller scale structure from the model.

While for some kinds of models, such as boosted classifiers, it is easy to see the “important” components, for many kinds of models this is far harder, if possible at all. In such cases one might try an implicit approach: simplify the data distribution without changing the large scale structure. That is, one might first smooth the local structure out of the dataset and then induce a new model from the smoothed dataset. This new model should then reflect the large scale structure of the original dataset. In this paper we propose such a smoothing for categorical data and for one particular type of model, viz., code tables.

We show experimentally that our approach preserves the large scale structure of a dataset well. That is, the smoothed dataset is simpler, while the original and smoothed datasets share the same large scale structure.

Keywords

  • Local Structure
  • Large Scale Structure
  • Original Dataset
  • Minimal Support
  • Pattern Mining

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Author information

Authors and Affiliations

  1. Universiteit Utrecht, The Netherlands

    Arno Siebes & René Kersten

Editor information

Editors and Affiliations

  1. Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, BS8 1UB, Bristol, UK

    Peter A. Flach, Tijl De Bie & Nello Cristianini

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Siebes, A., Kersten, R. (2012). Smoothing Categorical Data. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol 7523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33460-3_8

  • DOI: https://doi.org/10.1007/978-3-642-33460-3_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33459-7

  • Online ISBN: 978-3-642-33460-3

  • eBook Packages: Computer Science, Computer Science (R0)
