A Probabilistic Clustering-Projection Model for Discrete Data

  • Shipeng Yu
  • Kai Yu
  • Volker Tresp
  • Hans-Peter Kriegel
Conference paper

DOI: 10.1007/11564126_41

Part of the Lecture Notes in Computer Science book series (LNCS, volume 3721)
Cite this paper as:
Yu S., Yu K., Tresp V., Kriegel HP. (2005) A Probabilistic Clustering-Projection Model for Discrete Data. In: Jorge A.M., Torgo L., Brazdil P., Camacho R., Gama J. (eds) Knowledge Discovery in Databases: PKDD 2005. Lecture Notes in Computer Science, vol 3721. Springer, Berlin, Heidelberg

Abstract

For discrete co-occurrence data such as documents and words, calculating optimal projections and clustering are two different but related tasks. The goal of projection is to find a low-dimensional latent space for words, while clustering aims at grouping documents based on their feature representations. In general, projection and clustering are studied independently, but both represent the intrinsic structure of the data and should reinforce each other. In this paper we introduce a probabilistic clustering-projection (PCP) model for discrete data, in which both tasks are represented in a unified framework. Clustering is performed in the projected space, and projection explicitly takes the clustering structure into account. Iterating the two operations turns out to be exactly the variational EM algorithm under Bayesian model inference, and is thus guaranteed to improve the data likelihood. The model is evaluated on two text data sets and shows very encouraging results on both.
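
The alternation between clustering and projection described in the abstract can be illustrated with a small sketch. This is not the PCP model or its variational EM procedure, whose details the abstract does not give. It is a simplified hard-assignment analogue, and every name in it (`fit`, `assign_clusters`, `update_projection`, `theta`, `beta`, `smooth`) is hypothetical. It assumes that each cluster has a mixture over a few topics (`theta`), that each topic is a distribution over words (`beta`), and that documents are rows of word counts.

```python
import math

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def word_probs(theta_m, beta):
    """p(word | cluster) = sum_k theta_m[k] * beta[k][word]."""
    n_words = len(beta[0])
    return [sum(t * b[v] for t, b in zip(theta_m, beta)) for v in range(n_words)]

def assign_clusters(counts, theta, beta):
    """Clustering step: put each document in its most likely cluster,
    scoring clusters through the low-dimensional topic space."""
    log_p = [[math.log(p) for p in word_probs(t, beta)] for t in theta]
    labels = []
    for doc in counts:
        scores = [sum(c * lp[v] for v, c in enumerate(doc)) for lp in log_p]
        labels.append(scores.index(max(scores)))
    return labels

def update_projection(counts, labels, theta, beta, smooth=0.1):
    """Projection step: one EM-style update of the topics, computed on
    word counts pooled per cluster (so it depends on the clustering)."""
    n_clusters, n_topics, n_words = len(theta), len(beta), len(beta[0])
    pooled = [[0.0] * n_words for _ in range(n_clusters)]
    for doc, m in zip(counts, labels):
        for v, c in enumerate(doc):
            pooled[m][v] += c
    # Start from a small smoothing constant so no probability becomes zero.
    new_theta = [[smooth] * n_topics for _ in range(n_clusters)]
    new_beta = [[smooth] * n_words for _ in range(n_topics)]
    for m in range(n_clusters):
        for v in range(n_words):
            if pooled[m][v] == 0:
                continue
            # Responsibility of each topic for word v in cluster m.
            resp = normalize([theta[m][k] * beta[k][v] for k in range(n_topics)])
            for k in range(n_topics):
                new_theta[m][k] += pooled[m][v] * resp[k]
                new_beta[k][v] += pooled[m][v] * resp[k]
    return [normalize(r) for r in new_theta], [normalize(r) for r in new_beta]

def fit(counts, n_clusters, n_topics, n_iters=20, smooth=0.1):
    """Alternate the two steps for a fixed number of rounds."""
    n_docs = len(counts)
    # Assumed initialization: topics seeded from evenly spaced documents,
    # cluster mixtures tilted toward different topics to break symmetry.
    beta = [normalize([c + smooth for c in counts[k * n_docs // n_topics]])
            for k in range(n_topics)]
    theta = [normalize([2.0 if k == m % n_topics else 1.0
                        for k in range(n_topics)])
             for m in range(n_clusters)]
    labels = []
    for _ in range(n_iters):
        labels = assign_clusters(counts, theta, beta)
        theta, beta = update_projection(counts, labels, theta, beta, smooth)
    return labels, theta, beta

# Toy document-word counts: two groups of documents with disjoint vocabularies.
toy_counts = [
    [4, 3, 2, 0, 0, 0], [3, 4, 3, 0, 0, 0], [2, 3, 4, 0, 0, 0],
    [0, 0, 0, 4, 3, 2], [0, 0, 0, 3, 4, 3], [0, 0, 0, 2, 3, 4],
]
toy_labels, toy_theta, toy_beta = fit(toy_counts, n_clusters=2, n_topics=2)
```

The sketch uses hard cluster assignments and point estimates in place of the soft, Bayesian treatment the abstract refers to, so it carries no likelihood guarantee of its own. It only shows how each step can consume the other's output.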

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Shipeng Yu (1, 2)
  • Kai Yu (2)
  • Volker Tresp (2)
  • Hans-Peter Kriegel (1)

  1. Institute for Computer Science, University of Munich, Germany
  2. Siemens Corporate Technology, Munich, Germany