An EM-Approach for Clustering Multi-Instance Objects

  • Hans-Peter Kriegel
  • Alexey Pryakhin
  • Matthias Schubert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3918)


In many data mining applications, data objects are modeled as sets of feature vectors, also called multi-instance objects. In this paper, we present an expectation maximization (EM) approach for clustering multi-instance objects. To this end, we introduce a statistical process that models multi-instance objects. Furthermore, we present the M-steps and E-steps for EM clustering and a method for finding a good initial model. In our experimental evaluation, we demonstrate that the new EM algorithm is capable of increasing cluster quality on three real-world data sets compared to k-medoid clustering.
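The abstract does not spell out the statistical model, so the following is only a minimal sketch of how E-steps and M-steps can alternate when the clustered objects are sets of feature vectors. It is not the authors' model. It assumes a deliberately simple mixture in which every instance of an object is drawn independently from one spherical Gaussian per cluster with a fixed, shared variance. The names `fit`, `e_step`, `m_step`, `init_means`, and `var` are hypothetical, and the initial means are supplied by the caller rather than by the initialization method the paper proposes.

```python
import math


def log_gauss(x, mu, var):
    """Log density of a spherical Gaussian with mean mu and variance var."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mu))
    return -0.5 * (len(x) * math.log(2 * math.pi * var) + sq / var)


def e_step(bags, weights, means, var):
    """Return one row per bag: the posterior probability of each cluster."""
    resp = []
    for bag in bags:
        # log prior + summed instance log-likelihoods (i.i.d. assumption)
        logs = [math.log(w) + sum(log_gauss(x, mu, var) for x in bag)
                for w, mu in zip(weights, means)]
        top = max(logs)  # subtract the maximum before exponentiating
        exps = [math.exp(v - top) for v in logs]
        total = sum(exps)
        resp.append([e / total for e in exps])
    return resp


def m_step(bags, resp, old_means):
    """Re-estimate cluster weights and means from the responsibilities."""
    dim = len(old_means[0])
    weights, means = [], []
    for j, old in enumerate(old_means):
        bag_mass = sum(r[j] for r in resp)
        inst_mass = sum(r[j] * len(bag) for r, bag in zip(resp, bags))
        if inst_mass < 1e-12:
            mean = list(old)  # empty cluster: keep the previous mean
        else:
            mean = [sum(r[j] * x[t] for r, bag in zip(resp, bags) for x in bag)
                    / inst_mass for t in range(dim)]
        # Floor the weight so that log(weight) stays defined in the E-step.
        weights.append(max(bag_mass / len(bags), 1e-12))
        means.append(mean)
    return weights, means


def fit(bags, init_means, var=1.0, iters=20):
    """Alternate E- and M-steps for a fixed number of iterations."""
    means = [list(m) for m in init_means]
    weights = [1.0 / len(means)] * len(means)
    for _ in range(iters):
        resp = e_step(bags, weights, means, var)
        weights, means = m_step(bags, resp, means)
    return weights, means, e_step(bags, weights, means, var)
```

Each object receives a single soft cluster assignment that is shared by all of its instances, which is what distinguishes this sketch from running EM on the pooled instances. A richer model, such as one with per-cluster covariances or a distribution over the number of instances per object, would need additional M-step updates.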


Keywords: Feature Vector, Mixture Model, Expectation Maximization, Expectation Maximization Algorithm, Instance Object




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hans-Peter Kriegel (1)
  • Alexey Pryakhin (1)
  • Matthias Schubert (1)
  1. Institute for Informatics, University of Munich, Munich, Germany
