A Maximum Profit Coverage Algorithm with Application to Small Molecules Cluster Identification

  • Refael Hassin
  • Einat Or
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4007)


In this paper we present the cluster identification of molecules (CIM) problem, a clustering problem in a finite metric space. We model the problem both as parameter estimation via likelihood maximization and as a novel clustering problem, the maximum profit coverage problem (MPCP). We present a numerical study comparing a greedy heuristic and a random heuristic for MPCP with the well-known Expectation Maximization (EM) approach for the likelihood maximization model. We also present a polynomial time approximation scheme for MPCP in Euclidean space.
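
The abstract names a greedy heuristic for MPCP but does not define the objective. The following is only an illustrative sketch under assumptions that are not taken from the paper: each point carries a profit, candidate ball centers are the input points themselves, every ball of a fixed radius has the same cost, and the goal is to maximize the total profit of covered points minus the total cost of the opened balls. The function name `greedy_mpcp` and its parameters are hypothetical.

```python
from math import dist


def greedy_mpcp(points, profits, ball_cost, radius=1.0):
    """Greedy sketch: repeatedly open the ball with the largest positive gain.

    Assumptions (not from the paper): candidate centers are the input
    points, all balls share one cost, and the objective is
    (profit of covered points) - ball_cost * (number of opened balls).
    """
    n = len(points)
    # cover[c] = indices of points within `radius` of candidate center c
    cover = [
        {j for j in range(n) if dist(points[c], points[j]) <= radius}
        for c in range(n)
    ]
    covered, centers, total = set(), [], 0.0
    while True:
        best_gain, best_c = 0.0, None
        for c in range(n):
            # Marginal gain: profit of newly covered points minus the ball cost
            gain = sum(profits[j] for j in cover[c] - covered) - ball_cost
            if gain > best_gain:
                best_gain, best_c = gain, c
        if best_c is None:  # no remaining ball improves the objective
            break
        centers.append(best_c)
        covered |= cover[best_c]
        total += best_gain
    return centers, covered, total


if __name__ == "__main__":
    pts = [(0, 0), (0.5, 0), (0, 0.5), (5, 5), (5.4, 5), (10, 10)]
    print(greedy_mpcp(pts, [1.0] * len(pts), ball_cost=1.5))
```

In this toy input the isolated point at (10, 10) is left uncovered because a ball around it would cost more than the profit it brings, which is the kind of outlier handling a profit-minus-cost objective allows.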


Unit Ball; Cluster Problem; Greedy Heuristic; Molecule Structure; Polynomial Time Approximation Scheme





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Refael Hassin
    • 1
  • Einat Or
    • 1
  1. Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv, Israel
