
Clustering Uncertain Data Via K-Medoids

  • Conference paper
Scalable Uncertainty Management (SUM 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5291)

Abstract

Uncertain data are usually represented in terms of an uncertainty region over which a probability density function (pdf) is defined. In the context of uncertain data management, there has been a growing interest in clustering uncertain data. In particular, the classic K-means clustering algorithm has been recently adapted to handle uncertain data. However, the centroid-based partitional clustering approach used in the adapted K-means presents two major weaknesses that are related to: (i) an accuracy issue, since cluster centroids are computed as deterministic objects using the expected values of the pdfs of the clustered objects; and, (ii) an efficiency issue, since the expected distance between uncertain objects and cluster centroids is computationally expensive.
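The two weaknesses named above can be illustrated with a small sketch. It is not taken from the paper: the uniform box-shaped uncertainty regions, the sample counts, and the helper names (`draw_uniform_box`, `mean_point`, `expected_sq_distance`) are assumptions made for illustration only. Two uncertain objects share almost the same expected value, so a representation based only on expected values cannot tell them apart, while their expected squared distances to a deterministic centroid differ, and estimating each one requires a pass over all samples of the pdf.

```python
import random

def draw_uniform_box(center, half_width, n, rng):
    """Draw n samples from a uniform pdf over a box around `center` (assumed region shape)."""
    return [tuple(c + rng.uniform(-half_width, half_width) for c in center)
            for _ in range(n)]

def mean_point(samples):
    """Sample estimate of the expected value of an uncertain object."""
    dim = len(samples[0])
    return tuple(sum(x[d] for x in samples) / len(samples) for d in range(dim))

def expected_sq_distance(samples, centroid):
    """Monte Carlo estimate of E[||X - c||^2]; cost grows with the number of samples."""
    total = 0.0
    for x in samples:
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, centroid))
    return total / len(samples)

rng = random.Random(0)
narrow = draw_uniform_box((0.0, 0.0), 0.1, 2000, rng)  # small uncertainty region
wide = draw_uniform_box((0.0, 0.0), 2.0, 2000, rng)    # large uncertainty region
centroid = (1.0, 0.0)                                  # deterministic centroid

# Both objects collapse to roughly the same point when reduced to expected values.
mean_narrow = mean_point(narrow)
mean_wide = mean_point(wide)

# Their expected squared distances to the centroid nevertheless differ.
d_narrow = expected_sq_distance(narrow, centroid)
d_wide = expected_sq_distance(wide, centroid)
print(mean_narrow, mean_wide, d_narrow, d_wide)
```

For these uniform boxes the expected squared distance equals the squared distance between the means plus the total variance, so the estimates should land near 1.01 and 3.67 respectively.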

In this paper, we address the problem of clustering uncertain data by proposing a K-medoids-based algorithm, called UK-medoids, which is designed to overcome the above issues. In particular, our UK-medoids algorithm employs distance functions properly defined for uncertain objects, and exploits a K-medoids scheme. Experiments have shown that UK-medoids outperforms existing algorithms from an accuracy viewpoint while achieving reasonably good efficiency.
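A K-medoids scheme over uncertain objects can be sketched as follows. This is a minimal illustration under stated assumptions, not the UK-medoids algorithm itself: the abstract does not give the distance definition, so `uncertain_distance` (mean squared Euclidean distance over paired samples) is a stand-in, the farthest-first initialization is a hypothetical choice that keeps the example deterministic, and the loop is a simple alternating assignment/update scheme. The point it shows is that medoids are always actual objects of the dataset, so all object-to-object distances can be computed once and reused in every iteration.

```python
import random

def uncertain_distance(a, b):
    """Assumed stand-in distance: mean squared Euclidean distance over paired samples."""
    n = min(len(a), len(b))
    return sum(sum((p - q) ** 2 for p, q in zip(a[i], b[i])) for i in range(n)) / n

def pairwise_matrix(objects):
    """Compute all object-to-object distances once, before clustering starts."""
    n = len(objects)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = uncertain_distance(objects[i], objects[j])
    return dist

def assign(dist, medoids):
    """Label each object with the position of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]

def k_medoids(dist, k, max_iter=100):
    n = len(dist)
    # Farthest-first initialization (an assumption): start from object 0, then
    # repeatedly add the object farthest from the medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster becomes empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)

# Toy data: six uncertain objects, each a cloud of 200 samples around a center.
rng = random.Random(1)
centers = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4), (10.0, 10.0), (10.4, 9.7), (9.8, 10.3)]
objects = [[tuple(c + rng.uniform(-0.5, 0.5) for c in ctr) for _ in range(200)]
           for ctr in centers]

dist = pairwise_matrix(objects)
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

On this toy input the first three objects end up in one cluster and the last three in the other, and the clustering loop never touches the samples again after `pairwise_matrix` has run.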


© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gullo, F., Ponti, G., Tagarelli, A. (2008). Clustering Uncertain Data Via K-Medoids. In: Greco, S., Lukasiewicz, T. (eds) Scalable Uncertainty Management. SUM 2008. Lecture Notes in Computer Science, vol 5291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87993-0_19

  • DOI: https://doi.org/10.1007/978-3-540-87993-0_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87992-3

  • Online ISBN: 978-3-540-87993-0

  • eBook Packages: Computer Science (R0)
