Hybrid Data Clustering Based on Dependency Structure and Gibbs Sampling

  • Shuang-Cheng Wang
  • Xiao-Lin Li
  • Hai-Yan Tang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4304)


This paper presents a new method for data clustering that can effectively cluster data sets containing both continuous and discrete attributes. In this method, the values of the cluster variable are treated as missing data. The missing values are first initialized at random and then revised iteratively by combining Gibbs sampling with a dependency structure, which is either built from prior knowledge or, alternatively, constructed as a star-shaped structure. A penalty coefficient is introduced to extend the MDL scoring function, and the optimal number of clusters is determined using the extended MDL score together with statistical methods.
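As a rough illustration of the iterative step, the sketch below implements a collapsed Gibbs sampler for a star-shaped (naive Bayes) dependency structure over discrete attributes, with cluster labels treated as missing data and initialized at random. This is a minimal sketch under assumed details, not the authors' implementation: the function name gibbs_cluster, the Dirichlet smoothing parameters alpha and beta, and the restriction to discrete attributes are illustrative choices; continuous attributes would additionally require, e.g., Gaussian leaf distributions.

```python
import numpy as np

# Collapsed Gibbs sampler for a naive-Bayes (star-shaped) mixture over
# discrete attributes: the cluster variable is the hub, each attribute a leaf.
# Cluster labels are treated as missing data, initialized randomly, then
# resampled one record at a time. Illustrative sketch only.

def gibbs_cluster(X, K, n_iter=200, alpha=1.0, beta=1.0, seed=None):
    """X: (n, d) integer-coded discrete attributes. K: number of clusters.
    alpha, beta: Dirichlet smoothing (assumed hyperparameters)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_vals = X.max(axis=0) + 1              # cardinality of each attribute
    z = rng.integers(0, K, size=n)          # random initial labels (missing data)

    # Sufficient statistics: cluster sizes and per-cluster attribute counts.
    sizes = np.bincount(z, minlength=K).astype(float)
    counts = [np.zeros((K, v)) for v in n_vals]
    for j in range(d):
        np.add.at(counts[j], (z, X[:, j]), 1.0)

    for _ in range(n_iter):
        for i in range(n):
            # Remove record i from its current cluster.
            k_old = z[i]
            sizes[k_old] -= 1
            for j in range(d):
                counts[j][k_old, X[i, j]] -= 1
            # P(z_i = k | rest) is proportional to
            # (size_k + alpha) * prod_j P(x_ij | k), computed in log space.
            logp = np.log(sizes + alpha)
            for j in range(d):
                logp += np.log((counts[j][:, X[i, j]] + beta) /
                               (sizes + n_vals[j] * beta))
            p = np.exp(logp - logp.max())
            k_new = rng.choice(K, p=p / p.sum())
            # Reinsert record i with its newly sampled label.
            z[i] = k_new
            sizes[k_new] += 1
            for j in range(d):
                counts[j][k_new, X[i, j]] += 1
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two planted clusters over three binary attributes.
    X = np.vstack([rng.binomial(1, 0.9, size=(50, 3)),
                   rng.binomial(1, 0.1, size=(50, 3))])
    print(gibbs_cluster(X, K=2, n_iter=100, seed=0))
```

The number of clusters K could then be chosen by scoring each candidate with an MDL-style criterion of the form -log-likelihood + lambda * (parameter count / 2) * log n, where lambda plays the role of the paper's penalty coefficient; the exact form of the extended score is an assumption here, not taken from the paper.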


Keywords: Dependency Structure · Gibbs Sampling · Cluster Number · Cluster Variable · Cluster Accuracy





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Shuang-Cheng Wang 1, 2
  • Xiao-Lin Li 3
  • Hai-Yan Tang 2
  1. Department of Information Science, Shanghai Lixin University of Commerce, Shanghai, China
  2. China Lixin Risk Management Research Institute, Shanghai Lixin University of Commerce, Shanghai, China
  3. National Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
