Discovery of latent structures: Experience with the CoIL Challenge 2000 data set*

Abstract

The authors present a case study to demonstrate the possibility of discovering complex and interesting latent structures using hierarchical latent class (HLC) models. A similar effort was made earlier by Zhang (2002), but that study involved only small applications with 4 or 5 observed variables and no more than 2 latent variables, owing to the lack of efficient learning algorithms. Significant algorithmic progress has been made since then, and it is now possible to learn HLC models with dozens of observed variables. This allows the authors to demonstrate the benefits of HLC models more convincingly than before. The authors have successfully analyzed the CoIL Challenge 2000 data set using HLC models. The model obtained contains 22 latent variables, and its structure is intuitively appealing. It is exciting that such a large and meaningful latent structure can be automatically inferred from data.
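
The paper gives no code at this point, but a minimal sketch may help make the notion of an HLC model concrete: a tree-structured Bayesian network whose internal nodes are latent and whose leaf nodes are observed, with candidate structures compared against data by a score such as BIC (cf. Schwarz, reference 11). Everything below is invented purely for illustration: the toy structure, the conditional probability tables, and the three-record data set are assumptions of this sketch, not the 22-latent-variable model learned from the CoIL Challenge 2000 data.

import math
from itertools import product

# Toy hierarchical latent class (HLC) model: a tree-structured Bayesian
# network whose internal nodes H0 and H1 are latent and whose leaves
# X1, X2, X3 are observed.  All variables are binary; every number below
# is made up for illustration only.
p_h0 = [0.6, 0.4]                        # P(H0)
p_h1_given_h0 = [[0.7, 0.3],             # P(H1 | H0 = 0)
                 [0.2, 0.8]]             # P(H1 | H0 = 1)
p_x1_given_h1 = [[0.9, 0.1], [0.3, 0.7]]
p_x2_given_h1 = [[0.8, 0.2], [0.1, 0.9]]
p_x3_given_h0 = [[0.75, 0.25], [0.4, 0.6]]

def prob_record(x1, x2, x3):
    """Marginal probability of one observed record, summing out H0 and H1."""
    total = 0.0
    for h0, h1 in product(range(2), range(2)):
        total += (p_h0[h0]
                  * p_h1_given_h0[h0][h1]
                  * p_x1_given_h1[h1][x1]
                  * p_x2_given_h1[h1][x2]
                  * p_x3_given_h0[h0][x3])
    return total

def bic_score(data, num_free_params):
    """BIC = log-likelihood - (d/2) * log(N), used to compare candidate
    structures (standard dimension, without the correction of reference 16)."""
    loglik = sum(math.log(prob_record(*record)) for record in data)
    return loglik - 0.5 * num_free_params * math.log(len(data))

# Hypothetical data set of three binary records (X1, X2, X3).
data = [(0, 0, 0), (1, 1, 0), (0, 1, 1)]
# Free parameters: 1 for P(H0), 2 for P(H1|H0), and 2 for each leaf CPT.
print(bic_score(data, num_free_params=9))

In the actual analysis, both the tree structure and the cardinalities of the latent variables are what the search procedure adjusts; the sketch only shows how a fixed HLC structure assigns probabilities to observed records and how BIC trades likelihood against model dimension.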

References

  1. N. L. Zhang, Hierarchical latent class models for cluster analysis, in Proceedings of the 18th National Conference on Artificial Intelligence, AAAI Press, Menlo Park, 2002, 230–237.

  2. N. L. Zhang, Hierarchical latent class models for cluster analysis, Journal of Machine Learning Research, 2004, 5(Jun): 697–723.

  3. P. F. Lazarsfeld and N. W. Henry, Latent Structure Analysis, Houghton Mifflin, Boston, 1968.

  4. J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, Palo Alto, 1988.

  5. D. J. Bartholomew and M. Knott, Latent Variable Models and Factor Analysis (2nd edition), Arnold, London, 1999.

  6. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, 1998.

  7. G. Elidan, N. Lotner, N. Friedman, and D. Koller, Discovering hidden variables: a structure-based approach, in Advances in Neural Information Processing Systems 13, MIT Press, Cambridge, 2001, 479–485.

  8. R. Silva, R. Scheines, C. Glymour, and P. Spirtes, Learning measurement models for unobserved variables, in Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence, 2003, 545–555.

  9. P. van der Putten and M. van Someren, A bias-variance analysis of a real world learning problem: the CoIL Challenge 2000, Machine Learning, 2004, 57(1–2): 177–195.

  10. J. K. Vermunt and J. Magidson, Latent class cluster analysis, in Applied Latent Class Analysis, Cambridge University Press, Cambridge, 2002, 89–106.

  11. G. Schwarz, Estimating the dimension of a model, Annals of Statistics, 1978, 6(2): 461–464.

  12. H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, 1974, 19(6): 716–723.

  13. P. Cheeseman and J. Stutz, Bayesian classification (AutoClass): theory and results, in Advances in Knowledge Discovery and Data Mining (ed. by U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy), AAAI Press, Menlo Park, 1996.

  14. R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter, Probabilistic Networks and Expert Systems, Springer, New York, 1999.

  15. D. Geiger, D. Heckerman, and C. Meek, Asymptotic model selection for directed networks with hidden variables, in Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, San Francisco, 1996, 283–290.

  16. T. Kočka and N. L. Zhang, Dimension correction for hierarchical latent class models, in Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (ed. by A. Darwiche and N. Friedman), Morgan Kaufmann Publishers, San Francisco, 2002, 267–274.

  17. N. L. Zhang and T. Kočka, Efficient learning of hierarchical latent class models, in Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence, IEEE Computer Society, Los Alamitos, CA, 2004, 585–593.

  18. N. Friedman, Learning belief networks in the presence of missing values and hidden variables, in Proceedings of the 14th International Conference on Machine Learning, Morgan Kaufmann Publishers, San Francisco, 1997, 125–133.

  19. D. M. Chickering, Learning equivalence classes of Bayesian-network structures, Journal of Machine Learning Research, 2002, 2(Feb): 445–498.

Author information

Correspondence to Nevin L. ZHANG.

Additional information

*The research is supported by Hong Kong Research Grants Council Grants #622105 and #622307, and the National Basic Research Program of China (also known as the 973 Program) under project No. 2003CB517106.

Cite this article

ZHANG, N.L., WANG, Y. & CHEN, T. Discovery of latent structures: Experience with the CoIL Challenge 2000 data set*. J Syst Sci Complex 21, 172–183 (2008). https://doi.org/10.1007/s11424-008-9101-2
