Journal of Classification, Volume 31, Issue 1, pp 49–84

Adaptive Mixture Discriminant Analysis for Supervised Learning with Unobserved Classes

Abstract

In supervised learning, an important issue usually not addressed by classical methods is that a class represented in the test set may not have been encountered earlier, during the learning phase. Classical supervised algorithms automatically label such observations as belonging to one of the known classes of the training set and cannot detect new classes. This work introduces a model-based discriminant analysis method, called adaptive mixture discriminant analysis (AMDA), which can detect several unobserved groups of points and adapt the learned classifier to the new situation. Two EM-based procedures are proposed for parameter estimation, and model selection criteria are used to select the actual number of classes. Experiments on artificial and real data demonstrate the ability of the proposed method to deal with complex real-world problems. The approach is also applied to the detection of unobserved communities in social network analysis.
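To make the core idea concrete, the following Python sketch seeds an EM fit with the known class means, adds candidate components for unobserved classes, and selects the total number of components with a criterion such as BIC. This is a minimal, hypothetical illustration built on scikit-learn's GaussianMixture, not the authors' estimation procedure; the function name amda_sketch and the max_new parameter are inventions for this example.

    # Minimal sketch of the unobserved-class idea (not the authors' code):
    # fit Gaussian mixtures with K = C, C+1, ..., C+max_new components on
    # the test data, seeding the first C components at the known class
    # means, and pick K by BIC. Components beyond the C known ones model
    # candidate unobserved classes.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def amda_sketch(X_train, y_train, X_test, max_new=3, seed=0):
        rng = np.random.default_rng(seed)
        classes = np.unique(y_train)
        C = len(classes)
        # Empirical class means from the labeled training data.
        known_means = np.vstack(
            [X_train[y_train == c].mean(axis=0) for c in classes])
        best_gmm, best_bic = None, np.inf
        for K in range(C, C + max_new + 1):
            # Extra components start at randomly chosen test points.
            extra_idx = rng.choice(len(X_test), size=K - C, replace=False)
            means_init = np.vstack([known_means, X_test[extra_idx]])
            gmm = GaussianMixture(n_components=K, means_init=means_init,
                                  random_state=seed).fit(X_test)
            bic = gmm.bic(X_test)
            if bic < best_bic:
                best_gmm, best_bic = gmm, bic
        return best_gmm, best_bic

Under this sketch, a test point whose most probable component is one of the K - C extra components would be flagged as belonging to an unobserved class.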

Keywords

Supervised classification, Unobserved classes, Adaptive learning, Multiclass novelty detection, Model-based classification, Social network analysis

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

Université Paris 1, Paris, France
