Abstract
Most conventional learning algorithms require both positive and negative training data to achieve accurate classification results. However, the problem of learning classifiers from only positive data arises in many applications where negative data are too costly, difficult to obtain, or not available at all. This paper describes a new machine learning approach, called ILoNDF (Incremental data-driven Learning of Novelty Detector Filter). The approach is inspired by novelty detection theory and its learning method, which typically requires only examples from one class to learn a model. One advantage of ILoNDF is the ability of its generative learning to capture the intrinsic characteristics of the training data by continuously integrating the information relating to the relative frequencies of the features of training data and their co-occurrence dependencies. This makes ILoNDF rather stable and less sensitive to noisy features which may be present in the representation of the positive data. In addition, ILoNDF does not require extensive computational resources since it operates on-line without repeated training, and no parameters need to be tuned. In this study we mainly focus on the robustness of ILoNDF in dealing with high-dimensional noisy data, and we investigate the variation of its performance depending on the amount of data available for training. To make our study comparable to previous studies, we investigate four common methods: PCA residuals, Hotelling's T² test, an auto-associative neural network, and a one-class version of the SVM classifier (lately a favored method for one-class classification). Experiments are conducted on two real-world text corpora: Reuters and WebKB. Results show that ILoNDF tends to be more robust, is less affected by initial settings, and consistently outperforms the other methods.
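To make the positive-only training protocol concrete, the sketch below illustrates one of the baseline methods the abstract compares against, PCA residuals: a principal subspace is fitted to positive examples alone, and a new point is flagged as novel when its reconstruction error exceeds a threshold calibrated on the training residuals. This is a minimal illustration on synthetic data, not the paper's ILoNDF algorithm; all names and the 95th-percentile threshold are assumptions chosen for the example.

```python
import numpy as np

# Illustrative PCA-residuals novelty detector (hypothetical names,
# synthetic data): train on positive-class examples only, score new
# points by their squared reconstruction error from the fitted
# principal subspace.

rng = np.random.default_rng(0)

# Synthetic "positive" data lying near a 2-D subspace of R^5.
basis = rng.normal(size=(2, 5))
train = rng.normal(size=(200, 2)) @ basis + 0.05 * rng.normal(size=(200, 5))

mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:2]  # rows = top-2 principal directions

def residual(x):
    """Squared distance from x to the fitted principal subspace."""
    centered = x - mean
    projected = centered @ components.T @ components
    return float(np.sum((centered - projected) ** 2))

# Decision rule: flag anything above (say) the 95th percentile of the
# training residuals as novel, i.e. outside the positive class.
threshold = np.percentile([residual(x) for x in train], 95)

def is_novel(x):
    return residual(x) > threshold
```

The other baselines named above fit the same protocol with a different scoring rule: a one-class SVM estimates a boundary around the positive data, and an auto-associative network scores points by reconstruction error through a bottleneck instead of a linear subspace.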
Editor: Tom Fawcett.
Kassab, R., Alexandre, F. Incremental data-driven learning of a novelty detection model for one-class classification with application to high-dimensional noisy data. Mach Learn 74, 191–234 (2009). https://doi.org/10.1007/s10994-008-5092-4