Abstract
Most conventional learning algorithms require both positive and negative training data to achieve accurate classification results. However, the problem of learning classifiers from only positive data arises in many applications where negative data are too costly, difficult to obtain, or not available at all. This paper describes a new machine learning approach, called ILoNDF (Incremental data-driven Learning of Novelty Detector Filter). The approach is inspired by novelty detection theory and its learning method, which typically requires only examples from one class to learn a model. One advantage of ILoNDF is the ability of its generative learning to capture the intrinsic characteristics of the training data by continuously integrating the information relating to the relative frequencies of the features of training data and their co-occurrence dependencies. This makes ILoNDF rather stable and less sensitive to noisy features which may be present in the representation of the positive data. In addition, ILoNDF does not require extensive computational resources since it operates on-line without repeated training, and no parameters need to be tuned. In this study we mainly focus on the robustness of ILoNDF in dealing with high-dimensional noisy data, and we investigate the variation of its performance depending on the amount of data available for training. To make our study comparable to previous studies, we investigate four common methods: PCA residuals, Hotelling's T² test, an auto-associative neural network, and a one-class version of the SVM classifier (lately a favored method for one-class classification). Experiments are conducted on two real-world text corpora: Reuters and WebKB. Results show that ILoNDF tends to be more robust, is less affected by initial settings, and consistently outperforms the other methods.
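To make the positive-only training protocol concrete, the sketch below illustrates one of the baseline methods the abstract compares against, PCA residuals: a principal subspace is fitted to positive examples alone, and a new point is flagged as novel when its reconstruction error exceeds a threshold calibrated on the training residuals. This is a minimal illustration on synthetic data, not the paper's ILoNDF algorithm; all names and the 95th-percentile threshold are assumptions chosen for the example.

```python
import numpy as np

# Illustrative PCA-residuals novelty detector (hypothetical names,
# synthetic data): train on positive-class examples only, score new
# points by their squared reconstruction error from the fitted
# principal subspace.

rng = np.random.default_rng(0)

# Synthetic "positive" data lying near a 2-D subspace of R^5.
basis = rng.normal(size=(2, 5))
train = rng.normal(size=(200, 2)) @ basis + 0.05 * rng.normal(size=(200, 5))

mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:2]  # rows = top-2 principal directions

def residual(x):
    """Squared distance from x to the fitted principal subspace."""
    centered = x - mean
    projected = centered @ components.T @ components
    return float(np.sum((centered - projected) ** 2))

# Decision rule: flag anything above (say) the 95th percentile of the
# training residuals as novel, i.e. outside the positive class.
threshold = np.percentile([residual(x) for x in train], 95)

def is_novel(x):
    return residual(x) > threshold
```

The other baselines named above fit the same protocol with a different scoring rule: a one-class SVM estimates a boundary around the positive data, and an auto-associative network scores points by reconstruction error through a bottleneck instead of a linear subspace.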
Editor: Tom Fawcett.
Kassab, R., Alexandre, F. Incremental data-driven learning of a novelty detection model for one-class classification with application to high-dimensional noisy data. Mach Learn 74, 191–234 (2009). https://doi.org/10.1007/s10994-008-5092-4