A multi-one-class dynamic classifier for adaptive digitization of document streams

  • Anh Khoi Ngo Ho
  • Véronique Eglin
  • Nicolas Ragot
  • Jean-Yves Ramel
Original Paper

Abstract

In this paper, we present a new dynamic classifier design based on a set of independent one-class SVMs for categorizing image data streams. Dynamic (or continuous) learning and classification have recently been investigated for several situations, such as online learning of fixed concepts, learning in non-stationary environments (concept drift), and learning from imbalanced data. Most solutions cannot handle several of these situations at the same time. In particular, adding new concepts and merging or splitting existing ones are usually considered less important and are consequently less studied, even though they are of high interest for stream-based document image classification. To deal with this kind of data, we explore a learning and classification scheme based on one-class SVM classifiers that we call mOC-iSVM (multi-one-class incremental SVM). Although one-class classifiers suffer from a lack of discriminative power, they offer in return many useful properties that stem from their independent modeling. The experiments presented in the paper show the theoretical feasibility of the approach on different benchmarks involving the addition of new classes. They also demonstrate that the mOC-iSVM model can be used efficiently for document classification tasks (by image quality and by image content) in a stream context, handling many typical scenarios of concept extension, drift, split, and merge.
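The central design choice described above is one independent one-class model per class. It can be illustrated with a small, dependency-free Python sketch, which is not the authors' mOC-iSVM. The sketch makes several simplifications:

- Each class gets a minimal ν-one-class SVM (Schölkopf et al.'s formulation), trained by projected gradient descent on its dual.
- A sample is assigned to the class with the largest decision value f_c(x) = Σ_i α_i k(x_i, x) - ρ_c. This combination rule is an assumption.
- Merge and split simply refit from stored samples rather than updating incrementally.

The class labels, the RBF kernel, and the `nu`/`gamma` values are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch only, not the authors' mOC-iSVM. The RBF kernel, the argmax
# combination rule and the nu/gamma values are assumptions made for this example.
Point = Tuple[float, ...]


def rbf(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def project(v: List[float], upper: float) -> List[float]:
    """Euclidean projection of v onto {a : 0 <= a_i <= upper, sum(a) = 1}, by bisection on a shift tau."""
    lo, hi = min(v) - upper, max(v)
    for _ in range(100):
        tau = (lo + hi) / 2
        if sum(min(max(vi - tau, 0.0), upper) for vi in v) > 1.0:
            lo = tau  # shift too small: the clipped sum still exceeds 1
        else:
            hi = tau
    tau = (lo + hi) / 2
    return [min(max(vi - tau, 0.0), upper) for vi in v]


class OneClassSVM:
    """Minimal nu-one-class SVM: min 1/2 a^T K a  s.t.  0 <= a_i <= 1/(nu*n), sum(a) = 1."""

    def __init__(self, nu: float = 0.5, gamma: float = 0.5, iters: int = 300):
        self.nu, self.gamma, self.iters = nu, gamma, iters

    def fit(self, X: List[Point]) -> "OneClassSVM":
        n = len(X)
        self.X = list(X)
        K = [[rbf(a, b, self.gamma) for b in self.X] for a in self.X]
        upper = 1.0 / (self.nu * n)
        alpha = [1.0 / n] * n
        step = 1.0 / n  # RBF entries are <= 1, so the largest eigenvalue of K is <= n
        for _ in range(self.iters):  # projected gradient descent on the dual
            grad = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
            alpha = project([a - step * g for a, g in zip(alpha, grad)], upper)
        grad = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        eps = 1e-6
        free = [g for g, a in zip(grad, alpha) if eps < a < upper - eps]
        if free:  # rho equals (K alpha)_i on the margin support vectors
            self.rho = sum(free) / len(free)
        else:  # any rho between the two KKT bounds is valid; take the midpoint
            at_upper = [g for g, a in zip(grad, alpha) if a >= upper - eps]
            at_zero = [g for g, a in zip(grad, alpha) if a <= eps]
            self.rho = (max(at_upper, default=min(grad)) + min(at_zero, default=max(grad))) / 2
        self.alpha = alpha
        return self

    def decision(self, x: Point) -> float:
        """f(x) = sum_i alpha_i k(x_i, x) - rho; f(x) >= 0 means x lies inside the learned support."""
        return sum(a * rbf(sv, x, self.gamma) for a, sv in zip(self.alpha, self.X)) - self.rho


class MultiOneClass:
    """One independent one-class model per label; prediction = label with the largest decision value."""

    def __init__(self, **params):
        self.params = params
        self.models: Dict[str, OneClassSVM] = {}
        self.data: Dict[str, List[Point]] = {}

    def add_class(self, label: str, X: List[Point]) -> None:
        # Only the new class is trained; the existing models are left untouched.
        self.data[label] = list(X)
        self.models[label] = OneClassSVM(**self.params).fit(self.data[label])

    def merge(self, a: str, b: str, new_label: str) -> None:
        # Refit one model on the union of both classes (not incremental in this sketch).
        X = self.data.pop(a) + self.data.pop(b)
        del self.models[a], self.models[b]
        self.add_class(new_label, X)

    def split(self, label: str, parts: Dict[str, List[Point]]) -> None:
        # Replace one class by several, each trained on its own subset.
        del self.data[label], self.models[label]
        for new_label, X in parts.items():
            self.add_class(new_label, X)

    def predict(self, x: Point) -> str:
        return max(self.models, key=lambda c: self.models[c].decision(x))
```

Because each class is modeled independently, adding a new class leaves the models of the existing classes untouched. This is the property the abstract attributes to independent modeling. The counterpart is that no model ever sees the other classes' samples, which limits discriminative power. If scikit-learn is available, `sklearn.svm.OneClassSVM` and its `decision_function` method could replace the hand-written solver.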

Keywords

Stream-based document image classification; Online document content and quality classification; Incremental learning; Concept drift; One-class SVM

Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. CNRS INSA-Lyon LIRIS, UMR 5205 CNRS, Université de Lyon, Lyon, France
  2. Laboratoire Informatique (LI EA 6300), Université François Rabelais Tours, Tours, France
