A Cascade Multiple Classifier System for Document Categorization

Xu, Jian-Wu; Singh, Vartika; Govindaraju, Venu; Neogi, Depankar

doi:10.1007/978-3-642-02326-2_46

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5519)

Abstract

This paper presents a novel cascade multiple classifier system (MCS) for document image classification. The system consists of two classifiers that use different feature sets. The preceding classifier uses image features, learns the physical representation of the document, and outputs a set of candidate class labels for the second classifier. The succeeding classifier is a hierarchical classification model based on textual features. The candidate label set from the first classifier identifies the subtrees of the hierarchical tree in which the second classifier searches to derive a final classification decision. The cascade therefore reduces the computational complexity and improves the classification accuracy of the second classifier. We test the proposed cascade MCS on a large-scale tax document classification task. The experimental results show improved classification performance over the individual classifiers.
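The cascade idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class hierarchy, the labels, the score values, the top-k candidate rule, and the greedy descent are all assumptions made for illustration. The first stage is reduced to a table of hypothetical image-based scores, and the second stage to a table of hypothetical text-based scores per tree node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """A node in a hypothetical class hierarchy; leaves are class labels."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> Set[str]:
        """Return the labels of all leaves in the subtree rooted here."""
        if not self.children:
            return {self.label}
        found: Set[str] = set()
        for child in self.children:
            found |= child.leaves()
        return found


def top_k_candidates(image_scores: Dict[str, float], k: int) -> Set[str]:
    """Stage 1 (assumed rule): keep the k labels with the highest image score."""
    ranked = sorted(image_scores, key=image_scores.get, reverse=True)
    return set(ranked[:k])


def cascade_classify(root: Node, candidates: Set[str],
                     text_scores: Dict[str, float]) -> Optional[str]:
    """Stage 2 (assumed rule): greedy descent restricted to candidate subtrees."""
    node = root
    while node.children:
        # Only subtrees that contain at least one candidate label are searched.
        allowed = [c for c in node.children if c.leaves() & candidates]
        if not allowed:
            return None
        node = max(allowed, key=lambda c: text_scores.get(c.label, 0.0))
    return node.label


# Hypothetical two-level hierarchy with made-up labels.
root = Node("root", [
    Node("income", [Node("w2"), Node("1099")]),
    Node("deductions", [Node("1098"), Node("receipt")]),
])

# Made-up scores for a single document.
image_scores = {"w2": 0.5, "1099": 0.3, "1098": 0.1, "receipt": 0.1}
text_scores = {"income": 0.4, "deductions": 0.6,
               "w2": 0.9, "1099": 0.1, "1098": 0.7, "receipt": 0.3}

candidates = top_k_candidates(image_scores, k=2)
cascaded = cascade_classify(root, candidates, text_scores)
unrestricted = cascade_classify(root, root.leaves(), text_scores)
```

In this toy run the candidate set is `{"w2", "1099"}`, so the text stage never visits the `deductions` subtree and returns `w2`. With every leaf allowed, the same greedy descent follows the higher-scoring `deductions` branch and returns `1098`. The example only shows the mechanism of pruning; it says nothing about the accuracy figures reported in the paper.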

Author information

Authors and Affiliations

Copanion Inc., Andover, MA 01810, USA
Jian-Wu Xu & Depankar Neogi
Center for Unified Biometrics and Sensors, University at Buffalo, USA
Vartika Singh & Venu Govindaraju

Editor information

Editors and Affiliations

Faculty of Electrical and Computer Engineering, University of Iceland, Hjardarhagi 2-6, 107, Reykjavik, Iceland
Jón Atli Benediktsson
Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, GU2 7XH, Surrey, United Kingdom
Josef Kittler
Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, 09123, Cagliari, Italy
Fabio Roli

About this paper

Cite this paper

Xu, JW., Singh, V., Govindaraju, V., Neogi, D. (2009). A Cascade Multiple Classifier System for Document Categorization. In: Benediktsson, J.A., Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2009. Lecture Notes in Computer Science, vol 5519. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02326-2_46

DOI: https://doi.org/10.1007/978-3-642-02326-2_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02325-5
Online ISBN: 978-3-642-02326-2
eBook Packages: Computer Science, Computer Science (R0)
