Abstract
The intuition that different text classifiers behave in qualitatively different ways has long motivated attempts to build a better metaclassifier via some combination of classifiers. We introduce a probabilistic method for combining classifiers that considers the context-sensitive reliabilities of contributing classifiers. The method harnesses reliability indicators—variables that provide signals about the performance of classifiers in different situations. We provide background, present procedures for building metaclassifiers that take into consideration both reliability indicators and classifier outputs, and review a set of comparative studies undertaken to evaluate the methodology.
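The idea of conditioning a combination on both classifier outputs and reliability indicators can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the metaclassifier, so a Laplace-smoothed counting table stands in for it here. The indicator (a hypothetical "short"/"long" document-length bin), the two base classifiers, and the toy training data are all assumptions made for illustration.

```python
from collections import defaultdict


def train_metaclassifier(examples):
    """Count outcomes per (indicator bin, base-classifier votes) context.

    Each example is (indicator_bin, votes, label): votes is a tuple of 0/1
    decisions from the base classifiers, label is the true 0/1 class.
    """
    counts = defaultdict(lambda: [0, 0])  # context -> [negatives, positives]
    for indicator_bin, votes, label in examples:
        counts[(indicator_bin, votes)][label] += 1
    return dict(counts)


def predict_proba(model, indicator_bin, votes):
    """Laplace-smoothed estimate of P(label = 1 | indicator bin, votes)."""
    neg, pos = model.get((indicator_bin, votes), (0, 0))
    return (pos + 1) / (neg + pos + 2)


# Hypothetical toy data: classifier A (first vote) is right on short
# documents, classifier B (second vote) is right on long documents.
training = (
    [("short", (1, 0), 1)] * 3
    + [("short", (0, 1), 0)] * 3
    + [("long", (1, 0), 0)] * 3
    + [("long", (0, 1), 1)] * 3
)
model = train_metaclassifier(training)

# Identical votes, different context, different combined estimate.
print(predict_proba(model, "short", (1, 0)))  # 0.8
print(predict_proba(model, "long", (1, 0)))   # 0.2
```

The same pair of votes yields a high probability on short documents and a low one on long documents, which is the context sensitivity that a vote-only combiner cannot express. A context never seen in training falls back to 0.5 under the smoothing used here.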
Bennett, P.N., Dumais, S.T. & Horvitz, E. The Combination of Text Classifiers Using Reliability Indicators. Information Retrieval 8, 67–100 (2005). https://doi.org/10.1023/B:INRT.0000048491.59134.94