Abstract
Combining multiple classifiers is of particular interest in multimedia applications. Each modality in multimedia data can be analyzed individually, and combining multiple pieces of evidence usually improves classification accuracy. However, most combination strategies used in previous studies rely on ad hoc designs and ignore the varying “expertise” of specialized individual modality classifiers in recognizing a category under particular circumstances. In this paper we present a combination framework called “meta-classification”, which models the problem of combining classifiers as a classification problem itself. We apply the technique to a wearable “experience collection” system, which unobtrusively records the wearer’s conversation, recognizes the face of the dialogue partner, and remembers his/her voice. When the system sees the same person’s face or hears the same voice, it can then use a summary of the last conversation to remind the wearer. To identify a person correctly from a mixture of audio and video streams, classification judgments from multiple modalities must be effectively combined. Experimental results show that combining different face recognizers and speaker identification aspects using the meta-classification strategy can dramatically improve classification accuracy, and is more effective than a fixed probability-based strategy. Other work on labeling weather news broadcasts showed that meta-classification is a general framework that can be applied, without much modification, to any application that needs to combine multiple classifiers.
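The idea of treating combination as a classification problem can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in the paper: the scores are made-up toy data, the two "modalities" (a face score and a voice score) are hypothetical stand-ins for real recognizers, and the meta-learner is a plain logistic regression chosen for brevity, since the abstract does not say which learner is used. The fixed rule shown for comparison is a simple score average, which is also an assumption.

```python
import math

# Hypothetical per-modality confidence scores in [0, 1] for a candidate identity:
# (face_score, voice_score). Label 1 means the candidate is the right person.
# In this toy data the face score is reliable and the voice score is not.
TRAIN_X = [(0.9, 0.2), (0.8, 0.1), (0.85, 0.9), (0.7, 0.2),
           (0.2, 0.9), (0.3, 0.8), (0.1, 0.2), (0.25, 0.95)]
TRAIN_Y = [1, 1, 1, 1, 0, 0, 0, 0]


def fixed_rule(scores):
    """Fixed combination: average the scores and threshold at 0.5."""
    return 1 if sum(scores) / len(scores) > 0.5 else 0


def train_meta_classifier(X, y, lr=0.2, steps=10000):
    """Fit logistic regression on the base classifiers' scores
    using full-batch gradient descent (assumed meta-learner)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def meta_predict(model, scores):
    """Classify a vector of base-classifier scores with the learned weights."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, scores)) + b > 0 else 0


def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


model = train_meta_classifier(TRAIN_X, TRAIN_Y)
fixed_acc = accuracy(fixed_rule, TRAIN_X, TRAIN_Y)
meta_acc = accuracy(lambda s: meta_predict(model, s), TRAIN_X, TRAIN_Y)
print(f"fixed rule: {fixed_acc:.3f}  meta-classifier: {meta_acc:.3f}")
```

The design point is that the meta-classifier learns how much to trust each modality from labeled examples, whereas the averaging rule weights both equally regardless of how reliable they are. Accuracy here is measured on the training examples only, so it illustrates the mechanism rather than any real performance.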
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, WH., Hauptmann, A. (2003). Meta-classification: Combining Multimodal Classifiers. In: Zaïane, O.R., Simoff, S.J., Djeraba, C. (eds) Mining Multimedia and Complex Data. PAKDD 2002. Lecture Notes in Computer Science, vol 2797. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39666-6_14
DOI: https://doi.org/10.1007/978-3-540-39666-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20305-6
Online ISBN: 978-3-540-39666-6
eBook Packages: Springer Book Archive