Meta-classification: Combining Multimodal Classifiers

  • Wei-Hao Lin
  • Alexander Hauptmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2797)


Combining multiple classifiers is of particular interest in multimedia applications. Each modality in multimedia data can be analyzed individually, and combining multiple pieces of evidence usually improves classification accuracy. However, most combination strategies in previous studies rely on ad hoc designs and ignore the varying “expertise” of specialized individual modality classifiers in recognizing a category under particular circumstances. In this paper we present a combination framework called “meta-classification”, which models the problem of combining classifiers as a classification problem itself. We apply the technique to a wearable “experience collection” system, which unobtrusively records the wearer’s conversation, recognizes the face of the dialogue partner, and remembers his/her voice. When the system sees the same person’s face or hears the same voice, it can use a summary of the last conversation to remind the wearer. To identify a person correctly from a mixture of audio and video streams, classification judgments from multiple modalities must be combined effectively. Experimental results show that combining different face recognizers and speaker identification aspects using the meta-classification strategy can dramatically improve classification accuracy, and is more effective than a fixed probability-based strategy. Other work on labeling weather news broadcasts showed that meta-classification is a general framework that can be applied, without much modification, to any application that needs to combine multiple classifiers.
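
As a rough illustration of treating combination as a classification problem, the sketch below feeds the scores of two modality classifiers into a second-level classifier as a feature vector. It is not the authors' implementation: the logistic-regression meta-classifier, the synthetic "face" and "voice" scores, and all function names are assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp stays within a safe range.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_classifier(score_vectors, labels, epochs=100, lr=0.5):
    """Fit a logistic-regression meta-classifier (an assumed choice) by
    stochastic gradient descent on vectors of base-classifier scores."""
    weights = [0.0] * len(score_vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(score_vectors, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def meta_predict(model, x):
    """Return 1 if the combined evidence says 'same person', else 0."""
    weights, bias = model
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


def make_examples(rng, n):
    """Synthetic scores (hypothetical data): the face score is a reliable
    cue, the voice score is only weakly informative and noisy."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        face = 0.1 + 0.8 * y + rng.gauss(0.0, 0.1)
        voice = 0.4 + 0.2 * y + rng.gauss(0.0, 0.3)
        xs.append([face, voice])
        ys.append(y)
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_examples(rng, 200)
test_x, test_y = make_examples(rng, 200)

model = train_meta_classifier(train_x, train_y)
correct = sum(meta_predict(model, x) == y for x, y in zip(test_x, test_y))
accuracy = correct / len(test_y)

print("learned weights (face, voice):", model[0])
print("held-out accuracy:", accuracy)
```

Because the weights are learned from data instead of being fixed in advance, the meta-classifier can lean more heavily on whichever modality proves more reliable, which is the intuition behind modeling the varying “expertise” of the individual classifiers.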


Feature Vector, Face Recognition, Gaussian Mixture Model, Combination Strategy, Multiple Classifier
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Wei-Hao Lin (1)
  • Alexander Hauptmann (1)
  1. Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
