Abstract
Typical cue integration techniques combine estimates produced by computations associated with each visual cue. Most of these computations are iterative: each iteration yields a partial result, and a complete result is available only when the algorithm terminates. Combining partial results at each iteration would be the preferred strategy for cue integration, as early integration is inherently more stable and more efficient. Surprisingly, existing cue integration techniques cannot correctly use partial results and must instead wait for all of the cue computations to finish. This is because the intrinsic error in a partial result, which arises entirely from the fact that the algorithm has not yet terminated, is not represented. Cue integration methods that do attempt to use partial results (such as one based on an iterated extended Kalman filter) make critical errors.
I address this limitation by developing a probabilistic model of the error in estimates derived from partial results, representing the error that remains in an iterative algorithm before it completes. This model enables existing cue integration frameworks to draw upon partial results correctly. Results are presented for tracking faces using feature alignment, contours, and optical flow; they indicate that this framework improves accuracy, efficiency, and robustness over one that uses only complete results.
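To make the idea concrete, the following is a minimal sketch (not the paper's actual model) of how a partial-result error model can plug into a standard inverse-variance fusion scheme. The fusion step is ordinary: each cue contributes an estimate weighted by its inverse covariance. The function `partial_result_covariance` is a hypothetical stand-in for the paper's probabilistic model: it inflates a cue's converged covariance to account for the error remaining before the iterative algorithm terminates, here assumed (for illustration only) to decay geometrically with iteration count.

```python
import numpy as np

def fuse_estimates(estimates):
    """Inverse-variance fusion of independent cue estimates.

    estimates: list of (x, C) pairs, where x is an estimate vector
    and C its covariance matrix. Returns the fused (x, C).
    """
    precisions = [np.linalg.inv(C) for _, C in estimates]
    P = sum(precisions)                 # combined precision
    C_fused = np.linalg.inv(P)
    x_fused = C_fused @ sum(Pi @ x for (x, _), Pi in zip(estimates, precisions))
    return x_fused, C_fused

def partial_result_covariance(C_converged, iteration, rate=0.5):
    """Hypothetical model of not-yet-terminated error.

    Assumes the remaining error shrinks geometrically with each
    iteration (factor `rate`), so the effective covariance at an
    early iteration is the converged covariance inflated accordingly.
    """
    inflation = 1.0 / (1.0 - rate ** iteration)
    return C_converged * inflation
```

With such a model, a partially converged cue simply enters the fusion with a larger covariance and is down-weighted accordingly, rather than being excluded until its computation finishes.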
The eventual goal of this line of research is a decision-theoretic meta-reasoning framework for cue integration, a vital mechanism for any system with real-time deadlines and variable computational demands. Such a framework would provide a means to decide how best to spend computational resources on each cue, based on how much doing so reduces the uncertainty of the combined result.
© 2002 Springer-Verlag Berlin Heidelberg
DeCarlo, D. (2002). Towards Real-Time Cue Integration by Using Partial Results. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds) Computer Vision — ECCV 2002. ECCV 2002. Lecture Notes in Computer Science, vol 2353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47979-1_22