## Abstract

High-resolution images can be used to resolve matching ambiguities between trajectory fragments (tracklets), a key challenge in multiple-target tracking. A pan–tilt–zoom (PTZ) camera is a powerful and efficient tool that offers, on demand, both close-up views and wide-area coverage. Wide-area coverage enables many targets to be tracked, while close-up views allow individuals to be identified from high-resolution images of their faces. A central component of a PTZ tracking system is a scheduling algorithm that determines which target to zoom in on, particularly when the high-resolution images are also used for tracklet matching. In this paper, we study this scheduling problem from a theoretical perspective. We propose a novel data structure, the Multi-strand Tracking Graph (MSG), which represents the set of tracklets computed by a tracker and the possible associations between them. The MSG supports both efficient scheduling and the resolution of matching ambiguities between tracklets. Its main feature is the auxiliary data stored in each vertex, which allows efficient computation without time-consuming graph traversal. Synthetic data simulations are used to evaluate our scheduling algorithm and show that it outperforms a naïve scheduler.

## Notes

Supplementary material for this paper is available at: https://www.dropbox.com/s/fzxsq8ifklct53c/The_Multi_Strand_Graph_for_a_PTZ_Tracker_supplementary_material.avi?dl=1.

## Acknowledgements

This research was supported by the Israeli Ministry of Science, Grant No. 3-8700 and by Award No. 2011-IJ-CX-K054, awarded by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice.

## Appendix A: Labeling Extensions

In this appendix we describe a refined computation of \(r_1(v)\) (Sect. 4.2.1). The labeled tracklet of each labeled origin of *v*, \(u\in O^\star (v)\), clearly contains the tracklet represented by *u* itself. In addition, the labeling of a vertex can sometimes be extended to its descendants. Let *u* be a labeled vertex with a single compound child, \(u_\mathrm{c}\) (for example, the labeled \(v_1\) and its compound child \(v_4\) in Fig. 4a). The tracklet of \(u_\mathrm{c}\) follows the tracklet of *u*, and both represent the same target. Hence, the labeled tracklet of *u* can be extended to also include the tracklet of \(u_\mathrm{c}\), and its length becomes \(\tau (u)+\tau (u_\mathrm{c})\). The same argument applies to further descendants: we define the *forward labeling extension* of *u* to be the path \(\ell (u_\mathrm{c},u')\) in which each vertex is the single compound child of its parent, so that the labeled tracklet of *u* has length \(\tau (u)+\tau (\ell (u_\mathrm{c},u'))\). The computation of \(r_1(v)\) can be refined by considering the forward labeling extensions of the labeled origins of *v*, as described next.
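The forward labeling extension can be sketched as a simple walk over compound children. This is a minimal illustration under stated assumptions, not the authors' implementation: the MSG is modeled as a hypothetical dictionary `compound_children` mapping each vertex to the list of its compound children, `tau` is a hypothetical dictionary of tracklet lengths, the length of a path is taken as the sum of its vertices' lengths (consistent with \(\tau (u)+\tau (u_\mathrm{c})\) above), and the graph is assumed acyclic so that the walk terminates. The vertex names are invented and do not correspond to Fig. 4a.

```python
from collections.abc import Hashable

Vertex = Hashable  # hypothetical: any hashable vertex identifier


def forward_labeling_extension(
    u: Vertex, compound_children: dict[Vertex, list[Vertex]]
) -> list[Vertex]:
    """Return the path l(u_c, u') as a vertex list (empty if u has no single compound child).

    Starting at u, repeatedly step to the current vertex's compound child as long
    as it has exactly one. Assumes the graph is acyclic, so the loop terminates.
    """
    path: list[Vertex] = []
    current = u
    while len(compound_children.get(current, [])) == 1:
        current = compound_children[current][0]
        path.append(current)
    return path


def labeled_tracklet_length(
    u: Vertex,
    compound_children: dict[Vertex, list[Vertex]],
    tau: dict[Vertex, float],
) -> float:
    """tau(u) + tau(l(u_c, u')): length of u's labeled tracklet after forward extension."""
    extension = forward_labeling_extension(u, compound_children)
    return tau[u] + sum(tau[w] for w in extension)


# Hypothetical example: "c" has two compound children, so the extension of "a" stops at "c".
cc = {"a": ["b"], "b": ["c"], "c": ["d", "e"]}
tau = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 5.0, "e": 1.0}
print(forward_labeling_extension("a", cc))  # ['b', 'c']
print(labeled_tracklet_length("a", cc, tau))  # 9.0
```

Stopping as soon as a vertex has zero or several compound children mirrors the definition: once the chain branches, the descendant can no longer be assumed to represent the same target as *u*.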

Let us consider again the path \(\ell (u,v)\), where *u* is a labeled origin of *v*. We wish to find the contribution of *u* to \(r_1(v)\) when considering not only *u* itself but also its forward labeling extension. This contribution excludes the entire extension of *u*, which is labeled prior to the labeling of *v*. The length of the forward labeling extension of *u*, \(\tau (\ell (u_\mathrm{c},u'))\), is therefore subtracted from \(\tau (\ell (u,v))\). That is, the contribution of the possible matching of *u* to *v* is given by \(\tau (\ell (u,v))-\tau (\ell (u_\mathrm{c},u'))\). We next describe the auxiliary data needed to compute the refined \(r_1(v)\) efficiently.
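The refined contribution \(\tau (\ell (u,v))-\tau (\ell (u_\mathrm{c},u'))\) reduces to simple arithmetic once both paths are known. The sketch below is hypothetical: it takes \(\ell (u,v)\) and the forward labeling extension of *u* as explicit vertex lists (whether \(\ell (u,v)\) includes *u* itself follows the paper's definition, which is not restated here), sums a hypothetical `tau` dictionary along each, and, as an added safeguard that the paper does not state, rejects an extension that does not lie on \(\ell (u,v)\), since the subtraction presumes the extension is part of that path.

```python
from collections.abc import Hashable, Sequence

Vertex = Hashable  # hypothetical: any hashable vertex identifier


def path_length(path: Sequence[Vertex], tau: dict[Vertex, float]) -> float:
    """tau of a path, taken here as the sum of the tracklet lengths of its vertices."""
    return sum(tau[w] for w in path)


def refined_contribution(
    path_u_to_v: Sequence[Vertex],
    extension_of_u: Sequence[Vertex],
    tau: dict[Vertex, float],
) -> float:
    """tau(l(u, v)) - tau(l(u_c, u')): contribution of labeled origin u to the refined r_1(v)."""
    # Assumption: the forward extension (already labeled before v) lies on l(u, v).
    if not set(extension_of_u) <= set(path_u_to_v):
        raise ValueError("forward extension is expected to lie on the path l(u, v)")
    return path_length(path_u_to_v, tau) - path_length(extension_of_u, tau)


# Hypothetical example: path a-b-c-d, where b and c form the forward extension of a.
tau = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 5.0}
print(refined_contribution(["a", "b", "c", "d"], ["b", "c"], tau))  # 9.0
```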

For each vertex *v*, we define \(n_\mathrm{ret}(v)\) to be the number of forward labeling extensions of the labeled origins of *v* in which *v* is included. This value can be recursively computed based only on the vertex itself and its direct parents, as follows:

where \(\psi (v)\) is a binary value indicating whether *v* has exactly one compound child. Note that \(n_\mathrm{ret}(v)\le n_o^\star (v)\), and that \(n_\mathrm{ret}\) equals 1 for a labeled origin and 0 for an unlabeled origin.

The refined recursive computation of \(r_1(v)\), which replaces Eq. 14, is given by:

For example, in Fig. 4a we wish to compute \(r_1(v_9)\), which sums the contributions of \(v_1\) and \(v_3\), the two labeled origins of the unlabeled vertex \(v_9\). Owing to their forward labeling extensions, \(v_1\) contributes \(\tau (\ell (v_5,v_9))\) and \(v_3\) contributes \(\tau (v_9)\).

The labeling of a vertex can also be extended backward, in a manner similar to the forward labeling extension (as explained in Sect. 4.2.1). While the refined \(r_1(v)\) computation uses only the forward labeling extension, the \(r_2(v)\) computation in Eq. 20 uses only the backward labeling extension. For example, in Fig. 5b only \(v_3\) is labeled. The unlabeled vertex \(v_9\) has two unlabeled origins, \(v_1\) and \(v_2\), and a potential match by elimination, \(v_5\). We wish to compute \(r_2(v_9)\), which sums the expected increase of \(L^\star \) (Eq. 10) over \(v_1\) and \(v_2\) in the event that \(v_9\) is labeled and matched to \(v_5\). Since \(v_5\) has a single compound parent, \(v_4\), the tracklet of \(v_5\) can be extended backward to also include the tracklet of \(v_4\). Indeed, the recursive computation (Eq. 20) yields \(r_2(v_9)=2\tau (\ell (v_4,v_9))\).
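A backward labeling extension can be sketched by mirroring the forward walk: starting from a vertex, step to its compound parent as long as it has exactly one. This is a hypothetical illustration, not the authors' code. It assumes that the backward extension continues past the first parent in the same way the forward one continues past the first child, and `compound_parents` and the vertex names are invented for the example and do not correspond to Fig. 5b.

```python
from collections.abc import Hashable

Vertex = Hashable  # hypothetical: any hashable vertex identifier


def backward_labeling_extension(
    v: Vertex, compound_parents: dict[Vertex, list[Vertex]]
) -> list[Vertex]:
    """Walk upward from v while the current vertex has exactly one compound parent.

    Returns the vertices added by the extension, nearest parent first.
    Assumes the graph is acyclic, so the loop terminates.
    """
    path: list[Vertex] = []
    current = v
    while len(compound_parents.get(current, [])) == 1:
        current = compound_parents[current][0]
        path.append(current)
    return path


# Hypothetical example: "c" has two compound parents, so the walk from "e" stops at "c".
cp = {"e": ["d"], "d": ["c"], "c": ["a", "b"]}
print(backward_labeling_extension("e", cp))  # ['d', 'c']
```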

Both forward and backward labeling extensions are used in the experiments that evaluate our scheduler.

## About this article

### Cite this article

Melman, S., Moses, Y., Medioni, G. *et al.* The Multi-strand Graph for a PTZ Tracker.
*J Math Imaging Vis* **60**, 594–608 (2018). https://doi.org/10.1007/s10851-017-0774-9