Abstract
Content-based video search engines often use the output of concept detectors to answer queries. Improving detectors requires computational power and human labor. It is therefore important to predict detector performance economically and to improve detectors adaptively. Detector performance prediction, however, has not received much research attention so far. In this paper, we propose a prediction approach that uses human annotators. The annotators estimate the number of images in a grid in which a concept is present, a task that can be performed efficiently. Using these estimates, we define a model for the posterior probability that a concept is present given its confidence score. We then use the model to predict the average precision of a detector. We evaluate our approach on a TRECVid collection of Internet archive videos, comparing it to an approach that labels individual images. Our approach requires fewer resources while achieving good prediction quality.
Notes
1. Performance evaluation and performance prediction can be performed similarly but differ in their aims: performance evaluation aims at comparing detectors, while performance prediction aims at deriving actions (e.g., a change of detector technique).
2. A detector confidence score indicates the belief of a detector that an image contains a concept.
3. We measured the pure annotation time, excluding the time to load the images.
Acknowledgments
This work was co-funded by the EU FP7 Project AXES ICT-269980 and CUbRIK ICT-287704.
A Appendix - Optimization of Maximum Likelihood
In this section, we present a procedure for finding the maximum likelihood weights for the logistic regression model defined in Sect. 3.2. The maximization problem was formulated as follows:
\[ {\mathbf {w}}^{*} = \mathop {\arg \max }_{{\mathbf {w}}} \; L({\mathbf {w}} \mid h) \]
where \(h\) is the estimate given by the annotator. We assume that \(x({\mathbf {w}})\), the expected number of positive examples, is Gaussian distributed around a mean \(\mu _h\) with variance \(v_h\), where both depend on the annotator's estimate \(h\) and possibly on the annotator. In this paper, we use a simple method to map \(h\) to \(\mu _h\): we choose \(\mu _h=h\) for \(1\le h < N\), and \(\mu _0=1\) and \(\mu _N=N-1\), modeling the case where the annotator overlooks at least one example when annotating extreme values. We ignore \(v_h\) because it does not play a role in the optimization. Therefore, for the likelihood of a weight vector \({\mathbf {w}}\) given an annotation \(h\), we have:
\[ L({\mathbf {w}} \mid h) = \mathcal {N}\!\left( x({\mathbf {w}}); \mu _h, v_h \right) \]
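The mapping from an annotator estimate \(h\) to the mean \(\mu _h\) can be sketched in a few lines (an illustrative sketch; the function name and the use of Python are ours):

```python
def mu(h: int, n: int) -> int:
    """Map an annotator's estimate h (0..n positives in a grid of n
    images) to the Gaussian mean mu_h.

    Extreme estimates are pulled one step inward, modeling the case
    where the annotator overlooks at least one example.
    """
    return min(max(h, 1), n - 1)
```

For all interior values the estimate is used unchanged; only \(h=0\) and \(h=N\) are adjusted.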
where \(\mathcal {N}\!\) is the Gaussian density function. Taking the log of \(\mathcal {N}\!\) yields:
\[ \log \mathcal {N}\!\left( x({\mathbf {w}}); \mu _h, v_h \right) = -\frac{\left( x({\mathbf {w}}) - \mu _h \right)^2}{2 v_h} - \frac{1}{2} \log \left( 2\pi v_h \right) \]
By expanding \((\cdot )^2\), leaving out constant terms and factors, and multiplying by \({-}1\) to convert the maximization to a minimization problem, we get:
\[ y({\mathbf {w}}) = x({\mathbf {w}})^2 - 2\mu _h \, x({\mathbf {w}}) + \mu _h^2 \]
By expanding the definition of the expected number of positive examples in (2), leaving out the constant \(\mu _h^2\), and combining factors, we get:
\[ y({\mathbf {w}}) = x({\mathbf {w}})^2 - 2\mu _h \sum _{i=1}^{N} \sigma _i({\mathbf {w}}) \]
And by expanding the square of the expectation \(x({\mathbf {w}})^2\):
\[ y({\mathbf {w}}) = \sum _{i=1}^{N} \sum _{j=1}^{N} \sigma _i({\mathbf {w}}) \, \sigma _j({\mathbf {w}}) \;-\; 2\mu _h \sum _{i=1}^{N} \sigma _i({\mathbf {w}}) \qquad (5) \]
To optimize this function, we use the gradient descent method with the update rule:
\[ {\mathbf {w}}^{t+1} = {\mathbf {w}}^{t} - \lambda \, \triangledown y({\mathbf {w}}^{t}) \]
where \(t\) refers to the \(t\)th iteration, \(\lambda \) is the “update speed” of the method (in this paper we chose \(\lambda = 0.03\)), and \(\triangledown y({\mathbf {w}}^{t})\) is the gradient of \(y\) with respect to \({\mathbf {w}}\). The gradient \(\triangledown y\) is the vector of partial derivatives:
\[ \triangledown y({\mathbf {w}}) = \left( \frac{\partial y}{\partial w_1}({\mathbf {w}}), \; \frac{\partial y}{\partial w_2}({\mathbf {w}}) \right) \qquad (7) \]
To calculate the two partial derivatives of \(\triangledown y\), we start by calculating the gradient \(\triangledown \sigma \), which is used in the second expression of (5). As an intermediate step, we give the derivative of a general sigmoid function \(\sigma (s)\):
\[ \frac{d\sigma }{ds}(s) = \sigma (s) \left( 1 - \sigma (s) \right) \]
Given this relationship and the definition \(\sigma _i({\mathbf {w}}) = \sigma (w_1 + w_2 \, s_i)\), where \(s_i\) is the confidence score of image \(i\), we get the partial derivatives for \(\triangledown \sigma \):
\[ \frac{\partial \sigma _i}{\partial w_1}({\mathbf {w}}) = \sigma _i({\mathbf {w}}) \left( 1 - \sigma _i({\mathbf {w}}) \right), \qquad \frac{\partial \sigma _i}{\partial w_2}({\mathbf {w}}) = s_i \, \sigma _i({\mathbf {w}}) \left( 1 - \sigma _i({\mathbf {w}}) \right) \qquad (9) \]
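The derivative relationship for the sigmoid, \(\sigma '(s) = \sigma (s)(1-\sigma (s))\), is easy to verify numerically against a central finite difference (an illustrative sketch; all names are ours):

```python
import math

def sigmoid(s: float) -> float:
    """The standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-s))

# Compare the analytic derivative with a central finite difference.
s, eps = 0.7, 1e-6
analytic = sigmoid(s) * (1.0 - sigmoid(s))
numeric = (sigmoid(s + eps) - sigmoid(s - eps)) / (2.0 * eps)
assert abs(analytic - numeric) < 1e-8
```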
Furthermore, for the derivative of the product of two sigmoid functions \(u_{ij}({\mathbf {w}}) = \sigma _i({\mathbf {w}}) \sigma _j({\mathbf {w}})\) in (5), we use the product rule and the results from (9). For \(w_1\) we have:
\[ \frac{\partial u_{ij}}{\partial w_1}({\mathbf {w}}) = \sigma _i({\mathbf {w}}) \left( 1 - \sigma _i({\mathbf {w}}) \right) \sigma _j({\mathbf {w}}) + \sigma _i({\mathbf {w}}) \, \sigma _j({\mathbf {w}}) \left( 1 - \sigma _j({\mathbf {w}}) \right) \]
and for \(w_2\):
\[ \frac{\partial u_{ij}}{\partial w_2}({\mathbf {w}}) = s_i \, \sigma _i({\mathbf {w}}) \left( 1 - \sigma _i({\mathbf {w}}) \right) \sigma _j({\mathbf {w}}) + s_j \, \sigma _i({\mathbf {w}}) \, \sigma _j({\mathbf {w}}) \left( 1 - \sigma _j({\mathbf {w}}) \right) \]
Therefore, the partial derivatives for the gradient \(\triangledown y\) in (7) are:
\[ \frac{\partial y}{\partial w_1}({\mathbf {w}}) = \sum _{i=1}^{N} \sum _{j=1}^{N} \frac{\partial u_{ij}}{\partial w_1}({\mathbf {w}}) \;-\; 2\mu _h \sum _{i=1}^{N} \sigma _i({\mathbf {w}}) \left( 1 - \sigma _i({\mathbf {w}}) \right) \]
and
\[ \frac{\partial y}{\partial w_2}({\mathbf {w}}) = \sum _{i=1}^{N} \sum _{j=1}^{N} \frac{\partial u_{ij}}{\partial w_2}({\mathbf {w}}) \;-\; 2\mu _h \sum _{i=1}^{N} s_i \, \sigma _i({\mathbf {w}}) \left( 1 - \sigma _i({\mathbf {w}}) \right) \]
Note that, although quadratic in the number of images, the gradient can be calculated efficiently by memoizing the values of \(\sigma _i({\mathbf {w}}^t)\) for \(1\le i \le N\).
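The gradient descent procedure above can be sketched as follows (an illustrative sketch, assuming the logistic model \(\sigma _i({\mathbf {w}}) = \sigma (w_1 + w_2 s_i)\) over confidence scores \(s_i\); all identifiers, the iteration count, and the chain-rule form of the gradient are ours):

```python
import math

def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))

def fit_weights(scores, mu_h, lam=0.03, iters=2000):
    """Minimize y(w) = x(w)^2 - 2*mu_h*x(w) by gradient descent,
    where x(w) = sum_i sigmoid(w1 + w2 * s_i) is the expected
    number of positive examples among the N annotated images."""
    w1, w2 = 0.0, 0.0
    for _ in range(iters):
        # Memoize the per-image sigmoids; every gradient term reuses them.
        sig = [sigmoid(w1 + w2 * s) for s in scores]
        x = sum(sig)                                   # expected #positives
        d1 = sum(p * (1.0 - p) for p in sig)           # dx / dw1
        d2 = sum(s * p * (1.0 - p)                     # dx / dw2
                 for s, p in zip(scores, sig))
        # Chain rule on y(w): grad y = 2 * (x - mu_h) * grad x.
        g = 2.0 * (x - mu_h)
        w1 -= lam * g * d1
        w2 -= lam * g * d2
    return w1, w2
```

At convergence the expected number of positives \(x({\mathbf {w}})\) matches \(\mu _h\); the memoized list `sig` is what makes each iteration cheap.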
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Aly, R., Larson, M. (2014). Detector Performance Prediction Using Set Annotations. In: Nürnberger, A., Stober, S., Larsen, B., Detyniecki, M. (eds) Adaptive Multimedia Retrieval: Semantics, Context, and Adaptation. AMR 2012. Lecture Notes in Computer Science(), vol 8382. Springer, Cham. https://doi.org/10.1007/978-3-319-12093-5_16
Print ISBN: 978-3-319-12092-8
Online ISBN: 978-3-319-12093-5