Evaluating Performance in Continuous Context Recognition Using Event-Driven Error Characterisation
Evaluating the performance of a continuous activity recognition system can be a challenging problem. To date there is no widely accepted standard for dealing with this, and in general methods and measures are adapted from related fields such as speech and vision. Much of the problem stems from the often imprecise and ambiguous nature of the real-world events that an activity recognition system has to deal with. A recognised event might have variable duration, or be shifted in time from the corresponding real-world event. Equally, it might be broken up into smaller pieces, or joined with other events to form a larger one. Most evaluation attempts tend to smooth over these issues, using “fuzzy” boundaries or some other parameter-based error decision, so as to make possible the use of standard performance measures (such as insertions and deletions). However, we argue that reducing the various facets of an activity recognition system into limited error categories, which were originally intended for different problem domains, can be overly restrictive. In this paper we attempt to identify and characterise the errors typical of continuous activity recognition, and develop a method for quantifying them in an unambiguous manner.
By way of an initial investigation, we apply the method to an example taken from previous work, and discuss the advantages that this provides over two of the most commonly used methods.
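To make the error types named in the abstract concrete, the following is a minimal illustrative sketch, not the method developed in the paper. It assumes events are half-open frame intervals `(start, end)` for a single activity class, and it uses a simple overlap test to label ground-truth events as deleted or fragmented and recognised events as inserted or merged. The names `overlaps` and `characterise` are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

# Assumption: an event is a half-open interval [start, end) in frames.
Event = Tuple[int, int]


def overlaps(a: Event, b: Event) -> bool:
    """Return True if the two intervals share at least one frame."""
    return a[0] < b[1] and b[0] < a[1]


def characterise(truth: List[Event], pred: List[Event]) -> Dict[str, int]:
    """Count event-level error types using overlap only (illustrative)."""
    counts = {"deletion": 0, "fragmented": 0, "insertion": 0, "merged": 0}

    # Look at each ground-truth event and the recognised events that touch it.
    for t in truth:
        hits = [p for p in pred if overlaps(t, p)]
        if not hits:
            counts["deletion"] += 1      # no recognised event covers it
        elif len(hits) > 1:
            counts["fragmented"] += 1    # split into several recognised events

    # Look at each recognised event and the ground-truth events it touches.
    for p in pred:
        hits = [t for t in truth if overlaps(t, p)]
        if not hits:
            counts["insertion"] += 1     # recognised event with no real event
        elif len(hits) > 1:
            counts["merged"] += 1        # one recognised event spans several

    return counts


truth = [(0, 10), (20, 30), (40, 50), (60, 70)]
pred = [(2, 5), (6, 9), (18, 52), (80, 85)]
print(characterise(truth, pred))
```

In this toy example the first real event is fragmented into two recognised pieces, the second and third are merged into one recognised event, the fourth is deleted, and the last recognised event is an insertion. A plain insertion/deletion count would hide the fragmentation and merge cases, which is the kind of information loss the abstract argues against.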
Keywords: Ground Truth, Activity Recognition, Automatic Speech Recognition, Optical Character Recognition, Event Error