A scalable algorithm for extraction and clustering of event-related pictures

Multimedia Tools and Applications

Abstract

Event detection, a problem closely related to clustering, has received considerable attention for textual documents. In contrast, although image clustering has been treated extensively in both Content-Based Image Retrieval (CBIR) and Text-Based Image Retrieval (TBIR) systems, event detection within image management is a relatively new area. With this in mind, we propose a novel approach for event extraction and clustering of images that takes into account textual annotations, time and geographical position. Our goal is to develop a clustering method based on the fact that an image may belong to an event cluster. We stress the need for an event clustering and cluster extraction algorithm that is both scalable and suitable for online applications. To achieve this, we extend a well-known clustering algorithm called Suffix Tree Clustering (STC), originally developed to cluster text documents using document snippets. The idea is to treat an image together with its annotation as a document. We further extend STC to include time and geographical position, so that contextual information from each image is captured during the clustering process. This has proven particularly useful for images gathered from online photo-sharing applications such as Flickr. Our STC-based approach is thus aimed at the challenges of capturing contextual information from Flickr images and extracting related events. We evaluate our algorithm on several annotated datasets, gathered mainly from Flickr. As part of this evaluation, we investigate and compare the effects of different parameters, such as time and space granularities. In addition, we evaluate the performance of our algorithm with respect to mining events from image collections. Our experimental results demonstrate the effectiveness of our STC-based algorithm in extracting and clustering events.
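
As a rough illustration of the idea summarised above, the following Python sketch treats each image's tag list as a short document and groups images that share a tag phrase within the same time and space bucket. It is a minimal sketch under stated assumptions, not the algorithm evaluated in the article: a plain dictionary stands in for the suffix tree, and the function names (`context_key`, `phrases`, `event_base_clusters`), the bucket sizes (`hours`, `cell_deg`), the phrase length limit (`max_len`) and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from math import floor

EPOCH = datetime(1970, 1, 1)


def context_key(taken, lat, lon, hours=24, cell_deg=0.1):
    """Coarse (time, latitude, longitude) bucket; the granularities are illustrative."""
    t_bucket = int((taken - EPOCH).total_seconds() // (hours * 3600))
    return (t_bucket, floor(lat / cell_deg), floor(lon / cell_deg))


def phrases(tokens, max_len=3):
    """Yield every contiguous tag sequence of up to max_len tokens."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def event_base_clusters(images, hours=24, cell_deg=0.1, min_images=2):
    """Group image ids that share a tag phrase within the same time/space bucket.

    Assumption: a dictionary keyed by (bucket, phrase) replaces the suffix tree,
    so this shows only the grouping criterion, not a linear-time STC.
    """
    groups = defaultdict(set)
    for image_id, tags, taken, lat, lon in images:
        ctx = context_key(taken, lat, lon, hours, cell_deg)
        for phrase in phrases(tags):
            groups[(ctx, phrase)].add(image_id)
    return {key: ids for key, ids in groups.items() if len(ids) >= min_images}


# Made-up sample: two photos from the same evening and one taken a year later.
images = [
    ("a", ["sonar", "festival", "barcelona"], datetime(2009, 6, 19, 22, 0), 41.38, 2.17),
    ("b", ["sonar", "festival"], datetime(2009, 6, 19, 23, 30), 41.39, 2.18),
    ("c", ["sonar", "festival"], datetime(2010, 6, 18, 21, 0), 41.38, 2.17),
]

clusters = event_base_clusters(images)
for (ctx, phrase), ids in clusters.items():
    print(" ".join(phrase), sorted(ids))
```

With one-day time buckets, photos `a` and `b` fall into the same groups while `c` is left out, even though it carries the same tags. Widening `hours` or `cell_deg` merges more images into a group, which is the kind of granularity effect the evaluation is said to examine.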

Notes

  1. See http://www.flickr.com

  2. See http://www.panoramio.com

  3. See http://picasaweb.google.com

  4. See http://blog.flickr.net/en/2010/09/19/5000000000/.

  5. See http://blog.flickr.net/en/2009/02/05/100000000-geotagged-photos-plus/.

  6. See http://blog.flickr.net/en/2005/08/01/the-new-new-things/.

  7. This paper won the “Where Am I?” ICCV 2005 contest. See also http://research.microsoft.com/en-us/um/people/szeliski/visioncontest05/default.htm.

  8. In our method, the maximum value of k is 3.

  9. EXIF stands for Exchangeable image file format. See also http://en.wikipedia.org/wiki/Exchangeable_image_file_format.

  10. http://www.flickr.com/services/api

  11. The number of people who participated in our evaluation varied between 4 and 8.

References

  1. Allan J, Carbonell J, Doddington G, Yamron J, Yang Y, Umass JA, Cmu BA, Cmu DB, Cmu AB, Cmu RB, Dragon IC, Darpa GD, Cmu AH, Cmu JL, Umass VL, Cmu XL, Dragon SL, Dragon VM, Umass RP, Cmu TP, Umass JP, Umass MS (1998) Topic detection and tracking pilot study final report. In: Proceedings of the DARPA broadcast news transcription and understanding workshop, pp 194–218

  2. Allan J, Papka R, Lavrenko V (1998) On-line new event detection and tracking. In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’98. ACM, New York, pp 37–45. doi:10.1145/290941.290954

  3. Allan J, Papka R, Lavrenko V (1998) On-line new event detection and tracking. In: SIGIR ’98: proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 37–45

  4. Baeza-Yates RA, Ribeiro-Neto B (2011) Modern information retrieval: the concepts and technology behind search. Addison-Wesley, New York

  5. Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-up robust features (SURF). Comput Vis Image Underst 110:346–359

  6. Becker H, Naaman M, Gravano L (2010) Learning similarity metrics for event identification in social media. In: WSDM ’10: proceedings of the 3rd ACM international conference on web search and data mining. ACM, New York, pp 291–300

  7. Brants T, Chen F, Farahat A (2003) A system for new event detection. In: SIGIR ’03: proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 330–337

  8. Büttcher S, Clarke CLA, Cormack GV (2010) Information retrieval: implementing and evaluating search engines. MIT Press, Cambridge

  9. Carpineto C, Osiński S, Romano G, Weiss D (2009) A survey of web clustering engines. ACM Comput Surv 41:17:1–17:38. doi:10.1145/1541880.1541884

  10. Chen L, Roy A (2009) Event detection from flickr data through wavelet-based spatial analysis. In: CIKM ’09: proceeding of the 18th ACM conference on information and knowledge management. ACM, New York, pp 523–532

  11. Das M, Loui AC (2009) Detecting significant events in personal image collections. In: ICSC ’09: proceedings of the 2009 IEEE international conference on semantic computing. IEEE Computer Society, Washington, pp 116–123

  12. Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. KDD, pp 226–231. http://dblp.uni-trier.de

  13. Fialho A, Troncy R, Hardman L, Saathoff C, Scherp A (2010) What’s on this evening? Designing user support for event-based annotation and exploration of media. In: EVENTS’10, 1st international workshop on EVENTS—recognising and tracking events on the web and in real life. Athens, Greece, 4 May 2010

  14. Fung GPC, Yu JX, Yu PS, Lu H (2005) Parameter free bursty events detection in text streams. In: VLDB ’05: proceedings of the 31st international conference on Very Large Data Bases. VLDB Endowment, pp 181–192

  15. Gammeter S, Bossard L, Quack T, Gool LJV (2009) I know what you did last summer: object-level auto-annotation of holiday snaps. In: ICCV. IEEE, pp 614–621

  16. Gao B, Liu TY, Qin T, Zheng X, Cheng QS, Ma WY (2005) Web image clustering by consistent utilization of visual features and surrounding texts. In: Proceedings of the 13th annual ACM international conference on multimedia, MULTIMEDIA ’05. ACM, New York, pp 112–121. doi:10.1145/1101149.1101167

  17. Granville B, Kutti NS, Missikoff M, Nguyen NT (eds) (2007) International conference on enterprise information systems and web technologies. EISWT-07, Orlando, ISRST, 9–12 July 2007

  18. Gulla JA, Borch HO, Ingvaldsen JE (2008) Contextualized clustering in exploratory web search. In: Emerging technologies of text mining: techniques and applications, vol 3237. IGI Global, pp 184–207

  19. Hays J, Efros A (2008) Im2gps: estimating geographic information from a single image. In: IEEE conference on computer vision and pattern recognition. CVPR

  20. Hernandez-Aranda D, Granados R, Garcia-Serrano A (2010) Uned at mediaeval 2010: exploiting text metadata for automatic video tagging. In: MediaEval 2010 working notes proceedings. Pisa, Italy. http://www.multimediaeval.org/worknotes2010/UNED_TaggingProf.pdf. Accessed 10 May 2012

  21. Hu M, Sun A, Lim EP (2008) Event detection with common user interests. In: WIDM ’08: proceeding of the 10th ACM workshop on Web information and data management. ACM, New York, pp 1–8

  22. Jin Y, Myaeng SH, Jung Y (2007) Use of place information for improved event tracking. Inf Process Manage 43(2):365–378

  23. Kamps J, Monz C, de Rijke M, Sigurbjörnsson B (2003) Language-dependent and language-independent approaches to cross-lingual text retrieval. In: CLEF, pp 152–165

  24. Kurtz S (1999) Reducing the space requirement of suffix trees. Softw Pract Exper 29:1149–1171. doi:10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O, http://portal.acm.org/citation.cfm?id=335579.335580

  25. Larson M, Soleymani M, Serdyukov P, Murdock V, Jones G (eds) (2010) Working notes proceedings of the MediaEval 2010 workshop. Pisa, Italy. http://www.multimediaeval.org/mediaeval2010/2010worknotes/

  26. Li LJ, Fei-Fei L (2007) What, where and who? Classifying events by scene and object recognition. In: IEEE international conference on computer vision, pp 1–8

  27. Lin Y, Lin H, Jin S, Ye Z (2011) Social annotation in query expansion: a machine learning approach. In: Ma WY, Nie JY, Baeza-Yates RA, Chua TS, Croft WB (eds) SIGIR. ACM, pp 405–414

  28. Loui AC, Savakis AE (2003) Automated event clustering and quality screening of consumer pictures for digital albuming. IEEE Trans Multimedia 5(3):390–402

  29. Nemrava J (2006) Refining search queries using wordnet glosses. In: Poster and demo proceedings of 15th international conference on knowledge engineering and knowledge management managing knowledge in a world of networks (EKAW 2006), pp 33–34

  30. Papadopoulos S, Zigkolis C, Kompatsiaris Y, Vakali A (2011) Cluster-based landmark and event detection for tagged photo collections. IEEE Multimedia 18(1):52–63

  31. Porter MF (1997) An algorithm for suffix stripping. Morgan Kaufmann Publishers Inc., San Francisco

  32. Quack T, Leibe B, Van Gool L (2008) World-scale mining of objects and events from community photo collections. In: Proceedings of the 2008 international conference on content-based image and video retrieval, CIVR ’08. ACM, New York, pp 47–56

  33. Rattenbury T, Good N, Naaman M (2007) Towards automatic extraction of event and place semantics from flickr tags. In: SIGIR ’07: proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 103–110

  34. Ruocco M, Ramampiaro H (2010) Event clusters detection on flickr images using a suffix-tree structure. In: 2010 IEEE international symposium on multimedia. IEEE, pp 41–48

  35. Serdyukov P, Murdock V, van Zwol R (2009) Placing flickr photos on a map. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval, SIGIR ’09. ACM, New York

  36. Shapira B, Taieb-Maimon M, Nemeth Y (2005) Subjective and objective evaluation of interactive and automatic query expansion. Online Inform Rev 29(4):374–390

  37. Smith DA (2002) Detecting events with date and place information in unstructured text. In: JCDL ’02: proceedings of the 2nd ACM/IEEE-CS joint conference on digital libraries. ACM, New York, pp 191–196

  38. Trieschnigg D, Kraaij W (2005) Hierarchical topic detection in large digital news archives: exploring a sample based approach. J Digital Inf Manag 3(1):21–27

  39. Troncy R, Malocha B, Fialho ATS (2010) Linking events with media. In: I-SEMANTICS, ACM international conference proceeding series. ACM

  40. Ukkonen E (1995) On-line construction of suffix trees. Algorithmica 14(3):249–260

  41. Wartena C (2010) Using a divergence model for mediaevals tagging task. In: MediaEval 2010 working notes proceedings. Pisa, Italy. http://www.multimediaeval.org/worknotes2010/Novay_TaggingProf.pdf. Accessed 10 May 2012

  42. Yang Y, Pierce T, Carbonell J (1998) A study of retrospective and on-line event detection. In: SIGIR ’98: proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 28–36

  43. Yuan J, Luo J, Kautz H, Wu Y (2008) Mining gps traces and visual words for event classification. In: MIR ’08: proceeding of the 1st ACM international conference on multimedia information retrieval. ACM, New York, pp 2–9

  44. Zamir O, Etzioni O (1998) Web document clustering: a feasibility demonstration. In: SIGIR ’98: proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 46–54

  45. Zhang K, Zi J, Wu LG (2007) New event detection based on indexing-tree and named entity. In: SIGIR ’07: proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 215–222

  46. Zhang W, Kosecka J (2006) Image based localization in urban environments. In: Proceedings of the 3rd international symposium on 3d data processing, visualization, and transmission, pp 33–40. doi:10.1109/3DPVT.2006.80

Acknowledgements

We would like to thank Symeon Papadopoulos for his help and for providing us with the Barcelona dataset. This work is supported by the Research Council of Norway, grant number 176858, under the VERDIKT program.

Author information

Corresponding author

Correspondence to Massimiliano Ruocco.

Additional information

This paper is an extended and revised version of IEEE ISM 2010 [34]

About this article

Cite this article

Ruocco, M., Ramampiaro, H. A scalable algorithm for extraction and clustering of event-related pictures. Multimed Tools Appl 70, 55–88 (2014). https://doi.org/10.1007/s11042-012-1087-z

  • DOI: https://doi.org/10.1007/s11042-012-1087-z
