Discovery of interactive graphs for understanding and searching time-indexed corpora

Subašić, Ilija; Berendt, Bettina

doi:10.1007/s10115-009-0227-x

Regular Paper
Published: 17 July 2009

Volume 23, pages 293–319 (2010)

Knowledge and Information Systems

Ilija Subašić¹ & Bettina Berendt¹

Abstract

Rich information spaces (like the Web or scientific publications) are full of “stories”: sets of statements that evolve over time, manifested as, for example, collections of news articles reporting events that relate to an evolving crime investigation, sets of news articles and blog posts accompanying the development of a political election campaign, or sequences of scientific papers on a topic. In this paper, we formulate the problem of discovering such stories as Evolutionary Theme Pattern Discovery, Summary and Exploration (ETP3). We propose a method and a visualisation tool for solving ETP3 by understanding, searching and interacting with such stories and their underlying documents. In contrast to existing approaches, our method concentrates on relational information and on local patterns rather than on the occurrence of individual concepts and global models. In addition, it relies on interactive graphs rather than natural language as the abstracted story representations. Furthermore, we present an evaluation framework. Two real-life case studies are used to illustrate and evaluate the method and tool.

Author information

Authors and Affiliations

Department of Computer Science, Katholieke Universiteit Leuven, Leuven-Heverlee, Belgium
Ilija Subašić & Bettina Berendt

Corresponding author

Correspondence to Bettina Berendt.

About this article

Cite this article

Subašić, I., Berendt, B. Discovery of interactive graphs for understanding and searching time-indexed corpora. Knowl Inf Syst 23, 293–319 (2010). https://doi.org/10.1007/s10115-009-0227-x

Received: 15 January 2009
Revised: 07 April 2009
Accepted: 09 May 2009
Published: 17 July 2009
Issue Date: June 2010
DOI: https://doi.org/10.1007/s10115-009-0227-x
