Extracting Temporal Equivalence Relationships among Keywords from Time-Stamped Documents

  • Conference paper
Database and Expert Systems Applications (DEXA 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6860)

Abstract

Keyword associations identified from text and search sources support many tasks, such as understanding relationships among concepts, extracting relevant documents, matching advertisements to web pages, and expanding user queries. However, these associations change as the underlying content changes over time. Two keywords that are associated with each other during one time period may not be associated in another, or the context in which they are associated may differ. In this paper, we define an equivalence relationship between a pair of keywords and develop methods to construct a temporal view of that relationship. Given a document set D, a keyword a is associated with a context consisting of the frequently occurring keyword sets (fs) of D in which a appears. Two keywords a and b are equivalent in D if their contexts are the same. We say that a and b are temporally equivalent in a time interval if a and b are equivalent in the documents published during that interval. Given a time-stamped document set D published over a time period T, we define the temporal equivalence partitioning problem: construct a partitioning of T into a sequence of maximal-length time intervals such that, in each interval, either a and b are temporally equivalent or the equivalence relationship does not hold. A temporal equivalence partitioning of a document set for a given pair of keywords highlights all of the different contexts in which the keywords are associated, and these contexts can be used to generate time-varying keyword suggestions for users. We show the effectiveness of the approach by constructing the temporal equivalence partitionings of several pairs of keywords from the Multi-Domain Sentiment data set and the ICWSM 2009 Spinn3r data set.
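The definitions in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm, which the abstract does not describe, and it rests on several assumptions made only for illustration: a keyword's context is taken to be the maximal frequent keyword sets containing it, two keywords with empty contexts are treated as not equivalent, support is a fraction of the documents in the interval, and the partitioning is a simple greedy extension over integer time units. All function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_keyword_sets(docs, min_support, max_size=3):
    """Brute-force miner: keyword sets of up to max_size keywords that
    occur in at least a min_support fraction of the documents."""
    counts = Counter()
    for keywords in docs:
        items = sorted(set(keywords))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {fs for fs, n in counts.items() if n >= min_support * len(docs)}


def context(keyword, frequent):
    """Assumption: the context is the set of maximal frequent keyword
    sets that contain the keyword."""
    maximal = {fs for fs in frequent if not any(fs < other for other in frequent)}
    return {fs for fs in maximal if keyword in fs}


def equivalent(a, b, docs, min_support):
    """a and b are equivalent in docs if their (non-empty) contexts match."""
    frequent = frequent_keyword_sets(docs, min_support)
    ctx_a, ctx_b = context(a, frequent), context(b, frequent)
    return bool(ctx_a) and ctx_a == ctx_b


def partition(a, b, timed_docs, min_support):
    """Greedy sketch: grow each interval while the equivalence status,
    recomputed over all documents in the interval, stays unchanged.
    Returns a list of (first_unit, last_unit, is_equivalent)."""
    units = sorted({t for t, _ in timed_docs})

    def status(lo, hi):
        window = set(units[lo:hi + 1])
        docs = [kw for t, kw in timed_docs if t in window]
        return equivalent(a, b, docs, min_support)

    result, i = [], 0
    while i < len(units):
        current, j = status(i, i), i
        while j + 1 < len(units) and status(i, j + 1) == current:
            j += 1
        result.append((units[i], units[j], current))
        i = j + 1
    return result


# Toy data: (time unit, keywords of one document).
timed_docs = (
    [(1, {"apple", "iphone", "launch"})] * 2
    + [(2, {"apple", "iphone", "launch"})] * 2
    + [(3, {"apple", "pie", "recipe"})] * 4
    + [(3, {"iphone", "case"})] * 4
)

print(partition("apple", "iphone", timed_docs, min_support=0.5))
# [(1, 2, True), (3, 3, False)]
```

In the toy data, "apple" and "iphone" share the single context {apple, iphone, launch} during units 1 and 2, while in unit 3 their contexts diverge, so the sketch reports one equivalent interval followed by one non-equivalent interval. A greedy extension of this kind does not guarantee maximal-length intervals in general, because the status of a merged interval depends on the combined document set.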

This work was partially supported by NSF Grant IIS-0534616 and by Grant Number P20 RR16469 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH).

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chundi, P., Subramaniam, M., Weerakoon, R.M.A. (2011). Extracting Temporal Equivalence Relationships among Keywords from Time-Stamped Documents. In: Hameurlain, A., Liddle, S.W., Schewe, KD., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2011. Lecture Notes in Computer Science, vol 6860. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23088-2_8

  • DOI: https://doi.org/10.1007/978-3-642-23088-2_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23087-5

  • Online ISBN: 978-3-642-23088-2

  • eBook Packages: Computer Science, Computer Science (R0)
