Abstract
Keyword associations identified from text and search sources support tasks such as understanding relationships among concepts, extracting relevant documents, matching advertisements to web pages, and expanding user queries. However, these associations change as the underlying content changes over time. Two keywords that are associated during one time period may not be associated in another, or the context in which they are associated may differ. In this paper, we define an equivalence relationship between a pair of keywords and develop methods to construct a temporal view of that relationship. Given a document set D, a keyword a is associated with a context consisting of the frequently occurring keyword sets (fs) of D in which a appears. Two keywords a and b are equivalent in D if their contexts are the same. We say that a and b are temporally equivalent in a time interval if they are equivalent in the documents published during that interval. Given a time-stamped document set D published over a time period T, we define the temporal equivalence partitioning problem: construct a partitioning of T into a sequence of maximal-length time intervals such that, in each interval, either a and b are temporally equivalent or the equivalence relationship does not hold. A temporal equivalence partitioning of a document set for a given pair of keywords highlights all of the different contexts in which the keywords are associated, which can be used to generate time-varying keyword suggestions for users. We show the effectiveness of the approach by constructing temporal equivalence partitionings for several pairs of keywords from the Multi-Domain Sentiment data set and the ICWSM 2009 Spinn3r data set.
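The definitions above can be made concrete with a small sketch. It is not the paper's algorithm: the abstract does not specify the mining procedure or the partitioning method, so everything below is an illustrative assumption. Documents are modeled as sets of keywords grouped into chronological time units; frequent keyword sets are found by brute-force enumeration (a stand-in for a miner such as Apriori); the two keywords themselves are removed from each frequent set before contexts are compared; two empty contexts are treated as "not equivalent"; and the partitioning is a simple greedy left-to-right scan that does not guarantee the maximal-length intervals the problem statement asks for. All function names, parameters, and the toy data are hypothetical.

```python
from itertools import combinations


def frequent_keyword_sets(docs, min_support=0.5, max_size=3):
    """Brute-force stand-in for a frequent itemset miner (assumption).

    docs is a list of keyword sets. A keyword set is frequent when it is
    contained in at least min_support (a fraction) of the documents.
    """
    if not docs:
        return set()
    threshold = min_support * len(docs)
    vocabulary = sorted(set().union(*docs))
    frequent = set()
    for size in range(1, max_size + 1):
        for candidate in combinations(vocabulary, size):
            items = frozenset(candidate)
            if sum(1 for d in docs if items <= d) >= threshold:
                frequent.add(items)
    return frequent


def context(keyword, other, frequent):
    """Frequent sets containing `keyword`, with both keywords of the pair
    removed so the two contexts are comparable (assumption)."""
    remainders = (fs - {keyword, other} for fs in frequent if keyword in fs)
    return {r for r in remainders if r}


def equivalent(a, b, docs, min_support=0.5, max_size=3):
    """True when a and b have the same non-empty context in docs."""
    frequent = frequent_keyword_sets(docs, min_support, max_size)
    ctx_a = context(a, b, frequent)
    ctx_b = context(b, a, frequent)
    return bool(ctx_a) and ctx_a == ctx_b


def partition(units, a, b, **mining_options):
    """Greedy scan over chronological (label, docs) units.

    An interval is extended while the equivalence status, evaluated on all
    documents in the interval, stays the same as for its first unit.
    """
    intervals = []
    i = 0
    while i < len(units):
        docs = list(units[i][1])
        status = equivalent(a, b, docs, **mining_options)
        j = i
        while j + 1 < len(units):
            extended = docs + list(units[j + 1][1])
            if equivalent(a, b, extended, **mining_options) != status:
                break
            docs = extended
            j += 1
        intervals.append((units[i][0], units[j][0], status))
        i = j + 1
    return intervals


# Toy, made-up data: one list of keyword sets per month.
units = [
    ("2009-01", [{"apple", "iphone", "launch"}]),
    ("2009-02", [{"apple", "iphone", "launch"}]),
    ("2009-03", [{"apple", "fruit"}, {"apple", "fruit"}]),
]
for start, end, status in partition(units, "apple", "iphone"):
    print(start, end, "equivalent" if status else "not equivalent")
```

On this toy input the two keywords share the context "launch" in the first two months, while the third month introduces a context ("fruit") that only one of them has, so the scan closes the first interval there. Because equivalence is re-evaluated on the growing interval rather than per unit, a short equivalent stretch can be absorbed into a longer non-equivalent interval; a faithful implementation would need the paper's own procedure.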
This work was partially supported by NSF Grant IIS-0534616 and by Grant Number P20 RR16469 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH).
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chundi, P., Subramaniam, M., Weerakoon, R.M.A. (2011). Extracting Temporal Equivalence Relationships among Keywords from Time-Stamped Documents. In: Hameurlain, A., Liddle, S.W., Schewe, KD., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2011. Lecture Notes in Computer Science, vol 6860. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23088-2_8
DOI: https://doi.org/10.1007/978-3-642-23088-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23087-5
Online ISBN: 978-3-642-23088-2
eBook Packages: Computer Science, Computer Science (R0)