Extracting Temporal Equivalence Relationships among Keywords from Time-Stamped Documents

Chundi, Parvathi; Subramaniam, Mahadevan; Weerakoon, R. M. Aruna

doi:10.1007/978-3-642-23088-2_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6860)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Keyword associations identified from text and search sources support many tasks, such as understanding relationships among concepts, extracting relevant documents, matching advertisements to web pages, and expanding user queries. However, these associations change as the underlying content changes over time. Two keywords that are associated with each other during one time period may not be associated in another, or the context in which they are associated may differ. In this paper, we define an equivalence relationship between a pair of keywords and develop methods to construct a temporal view of that relationship. Given a document set D, a keyword a is associated with a context consisting of the frequently occurring keyword sets (frequent itemsets) of D in which a appears. Two keywords a and b are equivalent in D if their contexts are the same. We say that a and b are temporally equivalent in a time interval if a and b are equivalent in the documents published during that interval. Given a time-stamped document set D published over a time period T, we define the temporal equivalence partitioning problem as that of partitioning T into a sequence of maximal-length time intervals such that, in each interval, a and b are either temporally equivalent or the equivalence relationship does not hold. A temporal equivalence partitioning of a document set for a given pair of keywords highlights all of the different contexts in which the keywords are associated, which can be used to generate time-varying keyword suggestions for users. We show the effectiveness of the approach by constructing the temporal equivalence partitionings of several pairs of keywords from the Multi-Domain Sentiment data set and the ICWSM 2009 Spinn3r data set.
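
The definitions in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm, and several choices are assumptions made only for illustration: documents are represented as (time unit, keyword set) pairs, frequent keyword sets are found by brute-force counting with an absolute support threshold, a keyword's context is built from the maximal frequent keyword sets that contain it, a pair with empty contexts is treated as not equivalent, and intervals are grown greedily. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=3):
    """Keyword sets of up to max_size keywords found in >= min_support docs."""
    counts = Counter()
    for keywords in docs:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(keywords), size):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c >= min_support}


def maximal(itemsets):
    """Keep only the itemsets that have no proper superset in the collection."""
    return {s for s in itemsets if not any(s < t for t in itemsets)}


def context(keyword, itemsets):
    """Assumed context: the maximal frequent itemsets containing the keyword."""
    return {s for s in itemsets if keyword in s}


def equivalent(a, b, docs, min_support, max_size=3):
    """True if a and b have the same non-empty context in docs."""
    fis = maximal(frequent_itemsets(docs, min_support, max_size))
    ctx_a, ctx_b = context(a, fis), context(b, fis)
    # Assumption: two keywords with empty contexts are not called equivalent.
    return bool(ctx_a) and ctx_a == ctx_b


def partition(timed_docs, a, b, min_support, max_size=3):
    """Greedy sketch: extend an interval while the equivalence status,
    evaluated on all documents in the interval, stays unchanged."""
    units = sorted({t for t, _ in timed_docs})

    def docs_between(lo, hi):
        return [kw for t, kw in timed_docs if lo <= t <= hi]

    intervals, i = [], 0
    while i < len(units):
        status = equivalent(a, b, docs_between(units[i], units[i]),
                            min_support, max_size)
        j = i
        while j + 1 < len(units) and equivalent(
                a, b, docs_between(units[i], units[j + 1]),
                min_support, max_size) == status:
            j += 1
        intervals.append((units[i], units[j], status))
        i = j + 1
    return intervals


# Toy data: "apple" and "iphone" share contexts in units 1-2, then diverge.
sample = [
    (1, {"apple", "iphone", "launch"}), (1, {"apple", "iphone", "launch"}),
    (2, {"apple", "iphone", "price"}), (2, {"apple", "iphone", "price"}),
    (3, {"apple", "fruit", "pie"}), (3, {"apple", "fruit", "pie"}),
    (3, {"iphone", "case"}), (3, {"iphone", "case"}),
    (4, {"apple", "fruit", "pie"}), (4, {"apple", "fruit", "pie"}),
]
print(partition(sample, "apple", "iphone", min_support=2))
# prints [(1, 2, True), (3, 4, False)]
```

Maximal sets are used here because, if every frequent keyword set counted, the singleton set containing only a would appear in the context of a but never in that of b, so two distinct keywords could never have identical non-empty contexts. Whether the paper resolves this in the same way is not stated in the abstract.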

This work was partially supported by NSF Grant IIS-0534616 and by Grant Number P20 RR16469 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH).

Author information

Authors and Affiliations

University of Nebraska at Omaha, Omaha, NE, 68182, USA
Parvathi Chundi, Mahadevan Subramaniam & R. M. Aruna Weerakoon

Editor information

Editors and Affiliations

Institut de Recherche en Informatique de Toulouse (IRIT), Paul Sabatier University, 118, route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain
Brigham Young University, 784 TNRB, 84602, Provo, UT, USA
Stephen W. Liddle
Software Competence Center Hagenberg and Johannes Kepler University Linz, Softwarepark 21, 4232, Hagenberg, Austria
Klaus-Dieter Schewe
School of Information Technology and Electrical Engineering, University of Queensland, 4072, Brisbane, QLD, Australia
Xiaofang Zhou

Cite this paper

Chundi, P., Subramaniam, M., Weerakoon, R.M.A. (2011). Extracting Temporal Equivalence Relationships among Keywords from Time-Stamped Documents. In: Hameurlain, A., Liddle, S.W., Schewe, K.-D., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2011. Lecture Notes in Computer Science, vol 6860. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23088-2_8

DOI: https://doi.org/10.1007/978-3-642-23088-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23087-5
Online ISBN: 978-3-642-23088-2
eBook Packages: Computer Science, Computer Science (R0)
