Abstract
Most Information Retrieval (IR) techniques represent documents using the traditional vector space model or probabilistic language models, i.e., the bag-of-words model. In this paper, associations among the words in documents are assessed and expressed in a Term Association Graph model, which represents both the document content and the relationships among keywords. An earlier attempt to exploit the term association graph addressed a non-personalized document re-ranking task. This paper experiments with improved non-personalized and personalized re-ranking strategies that exploit the term association graph data structure to assess the importance of a document for the user query; documents are then re-ranked according to the association and similarity that exist among them. The paper proposes several approaches under two models, namely the Term Rank based Approach (TRA) and the Path Traversal based Approaches (PTA1, PTA2, and PTA3). These approaches employ the term association graph and have been evaluated using a manually prepared real dataset and the benchmark OHSUMED dataset. The results obtained are reasonably promising.
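As a rough illustration of the kind of data structure described above, the following minimal sketch builds a term association graph and uses it to re-rank documents for a query. It is not the paper's TRA or PTA method: the abstract does not specify the association measure or the scoring rule, so the document-level co-occurrence count used as edge weight, the whitespace tokenization, and the function names `build_term_graph`, `association_score`, and `rerank` are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def build_term_graph(docs):
    """Build {term: {neighbor: weight}} from a list of document strings.

    Assumption: the weight of an edge is the number of documents in which
    both terms occur; the paper's actual association measure may differ.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for text in docs:
        terms = sorted(set(text.lower().split()))  # naive tokenization
        for a, b in combinations(terms, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def association_score(graph, query, doc):
    """Sum the edge weights linking each query term to each document term.

    Hypothetical scoring rule, used only to show how the graph can be queried.
    """
    doc_terms = set(doc.lower().split())
    score = 0
    for q in set(query.lower().split()):
        neighbors = graph.get(q, {})  # .get avoids creating new graph nodes
        score += sum(neighbors.get(t, 0) for t in doc_terms)
    return score


def rerank(graph, query, docs):
    """Return docs sorted by descending association score (stable for ties)."""
    return sorted(docs, key=lambda d: association_score(graph, query, d),
                  reverse=True)


if __name__ == "__main__":
    corpus = [
        "graph ranking model",
        "graph retrieval model",
        "cooking pasta recipe",
    ]
    g = build_term_graph(corpus)
    for d in rerank(g, "graph", corpus):
        print(association_score(g, "graph", d), d)
```

In this sketch, a document scores highly when its terms are strongly associated with the query terms across the collection, even where the query term itself carries no weight; a personalized variant would presumably also draw on user-specific information, which the abstract does not detail.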
Acknowledgements
The work presented in this paper was supported and funded by the Department of Science and Technology (DST), Ministry of Science and Technology, Government of India, under the INSPIRE scheme. The authors thank the DST and the anonymous reviewers for their helpful comments.
Cite this article
VENINGSTON, K., SHANMUGALAKSHMI, R. & NIRMALA, V. Semantic association ranking schemes for information retrieval applications using term association graph representation. Sadhana 40, 1793–1819 (2015). https://doi.org/10.1007/s12046-015-0413-3
DOI: https://doi.org/10.1007/s12046-015-0413-3