Processing Conjunctive and Phrase Queries with the Set-Based Model

Pôssas, Bruno; Ziviani, Nivio; Ribeiro-Neto, Berthier; Meira, Wagner

doi:10.1007/978-3-540-30213-1_25

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3246)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

This paper presents an extension of the set-based model (SBM), an effective technique for computing term weights based on co-occurrence patterns, to the processing of conjunctive and phrase queries. The extension takes into account the intuition that semantically related terms often occur close to each other. The novelty is that all known approaches that account for co-occurrence patterns were initially designed for processing disjunctive (OR) queries, whereas our extension provides a simple, effective and efficient way to process conjunctive (AND) and phrase queries. The technique is time efficient and yet improves retrieval effectiveness. Experimental results show that our extension improves the average precision of the answer set for all collections evaluated, while keeping computational cost small. For the TREC-8 collection, our extension led to gains in average precision, relative to the standard vector space model, of 23.32% and 18.98% for conjunctive and phrase queries, respectively.
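
To make the two query types named in the abstract concrete, the following minimal Python sketch shows plain Boolean matching over a positional inverted index: a conjunctive (AND) query requires every term to appear in a document, and a phrase query additionally requires the terms to be adjacent and in order. This is not the set-based model or the authors' extension, and it does no term weighting or ranking. The function names, the whitespace tokenizer, and the toy documents are all assumptions made for illustration.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: lowercase whitespace tokenization, no stemming or stopwords.
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def conjunctive_match(index, terms):
    """Return ids of documents that contain every query term (AND semantics)."""
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings) if postings else set()


def phrase_match(index, terms):
    """Return ids of documents where the terms occur adjacently and in order."""
    result = set()
    # Only documents containing all terms can contain the phrase.
    for doc_id in conjunctive_match(index, terms):
        for start in index[terms[0]][doc_id]:
            # The k-th term must sit exactly k positions after the first one.
            if all(start + k in index[terms[k]][doc_id]
                   for k in range(1, len(terms))):
                result.add(doc_id)
                break
    return result


# Hypothetical toy collection.
docs = {
    1: "set based model for information retrieval",
    2: "information about retrieval of model sets",
    3: "retrieval model",
}
index = build_positional_index(docs)
and_hits = conjunctive_match(index, ["information", "retrieval"])
phrase_hits = phrase_match(index, ["information", "retrieval"])
print(and_hits, phrase_hits)
```

In this toy collection, documents 1 and 2 satisfy the conjunctive query, while only document 1 satisfies the phrase query, because in document 2 the two terms are separated by another word.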

This work was supported in part by the GERINDO project-grant MCT/CNPq/CT-INFO 552.087/02-5 and by CNPq grant 520.916/94-8 (Nivio Ziviani).

Author information

Authors and Affiliations

Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, 30161-970, Belo Horizonte, MG, Brazil
Bruno Pôssas, Nivio Ziviani, Berthier Ribeiro-Neto & Wagner Meira Jr.

Editor information

Editors and Affiliations

Georgia Institute of Technology and Università di Padova
Alberto Apostolico
Department of Information Engineering, University of Padova
Massimo Melucci

Cite this paper

Pôssas, B., Ziviani, N., Ribeiro-Neto, B., Meira, W. (2004). Processing Conjunctive and Phrase Queries with the Set-Based Model. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_25

DOI: https://doi.org/10.1007/978-3-540-30213-1_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive
