Processing Conjunctive and Phrase Queries with the Set-Based Model

  • Bruno Pôssas
  • Nivio Ziviani
  • Berthier Ribeiro-Neto
  • Wagner Meira Jr.
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3246)

Abstract

This paper presents an extension to the set-based model (SBM), an effective technique for computing term weights based on co-occurrence patterns, for processing conjunctive and phrase queries. The extension takes into account the intuition that semantically related terms often occur close to each other. The novelty is that all known approaches that account for co-occurrence patterns were initially designed for processing disjunctive (OR) queries, whereas our extension provides a simple, effective and efficient way to process conjunctive (AND) and phrase queries. The technique is time-efficient and still improves retrieval effectiveness. Experimental results show that our extension improves the average precision of the answer set for all collections evaluated, while keeping the computational cost small. For the TREC-8 collection, our extension led to gains in average precision, relative to the standard vector space model, of 23.32% for conjunctive queries and 18.98% for phrase queries.
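
To make the two query types concrete, the following minimal sketch contrasts conjunctive (AND) matching with phrase matching over a toy positional index. It illustrates only the query semantics and is not the set-based model's term weighting; the function names, the sample documents, and the whitespace tokenization are all assumptions made for this example.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} (hypothetical helper)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase and split on whitespace (an assumption).
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def conjunctive_match(index, terms):
    """Return ids of documents that contain every query term (AND)."""
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings) if postings else set()

def phrase_match(index, terms):
    """Return ids of documents where the terms occur adjacently, in order."""
    hits = set()
    # A phrase match is a conjunctive match with an extra position constraint.
    for doc_id in conjunctive_match(index, terms):
        starts = index[terms[0]][doc_id]
        if any(all(p + i in index[t][doc_id] for i, t in enumerate(terms))
               for p in starts):
            hits.add(doc_id)
    return hits

# Toy collection, invented for illustration only.
docs = {
    1: "association rule mining for text retrieval",
    2: "mining text with an association rule",
    3: "rule based retrieval",
}
index = build_positional_index(docs)
query = ["association", "rule", "mining"]
and_hits = conjunctive_match(index, query)   # documents 1 and 2
phrase_hits = phrase_match(index, query)     # document 1 only
```

Document 2 contains all three terms but not as a contiguous phrase, so it satisfies the conjunctive query and fails the phrase query.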

Keywords

Association Rule Mining; Association Rule; Vector Space Model; Conjunctive Query; Query Type
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Bruno Pôssas (1)
  • Nivio Ziviani (1)
  • Berthier Ribeiro-Neto (1)
  • Wagner Meira Jr. (1)

  1. Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
