Using stem rules to refine document retrieval queries

  • Ye Liu
  • Hanxiong Chen
  • Jeffrey Xu Yu
  • Nobuo Ohbo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1495)


In this paper, we propose a data mining approach to query refinement that uses association rules (ARs) among keywords extracted from a document database. When a query is under-specified or contains ambiguous keywords, a set of association rules is displayed to help the user choose additional keywords with which to refine the original query. To the best of our knowledge, no reported study has discussed how to screen the number of retrieved documents using ARs. This paper addresses two issues. First, an AR X ⟹ Y with high confidence tends to indicate that the number of documents containing both keyword sets X and Y is large. Minimum support and minimum confidence are therefore of limited use for screening documents. To address this issue, maximum support and maximum confidence are used. Second, a large number of rules must be stored in a rule base and displayed at run time in response to a user query. To reduce the number of rules, we introduce two related concepts: “stem rule” and “coverage”. Stem rules are the rules from which other rules can be derived. A set of keywords is said to be a coverage of a set of documents if these documents can be retrieved using the same set of keywords. A minimum coverage reduces the number of keywords needed to cover a given number of documents, and therefore helps reduce the number of rules to be managed. To demonstrate the applicability of the proposed method, we have built an interactive interface and maintain a medium-sized document database. The effectiveness of using ARs for screening is also addressed in this paper.
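To make the role of support and confidence concrete, the following is a minimal sketch, not the authors' implementation. It uses the standard definitions (support of a keyword set is the fraction of documents containing all of its keywords; confidence of X ⟹ Y is the support of X ∪ Y divided by the support of X). The toy collection `DOCS`, the function names, and the threshold values are all hypothetical, and keeping only rules whose support and confidence fall inside a band bounded by both minimum and maximum values is an assumed reading of how maximum support and maximum confidence might be applied.

```python
# Hypothetical toy collection: each document is reduced to its keyword set.
DOCS = {
    "d1": {"database", "mining", "rules"},
    "d2": {"database", "mining"},
    "d3": {"database", "query"},
    "d4": {"retrieval", "query"},
    "d5": {"database", "mining", "query"},
}


def matching(keywords, docs=DOCS):
    """Return the ids of documents that contain every keyword in `keywords`."""
    return {doc_id for doc_id, words in docs.items() if keywords <= words}


def support(keywords, docs=DOCS):
    """Fraction of documents that contain all of `keywords`."""
    return len(matching(keywords, docs)) / len(docs)


def confidence(x, y, docs=DOCS):
    """Confidence of the rule x => y: support(x | y) / support(x)."""
    support_x = support(x, docs)
    return support(x | y, docs) / support_x if support_x else 0.0


def screening_rules(query, docs=DOCS,
                    min_supp=0.1, max_supp=0.5,
                    min_conf=0.1, max_conf=0.7):
    """List rules query => {k} whose support and confidence lie inside the band.

    The upper bounds (an assumption of this sketch) discard keywords that
    co-occur with the query so often that adding them barely narrows the result.
    """
    candidates = set().union(*docs.values()) - query
    rules = []
    for keyword in sorted(candidates):
        supp = support(query | {keyword}, docs)
        conf = confidence(query, {keyword}, docs)
        if min_supp <= supp <= max_supp and min_conf <= conf <= max_conf:
            rules.append((keyword, supp, conf))
    return rules


if __name__ == "__main__":
    query = {"database"}
    print("initial result:", sorted(matching(query)))
    for keyword, supp, conf in screening_rules(query):
        refined = sorted(matching(query | {keyword}))
        print(f"{sorted(query)} => {keyword}: "
              f"supp={supp:.2f} conf={conf:.2f} refined result={refined}")
```

In this toy data the rule `{database} ⟹ {mining}` has high confidence, so adding `mining` removes only one of the four matching documents; the upper bounds filter it out, while `query` and `rules` remain as suggestions that narrow the result more sharply.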


Information Retrieval, Query Refinement, Data Mining, Stem Rule, Keyword Coverage





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Ye Liu (1)
  • Hanxiong Chen (2)
  • Jeffrey Xu Yu (3)
  • Nobuo Ohbo (1)

  1. Institute of Electronic & Information Science, University of Tsukuba, Tsukuba, Japan
  2. Tsukuba International University, Tsuchiura, Japan
  3. Department of Computer Science, Australian National University, Canberra, Australia
