Which Should We Try First? Ranking Information Resources through Query Classification

  • Joshua Church
  • Amihai Motro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7022)

Abstract

Users seeking information in distributed environments of large numbers of disparate information resources are often burdened with the task of repeating their queries for each and every resource. Invariably, some of the searched resources are more productive (yield more useful documents) than others, and it would undoubtedly be useful to try these resources first. If the environment is federated and a single search tool is used to process the query against all the disparate resources, then a similar issue arises: which information resources should be searched first to guarantee that useful answers are streamed to users in a timely fashion? In this paper we propose a solution that incorporates techniques from text classification, machine learning and information retrieval. Given a set of pre-classified information resources and a keyword query, our system suggests a relevance ordering of the resources. The approach has been implemented in prototype form, and initial experimentation has given promising results.
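
The general idea of ordering pre-classified resources for a keyword query can be illustrated with a small sketch. It is not the authors' method: the abstract gives no algorithmic detail, so this example substitutes plain term-count cosine similarity for whatever classification pipeline the paper uses. The category names, sample terms, resource names, and the functions `cosine` and `rank_resources` are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resources(query, category_terms, resource_category):
    """Order resource names from most to least promising for the query."""
    q = Counter(query.lower().split())
    # Score each category by how similar the query is to its sample terms.
    scores = {
        category: cosine(q, Counter(text.lower().split()))
        for category, text in category_terms.items()
    }
    # A resource inherits the score of the category it was pre-classified into;
    # ties are broken alphabetically so the ordering is deterministic.
    return sorted(
        resource_category,
        key=lambda r: (-scores.get(resource_category[r], 0.0), r),
    )


# Hypothetical toy data (assumption): three categories and three resources.
CATEGORY_TERMS = {
    "medicine": "disease treatment drug clinical patient",
    "sports": "football league match player score",
    "computing": "software database query algorithm network",
}
RESOURCE_CATEGORY = {
    "HealthArchive": "medicine",
    "MatchReports": "sports",
    "TechLibrary": "computing",
}

ranking = rank_resources("database query optimization", CATEGORY_TERMS, RESOURCE_CATEGORY)
print(ranking)
```

In this toy setting the resource whose category shares terms with the query is tried first, and the remaining resources follow in alphabetical order. A realistic system would need a far richer representation of categories and queries than a handful of sample terms.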

Keywords

Information Retrieval, Information Resource, Close Neighbor, Latent Semantic Analysis, User Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Joshua Church (1)
  • Amihai Motro (1)
  1. Department of Computer Science, George Mason University, Fairfax, USA
