Information Retrieval and Web Search

Bing Liu
Chapter
Part of the Data-Centric Systems and Applications book series (DCSA)

Abstract

Web search needs no introduction. Because of its convenience and the richness of information on the Web, searching the Web is fast becoming the dominant method of seeking information. People make fewer and fewer trips to libraries and more and more searches on the Web. In fact, without effective search engines and rich Web content, writing this book would have been much harder.

Keywords

Search Engine, Information Retrieval, Singular Value Decomposition, Query Term, User Query

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

Department of Computer Science, University of Illinois at Chicago, Chicago, USA