The hardware/software balancing act for information retrieval on symmetric multiprocessors
Web search engines, such as AltaVista and Infoseek, handle tremendous loads by exploiting the parallelism implicit in their tasks and using symmetric multiprocessors to support their services. The web searching problem that they solve is a special case of the more general information retrieval (IR) problem of locating documents relevant to users' information needs. In this paper, we investigate how to exploit a symmetric multiprocessor to build high-performance IR servers. Although the problem can be solved by throwing lots of CPU and disk resources at it, the important questions are how much of which hardware, and what software structure, are needed to exploit those resources effectively. We have found, to our surprise, that in some cases adding hardware degrades performance rather than improving it. We show that multiple threads are needed to fully utilize hardware resources. Our investigation is based on InQuery, a state-of-the-art full-text information retrieval engine.