Predicting Indexer Performance in a Distributed Digital Library
Resource discovery in a distributed digital library poses many challenges. One of them is choosing, for a given query and a set of search engines, which search engines should receive the query. This paper focuses on search engine performance as a selection criterion and defines two measurements of it: availability (will the search engine respond within a time limit?) and response time (how quickly will the search engine respond, given that it responds at all?). We predicted both performance characteristics with a variety of algorithms, all of which required little computation time and condensed the past performance data for each search engine into a succinct record. We used operational data from the NCSTRL distributed digital library to make and evaluate predictions. We found that simple prediction methods performed as well as more complex methods and that prediction accuracy was closely related to data consistency.
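To make the idea of a succinct per-engine record concrete, here is a minimal sketch of one simple predictor. The abstract does not name the algorithms that were evaluated, so everything below is an assumption for illustration: the class name `EnginePerformanceRecord`, the use of exponential smoothing, the smoothing weight `alpha`, and the 0.5 availability threshold are hypothetical choices, not the paper's method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnginePerformanceRecord:
    """Succinct per-engine summary (hypothetical structure, not from the paper)."""
    alpha: float = 0.2                      # smoothing weight, an assumed value
    availability: Optional[float] = None    # smoothed fraction of in-time responses
    response_time: Optional[float] = None   # smoothed seconds, over responses only

    def update(self, responded: bool, seconds: Optional[float] = None) -> None:
        """Fold one observed query outcome into the record."""
        outcome = 1.0 if responded else 0.0
        self.availability = self._smooth(self.availability, outcome)
        # Response time is conditional on a response, so timeouts do not update it.
        if responded and seconds is not None:
            self.response_time = self._smooth(self.response_time, seconds)

    def _smooth(self, old: Optional[float], new: float) -> float:
        # The first observation seeds the estimate; later ones are blended in.
        if old is None:
            return new
        return (1.0 - self.alpha) * old + self.alpha * new

    def predict_available(self, threshold: float = 0.5) -> bool:
        """Predict a response within the time limit; assume available if no data."""
        return self.availability is None or self.availability >= threshold


# Usage: two timely responses followed by one timeout.
record = EnginePerformanceRecord(alpha=0.5)
record.update(True, 2.0)
record.update(True, 4.0)
record.update(False)
print(record.availability, record.response_time, record.predict_available())
```

Each update costs constant time and the record holds only two numbers per search engine, which matches the abstract's requirement of little computation and a compact summary. A plain running mean would be an equally simple alternative.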