Predicting Indexer Performance in a Distributed Digital Library

  • Naomi Dushay
  • James C. French
  • Carl Lagoze
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1696)

Abstract

Resource discovery in a distributed digital library poses many challenges, one of which is how to choose search engines for query distribution, given a query and a set of search engines. This paper focuses on search engine performance as a criterion for search engine selection and defines two measurements of search engine performance: availability – will the search engine respond within a time limit, and response time – how quickly will the search engine respond, given that it responds at all. We predicted both of these performance characteristics with a variety of algorithms, all of which required little computation time and combined past performance data for each search engine into a succinct record. We used operational data from the NCSTRL distributed digital library to make and evaluate predictions, and we found that simple prediction methods performed as well as more complex methods and that prediction accuracy was closely related to data consistency.
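The two measurements defined in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It is only one simple way to fold past observations for a single search engine into a compact record and read off both predictions. The class name `EngineRecord`, its fields, and the choice of a running mean are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EngineRecord:
    """Hypothetical succinct record of one search engine's past performance."""
    attempts: int = 0             # queries sent to the engine
    responses: int = 0            # queries answered within the time limit
    mean_response_s: float = 0.0  # running mean over in-time responses only

    def observe(self, response_s: Optional[float], time_limit_s: float) -> None:
        """Fold one observation into the record.

        response_s is None when the engine never answered; answers slower
        than time_limit_s are also counted as unavailable.
        """
        self.attempts += 1
        if response_s is None or response_s > time_limit_s:
            return
        self.responses += 1
        # Incremental mean: no need to store the individual observations.
        self.mean_response_s += (response_s - self.mean_response_s) / self.responses

    def predicted_availability(self) -> float:
        """Fraction of past queries answered within the time limit."""
        return self.responses / self.attempts if self.attempts else 0.0

    def predicted_response_time(self) -> Optional[float]:
        """Mean response time, given that the engine responded at all."""
        return self.mean_response_s if self.responses else None


record = EngineRecord()
for seconds in [1.0, None, 3.0, 12.0]:  # made-up observations, 10 s limit
    record.observe(seconds, time_limit_s=10.0)
print(record.predicted_availability(), record.predicted_response_time())
```

Keeping only counts and a running mean means the record stays constant in size and each update costs constant time, in line with the abstract's requirement of little computation and a succinct per-engine record.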

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Naomi Dushay (1)
  • James C. French (2)
  • Carl Lagoze (1)
  1. Dept. of Computer Science, Cornell University, Ithaca, USA
  2. Dept. of Computer Science, University of Virginia, Charlottesville, USA
