The MINERVA Project: Towards Collaborative Search in Digital Libraries Using Peer-to-Peer Technology

  • Matthias Bender
  • Sebastian Michel
  • Christian Zimmer
  • Gerhard Weikum
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3664)


We consider the problem of collaborative search across a large number of digital libraries, together with query routing strategies, in a peer-to-peer (P2P) environment. Digital libraries and users are treated alike as peers and are thus both part of the P2P network. Our system provides a versatile platform for a scalable search engine that combines the local index structures of autonomous peers with a global directory based on a distributed hash table (DHT) as an overlay network. Experiments with the MINERVA prototype testbed study the benefits and costs of P2P search for keyword queries.
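The combination of local indexes with a DHT-based global directory can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the MINERVA prototype: the class name `Directory`, the helper `ring_position`, the use of SHA-1 on a 32-bit ring, and above all the routing score (a plain sum of per-term document counts), which merely stands in for whatever query routing strategies the paper actually evaluates.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict


def ring_position(key: str, bits: int = 32) -> int:
    """Map a string to a position on a hash ring of size 2**bits."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class Directory:
    """Toy global directory: each term is owned by one peer on a hash ring."""

    def __init__(self, peer_ids):
        peer_ids = list(peer_ids)
        # Sorted (position, peer_id) pairs form the ring.
        self.ring = sorted((ring_position(p), p) for p in peer_ids)
        # Per-owner storage: owner -> term -> {publishing peer: doc count}.
        self.posts = {p: defaultdict(dict) for p in peer_ids}

    def owner(self, term: str) -> str:
        """Return the first peer at or after the term's ring position."""
        positions = [pos for pos, _ in self.ring]
        i = bisect_left(positions, ring_position(term)) % len(self.ring)
        return self.ring[i][1]

    def publish(self, peer_id: str, local_index: dict) -> None:
        """Post one summary per term (here: number of matching local docs)."""
        for term, doc_ids in local_index.items():
            self.posts[self.owner(term)][term][peer_id] = len(doc_ids)

    def route(self, query_terms, k: int = 2):
        """Pick the k peers with the highest summed per-term doc counts.

        Placeholder scoring only; not the paper's routing strategy.
        """
        scores = defaultdict(int)
        for term in query_terms:
            summaries = self.posts[self.owner(term)].get(term, {})
            for peer_id, count in summaries.items():
                scores[peer_id] += count
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [peer_id for peer_id, _ in ranked[:k]]


# Hypothetical peers, each with its own local index (term -> document ids).
local_indexes = {
    "peerA": {"p2p": [1, 2, 3], "search": [1]},
    "peerB": {"p2p": [7], "library": [7, 8]},
    "peerC": {"search": [4, 5], "library": [4]},
}
directory = Directory(local_indexes)
for peer_id, index in local_indexes.items():
    directory.publish(peer_id, index)

selected = directory.route(["p2p", "search"], k=2)
print(selected)
```

In this toy setup only compact per-term summaries travel to the directory, while the documents and full indexes stay with their peers; the query initiator then contacts just the selected peers. A real deployment would replace the in-memory ring with an actual DHT overlay and the count-based score with a proper database selection measure.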


Digital Library; Overlay Network; Distributed Hash Table; Local Index; Query Execution
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Matthias Bender (1)
  • Sebastian Michel (1)
  • Christian Zimmer (1)
  • Gerhard Weikum (1)
  1. Max-Planck-Institut für Informatik, Saarbrücken, Germany
