A Peer-to-Peer Architecture for Information Retrieval Across Digital Library Collections

  • Ivana Podnar
  • Toan Luu
  • Martin Rajman
  • Fabius Klemm
  • Karl Aberer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4172)


Peer-to-peer networks have been identified as a promising architectural concept for developing search scenarios across digital library collections. Digital libraries typically offer sophisticated search over their local content; however, search methods involving a network of such stand-alone components are currently quite limited. We present an architecture for highly efficient search over digital library collections based on structured P2P networks. As the standard single-term indexing strategy faces significant scalability limitations in distributed environments, we propose a novel indexing strategy: key-based indexing. The keys are term sets that appear in a restricted number of collection documents. Thus, they are discriminative with respect to the global document collection and ensure scalable search costs. Moreover, key-based indexing computes posting list joins at indexing time, which significantly improves query performance. As search-efficient solutions usually imply costly indexing procedures, we present experimental results that show acceptable indexing costs, while the retrieval performance is comparable to that of standard centralized solutions with TF-IDF ranking.
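
The idea of key-based indexing can be illustrated with a minimal, single-machine sketch. It is not the authors' implementation: the function names, the threshold `df_max`, the bound `max_key_size`, the whitespace tokenizer, and the union-based query evaluation are all assumptions made for illustration, and the distribution of keys over a structured P2P network is left out entirely. A term set becomes a key once it appears in at most `df_max` documents; term sets that are too frequent are combined into larger sets, and their posting lists are joined while the index is built.

```python
from collections import defaultdict
from itertools import combinations


def build_key_index(docs, df_max=2, max_key_size=2):
    """Map each discriminative term set (key) to the ids of documents containing all its terms.

    df_max and max_key_size are illustrative parameters, not values from the paper.
    """
    # Level 1: ordinary single-term posting lists.
    level = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            level[frozenset([term])].add(doc_id)

    index = {}
    for size in range(1, max_key_size + 1):
        frequent = {}
        for key, doc_ids in level.items():
            if len(doc_ids) <= df_max:
                index[key] = doc_ids      # rare enough: keep as a key
            else:
                frequent[key] = doc_ids   # too frequent: expand into larger sets
        # Join the posting lists of too-frequent sets now, at indexing time.
        level = {}
        for a, b in combinations(frequent, 2):
            key = a | b
            if len(key) == size + 1 and key not in level:
                joined = frequent[a] & frequent[b]
                if joined:
                    level[key] = joined
    return index


def search(index, query):
    """Return the union of the posting lists of all indexed keys contained in the query."""
    terms = sorted(set(query.lower().split()))
    hits = set()
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            hits |= index.get(frozenset(combo), set())
    return hits


# Toy collection (hypothetical data).
docs = {
    1: "peer search network",
    2: "peer search library",
    3: "peer digital library",
    4: "search digital archive",
}
index = build_key_index(docs, df_max=2, max_key_size=2)
print(sorted(search(index, "peer search")))  # prints [1, 2]
```

In this toy collection, "peer" and "search" each occur in three documents, so neither is indexed alone; the pair {peer, search} occurs in only two documents and becomes a key whose posting list is already joined, so the query needs no intersection at search time.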


Keywords: Information Retrieval, Retrieval Performance, Document Frequency, Indexing Strategy, Global Collection





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ivana Podnar (1)
  • Toan Luu (1)
  • Martin Rajman (1)
  • Fabius Klemm (1)
  • Karl Aberer (1)

  1. School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
