A Peer-to-Peer Architecture for Information Retrieval Across Digital Library Collections

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4172)

Abstract

Peer-to-peer networks have been identified as a promising architectural concept for developing search scenarios across digital library collections. Digital libraries typically offer sophisticated search over their local content; however, search methods involving a network of such stand-alone components are currently quite limited. We present an architecture for highly efficient search over digital library collections based on structured P2P networks. As the standard single-term indexing strategy faces significant scalability limitations in distributed environments, we propose a novel indexing strategy: key-based indexing. The keys are term sets that appear in a restricted number of collection documents. Thus, they are discriminative with respect to the global document collection and ensure scalable search costs. Moreover, key-based indexing computes posting list joins at indexing time, which significantly improves query performance. As search-efficient solutions usually imply costly indexing procedures, we present experimental results that show acceptable indexing costs, while the retrieval performance is comparable to that of standard centralized solutions with TF-IDF ranking.
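To make the idea of key-based indexing concrete, the following is a minimal, centralized Python sketch and not the authors' implementation. It assumes a key is any term set of bounded size that occurs in at most `df_max` documents. The names `build_key_index`, `df_max`, and `max_key_size` are hypothetical, and the abstract does not state which thresholds or additional filters the actual system uses. The posting list of a multi-term key is the set of documents containing all of its terms, so the join is effectively done at indexing time.

```python
from itertools import combinations


def build_key_index(docs, df_max=2, max_key_size=2):
    """Map each discriminative key (a frozenset of terms) to its posting list.

    docs: dict of doc_id -> list of terms.
    df_max and max_key_size are illustrative assumptions, not values from the paper.
    """
    postings = {}
    for doc_id, terms in docs.items():
        vocab = sorted(set(terms))
        # Enumerate every term set of size 1..max_key_size found in this document.
        for size in range(1, max_key_size + 1):
            for combo in combinations(vocab, size):
                postings.setdefault(frozenset(combo), set()).add(doc_id)
    # Keep only keys that appear in a restricted number of documents.
    return {key: ids for key, ids in postings.items() if len(ids) <= df_max}


docs = {
    "d1": ["peer", "network", "search"],
    "d2": ["peer", "library", "search"],
    "d3": ["peer", "network", "index"],
}
index = build_key_index(docs, df_max=2, max_key_size=2)

# "peer" alone occurs in all three documents, so it is not a key,
# but the pair {"peer", "search"} is, and its posting list is precomputed.
print(sorted(index[frozenset({"peer", "search"})]))  # ['d1', 'd2']
```

In this toy setting a two-term query is answered with a single lookup instead of fetching and intersecting two single-term posting lists, which illustrates why moving the join to indexing time can reduce query cost in a distributed index.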

The work presented in this paper was carried out in the framework of the EPFL Center for Global Computing and supported by the Swiss National Funding Agency OFES as part of the European FP 6 STREP project ALVIS (002068).

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Podnar, I., Luu, T., Rajman, M., Klemm, F., Aberer, K. (2006). A Peer-to-Peer Architecture for Information Retrieval Across Digital Library Collections. In: Gonzalo, J., Thanos, C., Verdejo, M.F., Carrasco, R.C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2006. Lecture Notes in Computer Science, vol 4172. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11863878_2

  • DOI: https://doi.org/10.1007/11863878_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44636-1

  • Online ISBN: 978-3-540-44638-5

  • eBook Packages: Computer Science, Computer Science (R0)
