OverCite: A Cooperative Digital Research Library

  • Jeremy Stribling
  • Isaac G. Councill
  • Jinyang Li
  • M. Frans Kaashoek
  • David R. Karger
  • Robert Morris
  • Scott Shenker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3640)


CiteSeer is a well-known online resource for the computer science research community, allowing users to search and browse a large archive of research papers. Unfortunately, its current centralized incarnation is costly to run. Although members of the community would presumably be willing to donate hardware and bandwidth at their own sites to assist CiteSeer, the current architecture does not facilitate such distribution of resources. OverCite is a proposal for a new architecture for a distributed and cooperative research library based on a distributed hash table (DHT). The new architecture will harness resources at many sites, and thereby be able to support new features such as document alerts and scale to larger data sets.
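To make the DHT idea in the abstract concrete, here is a minimal sketch of how keys might be assigned to cooperating sites using consistent hashing. It is an illustration only and is not taken from the OverCite design: the class name `ToyDHT`, the function `ring_position`, the example site names, and the choice of SHA-1 and successor-based placement are all assumptions. A real DHT would also handle routing, replication, and node churn, which are omitted here.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a point on the hash ring (assumed here: SHA-1)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class ToyDHT:
    """Hypothetical single-process stand-in for a DHT spread over many sites."""

    def __init__(self, sites):
        # Place every site on the ring, sorted by its hashed position.
        self._ring = sorted((ring_position(s), s) for s in sites)
        self._points = [p for p, _ in self._ring]
        # Each site keeps its own local key/value store.
        self._store = {s: {} for s in sites}

    def responsible_site(self, key: str) -> str:
        # The first site at or after the key's position owns the key;
        # the modulo wraps around to the start of the ring.
        i = bisect.bisect_left(self._points, ring_position(key)) % len(self._ring)
        return self._ring[i][1]

    def put(self, key: str, value) -> None:
        self._store[self.responsible_site(key)][key] = value

    def get(self, key: str):
        return self._store[self.responsible_site(key)].get(key)


# Usage: three donated sites (made-up names) share the storage load.
sites = ["site-a.example", "site-b.example", "site-c.example"]
dht = ToyDHT(sites)
dht.put("doc:overcite", {"title": "OverCite"})
print(dht.responsible_site("doc:overcite"), dht.get("doc:overcite"))
```

Because placement depends only on the hash of the key and the set of sites, any participant can compute which site holds a given document without consulting a central server, which is the property that lets such a design spread load across donated hardware.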


Keywords: Round Trip Time; Distributed Hash Table; Inverted Index; Index Server; Index Partition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jeremy Stribling (1)
  • Isaac G. Councill (2)
  • Jinyang Li (1)
  • M. Frans Kaashoek (1)
  • David R. Karger (1)
  • Robert Morris (1)
  • Scott Shenker (3)

  1. MIT Computer Science and Artificial Intelligence Laboratory
  2. PSU School of Information Sciences and Technology
  3. UC Berkeley and ICSI
