MINERVA∞: A Scalable Efficient Peer-to-Peer Search Engine

  • Sebastian Michel
  • Peter Triantafillou
  • Gerhard Weikum
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3790)

Abstract

The promise inherent in users coming together to form data-sharing network communities brings to the foreground new problems formulated over such dynamic, ever-growing computing, storage, and networking infrastructures. A key open challenge is to harness these highly distributed resources toward the development of an ultra-scalable, efficient search engine. From a technical viewpoint, any acceptable solution must fully exploit all available resources, which dictates the removal of any centralized points of control, as these can readily lead to performance bottlenecks and reliability/availability problems. Equally important, a highly distributed solution can also facilitate pluralism in informing users about internet content, which is crucial in order to preclude the formation of information-resource monopolies and the biased visibility of content from economically powerful sources. To meet these challenges, the work described here puts forward MINERVA∞, a novel search engine architecture designed for scalability and efficiency. MINERVA∞ encompasses a suite of novel algorithms, including algorithms for creating data networks of interest, placing data on network nodes, load balancing, top-k algorithms for retrieving data at query time, and replication algorithms for expediting top-k query processing. We have implemented the proposed architecture, and we report on our extensive experiments with real-world, web-crawled, and synthetic data and queries, showcasing the scalability and efficiency traits of MINERVA∞.

Copyright information

© IFIP International Federation for Information Processing 2005

Authors and Affiliations

  • Sebastian Michel (1)
  • Peter Triantafillou (2)
  • Gerhard Weikum (1)

  1. Max-Planck-Institut für Informatik, Saarbrücken, Germany
  2. R.A. Computer Technology Institute and University of Patras, Greece
