Achieving High Precisions with Peer-to-Peer Is Possible!
Until recently, centralized stand-alone solutions had no problem coping with the load of storing, indexing, and searching the small test collections used for evaluating search results at INEX. Searching the new large-scale Wikipedia collection of 2009, however, requires considerably more resources, such as processing power, RAM, and index space. It is therefore more important than ever to consider efficiency when performing XML-Retrieval tasks on such a large collection. On the other hand, the rich markup of the new collection is an opportunity to exploit the given structure and search more efficiently. This paper describes our experiments with distributed search techniques based on XML-Retrieval. Our aim is to improve both effectiveness and efficiency; we therefore submitted search results to both the Efficiency Track and the Ad Hoc Track. In our experiments, the collection, index, and search load are split over a peer-to-peer (P2P) network to balance the load when searching large-scale collections. Since the bandwidth consumed between searching peers has to be limited to achieve a scalable, efficient system, we exploit the XML structure to reduce the number of messages sent between peers. Although it mainly aims at efficiency, our search engine SPIRIX achieved quite high precision and made it into the top-10 systems (focused task): it ranked 7th in the Ad Hoc Track (59%) and came first in terms of precision in the Efficiency Track (both categories of topics). For the first time at INEX, a P2P system achieved an official search quality comparable with that of the top-10 centralized solutions!
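The abstract states that the index and search load are spread over a P2P network and that XML structure is used to reduce inter-peer messages, but it does not describe the mechanism. The following is a hypothetical Python sketch, not SPIRIX's implementation. It assumes a Chord-style ring in which each index key is stored on the successor of its SHA-1 identifier, and it assumes posting lists keyed by (element tag, term). Under these assumptions, a query that names a target element needs only that tag's postings, so it contacts a subset of the peers that a content-only query must reach. The names `Ring`, `index_key`, and `peers_to_contact`, the 16-bit identifier space, and the tag set are illustrative only.

```python
import hashlib
from bisect import bisect_left

# Hypothetical identifier-space size in bits; Chord with SHA-1 would use 160.
M = 16


def ident(name: str) -> int:
    """Map a string to an M-bit ring identifier via SHA-1 (Chord-style hashing)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


class Ring:
    """Minimal consistent-hashing ring: each key is stored on its successor peer,
    i.e. the first peer whose identifier is >= the key's identifier (wrapping around)."""

    def __init__(self, peer_names):
        self.peers = sorted((ident(p), p) for p in peer_names)
        self.ids = [pid for pid, _ in self.peers]

    def successor(self, key: str) -> str:
        i = bisect_left(self.ids, ident(key))
        return self.peers[i % len(self.peers)][1]  # wrap past the largest identifier


def index_key(term: str, tag: str) -> str:
    """Hypothetical structure-qualified index key, e.g. 'section/retrieval'."""
    return f"{tag}/{term}"


def peers_to_contact(ring: Ring, terms, tags) -> set:
    """Peers holding the posting lists a query needs. Requests to the same peer are
    assumed to be batched, and DHT routing hops are ignored in this sketch."""
    return {ring.successor(index_key(t, tag)) for t in terms for tag in tags}


if __name__ == "__main__":
    ring = Ring([f"peer{i}" for i in range(50)])
    all_tags = ["article", "title", "section", "p", "caption"]  # hypothetical tag set
    terms = ["peer", "retrieval"]
    # Content-only query: no structural hint, so postings for every tag are needed.
    content_only = peers_to_contact(ring, terms, all_tags)
    # Query restricted to <section> elements: only section postings are needed.
    structured = peers_to_contact(ring, terms, ["section"])
    print(len(content_only), len(structured))
```

With term-only keys, the structural hint in a query could not narrow which postings are fetched; qualifying keys by tag is one assumed way to let structure shrink the set of peers messaged, at the cost of more keys in the index.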
Keywords: XML-Retrieval, Large-Scale, Distributed Search, INEX, Efficiency