Abstract
In this chapter, we survey P2P data sharing systems. Throughout, we focus on the evolution from simple file-sharing systems, with limited functionality, to Peer Data Management Systems (PDMS) that support advanced applications with more sophisticated data management techniques. Advanced P2P applications deal with semantically rich data (e.g., XML documents, relational tables) and use a high-level, SQL-like query language. We start our survey with an overview of existing P2P network architectures and their associated routing protocols. Then, we discuss data indexing techniques according to their degree of distribution and the semantics they can capture from the underlying data. We also discuss schema management techniques, which allow heterogeneous data to be integrated. We conclude by discussing the techniques proposed for processing complex queries (e.g., range and join queries). Complex query facilities are necessary for advanced applications that require a high level of search expressiveness. This last part reveals a lack of querying techniques that support approximate query answering.
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Hayek, R., Raschia, G., Valduriez, P., Mouaddib, N. (2010). Data Sharing in P2P Systems. In: Shen, X., Yu, H., Buford, J., Akon, M. (eds) Handbook of Peer-to-Peer Networking. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09751-0_19
DOI: https://doi.org/10.1007/978-0-387-09751-0_19
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09750-3
Online ISBN: 978-0-387-09751-0
eBook Packages: Computer Science, Computer Science (R0)