Abstract
We present LinkedPeers, a DHT-based system for the efficient distribution and processing of multidimensional, loosely structured data over a Peer-to-Peer overlay. Each dimension is further annotated with concept hierarchies. The design targets two features: large-scale support for partially structured data, and high-performance distributed query processing, including queries with multiple aggregates. To resolve such queries efficiently, LinkedPeers uses a conceptual chain of DHT rings that stores data in a hierarchy-preserving manner. Adaptive mechanisms detect dynamic changes in the query workload and adjust the granularity of the indexing on a per-node basis. While resolving an incoming query, the system also pre-computes results for possible future queries. Extensive experiments show that the system achieves over 85% precision in answering queries while minimizing communication cost and adapting its indexing to the incoming queries.
Keywords
- Resource Description Framework
- Primary Ring
- Distributed Hash Table
- Primary Dimension
- Query Response Time
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Asiki, A., Tsoumakos, D., Koziris, N. (2012). A DHT-Based System for the Management of Loosely Structured, Multidimensional Data. In: Hameurlain, A., Küng, J., Wagner, R., Liddle, S.W., Schewe, K.-D., Zhou, X. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems VI. Lecture Notes in Computer Science, vol 7600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34179-3_5
DOI: https://doi.org/10.1007/978-3-642-34179-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34178-6
Online ISBN: 978-3-642-34179-3
eBook Packages: Computer Science (R0)
