A DHT-Based System for the Management of Loosely Structured, Multidimensional Data

  • Conference paper

Part of the Lecture Notes in Computer Science book series (TLDKS, volume 7600)

Abstract

In this paper we present LinkedPeers, a DHT-based system designed for efficient distribution and processing of multidimensional, loosely structured data over a Peer-to-Peer overlay. Each dimension is further annotated with concept hierarchies. The system design aims to incorporate two important features: large-scale support for partially structured data and high-performance, distributed query processing, including multiple aggregates. To enable the efficient resolution of such queries, LinkedPeers utilizes a conceptual chain of DHT rings that stores data in a hierarchy-preserving manner. Moreover, adaptive mechanisms detect dynamic changes in the query workloads and adjust the granularity of the indexing on a per-node basis. Possible future queries are also pre-computed during the resolution of an incoming query. Extensive experiments show that our system is very efficient, achieving over 85% precision in answering queries while minimizing communication cost and adapting its indexing to the incoming queries.

Keywords

  • Resource Description Framework
  • Primary Ring
  • Distributed Hash Table
  • Primary Dimension
  • Query Response Time

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Asiki, A., Tsoumakos, D., Koziris, N. (2012). A DHT-Based System for the Management of Loosely Structured, Multidimensional Data. In: Hameurlain, A., Küng, J., Wagner, R., Liddle, S.W., Schewe, K.-D., Zhou, X. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems VI. Lecture Notes in Computer Science, vol 7600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34179-3_5

  • DOI: https://doi.org/10.1007/978-3-642-34179-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34178-6

  • Online ISBN: 978-3-642-34179-3

  • eBook Packages: Computer Science, Computer Science (R0)