nativeNDP: Processing Big Data Analytics on Native Storage Nodes

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11695)

Abstract

Data analytics tasks on large datasets are computationally intensive and often demand the compute power of cluster environments. Yet data cleansing, preparation, dataset characterization, and the computation of statistics or metrics are frequent steps. They are mostly performed ad hoc, in an explorative manner, and mandate low response times. However, such steps are I/O-intensive and typically very slow due to low data locality and inadequate interfaces and abstractions along the stack. The usual result is prohibitively expensive scans of the full dataset and transformations at interface boundaries.

In this paper, we examine R as an analytical tool that manages large persistent datasets in Ceph, a widespread cluster file system. We propose nativeNDP, a framework for Near-Data Processing that pushes down primitive R tasks and executes them in situ, directly within the storage device of a cluster node. Across a range of data sizes, we show that nativeNDP is more than an order of magnitude faster than other pushdown alternatives.
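
The general idea behind pushing a primitive task down to storage can be illustrated with a toy model. The sketch below is not the nativeNDP implementation and uses no Ceph or R interface; `ToyStorageNode`, `read_all`, `push_down`, and the assumed 8 bytes per value are hypothetical names and numbers chosen only for illustration. It contrasts shipping the full dataset to the host with shipping only the result of a task executed next to the data.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToyStorageNode:
    """Hypothetical stand-in for a storage node (not a Ceph or nativeNDP API)."""
    values: List[float]
    bytes_per_value: int = 8  # assumption: each value is an 8-byte double

    def read_all(self) -> Tuple[List[float], int]:
        """Host-side path: ship every value to the client for processing."""
        return list(self.values), len(self.values) * self.bytes_per_value

    def push_down(self, task: Callable[[List[float]], float]) -> Tuple[float, int]:
        """Pushdown path: run the task next to the data, ship only the result."""
        return task(self.values), self.bytes_per_value


def mean(xs: List[float]) -> float:
    """A primitive statistic, standing in for a simple analytical task."""
    return sum(xs) / len(xs)


node = ToyStorageNode(values=[float(i) for i in range(1000)])

# Host-side: transfer the whole dataset, then compute.
data, moved_host = node.read_all()
host_mean = mean(data)

# Near-data: compute on the node, transfer one scalar.
ndp_mean, moved_ndp = node.push_down(mean)

print(f"host-side mean={host_mean}, bytes moved={moved_host}")
print(f"pushdown  mean={ndp_mean}, bytes moved={moved_ndp}")
```

Both paths compute the same statistic; only the volume of data crossing the interface boundary differs. The model deliberately ignores device compute limits, data formats, and the storage stack, all of which matter in a real system.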

Acknowledgements

This work has been partially supported by HAW Promotion MWK, Baden-Württemberg and BMBF PANDAS 01IS18081C/D.

Author information

Corresponding author

Correspondence to Tobias Vinçon.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Vinçon, T., Hardock, S., Riegger, C., Koch, A., Petrov, I. (2019). nativeNDP: Processing Big Data Analytics on Native Storage Nodes. In: Welzer, T., Eder, J., Podgorelec, V., Kamišalić Latifić, A. (eds) Advances in Databases and Information Systems. ADBIS 2019. Lecture Notes in Computer Science, vol 11695. Springer, Cham. https://doi.org/10.1007/978-3-030-28730-6_9

  • DOI: https://doi.org/10.1007/978-3-030-28730-6_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28729-0

  • Online ISBN: 978-3-030-28730-6

  • eBook Packages: Computer Science, Computer Science (R0)
