Distributed and Parallel Databases, Volume 37, Issue 1, pp 209–231

DeStager: feature guided in-situ data management in distributed deep memory hierarchies

  • Xuechen Zhang
  • Fang Zheng
  • Bao Nguyen
Part of the following topical collections:
  1. Special Issue on Scientific and Statistical Data Management

Abstract

In-situ analytics have been increasingly adopted by leadership scientific applications to gain fast insights into the massive output data of simulations. In current practice, systems buffer the output data in DRAM for analytics processing, which limits analytics to the DRAM capacity left unused by the simulation. The rapid growth of data size calls for alternative approaches to accommodating data-rich analytics, such as using solid-state disks to increase effective memory capacity. To this end, this paper explores software solutions for exploiting the deep memory hierarchies expected on future high-end machines. Leveraging the fact that many analytics are sensitive to data features (regions of interest) hidden in the data being processed, the approach incorporates knowledge of these data features into in-situ data management. It uses adaptive index creation and refinement to reduce the overhead of index management. In addition, it uses data features to predict data skew and improves load balance by controlling data distribution and placement on distributed staging servers. The experimental results show that such feature-guided optimizations achieve substantial improvements over state-of-the-art approaches for managing output data in-situ.

Keywords

Indexing, R-tree, Octree, In-situ analytics, SSDs

Notes

Acknowledgements

This research was supported in part by NSF ACI-1565338 and WSU Vancouver Research Grant.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Engineering and Computer Science, Washington State University Vancouver, Vancouver, USA
  2. IBM T. J. Watson Research Center, New York, USA