Cluster Computing, Volume 18, Issue 1, pp 29–40

In-situ feature-based objects tracking for data-intensive scientific and enterprise analytics workflows

  • Solomon Lasluisa
  • Fan Zhang
  • Tong Jin
  • Ivan Rodero
  • Hoang Bui
  • Manish Parashar
Article

Abstract

Emerging scientific simulations on leadership-class systems are generating huge amounts of data, and processing this data in an efficient and timely manner is critical for generating insights from the simulations. However, the increasing gap between computation and disk I/O speeds makes traditional data analytics pipelines based on post-processing cost-prohibitive and often infeasible. In this paper, we investigate an alternative approach that aims to bring the analytics closer to the data using in-situ execution of data analysis operations. Specifically, we present the design, implementation and evaluation of a framework that can support in-situ feature-based object tracking on distributed scientific datasets. Central to this framework is a scalable, decentralized, online clustering and cluster tracking algorithm, which executes in-situ (on different cores) in parallel with the simulation processes and retrieves data from the simulations directly via on-chip shared memory. The results from our experimental evaluation demonstrate that the in-situ approach significantly reduces the cost of data movement, that the presented framework can support scalable feature-based object tracking, and that it can be used effectively for in-situ analytics in large-scale simulations.
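The core idea above, grouping data points into objects (clustering) and then following those objects from one timestep to the next (cluster tracking), can be illustrated with a small serial sketch. This is not the authors' algorithm: their framework uses a scalable, decentralized, online clustering that runs in parallel on cores beside the simulation, whereas the sketch below swaps in a simpler, well-known pair of techniques, namely threshold-based connected-component labeling on a 2-D grid for feature extraction and spatial-overlap matching between consecutive timesteps for tracking. The names `extract_features`, `track` and `threshold`, the 4-connectivity rule and the event labels are hypothetical choices made for illustration.

```python
# Hypothetical illustration only: serial connected-component features + overlap tracking,
# not the decentralized online clustering algorithm described in the paper.
from collections import deque

Cell = tuple[int, int]  # (row, column) index into a 2-D grid


def extract_features(field: list[list[float]], threshold: float) -> list[frozenset[Cell]]:
    """Group cells with value >= threshold into 4-connected components ("objects")."""
    rows, cols = len(field), len(field[0])
    seen: set[Cell] = set()
    features: list[frozenset[Cell]] = []
    for i in range(rows):
        for j in range(cols):
            if field[i][j] < threshold or (i, j) in seen:
                continue
            # Breadth-first flood fill from an unvisited above-threshold seed cell.
            component: set[Cell] = set()
            queue = deque([(i, j)])
            seen.add((i, j))
            while queue:
                r, c = queue.popleft()
                component.add((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen and field[nr][nc] >= threshold):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            features.append(frozenset(component))
    return features


def track(prev: list[frozenset[Cell]], curr: list[frozenset[Cell]]) -> list[tuple]:
    """Classify correspondences between two timesteps by spatial overlap."""
    # overlaps[ci] lists the previous features sharing at least one cell with curr[ci].
    overlaps = [[pi for pi, pf in enumerate(prev) if pf & cf] for cf in curr]
    # fanout[pi] counts how many current features overlap prev[pi] (> 1 means a split).
    fanout = [sum(pi in ps for ps in overlaps) for pi in range(len(prev))]
    events: list[tuple] = []
    for ci, ps in enumerate(overlaps):
        if not ps:
            events.append(("birth", (), ci))
        elif len(ps) > 1:
            events.append(("merge", tuple(ps), ci))
        elif fanout[ps[0]] > 1:
            events.append(("split", (ps[0],), ci))
        else:
            events.append(("continue", (ps[0],), ci))
    for pi, n in enumerate(fanout):
        if n == 0:  # nothing in the current step overlaps it: the feature vanished
            events.append(("death", (pi,), None))
    return events
```

A current feature that overlaps exactly one previous feature (and is that feature's only successor) is reported as a continuation, while one with no overlapping predecessor is a birth. In an in-situ deployment each analysis core would see only its own subdomain, so features crossing subdomain boundaries would also need to be stitched together; this sketch assumes the whole grid is available in one place and does not attempt that.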

Keywords

Simulation workflows; Scientific data analysis; Scalable in-situ data analytics; Feature-based objects tracking

Notes

Acknowledgments

The research presented in this work is supported in part by the US National Science Foundation (NSF) via grant numbers OCI 1310283, DMS 1228203, IIP 0758566, OCI 1339036 and CNS 1305375, by the Director, Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy through the Scientific Discovery through Advanced Computing (SciDAC) Institute of Scalable Data Management, Analysis and Visualization (SDAV) under award number DE-SC0007455, the Advanced Scientific Computing Research and Fusion Energy Sciences Partnership for Edge Physics Simulations (EPSI) under award number DE-FG02-06ER54857, the ExaCT Combustion Co-Design Center via subcontract number 4000110839 from UT Battelle, and by an IBM Faculty Award. This work used the Extreme Science and Engineering Discovery Environment (XSEDE) under project number TG-CCR110035, which is supported by NSF grant number OCI 1053575. The research was conducted as part of the NSF Cloud and Autonomic Computing (CAC) Center at Rutgers University and the Rutgers Discovery Informatics Institute (RDI2). We thank Dr. Deborah Silver and Sedat Ozer for useful discussions on data visualization and for providing the scientific dataset for our experimental evaluation.

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Solomon Lasluisa (1)
  • Fan Zhang (1)
  • Tong Jin (1)
  • Ivan Rodero (1)
  • Hoang Bui (1)
  • Manish Parashar (1)
  1. Rutgers Discovery Informatics Institute, NSF Cloud and Autonomic Computing Center, Rutgers University, Piscataway, USA
