
On the Provenance Extraction Techniques from Large Scale Log Files: A Case Study for the Numerical Weather Prediction Models

  • Conference paper
Euro-Par 2020: Parallel Processing Workshops (Euro-Par 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12480)

Abstract

Severe meteorological events increasingly highlight the importance of fast and accurate weather forecasting. Various Numerical Weather Prediction (NWP) models are run worldwide, on either a local or a global scale, to predict future weather. However, a complete NWP run typically takes hours, depending on the input parameters and the size of the forecast domain. Provenance information is of central importance for detecting unexpected events that may develop during model execution and for taking the necessary action as early as possible. In addition, the need to share scientific data and results among researchers highlights the importance of data quality and reliability. In this study, we develop a framework for tracking the Weather Research and Forecasting (WRF) model and for generating, storing, and analyzing provenance data. We develop a machine-learning-based log parser so that the proposed system is dynamic and can adapt to different data and rules. The proposed system makes numerical weather forecast workflows easier to manage and understand by providing provenance graphs. By analyzing these graphs, faulty situations that may occur during the execution of WRF can be traced to their root causes. We evaluated the proposed system and found that it performs well even under a high-frequency flow of provenance information.


Notes

  1. https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/global-forcast-system-gfs.

  2. https://www.ecmwf.int/en/research/modelling-and-prediction.

  3. https://www.mgm.gov.tr.

  4. https://spark.apache.org/mllib.


Author information

Corresponding author

Correspondence to Alper Tufek.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Tufek, A., Aktas, M.S. (2021). On the Provenance Extraction Techniques from Large Scale Log Files: A Case Study for the Numerical Weather Prediction Models. In: Balis, B., et al. Euro-Par 2020: Parallel Processing Workshops. Euro-Par 2020. Lecture Notes in Computer Science, vol 12480. Springer, Cham. https://doi.org/10.1007/978-3-030-71593-9_20

  • DOI: https://doi.org/10.1007/978-3-030-71593-9_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71592-2

  • Online ISBN: 978-3-030-71593-9

  • eBook Packages: Computer Science, Computer Science (R0)
