BiDaML in Practice: Collaborative Modeling of Big Data Analytics Application Requirements

Part of the Communications in Computer and Information Science book series (CCIS, volume 1375)

Abstract

Using data analytics to improve industrial planning and operations has become increasingly popular, and data scientists are increasingly in demand. However, developing complex data analytics-based software is difficult. It involves many roles that traditional software engineering teams lack, e.g. data scientists and data engineers; the use of sophisticated machine learning (ML) approaches that replace many programming tasks; uncertainty inherent in the models; and interfacing with models to fulfill software functionalities. These factors complicate communication and collaboration within the team and with external stakeholders. In this paper, we describe our experiences in applying our BiDaML (Big Data Analytics Modeling Languages) approach to several large-scale industrial projects. We used our BiDaML modeling toolset, which brings all stakeholders around one tool to specify, model and document their big data applications. We report our experience in using and evaluating this tool on three real-world, large-scale applications with teams from: realas.com, a property price prediction website for home buyers; VicRoads, a project seeking to build a digital twin (simulated model) of Victoria’s transport network updated in real time by a stream of sensor data from inductive loop detectors at traffic intersections; and the Alfred Hospital, intracranial hemorrhage (ICH) prediction through computed tomography (CT) scans. These applications show that our approach successfully supports complex data analytics software development in industrial settings.

Keywords

  • Big data analytics
  • Big data modeling
  • Big data toolkits
  • BiDaML
  • Domain specific visual languages
  • End-user tools

Notes

  1. https://realas.com/
  2. https://www.vicroads.vic.gov.au/traffic-and-road-use/traffic-management/traffic-signals/scats
  3. https://studio.azureml.net/
  4. https://aws.amazon.com/machine-learning/
  5. https://cloud.google.com/ai-platform
  6. https://bigml.com/
  7. https://bidaml.web.app/
  8. https://github.com/tarunverma23/bidaml
  9. https://github.com/yusufades/vue-graphViz

Acknowledgements

Support for this work from ARC Discovery Projects DP170101932 and from ARC Laureate Program FL190100035 is gratefully acknowledged. We would also like to acknowledge Prof. Hai Vu and Dr. Nam Hoang from the Monash Institute of Transport Studies for their collaboration, and thank the Department of Transport (VicRoads) for sharing the transport data.

Author information

Corresponding author

Correspondence to Hourieh Khalajzadeh.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Khalajzadeh, H. et al. (2021). BiDaML in Practice: Collaborative Modeling of Big Data Analytics Application Requirements. In: Ali, R., Kaindl, H., Maciaszek, L.A. (eds) Evaluation of Novel Approaches to Software Engineering. ENASE 2020. Communications in Computer and Information Science, vol 1375. Springer, Cham. https://doi.org/10.1007/978-3-030-70006-5_5

  • DOI: https://doi.org/10.1007/978-3-030-70006-5_5

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-70005-8

  • Online ISBN: 978-3-030-70006-5

  • eBook Packages: Computer Science, Computer Science (R0)