Abstract
Nowadays, intelligent systems and services are getting increasingly popular as they provide data-driven solutions to diverse real-world problems, thanks to recent breakthroughs in artificial intelligence (AI) and machine learning (ML). However, machine learning meets software engineering not only with promising potentials but also with some inherent challenges. Despite some recent research efforts, we still do not have a clear understanding of the challenges of developing ML-based applications and the current industry practices. Moreover, it is unclear where software engineering researchers should focus their efforts to better support ML application developers. In this paper, we report about a survey that aimed to understand the challenges and best practices of ML application development. We synthesize the results obtained from 80 practitioners (with diverse skills, experience, and application domains) into 17 findings outlining challenges and best practices for ML application development. Practitioners involved in the development of ML-based software systems can leverage the summarized best practices to improve the quality of their system. We hope that the reported challenges will inform the research community about topics that need to be investigated to improve the engineering process and the quality of ML-based applications.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Data availability
The datasets generated during and/or analyzed during the current study are available in the replication package, at https://preview.tinyurl.com/ydaj9jh9
References
Amershi, S., Begel, A., Bird, C., DeLine, R., Gall, H., Kamar, E., Nagappan, N., Nushi, B., & Zimmermann, T. (2019). Software Engineering for Machine Learning: A Case Study. ICSE: In Proc.
Anderson, D. J. (2010). Kanban: successful evolutionary change for your technology business. Blue Hole Press.
Appendix. (2020). Replication package with survey data and results. Available online at: https://preview.tinyurl.com/ydaj9jh9
Bangash, A. A., Sahar, H., Chowdhury, S., Wong, A. W., Hindle, A., & Ali, K. (2019). What do developers know about machine learning: a study of ML discussions on StackOverflow.
Belani, H., Vukovic, M., & Car, Z. (2019). Requirements Engineering Challenges in Building AI-Based Complex Systems. arXiv preprint arXiv:1908.11791
Braiek, H. B., & Khomh, F. (2020). On Testing Machine Learning Programs. Journal of Systems and Software, 164, 110542, ISSN 0164–1212. https://doi.org/10.1016/j.jss.2020.110542
Charmaz, K. (2006). Constructing Grounded Theory: A Practical Guide Through Qualitative Analysis. SAGE Publications.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321–357.
Felderer, M., & Ramler, R. (2021). Quality Assurance for AI-Based Systems: Overview and Challenges In: Winkler, D., Biffl, S., Mendez, D., Wimmer, M., Bergsmann, J. (eds) Software Quality: Future Perspectives on Software Engineering Quality. SWQD, pp.33–42.
Fink, A. (2003) The survey handbook. Sage.
Grosse, R. B., & Duvenaud, D. K. (2014). Testing MCMC code. NIPS: In Proc.
Guo, Q., Chen, S., Xie, X., Ma, L., Hu, Q., Liu, H., Liu, Y., Zhao, J., & Li, X. (2019). An Empirical Study towards Characterizing Deep Learning Development and Deployment across Different Frameworks and Platforms. arXiv preprint arXiv:1909.06727
He, H., Bai, Y., Garcia, E. A., & Li, S. (2008). ADASYN: Adaptive synthetic sampling approach for imbalanced learning, 2008 IEEE International Joint Conference on Neural Networks, Hong Kong, 2008, pp. 1322-1328.
Huang, S., Liu, E. -H., Hui, Z. -W., Tang, S. -Q., & Zhang, S. -J. (2018). Challenges of Testing Machine Learning Applications arXiv:1806
Ishikawa, F., & Yoshioka, N. (2019). How do engineers perceive difficulties in engineering of machine-learning systems? questionnaire survey. In Proceedings of the Joint 7th International Workshop on Conducting Empirical Studies in Industry and 6th International Workshop on Software Engineering Research and Industrial Practice (CESSER-IP ’19). IEEE Press, 2–9.
Islam, Md. J., Nguyen, H. A., Pan, R., & Rajan, H. (2019). What Do Developers Ask About ML Libraries? A Large-scale Study Using Stack Overflow. arXiv: 1906.11940v1
Khomh, F., & Antoniol, G. (2018). Bringing AI and machine learning data science into operation., Redhat Blog. Available at: https://www.redhat.com/en/blog/bringing-ai-and-machine-learning-data-science-operation
Khomh, F., Adams, B., Cheng, J., Fokaefs, M., & Antoniol, G. (2018). Software Engineering for Machine-Learning Applications: The Road Ahead. IEEE Software, 35(5), 81–84.
Kriens, P., & Verbelen, T. (2019). Software Engineering Practices for Machine Learning. arXiv:1906.10366
Ma, L., Juefei-Xu, F., Xue, M., Li, B., Li, L., Liu, Y., & Zhao, J. (2019). DeepCT: Tomographic Combinatorial Testing for Deep Learning Systems. In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 614–618.
Ma, L., Juefei-Xu, F., Zhang, F., Sun, J., Xue, M., Li, B., Chen, C., Su, T., Li, L., Liu, Y., et al. (2018a). Deepgauge: Multi-granularity testing criteria for deep learning systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM, 120–131.
Ma, L., Zhang, F., Sun, J., Xue, M., Li, B., Juefei-Xu, F., Xie, C., Li, L., Liu, Y., Zhao, J., et al. (2018b). Deepmutation: Mutation testing of deep learning systems. In 2018 IEEE 29th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 100–111.
Marijan, D., & Gotlieb, A. (2020). Software testing for machine learning. Proceedings of the AAAI Conference on Artificial Intelligence, 34.
Marijan, D., Gotlieb, A., & Ahuja M. K. (2019). Challenges of Testing Machine Learning Based Systems.
Nguyen-Duc, A., Sundbø, I., Nascimento, E., Conte, T., Ahmed, I., & Abrahamsson, P. (2020). A Multiple Case Study of Artificial Intelligent System Development in Industry. In Proceedings of the Evaluation and Assessment in Software Engineering (EASE ’20), pp. 1–10.
Pei, K., Cao, Y., Yang, J., & Jana S. (2017). DeepXplore: Automated Whitebox Testing of Deep Learning Systems, In Proc. Symposium on Operating Systems Principles (SOSP ’17). pp.1-18.
Poppendieck, M., & Poppendieck, T. (2003). Lean Software Development: An Agile Toolkit: An Agile Toolkit. Addison-Wesley.
Renggli, C., et al. (2019). Continuous integration of machine learning models with ease. ML/CI: Towards a rigorous yet practical treatment. arXiv:1903.00278
Responsible AI Practices. (2020). Google AI. Available at: https://ai.google/education/responsible-ai-practices
Sandberg, A. B., & Crnkovic, I. (2017). Meeting Industry-Academia Research Collaboration Challenges with Agile Methodologies. 2017 IEEE/ACM 39th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP), Buenos Aires, pp. 73-82.
Schelter, S., Biessmann, F., Januschowski, T., Salinas, D., Seufert, S., & Szarvas, G. (2018). On Challenges in Machine Learning Model Management. Committee on Data Engineering: Bulletin of the IEEE CS Tech.
Schwaber, Ken. (1997). Scrum development process (pp. 117–134). London: Business object design and implementation. Springer.
Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J., & Dennison, D. (2015). Hidden technical debt in machine learning systems. In Proc NIPS. pp. 2503–2511.
Stol, K., Ralph, P., & Fitzgerald, B. (2016). Grounded Theory in Software Engineering Research: A Critical Review and Guidelines. 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), Austin, TX, pp. 120-131.
Storcheus, D., Rostamizadeh, A., & Kumar, S. (2015). A survey of modern questions and challenges in feature extraction. In Proc IWFE: Modern Questions and Challenges, NIPS. 1-18.
Sun, Y., Wu, M., Ruan, W., Huang, X., Kwiatkowska, M., & Kroening, D. (2018). Concolic testing for deep neural networks. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM, 109–119.
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579-2605.
Vogelsang, A., & Borg, M. (2019). Requirements Engineering for Machine Learning: Perspectives from Data Scientists. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW), pp. 245-251. IEEE.
Wan, Z., Xia, X., Lo, D., & Murphy, G. C. (2019). How does Machine Learning Change Software Development Practices? IEEE Transactions on Software Engineering.
Washizaki, H., Uchida, H., Khomh, F., & Guéhéneuc, Y. (2019). Studying Software Engineering Patterns for Designing Machine Learning Systems. 2019 10th International Workshop on Empirical Software Engineering in Practice (IWESEP), Tokyo, Japan, pp. 49–495.
Zhang, J. M., Harman, M., Ma, L., & Liu, Y. (2019a). Machine Learning Testing: Survey, Landscapes and Horizons. arXiv preprint arXiv:1906.10742
Zhang, T., Gao, C., Ma, L., Lyu, M., & Kim, M. (2019b). An Empirical Study of Common Challenges in Developing Deep Learning Applications. 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE), pp. 104-115
Zhang, X., et al. (2019c). Software Engineering Practice in the Development of Deep Learning Applications. arXiv preprint arXiv:1910.03156
Zinkevich, M. (2018). Rules of machine learning: Best practices for ML engineering, Google guide on machine learning. Available at: https://developers.google.com/machine-learning/guides/rules-of-ml/
Acknowledgements
We express our gratitude to NSERC and FRQ funding agencies. Our heartiest thanks to the anonymous participants for their valuable time and thoughtful responses to our survey questionnaire.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rahman, M.S., Khomh, F., Hamidi, A. et al. Machine learning application development: practitioners’ insights. Software Qual J 31, 1065–1119 (2023). https://doi.org/10.1007/s11219-023-09621-9
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11219-023-09621-9