Modeling machine learning requirements from three perspectives: a case report from the healthcare domain

Abstract

Implementing machine learning in an enterprise involves tackling a wide range of complexities with respect to requirements elicitation, design, development, and deployment of such solutions. Despite the necessity and relevance of requirements engineering approaches to the process, little research has been done in this area. This paper employs a case study method to evaluate the expressiveness and usefulness of GR4ML, a conceptual modeling framework for requirements elicitation, design, and development of machine learning solutions. Our results confirm that the framework includes an adequate set of concepts for expressing machine learning requirements and solution design. The case study also demonstrates that the framework can be useful in machine learning projects, both by revealing new requirements that would have been missed without it and by facilitating communication among project team members of different roles and backgrounds. Feedback from study participants and areas for improvement of the framework are also discussed.

Notes

  1. See the section on threats to validity for further details.

  2. Interestingly, recent research in the healthcare domain also supports the idea that enrolling the wrong doctors into government programs can be a contributing factor toward failure of such programs. These research publications were unknown to the modelers and project team during the Business View modeling.

  3. CRoss-Industry Standard Process for DM.

  4. The International Workshop on Requirements Engineering for Artificial Intelligence (RE4AI).

  5. Software Engineering for Machine Learning Applications International Symposium.

Acknowledgements

We wish to thank anonymous reviewer #1 for her/his valuable comments, especially the suggestion to highlight the centrality of the Insight modeling elements as a link between the three modeling views.

Author information

Corresponding author

Correspondence to Soroosh Nalchigar.

Appendices

Appendix A: List of prompting questions for constructing models in the framework

Constructing Business View models
  • What are the key business strategies in your domain of interest?
  • Who is responsible for, or aims to achieve, those goals?
  • How are they achieving this? How else can we achieve this?
  • Why are they doing this?
  • What are the key performance indicators in this context?
  • How would you measure how well you are achieving those goals?
  • What are the business decision(s) that need analytics (or data-driven) support? Who are those decision makers?
  • Why would they need to make such decisions? Which business goal is each decision part of? Which routine business process is each decision part of?
  • What is the frequency of each decision (how often)?
  • What would the decision maker(s) need to know during the decision processes?
  • What questions come to their mind (and need an answer) during their decision-making activities?
  • For each question, if it is too broad, can you break it into sub-questions?
  • Specify the tense (past, present, or future) and frequency (how often) of each question.
  • From the given list, specify what kinds of answers are needed for each of the business questions: predictive model, groupings of the data (segments), probability model, diagram (visualization), or logical rules.
  • For each of the above, specify the Input, Output, Usage Frequency, Update Frequency, and Learning Period of the machine learning model.
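
As a minimal sketch of how the answers to the last few prompts might be recorded, the snippet below captures one business question, the kind of answer it needs, and the five properties listed above. The class names, field names, and example values are hypothetical and chosen only for illustration; they are not GR4ML notation and do not come from the case study.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class AnswerKind(Enum):
    # The kinds of answers named in the prompting questions.
    PREDICTIVE_MODEL = "predictive model"
    SEGMENTS = "groupings of the data (segments)"
    PROBABILITY_MODEL = "probability model"
    VISUALIZATION = "diagram (visualization)"
    LOGICAL_RULES = "logical rules"


@dataclass(frozen=True)
class InsightSpec:
    """Hypothetical record of one required insight (not GR4ML syntax)."""
    question: str
    tense: str                  # "past", "present", or "future"
    kind: AnswerKind
    inputs: Tuple[str, ...]
    output: str
    usage_frequency: str
    update_frequency: str
    learning_period: str

    def missing_fields(self) -> List[str]:
        """Return the names of properties left blank during elicitation."""
        return [name for name, value in vars(self).items() if value in ("", ())]


# Invented example; the question and values are not taken from the case study.
example = InsightSpec(
    question="Which patients are likely to miss their next appointment?",
    tense="future",
    kind=AnswerKind.PREDICTIVE_MODEL,
    inputs=("age", "past no-shows"),
    output="no-show probability",
    usage_frequency="daily",
    update_frequency="",        # not yet elicited
    learning_period="last 12 months",
)
print(example.missing_fields())
```

A check such as `missing_fields` is one possible way to flag properties that still need to be asked about in a later elicitation session.
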
Constructing Analytics View models
  • What kind of analytics (descriptive, predictive, or prescriptive) would be appropriate to generate the required insights?
  • What algorithm(s) exist for fulfilling the analytics goal at hand?
  • Which quality attributes or non-functional requirements (NFRs) are critical for users?
  • What numeric metrics would be used to compare/evaluate the algorithms?
  • Define the threshold (upper or lower) values for indicators (e.g., minimum required accuracy for predictive models).
  • How are the critical NFRs influenced by alternative algorithms?
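
The prompts on metrics and thresholds can be illustrated with a small sketch that checks candidate algorithms against indicator thresholds. It assumes each indicator is either a lower bound (e.g., accuracy) or an upper bound (e.g., training time). The `Indicator` class, the function name, the algorithm names, and all numbers are invented for illustration and are not results from the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    """A numeric metric with a required threshold (hypothetical structure)."""
    name: str
    threshold: float
    higher_is_better: bool = True

    def is_met(self, value: float) -> bool:
        # Lower bound for metrics such as accuracy, upper bound for
        # metrics such as training time.
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def acceptable_algorithms(indicators, measurements):
    """Return the algorithms whose measured values satisfy every indicator.

    `measurements` maps algorithm name -> {indicator name -> measured value}.
    An algorithm with a missing measurement is treated as not acceptable.
    """
    accepted = []
    for algorithm, values in measurements.items():
        if all(ind.name in values and ind.is_met(values[ind.name]) for ind in indicators):
            accepted.append(algorithm)
    return accepted


# Illustrative numbers only; they do not come from the case study.
indicators = [
    Indicator("accuracy", threshold=0.80),
    Indicator("training_minutes", threshold=30.0, higher_is_better=False),
]
measurements = {
    "decision_tree": {"accuracy": 0.82, "training_minutes": 2.0},
    "neural_network": {"accuracy": 0.88, "training_minutes": 45.0},
    "naive_bayes": {"accuracy": 0.76, "training_minutes": 1.0},
}
print(acceptable_algorithms(indicators, measurements))
```

With these invented numbers, only `decision_tree` passes both thresholds: `neural_network` exceeds the training-time bound and `naive_bayes` falls below the minimum accuracy.
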
Constructing Data Preparation View models
  • What kind of data would be relevant for generating the insights and answering the business question at hand?
  • What data attributes (i.e., features), in what format and at what aggregation level, are needed for the question goals under consideration?
  • Where is the data stored, and what is the data schema (i.e., entities and relationships)?
  • Explain, to the best of your understanding, the attributes, format, and size of the dataset at hand.
  • For each attribute, what are the data type, aggregation level, and selection of records (filtering)?
  • What (sequence of) integration, cleaning, aggregation, filtering, and other data preparation steps are needed for transforming the raw data tables into the prepared data tables?
  • Are there any data quality concerns?
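
To make the prompt about the sequence of preparation steps concrete, the following sketch chains integration, cleaning, filtering, and aggregation over two tiny in-memory tables. The tables, column names, and the age cut-off are invented for illustration, and the order shown is only one possible sequence, not the one used in the case study.

```python
from collections import defaultdict

# Invented raw tables; schema and values are illustrative only.
patients = [
    {"patient_id": 1, "age": 67},
    {"patient_id": 2, "age": None},   # missing value, dropped during cleaning
    {"patient_id": 3, "age": 45},
]
visits = [
    {"patient_id": 1, "cost": 120.0},
    {"patient_id": 1, "cost": 80.0},
    {"patient_id": 2, "cost": 50.0},
    {"patient_id": 3, "cost": 200.0},
]


def integrate(patients, visits):
    """Integration: join each visit with its patient record on patient_id."""
    by_id = {p["patient_id"]: p for p in patients}
    return [{**by_id[v["patient_id"]], **v} for v in visits if v["patient_id"] in by_id]


def clean(rows):
    """Cleaning: drop rows that contain any missing value."""
    return [r for r in rows if all(value is not None for value in r.values())]


def keep_rows(rows, min_age):
    """Filtering: keep only records for patients at or above min_age."""
    return [r for r in rows if r["age"] >= min_age]


def aggregate(rows):
    """Aggregation: one row per patient with the total visit cost."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["patient_id"]] += r["cost"]
    return [{"patient_id": pid, "total_cost": total} for pid, total in sorted(totals.items())]


prepared = aggregate(keep_rows(clean(integrate(patients, visits)), min_age=50))
print(prepared)
```

Writing each step as a separate function mirrors the idea of naming every preparation step explicitly, so that data quality concerns (here, the missing age) can be traced to the step that handles them.
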

Appendix B: Questionnaire used for collecting feedback in post-modeling interviews

[Q1] At the end of modeling sessions, were the modelers able to arrive at a characterization of your existing analytics solution/product?

  • If your answer is NO, please explain what aspects/parts/components of your product/solution were not identified at the end of modeling sessions.

  • If your answer is YES, please provide 2–3 sentences on which areas of the graphical models correspond to which parts of your product.

[Q2] Through the course of this collaboration, were there any instances of understandings or findings that you and your team had not arrived at prior to the modeling activities? Please provide 2–3 examples.

[Q3] What did you find useful about the framework? (Write 3–4 sentences or bullet points). This can include specific modeling language features or methodological steps, as well as the general approach.

[Q4] What do you think is most lacking in the framework? Are there additions to or variations on the framework that you would like to see?

[Q5] Provide 2–3 examples of features that are not part of your current product/solution but that, after the modeling sessions, you think could be fruitful additions.

[Q6] What are the aspects or features of the framework that you consider least useful? (This can include modeling language features as well as methodological steps.)

[Q7] In arriving at your current analytics solution/product, you evolved the product conception and design through one or more iterations. In retrospect, do you think using the modeling framework would have enabled you to arrive at a viable product more easily or sooner, e.g., by uncovering pain points, analyzing failure stories and scenarios, and providing guidance and focus in the search for solutions?

About this article

Cite this article

Nalchigar, S., Yu, E. & Keshavjee, K. Modeling machine learning requirements from three perspectives: a case report from the healthcare domain. Requirements Eng (2021). https://doi.org/10.1007/s00766-020-00343-z

Keywords

  • Conceptual modeling
  • Requirements engineering
  • Machine learning
  • Data analytics
  • Health care