1 Introduction

The Systems Modeling Language (SysML) is a standard from the Object Management Group (OMG) that supports the design, analysis, and verification of complex systems, which may include software and hardware components. SysML reuses parts of UML and additionally offers new language elements such as value types and quantity kinds, as well as the opportunity to describe the functionality of continuous systems [29]. One of the original intentions behind SysML was to give systems engineers a modeling language that is not too software-oriented [51]. SysML makes it possible to model a wide variety of systems from different perspectives such as behavior, structure, or requirements. The most recent version at the time of writing, 1.5, was released in May 2017. SysML has been in place for about thirteen years, and various papers capturing different aspects of this standard have been published at different venues by different research communities. Since SysML is used in multi-disciplinary engineering, it is applied in a broad range of fields.

To get a better overview of this large number of contributions and to identify the relevance of SysML in scientific communities, we carried out a systematic mapping study by analyzing the abstracts of the different contributions. The study helps to generate knowledge by determining, among other things, the application fields in which SysML is commonly used and the research groups that are involved. These insights help to identify trends regarding the direction in which SysML should be developed in the future, also with respect to the ongoing discussion about SysML 2.0.

To put the aim of this article in a nutshell, we present inputs as well as outputs of the SysML mapping study and show a comprehensive overview of the evolution of SysML over a period of more than 10 years. Additionally, we identify open issues and discuss these issues in the conclusion of this article with regard to SysML 2.0. According to Kitchenham et al. [28], the findings and outlook may support the work of the following stakeholders:

  • Research: Scientists who have just started research in the field of SysML may use this study as an overview and starting point for their work. Experienced researchers may also use it as a reference to save time for in-depth studies and to accelerate the search for open issues.

  • Industry: For industry, the findings give a good outline of the state of the art in SysML research. This may enable knowledge transfer between academia and industry. Such knowledge transfer may push forward the resolution of open issues in the vision of Industry 4.0 and cyber-physical systems [10]. At the very least, industry stakeholders may identify relevant and suitable research outputs for practical settings.

The remainder of this article is structured as follows: Section 2 discusses the related work. In Sect. 3, we present the research method, define the research questions, and describe the process of conducting the mapping study. In Sect. 4, we describe and analyze the extracted data and visualize the results. Section 5 covers possible threats to validity. In Sect. 6, we present the conclusions and an outlook on future work. In Appendices A and B, we present the references of all covered SysML papers as well as a list of books and theses, which were not part of this survey.

2 Related work

In this section, we outline the method of systematic literature review in comparison to the method of a systematic mapping study. Furthermore, we take a closer look at these methods as applied to UML and to its profiles (e.g., SysML, MARTE).

2.1 Systematic literature review versus systematic mapping study

Evidence-based practices, originating in medicine, have been widely adopted in software engineering (SE) since 2004. In order to address evidence-based SE in the form of systematic literature reviews (SLRs), the corresponding techniques were re-formulated by Kitchenham [26]. SLR is a well-defined methodology to identify, analyze, and interpret evidence in an unbiased and repeatable way [28]. A large majority of published SLRs in the domain of SE have been performed by following the approaches introduced by Kitchenham et al. [25, 27]. In addition, some authors have adopted surveys from medicine [35] as well as from the social sciences [46], or they have applied refined guidelines such as those introduced in [11, 14, 60].

In this article, we apply a broader form of SLR which is known as systematic mapping study (SMS) according to [8], since our intention is to focus on evidence for a specific research topic instead of answering detailed research questions. Based on a set of primary studies, a SMS identifies gaps in the research area under consideration and discovers potential research trends. In doing so, we follow the guidelines for conducting SMS in SE introduced by Petersen et al. [44, 45]. Additionally, we apply the survey of Kuhrmann et al. [31] for performing our SMS for SysML (see Sect. 3).

It seems that there are similarities between SMS and SLR; however, the approaches of these two methodologies and also their goals are quite different. For instance, in contrast to SLR, a SMS uses general research questions to classify and aggregate relevant studies to high-level categories [40].

2.2 SMSs and SLRs applied to UML

In the area of empirical studies concerning the maintenance of UML diagrams and their use in the maintenance of code, Fernández-Sáez et al. [17] conducted a SMS. For this purpose, the authors examined 38 published studies to discover empirical evidence by applying the guidelines of [25]. As a result, the authors identified the need for more experiments and case studies in industrial contexts.

In the particular research field of UML-driven software performance engineering, Garousi et al. [18] conducted a SMS to systematically categorize the current state of the art. Thereby, the authors applied the guidelines provided by Kitchenham and Charters [25] and Petersen et al. [44]. Among others, the authors identified emerging trends in this specialized research field based on a set of 90 (from 114 identified) papers published between 1998 and 2011 [18].

Torre et al. [56] deliver a comprehensive summary of UML consistency rules (regarding the different diagram types) by performing a SMS including 94 primary studies published until December 2012. For their SMS, they used a total of seven search engines and followed the guidelines of Kitchenham [25]. Related research works include, e.g., a SLR on UML consistency management [33] covering an earlier publication period (2001–2007), as well as a SLR about the quality of UML diagrams [37]. Finally, there exist prior works on empirical evidence related to UML in general, e.g., a SLR [9] and a SMS [47], which consider papers on UML properties and features published until 2008.

In the area of Software Product Lines (SPL), a SMS on business process variability is conducted by Valença et al. [57]. This SMS includes 80 primary studies and considers one empirical study on a hierarchical representation method for UML 2.0 activity diagrams. They based their work mainly on the surveys presented in [8, 44] as well as on SMS best practice as introduced in [28].

All of these related works have in common that they do not consider SysML as main topic of the survey and that they apply other techniques than we follow in our mapping study. However, they represent interesting related work, not least because UML provides the basis for SysML.

2.3 SMSs and SLRs applied to UML profiles

Ameller et al. [1] classify UML and UML profiles used to specify functional and non-functional requirements based on a SMS to assess the state of the art in the development of service-oriented architectures using model-driven development. The authors selected and analyzed 129 papers by adopting the guidelines presented in [25] and those described in [28, 44]. There are related SMSs investigating the alignment of requirements specification and testing, such as the one presented in [5]. In [52], the authors conducted a survey to examine the use of UML profiles for testing Web services composition.

In the research field of domain-specific languages (DSLs), Nascimento et al. [13] perform a SMS to identify the most popular application domains of DSLs. The authors categorize 1440 (from 4450 identified) primary studies by applying the guidelines described in [25, 44]. The technique of UML profiles is mentioned in 21 publications of their catalog. An extensive SLR in the specialized research area of model-driven security was conducted by Nguyen et al. [39], where the authors also consider UML profiles (e.g., UMLSec, SecureUML, etc.) for the definition of security-oriented DSLs. In addition, Souag et al. [55] surveyed UML-based extensions for modeling security in the field of security requirements engineering.

The UML profile SysML is addressed as a topic in a mapping study that investigated usability requirements elicitation [41]. The study was conducted based on the guideline presented in [25]. The authors formulated a sub-question on notations to elicit usability requirements, and they identified model-based notations and natural language as the most widely used notations in SE. There are similar SLRs related to this topic, such as the one presented in [2], which covers model-driven requirements engineering.

Regarding model-based requirements specifications, Rashid et al. [48] investigated how UML, SysML, and MARTE profiles have been used to specify aspects of embedded systems in the context of early design verification by considering papers published between 2008 and 2015. In an additional SLR on tool selection in model-based systems engineering, Rashid et al. [49] classified selected research work into different categories like “modeling category,” where modeling aspects of embedded systems using UML and its profiles SysML and MARTE were discussed. In addition to model-based or model-driven requirements engineering and specification, SysML as a topic was also investigated in the field of model-based testing, as in the work presented in [54]. Wortmann et al. [62] explore in their SMS the state of the art of using modeling languages for model-based systems engineering of smart factories. The authors found that SysML and its variants play a key role as a modeling technique for realizing Industry 4.0 approaches.

In the research field of systems engineering, several SLRs include specific research questions concerning UML as well as its profiles SysML and MARTE. For instance, Guessi et al. [20] conducted a SLR on the topic of describing software architectures for systems of systems (SoS). The authors’ second research question targets the techniques that have been used for describing SoS. They identified that most primary studies use UML or SysML as semi-formal architecture description languages.

In the recent past, SMSs were applied to safety and security topics in the research field of systems engineering. For instance, Nguyen et al. [40] conducted a SMS covering primary studies that focus on several SysML profiles like SysML-Sec. Other SMSs, such as those presented in [16, 22], only touch on SysML in their explanations and findings.

There are many further SMSs addressing UML-based approaches and UML profiles (e.g., SysML), for example, (i) an SMS on functional safety conducted by [7], (ii) a survey on SPL evolution presented by [32], and (iii) two SMSs on SPL testing conducted by [15, 38].

2.4 Synopsis

In this section, we related existing research to our mapping study. The presented research includes guidelines for conducting SLRs and SMSs such as the work of Kitchenham et al. [25, 27], Petersen et al. [44, 45], and Kuhrmann et al. [31]. We discussed works including empirical studies, case studies, and surveys on UML and UML profiles, in particular as applied in the domains of software engineering and systems engineering. The conducted studies and mentioned surveys investigate the research fields of requirements engineering, embedded systems in the context of early design verification, model-based systems engineering, security engineering, performance engineering, and software testing, as well as, e.g., the quality and usability of UML and UML profiles.

All of these studies and surveys have in common that they do not consider SysML exclusively and that they apply other guidelines than we follow in our mapping study. For instance, the presented SLRs answer detailed research questions but they give no evidence on various aspects for SysML for systems and software engineering. However, they represent interesting related work and provide relevant entry points to our own mapping study.

3 Research method

As research method, we used the previously introduced SMS, which makes it possible to cover and classify publications in a specific area. In our study, we focus on the abstracts of publications, published in the period from 2005 to 2017, that have SysML as their main topic.

The process for conducting this SMS is shown as a SysML activity diagram in Fig. 1, which is mainly based on the guidelines introduced in Petersen et al. [44]. We modified this mapping process by adapting the last two activities.

Fig. 1 Activity diagram of the systematic mapping process [44]

Our systematic mapping process consists of five steps (see Fig. 1). It starts with the activity of defining research questions. The output of this activity is a set of appropriate research questions that define the review scope for the next step, in which we conduct a literature search. The output of the search comprises all publications related to the previously defined research questions. The next step is the screening of those publications in order to select the relevant ones. These relevant publications are the input for the activity called “classification using abstracts,” where we categorize the relevant publications by their abstracts based on the research type facets introduced by Petersen et al. [44] (see Table 1). We extend this activity by additionally classifying the abstracts based on systems engineering phases related to the VDI guideline 2206 [58] and on contribution types as introduced by Shaw [53]. As output, we get classified abstracts of the selected publications, which we use as input for the last activity, “mapping of papers.” After this final step, we get a systematic map, which enables us to extract the main findings related to our research questions.

In the following, we describe four (Sects. 3.1–3.3) of the five SMS activities based on the research topic of our survey. Afterward, in Sect. 4, we present the final activity “mapping of papers.”

All data (i.e., retrieved results, search strings, screened papers, classifications) can also be found on figshare at https://figshare.com/s/871aa0c03aa18eb3edf6.

3.1 Activity 1: defining research questions

In this subsection, we define our research questions to specify the review scope of the mapping study and we provide an insight into the intentions behind these questions.

  • RQ 1: What are the bibliometric key facts of SysML publications?

    The intention of this research question is to find out (i) the number of SysML publications that were contributed in the period from 2005 to 2017, (ii) the type of those publications (e.g., article, book chapter), (iii) the main venues where the publications have been submitted, and (iv) the main research background (i.e., communities) of these venues.

  • RQ 2: Where are the scientific communities of SysML located and are there main contributors, who scientifically promote SysML topics?

    The intention is to identify and analyze scientific communities working on topics of SysML, e.g., we are interested in the location of these communities. Moreover, we address the question if there are more single authors working on SysML topics, or rather (small) research groups. For instance, we are interested in the number of publications and their authors to identify those publications published by one and the same author. Last but not least, we consider the number of citations of each of the publications to identify the relevance for the respective community. By doing this analysis, we want to find out if there exists a huge network spanning over the world which is working on SysML approaches, or not.

  • RQ 3: Which research type facets do the identified publications address?

    The main intention is to categorize the different publications by a solid and already well-established schema ([31, 59]). Therefore, we use the research type facets introduced by Petersen et al. [44] as described in detail in Sect. 3.3 (see Table 1). Based on these type facets, we want to find out in which research contexts SysML topics are used, e.g., validation, evaluation, etc.

  • RQ 4: What are the key aspects of applying SysML in the classified publications?

    In addition to assigning the publications to type facets, we are interested in gaining deeper insight into the research contribution of those publications. This research question aims to identify (i) in which phase of the engineering process [58] SysML is used, and (ii) the contribution type [53] of the publications.

Table 1 Research Type Facet [44, p.4]

3.2 Activities 2 and 3: conducting search and screening of publications

After identifying our research questions, the next activity is the definition of appropriate keywords to find all published papers regarding topics about SysML.

Conducting search In contrast to existing work (see Sect. 2), we do not want to cover just a single aspect of SysML. Our aim is to provide an overview of all published papers. Thus, we decide to search for the following keywords:

  • SysML

  • “Systems Modeling Language” (case insensitive)

  • “System Modeling Language” (case insensitive)

There are many different digital literature libraries available on the Web for conducting a literature search. We have opted for the following four established ones:

  • Scopus (www.scopus.com)

    One of the largest abstract and citation databases of peer-reviewed literature.

  • ACM Digital Library (http://dl.acm.org/)

    ACM is a research, discovery, and networking platform where a collection of full-text articles and bibliographic records can be found.

  • IEEE Xplore Digital Library (http://ieeexplore.ieee.org)

    IEEE Xplore provides a full-text access to technical literature in engineering as well as technology.

  • DBLP (http://dblp.uni-trier.de/)

    The computer science bibliography database dblp offers open bibliographic information on computer science journals and proceedings.

In our piloting phase, we got more than 2000 papers resulting from the conducted search process in these libraries. In order to obtain more precise results regarding our intention to find out the state of the art of research on SysML in academia, we decided to restrict the search string by the following criteria:

  • Publication in the period from 2005 to 2017: The first SysML specification, v0.9, has been online since January 2005. Thus, we use this year as the starting point of our systematic mapping study. Since the survey was conducted in late 2017/early 2018, we decided to define 2017 as the end date.

  • Title: In order to obtain publications that are specifically about SysML, and not only papers mentioning SysML as related work, we restrict the search query to publications where the previously defined keywords appear in the title. This decision should ensure that only publications that focus on SysML are included in our result set.
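The keywords and the two restrictions above can be read as a simple filter over bibliographic records. The following Python sketch is illustrative only: the record fields (`title`, `year`) and the helper name `matches_search` are assumptions, all keywords are matched case-insensitively for simplicity, and the actual queries were issued through the search interface of each library.

```python
import re

# Keywords from the search definition; "systems?" covers both
# "System Modeling Language" and "Systems Modeling Language".
# Assumption: every keyword is matched case-insensitively.
TITLE_PATTERN = re.compile(r"sysml|systems? modeling language", re.IGNORECASE)


def matches_search(record, first_year=2005, last_year=2017):
    """Return True if a record satisfies the period and title restrictions."""
    in_period = first_year <= record["year"] <= last_year
    in_title = TITLE_PATTERN.search(record["title"]) is not None
    return in_period and in_title


# Hypothetical records, for illustration only.
records = [
    {"title": "A SysML profile for factory automation", "year": 2012},
    {"title": "UML consistency rules revisited", "year": 2012},
    {"title": "Early notes on the Systems Modeling Language", "year": 2004},
]
selected = [r["title"] for r in records if matches_search(r)]
```

Only the first hypothetical record passes both restrictions: the second lacks a keyword in the title, and the third falls outside the period.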

We updated our result set several times to obtain a complete set of all relevant publications for answering our research questions. In addition, during the revision process, we made several updates to find any further publications published in 2017. The final state of our result set, aligned with all libraries, was last checked on 21 January 2019.

Screening of publications For screening the publications, we defined the following exclusion criteria:

  • Duplicates

  • Papers:

    • without available abstract

    • without an English, German, or French abstract

    • without any context to the language SysML

      For instance, SysML as abbreviation for “System Machine Learning.”

    • with similar abstracts

      Some papers are covering different development stages of a project, and therefore, their abstracts are identical or have been just slightly extended. We deleted the older or shorter version and always kept the newer or longer version in the result set.

    • with identical abstracts

      There are papers with identical content and abstracts; however, they have been published at different venues (e.g., conferences and journals). We decided to leave one of them in the results set and deleted the other publication.

  • Books: Books are deleted because they are not peer-reviewed (e.g., A Practical Guide to SysML [B4]). The whole list of retrieved books can be found in Appendix B.

  • Theses: Theses often cover several different aspects and therefore can be assigned to different type facets. In addition, most theses are also (partly) published as conference or journal papers and would be duplicates. Thus, we removed them from the result set. The list of excluded theses can be found in Appendix B.

Based on these exclusion criteria, we double-checked (extractor/checker) all extracted papers in order to ensure that there is consensus on all findings. After performing this screening process, our result set comprises 579 papers. For these papers, we additionally considered the number of citations provided by Google Scholar (see Sect. 4.2). The overall list of the 579 publications is provided in Appendix A.
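Parts of the screening can be expressed mechanically, which the following Python sketch illustrates. It is not the procedure used in the study, where screening was a manual extractor/checker process. The record fields (`abstract`, `language`, `kind`), the similarity measure from `difflib`, and the threshold of 0.9 are assumptions made for illustration. The criterion "without any context to the language SysML" requires human judgment and is not covered here.

```python
from difflib import SequenceMatcher

ALLOWED_LANGUAGES = {"en", "de", "fr"}  # English, German, or French abstracts
EXCLUDED_KINDS = {"book", "thesis"}     # books and theses are excluded
SIMILARITY_THRESHOLD = 0.9              # hypothetical cut-off, not from the study


def similar(a, b):
    """Return True if two abstracts are identical or nearly identical."""
    return SequenceMatcher(None, a, b).ratio() >= SIMILARITY_THRESHOLD


def screen(records):
    """Apply the automatable exclusion criteria and return the kept records."""
    kept = []
    # Visit longer abstracts first so that the extended version is the one kept.
    ordered = sorted(records, key=lambda r: len(r.get("abstract") or ""), reverse=True)
    for rec in ordered:
        abstract = rec.get("abstract")
        if not abstract:
            continue  # no abstract available
        if rec.get("language") not in ALLOWED_LANGUAGES:
            continue  # abstract not in an accepted language
        if rec.get("kind") in EXCLUDED_KINDS:
            continue  # book or thesis
        if any(similar(abstract, k["abstract"]) for k in kept):
            continue  # identical or similar abstract already kept
        kept.append(rec)
    return kept
```

Sorting by abstract length is a simplification of the rule "keep the newer or longer version"; a fuller sketch would also compare publication years.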

3.3 Activity 4: classification using abstracts

According to the guidelines of Petersen et al. [44], it is sufficient to search only the abstract of a publication for categorization. In order to get a deeper insight of the research context of those publications and for a better mapping, we decide to apply the research type facets of Petersen et al., as presented in Table 1, already in this phase of the SMS. This means that we deviate from the original mapping process by using the research type facets as classification schema to categorize the abstracts of the selected publications. Thus, we modified the activity of “keywording of abstracts” in that we use an already established classification schema.

Besides the assignment of abstracts to these research type facets, it is important to find out in which engineering phase SysML is mainly used and to which contribution type the publications belong in order to gain deeper insight into the research field of the selected publications. For this purpose, we examine the different topics of the papers by analyzing the keywords of the abstracts and cluster the publications based on systems engineering phases and contribution types. In this respect, we adapted the mapping process introduced by Petersen et al. [44] by making a more fine-grained categorization, as we describe in the following:

Systems engineering phase Based on the V-model related to the VDI guideline 2206 [58], we distinguish the following phases:

  • Requirements: Defining the requirements and system properties such as the scope of functions and interfaces.

  • Design: Designing the architecture of the system.

  • Implementation: Phase of realization and integration to which simulations and code generators belong.

  • Validation and Verification: The final phase of the V-model to analyze and check the system.

Contribution type On the basis of the types of research results introduced by Shaw [53], we define our categories for the contribution types. Shaw defines seven different types, among which she also distinguishes between different data models (empirical, analytic, qualitative model). In our study, we do not focus on different data models. Thus, we adapt these types for our classification process. To give an overview, we briefly describe our contribution types in the following:

  • Technique: Definition of a method or procedure.

  • Process: Sequence of both interdependent as well as linked procedures.

  • Notation: A formal language or graphical notation to support a method (e.g., SysML Profile) or to map SysML to other languages (e.g., translation).

  • Tool: A specific implemented tool based on a certain technique.

  • Specific Solution: Solution for an application problem, e.g., result of a specific analysis, evaluation, or comparison.

  • Other: For all publications of our result set, which cannot be assigned to one of the contribution types specified above. For instance, this includes publications that use SysML in an educational context, or that compare SysML with other modeling languages.

The outcome of the first four activities of the mapping process is a result set of publications classified based on research type facets, systems engineering phases, as well as on contribution types. This outcome serves as an input to the final activity called “mapping of papers,” which is described in detail in the next section.
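The three classification dimensions can be captured in a small data structure, as in the following Python sketch. The facet names follow the research type facets of Petersen et al. [44]; the class and function names, and the simplification that each paper receives exactly one value per dimension, are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Facet(Enum):
    VALIDATION = "validation research"
    EVALUATION = "evaluation research"
    SOLUTION = "solution proposal"
    PHILOSOPHICAL = "philosophical paper"
    OPINION = "opinion paper"
    EXPERIENCE = "experience paper"


class Phase(Enum):
    REQUIREMENTS = "requirements"
    DESIGN = "design"
    IMPLEMENTATION = "implementation"
    V_AND_V = "validation and verification"


class Contribution(Enum):
    TECHNIQUE = "technique"
    PROCESS = "process"
    NOTATION = "notation"
    TOOL = "tool"
    SPECIFIC_SOLUTION = "specific solution"
    OTHER = "other"


@dataclass(frozen=True)
class ClassifiedPaper:
    paper_id: str  # hypothetical identifier, e.g. a key into Appendix A
    facet: Facet
    phase: Phase
    contribution: Contribution


def systematic_map(papers):
    """Count papers per (phase, facet) cell of the systematic map."""
    return Counter((p.phase, p.facet) for p in papers)
```

A counter keyed by pairs of categories is one simple way to obtain the cell sizes of a map; other pairings, such as phase against contribution type, follow the same pattern.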

4 Mapping of papers

In this section, we describe the analysis and results that answer the research questions (RQ1–RQ4). This output is based on the last activity of our adapted SMS. Additionally, we briefly summarize the main findings related to these questions at the end of each subsection.

4.1 RQ 1: bibliometrics of SysML publications

To answer the first research question, we start by analyzing the distribution of published papers in the period from 2005 to 2017. In a second step, we relate the result set to the type of publications. Furthermore, we compile a list of venues where the publications have been submitted. Figure 2 depicts the absolute number of publications per year. The plot shows that this number is subject to fluctuations. We found that in years in which a new version of the SysML standard was published, the number of publications is mostly higher than in the preceding years. The peak in Fig. 2 indicates that most of the publications were published in 2013.

Fig. 2 Number of publications per year in the period from 2005 to 2017 (included number of studies: 579)

Fig. 3 Number of publications per year regarding publication type (included number of studies: 579)

For further analysis of these results, Fig. 3 illustrates the relationship among the number of publications, the years, and the type of publication. The orange line shows that in the period from 2005 to 2017 far more papers were published in conference proceedings (inproceedings) than as articles in scientific journals (see the blue line) or as book chapters (see the green line). Regarding the publication type, 80% of the screened SysML publications were published as inproceedings, 19% as articles, and only 1% as book chapters.

Table 2 Most Prominent Conferences regarding the Number of Publications (at least 8 Papers). Sys. Eng. = Systems Engineering, Sof. Eng. = Software Engineering, Sim. = Simulation, Aut. = Automation

Table 3 Most Prominent Journals regarding the Number of Publications (at least 4 Papers). Sys. Eng. = Systems Engineering, Sof. Eng. = Software Engineering, Sim. = Simulation, Aut. = Automation

Furthermore, we want to find out whether there are a few selected venues promoted by a handful of research communities, or whether the publications are spread over various conferences, workshops, and journals promoted by very different research communities. The result set shows that the papers have been submitted to 316 different venues. For the sake of relevance and clarity, we list in Table 2 those conferences with at least 8 SysML publications and in Table 3 the journals with at least 4 publications. In total, we present 12 different venues. The category column shows the main research community of these venues.

The main venue is the Annual International Symposium of the International Council on Systems Engineering (INCOSE), where more than 30 publications have been submitted in the last 10 years. A possible reason for INCOSE being so prominent may be that the development of the SysML language specification was a collaborative effort between members of OMG and INCOSE. Thus, the INCOSE community is interested in applications and innovations of SysML. Additionally, one of the main research topics of this conference is systems engineering, where SysML plays a key role. From a statistical point of view, we underpin INCOSE’s leading position by considering the statistical distribution of the number of papers per venue listed in Table 2. We get a mean value of 13 (13.1) and a standard deviation of 7 (6.9). Since this descriptive statistical analysis yields a spread of 6 to 20 publications per venue, i.e., the mean plus or minus one standard deviation, we can classify INCOSE as an outlier compared to the other conferences.
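The outlier argument can be reproduced with a few lines of Python. The sketch below flags every venue whose paper count lies outside the band of mean plus or minus one standard deviation. The counts are placeholders chosen for illustration and are not the values of Table 2; whether the study used the population or the sample standard deviation is not stated, so the use of `pstdev` is an assumption.

```python
from statistics import mean, pstdev


def flag_outliers(counts, k=1.0):
    """Return the venues whose count lies outside mean +/- k standard deviations."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    low, high = mu - k * sigma, mu + k * sigma
    return {venue: n for venue, n in counts.items() if n < low or n > high}


# Placeholder counts per conference (hypothetical, not taken from Table 2).
example_counts = {
    "INCOSE": 33,
    "IDETC/CIE": 14,
    "ETFA": 13,
    "Venue D": 12,
    "Venue E": 10,
    "Venue F": 9,
    "Venue G": 8,
    "Venue H": 8,
}
outliers = flag_outliers(example_counts)
```

With these placeholder values, only the INCOSE entry falls outside the band, which mirrors the reasoning in the text.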

The second most prominent venue is the International Design Engineering Technical Conferences (IDETC/CIE), where 14 papers were submitted and presented. This conference is one of the main conferences for design engineering, mostly related to the manufacturing domain, where SysML fits well thematically, since it is often used in the design phase of automation systems (see Sect. 4.3.1).

The third venue is the International Conference on Emerging Technologies and Factory Automation (ETFA), where 13 papers were submitted. Approaches based on SysML are in line with this conference, since the main topic of this conference is complex systems, and among others, one goal of SysML is to support the modeling of systems considering software as well as hardware components.

All other venues listed in Table 2 have at most 12 publications. Even though these venues focus on different subjects, all of them capture the main topics of SysML such as design, simulation, and complex systems. With regard to journals (see Table 3), the listed ones all deal with systems engineering or software engineering topics, whereby the journal Systems Engineering (with 11 publications) has the most published articles with a focus on SysML.

The distribution curve across all these publications (most prominent venues: conferences, journals) in the time frame from 2005 to 2017 regarding the main research communities is shown in Fig. 4. Regarding the number of publications per year, it is obvious that most contributions were published in the field of systems engineering.

Fig. 4 Number of publications per year regarding the research fields of the 12 most prominent venues (included number of studies: 579)

Based on the information provided in the abstract of each publication, we classified the publications into various application fields in a double-checking process (extractor/checker). Publications that did not clearly belong to a specific application field were discussed in the group. If no unambiguous assignment was possible even after this discussion, the paper was marked as not classified. After these two steps, 229 papers unfortunately remain unclassified in our result set. Figure 5 shows the application domains that have at least 10 publications. In summary, SysML is most strongly applied in the production area, followed by the aerospace sector (both aircraft and space applications), which is closely followed by the application field of mechatronics. In addition, SysML is frequently used for systems engineering modeling in the areas of automotive and energy.

Fig. 5 Application domains with at least 10 publications (included number of studies: 579)

4.2 RQ 2: research communities and main contributors of SysML topics

In a first step, we analyzed the number of authors and their publications. We identified in total 1167 authors, of whom 30 are single authors without relationship to any other author of the result set. Twenty-seven of these single or “non-related” authors have published only one publication with a SysML topic. The “related” authors have at least one relationship to an author, who also has published a paper about SysML. Figure 6 illustrates the number of publications per author.

Fig. 6. Number of publications per author (included number of studies: 579)

Fig. 7. Affiliation of authors with at least 10 papers (included number of studies: 579)

Fig. 8. Number of authors per country with at least 5 authors (included number of studies: 579)

It should be noted that most of the related as well as non-related authors (836 in total) have published only a single publication about SysML. However, 13 authors worked more closely on the topic and wrote at least 10 papers each. The affiliations of these authors are shown in Fig. 7. It is worth mentioning that these 13 authors belong to eight different institutions. Some of them, like Hammad, Mountassir, and Chouali, work together in the same research group, whereas other prominent ones like Paredis, Hause, Vogel-Heuser, and Soares publish on behalf of their own research groups.

For deriving the distribution of related and non-related authors over the world, we took a look at their affiliation to a country. Thereby, we found out that most of the authors are from the USA, followed by France, and Germany (see Fig. 8). Figure 9 illustrates the distribution of authors from a continental perspective. Most of the authors are from Europe (48.8%) followed by North America (23.7%).

Fig. 9. Percentage of numbers of authors per continent (included number of studies: 579)

Fig. 10. Connections between authors (created by Gephi, included number of studies: 579)

Fig. 11. Biggest network between authors with bridge builder (created by Gephi, included number of studies: 579)

In a next step, we analyzed the relationship among these authors to get an overview of networks between them. For this network analysis, we used the free tool Gephi (footnote 4). The results of this analysis are shown in Fig. 10. It illustrates all links among the 1167 authors. A link exists as soon as one author worked with another author on the same publication. Based on Fig. 10, we identified that there are several research networks for SysML, but not a single big one. For a deeper analysis, we took a closer look at the largest network in the entire graph. This research network consists of 61 authors and is shown in Fig. 11. In this figure, we only name the so-called “bridge builders,” who are the authors Paredis from the Georgia Institute of Technology in Atlanta (USA), Friedenthal from the Lockheed Martin Corporation in Fairfax (USA), and Canedo, who is working at the Siemens Corporation Research in Princeton (USA). These three authors are the anchor points linking the research networks across the world.
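
The link rule described above (two authors are connected as soon as they share a publication) can be illustrated with a minimal Python sketch. It is not the Gephi workflow used in the study: the author lists are hypothetical, and treating a "bridge builder" as an author whose removal splits a network (an articulation point) is our assumption, since the text does not give a formal definition.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(publications):
    """Build an undirected graph: authors are nodes, shared papers are links."""
    graph = defaultdict(set)
    for authors in publications:
        for author in authors:
            graph[author]  # register single authors as isolated nodes
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def research_networks(graph):
    """Return the connected components of the graph, largest first."""
    seen, networks = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        networks.append(component)
    return sorted(networks, key=len, reverse=True)


def bridge_builders(graph, network):
    """Authors whose removal would split the given network (assumed definition)."""
    bridges = set()
    for candidate in network:
        reduced = {
            author: graph[author] - {candidate}
            for author in network if author != candidate
        }
        if reduced and len(research_networks(reduced)) > 1:
            bridges.add(candidate)
    return bridges


# Hypothetical author lists, one per publication (not taken from the study).
papers = [["A", "B"], ["B", "C"], ["C", "D"], ["E", "F"], ["G"]]
graph = coauthor_graph(papers)
networks = research_networks(graph)
largest = networks[0]
print(len(networks), sorted(largest), sorted(bridge_builders(graph, largest)))
```

In this toy input, the single author "G" forms a network of its own, which mirrors the "non-related" authors of the result set.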

Fig. 12. Distribution of citations of publications by year (included number of studies: 579)

Fig. 13. Number of publications per year according to the type facet classification (included number of studies: 579)

Finally, we analyzed the influence of the selected publications on the scientific community. We used Google Scholar for counting the citations, since, apart from Scopus, the other research libraries we used provide no citation counts. To make the distribution of citations more comprehensible and to show which publications are cited most frequently in each year (the outliers), we chose a boxplot for visualization, as shown in Fig. 12. The top three papers [12, 21, 43] are each cited more than 100 times. These three papers focus on the SysML topics simulation, physical systems, and design. These topics are among the most important issues, as the tag cloud shows (see Fig. 14), which we discuss later when presenting the results of RQ 4.
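
To make the notion of outliers in a per-year boxplot concrete, the following Python sketch computes quartiles and outliers for each publication year. The citation counts are hypothetical, and the 1.5 × IQR whisker rule is the common plotting convention, which we assume here because the text does not state which rule was used for Fig. 12.

```python
from collections import defaultdict
from statistics import quantiles


def boxplot_stats(values):
    """Quartiles plus outliers by the common 1.5 * IQR rule (assumed)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Hypothetical (year, citation count) pairs; not data from the study.
records = [(2008, 3), (2008, 5), (2008, 8), (2008, 10), (2008, 120),
           (2009, 1), (2009, 2), (2009, 4), (2009, 6)]

# Group the citation counts by publication year.
by_year = defaultdict(list)
for year, citations in records:
    by_year[year].append(citations)

stats = {year: boxplot_stats(counts) for year, counts in by_year.items()}
for year in sorted(stats):
    print(year, stats[year])
```

A paper with 120 citations in a year whose other papers have ten or fewer shows up as an outlier, which is how highly cited publications stand out in such a plot.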

4.3 RQ 3: classification of SysML publications

For the classification process, we used the definition of categories as described in Sect. 3.3. In a first step, each of us individually categorized the abstracts of the selected publications according to one of the six type facets introduced by Petersen et al. [44] (see Table 1). This classification makes it possible to find out whether SysML is used in the authors' own approaches, in experiments, or in theoretical considerations. In a second step, we discussed the categorizations and potential conflicts in the group. Based on these discussions, each conflicting paper was finally assigned to one category. The results of this classification process are shown in Fig. 13.

Fig. 14. Tag cloud of most important terms (created with http://tagcrowd.com/, included number of studies: 579)

The result set comprises 10 opinion papers, 32 evaluation papers, 53 philosophical papers, and 60 experience papers. We found that philosophical papers are, so to speak, the “pioneers” in the introductory phase of the standard until about 2007. There are negligibly few opinion papers in the result set. The majority of the publications are assigned to the categories solution proposal (185 papers) and validation research (239 papers). This means that most papers present the authors' own approach together with a sample implementation. Figure 13 shows the result set in chronological sequence, which indicates that from 2010 onward validation research and solution proposals become more and more prominent. There are only few evaluation papers, since this category requires a preceding solution implementation and, building on this groundwork, an evaluation in a practical setting with an industry partner.
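
The counts reported above can be turned into shares of the result set with a few lines of Python. The sketch uses only the six numbers given in the text; the variable names are ours.

```python
# Number of papers per research type facet, as reported in the text.
facets = {
    "validation research": 239,
    "solution proposal": 185,
    "experience paper": 60,
    "philosophical paper": 53,
    "evaluation paper": 32,
    "opinion paper": 10,
}

total = sum(facets.values())  # should equal the size of the result set
shares = {name: round(100 * n / total, 1) for name, n in facets.items()}

for name, share in shares.items():
    print(f"{name:20} {facets[name]:4d} {share:5.1f} %")
```

The six categories add up to the 579 included studies, and validation research together with solution proposals accounts for 424 of them.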

4.4 RQ 4: key aspects of applying SysML

In addition to the categorization of publications based on research type facets, we want to detect the key aspects of applying SysML. For this purpose, we first created a tag cloud based on all abstracts of the publications in our result set (German and French abstracts were translated). Figure 14 shows this tag cloud, which gives an overview of the 50 most important keywords of the abstracts (each occurring at least 50 times). It should be mentioned that we removed conjunctions and keywords like “SysML,” “Systems,” “Modeling,” “Language” as well as “UML,” since we already used them for our general keyword search when conducting the second activity of the mapping process (see Sect. 3.2).

The most frequently used keywords are design, requirements, process, and simulation. From the frequency of these keywords, it can be derived that SysML is most frequently used in connection with design and requirement problems. It should be noted that in this tag cloud every term is counted as often as it occurs in the selected abstracts. In addition, keywords like implementation, verification, and validation appear frequently.
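
The counting rule behind such a tag cloud (every occurrence of a term counts, after removing the search keywords and conjunctions) can be sketched in Python as follows. This is not the tagcrowd.com implementation: the tokenization, the short list of function words, and the sample abstracts are assumptions made for illustration.

```python
import re
from collections import Counter

# Terms removed before counting: the search keywords named in the text plus
# a small, assumed list of conjunctions and other function words.
EXCLUDED = {"sysml", "systems", "modeling", "language", "uml",
            "and", "or", "but", "the", "a", "an", "of", "for", "in", "to", "is"}


def keyword_frequencies(abstracts, min_count=1, top_n=50):
    """Count every occurrence of each remaining term across all abstracts."""
    counts = Counter()
    for abstract in abstracts:
        words = re.findall(r"[a-z]+", abstract.lower())
        counts.update(w for w in words if w not in EXCLUDED)
    frequent = [(w, c) for w, c in counts.most_common() if c >= min_count]
    return frequent[:top_n]


# Hypothetical abstracts for illustration only.
abstracts = [
    "SysML is used for the design and simulation of systems.",
    "Requirements and design are modeled in SysML; design is verified.",
]
top_terms = keyword_frequencies(abstracts, min_count=2)
print(top_terms)
```

With `min_count=50` and `top_n=50`, the same function would apply the thresholds mentioned for Fig. 14.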

Fig. 15. Distribution of publications by systems engineering and contribution (included number of studies: 579)

For an even more detailed analysis of the application of SysML topics in the selected publications, we clustered the result set according to systems engineering phases and contribution types, as defined and described in Sect. 3. In the following, we give a summary of the result set analyzed based on engineering phases:

  • Requirements: In this initial phase of the engineering process, SysML is used to describe system requirements. The representation of requirements is enhanced by a graphical view and by an explicit mapping of the relationships between them. Additionally, traceability is significantly improved by the so-called “requirements tables.” This diagram type, newly introduced in SysML, helps to bridge the gap between documents written in natural language and modeled use cases. SysML is also used for modeling non-functional requirements. Besides the requirement diagram, the parametric diagram is used to formally describe design requirements for verification and validation purposes.

  • Design: We found that in the design phase, SysML is often used to gain a better understanding of the system and to improve interoperability. In many publications, SysML is used to get a detailed picture of the designed system. Increasingly, SysML is used (i) as a modeling language for hardware systems, (ii) for concurrent design processes, (iii) for mechanical concept designs, and (iv) for solving aerospace development problems. Additionally, to fulfill special design requirements, SysML is extended by profiles, used in combination with, e.g., MARTE, or mapped to other models.

  • Implementation: In the implementation phase, SysML is often used in combination with other languages like SystemC, Modelica, or DEVS to support the implementation of an executable architecture that provides a feasible systems engineering solution. Generally, SysML models are used as a basis for the structural and behavioral description of systems. Based on SysML models, executable code is generated by code generators, or model transformations are performed with model transformation languages like QVT.

  • Validation and Verification: Regarding the V&V phase, we identified that different approaches deal with (i) model checking for the assessment and evaluation of performance characteristics, (ii) generating automated test cases out of models, and (iii) reliability analysis. The formalization of SysML models allows building frameworks for the verification and validation of systems design.

Regarding the research results of the contribution type, there are different research fields addressed in the result set, briefly described as follows:

  • Technique: There are a lot of different techniques presented in the publications. Most of them deal with (i) efficient modeling of requirements (functional and non-functional), (ii) performing parametric analysis of complex systems, and (iii) verification of designed models.

  • Process: It could be identified that the support of the development process of systems stands in the foreground. Most of the presented approaches deal with the development of requirements up to the entire design phase, whereas only few publications address the process beyond the design phase.

  • Notation: We found out that in relation to language engineering, most of the publications of the result set deal with SysML profiles. There are extensions and profiles for (i) facilitating the verification of non-functional quantitative requirements, (ii) improving the application of SysML to complex systems, and (iii) using SysML in the automation, mechatronic system, or embedded system domain. In addition to profiles, there are approaches focusing on translation like transformation to Petri Net or Matlab/Simulink. SysML is also used in combination with OCL, OPM, or MARTE.

  • Tool: Approaches in this category mostly engage in the development of tools, e.g., to create and version SysML models. There are, for example, requirements modeling tools based on SysML and also tools integrating SysML in a process and design optimization framework. Additionally, there exist approaches that use SysML in combination with simulation frameworks or engines like fUML (footnote 5) and James II (footnote 6).

  • Specific Solution: There are some specific solutions based on SysML, for example, for space systems, automotive systems, or embedded systems. It can be said that the focus of these solutions is on describing the special requirements of the respective projects.

  • Other: The main aspects for assigning publications to the contribution type called “other” are: (i) the comparisons of SysML to other modeling languages, (ii) the analysis of the usability of SysML diagrams like requirement view and parametric diagram, and (iii) teaching systems modeling in SysML.

We combined the results for the systems engineering phases and the contribution types and visualized them in Fig. 15. The distribution shows that most of the publications deal with problems in the design phase, followed by the V&V phase. To realize their approaches, the authors mostly develop their own techniques and notations.
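
Combining the two classifications amounts to a cross-tabulation of engineering phase against contribution type. The Python sketch below shows the idea on a handful of hypothetical classification records; the category names follow the text, but the records and the resulting counts are invented for illustration and do not reproduce Fig. 15.

```python
from collections import Counter

# Hypothetical classification records: (engineering phase, contribution type).
classified = [
    ("Design", "Technique"),
    ("Design", "Notation"),
    ("Design", "Technique"),
    ("Validation and Verification", "Technique"),
    ("Requirements", "Tool"),
]

cells = Counter(classified)                             # count per (phase, type) cell
per_phase = Counter(phase for phase, _ in classified)   # row totals
per_type = Counter(kind for _, kind in classified)      # column totals

for (phase, kind), n in sorted(cells.items()):
    print(f"{phase:28} {kind:10} {n}")
```

The row and column totals correspond to the per-phase and per-type observations made in the text, while the individual cells show where the two dimensions meet.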

Fig. 16. Formalism transformation graph (FTG) of SysML publications (included number of studies: 579)

There are many papers in our result set dealing with SysML extensions or transformations to other languages, techniques, tools, and concepts. Therefore, we have analyzed the information provided in the abstracts for creating a formalism transformation graph (FTG) [34], and additionally based on the same principle, we created a formalism extension graph (FEG).

The FTG in Fig. 16 shows the various transformations of SysML to other languages, techniques, tools, and concepts for different application scenarios such as simulation, verification, analysis, and extracting code (footnote 7). In addition, the FEG in Fig. 17 shows the different extensions of SysML used in the approaches, techniques, and methods introduced and presented in the papers of our result set (footnote 8). Besides the shown transformations and extensions, there are two publications describing linking techniques for SysML to other languages, one to Relax and one to Simulink.

Fig. 17. Formalism extension graph (FEG) of SysML publications (included number of studies: 579)

It can be summarized that most of the SysML publications are directed toward individual approaches for the design or validation of systems. In most cases, established languages, mechanisms for extension, and transformations are used. To illustrate these main findings, we give an overall view in Fig. 18 where we show the systematic map of SysML publications regarding type facets, systems engineering phases, and contribution types. This figure presents the interplay of all the probed categories and their classification as output of the last activity of the systematic mapping process (see Sect. 3, Fig. 1).

5 Threats to validity

To identify the threats to the validity of our SMS, we follow the four basic types of validity threats according to Wohlin et al. [61]. We address each of these threats in the following subsections.

5.1 Conclusion validity

Conclusion validity takes care of issues that might arise when drawing conclusions and whether the SMS can be repeated. According to Wohlin et al. [61], the main focus is to draw a correct conclusion regarding relations between the design and outcome of the study. In the given SMS, threats to conclusion validity include:

  • Subjective measures, such as the manual categorization of abstracts to the research type facets of Petersen et al. [44].

  • Low statistical power, due to the restricted amount of identified publications (e.g., a few publications may influence the ranking of prominent contributors).

  • Fishing (searching) for specific results, since the results are influenced by the chosen selection of publications (see internal validity).

An additional threat to the conclusion validity of an SMS is publication bias. Publication bias occurs when studies with non-significant findings are either not submitted by their authors or rejected by reviewers and/or editors, which could be a risk with respect to our research type facets. For instance, opinion papers might be less frequent because they are more often rejected and become either unofficial technical reports or unpublished studies. To counteract this risk, we use different databases with various scopes.

As mitigation strategy against subjective measures, the papers of the result set were classified by each of us based on the strategy introduced by Petersen et al. [44], presented in Table 1. Subsequently, we discussed these classifications among ourselves. Discrepancies that occurred were considered in more detail and discussed in the group before we re-classified the affected papers. Our mitigation strategy against the low statistical power is, once again, the use of four different digital libraries to obtain the most complete possible set of papers focusing on SysML as research topic. A comparison with a sampling method such as the one introduced in [30] would be interesting in order to see whether the same publications would be found. Unfortunately, this investigation goes beyond the scope of this article.

5.2 Internal validity

Fig. 18. Systematic map of SysML papers (included number of studies: 579)

Threats to internal validity address issues that indicate a causal relationship, such as hidden factors. This phenomenon is also known as spurious correlation. Therefore, the main goal is to guarantee that the methods used in the SMS cause the outcome of the survey. It should be mentioned that factors which impact the internal validity are also significantly influencing the process of the research subjects’ (i.e., publications’) selection. For a better understanding of the internal validity regarding our SMS, we describe in more detail the two influencing factors, selection and instrumentation, based on Wohlin et al. [61], in the following:

  • Publication selection based on:

    • Keywords: only the titles of publications were searched for the following keywords: ‘SysML’ or ‘System[s] Modeling Language’.

    • Time frame: restricted from 2005 to 2017, since the first draft of the SysML specification was published in 2005. The idea of UML for Systems Engineering was already issued in 2003, but under a different name.

    • Literature repositories: we took into account four different literature repositories, namely Scopus, ACM Digital Library, IEEE Xplore Digital Library, and DBLP.

    • Publication language: only publications with English, German, or French abstracts were considered, even though the repositories provided additional abstracts satisfying the keywords and the time frame, such as Chinese or Spanish publications.

    • Manual filtering: we deleted duplicates, books, and theses, as well as publications without an abstract or without a research context related to SysML.

  • Instrumentation caused by design of artifacts:

    This includes, for instance, the timeliness and completeness of the literature repositories, i.e., the question of which venues are covered by those libraries. Articles that have already been published may appear with a delay, as in the case of post-proceedings, and are therefore not yet available online.
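
The automatable part of the publication selection criteria listed above can be sketched as a simple filter in Python. This is an illustration under assumptions, not the procedure actually used in the study: the `Record` fields, the language codes, and the duplicate check on normalized titles are hypothetical, and the manual check for a research context related to SysML is not covered.

```python
import re
from dataclasses import dataclass

# Title keywords from the text: 'SysML' or 'System[s] Modeling Language'.
TITLE_PATTERN = re.compile(r"SysML|Systems? Modeling Language", re.IGNORECASE)
ABSTRACT_LANGUAGES = {"en", "de", "fr"}   # English, German, French
EXCLUDED_KINDS = {"book", "thesis"}


@dataclass(frozen=True)
class Record:
    title: str
    year: int
    abstract: str        # empty string if the repository provides none
    abstract_lang: str   # hypothetical ISO 639-1 code
    kind: str            # e.g. "article", "book", "thesis"


def select(records):
    """Apply the selection criteria and drop records with duplicate titles."""
    seen, selected = set(), []
    for r in records:
        key = r.title.strip().lower()
        if (TITLE_PATTERN.search(r.title)
                and 2005 <= r.year <= 2017
                and r.abstract
                and r.abstract_lang in ABSTRACT_LANGUAGES
                and r.kind not in EXCLUDED_KINDS
                and key not in seen):
            seen.add(key)
            selected.append(r)
    return selected


# Hypothetical records for illustration only.
records = [
    Record("A SysML profile for X", 2012, "text", "en", "article"),
    Record("A SysML Profile for X", 2012, "text", "en", "article"),   # duplicate
    Record("UML in practice", 2010, "text", "en", "article"),         # no keyword
    Record("System Modeling Language basics", 2004, "text", "en", "article"),
    Record("SysML handbook", 2015, "text", "en", "book"),
    Record("SysML en español", 2016, "texto", "es", "article"),
]
selected = select(records)
print([r.title for r in selected])
```

Each criterion is a separate condition, so relaxing one of them (for example, the time frame) shows directly how the restriction influences the size of the result set.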

Our mitigation strategy to address risks of publication selection and instrumentation was to avoid too tight restrictions by considering alternatives. For instance, (i) three different keywords based on our mapping scope were used for the search process, (ii) a time frame was applied that started with the first draft of SysML, and (iii) four broad-based literature repositories were taken into account for conducting the search. In contrast to [63], we use the four libraries ACM, IEEE, DBLP, and Scopus and not, for example, SpringerLink. However, SpringerLink references are included in DBLP and Scopus and therefore implicit in our mapping study. In addition, Scopus contains many publications in the field of systems engineering that do not appear in the other libraries. Thus, by this mitigation strategy, the result set may cover a representative set of relevant publications.

Regarding manual filtering, a certain bias remains with respect to publications with heterogeneous titles and abstracts but identical content. We discussed this issue in the group. However, this uncertainty remains open due to the method we have chosen for this SMS based on Petersen et al. [44].

5.3 Construct validity

Construct validity concerns the relationship between theory and observation. According to Wohlin et al. [61], construct threats to validity cope with issues that might arise during research design. Thus, it should be checked if the used concept is sufficient. There are two kinds of threats to construct validity, which are (i) design threats such as mono-operation bias, mono-method bias, or confounding constructs as well as levels of constructs, and (ii) social threats such as hypothesis guessing, evaluation apprehension, and experimenter expectancies [61]. It should be mentioned that social threats do not apply to non-personal subjects (such as publications); however, they may be relevant regarding the authors of this mapping study [61].

In the given SMS, threats to construct validity include:

  • Mono-method bias: the study is mainly based on the systematic mapping process introduced by Petersen et al. [44]. In this context, the mitigation strategy was that two independent research groups have worked cooperatively in this study. In doing so, the first literature review has been independently carried out by each group.

  • Confounding constructs and levels of constructs: for instance, in the case of categorization, there is more than only one type facet applicable (e.g., validation research vs. solution proposal). In cases where the levels of applicable type facets are relevant, we selected the most fitting category based on objective aspects and discussion within the group.

  • Hypothesis guessing: since the authors of this article are familiar with systems engineering and SysML, some outcomes might be expected such as increasing publications over the years, or close relationships among research groups. We minimized this risk by using an open research design where we have generated knowledge instead of only checking it.

5.4 External validity

The external validity is concerned with “generalization,” i.e., whether the result of a study can be generalized outside the scope of the study or not. For this type of validity, there are three main risk types [61]: (i) interaction of participants and treatment, (ii) interaction of environment/setting and treatment, and (iii) interaction of history/timing and treatment. However, in the presented SMS, we do not aim for generalization. Given our scope (keywords, time frame, etc.), the SMS aimed for completeness; however, no extensive literature survey can ever claim to be complete. Our SMS is concerned with scientific research on SysML and cannot be generalized to closely related research fields. Although some conclusions could be generalized to a broader topic (e.g., lack of evaluation research studies), we did not draw such general conclusions.

6 Conclusion and outlook

In this article, we report on our findings regarding the investigated research topics on SysML over the last thirteen years by performing a systematic mapping study. We found that initially most of the publications were published in systems engineering venues, but since 2013, the research interest in SysML topics has moved more toward software engineering. It may be concluded that this shift of interest results from the fact that in 2013 Industry 4.0 initiatives started to implement their visions such as CPPS, IoT, IIoT, and others. Therefore, SysML has been very strongly represented in the production application area since that time. It also seems that the research interest in SysML topics is stronger in Europe than on other continents, since the Industry 4.0 vision started in Germany. However, similar initiatives, known under the umbrella term “advanced manufacturing” [10], have also started in Asia and the USA, and they likewise stimulate research on software engineering and SysML.

It can be summarized that out of the nine SysML diagram types, the following ones are mainly used: requirement diagram, parametric diagram, activity diagram, state machine diagram, block definition diagram, and internal block diagram. It turned out that the two newly introduced diagram types (requirement diagram and parametric diagram) are accepted and frequently used by the academic research community. SysML is well established as a modeling language for designing, analyzing, and verifying complex systems. However, many researchers customize SysML for their purposes and therefore define their own profiles, since SysML still seems too generic for some domain-specific tasks (e.g., SysML4Modelica [42, 50] and SysML4Mechatronics [4, 24], to mention just a few approaches). An additional finding is that SysML lacks operational semantics. Some approaches aim to close this gap, such as fSysML [3], which is similar to fUML (a foundational subset of UML for executable UML models).

Toward SysML v2

The OMG is currently working on version 2 of SysML (abbreviated SysML v2). Based on the first insights from the draft SysML v2 Requirements (footnote 9), it becomes apparent that the main challenges regarding the usage of SysML, which we have identified and discussed in the presented mapping study, have also been acknowledged in the current work of the standardization group. For instance, SysML v2 is intended to expand the requirement diagram by formal definitions of non-textual requirements in order to make these requirements more general and subject to automated validation. Additionally, the draft addresses the issue of ambiguous operational semantics of SysML, trying to solve this ambiguity similarly to the fUML initiative. Enhancements are also planned to add a timing component to models, which is an important issue, e.g., when modeling continuous systems in combination with discrete systems.

Based on the presented SMS and its results and main findings, we identified the following research directions for future work.

Research direction 1: life cycle support

The results show that there is only limited support when using SysML in the implementation phase, and very limited support for describing the whole life cycle of a system from design until operation and back again, for implementing so-called “liquid models” [36]. Therefore, a future research direction is to exploit and adapt SysML for supporting the execution and analysis of systems during runtime and to align operational data with design models.

Research direction 2: modeling hybrid systems

Most of the selected publications consider either discrete or continuous challenges when designing systems [6, 23]. This means that very rarely hybrid solutions in systems design are taken into account [19]. Therefore, further investigations should be undertaken for defining formal semantics for SysML to close the gap when combining discrete and continuous modeling and simulation.

Research direction 3: operational semantics for SysML

Currently, there is no support, e.g., to shift property specification and verification tasks up to the model level. There is still a rule-based operational semantics missing to ensure a step-wise, state-based semantics, e.g., to describe a finite execution trace through a sequence of changes. In this context, a future research direction is to define a rule-based operational semantics for SysML, e.g., based on foundations done in the context of fUML.

Research direction 4: deeper analysis of the publication corpus for further research questions

Our result set provides a good foundation for a deeper analysis of specific topics related to SysML in particular research fields. For example, the contribution category notation can be further differentiated into various language engineering aspects (e.g., profiling, translation to other languages, etc.). Based on such an analysis, it is possible to characterize similarities and differences among various approaches.