1 Introduction

Component-based software engineering (CBSE) is a common approach in the development and evolution of contemporary software systems. However, in CBSE, developing a new component internally (in-house) is not necessarily the best option (Wohlin et al., 2016). Thus, practitioners are very often asked to choose between different component sourcing options (CSOs). But what are the factors that affect a practitioner’s decision to choose one CSO over another? In other words, how do practitioners prioritize the attributes of a component when they have to decide whether to “buy” or “make” a new component?

Prioritization is a procedure of critical importance in decision making. In software engineering, it is encountered whenever multiple attributes have to be considered in order to make a decision. However, human subjectivity introduces variation when different people independently prioritize the same set of attributes. This has led to the adoption of voting schemes in which stakeholders express their relative preferences for the attributes in a systematic and controlled manner. To gather practitioners’ preferences, we used cumulative voting (CV), also known as the 100-point method or the hundred-dollar ($100) test, described by Leffingwell and Widrig (Leffingwell & Widrig, 2003). The results were analyzed using compositional data analysis (CoDA).

There is little research on which attributes of a component are of primary importance when multiple attributes are weighed to decide which component sourcing option is most appropriate in a given case. Since CSO decisions are crucial, the reasons behind them deserve further investigation. Understanding the sources of variation between decision makers across different CSOs in CBSE may help optimize the decision process and consolidate opinions with respect to prioritization. In the present work, we focused on the attributes that practitioners typically compare when they add a new component to their products or replace an existing one. The products are software-intensive systems and thus entail component complexity. To this end, an anonymous, cross-domain industrial survey on practitioners’ decision making when choosing between CSOs was conducted (Borg et al., 2019). The questionnaire was web-based and consisted of both open-ended and closed-ended questions. The practitioners were asked to choose between different CSOs and were allowed to choose more than one.

The present work is an extension of our previous work (Chatzipetrou et al., 2018). In this study, we aimed to further investigate practitioners’ differing views on the prioritization of attributes. The results of the hundred-dollar ($100) test are coded as variables and statistically analyzed in order to find differences or agreements in views and correlations with other inherent characteristics of the practitioners, i.e., their role, amount of working experience, level of education, maturity of the product they work with, and size of their organization. This information was collected via the same survey (Borg et al., 2019). The selection of inherent characteristics was based on data availability. For instance, the education type was included in the analysis, but its domain was not, since the survey answers were too diverse and the resulting data too sparse. Thus, we included the inherent characteristics that are both meaningful and sufficiently represented in our sample. The focus of this work is twofold: first, to investigate whether there are any patterns behind the views on CSO decisions for a particular type of practitioner, product, or organization; and second, to understand whether the views of particular types of practitioners, products, or organizations change at specific levels of experience, education, product maturity, or company size.

The paper is structured as follows: Sect. 2 outlines the related work. Section 3 describes the research methodology and discusses the basic principles of CoDA along with various challenges related to its application. Section 4 presents the results of descriptive statistics, non-parametric tests, and the CoDA framework applied to the survey data. Finally, Sect. 5 provides conclusions and directions for future work.

2 Related work

Component-based software systems require decisions on the origins of the components to be acquired. A component origin is an alternative option of where to acquire a component. A recent systematic literature review of CSO selection (Badampudi et al., 2016) investigated decision criteria, decision-making methods, and evaluations of the decision results. It highlighted that CSO comparisons mainly focused on in-house vs. COTS and COTS vs. OSS. In a recent case survey (Petersen et al., 2017), 22 case studies of how practitioners choose between CSOs were investigated. One of the conclusions was that the most frequent trade-offs are made between in-house vs. COTS, in-house vs. outsourced, and COTS vs. OSS. In-house was the most favored option; however, the evaluation of the decisions showed that many of them were perceived as suboptimal, indicating a need to improve both the decision-making process and its outcomes.

In their survey, Borg et al. (Borg et al., 2019) identified the most important challenges related to CSO selection, which fell into three types: managerial, functional, and non-functional (quality-related). To the best of our knowledge, there have not yet been any attempts to identify the significance of these challenges.

Several primary studies discuss in-house vs. COTS CSO decisions, e.g., (Brownsword et al., 2000; Li et al., 2006a). In (Cortellessa et al., 2008), a framework was presented to support the decision to buy components or build them in-house. The authors in (Li et al., 2006b) studied decisions made during the integration of COTS vs. OSS components and identified both significant differences and commonalities.

Cumulative voting (CV) is a well-known prioritization technique used for decision making in various areas. In software engineering, CV has been used in areas such as requirements engineering, impact analysis, and process improvement (Regnell et al., 2001; Berander & Wohlin, 2004). Prioritization is performed by stakeholders (users, developers, consultants, marketing representatives, or customers) with different perspectives or positions, who respond to questionnaires designed specifically for the purposes of prioritization. CV has been proposed as an alternative to the Analytic Hierarchy Process (AHP), and its use continues to expand to areas such as requirements prioritization and the prioritization of process improvements (Leffingwell & Widrig, 2003; Firesmith, 2004).

In (Regnell et al., 2001), CV is used in an industrial case study in which a distributed prioritization process is proposed, observed, and evaluated. The stakeholders prioritized 58 requirements with $100,000 to distribute among them (the large amount of “money” was chosen to cope with the large number of requirements). In (Staron & Wohlin, 2006), CV was used in an industrial case study on the choice between language customization mechanisms. In (Hatton, 2008), CV is one of four prioritization methods examined, evaluated, and recommended for certain stages of a software project. In (Chatzipetrou et al., 2010), 18 interviewees were asked to prioritize 25 aspects using CV by distributing 1000 imaginary points among them. Each interviewee prioritized the 25 aspects twice: under the organizational perspective and under the self-perspective. The data were collected during an empirical study on the role of impact analysis (IA) in the change management process at Ericsson AB in Sweden. CoDA has also been used in the analysis of software effort phase distribution (Chatzipetrou et al., 2015; Chatzipetrou et al., 2012).

In the authors’ previous work (Chatzipetrou et al., 2018), CoDA was used to visualize the inherent characteristics of the practitioners. This paper extends that work with a deeper investigation into which attributes are the most important in the decision process related to CSOs. Box plots provided insights into the most important attributes in each selection case. For this purpose, an exploratory study was rigorously conducted with regard to CSO selection and the inherent characteristics of the participants, which led to the visualization of the relevant results.

3 Research methodology

3.1 Research questions

In this paper, we used the experience from our former studies on CV and on the selection of different CSOs in order to investigate why practitioners chose one CSO over another and why they chose specific combinations of CSOs. A thorough study was conducted based on their choices. Moreover, we aimed to discover whether there were any trends among the practitioners based on their inherent characteristics, e.g., their current role or the number of years they have been active in the industry. Therefore, in the present study, we investigate the reasoning behind decision making in component selection based on the practitioners’ inherent characteristics. The main contribution is to understand and explain the decision-making process of practitioners in CSO selection. The methodology is applied to real survey data in order to draw useful conclusions about the practitioners’ decision processes.

Our work was driven by the following research questions (RQs):

  • RQ1: What matters the most to industry practitioners when selecting CSOs?

We wanted to explore what matters the most to industry practitioners when selecting CSOs for their existing or new projects. In order to investigate which information is the most important input for their decision, we used a set of attributes (defined and presented in Table 1) and asked the practitioners to prioritize those attributes using the CV technique.

  • RQ2: Is the decision process regarding the CSO affected by the practitioners’ known characteristics (i.e., role, working experience and education)?

Table 1 Attributes used for prioritization

We wanted to explore if the decision process regarding the CSO was affected by the practitioners’ characteristics. The available characteristics from this study are defined and presented in Sect. 4.3. We mainly investigated and explored trends and peculiarities among our population by using a powerful descriptive tool specially designed for this type of data: the biplot.

The above RQs are the starting point of our investigation; they drive our research and help us gain further understanding of the decision-making process in CSO selection.

3.2 Description of the data set

An anonymous, cross-domain, industrial survey was conducted that aimed to identify the relationship between practitioners’ decision making and the CSOs they chose. The survey questionnaire was web-based and consisted of both open-ended and closed-ended questions. The practitioners were asked to choose between four different CSOs and were free to choose more than one if they believed it to be necessary.

The CSO decisions can be summarized in the following four alternatives (Petersen et al., 2017; Borg et al., 2019):

  • Software developed internally (in-house): The company develops the component internally. Development is still considered in-house when it is distributed across different locations, as long as it takes place within the company. The source code is developed and remains inside the same company.

  • Software developed outsourced: Another company develops the component on behalf of the company that wants to obtain the component. Usually, the source code is delivered as part of the contract agreed upon between the two companies.

  • Commercial off-the-shelf (COTS) software: The company buys an existing, pre-built component from a software vendor. The source code is not available to the buyer.

  • Open-source software (OSS): The company integrates a pre-built, existing component that has been developed by an open-source community. The source code is publicly accessible.

Practitioners were asked to choose between the four CSOs above and to indicate which information was the most important input for their decision process. The attributes were chosen in a joint research effort involving several senior researchers from the project research team (the Orion research team). In addition, an external senior software engineering researcher was invited to review the chosen attributes, and a native English speaker reviewed them from a language perspective. The attributes’ names and descriptions were refined to avoid potential ambiguities, and particular care was taken to ensure that all practitioners would understand the attributes in the same way. In particular, a detailed description was included for each attribute in order to avoid misunderstandings. At the next evaluation stage, a pilot run was conducted: we invited 15 independent researchers to act as test pilots and evaluate the entire survey, and we used their feedback to refine the attribute descriptions. For instance, an “Other” category was added.

Finally, 12 attributes were chosen (Table 1 presents the attributes in the order in which they appeared in the survey). The practitioners were asked to prioritize the 12 attributes using CV by distributing 100 imaginary points. In total, 157 practitioners responded. The complete description and design of the survey are available in (Borg et al., 2019).

3.3 Data analysis

a) Descriptive statistics and box plots

Descriptive statistics involve the computation of simple summary statistics such as the minimum and maximum values, the mean, the standard deviation, and the median of the data. They are computed for the whole data set as well as separately for each combination of CSO selections, and they are accompanied by graphical representations such as radial bar charts and box plots.
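As a minimal sketch, assuming each response is stored as a Python dict mapping attribute names to points (a hypothetical layout, not the survey's actual data format), the per-attribute summaries, together with the two perspectives later used in Table 2 (number of practitioners who chose an attribute and total points received), could be computed with the standard library. The function name `describe_attribute` and the example ballots are illustrative only.

```python
import statistics


def describe_attribute(ballots, attribute):
    """Summarize the points one attribute received across CV ballots.

    `ballots` is a list of dicts mapping attribute name -> points; an
    attribute missing from a ballot is treated as 0 points.
    """
    points = [ballot.get(attribute, 0) for ballot in ballots]
    return {
        "chosen_by": sum(1 for p in points if p > 0),  # respondents giving it any points
        "total": sum(points),                           # total points received
        "min": min(points),
        "max": max(points),
        "mean": statistics.mean(points),
        "stdev": statistics.stdev(points),              # sample standard deviation
        "median": statistics.median(points),
    }


# Hypothetical ballots, for illustration only.
ballots = [
    {"cost": 50, "support": 30, "size": 20},
    {"cost": 100},
    {"cost": 40, "support": 60},
]
summary = describe_attribute(ballots, "cost")
print(summary["chosen_by"], summary["total"], summary["max"], summary["median"])
# 3 190 100 50
```

Calling the same function on a subset of `ballots` (for example, only the respondents who selected a given CSO combination) yields the per-combination statistics.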

b) Cumulative voting

CV, also known as the 100-point method or the hundred-dollar ($100) test, described by Leffingwell and Widrig (Leffingwell & Widrig, 2003), is a simple, straightforward, and intuitively appealing voting scheme in which each stakeholder is given a constant amount (e.g., 100, 1000, or 10,000) of imaginary units (or imaginary currency) to vote in favor of the most important attributes. The amount assigned to an attribute thus represents the respondent’s relative preference for it (and therefore its prioritization) in relation to the other attributes. The points can be distributed in any way the stakeholder desires: a stakeholder may put the entire amount on a single attribute of critical importance, or distribute it equally among many, or even all, of the attributes.

However, since the results of the hundred-dollar ($100) test sum to a constant (100 points, or 1 after dividing by the total), they cannot be treated as independent variables, and since, as proportions, they are restricted to the [0,1] interval, normality assumptions are invalid. A methodology suitable for the analysis of proportions is CoDA. It has been widely used in the analysis of material composition in various scientific fields such as chemistry, geology, and archeology, and its principles also fit the analysis of data obtained by CV.
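To make the closure step concrete, the sketch below checks a single $100 ballot and divides it by its total. It assumes that each respondent had to spend exactly 100 points (the `budget` parameter) and that ballots are stored as Python dicts; the function name and data layout are hypothetical, not taken from the survey tooling.

```python
def ballot_to_composition(ballot, attributes, budget=100):
    """Convert one cumulative-voting ballot into proportions summing to 1.

    `ballot` maps attribute -> points; attributes without points get 0.
    Raises ValueError for negative points or for a ballot that does not
    spend exactly `budget` points.
    """
    points = [ballot.get(a, 0) for a in attributes]
    if any(p < 0 for p in points):
        raise ValueError("points must be non-negative")
    if sum(points) != budget:
        raise ValueError(f"ballot spends {sum(points)} points, expected {budget}")
    # Closure: dividing by the constant total makes the parts sum to one.
    return [p / budget for p in points]


attributes = ["cost", "support", "longevity", "size"]
print(ballot_to_composition({"cost": 60, "support": 40}, attributes))
# [0.6, 0.4, 0.0, 0.0]
```

Because every closed ballot sums to one, raising the share of one attribute necessarily lowers the others, which is exactly the dependence that rules out treating the attribute shares as independent variables.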

c) Compositional data analysis

CoDA is a multivariate statistical analysis framework for vectors of variables with a specific dependence structure: the values of each vector sum to a constant. Usually, for convenience, the values are divided by that constant so that each vector sums to one. The important point here is that the data are constrained to the [0,1] interval; therefore, techniques designed for samples from real Euclidean space cannot be applied directly.

In prioritization questionnaires using the $100 test, the data essentially represent the proportions of the overall importance allocated to each of the aspects examined in a study. The relative importance of the aspects is represented by their ratios, so CoDA is an appropriate framework for their study. Historically, Karl Pearson (Pearson, 1897) posed the problem of interpreting correlations of proportions, while the milestone for this type of statistical analysis is the pioneering work of John Aitchison (Aitchison, 1982; Aitchison, 2003). The freeware package CoDaPack3D (Comas-Cufí & Thió i Fernández de Henestrosa, 2011) was used for the compositional data analysis.

Data from CV questionnaires have some special characteristics that can cause problems in the analysis, the most important being the presence of zeros. When the number of attributes is large and the respondents are few, the data matrix is usually sparse, with many zeros. The presence and meaning of zeros in a data set can be of crucial importance. There are two types of zeros: essential zeros imply the complete absence of an attribute, whereas rounded zeros stem from the measuring instrument, which did not or could not detect the attribute. In our study, zero proportions are interpreted as rounded zeros, since they result from a lack of recording. However, zeros cause problems of interpretation when we consider relative importance, because log-ratios involving a zero part are undefined. To address this problem, we used a simple method proposed by (Dunn, 1959), known as the multiplicative replacement strategy: every zero value is replaced with a very small value (0.01), and the remaining parts of each vector are adjusted so that it again sums to 1. The advantages of multiplicative replacement are discussed extensively in (Martín-Fernández et al., 2000; Martín-Fernández et al., 2003a; Martín-Fernández et al., 2003b).
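A minimal sketch of multiplicative replacement follows. It assumes that the composition has already been closed to sum to 1 and interprets the replacement value 0.01 on that proportion scale (an assumption, since the scale is not stated explicitly). The function name is hypothetical.

```python
def multiplicative_replacement(composition, delta=0.01):
    """Replace zero parts of a composition that sums to 1.

    Every zero becomes `delta`; every non-zero part is multiplied by
    (1 - delta * number_of_zeros). The result still sums to 1, and the
    ratios between the originally non-zero parts are preserved.
    """
    n_zeros = sum(1 for x in composition if x == 0)
    if delta * n_zeros >= 1:
        raise ValueError("delta is too large for the number of zero parts")
    scale = 1 - delta * n_zeros
    return [delta if x == 0 else x * scale for x in composition]


replaced = multiplicative_replacement([0.6, 0.4, 0.0, 0.0])
print(replaced)
```

Here the two zeros become 0.01 each and the non-zero parts are scaled by 0.98, so the ratio 0.6 : 0.4 = 1.5 between the non-zero parts is unchanged; this ratio preservation is the main reason for rescaling multiplicatively rather than subtracting from the parts.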

To treat the data with “common” statistical techniques, we needed to transform them. Aitchison (Aitchison, 2003) proposed the centered log-ratio (CLR) transformation, which maps a raw proportional data set to real space while retaining its correlation structure.
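For reference, the CLR of a composition x = (x_1, ..., x_D) is clr(x)_i = ln(x_i / g(x)), where g(x) is the geometric mean of the parts. A minimal Python sketch follows; the function name is ours, and strictly positive parts are assumed, i.e., zeros have already been replaced.

```python
import math


def clr(composition):
    """Centered log-ratio (CLR) transform of a strictly positive composition.

    Each part is replaced by the log of its ratio to the geometric mean of
    all parts, which maps the composition to real space.
    """
    if any(x <= 0 for x in composition):
        raise ValueError("CLR requires strictly positive parts; replace zeros first")
    logs = [math.log(x) for x in composition]
    log_geo_mean = sum(logs) / len(logs)  # log of the geometric mean
    return [v - log_geo_mean for v in logs]


z = clr([0.5, 0.25, 0.25])
print([round(v, 3) for v in z])
# [0.462, -0.231, -0.231]
```

The transformed values always sum to zero, and multiplying the input by any positive constant (e.g., using raw points instead of proportions) leaves the result unchanged, because the constant cancels in every ratio.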

To visualize the results of the CoDA, we used the biplot (Gabriel, 1971; Gabriel, 1981; Aitchison & Greenacre, 2002), a straightforward and useful tool for exploring trends and peculiarities in data. Its basic elements are lines (or rays) and dots: rays represent the variables, and dots represent the respondents. An important characteristic of the plot is the angle between the rays. The length of a ray reflects the variability of the corresponding variable; longer rays depict higher variances. A link is an imaginary line connecting the ends of two rays. It essentially shows the difference between the two (CLR-transformed) variables, i.e., the variability of their log-ratio, so large links show large proportional variation. In terms of interpretation, links are considered more important than rays, since they allow the variables to be examined in a relative and intuitive manner. Finally, the cosine of the angle between two rays approximates the correlation between the CLR-transformed variables. The closer the angle is to 90° or 270°, the weaker the correlation, whereas an angle close to 0° or 180° reflects a strong positive or negative correlation, respectively (Aitchison & Ng, 2005). Extensive presentations of the CoDA framework can be found in (Chatzipetrou et al., 2010; Chatzipetrou et al., 2015; Chatzipetrou et al., 2012).

3.4 Validity threats

The validity threats are distinguished between four aspects of validity according to Runeson and Höst (Runeson & Höst, 2009):

Construct validity reflects the extent to which the operational measures represent the study subject. In the present study, practitioners’ views are measured on a numerical scale. Since the practitioners work for different organizations offering different products, their views on component selection may differ. Nevertheless, a deeper analysis of practitioners’ decision processes based on their inherent characteristics was valuable. The major threat to our study was whether our inquiry about previously experienced CSO decisions truly reflects a phenomenon in industry. We addressed this by developing the questionnaire in a joint research effort with several senior researchers, followed by a pilot run with a handful of selected respondents. Furthermore, our list of attributes appears to be rather comprehensive, as the number of “Other” answers is low. Our initial construct captured CSO decisions and component selection as two separate activities, but the construct evolved during the study.

Internal validity concerns the examination of causal relations, which is the intended outcome of our investigation. In our study, we focused on how the different inherent characteristics of the practitioners, e.g., their role or the size of their company, affect the decision process with regard to which CSO is chosen.

Regarding external validity, the study is clearly empirical, and by no means can the findings be generalized to other software development organizations with similar characteristics. The population under study, i.e., practitioners involved in architectural decision making in component-based software evolution, is large and highly heterogeneous. Under these circumstances, selecting a representative sample was of the highest importance. Our survey was not designed to draw strong quantitative conclusions about the general population of practitioners involved in CSO decisions, but rather to identify broader trends. The practitioners who participated in the survey do not constitute a random sample; however, they were approached for their experience and expertise, so their responses are considered valid.

Reliability is concerned with the extent to which the data and the analysis depend on specific researchers: if another researcher were to conduct the same study later, the results should be the same. The data gathered are quantitative and independent of the influence of different research subjects or of researchers’ interpretation. The survey questions were piloted with a set of practitioners who provided feedback and improvement suggestions. The statistical analysis was performed with reliability in mind, with each step documented for potential replication.

4 Results

4.1 Descriptive statistics

a) General results

The results from the descriptive statistics are available in Table 2 and Fig. 1, where Table 2 summarizes the answers of the practitioners in absolute numbers and from two perspectives: the number of practitioners that chose an attribute and the total points an attribute received from all the practitioners. The practitioners were free to select any number of attributes. The attributes are in descending order, based on the practitioners’ choices.

Table 2 Descriptive statistics
Fig. 1 Number of respondents vs. total points selected for each attribute

The analysis showed that cost is considered the most important attribute by the majority of the practitioners when making CSO decisions; it was selected by 121 out of 157 practitioners (77%). Other inputs considered important in the decision process, each mentioned by roughly half of the practitioners, are support of the components (74 out of 157, 47%), longevity prediction (71 out of 157, 45%), and level of off-the-shelf fit to product (62 out of 157, 40%). An interesting finding is that support of the components is the second most popular choice among practitioners; however, it is ranked 4th in total points received, and the maximum number of points it got from a single practitioner is 50. In other words, the practitioners did not assign high values to this attribute, but they still consider it an important factor that should be taken into account. The same holds for longevity prediction, whose maximum is 60 points, although it ranks second in total points received. We can hypothesize that complexity (and also maintenance costs) is likely to increase as more components are added from CSOs. Thus, if one considers cost the most critical attribute, then complexity should matter. However, components are probably seen here as black boxes, and their complexity is not considered important.

On the other hand, size was not popular: it was selected by only 18 out of 157 practitioners (11.5%). This rather surprising finding may suggest a discrepancy in views with regard to the total cost of acquiring a software component and integrating it into the development environment. As pointed out by Jørgensen and Shepperd (Jorgensen & Shepperd, 2007), most research on software cost estimation focuses on the introduction and evaluation of estimation methods. Our hypothesis is that larger components require substantially more cost and effort to be integrated into the development organization on top of their acquisition cost (e.g., purchase cost), a hypothesis that is also supported by previous work on integration and maintenance costs (Abts et al., 2000; Nguyen, 2010). Therefore, we believe that this discrepancy should be further explored by researchers and emphasized to practitioners making CSO decisions.

The number of “Other” answers is also low (18 out of 157, 11.5%), which suggests that the given list of attributes covers the most important criteria practitioners need to evaluate a component. Among the “Other” answers, the most frequent responses concern licensing issues and long-term strategies, such as differentiation in the market and vendor relations. Licensing issues are often highlighted as a primary obstacle by companies that want to utilize OSS components, as is organizational resistance, which can be overcome through increased education and knowledge regarding OSS license types. However, these responses were given by just 18 practitioners and thus do not threaten the choice of attributes included in the survey.

4.2 Results from research questions: RQ1

What matters the most to industry practitioners when selecting CSOs?

We use radial plots and box plots to further investigate which attributes affect the selection of certain CSOs or combinations of CSOs in industry, i.e., how the preferences for specific CSOs affect the prioritization of the 12 attributes. Figure 2 summarizes which CSOs the 157 practitioners typically considered; respondents could select between one and four options (multiple options were allowed for this question). The majority of the practitioners (90%) consider software developed internally, either as the sole option or in combination with other CSOs. Moreover, more than half of the practitioners considered OSS and COTS to be possible options (61% and 54%, respectively). The least considered option (34%) was outsourcing. Clearly, practitioners typically consider more than one CSO. One potential explanation is the rapid pace of technological change and the requirements for maintenance and upgrades, which typically lead to migration and put extra pressure on practitioners to consider alternative CSOs. An interesting aspect to investigate is at which product maturity or software lifecycle stage this happens, and in what types of organizations. Moreover, in the present work, we were interested in studying the specific combinations of CSOs and the attributes on which practitioners based their choices.

Fig. 2 Which CSOs did the 157 practitioners choose?

Figure 3 depicts the choices of the 157 practitioners regarding their selection of CSOs or combinations of CSOs. The results showed that almost 24% of the practitioners considered only one CSO when adding or buying a new component. A strong majority of these (73%) considered only software developed internally. A very small percentage of practitioners would consider only open-source software (16% of this group, or 4% in total) or only outsourced software (8%, or 2% in total), and no practitioner considered COTS as the sole choice for adding new software. The fact that only 4% of respondents would consider the OSS option alone is interesting and worth exploring further. Many software products have a commodity functionality layer and a differentiating functionality layer, which are treated differently (Bosch, 2018). The commodity functionality layer is typically optimized to minimize the total cost of ownership, and selecting OSS components could potentially provide an optimal solution for minimizing this cost (Linåker et al., 2018). It appears that our respondents have not yet uncovered the potential of using only OSS components in the commodity layer, or that there may be other obstacles or regulatory forces (Sulaman et al., 2014) preventing them from doing so, so that they have to consider more than one CSO.

Fig. 3 Which CSOs or combinations of CSOs did the 157 practitioners choose?

More than one quarter (27.5%) of the surveyed practitioners would choose a combination of two CSOs. Among these, about half (13.5% of all practitioners) would opt for software developed internally combined with OSS, and 8.5% would develop software internally and use COTS. A significant number of practitioners (30%) would choose a combination of three CSOs, with 18% of all practitioners choosing to develop software internally while also using OSS and COTS. Finally, 18.5% of the practitioners reported that they would consider all four CSOs when adding or buying new components. Overall, the majority of the practitioners (57.5%, see Fig. 3) consider two or three CSOs. One possible explanation is a lack of competence in the area where the component is needed; e.g., a company needs a database component that is not the core differentiator of the product but is a necessary “enabler” for the differentiating functionality offered to the customers. Building a database solution internally is neither feasible nor economically justified. Another explanation could be time pressure, which forces decision makers to look for alternatives to in-house development. Obtaining the necessary components externally can potentially reduce the time to market to the time needed for integration and system testing.

From Fig. 3, we can conclude that software developed internally is the primary CSO considered by practitioners, either as the sole option or in combination with other CSOs, since the top-ranked choices include it. The figure also shows that, among two-CSO combinations, the most popular option combined with internal software development is OSS (13.5%), followed by COTS (8.5%). Only a very small number of practitioners combine internal software development with outsourcing (5%) or with outsourcing and OSS (4%). This is probably because such combinations could increase the costs of evaluating the quality of the integrated solution and of monitoring the quality of the outsourced solution.

We generated box plots of practitioners’ preferences separately for each combination of CSOs, grouped by the number of CSOs considered. These are presented in Figs. 4, 5, 6, and 7.
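A minimal sketch of this grouping is shown below, under stated assumptions: each response is stored as a pair of the selected CSOs and a points dict (a hypothetical layout), and box-plot quartiles are computed with `statistics.quantiles(..., method="inclusive")`. Plotting libraries may use slightly different quartile conventions, so the values are illustrative rather than a reproduction of the figures.

```python
import statistics
from collections import defaultdict


def box_summaries(responses, attribute):
    """Five-number summary of one attribute per CSO combination.

    `responses` is a list of (csos, ballot) pairs: `csos` is an iterable of
    selected sourcing options, `ballot` maps attribute -> points.
    Returns {frozenset_of_csos: (min, q1, median, q3, max)}.
    """
    groups = defaultdict(list)
    for csos, ballot in responses:
        # frozenset makes the combination order-independent and hashable.
        groups[frozenset(csos)].append(ballot.get(attribute, 0))
    summaries = {}
    for combo, points in groups.items():
        if len(points) == 1:
            q1 = med = q3 = points[0]  # older Pythons need two values for quantiles()
        else:
            q1, med, q3 = statistics.quantiles(points, n=4, method="inclusive")
        summaries[combo] = (min(points), q1, med, q3, max(points))
    return summaries


def group_by_count(summaries):
    """Regroup the per-combination summaries by the number of CSOs considered."""
    by_count = defaultdict(dict)
    for combo, summary in summaries.items():
        by_count[len(combo)][combo] = summary
    return dict(by_count)


# Hypothetical responses, for illustration only.
responses = [
    ({"in-house"}, {"cost": 50}),
    ({"in-house"}, {"cost": 30}),
    ({"in-house"}, {"cost": 10}),
    ({"OSS", "in-house"}, {"cost": 100}),
]
summaries = box_summaries(responses, "cost")
print(summaries[frozenset({"in-house"})])  # (10, 20.0, 30.0, 40.0, 50)
```

Keying the groups by `frozenset` means that, for example, {"OSS", "in-house"} and {"in-house", "OSS"} fall into the same group, which matches the fact that respondents selected options without any order.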

Fig. 4 Which attributes are the most important for the practitioners who chose one CSO?

Fig. 5 Which attributes are the most important for the practitioners who chose two CSOs?

Fig. 6 Which attributes are the most important for the practitioners who chose three CSOs?

Fig. 7 Which attributes are the most important for the practitioners who chose four CSOs?

Figure 4 illustrates how the 12 attributes are prioritized by the practitioners who chose only one CSO when adding or buying components for their new system. Cost is clearly the most important attribute, although how highly it is prioritized varies between cases. Moreover, the results show that different attributes are important for different CSOs. For the practitioners who chose only to develop their software in-house (internally, Fig. 4), cost is the most important attribute; however, they did not assign it high values (maximum 50), and programming language performance and longevity prediction are also highly ranked. Regarding cost, the same is true for the practitioners who chose only open-source software. However, these practitioners also give higher priority to access to relevant documentation and API adequacy.

The results are different for practitioners who only consider outsourcing software development (outsourced, Fig. 4). Cost is again the highest-ranked attribute, but these practitioners assign it more importance. Code quality is also very important, along with support of the component and longevity prediction, whereas the other attributes receive little or no weight from these practitioners. These results are not surprising, since in the outsourcing scenario, support for the delivered component is as important to the practitioners as the quality of the code. At the same time, our respondents seem to be less concerned about the quality of internally developed code than about that of outsourced code. In other words, they believe that internal teams would not have significant quality issues and are confident that these teams can deliver high-quality code. We believe that this assumption may not always hold and that some outsourcing partners could actually deliver higher-quality code than internal developers, mainly due to their experience and expertise.

Practitioners who consider two CSOs also assign the highest values to cost (Fig. 5). However, when they combine in-house development with open-source software, level of off-the-shelf fit to product is their next priority. One possible explanation for this result is that functionality cannot be ordered from an open-source community. OSS communities operate on the meritocracy principle, where the most prominent contributors (individuals or organizations) have the most influence on decisions. Therefore, if an organization uses an OSS component without contributing substantially to it, the organization itself is responsible for ensuring that the component is compatible with its internally developed product. Thus, many software organizations that use OSS components investigate the stability of the interfaces and the health of the OSS ecosystems (Baars & Jansen, 2012) before joining them or using their software. Adherence to standards and code quality are important attributes for the practitioners who consider developing their software in-house in combination with outsourcing or COTS. The importance of adherence to standards suggests that these participants probably take responsibility for certifying the product software and therefore need to ensure that the third party delivers software according to the required standards and with comprehensive documentation. Cost is particularly important to practitioners who combine outsourcing and COTS (Fig. 5d), although this may be due to the small number of respondents in that group.

A similar pattern appears when the practitioners consider three CSOs for their new components. Cost is the most important attribute, but the practitioners did not give it high values (maximum 50). A possible explanation is that even if cost is important, other attributes matter too, namely level of off-the-shelf fit to product, code quality, support of the component, and programming language performance (Fig. 6).

Finally, when the practitioners considered all four CSOs, cost, longevity prediction, and support of the component were the highest-ranked attributes (Fig. 7). One interpretation of these results is that the more options the practitioners have, the more perspectives they include, or perhaps they take into consideration different perspectives that are largely ignored when only one CSO is considered. This underlines the need to support tradeoffs between the various CSOs and to assist decision makers in choosing between them for software components.

4.3 Results from research questions: RQ2

Is the decision process regarding the CSO affected by the practitioners’ known characteristics (i.e., role, working experience and education)?

a) Non-parametric tests

For the next step, we investigated how the inherent characteristics of the practitioners influence their decisions regarding which CSO to choose. For this purpose, the prioritized data was transformed with the CoDA methods described in the previous section (replacement of zeros and CLR transformation), and the Kruskal-Wallis non-parametric test (Kruskal & Wallis, 1952) was applied to the transformed data. The test was performed to compare the distribution of each attribute across the groups defined by each demographic characteristic. Pairwise comparisons were also performed using Dunn’s procedure (Dunn, 1964), with a Bonferroni correction (Dunn, 1964) for multiple comparisons. The results revealed no statistically significant differences (p > 0.05) in the way the practitioners prioritize the 12 aspects across:

  • The different roles of the practitioners (a description of the roles is available in Table 3),

  • Different levels of working experience, and

  • Different levels of education.

Table 3 Description of roles
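The preprocessing and test described above can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: the exact zero-replacement settings of the previous section are not restated here, so we assume multiplicative replacement with a hypothetical fixed delta of 0.5 points. The Kruskal-Wallis H statistic includes the standard tie correction, and its p-value uses the usual chi-square approximation with k - 1 degrees of freedom. Dunn's pairwise procedure with Bonferroni correction is not shown, and all function names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def multiplicative_zero_replacement(parts: Sequence[float], delta: float = 0.5) -> List[float]:
    """Replace zero parts by `delta` and shrink non-zero parts so the total is preserved.

    Assumption: the fixed delta of 0.5 points is illustrative only.
    """
    total = float(sum(parts))
    n_zero = sum(1 for p in parts if p == 0)
    shrink = 1.0 - n_zero * delta / total
    return [delta if p == 0 else p * shrink for p in parts]


def clr(parts: Sequence[float]) -> List[float]:
    """Centred log-ratio: log of each part divided by the geometric mean of all parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of the pooled sample; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for k independent groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    start, rank_term = 0, 0.0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    tie_counts: Dict[float, int] = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    # If every value is tied, H is undefined; report 0 (no evidence of a difference).
    return h / correction if correction > 0 else 0.0


def chi2_sf(x: float, df: int) -> float:
    """Upper tail of the chi-square distribution via the lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower_regularized = math.exp(a * math.log(z) - z - math.lgamma(a) + math.log(total))
    return max(0.0, 1.0 - lower_regularized)


def kruskal_wallis_p(groups: Sequence[Sequence[float]]) -> float:
    """Approximate p-value of H with k - 1 degrees of freedom."""
    return chi2_sf(kruskal_wallis_h(groups), len(groups) - 1)
```

In use, each practitioner's 12 allocations would be passed through `multiplicative_zero_replacement` and `clr`, and then, for one attribute at a time, the clr values would be split by role (or experience, or education) and handed to `kruskal_wallis_p`. Ranking the clr values rather than the raw points means the test compares each attribute relative to the practitioner's whole allocation, which is the point of the CoDA transformation.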

Therefore, to explore trends and peculiarities in our population, we turned to a powerful descriptive tool designed for this type of data: the biplot.

b) CoDA analysis: biplot

1) All practitioners

Figure 8 illustrates the biplot for the prioritization of the 12 aspects by all 157 practitioners. The practitioners are represented by dots, while the rays represent the 12 attributes. The dots are widely spread across the plot, and most of the aspects have long rays (and thus long links too). This indicates high dissimilarity between the practitioners, large variability in the aspects, and some interesting correlations between variables. The biplot clearly depicts the complexity of the decision process and how differently the practitioners prioritize the attributes. More specifically:

  • The longest rays correspond to level of off-the-shelf fit to product, code quality, cost, and longevity prediction, indicating the aspects with the highest variance. In other words, the practitioners allocated values ranging from 0 to 100 to these attributes.

  • The longest links are the ones between cost and access to documentation, between longevity prediction and code quality, and between level of off-the-shelf fit to product and adherence to standards. Therefore, the largest differences, considering all aspects together, are located between these pairs of variables, which also seem to be negatively correlated (since the angles between their rays are nearly 180°). For example, if a practitioner allocates more points to cost, we would expect him/her to assign almost 0 points to access to documentation, and vice versa.

  • The shortest link connects the ray ends of size and complexity, indicating that the distributions of these two aspects are quite similar and positively correlated (since the angle between their rays is nearly zero). In other words, the practitioners tend to allocate similar amounts of points to size and to complexity. Longevity prediction and support of a component seem to be positively correlated as well.

  • Nearly orthogonal pairs of rays (rays that form right angles with each other), such as cost and code quality or cost and level of off-the-shelf fit to product, indicate correlations close to zero. That is, the way a practitioner allocates points to cost is not related, either positively or negatively, to the points allocated to code quality or to level of off-the-shelf fit to product.

  • The angle between support of the component and longevity prediction is close to zero, which indicates a positive correlation. This is not surprising, since longevity is connected with suitable support; otherwise, the organization has to maintain the component itself, which could be costly. The same positive connection exists between API adequacy and level of off-the-shelf fit to product. The API is essentially the interface that connects the component to the software. Hence, if the API is not adequate, the component will not fit the product, regardless of how well the component provides the required functionality.
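The geometric reading of the biplot used in the list above can be checked numerically. In the full-dimensional clr space of a covariance-type compositional biplot, and up to a common scaling factor, the length of a ray is the standard deviation of that attribute's clr values, the length of the link between two ray ends is the standard deviation of the log-ratio of the two attributes, and the cosine of the angle between two rays is the correlation of their clr values; a two-dimensional biplot approximates these quantities. The sketch below computes the exact quantities for one pair of attributes. It assumes zero-free compositions (zeros replaced beforehand), does not draw the plot itself (that would additionally require a singular value decomposition of the centred clr matrix), and uses illustrative function names.

```python
import math
from typing import List, Sequence, Tuple


def clr(parts: Sequence[float]) -> List[float]:
    """Centred log-ratio transform of one zero-free composition."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def _cov(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance (n - 1 denominator)."""
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def ray_link_angle(
    compositions: Sequence[Sequence[float]], i: int, j: int
) -> Tuple[float, float, float, float]:
    """Exact quantities that a clr covariance biplot approximates for attributes i and j.

    Returns (ray_i, ray_j, link_ij, cos_ij):
      ray_i, ray_j: standard deviations of the clr values (ray lengths),
      link_ij: standard deviation of log(x_i / x_j) (link length),
      cos_ij: correlation of the two clr columns (cosine of the angle between rays).
    """
    rows = [clr(c) for c in compositions]
    a = [r[i] for r in rows]
    b = [r[j] for r in rows]
    ray_i = math.sqrt(_cov(a, a))
    ray_j = math.sqrt(_cov(b, b))
    # clr_i - clr_j = log(x_i) - log(x_j): the geometric mean cancels out.
    diff = [x - y for x, y in zip(a, b)]
    link_ij = math.sqrt(_cov(diff, diff))
    cos_ij = _cov(a, b) / (ray_i * ray_j)
    return ray_i, ray_j, link_ij, cos_ij
```

By the law of cosines, the squared link length equals ray_i² + ray_j² - 2·ray_i·ray_j·cos_ij. This is why two rays pointing in the same direction have a short link (the two attributes receive proportional allocations), while two rays at nearly 180° have a long link.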

Fig. 8 All practitioners prioritize all attributes

Regarding the distribution of the practitioners with respect to the prioritized attributes, there are areas of high density as well as areas of low density. This means that groupings of practitioners exist; for example, a high-density group near cost indicates that a considerable number of practitioners assigned a larger proportion of their points to cost.

For the next step, a deeper analysis of the practitioners’ decision process was conducted based on their inherent characteristics: the role a practitioner holds in the company, their general working experience and educational level, the maturity of the product they are working on, and the size of the company they belong to. Each group of practitioners was aggregated by computing the mean value of their preferences. In the following figures, these group means appear as dots with labels in italics.
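The aggregation step can be sketched as follows. The sketch assumes that each practitioner's answer is a list of point allocations summing to 100 and that each practitioner carries one label for the characteristic being examined (role, experience, education, product maturity, or company size); the function name `group_mean_profiles` and the example labels are illustrative. Because every profile sums to 100, each group mean also sums to 100, so the group means are themselves compositions (zeros in a mean would still need replacement before a clr transformation).

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def group_mean_profiles(
    profiles: Sequence[Sequence[float]], labels: Sequence[Hashable]
) -> Dict[Hashable, List[float]]:
    """Arithmetic mean of the point allocations within each group of practitioners."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = defaultdict(int)
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
        # Accumulate the points given to each attribute by members of this group.
        for k, points in enumerate(profile):
            sums[label][k] += points
        counts[label] += 1
    return {label: [s / counts[label] for s in totals] for label, totals in sums.items()}
```

A common alternative in compositional data analysis is the closed geometric mean; the arithmetic mean is used here because the text states that the mean value of the preferences was computed.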

2) Role

Figure 9 illustrates a biplot of the practitioners grouped by their role within the company. Outside employees and employees working with legal issues clearly prioritize the aspects completely differently from the rest of the workforce. More specifically, they need to decide on changing a component based on other issues, e.g., licensing issues or vendor relations.

Fig. 9 Practitioners grouped by their role

On the other hand, practitioners in management roles (strategic and operational), as well as product developers, on average give more weight to level of off-the-shelf fit to product and code quality.

3) Working experience

From Fig. 10, it is clear that employees with little work experience are, on average, more interested in level of off-the-shelf fit to product, access to the relevant documentation, and API adequacy. In contrast, more experienced employees on average focus more on code quality and support of the component.

Fig. 10 Practitioners grouped by their work experience

4) Educational level

Practitioners with a university education on average seem to consider similar attributes in their decision process (i.e., level of off-the-shelf fit to product and access to the relevant documentation). Size also seems to be considered by practitioners with an academic background. Practitioners who attended professional courses or have a trade school education considered attributes related to the development and usage of the new component to be more important when choosing a new component, namely complexity, programming language performance, and code quality (Fig. 11).

Fig. 11 Practitioners grouped by their education

5) Maturity of the product

From Fig. 12, it appears that the practitioners who work on more mature products (more than 15 years) place more emphasis on non-functional attributes (here, size), although cost is still their first priority. On the other hand, practitioners who work on less mature and newly established products (less than 10 years) seem more interested in complexity and API adequacy and focus less on cost.

Fig. 12 Practitioners grouped by the maturity of the product they are working with

6) Size of the company

The results regarding the size of the company a practitioner works for are shown in Fig. 13. Smaller organizations seem to focus more on the development and maintenance of the component (complexity, API adequacy, and access to relevant documentation), whereas bigger organizations focus on properties associated with cost.

Fig. 13 Practitioners grouped by the size of their company

5 Conclusions and future work

The decision whether an organization should develop components internally or acquire them from external sources is crucial. The present study investigates what matters most to industry practitioners during component selection. In this work, we focused on four CSOs: (1) software developed internally (in-house), (2) software development outsourced, (3) COTS, and (4) OSS. The practitioners were free to choose more than one CSO if they considered it necessary. While choosing between the four CSOs, the practitioners had to indicate which information was the most important input to their decision process by prioritizing 12 attributes. Since few studies on CSO selection exist, the main contribution of our work lies in improving the understanding of CSO decision making and selection. First, we performed a descriptive analysis of our data. The results show that cost is clearly considered the most important attribute during component selection. Other important attributes for the practitioners were support of the component, longevity prediction, and level of off-the-shelf fit to product. We then examined our data in depth, focusing on the following two research questions.

  • RQ1: We wanted to explore what matters most to industry practitioners when selecting CSOs for their existing or new projects. The results showed that a number of practitioners consider in-house software development as the sole CSO; however, there is a trend toward considering additional CSOs. A notable proportion of the practitioners (18.5%) included all four options in their CSO decision, i.e., internal software development, OSS, COTS, and outsourcing. In that case, the highest-ranked attributes are cost, longevity prediction, and support of the component, whereas the remaining attributes are ranked similarly by the practitioners, with the exception of API adequacy, size, and other. The order of importance of the attributes differs from case to case.

  • RQ2: We wanted to explore whether the decision process regarding the CSO is affected by the practitioners’ characteristics. The detailed analysis based on the practitioners’ inherent characteristics suggests that smaller organizations and less mature products focus on properties associated with ease of use, development, and maintenance of the component, whereas bigger organizations and more mature products focus more on properties associated with cost. Therefore, smaller companies need support in identifying components that are easier to use in development, while bigger organizations with mature products need, and are looking for, less costly components.

Our research has several implications for research and practice. First, we observed a wide variety of decision processes among the survey respondents, which did not surprise us given the heterogeneous contexts we had to deal with. Moreover, our survey showed that decisions are based on data, and we tried to understand what types of data are used, how they are used, and how the data is translated into actual decisions.

The data gathered in such studies is affected by various sources of variation and is therefore subject to large variability. The statistical analysis of such data can reveal significant differences, trends, disagreements, and groupings between the practitioners, and can constitute a valuable aid for understanding the attitudes and opinions of the respondents and therefore a tool for decision making.

In terms of future work, we intend to focus on providing support to companies in improving their component selection process. Within our work program, we plan to continue research and efforts towards efficient and effective decision making in component-based software engineering.