1 Introduction

Recommendation systems have become an indispensable part of modern digital services, especially for e-commerce, medical, and social media systems. They provide suggestions and recommendations based on user interests. A recommendation system usually uses data sources to develop its components and to train for making appropriate decisions. In the digital business world, a recommender system can predict whether a particular user would prefer an item based on the user's profile. Recommender systems are beneficial to both service providers and users [20]. They reduce the transaction costs of finding and selecting items in an online shopping environment [20].

In this paper we consider the following formal definitions of a recommendation system. Ricci et al. [44] define it as follows: “A recommender system or a recommendation system is a subclass of information filtering system that seeks to predict the rating, suitability or preference a user would give to an item.” Hossain et al. [20] state that “A Recommender System refers to a system that is capable of predicting the future preference of a set of items for a user and recommends the top items.” Moreover, Sridevi et al. [50] underline that “A recommendation engine filters the data using different algorithms and recommends the most relevant items to users.”

The current era is considered a revolutionary period in the field of artificial intelligence (AI). To train AI models, the selection of efficient source code is a critical phase [7]. Most practitioners use source code that is available from public sources, e.g., GitHub, source frog, etc., and recommender systems play an important role in the selection of appropriate source code [7]. A code recommender system (CRS) refers to a system that uses code sources (e.g., GitHub, source frog) and recommends the most suitable source code to developers and researchers. When selecting code, the affectability and reliability of the code are very important [34]. There are critical concerns related to code recommender systems, i.e., code analysis concerning code quality and code implementation capability [34]. Mark et al. [18] underlined the significance of analyzing the code before preprocessing it for feature extraction and implementation. Moreover, Gregorio et al. [45] highlighted the importance of analyzing the complexity, the code size, and the resources available for code implementation. Yamashita and Moonen [55] emphasized the compatibility between the capacity of the development environment and the selected source code. They further mention that, to get successful results, the code should be executed in a compatible development environment.

The importance of CRS in the recent era motivated us to conduct a comprehensive empirical study to explore the key challenges that might hinder the performance of CRS. The objective of the study is twofold: (1) to explore the CRS challenges from the literature and verify them against industry practices; (2) to prioritize the investigated challenges with respect to their significance for CRS using the fuzzy analytic hierarchy process (fuzzy-AHP). The results and analysis of this study provide a prioritization-based taxonomy of the investigated challenges. We believe that the in-depth analysis of CRS challenges will assist industry experts and researchers in addressing the highest-priority challenges and in developing new techniques for the improvement of code recommendation systems. The following research questions were developed to achieve the key objectives of this study:

  • [RQ1] What challenges of code recommendation system are discussed in the existing literature?

  • [RQ2] What is the real-world significance of the code recommendation system challenges?

  • [RQ3] What would be the prioritization-based taxonomy of the investigated challenges?

2 Related work

Recommendations help people decide when they have no prior experience or knowledge about a particular matter. In this technologically revolutionized era, people trust automated recommendation systems [38]. Due to this higher acceptance level, the business industry is motivated to automate its businesses with strong and reliable recommendation systems [24]. We found several studies conducted to improve the performance of recommendation systems, e.g., [16, 39].

As in other areas of life, recommender systems also have significant importance in the selection of source code for the training of artificial intelligence systems. A code recommender system refers to a system that uses source code repositories (e.g., GitHub, source frog) and recommends the most suitable code to developers and researchers [41]. Currently, source code related to the machine learning field receives much attention from the software industry and the academic research community. At the intersection of the research areas “software engineering,” “programming languages,” “machine learning” and “natural language processing,” various communities have come together around “big code” or “code naturalness”, with numerous significant outcomes [44]. In the field of machine learning, researchers mostly need large corpora of code to train an artificial intelligence model and to learn, in a probabilistic manner, about coding practices at a large scale. The primary aim is to deploy the trained model as a useful tool in the required area. However, despite the importance of source code, little research has been conducted to address the problems of code recommendation systems. Mens and Lozano [40] highlighted that the selection of reliable source code from the available sources is an important activity for obtaining fruitful results. Janjic et al. [22] also emphasized the importance of a reliable code recommendation system.

Considering the state-of-the-art literature, and to the best of our knowledge, little empirical research has been conducted to address and highlight the concerns of CRS. We therefore try to fill this gap by conducting a comprehensive empirical study that aims to identify the key challenges that could hinder the performance of CRS. The systematic literature review (SLR) approach was adopted to explore the existing literature and investigate the factors that could be critical challenges for CRS. A questionnaire survey was then used to evaluate the SLR results and capture the perceptions of field experts. The final list of challenging factors was used to develop the prioritization taxonomy using the fuzzy AHP technique. Fuzzy AHP is a widely used approach for multi-criteria problems and has been used in different software engineering research projects [37, 43, 48, 49, 54]. For example, Khan and Shameem [29] used fuzzy-AHP analysis to rank the success factors of the software process improvement paradigm. Similarly, Shameem et al. [47] developed a taxonomy of the factors that could influence agile processes in geographically distributed environments. Moreover, Akbar et al. [3] prioritized DevOps challenging factors using fuzzy AHP. Based on the above discussion, the application of the fuzzy AHP method is justified for this research study.

3 Research methodology

To address the study objectives, three steps were adopted. In the first step, a systematic literature review was conducted to explore the challenges of CRS reported by researchers. In the second step, the findings of the literature review were verified by conducting a questionnaire survey with industry experts. In the third step, fuzzy-AHP was used to prioritize the identified challenges with respect to their criticality for CRS. All steps of the adopted research methodology are presented in Fig. 1 and described in the sections below.

Fig. 1 Proposed research design

3.1 Systematic literature review (SLR)

The SLR is one of the most significant approaches for identifying and interpreting the available research evidence in a formal manner, based on predefined research questions and protocols. In this study, we performed the SLR following the guidelines of [31]. The SLR steps are discussed in the sections below.

3.1.1 Review process planning

3.1.1.1 Research questions

To identify the challenges related to CRS the following research question was developed:

[RQ1] What challenges of code recommendation system are reported in the literature?

3.1.1.2 Database selection

The selection of appropriate data sources is critical for collecting the literature most relevant to the research questions of the study [12]. We considered the following seven databases for data extraction, following the recommendations of Chen et al. [12, 25, 42]: “IEEE Xplore (http://ieeexplore.ieee.org)”, “ACM Digital Library (http://dl.acm.org)”, “Springer Link (link.springer.com)”, “Wiley Inter Science (www.wiley.com)”, “Science Direct (http://www.sciencedirect.com)”, “Google Scholar (scholar.google.com)”, “IET-digital libraries (www.theiet.org)”.

3.1.1.3 Search strings

A search string is a combination of text, symbols, keywords and their alternatives used to extract data from digital repositories [11, 15, 19, 23, 30, 42]. The selected keywords and their alternatives were concatenated using the Boolean “OR” and “AND” operators:

(“barriers” OR “obstacles” OR “hurdles” OR “difficulties” OR “impediments” OR “hindrance” OR “challenges” OR “limitations”) AND (“code recommendation systems” OR “code recommender systems” OR “code filtering systems”).
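The way the keyword groups are combined can be sketched in a few lines of Python. This is only an illustration of the OR/AND concatenation: the helper names `or_group` and `build_search_string` are hypothetical, and individual databases may require their own query syntax.

```python
# Keyword groups copied from the search string above.
challenge_terms = ["barriers", "obstacles", "hurdles", "difficulties",
                   "impediments", "hindrance", "challenges", "limitations"]
system_terms = ["code recommendation systems", "code recommender systems",
                "code filtering systems"]


def or_group(terms):
    """Quote each term, join the terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_search_string(*groups):
    """Concatenate the OR-groups with AND (hypothetical helper)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_search_string(challenge_terms, system_terms)
print(query)
```

Keeping the alternatives in lists makes it easy to add or remove a synonym and regenerate the same string for every repository.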

3.1.1.4 Inclusion and exclusion criteria

The inclusion criteria are the characteristics that a study must have to be included, while the exclusion criteria are the characteristics that disqualify certain material from inclusion in the study. The same approach has been used in other studies in the software engineering domain, e.g., [35, 42, 57].

Inclusion criteria:

  1. Studies published in conference proceedings, workshops, journals, and book chapters.

  2. Selected literature should be in English.

  3. Articles whose findings are directly related to the objective of this study.

  4. The most recent article is considered if two or more studies are of a similar nature or from the same research project.

Exclusion criteria:

  1. Articles outside the scope of CRS.

  2. Studies that did not provide a detailed discussion of CRS.

  3. Studies that did not focus on CRS challenging factors.

3.1.1.5 Study quality assessment (QA)

QA is performed to assess the degree of conformity of the selected primary studies. The checklist questions and the Likert scale used for QA are given in Table 1. The objective of QA is to check the suitability and appropriateness of the selected literature for addressing the research questions of this paper.

Table 1 Selected studies quality assessment criteria

3.1.2 Conducting the review

Study selection and data extraction process

A total of 631 studies were extracted from the selected repositories (Sect. 3.1.1.2) using the search string discussed in Sect. 3.1.1.3 and the inclusion/exclusion criteria given in Sect. 3.1.1.4. We then used the tollgate approach [1] to refine the selected studies and identify the most relevant primary studies. The tollgate approach consists of five phases. Figure 2 shows the steps and flow of the tollgate approach used in this study. After performing the five phases of the tollgate approach, we shortlisted a total of 34 primary studies. Each selected study is tagged as “ST” to differentiate it from other references. All the selected studies are given in Appendix A.

Fig. 2 Selection of primary studies for data extraction

The data were extracted from the 34 finally selected primary studies and synthesized by all the authors of the study. The first and third authors thoroughly reviewed the selected studies and extracted the most relevant material (barriers, titles, and publication year). The extracted data were then reviewed by the second author, who thoroughly checked the findings for inconsistencies and incompleteness. The finally identified themes, concepts and statements were carefully reviewed and classified into 19 compact challenge statements.

Moreover, researcher bias in the data extraction process was addressed by performing an inter-rater reliability analysis using the “non-parametric Kendall's coefficient of concordance” (W) test [1]. Five external experts were invited to participate in the inter-rater reliability analysis: two from King Saud University, Saudi Arabia, one from NetSole, Pakistan, and two from ST Tech, Sweden. The experts were asked to randomly select ten of the 34 primary studies and to perform the step-by-step data extraction process, so that the bias between the authors and the invited experts could be measured. Finally, the value of W was calculated to compare the findings of the authors and the external experts. The value of W ranges from 0 to 1, where 0 indicates no agreement between the findings and 1 indicates complete agreement [33]. In this study, the results (“W = 0.88, p = 0.003”) indicate strong agreement between the data extraction of the authors and that of the external experts. This suggests that the extracted SLR findings are reliable and can be used for further analysis and discussion.
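For readers unfamiliar with the statistic, the following minimal Python sketch shows how Kendall's W can be computed when each rater ranks the same set of items. It assumes the tie-free textbook formula \(W = 12S / (m^{2}(n^{3} - n))\), where m is the number of raters, n the number of items and S the sum of squared deviations of the rank sums from their mean. The rankings are made-up illustrative data, not the study's data, and the significance test (p value) is not covered.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for m raters ranking n items.

    `rankings` is a list of m lists; each inner list holds the ranks 1..n
    that one rater assigned to the n items (no tied ranks assumed).
    """
    m = len(rankings)
    n = len(rankings[0])
    # Total rank received by each item across all raters.
    rank_sums = [sum(rater[i] for rater in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    # Sum of squared deviations of the rank sums from their mean.
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Illustrative (hypothetical) rankings of four items by three raters.
identical = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
opposed = [[1, 2, 3, 4], [4, 3, 2, 1]]
mixed = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]

print(kendalls_w(identical))  # complete agreement
print(kendalls_w(opposed))    # no agreement
print(round(kendalls_w(mixed), 3))
```

Identical rankings give W = 1 and exactly opposed rankings give W = 0, which matches the interpretation of the range stated above.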

3.1.3 Reporting the review

Selected primary studies quality assessment

Quality assessment (QA) was performed to measure the quality of the selected literature and its relevance to the research questions. The QA criteria (Sect. 3.1.1.5) were used to evaluate the quality of each study. The QA results obtained with these criteria are provided in Appendix A. It is noted that 84% of the selected studies scored ≥ 80%, which indicates that the data extracted from the selected primary studies are suitable and appropriate for answering the research questions. The threshold value for the QA score is 50%, which is adopted from other SLR studies published in the software engineering domain [2, 21].

Temporal analysis

The temporal distribution of the primary studies is given in Fig. 3. The publication years of the primary studies range from 2000 to 2020, which indicates that researchers have been working consistently on code recommendation systems. According to Fig. 3, the number of publications per year is highest in 2020, with 5 papers, which indicates that code recommendation systems are currently an active research area.

Fig. 3 Publication years of selected studies

3.2 Empirical study

3.2.1 Development of survey instrument

The questionnaire survey was developed to collect the opinions of experts on the challenges of coding schemes in software development. The questionnaire includes bibliographic details, i.e., organization type, designation, gender, work experience, etc. We added an open-ended section to the questionnaire to elicit any additional challenges from the experts that are not mentioned in the questionnaire. A Likert scale was also provided to mark the importance of each challenge. The five-point Likert scale is “strongly agree, agree, neutral, disagree, and strongly disagree”. It is important to include the neutral option, as it gives space to those survey participants who are not sure about the criticality of a specific challenging factor [17].

3.2.2 Pilot assessment of survey instrument

Pilot assessment is an effective approach to improving the quality of a survey instrument. Lewis-Beck et al. [36] stated the importance of pilot assessment as “a clear and complete survey instrument is significant to collect appropriate responses”. The assessment of the survey instrument (questionnaire) was conducted based on expert opinion. Four qualitative software engineering experts from Western University, Canada, City University of Hong Kong, and Wuhan University, China were invited to assess the appropriateness of the survey instrument. The experts reviewed the questionnaire and provided suggestions and recommendations based on their experience and domain knowledge. They suggested improving the presentation of the questionnaire and recommended presenting the survey variables in tabular form rather than as plain questions, which are more confusing. Moreover, they pointed out multiple grammatical and spelling mistakes. We incorporated all the suggestions, and the updated survey questionnaire is provided in Appendix B.

3.2.3 Data sources

Data sampling and data collection are the key phases of questionnaire survey studies. The snowball technique was adopted to draw the data sample from the targeted population [33]. Snowballing is an appropriate and effective way to approach a large targeted population [6, 46]. The targeted population was approached through professional social media networks including ResearchGate, LinkedIn and WeChat. Moreover, the authors used personal industrial contacts to distribute the questionnaire to practitioners and requested them to share it with others [27, 28, 42, 48]. The data were collected from 87 survey participants between February 2020 and April 2020. The survey responses were manually analysed by the first and third authors, who excluded ten incomplete responses. The remaining 77 responses were used for further data analysis. The detailed bibliographic data of the survey respondents are provided in Appendix-C.

3.2.4 Survey data analysis

The collected responses were analyzed using the frequency analysis method, as it is considered the most appropriate technique for ordinal and nominal data [28, 32]. Other researchers have used the same data analysis approach for studies of a similar nature [33].
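As a rough illustration of frequency analysis on Likert-scale data, the sketch below counts the responses for one challenge and reports percentage shares. The grouping of the five scale points into positive, negative and neutral shares is an assumption made for this example, and the sample responses are invented.

```python
from collections import Counter

# Assumed grouping of the five Likert points (illustrative only).
POSITIVE = {"strongly agree", "agree"}
NEGATIVE = {"disagree", "strongly disagree"}


def frequency_summary(responses):
    """Return the percentage of positive, negative and neutral responses."""
    counts = Counter(response.lower() for response in responses)
    total = len(responses)
    grouped = {
        "positive": sum(counts[label] for label in POSITIVE),
        "negative": sum(counts[label] for label in NEGATIVE),
        "neutral": counts["neutral"],
    }
    return {name: round(100 * value / total, 1) for name, value in grouped.items()}


# Invented responses of ten participants for a single challenge.
sample = ["agree"] * 5 + ["strongly agree"] * 3 + ["neutral"] + ["disagree"]
summary = frequency_summary(sample)
print(summary)
```

Because the data are ordinal, the summary reports counts and shares only and does not compute means of the scale points.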

3.3 Fuzzy set theory and AHP

Fuzzy-AHP was used to rank the investigated challenges with respect to CRS. This section provides a detailed discussion of the key concepts of fuzzy set theory and the AHP approach.

3.3.1 Fuzzy set

Zadeh et al. [56] introduced fuzzy set theory to deal with the vagueness and uncertainty of real-world problems. It also helps to control ambiguities in group decision making. The main contribution of fuzzy set theory is its ability to represent vague data [52]. A membership function in a fuzzy set maps objects to values between ‘0’ and ‘1’. The definition used in this study is given below.

Definition

A triangular fuzzy number (TFN) F is denoted by a triple \((f^{l}, f^{m}, f^{u})\), as shown in Fig. 4. Equation (1) defines the membership function \(\mu_{F}(x)\) of F.

$$\mu_{F}(x) = \begin{cases} \dfrac{x - f^{l}}{f^{m} - f^{l}}, & f^{l} \le x \le f^{m} \\ \dfrac{f^{u} - x}{f^{u} - f^{m}}, & f^{m} \le x \le f^{u} \\ 0, & \text{otherwise} \end{cases}$$
(1)

where \(f^{l}\), \(f^{m}\) and \(f^{u}\) are crisp numbers denoting the lowest, most likely and highest possible values, respectively.

Fig. 4 Triangular fuzzy number
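The membership function in Eq. (1) can be written as a small Python function. This is a minimal sketch; the function name `tfn_membership` and the handling of degenerate TFNs (where two of the three values coincide) are choices made for this example.

```python
def tfn_membership(x, fl, fm, fu):
    """Membership degree of x in the triangular fuzzy number (fl, fm, fu)."""
    if fl <= x <= fm:
        # Rising edge; a degenerate edge (fl == fm) is treated as full membership.
        return 1.0 if fm == fl else (x - fl) / (fm - fl)
    if fm <= x <= fu:
        # Falling edge; a degenerate edge (fm == fu) is treated as full membership.
        return 1.0 if fu == fm else (fu - x) / (fu - fm)
    return 0.0


# Example with the TFN (1, 1.5, 2).
for value in (0.5, 1.25, 1.5, 1.75, 2.5):
    print(value, tfn_membership(value, 1, 1.5, 2))
```

The membership degree is 1 at the most likely value, falls linearly to 0 at the lower and upper bounds, and is 0 outside them.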

The algebraic operations for two triangular fuzzy numbers (TFNs), i.e., Ť1 and Ť2, are given in Table 2.

Table 2 Triangular fuzzy numbers
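Since the body of Table 2 is not reproduced in this text, the following Python sketch assumes the commonly used approximate operations for positive TFNs: component-wise addition and multiplication, and an inverse that swaps the lower and upper bounds. These are consistent with the way Eqs. (5) and (7) are written later in this section. The `TFN` type and the function names are hypothetical.

```python
from typing import NamedTuple


class TFN(NamedTuple):
    """Triangular fuzzy number (lower, most likely, upper)."""
    l: float
    m: float
    u: float


def tfn_add(a: TFN, b: TFN) -> TFN:
    """Fuzzy addition: add the components pairwise."""
    return TFN(a.l + b.l, a.m + b.m, a.u + b.u)


def tfn_mul(a: TFN, b: TFN) -> TFN:
    """Approximate fuzzy multiplication for positive TFNs."""
    return TFN(a.l * b.l, a.m * b.m, a.u * b.u)


def tfn_inv(a: TFN) -> TFN:
    """Approximate inverse: reciprocals with lower and upper bounds swapped."""
    return TFN(1 / a.u, 1 / a.m, 1 / a.l)


t1 = TFN(1, 1.5, 2)
t2 = TFN(1.5, 2, 2.5)
print(tfn_add(t1, t2))
print(tfn_mul(t1, t2))
print(tfn_inv(t1))
```

The bound swap in the inverse keeps the result ordered from lowest to highest value, because the reciprocal of the largest bound is the smallest number.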

3.3.2 Fuzzy AHP

AHP is a commonly used technique for “multi-criteria decision-making” (MCDM) problems. It performs pair-wise comparisons of all alternatives with respect to the selected criteria and thereby provides a decision support tool for both quantitative and qualitative MCDM problems. The key phases of the AHP process are as follows:

  • Phase1: “Decompose the complex decision problem into the hierarchical structure” (Fig. 5)

  • Phase2: “Calculate priority vector at each level of hierarchy with the help of pair-wise comparison.”

  • Phase3: “Compute the consistency ratio of the pairwise comparison.”

  • Phase4: “Calculate the final priority weight for the factors and the sub-factors” (Fig. 5).

Fig. 5 Fuzzy AHP decision hierarchy

However, the conventional AHP approach has some limitations, i.e., “crisp environment, and absence of uncertainty, judgmental scale is unbalanced, selection of judgment is subjective”, which can be addressed by combining AHP with fuzzy set theory. Therefore, the fuzzy analytic hierarchy process (FAHP) is a popular approach for dealing with fuzziness and uncertainty in MCDM problems [5]. FAHP involves multiple decision-makers and captures their judgements as linguistic terms. These linguistic variables are transformed into numerical form using a TFN scale. This approach has been used in other engineering fields to handle the vagueness of fuzzy environments [9]. In this study, the fuzzy AHP approach proposed by Chang [48] is applied, as it is more appropriate and consistent.

In a prioritization problem, let \(X = \{x_{1}, x_{2}, \ldots, x_{n}\}\) be the object set, representing the main categories of challenges, and let \(U = \{u_{1}, u_{2}, \ldots, u_{m}\}\) be the goal set, representing the challenges of each category. According to the approach of [10], each object is taken in turn and extent analysis is performed for each goal \(g_{i}\). Hence, for each object there are m extent analysis values, which can be expressed using Eqs. (2) and (3):

$$F^{1}_{gi} ,F^{2}_{gi} ,...,F^{m}_{gi} ,$$
(2)
$$i = \, 1,2,...,n$$
(3)

where all \(F^{j}_{gi}\) (j = 1, 2, …, m) are TFNs.

The steps of Chang’s extent analysis approach [48] are presented below:

Step 1:

Equation 4 defines the fuzzy synthetic extent value of the ith object as:

$$S_{i} = \sum\limits_{j = 1}^{m} {F^{j}_{gi} } \otimes \left[ {\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{m} {F^{j}_{gi} } } } \right]^{ - 1}$$
(4)

To obtain the expression \(\sum\limits_{j = 1}^{m} {F^{j}_{gi} }\), perform the fuzzy addition operation on the m extent analysis values:

$$\sum\limits_{j = 1}^{m} {F^{j}_{gi} } = \left( {\sum\limits_{j = 1}^{m} {f^{l}_{gi} ,\sum\limits_{j = 1}^{m} {f^{m}_{gi} } ,\sum\limits_{j = 1}^{m} {f^{u}_{gi} } } } \right)$$
(5)

and to obtain the expression \(\left[ {\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{m} {F^{j}_{gi} } } } \right]^{ - 1}\), perform the fuzzy addition operation on the \(F^{j}_{gi}\) \((j = 1, 2, \ldots, m)\) values as follows:

$$\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{m} {F^{j}_{gi} } } = \left( {\sum\limits_{i = 1}^{n} {f^{l}_{i} ,\sum\limits_{i = 1}^{n} {f^{m}_{i} } ,\sum\limits_{i = 1}^{n} {f^{u}_{i} } } } \right)$$
(6)

and finally, calculate the inverse of the vector with the help of Eq. (7):

$$\left[ {\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{m} {F^{j}_{gi} } } } \right]^{ - 1} = \left( {\frac{1}{{\sum\limits_{i = 1}^{n} {f^{u}_{i} } }},\frac{1}{{\sum\limits_{i = 1}^{n} {f^{m}_{i} } }},\frac{1}{{\sum\limits_{i = 1}^{n} {f^{l}_{i} } }}} \right)$$
(7)
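Step 1 (Eqs. 4 to 7) can be sketched in Python as follows. The 2 × 2 comparison matrix is a hypothetical example, TFNs are represented as plain `(l, m, u)` tuples, and the function names are illustrative rather than taken from the study's MATLAB implementation.

```python
def row_sum(row):
    """Eq. 5: fuzzy addition of all TFNs in one row of the comparison matrix."""
    return tuple(sum(tfn[k] for tfn in row) for k in range(3))


def synthetic_extents(matrix):
    """Eq. 4: fuzzy synthetic extent value S_i for every row of the matrix."""
    row_sums = [row_sum(row) for row in matrix]
    # Eq. 6: fuzzy addition over all rows and columns.
    total = tuple(sum(rs[k] for rs in row_sums) for k in range(3))
    # Eq. 7: inverse of the total, with lower and upper bounds swapped.
    inverse = (1 / total[2], 1 / total[1], 1 / total[0])
    # Eq. 4: multiply each row sum by the inverse, component by component.
    return [tuple(rs[k] * inverse[k] for k in range(3)) for rs in row_sums]


# Hypothetical fuzzy pairwise comparison matrix for two criteria.
matrix = [
    [(1, 1, 1), (1, 2, 3)],
    [(1 / 3, 1 / 2, 1), (1, 1, 1)],
]
S = synthetic_extents(matrix)
for s in S:
    print(tuple(round(value, 4) for value in s))
```

Note that the middle components of the synthetic extents sum to 1, since each is a row's middle sum divided by the total of all middle sums.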
Step 2:

As \(F_{a} = (f^{l}_{a}, f^{m}_{a}, f^{u}_{a})\) and \(F_{b} = (f^{l}_{b}, f^{m}_{b}, f^{u}_{b})\) are two triangular fuzzy numbers, the degree of possibility of \(F_{a} \ge F_{b}\) is defined by Eq. 8 and can be expressed equivalently by Eq. 9:

$$V\left( F_{a} \ge F_{b} \right) = \sup_{x \ge y} \left[ \min \left( \mu_{F_{a}}(x), \mu_{F_{b}}(y) \right) \right]$$
(8)
$$V\left( F_{a} \ge F_{b} \right) = hgt(F_{a} \cap F_{b}) = \mu_{F_{a}}(d) = \begin{cases} 1, & \text{if } f^{m}_{a} \ge f^{m}_{b} \\ \dfrac{f^{u}_{a} - f^{l}_{b}}{(f^{u}_{a} - f^{m}_{a}) + (f^{m}_{b} - f^{l}_{b})}, & \text{if } f^{m}_{a} < f^{m}_{b} \text{ and } f^{l}_{b} \le f^{u}_{a} \\ 0, & \text{otherwise} \end{cases}$$
(9)

In this context, d is the ordinate of the highest intersection point D between \(\mu_{F_{a}}\) and \(\mu_{F_{b}}\) (Fig. 6). Both \(V(F_{a} \ge F_{b})\) and \(V(F_{b} \ge F_{a})\) are required to compare \(F_{a}\) and \(F_{b}\).

Fig. 6 Triangular fuzzy number
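The degree of possibility in Eq. 9 translates directly into a three-branch function. The sketch below is illustrative; the function name `possibility` and the example TFNs are assumptions made for this example.

```python
def possibility(a, b):
    """Degree of possibility V(a >= b) for two TFNs given as (l, m, u) tuples."""
    la, ma, ua = a
    lb, mb, ub = b
    if ma >= mb:
        # The peak of a is not left of the peak of b.
        return 1.0
    if lb <= ua:
        # The triangles overlap: height of their intersection point.
        return (ua - lb) / ((ua - ma) + (mb - lb))
    # The triangles do not overlap and a lies entirely below b.
    return 0.0


fa = (1, 2, 3)
fb = (2, 3, 4)
fc = (4, 5, 6)
print(possibility(fa, fb))  # partial overlap
print(possibility(fb, fa))  # fb has the larger peak
print(possibility(fa, fc))  # no overlap
```

The measure is not symmetric, which is why both directions have to be evaluated when two fuzzy numbers are compared.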

Step 3:

The degree of possibility that a convex fuzzy number F is greater than k other convex fuzzy numbers Fi (i = 1, 2, …, k) is defined as follows:

$$V(F \ge F_{1} ,F_{2} ,F_{3} ....F_{k} ) = \min V(F \ge F_{i} )$$
(10)

Assuming that,

$$d^{^{\prime}} (F_{i} ) = \min V(F_{i} \ge F_{k} )$$
(11)

for k = 1,2,…,n; k ≠ i.

Equation 12 gives the weight vector.

$$W^{^{\prime}} = \,(d^{^{\prime}} (F_{1} ),\,d^{^{\prime}} (F_{2} ),\,d^{^{\prime}} (F_{3} ),.....d^{^{\prime}} (F_{n} ))$$
(12)

where Fi (i = 1, 2, …, n) are the n elements under comparison.

Step 4:

Equation 13 gives the normalised form of the weight vector in Eq. 12. The normalised non-fuzzy values are taken as the priority weights of the challenging factors.

$$W = \,(d(F_{1} ),\,d(F_{2} ),\,d(F_{3} ),.....d(F_{n} ))$$
(13)

where W is a non-fuzzy vector whose components d(Fi) are the priority weights of the individual factors.
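Steps 3 and 4 can be illustrated with a short Python sketch that turns a list of synthetic extent values into normalised priority weights. The three synthetic extents are hypothetical values chosen for readability, normalisation is assumed to mean dividing each component by the sum of all components, and the function names are illustrative.

```python
def possibility(a, b):
    """V(a >= b) for two TFNs (l, m, u), following Eq. 9."""
    if a[1] >= b[1]:
        return 1.0
    if b[0] <= a[2]:
        return (a[2] - b[0]) / ((a[2] - a[1]) + (b[1] - b[0]))
    return 0.0


def weight_vector(extents):
    """Eqs. 11 and 12: minimum degree of possibility of each element."""
    return [
        min(possibility(s, other) for k, other in enumerate(extents) if k != i)
        for i, s in enumerate(extents)
    ]


def normalise(weights):
    """Eq. 13 (assumed form): divide each weight by the sum of all weights."""
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical synthetic extent values for three factors.
extents = [(0.2, 0.4, 0.6), (0.1, 0.2, 0.3), (0.15, 0.3, 0.5)]
w_prime = weight_vector(extents)
w = normalise(w_prime)
print([round(value, 4) for value in w_prime])
print([round(value, 4) for value in w])
```

The factor whose synthetic extent dominates all others receives d′ = 1 before normalisation, so it always ends up with the largest priority weight.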

Step 5:

Consistency check: The fuzzy AHP pairwise comparison matrices must be consistent [51]. Therefore, it is important to determine the consistency ratio (CR) of each matrix. The matrices are first defuzzified using the graded mean technique. For example, Eq. 14 is used to defuzzify a triangular fuzzy number P = (l, m, u):

$$P_{crisp\,} = \frac{{\left( {4m + l + u} \right)}}{6}$$
(14)

The final consistency ratio can be determined using Eqs. 15 and 16:

$$CI = \frac{{\lambda_{\max } - n}}{n - 1}$$
(15)
$$CR = \frac{CI}{{RI}}$$
(16)

where \(\lambda_{\max}\) is the largest eigenvalue of the comparison matrix, n is the number of items being compared in the matrix, RI is the random index, whose value can be taken from Table 3, and CI is the consistency index, which is calculated using Eq. 15.

Table 3 Random consistency index (RI) with respect to matrix size

If the calculated CR is less than 0.1, the matrix is considered consistent; otherwise, the matrix is inconsistent and the decision makers need to repeat the pairwise judgements.
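The consistency check in Eqs. 14 to 16 can be sketched in plain Python as shown below. Two points are assumptions of this example: the random index values are the commonly cited Saaty values (Table 3 is not reproduced in this text), and the largest eigenvalue is approximated by power iteration instead of a library eigenvalue routine. The comparison matrices are hypothetical.

```python
# Commonly cited random index values (assumed; see Table 3 for the study's values).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def defuzzify(tfn):
    """Eq. 14: graded mean of a TFN (l, m, u)."""
    l, m, u = tfn
    return (4 * m + l + u) / 6


def largest_eigenvalue(matrix, iterations=100):
    """Approximate the largest eigenvalue of a positive matrix by power iteration."""
    n = len(matrix)
    vector = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        # The vector sums to 1, so the sum of the product estimates lambda_max.
        eigenvalue = sum(product)
        vector = [value / eigenvalue for value in product]
    return eigenvalue


def consistency_ratio(matrix):
    """Eqs. 15 and 16: CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    ci = (largest_eigenvalue(matrix) - n) / (n - 1)
    ri = RANDOM_INDEX[n]
    return 0.0 if ri == 0 else ci / ri


# Hypothetical crisp matrices: perfectly consistent and slightly inconsistent.
consistent = [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]
slightly_off = [[1, 2, 1.5], [0.5, 1, 0.5], [2 / 3, 2, 1]]

print(defuzzify((1, 1.5, 2)))
print(round(consistency_ratio(consistent), 6))
print(round(consistency_ratio(slightly_off), 4))
```

In practice each fuzzy comparison matrix would first be converted entry by entry with `defuzzify`, and the resulting crisp matrix would then be passed to `consistency_ratio`.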

4 The results and analysis

The results and analysis of this study are presented in this section.

4.1 Investigations of SLR study

By carefully reviewing the collected studies, we extracted a total of 19 challenges (Table 4). The ultimate aim of this study is to prioritize the investigated challenges with respect to their significance for code recommendation systems. Therefore, the identified challenges were further mapped into three core categories (i.e., “Human resources”, “Process” and “Technology”) (Table 4). The challenges were categorized to develop the hierarchy structure needed for the fuzzy-AHP analysis. The coding scheme of the grounded theory technique [14] was used in the categorization process.

Table 4 List of explored challenges

4.2 Findings of empirical study

We identified a total of 19 challenges during the SLR study, which were informally classified into three categories. The survey study was conducted to capture the opinions and perceptions of experts regarding the SLR findings and to formally validate the categorical classification of the challenges.

The collected survey responses were classified into positive, negative and neutral categories (Table 5). The positive category shows the percentage of survey participants who agreed with the SLR findings. Similarly, the negative category shows the percentage of respondents who disagreed with the SLR results and the classification of the challenges. The neutral category covers those participants who were not sure about the impact of an investigated challenge on code recommendation systems.

Table 5 Empirical assessment of identified challenges

The summarized results of the empirical study are presented in Table 5, which shows that the majority of the survey respondents agree that the investigated challenges could have a negative impact on code recommendation systems. The positive-category results of the challenges are greater than 60%. We also noted that the respondents strongly agree with the categorization of the investigated challenges (Table 5). Moreover, open-ended questions were added to the survey questionnaire in order to capture additional challenges of code recommendation systems. However, the survey participants did not report any new challenge; therefore, we finalized the list of the 19 identified challenges for fuzzy AHP based prioritization. Other studies have also used the same data analysis approach [29].

4.3 Fuzzy-AHP analysis

This section presents the fuzzy AHP results and analysis. All the steps of fuzzy-AHP were carefully performed to determine the weights of the challenges within their categories and across all categories. The fuzzy-AHP was implemented in the “MATLAB R2016b programming environment” and executed on a personal computer with an “Intel Corei3 3.5-GHz processor and 8-GB memory”. The steps of fuzzy-AHP and their implications are presented in this section.

Step-1: Decomposing the problem into a hierarchy structure

In this step, we develop a hierarchy structure of the complex decision-making problem [4, 48]. The hierarchy is structured at three levels, following Fig. 5. The key problem is presented at level 1, and the categories and their respective challenges are presented at levels 2 and 3. The proposed hierarchy structure is shown in Fig. 7.

Fig. 7 Hierarchy structure of the investigated challenges

Step-2: Pairwise comparison

The objective of the fuzzy AHP analysis is to rank the identified challenges with respect to their importance for CRS. To do this, pairwise comparisons were conducted to prioritize the investigated challenges and their respective categories, based on expert opinion. To collect the responses of experts, we conducted a fuzzy-AHP survey. We developed a questionnaire and approached the participants of the first survey; out of 77 participants, only 28 agreed to participate in the fuzzy-AHP survey. The questionnaire is given in appendix-C. We thus collected a total of 28 responses from the experts. The collected responses were manually checked for inconsistencies and incomplete entries. We found that all 28 responses were complete, and we used them to develop the pairwise comparison matrices. A sample of 28 responses might not be strong enough to generalize the findings of the fuzzy AHP analysis. However, considering existing studies [13, 48, 53], the 28 collected responses are sufficient for the generalization of the results.

To transform the survey responses (experts’ judgements) into TFNs, the geometric mean was calculated using the following formula:

$$\text{Geometric mean} = \sqrt[n]{v_{1} \times v_{2} \times v_{3} \times \cdots \times v_{n}}$$

where \(v_{i}\) = “Weight of each response”, n = “Number of responses”.
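A minimal Python sketch of this aggregation is given below. It assumes that each expert's judgement for one pairwise comparison has already been converted to a TFN and that the geometric mean is taken separately for the lower, middle and upper components; the judgements shown are invented.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


def aggregate_judgements(tfns):
    """Component-wise geometric mean of several TFNs (assumed aggregation rule)."""
    return tuple(geometric_mean([tfn[k] for tfn in tfns]) for k in range(3))


# Invented judgements of three experts for a single pairwise comparison.
judgements = [(1, 1, 1), (1, 2, 4), (1, 4, 16)]
aggregated = aggregate_judgements(judgements)
print(tuple(round(value, 4) for value in aggregated))
```

Unlike the arithmetic mean, the geometric mean preserves reciprocity: aggregating the reciprocal judgements gives the reciprocal of the aggregated judgement, which keeps the pairwise comparison matrix reciprocal.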

We used the linguistic variables and their corresponding triangular fuzzy values given in Table 6. The triangular fuzzy scale developed by [8] was used to construct the pairwise comparison matrices.

Table 6 “Conversion scale of triangular fuzzy numbers” [8]

Step-3: Consistency evaluation

The steps adopted for the consistency check are presented in this section. To illustrate these steps, we use the pairwise comparison matrix of the core categories of challenges (Table 7). We defuzzified the pairwise comparisons of the main categories into crisp numbers using Eq. 14 and obtained the corresponding Fuzzy Crisp Matrix (FCM), as presented in Table 7:

Table 7 Pairwise comparison of challenges categories

Step-4: Local priority weights of the identified challenges and their categories

  i. A numerical example

The priority vector for each category is given in Table 7. The priority weight for each challenging factor and the category was determined using Eq. 3.

Firstly, the synthetic extent values of the three categories (human resources, process and technology) were calculated by applying Eq. 4, and the priority weights of the categories were then determined. The calculations are as follows:

$$\sum\limits_{i}^{n} {\sum\limits_{j}^{m} {F_{gi}^{j} } } = \,(1,1,1)\, + (1.5,2,2.5)\, + \,(1,1.5,2)... + (0.5,0.6,1)\, + \,(1,1,1)\, = (14.1,18.2,22.8)$$
$$\left[ {\sum\limits_{i}^{n} {\sum\limits_{j}^{m} {F_{gi}^{j} } } } \right]^{ - 1} = \left( {\frac{1}{22.8},\,\frac{1}{18.2},\,\frac{1}{14.1}} \right) = \left( {0.04386, \, 0.054945, \, 0.070922} \right)$$
$$\sum\limits_{j = 1}^{m} {F_{g1}^{j} } = \,(1,1,1)\, + \,(1.5,2.5,3)\, + \,(1,1.5,2) = (5,7,8.5)$$
$$\sum\limits_{j = 1}^{m} {F_{g2}^{j} } = \,\,(0.3,0.4,0.6)\, + (1,1,1)\, + \,(0.4,0.5,0.6)\,\, = \,(2.2,2.5,3.2)$$
$$\sum\limits_{j = 1}^{m} {F_{g3}^{j} } = \,\,(0.5,0.6,1)\, + \,(1.5,2,2.5)\, + (1,1,1)\, = \,(4,5.1,6.5)$$

Equation 4 is used to calculate the synthesis values of “Human resources (HR)”, “Process (P)”, and “Technology (T)” categories:

$$\begin{aligned} HR & = \sum\limits_{j}^{m} {F_{g1}^{j} } \otimes \left[ {\sum\limits_{i}^{n} {\sum\limits_{j}^{m} {F_{gi}^{j} } } } \right]^{ - 1} \\ & = (5,7,8.5) \otimes \left( {0.04386, 0.054945, 0.070922} \right) = \left( {0.219298, 0.384615, 0.602837} \right) \\ \end{aligned}$$
$$P = (2.2,2.5,3.2) \otimes (0.04386, 0.054945, 0.070922) = \left( {0.096491, 0.137363, 0.226950} \right)$$
$$T = (4,5.1,6.5) \otimes (0.04386, 0.054945, 0.070922) = \left( {0.175439, 0.280220, 0.460993} \right)$$
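The synthetic extent calculation can be reproduced with a short Python sketch. It takes the row sums and the overall sum exactly as printed in the equations of this example and multiplies each row sum by the inverse of the overall sum, where the inverse of a TFN (l, m, u) is (1/u, 1/m, 1/l). The helper names are illustrative only.

```python
def tfn_inverse(tfn):
    """Inverse of a TFN (l, m, u): reciprocals in reversed order."""
    l, m, u = tfn
    return (1.0 / u, 1.0 / m, 1.0 / l)


def tfn_multiply(a, b):
    """Approximate TFN product: multiply matching components."""
    return tuple(x * y for x, y in zip(a, b))


# Row sums and overall sum as printed in the numerical example.
row_sums = {
    "HR": (5.0, 7.0, 8.5),
    "P": (2.2, 2.5, 3.2),
    "T": (4.0, 5.1, 6.5),
}
total = (14.1, 18.2, 22.8)

inverse_total = tfn_inverse(total)
extents = {name: tfn_multiply(row, inverse_total) for name, row in row_sums.items()}

for name, extent in extents.items():
    print(name, tuple(round(x, 6) for x in extent))
```

Each printed triple holds the lower, middle and upper value of the synthetic extent of one category.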

Equation 6 is used to calculate the degree of possibility, and the minimum degree of possibility, which gives the priority weight of each category, is determined using Eq. 8.

Hence, we calculated the weight vector as W = (1, 0.030016, 0.69837) (Table 8). By normalizing these values, the significance of the categories was determined as W = (“0.4789, 0.01435, 0.3337”). The results show that the human resources category is the most important category of the investigated challenges, as it gained the highest priority weight compared with the other challenge categories.
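The degree-of-possibility step can be sketched as follows. This sketch assumes that Eq. 6 and Eq. 8 follow the standard formulation of Chang's extent analysis: V(M2 ≥ M1) is 1 when m2 ≥ m1, 0 when l1 ≥ u2, and (l1 − u2) / ((m2 − u2) − (m1 − l1)) otherwise, and the weight of a category is the minimum of its degrees of possibility over all other categories. The synthetic extents are recomputed from the row sums and overall sum of the numerical example; the function name is illustrative.

```python
def possibility(m2, m1):
    """Degree of possibility V(m2 >= m1) for two TFNs (l, m, u)."""
    l1, mid1, _ = m1
    _, mid2, u2 = m2
    if mid2 >= mid1:
        return 1.0
    if l1 >= u2:
        return 0.0
    return (l1 - u2) / ((mid2 - u2) - (mid1 - l1))


# Synthetic extents: row sums divided by the reversed overall sum.
extents = {
    "HR": (5.0 / 22.8, 7.0 / 18.2, 8.5 / 14.1),
    "P": (2.2 / 22.8, 2.5 / 18.2, 3.2 / 14.1),
    "T": (4.0 / 22.8, 5.1 / 18.2, 6.5 / 14.1),
}

# Weight of each category: minimum possibility against every other category.
weights = {
    name: min(possibility(extent, other)
              for key, other in extents.items() if key != name)
    for name, extent in extents.items()
}

for name, weight in weights.items():
    print(name, round(weight, 4))
```

Under these assumptions the sketch yields approximately (1, 0.0300, 0.6984) for HR, P and T, which agrees with the reported weight vector to about four decimal places.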

ii. Consistency check

Table 8 Results of V values for criteria

In fuzzy-AHP, every pairwise comparison matrix should be consistent. To illustrate the consistency check procedure, an example is presented using the core categories of challenges (Table 9). For the consistency check, Eq. 14 and the resulting FCM were used.

Table 9 Fuzzy Crisp Matrix (FCM) for challenges categories

We then determined the column sums of the FCM in order to calculate the largest eigenvalue (λmax). Each value of the FCM is divided by its respective column sum to develop the normalised matrix (Table 10). The priority weight of each category is then determined by averaging its respective row of the normalised matrix (Table 10).

$$\lambda_{\max } = \, \Sigma \left( {\left[ {\Sigma \, Cj} \right] \, \times \, \{ W\} } \right)$$
(17)

where ƩCj = column sum of [C] (Table 7) and W = priority weight (Table 10); therefore, λmax = 3.6 × 0.37938 + 5.5 × 0.14945 + 3.2 × 0.27593 = 3.0707.
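The normalisation and λmax steps can be sketched in Python. The 3 × 3 matrix below is a hypothetical, perfectly consistent example and is not the FCM of Table 9; the column sums and priority weights in the last step are the ones quoted in the text. Function names are illustrative.

```python
def column_sums(matrix):
    """Sum of each column of a square crisp matrix."""
    return [sum(column) for column in zip(*matrix)]


def priority_weights(matrix):
    """Divide each entry by its column sum, then average each row."""
    sums = column_sums(matrix)
    normalised = [[value / sums[j] for j, value in enumerate(row)]
                  for row in matrix]
    return [sum(row) / len(row) for row in normalised]


def lambda_max(col_sums, weights):
    """Eq. 17: sum over columns of (column sum x priority weight)."""
    return sum(c * w for c, w in zip(col_sums, weights))


# Hypothetical, perfectly consistent crisp matrix (not from Table 9).
example = [
    [1.0, 2.0, 2.0],
    [0.5, 1.0, 1.0],
    [0.5, 1.0, 1.0],
]
example_weights = priority_weights(example)
example_lambda = lambda_max(column_sums(example), example_weights)

# Column sums and priority weights quoted in the text.
reported_lambda = lambda_max([3.6, 5.5, 3.2], [0.37938, 0.14945, 0.27593])
print(round(example_lambda, 4), round(reported_lambda, 4))
```

For the consistent example, λmax equals the matrix size (3), which is the ideal case; the quoted column sums and weights give 3.0707, slightly above 3.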

Table 10 Normalized matrix of challenges main categories

The total number of elements is n = 3; therefore, the value of the random index (RI) is 0.58 (Table 3). The final values of the consistency index (CI) and consistency ratio (CR) are calculated using Eqs. 15 and 16, respectively:

$$CI = \frac{\lambda_{\max} - n}{n - 1} = \frac{3.0707 - 3}{3 - 1} = 0.035553$$
(15)
$$CR = \frac{CI}{RI} = \frac{0.035553}{0.58} = 0.061$$
(16)

The value of CR is 0.061, which is < 0.10, showing that the pairwise comparison matrix of the challenge categories is consistent. Using the same steps, the consistency of all the pairwise comparison matrices was determined, as given in Tables 11, 12, 13, 14 and 15, respectively.
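The consistency check of Eqs. 15 and 16 can be sketched as a small Python helper. It uses n = 3 and RI = 0.58 as they appear in those equations; the remaining random index values are the commonly cited Saaty values and are only assumed to match Table 3. The names `RANDOM_INDEX`, `consistency_index` and `consistency_ratio` are illustrative.

```python
# Random index per matrix size; commonly cited Saaty values,
# assumed (not confirmed) to match Table 3 of the paper.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def consistency_index(lambda_max, n):
    """Eq. 15: CI = (lambda_max - n) / (n - 1)."""
    return (lambda_max - n) / (n - 1)


def consistency_ratio(lambda_max, n):
    """Eq. 16: CR = CI / RI; matrices of size 1 or 2 are always consistent."""
    ri = RANDOM_INDEX[n]
    if ri == 0.0:
        return 0.0
    return consistency_index(lambda_max, n) / ri


cr = consistency_ratio(3.0707, 3)
print(round(cr, 3), cr < 0.10)
```

With λmax = 3.0707 the ratio rounds to 0.061, below the 0.10 threshold, so the matrix would be accepted as consistent.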

Table 11 Pairwise comparison of ‘human resource’ category
Table 12 Pairwise comparison of ‘process’ category
Table 13 Pairwise comparison of ‘technology’ category
Table 14 Pairwise comparison between the categories
Table 15 Local and global weights

Phase 5: Calculating the weight of challenges

The weights of the identified challenges were calculated to determine their ranking with respect to their significance for code recommendation systems. To calculate the rankings of the challenges, we determined the local and global weights presented in Table 15.

The local weights were calculated to determine the local ranking of each challenge. The local ranking reflects the importance of a challenge within its category and helps practitioners to focus on the most important challenges within a particular category. For example, CH3 (“Possible solutions have to be compared”, LW = 0.420) is ranked as the highest priority challenge in the human resource category. This indicates that, to successfully address the human resources domain in code recommendation systems, practitioners should treat CH3 as the top priority. Moreover, the results show that CH1 (“Lack of software engineering knowledge”, LW = 0.241) and CH2 (“Lack of monitoring and management”, LW = 0.180) are ranked as the 2nd and 3rd most important challenges of code recommendation in the human resource category. The local ranking thus helps to address the challenges according to their significance within a particular category.

We also determined the global ranking by calculating the global weight of each challenge. The global weight was calculated by multiplying the local weight of each challenge by the weight of its respective category. For example, the global weight (GW) of CH1 is GW = 0.241 × 0.37938 = 0.759. Based on this global weight, CH1 is ranked as the 11th most important challenge for the efficiency of code recommendation systems. The results presented in Table 15 show that CH12 (“Lack of data implementation analysis”, GW = 2.207) is ranked as the highest priority challenge for the effective and efficient execution of code recommendation systems. We further observed that CH16 (“Poor functions design with respect to reusability”, GW = 1.932) and CH4 (“Lack of multiple facts adoption”, GW = 1.897) are ranked among the top priority challenges for recommendation systems. The global weights help to address the highest priority challenges for the efficient execution of code recommendation systems.
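The relation between local and global weights can be illustrated with a short Python sketch. All category names, challenge names and weights below are hypothetical and are not taken from Table 15; the sketch only shows the multiplication rule described above and how a challenge that ranks first within its category can rank lower overall.

```python
def global_weights(local_weights, category_weights):
    """Global weight = local weight x weight of the challenge's category.

    local_weights maps a challenge to (category, local_weight).
    """
    return {challenge: lw * category_weights[category]
            for challenge, (category, lw) in local_weights.items()}


def rank(weights):
    """Rank items from highest weight (rank 1) to lowest."""
    ordered = sorted(weights, key=weights.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical categories and challenges, for illustration only.
category_weights = {"A": 0.6, "B": 0.4}
local_weights = {
    "A1": ("A", 0.5),
    "A2": ("A", 0.3),
    "A3": ("A", 0.2),
    "B1": ("B", 0.8),
    "B2": ("B", 0.2),
}

gw = global_weights(local_weights, category_weights)
global_rank = rank(gw)

for challenge in sorted(gw, key=gw.get, reverse=True):
    print(challenge, round(gw[challenge], 3), global_rank[challenge])
```

In this illustration A1 is first within category A but second overall, because B1 combines a high local weight with its category weight to reach a larger global weight.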

Phase 6: Taxonomy of challenges

We further developed a taxonomy of the identified challenges by considering their categorization and prioritization (Fig. 8). The prioritization-based taxonomy of the investigated challenges was developed using the local and global rankings of each challenge and their core categories. The taxonomy shows the impact of a particular challenge within its category (local ranking) and on code recommendation systems overall (global ranking). The results show that CH1 (“Lack of software engineering knowledge”) is ranked 2nd in the human resource category and stands out as the 11th most significant challenge in the overall ranking. This shows how the impact of a particular challenge varies between the local and global rankings. Similarly, CH3 (“Possible solutions have to be compared”) is ranked as the 1st priority challenge in the human resource category, but it is ranked 16th according to the global ranking. This variation between the priorities helps practitioners to consider the most important challenges both within their respective category and in terms of the overall impact of a challenge on code recommendation systems. To conclude, the developed prioritization-based taxonomy assists researchers and practitioners in considering the most significant challenges with respect to their influence on code recommendation systems.

Fig. 8 Prioritization based taxonomy

5 Study implications and limitations

Implications: This study explored the challenges reported in the literature that could hinder the efficiency and effectiveness of code recommendation systems. The investigated challenges indicate the key areas that need to be addressed for the efficiency of code recommendation systems. The empirical investigation shows that the identified challenges are important and need special consideration for the effectiveness of code recommendation systems. Moreover, the explored challenges were classified, and fuzzy-AHP was applied to prioritize them with respect to their significance for code recommendation systems. The categorization and fuzzy-AHP analysis provide a prioritization-based taxonomy of the investigated challenges. The developed taxonomy provides a body of knowledge that the academic research community can use to develop new techniques for effective code recommendation systems.

The results of this study also have practical implications, as the prioritization-based taxonomy of the identified challenges helps real-world practitioners to consider the most critical challenges on a priority basis. Furthermore, the identified challenges direct practitioners to focus on the most critical areas and to develop new strategies for building efficient recommendation systems.

Limitations: There is a chance of researcher bias in the data extraction process, and the researchers might have consistently extracted incorrect data. To address this threat, an inter-rater reliability test was conducted, and the results show that there is no bias in the extracted data. The execution of the search string on the selected databases might have led to incomplete data collection. However, based on existing studies [26, 28, 42], this omission is not systematic. Similarly, the small sample size of responses for the fuzzy-AHP analysis is a critical threat to the findings of this study. However, as fuzzy-AHP is a subjective method, results based on a small sample size can still be generalized.

6 Summary of research findings

The aim of this research study is to identify the factors that could be potential challenges for code recommendation systems. This objective was achieved by conducting an SLR and an empirical study to explore the key challenges of code recommendation systems. Using the SLR approach, a total of 19 challenging factors were identified that could be critical for code recommendation systems. Additionally, a questionnaire survey was conducted with code recommendation system experts to learn their perceptions of the identified challenges. The survey results indicate that the identified challenges are significant for code recommendation systems. Finally, the reported challenges and their categories were prioritized using the fuzzy-AHP approach. This prioritization provides a roadmap for researchers and practitioners who work on code recommendation system projects, and it could be used to tackle the key challenges that could hinder such projects. The findings for each research question are summarized in Table 16.

Table 16 Summary of study findings

7 Conclusion and future direction

The importance of code recommendation systems inspired us to identify the challenges that could hinder their effectiveness and efficiency. Using the systematic literature review, 19 challenges reported in the state-of-the-art literature were identified and “mapped into three core categories of” system process improvements. The investigated challenges and their categorization were further verified by conducting a questionnaire survey study. The “results of questionnaire survey study” show that the challenging factors of code recommendation systems identified from the literature are relevant in practice. Finally, we performed the fuzzy-AHP analysis to prioritize the investigated challenges and their categories. The results of the fuzzy-AHP analysis show that “Lack of data implementation analysis”, “Poor functions design with respect to reusability”, “Lack of multiple facts adoption”, “Rapidly occurrence of changes” and “Lack of monitoring and management” are the highest priority challenges for code recommendation systems. The identified list of challenges, their categorization into three core categories and their priority rankings provide a robust taxonomy that shows the impact of a particular challenge within its respective category and on code recommendation systems overall. We are confident that the results and analysis of this study will contribute to the improvement and development of new techniques for effective and efficient code recommendation systems.

The next phase of this research project will be a multivocal study in which we will identify the challenging factors discussed in both grey literature and formally published literature. Moreover, the success factors that could positively influence code recommendation systems will be identified.