1 Introduction

Higher education has long been criticized for making inefficient use of data to improve the quality of its graduates and their value in meeting market needs. With the emergence of big data, the amount of data available to the higher education sector exceeds its capacity to exploit it. The need to manage, process and analyze these large volumes of data prompted the establishment of the learning analytics (LA) field, which aims to improve learning processes (Ferguson 2012) and to bridge the gap between education and industry. Higher education graduates are expected to develop a set of qualities or dispositions that build their competency in embracing change and making informed career decisions.

Developing such a career-readiness capacity requires sustained and progressive growth of professional habits. We propose a model that nurtures these dispositions alongside the formal academic path. A community of practice (CoP) is an alternative, informal way to achieve this aim (Jakovljevic et al. 2013). A CoP fosters a new form of apprenticeship in which students observe and emulate mentors while engaging in a “learning to be” cycle to master the skills of a field. Through this process, exposure to the practices and norms of established practitioners in the field fosters the acquisition of apprenticeship experiences. In this research work, we introduce a framework to improve career readiness and enhance the career success prospects of learners in higher education institutions. Our framework bridges the gap between higher education and industry through an online social structure made up of interconnected CoPs. This structure extends the perspective of educational institutions and develops a joint effort with industry to leverage education and workforce development. The proposed approach also provides indicators and means for institutions to intervene in order to positively affect career readiness. It also incorporates a persuasive model to guide learners to adopt career paths that are currently demanded by the industry. Our framework incorporates three major modules, as illustrated in Fig. 1: (1) career readiness, to assert professional dispositions; (2) career prediction, to identify a domain of employment; and (3) career development, to persuade learners to sustain the identified career path.

Fig. 1 Career readiness framework

After the initial formation of CoPs, we build a classification model, which considers each CoP as a class and each learner belonging to a CoP as a member of the corresponding class. The learners’ profiles and other attributes, along with the CoP membership information, are then used as training data for a classifier. The classification model is highlighted in grey in Fig. 2 as it is outside the scope of this paper. The career development phase aims at constructing a CoP-network with a dense community structure to connect learners who belong to the same CoP based on their social similarity. Learners in such networks can benefit from the power of collaborative knowledge towards achieving common career goals and objectives. This module uses social network analytics (SNA) to strengthen the ties between learners and to examine how learners develop and maintain these relationships in support of their career development. SNA metrics are used to evaluate social influence propagation in the CoP-network and to identify the most influential learners based on their structural position in the network. Those learners are then employed as “persuasive agents” to drive behavioral changes and persuade learners to adopt career paths demanded by the industry.

Fig. 2 A high-level implementation model of the proposed solutions

As a case study, consider the global transformational trends induced by the shift towards data science. Employers want to influence this shift within education to meet foreseeable workforce needs across derivative disciplines. In the Middle East market in particular, graduate and postgraduate qualifications in computer science and information technology (IT) are highly emphasized when looking for suitable candidates (Bayt 2015), even though recruiters also single out specific data science roles (including data modeller, data analyst and business intelligence developer) and related derivatives (Indeed 2015; Sealey-Morris 2014). It is interesting to observe how the title “analyst” or the skill term “analytics” appears in almost every job post. Another interesting observation is the emergence of new IT job titles such as social media director and chief data officer (Sealey-Morris 2014). Using this type of input from the industry market, we analyze typical higher-education programs to map our CoP-based apprenticeship model. In our College of IT at UAE University, learners are expected to complete formal education requirements while preparing for an in-demand career using our proposed apprenticeship model. A Bachelor degree in Computer Science, for example, includes a range of courses such as programming, operating systems and database systems, which are generic qualifications. Our model complements this scenario by immersing learners, at an early stage of their study journey, in areas of professional focus that will shape market demands in the coming years (upon the students’ graduation), as illustrated in Fig. 3.

Fig. 3 The CoP-apprenticeship model: IT scenario

In the first phase, as an initial stream of generated data, College learners fill out a career profile that captures initial information about their current competencies, qualifications and skills, and lists potential career interests. Learners also complete the career readiness instrument in order to evaluate their career dispositions and identify the gaps with respect to their career aspirations. The instrument yields a storehouse view of career dispositions relevant to the general IT practitioner through an integrated portal, which captures self-stated learning experiences and converts them into an analytical diagnosis that roots out deficiencies and prescribes improvement recommendations.

In the subsequent step, the career prediction module allocates CoP memberships to connect learners who share similar career interests. These interests are analyzed against market needs in order to map each CoP construct to a career path (or overlapping paths) that leads to highly demanded careers (e.g. data science). Each CoP matches a particular professional career and is linked to the competencies of that profession. These competencies are extracted from job descriptions and analyzed against career profiles to evaluate the competency gaps and identify priorities for learning development. A similar motivation was suggested in Gaeta et al. (2016), where training activities inside a CoP are integrated within the educational-career path. These activities are designed to extend learning objectives beyond formal education and further into the employment market, within a personalized learning context for achieving those objectives. Each CoP is assigned academic mentors and/or industry professionals to drive the community towards attaining career-oriented learning objectives. The mentor and the initial members create the structure and operational processes of their community. This includes sharing experiences and learning resources to sustain career development within the community in a collaborative effort. To support learning relationships, we reinforce learners’ ties via social similarities while they participate in the CoP-based social network. The members’ constant interactions within and outside the CoP create a dynamic knowledge container and a repertoire of shared practices and experiences. Our dynamic CoP lifecycle evolves towards more focused subdomains by forming sub-CoPs of learners who are “influenced” to steer their initial major into sub-disciplines (e.g. data modeller, data analyst, business intelligence analyst, chief data officer), forming new CoPs as illustrated further in Fig. 3. As the community thrives, learners develop domain practices that lead them to recognize and reach out to other potential members (outside their CoP) who may be driven to migrate to their CoPs (e.g. statistics, semantic web, data governance).

The remaining sections of this paper are organized as follows. In Sect. 2, we provide background and discuss related work in the context of career dispositions. In Sect. 3, we discuss learning analytics for career dispositions, and in Sect. 4 we present our predictive analytics approach to CoP formation. Section 5 extends this structure into a broader social network, where we integrate our influence propagation approach into our persuasive learning framework. Section 6 presents performance evaluation indicators and results for the approaches discussed in the paper, and Sect. 7 concludes the paper with a summary of contributions and suggested future work.

2 Background and related work

2.1 Personality traits and career success

Several research studies relate personality attributes to aspects of career success (Boudreau et al. 2001; Sutin et al. 2009). In particular, the literature on career success identifies five categories of personal traits (termed the Big Five personality factors) that influence the general mental ability for career success (McCrae and Costa 1991; Judge et al. 1999; McCrae and John 1998; Seibert and Kraimer 2001; Zimmerman 2008). The contributions of the established Big Five dimensions of personality to career success have been examined by an increasing number of studies in the field of human development psychology. For example, conscientiousness was found to be the most consistent predictor of career advancement (Howard and Bray 1990) and to be positively related to job performance (Salgado 1997). Extraversion was also found to positively affect job and life satisfaction (McCrae and Costa 1991), as well as career promotions (Barrick et al. 1993; Salgado 1997). On the other hand, high neuroticism was found to be associated with poor job performance (Salgado 1997), and hence with subsequent career dissatisfaction (Furnham and Zacherl 1986).

Our research addresses the need for certain personality traits for career success by developing standard criteria that emerge from proactive behaviors in order to instill positive habits or dispositions. In doing so, we noticed a substantial gap between the career success and lifelong learning literatures, although successful 21st-century professionals build their career paths on a sustained lifelong learning aptitude known as learning power (Deakin Crick 2007). These dispositional traits are most likely dominated by the earliest experiences acquired in the family, school and college (Bourdieu 1977; Brunello and Schlotter 2011). Earlier beliefs that the personality traits shaping an individual’s lifetime are a matter of fate do not hold (Roberts and Mroczek 2008; McAdams and Olson 2010). Therefore, the development of traits that support career success in higher education institutions, or even earlier in schools, appears important to reshape students’ prospective traits so that they can manage their future professional growth successfully.

2.2 Learning in social communities of practice (CoPs)

A community of practice (CoP) is defined as a “computer-mediated discussion forum focused on problems of practice that enable individuals to exchange advice and ideas with others based on common interests” (Wasko and Faraj 2005). Strictly speaking, this defines an online CoP, although we refer to our computerized method as a CoP in general. CoPs play a central role in knowledge management (KM) strategies and collaborative learning, and are perceived as an effective mechanism for knowledge creation and exchange. Knowledge, in turn, is central to formal education and professional practice. The KM literature differentiates between knowledge sharing and knowledge exchange: knowledge sharing can occur in a one-way broadcast form, whereas knowledge exchange occurs at the dyadic level and indicates a reciprocal relationship (Pan et al. 2015), hence the social ties within and outside CoPs. Knowledge is also often divided into two distinct entities: explicit knowledge (knowing that), which refers to the possession of information and facts; and tacit knowledge (knowing how), which refers to the procedural and applied form of knowledge (Curran et al. 2009). CoPs contribute particularly to the creation and exchange of tacit knowledge, whereas formal education tends to focus more on explicit knowledge.

The theory of CoPs lies at the intersection of knowledge transfer and learning processes, and it has become more widespread in higher education due to the benefits derived from the collaborative generation of knowledge and cooperative learning activities within and outside the classroom (Jakovljevic et al. 2013; Jawitz 2007; Tight 2004; Mandl et al. 1996). Johnson noted that “the learning evolved from these communities is collaborative, in which the collaborative knowledge of the community is greater than any individual knowledge” (Johnson 2001). It is also argued that individual learning is enhanced through engagement with others, enabling the extension of the individual’s capability to a new and higher level (Vygotsky 1980). Learning in CoPs is recognized as “situated learning”, defined as knowing how to be in practice rather than knowing about the practice (Lave and Wenger 1991; Wenger 1999; Brown et al. 1989). This involves forming an individual identity by becoming a member of the community and participating in knowledge development. Supported by a sense of connectedness, knowledge development within CoPs can be continuous and fluid, following a cyclic pattern (Koliba and Gajda 2009). Learning in CoPs also occurs within the cyclical process of DDAE: dialogue, decision-making, action and evaluation (Mindich 2011). Examining DDAE dynamics in CoPs draws a clear link between the group’s capacity to evaluate its current practices and its capacity to learn through this evaluation by talking about it (dialogue), making decisions based on this discussion, and subsequently implementing these decisions in action (Koliba and Gajda 2009; Gajda and Koliba 2008). Learners participating in CoPs engage and interact to gain knowledge and skills from community members, in part from those who are positioned as mentors or field experts. This view reiterates the apprenticeship model as a complement to learning in higher education (Warhurst 2003). The demand to extend the abstract bodies of knowledge taught in formal higher education with CoP-based learning is dictated by the need to equip learners with skills that make them immediately ready to tackle real-world problems upon graduation, in order to reduce the current education-market disparities (Jakovljevic et al. 2013; Tynjälä et al. 2003).

Contemporary social networks (SNs) can be employed to build online CoPs within a higher education context (Gunawardena et al. 2009; Zhang et al. 2010). Recent research indicates that SNs have substantial value in strengthening student-to-student interactions, enhancing student social engagement, and building campus communities with a view to improving student learning (Davis III et al. 2012). Facebook, one of the most powerful SNs, enhances connectedness and social learning in higher education settings (Baran 2010; Qureshi et al. 2015; Selwyn 2009), as well as information sharing for knowledge development and innovation (O’Brien and Glowatz 2013). However, research has seldom tapped into the emergence and cultivation of a social structure that emphasizes learners and the network in which they navigate in support of their professional career practices (Gray et al. 2010).

3 Learning analytics for career dispositions attainment

3.1 Career dispositions

The Big Five model identifies five categories of personal traits that influence career success, while lifelong learning dispositions make up the individual’s capacity for developing lifelong learning attributes. Career dispositions emerge as the joint set of attitudes and generic skills that engender professional behaviors and influence the ability to adapt and respond to changing work situations and environments. They describe the natural tendencies, state of mind and preparedness of each individual towards a professional practice. We model career dispositions as a 6-dimensional construct derived from the Big Five and Learning Power, comprising (see Fig. 4): (1) Openness to Challenge (OC), “the degree to which an individual is open to new ideas and experiences”; (2) Critical Thinking (CT), “the degree to which an individual is investigative, an attentive reader/listener, inquisitive, analytical and an evidence-based decision-maker”; (3) Resilience (R), “the degree to which an individual is conscientious, determined, assertive and achievement-oriented”; (4) Learning Relationships (LR), “the degree to which an individual is cooperative, expressive, agreeable and socially oriented”; (5) Responsibility for Learning (RL), “the degree to which an individual is dependable, autonomous, motivated, organized and punctual”; and (6) Creativity (C), “the degree to which an individual is intellectual, imaginative, adventuresome, curious and original”.

Fig. 4 Career dispositions

In order to measure career disposition values, we developed and validated a self-evaluation report that is conceptually underpinned by constructs from the career success and lifelong learning dispositions literature, following our blended approach to career dispositions (Khousa and Atif 2014). Based on this scale, data is routinely captured through the proposed instrument to elicit quantitative reflections of career dispositions via a multidimensional infrastructure, which we present next.

3.2 Multidimensional career profile

We extract a structure to determine a career readiness construct, labelled career profile (Fig. 5). This construct is designed as a standard means to collect and access information about learners while they are moving towards a predestined career path. The career profile augments the existing IEEE learner information package (LIP) standard to capture learning data as well as career indicators. Our proposed career profile construct is structured into three main categories aimed at predicting and assisting learners with their career development throughout their formal education. We use the LIP-defined interests, competency and goal categories to specify the career interests, domain-related qualifications and long-term career objectives of individual learners. We differentiate two types of interests: career interests and social interests. We also introduce a new category, labeled professional, as a slot for career disposition ratings and other generic attributes pertaining to career readiness. The multidimensional data attributes reflecting the professional aptitude, career prospects and dispositions of a learner are used to detect a CoP, where members share knowledge, experience and passion for a predicted practice to build capabilities and maintain momentum.

Fig. 5 Career profile structure

The multidimensional conceptual view allows better understanding and analysis of data in terms of the subjects (facts) and the range of views from which a subject can be analyzed (dimensions). Each dimension is associated with hierarchical levels that contain consolidated data or descriptors, while a fact contains measures (also known as variables or metrics). One fact, together with the several dimensions used to analyze it, defines a data cube or a simple aggregation function (Franconi and Kamble 2004).
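As a rough illustration of the fact/dimension view, the sketch below aggregates one measure (a disposition score) along chosen dimensions. The row layout, the field names and the values are assumptions made purely for illustration; they are not the structure of the career readiness data warehouse.

```python
from collections import defaultdict

# Hypothetical fact rows: (learner, cop, disposition, score).
# Field names and values are illustrative assumptions only.
FACTS = [
    ("l1", "data-science", "CT", 4.0),
    ("l1", "data-science", "R", 3.0),
    ("l2", "data-science", "CT", 2.0),
    ("l3", "web-dev", "CT", 5.0),
]


def roll_up(facts, dims):
    """Average the score measure over the chosen dimensions.

    dims is a tuple of column indexes into a fact row
    (here 1 = CoP, 2 = disposition).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dims)
        sums[key] += row[3]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Calling `roll_up(FACTS, (1,))` averages scores per CoP, while `roll_up(FACTS, (1, 2))` drills down to one cell per CoP and disposition.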

4 A predictive analytics approach to community of practice (CoP)

We propose a semi-supervised clustering algorithm as a predictive analytic method to assign learners into common virtual CoPs according to their career interests. The proposed method uses a fuzzy-logic objective function to address issues pertaining to overlapping domains of career interests.

4.1 Semi-supervised clustering analysis

Clustering methods that utilize any side information are said to operate in a semi-supervised mode (Chapelle et al. 2006). The most common ways to specify the side information are: (a) pairwise constraints, where a set of must-link and cannot-link constraints specifies whether the pair of points connected by a constraint belongs or does not belong to the same cluster (Davidson and Ravi 2005); and (b) seeding, where some labeled data is used along with a large amount of unlabeled data for better clustering (Gu and Lu 2012). A must-link constraint is denoted by \(c_{=}(x,y)\) and a cannot-link constraint by \(c_{\ne }(x,y)\), meaning that the two instances x and y must be in the same cluster or cannot be in the same cluster, respectively.
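To make the two constraint types concrete, the following minimal sketch checks a hard (single-label) cluster assignment against must-link and cannot-link pairs. The function name and the data layout are illustrative assumptions, and the check covers only the traditional hard-assignment case, not fuzzy labeling.

```python
def violated_constraints(labels, must_link, cannot_link):
    """Return the constraints broken by a hard cluster assignment.

    labels maps each instance to a single cluster id; must_link and
    cannot_link are lists of (x, y) instance pairs.
    """
    # A must-link pair is violated when its two instances are separated.
    broken_ml = [(x, y) for x, y in must_link if labels[x] != labels[y]]
    # A cannot-link pair is violated when its two instances share a cluster.
    broken_cl = [(x, y) for x, y in cannot_link if labels[x] == labels[y]]
    return broken_ml, broken_cl
```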

4.2 Proposed fuzzy pairwise-constraints K-means (FPKM)

We propose a clustering-based career prediction model that analyzes data from the Career Profile in order to predict a hypothetical career practice and bring learners with similar career patterns together into a common cluster. This process leads to a social structure made up of CoPs that are identified to respond specifically to imminent industrial needs. To solve the cold-start problem of CoP construction, we use the career readiness data warehouse as a source for initializing groups (or clusters) of learners and denote each such cluster as a CoP. To conduct this initial grouping, we apply a semi-supervised clustering technique that brings a seed set of learners into an initial set of CoPs. The seed set consists of learners whose career disposition scores are above a given threshold parameter. There is typically at least one seed member in each cluster (CoP) whose career profile matches the definition suggested by the career ontology that yielded the CoP. The rationale for privileging learners who rank highly in their career dispositions when creating dedicated CoPs is the prospect of sustaining those CoPs. Accordingly, from this initial stage we use career disposition values only to provide the seed set of new CoPs (including the initial ones).
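The seed-selection step can be sketched as a simple threshold filter. The text does not state how the six disposition scores are combined before thresholding, so the use of the mean score here is an assumption, as are the function name and the data layout.

```python
from statistics import mean


def select_seeds(dispositions, threshold):
    """Return the learners whose mean disposition score exceeds threshold.

    dispositions maps a learner id to a dict of disposition scores,
    e.g. {"OC": 4, "CT": 5}. Averaging the scores is an assumption;
    the source only says that scores must be above a threshold.
    """
    seeds = []
    for learner, scores in dispositions.items():
        if mean(list(scores.values())) > threshold:
            seeds.append(learner)
    return sorted(seeds)
```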

One challenging problem is deciding whether and when a violation of a link constraint should be penalized. In traditional semi-supervised clustering algorithms, a violation of a link constraint is always penalized. However, once instances are allowed to be associated with multiple labels, a constraint can be violated legitimately. Thus, we redesigned the penalty function of constrained K-means algorithms to allow fuzzy labeling and to estimate whether a constraint violation is legitimate or not. Accordingly, we developed the fuzzy pairwise-constraints K-means (FPKM) algorithm, which is presented and evaluated on simulated data in our previous publications (AbuKhousa and Atif 2016; Khousa et al. 2015). The main objective of the FPKM algorithm is to assign learners with overlapping interests to multiple clusters or CoPs. In this paper, we simulate data from a real-world scenario to apply our clustering method, in addition to the typical benchmark data sets.

5 Social learning analytics for persuasive learning

Social learning analytics (SLA) is a distinctive subset of LA that highlights the social perspective of learning. SLA draws on the substantial body of educational research showing that new skills and ideas are developed and passed on through interaction and collaboration, and that learning cannot be understood without reference to context. When a group of learners engages in a joint activity, their success is related to a combination of individual knowledge and skills, environment, use of tools and ability to work together (Wells and Claxton 2008). In this paper, we mainly focus on social network analysis (SNA) to analyze the power of social influence in the CoP-network in order to achieve persuasive learning. We evaluate the social structural regularities that influence individuals’ behaviors and actions (Otte and Rousseau 2002).

Social influence is defined as a “change in an individual’s thoughts, feelings, attitudes, or behaviors that results from interaction with another individual or a group” (Rashotte 2007). Users influence each other to form different networking structures or crowds which directly promote information diffusion, whether explicitly or implicitly. Given an SN represented by a graph \(G=(V,E)\), with the set of vertices V representing individuals and the set of edges E representing links among individuals, if any node \(v_{1}{\in }V\) replicates the action of another node \(v_{2}{\in }V\), we may assume that \(v_{2}\) has influence on \(v_{1}\). When the influential node performs an action, the nodes connected to it replicate the action, which starts information propagation. The influence of a node on other connected nodes can be due to external factors such as trust, the popularity of the information or action, or the prestige or celebrity of the influential node. The relational structure of the network (i.e. the role and structural position of individuals) also plays an important part in determining how influential an individual is in this network (Friedkin 2006; Hu et al. 2015). Therefore, understanding the structure of SNs provides important further insights into how individuals influence each other’s behavior.

A very important task for maximizing diffusion in SNs is to identify the influential nodes, or adopters, who can exploit social network effects. Researchers in social influence aim to identify a small set of influential individuals (referred to as a seed set) who can ideally maximize the influence across the entire network in minimal time (Fu et al. 2014). The problem of influence maximization can be expressed as follows: “given a network with influence estimates, how to select an initial set of k users such that they eventually influence the largest number of users in the social network” (Goyal et al. 2010). Given an SN, a positive integer k and an information diffusion model, the goal is to find a target set \(A_{k}^{*}\) of k nodes that maximizes the expected number of adopters of the information if \(A_{k}^{*}\) initially adopts it. The expected number of nodes influenced by a target set is referred to as its influence degree, and this combinatorial optimization problem is called the influence maximization problem of size k (Kimura et al. 2010).
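One widely used heuristic for this problem is greedy hill-climbing with Monte Carlo estimates of the influence degree. The sketch below illustrates it under the independent cascade diffusion model with a single uniform activation probability; both the diffusion model and the greedy strategy are assumptions chosen for illustration, not the selection method adopted in this paper, and all function names are hypothetical.

```python
import random


def simulate_ic(graph, seeds, prob, rng):
    """Run one independent-cascade diffusion and return the activated set.

    graph maps a node to the list of nodes it can influence.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance to activate v.
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, prob, runs, rng):
    """Monte Carlo estimate of the influence degree of a seed set."""
    total = sum(len(simulate_ic(graph, seeds, prob, rng)) for _ in range(runs))
    return total / runs


def greedy_seed_set(graph, k, prob=0.1, runs=200, seed=0):
    """Greedily pick k nodes, each time adding the best-spreading candidate."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for cand in sorted(nodes - set(chosen)):
            spread = expected_spread(graph, chosen + [cand], prob, runs, rng)
            if spread > best_spread:
                best, best_spread = cand, spread
        chosen.append(best)
    return chosen
```

The estimate is stochastic for `prob` below 1, so results depend on `runs` and `seed`; k is assumed not to exceed the number of nodes.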

In a similar research direction, persuasive technology has emerged as a recent proposal to alter the mindset, attitudes and behaviors of individuals through technologies that create opportunities for persuasive interactions, such as those enabled by the Web and SNs (Atkinson 2006). The main process used in the psychology of public persuasion involves: (1) collecting facts and opinions about the public and the object of interest; (2) applying diagnostic procedures and statistics to interpret the collected data; and (3) applying various techniques of persuasion to guide the target group towards adopting the desired ideas or behaviors (Bernays 1928). In the persuasion paradigm, influence is presented as a detailed argumentation delivered to individual recipients, whose impact is limited to minimal social interactions. Social influence, in contrast, includes simple information about the source’s behavior in the network (e.g. “Likes” on Facebook) and is delivered across wider social interactions (Wood 2000). A key construct for research in the field of persuasive technology as it relates to public persuasion is the Behavioral Change Support System (BCSS), defined as an “information system designed to form, alter or reinforce attitudes, behaviors or an act of complying without using deception, coercion or inducements” (Oinas-Kukkonen 2010). By definition, a BCSS may utilize either computer-human persuasion or computer-mediated persuasion to achieve three outcomes: (1) reinforcing current attitudes or behaviors; (2) changing an individual’s response to an issue; and (3) shaping attitudes and behaviors by formulating a pattern for a situation where one does not exist beforehand. Computer-human persuasion utilizes patterns of interaction similar to social communication, whereas computer-mediated persuasion means that individuals persuade others through computers (e.g. discussion forums, instant messages, blogs or SNs) (Oinas-Kukkonen and Harjumaa 2009).

In our work, we first construct a CoP-network with a dense community structure; we then apply SNA to select the most influential individuals in this network to be hired as “persuasive agents” who support the adoption of desired careers in higher education.

5.1 Reciprocal-weighted Euclidean to construct CoP-network

We propose a reciprocal-weighted Euclidean similarity function (RWD), inspired by the hybrid weighted Euclidean function and by existing research on matching user profiles in a network (Carbonell et al. 2014; Raad et al. 2010), which takes into account the self-assigned weights and position orders of profile attributes when determining the similarity between profiles. Let L and U be two learner profiles represented by n-dimensional attribute vectors \(L=(l_{1},l_{2},\ldots ,l_{n})\) and \(U=(u_{1},u_{2},\ldots ,u_{n})\), depicting n pre-established measurements associated with each learner for the n attributes \(A_{1},A_{2},\ldots ,A_{n}\), which represent common social interests. RWD calculates the similarity (or match score, as used later) of L and U as follows:

$$\begin{aligned} RWD(L,U)=\sqrt{\sum _{i=1}^{n}(rw_{i}l_{i}-rw_{i}u_{i})^{2}} \end{aligned}$$

where \(rw(l)=(w_{1},w_{2},\ldots ,w_{n})\) is the weighting vector assigned to learner vector L, and \(rw(u)=(w_{1},w_{2},\ldots ,w_{n})\) is the weighting vector assigned to learner vector U. Using hierarchies, as suggested in Carbonell et al. (2014), is the simplest and most convenient way to let learners assign an order and a weight to their attributes: learners include this information in their Career Profile, and the attributes or interests of a given learner are then measured using RWD. In such a representation, interests may be associated with an interest level, such as a numeric or graded rating that represents the strength or weakness of the learner’s interest in a particular topic. Our proposed reciprocal-weighted interest accumulation networking (RWIAN) algorithm (see Algorithm 1) constructs a CoP-network that employs this profile structure to generate the attribute vectors of learners within a common CoP, resulting in an overall CoP-network. This process calculates the social similarity using the RWD function to generate the similarity matrix (SM), in order to establish (weighted) links between the learners with the highest similarity.

Algorithm 1 Reciprocal-weighted interest accumulation networking (RWIAN)
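The RWD score and the pairwise score matrix can be sketched as follows. The sketch assumes that each learner's own weighting vector is applied to that learner's attribute vector, which is one reading of the definitions of \(rw(l)\) and \(rw(u)\); the function names and the dictionary-based matrix are illustrative and do not reproduce Algorithm 1.

```python
import math


def rwd(l, u, rw_l, rw_u):
    """Reciprocal-weighted Euclidean score between two attribute vectors.

    l and u are the attribute vectors; rw_l and rw_u are the weighting
    vectors of the two learners (assumed to apply to their own vectors).
    """
    if not (len(l) == len(u) == len(rw_l) == len(rw_u)):
        raise ValueError("all vectors must have the same length n")
    return math.sqrt(
        sum((wl * li - wu * ui) ** 2
            for li, ui, wl, wu in zip(l, u, rw_l, rw_u))
    )


def score_matrix(profiles, weights):
    """Compute the RWD score for every unordered pair of learners.

    profiles and weights both map a learner id to a vector.
    """
    names = sorted(profiles)
    return {
        (a, b): rwd(profiles[a], profiles[b], weights[a], weights[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Because the function is Euclidean in form, identical weighted profiles score 0 and the score grows as profiles diverge; how the scores are converted into link weights is left to the networking algorithm.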

5.2 Triadic closure to enhance community structure in CoP-network

Transitivity of a relation means that when there is a tie between nodes l and u in a network, and also between nodes u and v within the same network, then there is also a tie between l and v. Strong ties that connect individuals with similar attributes are more often transitive than weak ties. Similarity and transitivity together lead to the formation of cliques (fully connected clusters), which enhance the network structure and community detection, as the internal edges of a community are likely to form cliques while inter-community edges are unlikely to do so. Thus, we propose a triadic-closure approach based on social interests and career interests, as presented in Algorithm 2, to enhance the constructed (RWIAN) social networks among learners. Within the scope of this research, we aim to introduce a systematic approach to building dense CoP-based SNs in which learners can benefit from the power of collaborative knowledge towards achieving common career goals and objectives. In our proposed approach, we filter the initially constructed CoP-network by removing the edges whose first similarity score falls below a given threshold, i.e. keeping only the edges with \(S\ge t\), and use a second similarity measure as the decisive metric for forming the triadic tie: the tie is formed if, and only if, nodes l and v have strong common interests that are different from the initial interests used to construct the network. For example, we may use Career Interests as the first similarity measure \((S1=Career-Interests)\) to construct the (RWIAN) network \(G_{w}=(V,E)\), where V is a set of learners and \(E\subset V\times V\) is a set of relationships connecting the learners of highest S1. Three users (l, u, v), where links exist between l and u \((e_{lu}=1)\) and between u and v \((e_{uv}=1)\), form a candidate open triad OT(luv). We then use Social Interests as the second similarity measure \((S2=Social-Interests)\) to decide whether OT(luv) becomes a closed triad CT(luv), which establishes a link between l and v \((e_{lv}=1)\). The strength of the newly established link \(e_{lv}\) is then the result of a combination of S1 and S2.
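The closure step described above can be sketched in Python as follows. This is a minimal illustration rather than Algorithm 2 itself: the function name `close_triads`, the S2 threshold `t2`, and the rule for combining S1 and S2 (the mean of the weaker S1 link on the path and the S2 score) are assumptions made for the example, since the text does not fix them.

```python
from itertools import combinations


def close_triads(s1_edges, s2, t2):
    """Close open triads in an S1-filtered network using a second similarity S2.

    s1_edges: dict mapping frozenset({a, b}) -> S1 weight (edges with S1 >= t only).
    s2:       callable (a, b) -> S2 similarity in [0, 1].
    t2:       hypothetical threshold on S2 for closing an open triad.
    """
    # Build an adjacency map from the S1 edges.
    adj = {}
    for edge in s1_edges:
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    enhanced = dict(s1_edges)
    for u, nbrs in adj.items():
        # Every pair (l, v) of neighbours of u is a triad centred on u.
        for l, v in combinations(sorted(nbrs), 2):
            lv = frozenset((l, v))
            if lv in s1_edges:
                continue  # already a closed triad in the S1 network
            score = s2(l, v)
            if score < t2:
                continue  # OT(luv) stays open
            # Assumed combination rule: mean of the weaker S1 link and S2.
            path_s1 = min(s1_edges[frozenset((l, u))], s1_edges[frozenset((u, v))])
            weight = (path_s1 + score) / 2
            # If l and v share several neighbours, keep the strongest estimate.
            enhanced[lv] = max(enhanced.get(lv, 0.0), weight)
    return enhanced
```

The function returns a new edge dictionary and leaves the S1 network untouched, so the original and the enhanced networks can be compared afterwards.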

figure b

5.3 Persuasive approach for behavioral adoption

In this section, we describe the underlying structure of our proposed persuasive approach for a behavioral change support system for career adoption (BCSS-CA). We propose to start the persuasion process by identifying the most influential individuals in the CoP-network to be hired as persuasive agents. Persuasive agents will be treated with different persuasive strategies according to their “persuasive profile” to get them to adopt the desired career path. Persuasive agents are then expected to leverage their powerful structural position in the CoP-network to influence other learners to adopt the same career paths, as required by the persuasion source (i.e. human resource planning authority, recruitment agency).

Our proposed BCSS-CA model aims to stimulate persuasive agents to adopt certain careers based on the recommendations of local human resource authorities that oversee the market needs. The model is part of an interactive career development process which continues to monitor, analyze, and process career-related information, and periodically provides feedback to learners to keep them up to date about career opportunities and the special skill sets required in their areas of career interest. The designated authorities first aim to hire this group of “persuasive agents” to help influence other learners to adopt the desired career path. The performance of persuasive agents in getting more learners to adopt the desired career is monitored by the persuaders in order to provide frequent feedback. This is in alignment with the original process of developing the tacit knowledge and professional skills required for this career. How to stimulate key users to be persuasive agents is discussed next.

5.4 Persuasive agents

Identifying persuasive agents involves creating a “personalized persuasive profile” to develop an understanding of individual behavior in response to persuasion, and thus to use suitable means to persuade each agent into adopting the desired career path. The agents will then naturally influence others in their circle to follow. Typically, the persuasive profile will include measures of “persuasive susceptibility”, i.e. collections of the expected effects of different influence strategies or principles, such as authority, liking, and reciprocity, for a specific individual. Having the persuasive profile of each agent also contributes to deciding the right motivator, the right trigger and the right time to persuade him or her. Persuasive agents can then be categorized into different behavioral modalities so that the source can design tailored persuasive techniques or messages.

The Fogg behavioral model (FBM) (Fogg 2009) states that for a target behavior to happen, an individual must have sufficient motivation, sufficient ability, and an effective trigger. All three factors must be present at the same time for the behavior to occur. More specifically, the more motivation and ability an individual has, the more likely it is that he or she will perform the target behavior. The FBM provides us with the insight to use the learners with the highest career dispositions as the primary “persuasive agents”, as they will accordingly have the required levels of motivation and ability to adopt the target career. Then, based on their structural position in the CoP-network, they will be able to influence other individuals (who should also have some non-zero level of both motivation and ability) to follow them. As illustrated in Fig. 6, we apply FBM in our design approach by identifying motivators, abilities and triggers as follows:

  • Motivators Persuasive agents are selected so that they already have the strongest desire to pursue a successful career path. Their structural position in the network fulfills their need to feel they are unique (independence) and to have influence on others (power).

  • Abilities We differentiate three sets of abilities for our persuasive agents: (a) career dispositions, (b) personalized persuasive profile, and (c) powerful network structure. We need the persuasive agents to be able to respond and adapt to the change in career needs as soon as it is communicated by the persuaders. On the other hand, the personalized persuasive profile combines insights into personal traits and persuasion principles to personalize the persuasive intervention, so that the messages, timing, interfaces, persuasive strategies and other factors of the BCSS are tailored to one specific individual or agent.

  • Triggers We identify three types of triggers: (a) interaction with industry, (b) feedback, and (c) recommendations and suggestions. Interactions with industry, either via CoP-networks or through industrial insights presented in an analytics dashboard, serve as triggers to encourage individuals to adopt future careers when they are most in demand. Another feasible approach to prompting behavioral change in adopting career paths is to monitor career adoption development and provide real-time feedback to the individuals. We may enhance the strength of the feedback by using instantly processed and visualized results derived from the CoPs monitoring mechanisms. As for recommendations and suggestions, they may come as direct messages from human resources authorities or recruitment agencies.

Fig. 6 FBM with persuasive agents

5.5 SNA and influence diffusion

In our proposed method to identify persuasive agents, we employ social mining techniques to select initial seed nodes effectively so as to maximize influence diffusion in the CoP-network. For comparative analysis, we use different SNA metrics (\(sna-d\)) for seed selection. Influence diffusion models require as input the initial seed nodes (k), which are considered to be influenced (i.e. to have already performed action a) at the start of the experiment. Algorithm 3 defines the implemented algorithm to mine influence in the CoP-network using various methods to select k. In Algorithm 3, G is a graph representing the CoP social network, consisting of a set of nodes V and a set of edges E, and \(sna-d\) is the structural position of a node with respect to one of the five measures (degree, betweenness, closeness, coreness, eccentricity) (Scott 2012). First, we initialize the seed set S and its influence \(I_{S}\). The while loop stores the set of nodes with the highest \(sna-d\) in S; this set is used to start the influence spread process. Next, we execute the influence propagation processes using diffusion models. The maximum influence from S, i.e. \(I_{S}\), is returned.
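The overall flow described above (rank nodes by a structural measure, take the top k as seeds, run a diffusion model, report the influence) can be sketched as follows. This is an illustrative outline under stated assumptions and not a transcription of Algorithm 3: the function names are hypothetical, only degree is implemented as the \(sna-d\) measure, and `one_hop_spread` is a deliberately trivial placeholder that a real diffusion model would replace.

```python
def degree_scores(adj):
    """Degree of each node: one of the five sna-d measures named in the text."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def select_seeds(adj, k, score_fn=degree_scores):
    """Return the k nodes with the highest structural score (ties broken by name)."""
    scores = score_fn(adj)
    ranked = sorted(adj, key=lambda node: (-scores[node], str(node)))
    return set(ranked[:k])


def one_hop_spread(adj, seeds):
    """Placeholder diffusion: seeds influence their direct neighbours only."""
    influenced = set(seeds)
    for s in seeds:
        influenced |= adj[s]
    return influenced


def mine_influence(adj, k, diffuse=one_hop_spread, score_fn=degree_scores):
    """Select the seed set S by structural position, diffuse, return (S, I_S).

    adj maps each node to the set of its neighbours; I_S is reported as the
    fraction of nodes influenced.
    """
    seeds = select_seeds(adj, k, score_fn)
    influenced = diffuse(adj, seeds)
    return seeds, len(influenced) / len(adj)
```

Passing the scoring function and the diffusion model as arguments keeps the comparison across measures and models down to a loop over `score_fn` and `diffuse` choices.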

figure c

6 Performance evaluation

6.1 Fuzzy semi-supervised clustering to predict CoP

We used the MATLAB environment to design two experiments utilizing: (1) the IRIS benchmark data set, which is overlapped by nature (Pastizzo et al. 2002); and (2) a multidimensional data set simulated from a career-related real-world scenario. Each experiment consists of several rounds to test two main varying parameters: (1) the number of seeds; and (2) the degree of overlap between clusters. We compare the results of our proposed method against two candidate K-means methods: (1) Seeded K-means (SKM) (Basu et al. 2002); and (2) the CVQE-based pairwise-constraints K-means algorithm (Davidson and Ravi 2005) (denoted as PKM), which allows constraint violations with certain penalties. To evaluate the performance of the clustering algorithms, we employ two major external metrics that utilize a priori knowledge about the classification information: (1) the total accuracy of clustering, to measure the extent to which each cluster contains the correct objects from the corresponding ground-truth category; and (2) the total F-Score, to measure the effectiveness of cluster retrieval.
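For readers who want to reproduce the two external metrics, the sketch below shows one common way to compute them in Python. The exact formulas are assumptions, as the text does not spell them out: accuracy maps each cluster to its majority ground-truth class, and the total F-Score is the class-size-weighted best F-measure over clusters. The experiments themselves were run in MATLAB; Python is used here only for illustration.

```python
from collections import Counter


def total_accuracy(truth, pred):
    """Fraction of objects that fall in a cluster whose majority class is theirs."""
    by_cluster = {}
    for t, p in zip(truth, pred):
        by_cluster.setdefault(p, Counter())[t] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return correct / len(truth)


def total_f_score(truth, pred):
    """Class-size-weighted average of the best F-measure each class achieves."""
    n = len(truth)
    class_sizes = Counter(truth)
    cluster_sizes = Counter(pred)
    joint = Counter(zip(truth, pred))
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint[(c, k)]
            if n_ck == 0:
                continue
            precision = n_ck / n_k  # share of cluster k that belongs to class c
            recall = n_ck / n_c     # share of class c recovered by cluster k
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total
```

Both functions take two equally long label sequences and are invariant to how the cluster labels are numbered.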

6.1.1 Benchmarked data

Figure 7a, b shows the total accuracy and total F-Score of all methods as the size of the overlapped region increases. We can see that as the size of the overlapped region increases, the performance of the fuzzy algorithm in general increases or remains stable, while the performance of the two baseline methods decreases. This indicates the effect of seeds on accurately recovering the overlapped region. Without seeds, the clustering becomes harder when the overlapped region becomes larger, and thus the performance of the two baseline methods drops. In contrast, with the help of seeds, the performance of our fuzzy algorithm does not drop even when the size of the overlapped region increases. Thus, seeds are critical for clustering the instances in the overlapped region, and it is crucial to allow the seeds or constraints from the overlapped region to be associated with multiple cluster labels. This is indeed the advantage of our method over non-fuzzy algorithms, and over other membership-score-based fuzzy algorithms that are not able to specify the membership scores for the seeds or constraints.

Fig. 7 The performance of three clustering methods on IRIS of different overlapped region size (Number of seeds = 5)

6.1.2 Data simulated from real world scenario

The performance of the three algorithms for different numbers of seeds in the overlapped region is presented in Fig. 8b, showing that the total F-score of our fuzzy method is better than those of the baseline methods. This is because both the precision and the recall of the fuzzy algorithm increase as the number of seeds in the overlapped region increases. We also observe that the total F-score in general increases when the number of seeds increases, as we get more information on the overlapped region from the seeds. However, when the number of seeds in the overlapped region is too large, the performance of the fuzzy algorithm drops. This might be because with too many seeds, we obtain more false positives in the overlapped region. We observed similar patterns for the total accuracy as well (see Fig. 8a).

Fig. 8 The performance of three clustering methods on real world scenario of different number of seeds in overlapped region

6.2 Triadic closure to build CoP-social network

We generated a dataset (L) of a group of learners: \(L=\left\{ l_{i}\right\} _{i=1}^{N}\), where N is the size of L. Each learner \(l_{i}\) is described by a vector of n attributes, \(l_{i}=\left\{ x_{i}\right\} _{i=1}^{n}\), that represents the learner’s interests and is used to match the learner with other like-minded learners. Each attribute \(x_{i}\) takes a value within a given range, \(x_{i}\in [min(x_{i}),max(x_{i})]\), generated following a normal distribution. Each attribute is assigned a weight \(w_{i}\) within a given range, \(w_{i}\in [min(w_{i}),max(w_{i})]\), which is also generated following a normal distribution.

We use the modularity Q as the basic evaluation metric to show the performance of our method. Modularity is a property of a network that specifies the degree of division of that network into coherent communities.
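As a reference point, the standard Newman modularity of an undirected, unweighted graph can be computed as the sum over communities of the fraction of edges inside the community minus the squared fraction of edge ends attached to it. The sketch below is a minimal Python illustration of that definition; the function name and the input layout (an edge list plus a node-to-community dictionary) are choices made for the example.

```python
def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.

    edges:     iterable of (a, b) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # number of edges inside each community
    degree_sum = {}  # total degree of the nodes in each community
    for a, b in edges:
        ca, cb = community[a], community[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    # Q = sum over communities of (L_c / m) - (d_c / 2m)^2
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )
```

For two triangles joined by a single edge, with each triangle taken as one community, this gives Q = 5/14, whereas placing all six nodes in one community gives Q = 0.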

In the network setup experiment, we generate a dataset of 100 learners’ vectors consisting of 15 attributes (\(n=15\)) to represent hobbies/social interests and sub-interests in areas such as sports, travel, and entertainment. These vectors are used to generate a static network (size \(N=100\)) of similar profiles utilizing the \(100\times 100\) similarity matrix (SM), such that l and u are similar, and thus linked in the network, only when \(SM(l,u)=S_{lu}\ge \Delta\), where \(\Delta\) is the similarity threshold and \(S_{ij}\) indicates the similarity weight between \(l_{i}\) and \(l_{j}\).
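A setup of this kind can be sketched in Python as follows. The sketch rests on several assumptions that the text leaves open: attributes are drawn from a normal distribution centred in the range and clipped to it, the similarity measure is one minus the weighted mean absolute difference, and all function names are hypothetical.

```python
import random


def generate_learners(n_learners, n_attrs, lo=0.0, hi=1.0, seed=0):
    """Draw each attribute from a normal distribution, clipped to [lo, hi]."""
    rng = random.Random(seed)
    mu, sigma = (lo + hi) / 2, (hi - lo) / 6  # assumed: range spans about 3 sigma each way
    clip = lambda x: min(hi, max(lo, x))
    return [[clip(rng.gauss(mu, sigma)) for _ in range(n_attrs)]
            for _ in range(n_learners)]


def similarity(x, y, weights, lo=0.0, hi=1.0):
    """Assumed measure: 1 minus the weighted mean absolute difference."""
    total = sum(weights)
    diff = sum(w * abs(a - b) for w, a, b in zip(weights, x, y))
    return 1.0 - diff / (total * (hi - lo))


def build_network(learners, weights, delta):
    """Link l and u only when SM(l, u) >= delta; returns (SM, edge list)."""
    n = len(learners)
    sm = [[similarity(learners[i], learners[j], weights) for j in range(n)]
          for i in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if sm[i][j] >= delta]
    return sm, edges


# Example matching the sizes in the text: 100 learners, 15 attributes.
learners = generate_learners(100, 15)
sm, edges = build_network(learners, weights=[1.0] * 15, delta=0.85)
```

Raising `delta` yields a sparser network, since fewer pairs clear the similarity threshold.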

In the network enhancement step, we created a sparse matrix of a specified size \((n\times n)\) by combining n learners’ vectors. Then, we set two similarity measures (S1, S2). S1 is used to set up the initial network (G) through a set of attributes (e.g. career interests). S2 is further used to add edges to G using another set of attributes (e.g. social interests), where these edges are enforced to form triads in G, producing the enhanced network \((G')\).

We employed the CNM community analysis algorithm (Clauset et al. 2004) to compare community structures in the generated networks. T-CNM denotes the community structure obtained using our triadic closure enhancement method. Given (SM), we generated the frequency distribution of the similarity weights (S1), and then built several networks using two different methods. The goodness of the generated networks is evaluated using CNM and T-CNM clustering, with modularity (Q) used to assess the obtained communities.

6.2.1 Results and discussion

The edges in the original similarity matrix (generated based on career interests, i.e. S1) are filtered using a threshold \(\Delta\) to remove weak edges. Different values of \(\Delta\) were used and the modularity metrics of the obtained networks are provided in Table 1. As indicated in the obtained results, the Q value decreases when we include more links of poor similarity weight \((0.60\le S1\le 0.85)\).

Table 1 Metrics obtained using absolute similarity threshold filtering

The edges in the original similarity matrix are further filtered using the best K-NN strategy (Wu et al. 2008). For every node, its nearest k neighbors are kept in the network, whereas the other edges are removed. Different values of k were used and the metrics for the obtained networks are provided in Table 2. As expected, the more neighbors (k) we include, the greater the chance of adding weak links, resulting in more inter-community edges than intra-community edges and so decreasing the Q value. On the other hand, a small k connects nodes that are highly similar into increasingly dense communities, thus improving Q.
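The k-nearest-neighbor filtering described above can be illustrated with the short Python sketch below. It is a simplified stand-in, not the exact strategy of the cited work: the function name is hypothetical, and an undirected edge is kept when either endpoint ranks the other among its k most similar nodes, which is an assumption.

```python
def knn_filter(sm, k):
    """Keep, for every node, only the edges to its k most similar neighbours.

    sm: square similarity matrix (list of lists); the diagonal is ignored.
    An undirected edge survives if either endpoint ranks the other in its
    top k (an assumption; a stricter mutual k-NN rule is also possible).
    """
    n = len(sm)
    edges = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Sort by decreasing similarity, breaking ties by node index.
        others.sort(key=lambda j: (-sm[i][j], j))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)
```

With a small k each node keeps only its strongest ties, while a k close to the network size returns nearly every edge, weak ones included.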

Table 2 Metrics obtained using k-nearest neighbors

6.3 Social network analytics to identify persuasive agents

Using the generated network \(G'\), we applied five SNA methods (degree, betweenness, closeness, coreness and eccentricity) to select nodes as the initial seeds for influence diffusion models. Two widely used fundamental diffusion models are the independent cascade (IC) model (Kempe et al. 2003) and the linear threshold (LT) model (Watts 2002). In both models, at a given timestamp, each node is either active (has already adopted the information) or inactive, and each node’s tendency to become active increases monotonically as more of its neighbors become active. Nodes can switch from being inactive to being active, but cannot switch from being active to being inactive. Given an initial set A of active nodes, both models assume that the nodes in A become active first and all the other nodes remain inactive at time-step 0. The diffusion process of active nodes unfolds in discrete time-steps \(t\ge 0\). As time unfolds, more and more neighbors of an inactive node u become active, eventually making u become active, and u’s decision may in turn trigger further decisions by nodes to which u is connected. Thus, the spread of the information through the network \(G'\) is represented as the spread of active nodes on \(G'\).
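The two diffusion models can be sketched in a few lines of Python. The sketch follows the textbook formulations and makes some simplifying assumptions that are not taken from the experiments: a single propagation probability `p` for all links in IC, uniform neighbor weights 1/deg(v) in LT, thresholds drawn uniformly at random unless supplied, and a cap `max_steps` on the number of iterations. The graph is given as a dictionary mapping each node to the set of its neighbors.

```python
import random


def independent_cascade(adj, seeds, p=0.1, max_steps=50, rng=None):
    """IC model: each newly active node gets one chance to activate each
    inactive neighbour, succeeding with propagation probability p."""
    rng = rng or random.Random(0)
    active, frontier = set(seeds), set(seeds)
    for _ in range(max_steps):
        if not frontier:
            break
        new = set()
        for u in sorted(frontier):
            for v in sorted(adj[u]):
                if v not in active and v not in new and rng.random() < p:
                    new.add(v)
        active |= new
        frontier = new  # only nodes activated in this step try next
    return active


def linear_threshold(adj, seeds, thresholds=None, max_steps=50, rng=None):
    """LT model with uniform weights 1/deg(v): v becomes active once the
    fraction of its active neighbours reaches its threshold."""
    rng = rng or random.Random(0)
    if thresholds is None:
        thresholds = {v: rng.random() for v in sorted(adj)}
    active = set(seeds)
    for _ in range(max_steps):
        new = {
            v for v in adj
            if v not in active and adj[v]
            and len(adj[v] & active) / len(adj[v]) >= thresholds[v]
        }
        if not new:
            break
        active |= new
    return active
```

In both functions a node never leaves the active set, which mirrors the rule that nodes can switch from inactive to active but not back.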

We set the number of seed nodes, i.e. the number of target influential nodes, to \(k=5\%\) of the total nodes. We fix the number of steps (iterations) at 50. The number of iterations can be considered an indicator of the time required to influence the entire network, as it represents how many iterations it takes to try to influence every node in the network. The ideal seed selection would achieve maximum influence with minimum iterations for a network. We measured one primary parameter for the comparative analysis, namely the percentage of influenced vertices.

Figure 9 shows the results of influence diffusion using the five SNA selection methods to select the initial seed nodes (k) for the IC and LT models. The first observation from the influence spread graphs is that different seed selection strategies affect the influence results, and that high-coreness nodes have the highest impact in influencing other nodes under both models (29 and \(55\%\), respectively), revealing a more central position in the network structure. On the other hand, among the centrality measures, closeness achieved the least impact in influencing other nodes under both models. This could be attributed to the fact that closeness centrality measures implicitly assume that the underlying social network is strongly connected (Ghosh and Lerman 2010).

Degree and betweenness centrality measures perform well under both models, with degree performing better than betweenness under the IC model, and the other way around under the LT model. This can be explained by the technical nature of each model: node thresholds are random in the LT model, while in the IC model it is the propagation through links that is random. Since degree centrality is a local measure, the small value of node thresholds in LT results in less tendency of nodes to adopt the new behavior when their neighbors do. On the other hand, the betweenness measure depends on the number of shortest paths or links that pass through a given node, while the IC model operates according to the random propagation probability through these links. Finally, the strategy of using the eccentricity measure to select seeds performs well in the LT model, where the result was higher by \(43\%\) than in the IC model.

Fig. 9 Percentage of influenced vertices (\(k=5\%\))

7 Concluding remarks

We developed a scale to measure the lifelong learning capacity and professional skills, referred to as “Career Dispositions”, of each learner as indicators of the level of preparation for practicing in a knowledge-based market. We implemented a diagnostic LA tool to analyze and visualize information on collected data, and to deliver it to users through a portal structure. Further, we proposed a fuzzy pairwise-constraints K-means (FPKM) algorithm as a predictive LA method to cluster learners of similar career patterns into interconnected CoPs that are driven by current industrial needs. We then constructed a CoP-network to support social learning by linking learners who share social interests besides their common career aspirations. We also devised a triadic closure approach to enhance community structures in the CoP-network in order to optimize the learning performance of groups of learners. Finally, we incorporated different SNA metrics and concepts of social influence into a persuasive model to instill and track the diffusion of the target behavior in a CoP-based network.

Future work involves developing further LA methods to capture, analyze and aggregate learning data that is inherently not interoperable, such as learners’ administrative data, academic performance data, and classroom and online data. While integrating data sets from a variety of unconnected systems can be extremely difficult, it offers more extensive insights that can lead to improved capabilities of LA models. Further, we aim to develop an LA management model to explore the effectiveness of CoP in improving career preparation outcomes in relation to the set of skills required by the industry for each designated career.