Information Systems Frontiers, Volume 13, Issue 3, pp 371–380

A process mining based approach to knowledge maintenance

Authors

    • School of Economics and Management, Beijing University of Aeronautics and Astronautics
    • School of Business Administration, China University of Petroleum
  • Lu Liu
    • School of Economics and Management, Beijing University of Aeronautics and Astronautics
  • Lu Yin
    • School of Economics and Management, Beijing University of Aeronautics and Astronautics
  • Yanqiu Zhu
    • China State Construction Engineering Corporation 1st Bureau Ltd
Article

DOI: 10.1007/s10796-010-9287-4

Cite this article as:
Li, M., Liu, L., Yin, L. et al. Inf Syst Front (2011) 13: 371. doi:10.1007/s10796-010-9287-4

Abstract

The quality of knowledge in a knowledge repository determines how effectively that knowledge can be reused and shared. Knowledge to be reused should be checked in advance through a knowledge maintenance process. A knowledge maintenance process model is difficult to construct because it must balance efficiency against effectiveness. In this paper, process mining is applied to knowledge maintenance logs to discover the actual process and then construct a more appropriate knowledge maintenance process model. We analyze the logs from the control flow perspective to find a good characterization of knowledge maintenance tasks and their dependencies. In addition, the logs are analyzed from the organizational perspective to cluster the performers who are qualified to do the same kinds of tasks and to derive the relations among these clusters. The proposed approach has been applied in a knowledge management system. The experimental results show that the approach is feasible and efficient.

Keywords

Knowledge maintenance, Knowledge management, Process mining

1 Introduction

Knowledge is an essential strategic resource for an organization (Nonaka and Takeuchi 1995; Polanyi 1967). Organizations need to utilize their valuable knowledge to gain sustainable competitive advantages (Davenport and Prusak 1998; Edvinsson and Malone 1997; Liao 2003). Knowledge management systems (KMS) have evolved to address organizational needs for managing valuable knowledge within the organization (Alavi and Leidner 2001; Liao 2003; Yang and Huh 2008). However, the benefit of knowledge sharing depends, to a large extent, on the quality of the knowledge itself, such as its accuracy, currency and usefulness (Rao and Osei-Bryson 2007; Kane et al. 2005). Low-quality knowledge, such as outdated or useless knowledge, makes it take much longer to find the appropriate knowledge. Moreover, the reuse of even a small piece of incorrect knowledge may damage the whole work. For example, in one aircraft design, the whole hydraulic system had to be redesigned because of wrong pressure requirements in the design manual.

In order to safeguard the knowledge quality, any piece of knowledge to be reused should be evaluated in advance through a process where the professionals or experts first evaluate it and then recognize it as sharable knowledge.

A knowledge maintenance process can be defined as a process that controls the flow of knowledge, such as its entry into the repository and its removal from the repository. Each process is composed of separate tasks and the corresponding performers. For example, if the journal Information Systems Frontiers is regarded as the knowledge repository, the article reviewing process that determines whether a submitted article will be published can be regarded as the knowledge acquisition process. The process is composed of three tasks: initial review, expert review and final review. These tasks are performed by the editors and qualified experts.

Besides the knowledge acquisition process, there are other kinds of knowledge maintenance processes, such as the knowledge removing process, which determines whether the knowledge is outdated and should no longer be shared.

In organizations, the actual knowledge maintenance processes and the processes perceived by the management are not fully identical. To preserve flexibility, the knowledge maintenance processes are not under complete control of the system. Predefined tasks cannot be skipped, but additional tasks can be added manually by the process originator to meet unofficial needs for change in the knowledge maintenance process. Moreover, authorized people can unofficially delegate tasks to others whom they consider qualified to perform them. All of this leads to a discrepancy between the predefined knowledge maintenance process model and the actual knowledge maintenance process model.

The optimization of the knowledge maintenance process model based on the actual process model is more objective and adaptive for the real knowledge maintenance needs. Most knowledge management systems, especially those with the embedded workflow system, provide knowledge maintenance process logs. Typically such logs register the process instances. Every instance comprises the tasks in the process and the timestamp, performer and some additional data of each completed task. Analyzing these logs is a practical way to know about the actual knowledge maintenance process. Thus, we introduce the process mining techniques to support the optimization of the knowledge maintenance process by analyzing logs.
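The log structure described above can be pictured with a small data model. The following is a minimal Python sketch under stated assumptions: the class and field names (`TaskEvent`, `ProcessInstance`, `extra`) are hypothetical and are not taken from the KMS, and the timestamps are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TaskEvent:
    """One completed task in a process instance (field names are hypothetical)."""
    task: str
    performer: str
    timestamp: datetime
    extra: dict = field(default_factory=dict)  # any additional logged data


@dataclass
class ProcessInstance:
    """One registered process instance: an id plus its completed tasks."""
    instance_id: int
    events: list = field(default_factory=list)

    def _ordered(self):
        # Order events by completion time so that sequences are well defined.
        return sorted(self.events, key=lambda e: e.timestamp)

    def tasks(self):
        return [e.task for e in self._ordered()]

    def performers(self):
        return [e.performer for e in self._ordered()]


# Illustrative instance; the dates are made up.
instance = ProcessInstance(1, [
    TaskEvent("Exam", "Bill", datetime(2010, 1, 2)),
    TaskEvent("Start", "Jack", datetime(2010, 1, 1)),
])
print(instance.tasks())       # task view, used for control flow mining
print(instance.performers())  # performer view, used for organization mining
```

Keeping the task view and the performer view as two projections of the same events mirrors the two mining perspectives used in the rest of the paper.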

In this paper, we elaborate how process mining is used to obtain insights into the knowledge maintenance process from two perspectives: (1) from the control flow perspective, a control flow mining algorithm is proposed to find a good characterization of knowledge maintenance tasks and paths; and (2) from the organizational perspective, task performers are clustered and the relations among the clusters are derived. The performers in each cluster are qualified to carry out the same knowledge maintenance tasks. The experimental results show the applicability of the proposed methods.

This paper is organized as follows: Section 2 provides an overview of knowledge quality control and process mining. Section 3 shows the application of process mining to the knowledge maintenance. Then, we will describe the experiments in Section 4. Finally, Section 5 concludes the paper.

2 Related work

2.1 Knowledge quality control

Knowledge is one of the most important sources of competitive advantage for firms (Grant 1996; Nonaka 1990, 1991, 1994; Nonaka and Takeuchi 1995; Nonaka and Toyama 2003). Knowledge is different from information or data. Data are what come directly from sensors, reporting on the measured level of a variable. Information is organized data that is placed in context and endowed with meaning. Knowledge consists of the rules that are deduced from the information (Bohn 1994).

The quality of knowledge determines the effects of knowledge reuse (Rao and Osei-Bryson 2007). Many studies have concentrated on finding ways to control the quality of knowledge (Wang et al. 2005; Mark 1998). These studies can be classified into pre-control and after-control categories. Pre-control means checking the knowledge before it is shared (Wang et al. 2005). In contrast, after-control means that the quality of the knowledge is evaluated by users during the sharing process (Mark 1998).

Although the after-control method can distinguish valuable knowledge from useless knowledge, useless or even wrong knowledge is unavoidably reused, because the distinction can only be made after many people have found the knowledge and given their assessments. Moreover, the varying expertise of users further reduces the effectiveness of the knowledge quality evaluation. In contrast, pre-control is more applicable. All the knowledge in the repository can be regarded as useful and valuable because any piece of knowledge to be stored in the repository has been checked by experts. For organizations, pre-control can greatly reduce the hazard caused by poor-quality knowledge.

2.2 Process mining

Process mining is the discovery of knowledge by analyzing task execution logs (van der Aalst et al. 2004). It addresses the problem that there is always a significant gap between what is prescribed or supposed to happen, and what actually happens. Process mining has been applied in many domains such as software engineering processes (Rubin et al. 2007; Cabac and Denz 2008), decision making (van der Aalst 2008) and healthcare (Yang and Hwang 2006).

There exist two popular perspectives from which logs can be analyzed: the control-flow perspective and the organizational perspective. Mining from the control-flow perspective, that is, control flow mining, finds the dependencies between tasks by analyzing the event sequences observed in the log (van der Aalst et al. 2003). The Agrawal algorithm (Agrawal et al. 1998) and the α-algorithm (van der Aalst et al. 2003) are two notable control flow mining techniques. The Agrawal algorithm has been chosen in this paper to find the dependencies between tasks because its process definition takes the form of a directed graph, which fits the uncomplicated knowledge maintenance processes.

Organization mining, that is, mining from the organizational perspective, clusters the task performers and finds the relations among the clusters. Song and van der Aalst (2008) proposed Agglomerative Hierarchical Clustering to cluster the performers according to the tasks they performed in the logs. In knowledge maintenance processes, we focus on finding performers with the same knowledge qualifications: performers who are in the same knowledge area and at the same knowledge level are clustered together. Because the knowledge needed to perform the same kind of knowledge maintenance task varies with the kind of knowledge being maintained, an accurate model of the actual knowledge maintenance organization cannot be derived by clustering according to the performed tasks alone.

3 Process mining based knowledge maintenance

In this section we present the application of process mining to knowledge maintenance. In knowledge maintenance process mining, useless tasks should be found and removed from the optimized process model to improve efficiency, and newly added useful tasks should be preserved so that the knowledge is checked more comprehensively. In addition, tasks take effect only if they are performed by qualified performers, and the absence of qualified performers may halt the knowledge maintenance process. Therefore, finding performers with the same qualifications is the other goal of knowledge maintenance process mining.

Figure 1 shows the architecture of the process mining based approach to knowledge maintenance. Knowledge maintenance tasks are performed according to the predefined process model and the generated process logs are stored in the log repository. The logged data are preprocessed and converted to the required format of process mining. Then, the control flow mining is used to find the newly added tasks, useless tasks and the sequence of the tasks. The organization mining is used to construct the organization model, with which the performers qualified to perform the same tasks can be found.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9287-4/MediaObjects/10796_2010_9287_Fig1_HTML.gif
Fig. 1

Architecture of the process mining based approach to knowledge maintenance

Before diving into the concrete applications, we briefly introduce the collected lessons learned knowledge acquisition process log shown in Table 1, which is used as an example to illustrate the proposed mining methods in the remainder of this section.
Table 1

Lessons learned knowledge acquisition process log R

| Id | Logs |
|---|---|
| 1 | (Start, Jack), (Exam, Bill), (Start, Jack), (Exam, Bill), (Secrecy, Eileen), (Censor, Mona), (Start, Jack), (Exam, Bill), (Secrecy, Eileen), (Censor, Mona), (Approve, Robert), (Check, Pete) |
| 2 | (Start, Shirley), (Exam, Bill), (Start, Shirley), (Exam, Bill), (Secrecy, Eileen), (Censor, Mona), (Start, Shirley), (Exam, Bill), (Secrecy, Eileen), (Censor, Mona), (Approve, Robert), (Check, Pete) |
| 3 | (Start, Shirley), (Exam, Bill), (Start, Shirley), (Exam, Bill), (Secrecy, George), (Censor, Mona), (Start, Shirley), (Exam, Bill), (Secrecy, George), (Censor, Mona), (Approve, Fred), (Check, Pete) |
| 4 | (Start, Jack), (Exam, Bill), (Secrecy, George), (Start, Jack), (Exam, Bill), (Secrecy, George), (Censor, Mona), (Approve, Robert), (Check, Pete) |
| 5 | (Start, Bob), (Exam, Bill), (Secrecy, George), (Start, Bob), (Exam, Bill), (Secrecy, George), (Censor, Mona), (Approve, Fred), (Start, Bob), (Exam, Bill), (Secrecy, George), (Censor, Mona), (Approve, Fred), (Check, Pete) |
| 6 | (Start, Jack), (Exam, Bill), (Secrecy, George), (Censor, Mona), (Approve, Fred), (Start, Jack), (Exam, Bill), (Secrecy, George), (Censor, Mona), (Approve, Fred), (Check, Pete) |
| 7 | (Start, Tom), (Exam, Marie), (Censor, Mona), (Secrecy, Eileen), (Approve, Robert), (Check, Pete) |
| 8 | (Start, Nancy), (Exam, Marie), (Censor, Mona), (Secrecy, George), (Start, Nancy), (Exam, Marie), (Censor, Mona), (Secrecy, George), (Approve, Robert), (Check, Pete) |
| 9 | (Start, Tom), (Exam, Marie), (Censor, Mona), (Secrecy, Eileen), (Approve, Fred), (Check, Pete) |
| 10 | (Start, Nancy), (Exam, Marie), (Censor, Mona), (Secrecy, George), (Start, Nancy), (Exam, Marie), (Censor, Mona), (Secrecy, George), (Approve, Fred), (Check, Pete) |

The process logs were generated by the knowledge management system (KMS) that had been implemented in an aviation design institute. Each of the ten preprocessed process instances includes the task name, the corresponding performers and sequence of the tasks. The predefined dependency of the tasks, which is the original control flow model, is shown in Fig. 2.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9287-4/MediaObjects/10796_2010_9287_Fig2_HTML.gif
Fig. 2

Original control flow model of the lessons learned knowledge acquisition process

In Fig. 2, each real task is represented by a circle with a solid edge and the one virtual task is represented by a circle with a dotted edge. The virtual task simply means that all the tasks in the process are completed and the knowledge has entered the repository. The meanings of the tasks are as follows.
  • Task Start is the starting task of the process. In other words, the process is originated.

  • Task Exam means the examination task which is the basic quality examination.

  • Task Censor means the censoring task which is the quality examination from the specialty aspect.

  • Task Approve means the approving task which is the quality examination from the aviation model aspect. It is often performed by the vice chief designer.

  • Task Check is the last task, which is performed by the chief designer. Traditionally, documents related to aviation should be checked by the chief designer.

3.1 Mining from the control flow perspective

In this section, we focus on the task and dependency mining. Besides deriving information about the original process model, we check whether the reality conforms to the a-priori model as shown in Fig. 2. As we concentrate on the tasks and the relation between them, we eliminate the performers of the log R, and the result is shown in Table 2.
Table 2

Processed log L

| Id | Logs |
|---|---|
| 1 | (Start), (Exam), (Start), (Exam), (Secrecy), (Censor), (Start), (Exam), (Secrecy), (Censor), (Approve), (Check) |
| 2 | (Start), (Exam), (Secrecy), (Start), (Exam), (Secrecy), (Censor), (Approve), (Check) |
| 3 | (Start), (Exam), (Secrecy), (Start), (Exam), (Secrecy), (Censor), (Approve), (Start), (Exam), (Secrecy), (Censor), (Approve), (Check) |
| 4 | (Start), (Exam), (Secrecy), (Censor), (Approve), (Start), (Exam), (Secrecy), (Censor), (Approve), (Check) |
| 5 | (Start), (Exam), (Censor), (Secrecy), (Start), (Exam), (Censor), (Secrecy), (Approve), (Check) |
| 6 | (Start), (Exam), (Censor), (Secrecy), (Approve), (Check) |
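The projection from log R to log L can be sketched as follows. This is a minimal Python sketch rather than the authors' implementation: the function names are hypothetical, and it assumes (based on Table 2 having six rows for the ten instances of Table 1) that instances with identical task sequences are merged once the performers are dropped. Only instances 7, 8 and 9 of Table 1 are included to keep the example short.

```python
def project_tasks(instance):
    """Drop the performers and keep only the ordered task names."""
    return tuple(task for task, _performer in instance)


def distinct_sequences(log):
    """Return the distinct task sequences in order of first appearance.

    Merging identical sequences is an assumption inferred from Table 2.
    """
    seen = []
    for instance in log:
        seq = project_tasks(instance)
        if seq not in seen:
            seen.append(seq)
    return seen


# Instances 7, 8 and 9 of Table 1 as (task, performer) pairs.
log_r = [
    [("Start", "Tom"), ("Exam", "Marie"), ("Censor", "Mona"),
     ("Secrecy", "Eileen"), ("Approve", "Robert"), ("Check", "Pete")],
    [("Start", "Nancy"), ("Exam", "Marie"), ("Censor", "Mona"),
     ("Secrecy", "George"), ("Start", "Nancy"), ("Exam", "Marie"),
     ("Censor", "Mona"), ("Secrecy", "George"), ("Approve", "Robert"),
     ("Check", "Pete")],
    [("Start", "Tom"), ("Exam", "Marie"), ("Censor", "Mona"),
     ("Secrecy", "Eileen"), ("Approve", "Fred"), ("Check", "Pete")],
]

log_l = distinct_sequences(log_r)
for seq in log_l:
    print(seq)
```

Instances 7 and 9 differ only in the performer of Approve, so they collapse into a single task sequence, while instance 8 contributes a second, longer sequence.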

First, we illustrate the knowledge maintenance process from the control flow perspective in more detail. A knowledge maintenance process is composed of separate tasks, and each repository has its own knowledge maintenance processes. For example, the acquisition process of airworthiness knowledge consists of two tasks, whereas the acquisition process of lessons learned knowledge consists of four tasks: censoring, examination, approving and evaluation. Each task checks whether the knowledge meets the quality requirement from one specific aspect; the key is to check for errors. Each task is performed by a particular kind of people according to predefined rules. If the outcome of the current task is positive, the successive tasks are activated and the knowledge is checked from other aspects. If the outcome is negative, the knowledge is routed back to the originator to correct the errors that were found. The knowledge cannot be modified by anyone but the process originator, because only the originator knows the reason for initiating the process for that knowledge. In addition, this helps the originator avoid making the same mistakes in the future.

Knowledge maintenance control flow mining includes the task mining, which eliminates the useless tasks, and the dependency mining, which finds the dependency of the tasks. The knowledge maintenance control flow mining algorithm is shown in Table 3.
Table 3

Knowledge maintenance control flow mining algorithm

The knowledge maintenance process can be modeled as a graph G = (V, E), where V is the set of tasks of the process, including both the predefined tasks and the tasks added manually during the execution of the process, and E is the set of dependencies between tasks. Initially, E = ∅ and V = {S}, where S is the start task, which means that the user originates the process. V and E are obtained through the mining algorithm. Each process execution p in L is a sequence of tasks that starts from S: p = (S, t1, t2, …, tn).

1. For each p in L and for each task t in p except the start task S: if the outcome of t is negative and t exists in V, then increase the count of t by one. If the outcome of t is negative and t does not exist in V, then add t to V.

2. Remove from V the tasks the counts of which are less than the given threshold δ except the task S.

3. Let p’ be the instance p with its duplicated tasks eliminated, and let L’ be the set of all p’. For each p in L, the last sub sequence that starts with task S is assigned to p’. After p is processed, p’ is added to L’.

4. For each treated process instance p’ in L’, and for each pair of tasks u, v such that u terminates before v starts in p’, add the edge (u, v) to E.

5. Remove from E the edges that appear in both directions.

6. For each strongly connected component of G, remove from E all edges between tasks in the same strongly connected component.

7. For each treated process instance p’ in L’:

(a) Find the induced sub graph of G.

(b) Compute the transitive reduction of the sub graph.

(c) Mark those edges in E that are present in the transitive reduction.

8. Remove the unmarked edges in E.

9. Return (V, E).

The algorithm is divided into three parts. The first part, comprising steps 1 and 2, deals with the tasks: the noisy instances and the useless tasks are removed. Removing the noisy data yields a more general model, and eliminating the useless tasks improves the efficiency of the knowledge maintenance process. The second part, which comprises only step 3, removes the duplicate sub processes in the instances. Instances that include tasks with negative outcomes contain duplicate tasks. Moreover, a resubmitted process may include additional tasks that were added manually. Only the latest successful execution sub process is preserved in the treated instance. The third part, comprising steps 4 to 9, is derived from the Agrawal algorithm and is used to find the dependencies of the remaining tasks.

In the following, we explain the algorithm more specifically through its application to log L shown in Table 2.

Step 1 selects the effective tasks and counts their effective occurrences. An occurrence of a task is considered effective if its outcome is negative, which indicates that errors were found. A task is considered ineffective if all of its outcomes are positive, which means that no errors were found; such a task may merely duplicate the effect of previous tasks. Table 4 shows the mining result after step 1. Task Check is ineffective and is removed.
Table 4

The selected effective tasks

| Task | Count |
|---|---|
| Exam | 1 |
| Secrecy | 3 |
| Censor | 1 |
| Approve | 2 |
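The counts in Table 4 can be reproduced from log L with a short sketch. The following Python code is illustrative only: log L does not record outcomes explicitly, so the sketch assumes that a task's outcome is negative exactly when the next logged task is a restart at Start. The names `LOG_L` and `negative_counts` are hypothetical.

```python
from collections import Counter

# Log L from Table 2, one space-separated task sequence per row.
LOG_L = [row.split() for row in [
    "Start Exam Start Exam Secrecy Censor Start Exam Secrecy Censor Approve Check",
    "Start Exam Secrecy Start Exam Secrecy Censor Approve Check",
    "Start Exam Secrecy Start Exam Secrecy Censor Approve "
    "Start Exam Secrecy Censor Approve Check",
    "Start Exam Secrecy Censor Approve Start Exam Secrecy Censor Approve Check",
    "Start Exam Censor Secrecy Start Exam Censor Secrecy Approve Check",
    "Start Exam Censor Secrecy Approve Check",
]]


def negative_counts(log, start="Start"):
    """Count, per task, how often it is immediately followed by a restart.

    Treating "followed by Start" as a negative outcome is an assumption.
    """
    counts = Counter()
    for seq in log:
        for current, following in zip(seq, seq[1:]):
            if following == start and current != start:
                counts[current] += 1
    return counts


counts = negative_counts(LOG_L)
for task in ("Exam", "Secrecy", "Censor", "Approve", "Check"):
    print(task, counts[task])
```

Under this assumption Check never precedes a restart, so its count stays at zero, which is consistent with it being classified as ineffective.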

Step 2 eliminates the tasks that are noisy in the process log. Tasks that are executed infrequently are considered noisy. In other words, tasks whose count is less than the given threshold are removed. In the example, we set δ = 2, and no task is removed.

Step 3 treats the processes that include loops. Only the last sub process that starts with Start is preserved in the treated process. After step 3 we obtain L’.
$$ L' = \left\{ \left( \mathit{start}, \mathit{exam}, \mathit{secrecy}, \mathit{censor}, \mathit{approve} \right), \left( \mathit{start}, \mathit{exam}, \mathit{censor}, \mathit{secrecy}, \mathit{approve} \right) \right\} $$
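Step 3 can be sketched in a few lines. This Python sketch is an illustration under assumptions: the helper names are hypothetical, the set of retained tasks is taken from Table 4, and identical treated instances are merged because L’ is described as a set.

```python
# Log L from Table 2, one space-separated task sequence per row.
LOG_L = [row.split() for row in [
    "Start Exam Start Exam Secrecy Censor Start Exam Secrecy Censor Approve Check",
    "Start Exam Secrecy Start Exam Secrecy Censor Approve Check",
    "Start Exam Secrecy Start Exam Secrecy Censor Approve "
    "Start Exam Secrecy Censor Approve Check",
    "Start Exam Secrecy Censor Approve Start Exam Secrecy Censor Approve Check",
    "Start Exam Censor Secrecy Start Exam Censor Secrecy Approve Check",
    "Start Exam Censor Secrecy Approve Check",
]]

# Tasks retained after steps 1 and 2 (Table 4); Check has been removed.
KEPT_TASKS = {"Exam", "Secrecy", "Censor", "Approve"}


def last_run(seq, start="Start"):
    """Return the last sub sequence of seq that begins with the start task."""
    idx = len(seq) - 1 - seq[::-1].index(start)
    return tuple(seq[idx:])


def treat_log(log, kept_tasks, start="Start"):
    """Keep the last run of each instance, restricted to the retained tasks."""
    treated = []
    for seq in log:
        run = tuple(t for t in last_run(seq, start)
                    if t == start or t in kept_tasks)
        if run not in treated:  # L' is a set, so duplicates are merged
            treated.append(run)
    return treated


treated_log = treat_log(LOG_L, KEPT_TASKS)
for run in treated_log:
    print(run)
```

The two remaining sequences differ only in the order of Secrecy and Censor, which is what later leads the dependency mining to treat these two tasks as independent.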
Step 4 defines the dependencies between tasks in the process with respect to the log. Fig. 3 shows the mining result after step 4.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9287-4/MediaObjects/10796_2010_9287_Fig3_HTML.gif
Fig. 3

Dependency graph of log L

Steps 5 and 6 identify the tasks that ought to be treated as independent because they appear in reverse order in two separate executions. To find such independent tasks, we find the strongly connected components of the graph. For two tasks in the same strongly connected component, there exists a path from each one to the other; consequently, edges between tasks in the same strongly connected component are removed. Figure 4 shows the mining result after steps 5 and 6. In Fig. 4, the relations between task Secrecy and task Censor are removed.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9287-4/MediaObjects/10796_2010_9287_Fig4_HTML.gif
Fig. 4

Minimal conformal graph of log L

Steps 7 and 8 retain only those edges of the graph that are necessary for at least one execution in the log. Figure 5 shows the mining result after steps 7 and 8. In Fig. 5, relations that do not exist in the actual logs, such as the relation between task Start and task Secrecy, are removed.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9287-4/MediaObjects/10796_2010_9287_Fig5_HTML.gif
Fig. 5

Mined process model of log L
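Steps 4 to 8 can be traced on L’ with the following sketch. It is a simplified Python illustration and not the authors' implementation: reachability is computed by a plain graph search, which is adequate for graphs of this size, and the function names are hypothetical.

```python
from itertools import combinations


def reachable(edges, src):
    """Return every node reachable from src by following at least one edge."""
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        for u, v in edges:
            if u == node and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def mine_dependencies(treated_log):
    # Step 4: add an edge (u, v) whenever u precedes v in some instance.
    edges = set()
    for seq in treated_log:
        edges.update(combinations(seq, 2))
    # Step 5: remove edges that appear in both directions.
    edges = {(u, v) for (u, v) in edges if (v, u) not in edges}
    # Step 6: remove edges inside a strongly connected component, that is,
    # edges (u, v) for which u is also reachable from v.
    reach = {n: reachable(edges, n) for e in edges for n in e}
    edges = {(u, v) for (u, v) in edges if u not in reach[v]}
    # Steps 7 and 8: keep only edges that survive the transitive reduction
    # of the sub graph induced by at least one instance.
    marked = set()
    for seq in treated_log:
        nodes = set(seq)
        sub = {(u, v) for (u, v) in edges if u in nodes and v in nodes}
        for u, v in sub:
            # (u, v) is redundant if v can be reached via another successor of u.
            others = {w for (x, w) in sub if x == u and w != v}
            if not any(v in reachable(sub, w) for w in others):
                marked.add((u, v))
    return marked


L_PRIME = [
    ("Start", "Exam", "Secrecy", "Censor", "Approve"),
    ("Start", "Exam", "Censor", "Secrecy", "Approve"),
]
mined_edges = mine_dependencies(L_PRIME)
for edge in sorted(mined_edges):
    print(edge)
```

On this input the sketch keeps five edges: Start to Exam, Exam to Secrecy, Exam to Censor, Secrecy to Approve, and Censor to Approve. Secrecy and Censor end up unordered with respect to each other, and the shortcut from Start to Secrecy is dropped, in line with the description of Figs. 4 and 5.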

Compared with the original process model shown in Fig. 2, the task Check is removed and the task Secrecy is added. The task Check has no effect and is removed from the mined process model, because its performer is so busy that he only gives results according to the suggestions of the vice chief designer. The added task Secrecy is the secrecy control task, that is, the secrecy-related examination according to the secrecy rules. Although the mining result shows that the secrecy control task and the censoring task are performed in parallel, within each specific department they are performed in series. Because there are no official rules, the order of the two tasks varies from department to department. We are not limited to one specific department but focus on the whole organization. Therefore, the two tasks appear in parallel in the mining result.

3.2 Mining from the organizational perspective

After analyzing the knowledge maintenance process from the control flow perspective, we direct our attention to the organizational perspective, that is, to the performers of the tasks. In organization model mining, we focus on deriving information about the knowledge maintenance organization model rather than the formal organization model.

The knowledge maintenance organization model usually consists of units, performers, and their relationships (i.e. who belongs to which unit, and the hierarchy among units). A unit indicates a knowledge area. Performers within the same unit are in the same knowledge area and at the same knowledge level. The level of a unit reflects its knowledge level in the knowledge maintenance organization model. Vertical connections between units represent relations such as coverage between knowledge areas. Units at a higher level are more professional and cover all the knowledge areas in which their subordinate units are involved. There are no knowledge maintenance relations among parallel units because each unit works in one specific knowledge area. Knowledge is handed over following the hierarchical structure and is maintained only by a specific unit and its senior departments. For example, knowledge about wireless is maintained by the wireless unit and its senior departments, such as the electrical apparatus department. It would not be checked by irrelevant units such as the hydraulic pressure unit.

The knowledge maintenance organization model has no direct connection with the formal organizational model, but is closely correlated with the knowledge maintenance process. For example, the manager who is at the top level of the formal organizational model may not be the most professional. However, the hierarchy of the knowledge maintenance organizational model reflects the knowledge level: the higher the level, the more professional the people.

Figure 6 shows a knowledge maintenance organization model. Jack, Shirley, Bob and Bill are in the Ground Equipment knowledge area and Bill is more professional. Correspondingly, George, Marie, Tom and Nancy are in the Undercarriage knowledge area and Marie is more professional. Eileen is specialized in the Equipment knowledge area which includes the Ground Equipment knowledge area and Undercarriage knowledge area. Pete is the chief designer and he is most professional.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9287-4/MediaObjects/10796_2010_9287_Fig6_HTML.gif
Fig. 6

The example of the knowledge maintenance organization model

In order to obtain the actual knowledge maintenance organization model, the knowledge maintenance organization model mining algorithm shown in Table 5 is proposed.
Table 5

Knowledge maintenance organization model mining algorithm

The organization model can be modeled as a graph G = (V, E), where V is the set of performers and E is the set of relations between the performers. Initially, E = ∅ and V = ∅. Let L be the set of logs, πp(l) be the set of performers in l, P(v1, vn) = (v1, v2, …, vn) be the path from v1 to vn, and length(P(v1, vn)) be the length of path P, which is the number of performers in P.

1. For each l in L

The last sub sequence that starts with task Start is assigned to p’. After l is processed, l is set to p’.

\( \forall {P_i},{P_j} \in {\pi_p}(l),V = V \cup \{ {P_i}\} \cup \{ {P_j}\} \)

If there exists a direct handover Pi → Pj in l

If there exist paths from Pi to Pj in E

If P (Pi,Pj) in l is longer than all the paths from Pi to Pj in E

Use the direct handover Pi → Pj in l to replace the paths from Pi to Pj in E.

Else add the edge (Pi, Pj) to E.

End For

2. For each u, v in V

If output(u) = output(v) and input(u) = input(v)

u and v are clustered into one unit.

End For

Return G = (V, E).

In the following, we explain the algorithm. As we focus on the relations among the performers, we retain only the performers and their order in the distilled log P shown in Table 6.
Table 6

Distilled log P

| Id | Performers |
|---|---|
| 1 | Jack, Bill, Eileen, Mona, Robert, Pete |
| 2 | Shirley, Bill, Eileen, Mona, Robert, Pete |
| 3 | Shirley, Bill, George, Mona, Fred, Pete |
| 4 | Jack, Bill, George, Mona, Robert, Pete |
| 5 | Bob, Bill, George, Mona, Fred, Pete |
| 6 | Jack, Bill, George, Mona, Fred, Pete |
| 7 | Tom, Marie, Mona, Eileen, Robert, Pete |
| 8 | Nancy, Marie, Mona, George, Robert, Pete |
| 9 | Tom, Marie, Mona, Eileen, Fred, Pete |
| 10 | Nancy, Marie, Mona, George, Fred, Pete |

The first step defines knowledge relations between two performers with respect to the log. The rationale is that j is i’s superior if there is a handover of knowledge from i to j; that is, j has more knowledge than i. The longer the path is, the more knowledge relations among performers can be derived. To get the complete hierarchical relations, the longest path between two nodes is retained in the graph. Figure 7 shows the network generated from log P after step 1.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9287-4/MediaObjects/10796_2010_9287_Fig7_HTML.gif
Fig. 7

Generated network

The second step clusters the performers who are in the same knowledge area and at the same knowledge level. Performers that have both the same output nodes and the same input nodes are considered to be in the same knowledge area and at the same knowledge level. The input nodes indicate how the inferior users evaluate these performers: if the performers are deemed no different, the inferior users will hand the knowledge over to any of them. The output nodes reflect the performers' evaluation of themselves. Each output node is in a specific knowledge area, so if two performers hand the document to the same output nodes, they are probably in the same knowledge area.
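The clustering rule of the second step can be illustrated on the distilled log P. The following Python sketch is deliberately simplified and should not be read as the full algorithm of Table 5: it uses only the direct handovers between consecutive performers as edges and omits the longest-path replacement rule of step 1. The function names are hypothetical.

```python
from collections import defaultdict

# Distilled log P from Table 6.
DISTILLED_LOG = [row.split() for row in [
    "Jack Bill Eileen Mona Robert Pete",
    "Shirley Bill Eileen Mona Robert Pete",
    "Shirley Bill George Mona Fred Pete",
    "Jack Bill George Mona Robert Pete",
    "Bob Bill George Mona Fred Pete",
    "Jack Bill George Mona Fred Pete",
    "Tom Marie Mona Eileen Robert Pete",
    "Nancy Marie Mona George Robert Pete",
    "Tom Marie Mona Eileen Fred Pete",
    "Nancy Marie Mona George Fred Pete",
]]


def handover_edges(log):
    """Direct handovers only (simplification: no longest-path replacement)."""
    return {(a, b) for seq in log for a, b in zip(seq, seq[1:])}


def cluster_performers(edges):
    """Group performers whose input sets and output sets are both identical."""
    inputs, outputs = defaultdict(set), defaultdict(set)
    for a, b in edges:
        outputs[a].add(b)
        inputs[b].add(a)
    groups = defaultdict(set)
    for performer in set(inputs) | set(outputs):
        key = (frozenset(inputs[performer]), frozenset(outputs[performer]))
        groups[key].add(performer)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


clusters = cluster_performers(handover_edges(DISTILLED_LOG))
for cluster in clusters:
    print(cluster)
```

Even with this simplification, Eileen and George fall into the same cluster because they receive knowledge from the same performers and hand it over to the same performers, while Bill, Marie and Mona each form a cluster of their own.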

Finally, the knowledge maintenance organization model is derived, as shown in Fig. 8.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9287-4/MediaObjects/10796_2010_9287_Fig8_HTML.gif
Fig. 8

Mined Knowledge maintenance organization model of log P

In comparison to the predefined knowledge maintenance organization model as shown in Fig. 6, Mona is added in the mined model. Eileen and Mona are both specialized in the Equipment knowledge area which includes the Ground Equipment knowledge area and Undercarriage knowledge area. Mona is more specialized in the secrecy.

The other difference is the level of George. In Fig. 6, George is in the same knowledge area and at the same level as Tom and Nancy. However, in the mined knowledge maintenance organization model shown in Fig. 8, he is at the same level as Eileen and is involved in the other two knowledge areas. This indicates that he is qualified to perform, and has performed, the same kind of knowledge maintenance tasks as the department leader Eileen, although he is an ordinary designer in the predefined organization model. Task delegation from Eileen to George probably occurred in the Equipment Department.

Given these findings, George is the most qualified person to replace Eileen. George can also be chosen as a candidate for the censoring task in the optimized knowledge maintenance organization model. Moreover, Bill, Marie and Mona are key to the knowledge maintenance process because no other person can replace their work. In the optimized knowledge maintenance organization model, more people should be involved in the tasks performed by Bill, Marie and Mona, so that the absence of any single performer will not halt the knowledge maintenance process.

4 Experiment and evaluation

This section presents the evaluation of the proposed knowledge maintenance process mining methods, which have been implemented in the knowledge management system of an aviation design institute. The main focus of the evaluation is to determine whether the derived models fit the instances, that is, how well the mined model reflects the actual knowledge maintenance process.

4.1 Evaluation metrics

To evaluate the proposed mining methods, we use recall and precision, two measures widely used to evaluate the performance of process mining (Hwang and Yang 2002).

The precision of an algorithm X is the ratio of the number of correct transitions returned by X to the total number of transitions returned by X.

The recall of an algorithm X is the ratio of the number of correct transitions returned by X to the total number of correct transitions.
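The two definitions can be illustrated with a minimal sketch. It assumes a transition is represented as a (source task, target task) pair and that the set of correct transitions is known. The function name `precision_recall` and the task names are hypothetical, and the example transitions are invented for illustration rather than taken from the experiment.

```python
def precision_recall(mined, reference):
    """Return (precision, recall) for a set of mined transitions.

    mined:     transitions returned by the algorithm
    reference: the correct transitions
    Each transition is a (source_task, target_task) pair.
    """
    mined, reference = set(mined), set(reference)
    correct = mined & reference  # correct transitions returned by the algorithm
    precision = len(correct) / len(mined) if mined else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical transitions, invented for illustration only.
reference = {
    ("submit", "review"),
    ("review", "censor"),
    ("censor", "publish"),
    ("review", "revise"),
    ("revise", "review"),
}
mined = {
    ("submit", "review"),
    ("review", "censor"),
    ("censor", "publish"),
    ("submit", "publish"),  # not a correct transition
}

p, r = precision_recall(mined, reference)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Here three of the four mined transitions are correct, so precision is 3/4, while only three of the five correct transitions are recovered, so recall is 3/5.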

4.2 Experiment result

We designed experiments in which the proposed methods are used to analyze the knowledge maintenance logs. The data set contains 6000 instances collected from the knowledge management system. These logs cover 9 repositories and 2 kinds of processes. In the experiments, we considered only the latest successful execution sub process. The precision and recall of control flow mining are shown in Table 7 and Fig. 9, and those of organization mining in Table 8 and Fig. 10. When 40% of the instances are used for training in control flow mining, precision and recall are 89.9% and 82.1% respectively. When 60% of the instances are used for training in organization mining, precision and recall are 85.9% and 80.2% respectively.
Table 7

Detailed experiment result of control flow mining

Log size   10%    20%    30%    40%    50%    60%    70%    80%    90%    100%
Precision  0.301  0.589  0.831  0.899  0.913  0.924  0.938  0.948  0.959  0.976
Recall     0.232  0.401  0.659  0.821  0.839  0.876  0.893  0.917  0.918  0.931

Table 8

Detailed experiment result of organization mining

Log size   10%    20%    30%    40%    50%    60%    70%    80%    90%    100%
Precision  0.241  0.449  0.731  0.779  0.839  0.859  0.882  0.903  0.919  0.939
Recall     0.189  0.302  0.619  0.699  0.743  0.802  0.839  0.874  0.883  0.891

https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9287-4/MediaObjects/10796_2010_9287_Fig9_HTML.gif
Fig. 9

Experiment result of control flow mining

https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9287-4/MediaObjects/10796_2010_9287_Fig10_HTML.gif
Fig. 10

Experiment result of organization mining

In both figures, recall and precision improve as the number of instances increases, especially when relatively few instances are used. At the beginning, the training set is small, most nodes are treated as useless and few of the corresponding transitions are derived, so both metrics are poor. As the training set grows, more nodes are preserved and more transitions are derived, so the test results improve. Once almost all the nodes and transitions have been derived, the performance no longer increases dramatically as the training set grows. Moreover, the results of control flow mining are better than those of organization mining. The control flow model depends only on the knowledge repository and the maintenance process. In the organization model, however, the performers that carry out the same task may be assigned to more than one cluster because of the kinds of knowledge involved. The derived organization model therefore includes more nodes and more transitions, and the performance of organization mining is poorer than that of control flow mining.

Overall, however, these values indicate good performance of the algorithms. Both mined models fit the real executions and reflect the real knowledge maintenance processes.

5 Conclusion and Future Work

In this paper, we have focused on the applicability of process mining to knowledge maintenance. We explained a process mining based approach to knowledge maintenance from two perspectives: (1) the control flow perspective, in which the additional tasks and the inefficient tasks are found and the dependencies among the tasks are deduced using the proposed knowledge maintenance control flow mining algorithm; and (2) the organizational perspective, in which the performers are clustered according to the handover of the knowledge to obtain the knowledge maintenance organization model. Experimental results have shown the usefulness and effectiveness of the proposed methods.

As future work, data mining will be used to find how related factors, such as the kind of process, the unit and the secrecy of the knowledge, influence the performance of the knowledge maintenance processes. In addition, social network analysis will be used to analyze the relations between the performers more comprehensively.

Acknowledgement

The research is supported by the National Natural Science Foundation of China under Grant No. 70671007, 70871006, 70901004, the PhD Program Foundation of Education Ministry of China under Contract No. 200800060005 and Research Funds Provided to New Recruitments of China University of Petroleum-Beijing (QD-2010-06).

Copyright information

© Springer Science+Business Media, LLC 2010