Workflow Clustering Method Based on Process Similarity

  • Jae-Yoon Jung
  • Joonsoo Bae
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3981)


Process-centric information systems have accumulated a large number of process models. Process designers continue to create new models and need tools that analyze them from various viewpoints. This paper proposes a novel approach to process analysis: workflow clustering, which makes it easier to analyze accumulated workflow process models and classify them into characteristic groups. The framework consists of two phases: domain classification and pattern analysis. Domain classification uses an activity similarity measure, while pattern analysis uses a transition similarity measure. Process models are represented as weighted complete dependency graphs, and the similarities among their graph vectors are estimated, taking into account the relative frequency of each activity and transition. Finally, the models are clustered by a hierarchical clustering algorithm based on these similarities. We implemented the methodology and ran experiments on sets of synthetic processes. Workflow clustering can be applied to various process analyses, such as workflow recommendation, workflow mining, and process pattern analysis.
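The two similarity measures and the clustering step can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each process model is given as a list of (source, target) transitions, uses unweighted (binary) activity and transition vectors rather than the relative-frequency weights mentioned above, and picks average-linkage agglomerative clustering with a similarity threshold as one possible hierarchical algorithm. All function names and the sample processes are hypothetical.

```python
from math import sqrt

def activity_vector(transitions):
    # Binary vector over activities: 1.0 for every activity that appears.
    # (Assumption: the paper's frequency-based weights are not reproduced.)
    return {a: 1.0 for edge in transitions for a in edge}

def transition_vector(transitions):
    # Binary vector over directed transitions (source, target).
    return {edge: 1.0 for edge in transitions}

def cosine(u, v):
    # Cosine measure between two sparse vectors stored as dicts.
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def cluster(vectors, threshold):
    # Average-linkage agglomerative clustering: repeatedly merge the two
    # most similar clusters until no pair reaches the threshold.
    clusters = [[name] for name in vectors]

    def linkage(a, b):
        total = sum(cosine(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters

# Hypothetical sample processes, each a list of transitions.
processes = {
    "order_a": [("receive", "check"), ("check", "ship"), ("ship", "bill")],
    "order_b": [("receive", "check"), ("check", "bill"), ("bill", "ship")],
    "hire_a": [("post", "interview"), ("interview", "offer")],
    "hire_b": [("post", "screen"), ("screen", "interview"), ("interview", "offer")],
}

# Phase 1 (domain classification): cluster on activity similarity.
activity_vectors = {n: activity_vector(t) for n, t in processes.items()}
domains = cluster(activity_vectors, threshold=0.5)

# Phase 2 (pattern analysis): compare transitions within one domain.
transition_vectors = {n: transition_vector(t) for n, t in processes.items()}
order_pattern_similarity = cosine(
    transition_vectors["order_a"], transition_vectors["order_b"]
)
```

In this sketch the two order processes share all activities, so they fall into the same domain, yet they share only one of three transitions, so their pattern similarity is much lower. That difference is what separating the two phases is meant to expose.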


Activity Birth Rate; Business Process Management; Transition Vector; Activity Vector; Cosine Measure
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jae-Yoon Jung (1)
  • Joonsoo Bae (2)
  1. Dept. of Technology Management, Eindhoven University of Technology, Eindhoven, The Netherlands
  2. Dept. of Industrial and Information Systems Engineering, Chonbuk National University, Jeonju, Chonbuk, Republic of Korea
