Healthcare Trajectory Mining by Combining Multidimensional Component and Itemsets
Sequential pattern mining aims to extract correlations among temporal data. Many methods have been proposed to enumerate either sequences of set-valued data (i.e., itemsets) or sequences containing multidimensional items. However, in real-world scenarios, data sequences are described as events that combine multidimensional items with set-valued information. Traditional approaches cannot exploit these rich, heterogeneous descriptions. For example, in the healthcare domain, hospitalizations are defined as sequences of multidimensional attributes (e.g., Hospital or Diagnosis) associated with two sets: a set of medical procedures (e.g., \(\lbrace\)Radiography, Appendectomy\(\rbrace\)) and a set of medical drugs (e.g., \(\lbrace\)Aspirin, Paracetamol\(\rbrace\)). In this paper we propose a new approach called MMISP (Mining Multidimensional Itemset Sequential Patterns) to extract patterns from complex sequences including both multidimensional items and itemsets. The novelties of the proposal lie in: (i) the way in which the data can be efficiently compressed; (ii) the ability to reuse and adopt existing sequential pattern mining algorithms; and (iii) the extraction of a new kind of pattern. As a case study, we run experiments on real data aggregated from a regional healthcare system and point out the usefulness of the extracted patterns. Additional experiments on synthetic data highlight the efficiency and scalability of our approach.
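To make the data model described above concrete, the following is a minimal Python sketch of sequences whose events carry both multidimensional attributes and itemsets, together with a naive support count. It is not the MMISP algorithm: the `Event` type, the helper functions, the use of `None` as an "any value" wildcard, and the hospital and diagnosis values are all assumptions introduced here for illustration.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """One hospitalization (hypothetical layout, not taken from the paper)."""
    dims: tuple      # e.g. (hospital, diagnosis); None means "any value" in a pattern
    itemsets: tuple  # e.g. (procedures, drugs), each a frozenset


def event_matches(pattern: Event, event: Event) -> bool:
    # Dimensions must agree unless the pattern leaves them open (None);
    # each pattern itemset must be a subset of the event's itemset.
    dims_ok = all(p is None or p == d for p, d in zip(pattern.dims, event.dims))
    sets_ok = all(p <= s for p, s in zip(pattern.itemsets, event.itemsets))
    return dims_ok and sets_ok


def contains(sequence, pattern) -> bool:
    # Greedy in-order matching: each pattern event consumes the sequence
    # up to and including the first event that matches it.
    remaining = iter(sequence)
    return all(any(event_matches(p, e) for e in remaining) for p in pattern)


def support(database, pattern) -> int:
    # Number of sequences (patients) that contain the pattern.
    return sum(contains(seq, pattern) for seq in database)


# Illustrative data: hospital names and diagnoses are made up.
patient_1 = [
    Event(("Hospital A", "Appendicitis"),
          (frozenset({"Radiography", "Appendectomy"}),
           frozenset({"Aspirin", "Paracetamol"}))),
    Event(("Hospital B", "Check-up"),
          (frozenset({"Radiography"}), frozenset({"Paracetamol"}))),
]
patient_2 = [
    Event(("Hospital A", "Appendicitis"),
          (frozenset({"Appendectomy"}), frozenset({"Aspirin"}))),
]
database = [patient_1, patient_2]

# "An appendicitis stay with an appendectomy, later followed by a stay
# with a radiography and paracetamol, in any hospital."
pattern = [
    Event((None, "Appendicitis"), (frozenset({"Appendectomy"}), frozenset())),
    Event((None, None), (frozenset({"Radiography"}), frozenset({"Paracetamol"}))),
]

print(support(database, pattern))
```

Here only `patient_1` has two stays in the required order, so the two-event pattern is supported by one sequence, while its first event alone is supported by both. A real miner would enumerate candidate patterns and prune by a minimum support threshold rather than test a single hand-written pattern.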
Keywords: Sequential Patterns, Multi-dimensional Sequential Patterns, Data Mining