Design and Implementation of Parallel Modified PrefixSpan Method
This paper proposes a parallelization of the Modified PrefixSpan method, which extracts frequent patterns from a sequence database. The system developed by the authors runs on multiple computers connected over a local area network. It provides dynamic load balancing, implemented through communication among the computers using sockets and an MPI library, and it uses multiple threads for communication between a master process and multiple slave processes. The master process controls both the slave processes and the global job pool, which manages the set of subtrees generated in the initial processing. In trial implementation experiments, 8 computers were approximately 6 times faster than 1 computer.
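The job-pool idea described above can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: it assumes plain PrefixSpan over sequences of single-character items rather than the Modified PrefixSpan method, and it substitutes Python threads and a `queue.Queue` for the slave processes, sockets, and MPI communication. All function names (`project`, `frequent_items`, `mine_subtree`, `parallel_mine`) are hypothetical.

```python
import queue
import threading
from collections import Counter


def project(db, item):
    """Keep, for each sequence containing item, the suffix after its first occurrence."""
    return [seq[seq.index(item) + 1:] for seq in db if item in seq]


def frequent_items(db, min_support):
    """Count each item once per sequence and keep those meeting min_support."""
    counts = Counter()
    for seq in db:
        counts.update(set(seq))
    return sorted(i for i, c in counts.items() if c >= min_support)


def mine_subtree(prefix, db, min_support, results):
    """Depth-first pattern growth over the subtree rooted at prefix.

    db is the projected database for prefix, so len(db) is its support.
    """
    results.append((prefix, len(db)))
    for item in frequent_items(db, min_support):
        mine_subtree(prefix + item, project(db, item), min_support, results)


def parallel_mine(db, min_support, n_workers=4):
    # Initial processing: one job per first-level subtree goes into the pool.
    jobs = queue.Queue()
    for item in frequent_items(db, min_support):
        jobs.put((item, project(db, item)))

    results = []
    lock = threading.Lock()

    def worker():
        # Each worker pulls the next subtree as soon as it is idle, so
        # uneven subtree sizes are balanced dynamically.
        while True:
            try:
                prefix, projected = jobs.get_nowait()
            except queue.Empty:
                return  # pool drained, worker stops
            local = []
            mine_subtree(prefix, projected, min_support, local)
            with lock:
                results.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    print(parallel_mine(["abc", "abd", "acb"], min_support=2))
```

Because workers take jobs on demand instead of receiving a fixed share up front, a worker that finishes a small subtree immediately picks up another one. This is the property the global job pool is meant to provide. In CPython, threads would not speed up this CPU-bound work, so the sketch shows the scheduling structure only.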
Keywords: Execution Time, Frequent Pattern, Performance Ratio, Support Ratio, Motif Discovery