Design and Implementation of Parallel Modified PrefixSpan Method

  • Toshihide Sutou
  • Keiichi Tamura
  • Yasuma Mori
  • Hajime Kitakami
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2858)


This paper proposes a parallelization of the Modified PrefixSpan method, which extracts frequent patterns from a sequence database. The system developed by the authors runs on multiple computers connected by a local area network. It provides a dynamic load balancing mechanism, implemented through communication among the computers using sockets and an MPI library. It also uses multiple threads to handle communication between a master process and multiple slave processes. The master process controls both the global job pool, which manages the set of subtrees generated in the initial processing, and the multiple slave processes. In trial implementation experiments, 8 computers were approximately 6 times faster than 1 computer.
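The job-pool idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain prefix-projected pattern growth over strings rather than the Modified PrefixSpan method, and it stands in Python threads sharing a `queue.Queue` for the slave processes, sockets, and MPI communication. All function names (`project`, `frequent_items`, `mine_subtree`, `worker`, `parallel_mine`) are hypothetical. The initial step creates one subtree per frequent item and places it in the pool; each worker then pulls the next subtree whenever it becomes idle, which is the essence of dynamic load balancing.

```python
import queue
import threading
from collections import Counter


def project(database, item):
    """Keep the suffix after the first occurrence of `item` in each sequence."""
    projected = []
    for seq in database:
        pos = seq.find(item)
        if pos != -1:
            projected.append(seq[pos + 1:])
    return projected


def frequent_items(database, min_support):
    """Items that occur in at least `min_support` sequences."""
    counts = Counter()
    for seq in database:
        counts.update(set(seq))  # count each item once per sequence
    return sorted(item for item, n in counts.items() if n >= min_support)


def mine_subtree(prefix, database, min_support, results):
    """Depth-first pattern growth below `prefix` on its projected database."""
    for item in frequent_items(database, min_support):
        pattern = prefix + item
        projected = project(database, item)
        results[pattern] = len(projected)  # support of the grown pattern
        mine_subtree(pattern, projected, min_support, results)


def worker(job_pool, min_support, results, lock):
    """Stand-in for a slave: pull subtrees from the pool until it is empty."""
    while True:
        try:
            prefix, projected = job_pool.get_nowait()
        except queue.Empty:
            return
        local = {}
        mine_subtree(prefix, projected, min_support, local)
        with lock:
            results.update(local)


def parallel_mine(database, min_support, n_workers=4):
    """Stand-in for the master: build the job pool, then start the workers."""
    results = {}
    lock = threading.Lock()
    job_pool = queue.Queue()
    # Initial processing: one subtree (job) per frequent single item.
    for item in frequent_items(database, min_support):
        projected = project(database, item)
        results[item] = len(projected)
        job_pool.put((item, projected))
    threads = [
        threading.Thread(target=worker,
                         args=(job_pool, min_support, results, lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `parallel_mine(["abc", "abd", "acb"], 2)` returns each frequent subsequence with its support count. Because of Python's global interpreter lock, threads here only illustrate the scheduling pattern; they do not reproduce the speedup reported for separate computers.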


Execution Time, Frequent Pattern, Performance Ratio, Support Ratio, Motif Discovery
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Toshihide Sutou (1)
  • Keiichi Tamura (1)
  • Yasuma Mori (1)
  • Hajime Kitakami (1)
  1. Graduate School of Information Sciences, Hiroshima City University, Hiroshima, Japan
