A Framework for Discovering Important Patterns Through Parallel Mining of Protein–Protein Interaction Network

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 381)


Association rule mining can be applied in bioinformatics to identify co-occurrences between biological elements such as genes and proteins. The protein–protein interaction network provides useful information about the functions of proteins, and association analysis has been used to identify frequently occurring interactions among the proteins in the network in order to predict those functions. As the amount of data is increasing exponentially, a parallel implementation of association analysis for identifying co-occurrences between proteins in the protein–protein interaction network is expected to be more efficient, faster, and more scalable. In this paper, we propose an efficient framework for association analysis of frequently occurring patterns in the protein–protein interaction network. The algorithm has been parallelized using Hadoop. The performance of the parallel algorithm is shown graphically, and the results indicate that the parallel version is more effective than the sequential one.
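To make the idea of counting co-occurring proteins in a map and reduce style concrete, the following is a minimal sketch in plain Python rather than Hadoop. It is not the authors' algorithm. It assumes, for illustration only, that each protein's set of interaction partners is treated as one transaction and that a pair of proteins is "frequent" when it appears together in at least `min_support` transactions. The protein names, the `INTERACTIONS` list, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy interaction list; the protein names are made up.
INTERACTIONS = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P2", "P4"), ("P3", "P4")]


def build_transactions(edges):
    """Assumption: the neighbor set of each protein forms one transaction."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def map_phase(transaction):
    """Emit (protein pair, 1) for every pair that co-occurs in one transaction."""
    for pair in combinations(sorted(transaction), 2):
        yield pair, 1


def reduce_phase(emitted):
    """Sum the emitted counts per protein pair."""
    counts = Counter()
    for key, value in emitted:
        counts[key] += value
    return counts


def frequent_pairs(edges, min_support):
    """Return the pairs whose co-occurrence count reaches min_support."""
    transactions = build_transactions(edges)
    emitted = (kv for t in transactions.values() for kv in map_phase(t))
    counts = reduce_phase(emitted)
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    print(frequent_pairs(INTERACTIONS, min_support=2))
```

In a real Hadoop job the map calls would run on separate input splits and the framework would group the emitted keys before the reduce step. Here both phases run in a single process, only to show the data flow.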


Association rule mining, Hadoop MapReduce, Protein–protein interaction, Hyperclique pattern



Copyright information

© Springer India 2016

Authors and Affiliations

  1. Department of MCA, Techno India College of Technology, Kolkata, India
  2. Department of Computer Science and Engineering, University of Calcutta, Kolkata, India
