Parallel Partitioning and Mining Gene Expression Data with Butterfly Network

  • Conference paper
Database and Expert Systems Applications (DEXA 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8055)

Abstract

In massive gene expression analysis, Order-Preserving Sub-Matrices (OPSMs) have been employed to find biological associations between genes and experimental conditions in large numbers of gene expression datasets. Although many techniques have been developed, few of them are parallel, and they either cannot handle large-scale datasets or are very time-consuming. To help fill this critical void, we propose a Butterfly Network based parallel partitioning and mining method (BNPP), which formalizes the communication and data transfer among nodes. In this paper, we first describe OPSM in detail, along with its implementations on MapReduce and Hama BSP and their shortcomings. Then, we extend the Hama BSP framework with a Butterfly Network to reduce the communication time, the bandwidth workload, and the percentage of duplicate results, and call the new framework BNHB. Finally, we implement a state-of-the-art OPSM mining method (OPSM) and our BNPP method on top of both naïve Hama BSP and our BNHB. The experimental results show that our methods are nearly one order of magnitude faster than the implementation on a single machine, and that the proposed framework has better effectiveness and scalability.
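
The abstract does not spell out how BNPP or BNHB route messages, so the following is only a minimal sketch of the generic butterfly exchange pattern, not the authors' implementation. It assumes a power-of-two number of nodes, models each node's partial mining results as a set of strings, and treats each stage as one BSP superstep; the names `butterfly_partner` and `butterfly_all_gather` are hypothetical.

```python
from typing import List, Set


def butterfly_partner(node: int, stage: int) -> int:
    """Partner of `node` at `stage`: flip bit `stage` of the node id."""
    return node ^ (1 << stage)


def butterfly_all_gather(local_results: List[Set[str]]) -> List[Set[str]]:
    """Simulate a butterfly exchange among n = 2**k nodes (assumed layout).

    In each of the k stages every node swaps its accumulated set with one
    partner and keeps the union, so duplicate results are merged on the way.
    """
    n = len(local_results)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two node count")
    stages = n.bit_length() - 1  # log2(n) for a power of two
    state = [set(s) for s in local_results]
    for stage in range(stages):
        # The snapshot models a superstep: every message sent in this stage
        # carries the state the sender held before the barrier.
        snapshot = [set(s) for s in state]
        for node in range(n):
            state[node] |= snapshot[butterfly_partner(node, stage)]
    return state


if __name__ == "__main__":
    merged = butterfly_all_gather([{"a"}, {"b"}, {"a", "c"}, {"d"}])
    print(sorted(merged[0]))
```

In this pattern each node sends one message per stage, so n nodes exchange n·log2(n) messages in total, compared with n·(n−1) for a direct all-to-all exchange. That reduction is one plausible reading of the communication savings the abstract claims.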

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, T., Li, Z., Chen, Q., Wang, Z., Pan, W., Wang, Z. (2013). Parallel Partitioning and Mining Gene Expression Data with Butterfly Network. In: Decker, H., Lhotská, L., Link, S., Basl, J., Tjoa, A.M. (eds) Database and Expert Systems Applications. DEXA 2013. Lecture Notes in Computer Science, vol 8055. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40285-2_13

  • DOI: https://doi.org/10.1007/978-3-642-40285-2_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40284-5

  • Online ISBN: 978-3-642-40285-2

  • eBook Packages: Computer Science, Computer Science (R0)
