Abstract
Parallel and high performance computing has been attracting increasing attention in recent years as a means to accelerate many kinds of computationally expensive applications. This chapter reviews research works and publicly available tools that accelerate microarray data analysis by exploiting high performance computing systems.
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
González-Domínguez, J., Expósito, R.R. (2019). HPC Tools to Deal with Microarray Data. In: Bolón-Canedo, V., Alonso-Betanzos, A. (eds) Microarray Bioinformatics. Methods in Molecular Biology, vol 1986. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9442-7_10
DOI: https://doi.org/10.1007/978-1-4939-9442-7_10
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-9441-0
Online ISBN: 978-1-4939-9442-7
eBook Packages: Springer Protocols