
Machine Learning Algorithm Acceleration Using Hybrid (CPU-MPP) MapReduce Clusters

Chapter in Large-Scale Data Analytics

Abstract

The uninterrupted growth of information repositories has progressively pushed data-intensive applications, such as MapReduce-based systems, into the mainstream. The MapReduce paradigm has repeatedly proven to be a simple yet flexible and scalable technique for distributing algorithms across thousands of nodes and petabytes of information. Under these circumstances, classic data mining algorithms have been adapted to this model so that they can run in production environments. Unfortunately, the high-latency nature of this architecture has relegated these algorithms to batch-processing scenarios. Despite this shortcoming, the emergence of massively threaded shared-memory multiprocessors, such as Graphics Processing Units (GPUs), on the commodity computing market has enabled these algorithms to execute orders of magnitude faster while keeping the same MapReduce-based model. In this chapter, we propose integrating massively threaded shared-memory multiprocessors into MapReduce-based clusters, creating a unified heterogeneous architecture that executes Map and Reduce operators on thousands of threads across multiple GPU devices and nodes while maintaining the built-in reliability of the baseline system. For this purpose, we created a programming model that lets multiple CPU cores and multiple GPU devices collaborate on the resolution of a data-intensive problem. To demonstrate the potential of this hybrid system, we take a popular NP-hard supervised learning algorithm, the Support Vector Machine (SVM), and show that a 36×–192× speedup can be achieved on large datasets without changing the model or leaving the commodity hardware paradigm.
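The Map/Reduce decomposition of SVM training that the abstract alludes to can be sketched as follows. This is a minimal illustration, not the chapter's implementation: it phrases each iteration of a Sequential Minimal Optimization (SMO) solver as a "reduce" that selects the maximally KKT-violating pair of points, followed by a data-parallel "map" that updates every point's error term (the step a GPU would spread across thousands of threads). The function name, toy dataset, and linear-kernel choice are illustrative assumptions.

```python
import numpy as np

def smo_train(X, y, C=10.0, tol=1e-3, max_iter=500):
    """Train a linear SVM with a minimal SMO solver whose inner loop is
    phrased as MapReduce: a 'reduce' picks the worst KKT violators, a
    'map' broadcasts their update to every point's error term."""
    n = len(y)
    alpha = np.zeros(n)
    K = X @ X.T                      # linear kernel matrix
    f = -y.astype(float)             # error term f_i = f(x_i) - y_i at alpha = 0
    i = j = 0
    for _ in range(max_iter):
        # "Reduce": select the most-violating pair over all points.
        up = ((alpha < C) & (y == 1)) | ((alpha > 0) & (y == -1))
        lo = ((alpha < C) & (y == -1)) | ((alpha > 0) & (y == 1))
        i = np.arange(n)[up][np.argmin(f[up])]
        j = np.arange(n)[lo][np.argmax(f[lo])]
        if f[j] - f[i] < 2 * tol:    # KKT conditions met to tolerance
            break
        # Analytic two-variable solve on (alpha_i, alpha_j): Platt's SMO step.
        if y[i] != y[j]:
            L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
        else:
            L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
        eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
        if eta <= 0 or L >= H:
            continue
        aj = np.clip(alpha[j] + y[j] * (f[i] - f[j]) / eta, L, H)
        ai = alpha[i] + y[i] * y[j] * (alpha[j] - aj)
        # "Map": every point updates its own error term in parallel.
        f += (ai - alpha[i]) * y[i] * K[i] + (aj - alpha[j]) * y[j] * K[j]
        alpha[i], alpha[j] = ai, aj
    w = (alpha * y) @ X              # explicit weights (linear kernel only)
    b = -(f[i] + f[j]) / 2.0         # bias estimated from the last working pair
    return w, b

# Toy linearly separable problem (illustrative data).
X = np.array([[2.0, 2.0], [1.0, 2.0], [-2.0, -2.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = smo_train(X, y)
```

The reduce step is a min/max scan, which maps naturally onto GPU parallel reductions, while the error-term update is embarrassingly parallel across points; this separation is what allows the same algorithm to scale both across GPU threads and across cluster nodes.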



Acknowledgements

This work was supported by the Basque Government Researcher Formation Fellowship BFI.08.80.

Author information

Correspondence to Sergio Herrero-Lopez.


Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Herrero-Lopez, S., Williams, J.R. (2014). Machine Learning Algorithm Acceleration Using Hybrid (CPU-MPP) MapReduce Clusters. In: Gkoulalas-Divanis, A., Labbi, A. (eds) Large-Scale Data Analytics. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-9242-9_5


  • DOI: https://doi.org/10.1007/978-1-4614-9242-9_5

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-9241-2

  • Online ISBN: 978-1-4614-9242-9

  • eBook Packages: Computer Science (R0)
