Abstract
The training phase of the Continuous Space Language Model (CSLM) was implemented in NVIDIA's Compute Unified Device Architecture (CUDA) hardware/software architecture. A detailed explanation of the CSLM algorithm is provided. The implementation combines CUBLAS library routines, NVIDIA Performance Primitives (NPP) functions, and CUDA kernel calls on three CUDA-enabled devices of varying compute capability, and a time savings over the traditional CPU approach is demonstrated. The efficiency of the CUDA version of the open-source implementation is analyzed and compared to that obtained using the Intel Math Kernel Library (MKL) on a variety of CUDA-enabled and multi-core CPU platforms. It is demonstrated that a substantial performance benefit can be obtained using CUDA, even with non-optimal code. Techniques for optimizing performance are then provided. Furthermore, an analysis is performed to determine the conditions under which the performance of CUDA exceeds that of the multi-core MKL realization.
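The CSLM (Schwenk 2007) is a feed-forward neural-network language model whose training cost is dominated by dense matrix products, which is why the work above maps those products onto CUBLAS SGEMM on the GPU (or MKL SGEMM on the CPU). As a rough illustration only, the sketch below shows the shape of one forward pass (hidden tanh layer followed by a softmax over the output vocabulary) in plain Python; the function names and sizes are illustrative assumptions, not code from the paper, and `matmul` stands in for the SGEMM call that the GPU accelerates.

```python
import math

def matmul(A, B):
    """Naive dense product; in the GPU implementation this is the
    CUBLAS SGEMM call that dominates training time."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def tanh_layer(X, W):
    """Hidden layer: H = tanh(X @ W), applied element-wise."""
    return [[math.tanh(v) for v in row] for row in matmul(X, W)]

def softmax(row):
    """Output-layer normalization over the vocabulary,
    shifted by the max for numerical stability."""
    mx = max(row)
    exps = [math.exp(v - mx) for v in row]
    s = sum(exps)
    return [e / s for e in exps]
```

In the batched training described in the paper, many n-gram contexts are processed at once, so `X` has one row per training example and the products become large enough to saturate the GPU.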
Acknowledgements
Many thanks to Mike Pressler, Manager of Electronics and Computer Support Services at IPFW, for his outstanding technical support.
Thompson, E.A., Anderson, T.R. A CUDA implementation of the Continuous Space Language Model. J Supercomput 68, 65–86 (2014). https://doi.org/10.1007/s11227-013-1023-7