Statistical Methods of Natural Language Processing on GPU

Banasiak, Dariusz

doi:10.1007/978-3-319-23437-3_51

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 391)

Abstract

This work investigates the use of GPGPU technology for natural language processing. Natural language processing involves analysing very large volumes of data with sophisticated algorithms, which in practice demands considerable computing power. Parallel computing that exploits the processing capacity of graphics cards can help meet this demand. The work addresses the problem of building n-gram models of natural language from a given text. Two algorithms were developed: a sequential one for a typical CPU and a parallel one that uses the capacity of a GPU. The GPU algorithm was implemented using Nvidia CUDA technology. Experiments compared the effectiveness of the two algorithms as a function of the size of the analysed text and the number of words in the n-grams. The results showed that the parallel algorithm is better suited to a GPU environment.
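
To make the n-gram building task concrete, the following is a minimal sequential sketch in Python. It is not the chapter's CPU or GPU algorithm, which the abstract does not describe in detail. The function names `count_ngrams` and `relative_frequencies`, the lower-casing, and the whitespace tokenisation are all assumptions made for illustration.

```python
from collections import Counter


def count_ngrams(text: str, n: int) -> Counter:
    """Count word n-grams in `text` with one sequential pass (illustrative only)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: naive lower-casing and whitespace tokenisation.
    words = text.lower().split()
    # Slide a window of n words over the token list; each window is one n-gram.
    # If the text has fewer than n words, the range is empty and so is the result.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def relative_frequencies(counts: Counter) -> dict:
    """Turn raw n-gram counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.items()}


sample = "the cat sat on the mat the cat ran"
bigrams = count_ngrams(sample, 2)
bigram_freqs = relative_frequencies(bigrams)
```

Each window position is processed independently of the others, which is the general property that makes this kind of counting a candidate for parallel execution. How the chapter actually distributes the work across GPU threads is not stated in the abstract.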

Acknowledgments

This work was financed by the Ministry of Science and Higher Education in Poland (research project no. N N516 499139).

Author information

Authors and Affiliations

Department of Computer Engineering, Wroclaw University of Technology, Wroclaw, Poland
Dariusz Banasiak

Corresponding author

Correspondence to Dariusz Banasiak.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Aleksandra Gruca, Agnieszka Brachman, Stanisław Kozielski, Tadeusz Czachórski

About this paper

Cite this paper

Banasiak, D. (2016). Statistical Methods of Natural Language Processing on GPU. In: Gruca, A., Brachman, A., Kozielski, S., Czachórski, T. (eds) Man–Machine Interactions 4. Advances in Intelligent Systems and Computing, vol 391. Springer, Cham. https://doi.org/10.1007/978-3-319-23437-3_51

DOI: https://doi.org/10.1007/978-3-319-23437-3_51
Published: 09 September 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23436-6
Online ISBN: 978-3-319-23437-3
eBook Packages: Engineering, Engineering (R0)
