Parallel CYK Membership Test on GPUs

  • Kyoung-Hwan Kim
  • Sang-Min Choi
  • Hyein Lee
  • Ka Lok Man
  • Yo-Sub Han
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8707)


Nowadays, general-purpose computing on graphics processing units (GPGPU) performs computations that were formerly handled by the CPU, using hundreds of cores on GPUs. It often improves on the performance of sequential computation when the program is well-structured and formulated for massive threading. The CYK algorithm is a well-known algorithm for the context-free language membership test and has been used in many applications, including grammar inference, compilers, and natural language processing. We revisit the CYK algorithm and its structural properties suitable for parallelization. Based on the discovered properties, we then parallelize the algorithm on a GPU using different combinations of memory types and data allocation schemes. We evaluate the algorithm on real-world data and demonstrate the performance improvement over CPU-based computation.
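To make the structural property exploited in the paper concrete, the following is a minimal sequential sketch of the CYK membership test (not the authors' GPU code). The grammar is assumed to be in Chomsky normal form; the key observation for parallelization is that all cells of a given substring length depend only on shorter lengths, so the inner loop over start positions `i` can be computed concurrently.

```python
# Sequential CYK membership test: a minimal sketch, not the authors' implementation.
# The grammar must be in Chomsky normal form:
#   unary rules  A -> a    (terminal productions)
#   binary rules A -> B C  (nonterminal productions)

def cyk_member(word, unary, binary, start="S"):
    """Return True iff `word` is derivable from `start`.

    unary:  dict  terminal -> set of nonterminals A with A -> a
    binary: dict  (B, C)   -> set of nonterminals A with A -> B C
    """
    n = len(word)
    if n == 0:
        return False
    # table[i][l] = nonterminals deriving word[i : i + l + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(unary.get(ch, ()))
    for l in range(1, n):          # substring length minus one (outer, sequential)
        for i in range(n - l):     # start position: cells are independent -> parallelizable
            for k in range(l):     # split point
                for B in table[i][k]:
                    for C in table[i + k + 1][l - k - 1]:
                        table[i][l] |= binary.get((B, C), set())
    return start in table[0][n - 1]


# Example: a CNF grammar for { a^n b^n | n >= 1 } (grammar chosen for illustration).
#   S -> A X | A B,  X -> S B,  A -> a,  B -> b
unary = {"a": {"A"}, "b": {"B"}}
binary = {("A", "X"): {"S"}, ("A", "B"): {"S"}, ("S", "B"): {"X"}}
print(cyk_member("aabb", unary, binary))  # True
print(cyk_member("abab", unary, binary))  # False
```

A GPU version would assign one thread (or thread block) per cell `(i, l)` of a fixed length `l`, synchronizing between lengths; the choice of memory type for `table` and the rule dictionaries is exactly the design space the paper explores.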


Parallel Computing · Context-Free Language Membership Test · CYK Algorithm · GPU Programming · CUDA



Copyright information

© IFIP International Federation for Information Processing 2014

Authors and Affiliations

  • Kyoung-Hwan Kim (1)
  • Sang-Min Choi (1)
  • Hyein Lee (1)
  • Ka Lok Man (2)
  • Yo-Sub Han (1)
  1. Department of Computer Science, Yonsei University, Seoul, Republic of Korea
  2. Department of Computer Science and Software Engineering, Xi'an Jiaotong-Liverpool University, Suzhou, China
