The Journal of Supercomputing, Volume 71, Issue 6, pp 2050–2065

O2WebCL: an automatic OpenCL-to-WebCL translator for high performance web computing

Article

Abstract

HTML5 has become very attractive for cross-platform applications on the software side. Likewise, on the hardware side, GPUs have grown in popularity because of the energy efficiency of their parallel execution. JavaScript, which performs the dynamic operations of HTML5, is natively slow. To resolve this performance problem, web computing language (WebCL) can be utilized. WebCL operates by adapting open computing language (OpenCL) codes for web execution. Programming WebCL code can be quite challenging, however, for both OpenCL and web programmers. In this paper, we propose an OpenCL-to-WebCL translator infrastructure, called O2WebCL. O2WebCL consists of a fully automated OpenCL-to-WebCL translator and the O2WebCL library. The O2WebCL translator converts OpenCL codes into WebCL codes and O2WebCL application programming interfaces. The O2WebCL library operates as a bridge between the OpenCL and WebCL libraries. We resolved several implementation issues for the bridge, such as type conversion and indirect addressing. We evaluated the performance of our work and found that we could achieve, on average, 75 % of the performance of the equivalent OpenCL execution.

Keywords

OpenCL, WebCL, JavaScript, OpenCL to WebCL translation

1 Introduction

Being able to use an application on any device and at any place is highly sought after by many users. To meet these needs, programmers have simultaneously built several platform-dependent applications. This platform-dependent approach incurs high development and maintenance costs; therefore, researchers have adopted a write once, run anywhere [1, 2, 3, 4, 5] attitude to build cross-platform applications.

In order to resolve software compatibility problems among various platforms such as Windows, Linux, Android, iOS, and so on, HTML5-based [6] web applications have recently been widely used. HTML is a platform independent language that executes in any web browser on any platform. HTML5, in particular, is able to handle multimedia, graphics, and many other rich contents without using any plugins like Adobe Flash, Microsoft ActiveX, or Silverlight. Therefore, HTML5 applications are free of platform dependency. Despite the outstanding advantages of HTML5, the slow performance of JavaScript is a serious limiting factor to wider use of HTML5. JavaScript is in charge of performing dynamic operations of HTML5-based web applications and executes slowly due to its interpretation at runtime. As a result, several studies for accelerating JavaScript [7, 8, 9] have been conducted in order to alleviate the performance problem.

Over the last several years, GPU-based parallel computing has become an alternative solution for enhancing the performance of general-purpose applications. Open Computing Language (OpenCL) [10] is one of the best solutions for this task. OpenCL is a framework that provides task-parallelism, data-parallelism, and supports heterogeneous parallel computing on various hardware platforms from multiple manufacturers such as Intel, AMD, NVIDIA, ARM, and so on. It is not limited to a specific architecture unlike other programming toolkits such as threading building blocks (TBB) [11] and CUDA [12]. With the growth of OpenCL, web computing language (WebCL) [13, 14] was introduced to enable significant acceleration of web applications. WebCL is a JavaScript binding to OpenCL for providing the same capabilities of OpenCL but in a web environment. WebCL implementations can be built into a web browser and used to accelerate computationally intensive web applications like 3D/Full HD image processing, advanced physics, and virtual reality. Research in WebCL is still in the very early stages. Currently, only a few prototypes of WebCL have been publicly distributed by Samsung Electronics [14, 15], Nokia [16], and others [17, 18]. Furthermore, developers must understand both the JavaScript and OpenCL programming models in order to use WebCL.

In this paper, we present a translator infrastructure, named O2WebCL, in order to perform fully automated OpenCL-to-WebCL translation. This allows us to adapt many available OpenCL codes to Web environments without the knowledge of JavaScript programming.

The main contributions of this paper are as follows:
  • We propose a fully automated OpenCL-to-WebCL translator, called O2WebCL. To the best of our knowledge, O2WebCL is the first translator infrastructure to provide OpenCL-to-WebCL translation. O2WebCL significantly improves programmability and code reusability for high performance web computing.

  • We solved various implementation issues for translation such as type conversion and indirect addressing. These issues come from the differences in syntax and semantics between JavaScript and OpenCL.

  • We evaluated the detailed performance of both OpenCL and WebCL benchmarks. Our evaluation shows that our O2WebCL achieves, on average, 75 % of the performance of the equivalent OpenCL execution.

This paper is organized as follows: In Sect. 2, we review related work. In Sect. 3, we introduce the overall structure of our O2WebCL translator, and in Sect. 4, we describe the design and implementation issues in detail. In Sect. 5, we analyze O2WebCL's performance. Finally, we conclude in Sect. 6.

2 Related work

2.1 JavaScript acceleration

A considerable amount of research has been conducted on the fast execution of JavaScript. Two compiler approaches, just-in-time (JIT) compilation and dynamic parallelization, are generally employed. JIT compilation accelerates execution by translating JavaScript code into native code and then executing the native code directly on the processor. Many existing web browsers employ the JIT compilation scheme, including Google V8 [19], Firefox SpiderMonkey [20], WebKit [21], and so on. J. Ha and others [22] introduced a concurrent trace-based JIT compiler for JavaScript in order to eliminate the overhead of dynamic translation by concurrently performing the trace and the translation on separate cores. In a multi-core environment, they improved performance by 6 % on average and up to 34 % over the conventional trace-based JIT. These history-based approaches, however, may not be applicable to the real world [23].

Along with JIT compilation, many others have proposed JavaScript acceleration by dynamic parallelization. Mehrara and others [8] proposed a lightweight speculation scheme for accomplishing dynamic parallelization of JavaScript applications. They developed ParaScript, an automatic runtime parallelization system for JavaScript, which performed 2.18 times faster on average over the Firefox browser. Martinsen and others [24] also dynamically parallelized JavaScript programs using thread level speculation (TLS) techniques. They implemented TLS in SFX, which performed up to 8.3 times faster without modifying any JavaScript code.

In spite of all these efforts, the performance improvements acquired by these approaches are not enough to satisfy the increasing demand for the high performance JavaScript execution in compute-intensive web applications like handling multimedia. Recently, in order to overcome this limitation, WebCL [13] has been introduced to significantly improve the performance by GPUs and multi-core systems. Nokia [16] and Samsung Electronics [14] have also presented their own WebCL prototypes.

2.2 Source-to-source translation for GPGPU computing

GPU has become more and more popular on devices due to its energy efficient execution by exploiting a large amount of data-level parallelism. As a result, the GPU programming models, CUDA [12] and OpenCL have been widely studied; however, these programming models require a deep understanding of the GPU architecture. Hence, various studies have been conducted on source-to-source translation in order to alleviate the problem.

C-to-CUDA [25] automatically transforms sequential C programs into parallel CUDA programs. The performance of automatically generated code is quite close to hand-tuned CUDA code. OpenMPC [26] is a fully automatic OpenMP-to-CUDA compiler and provides an abstraction of the complex CUDA programming model. OpenMPC has the ability to achieve 88 % of the performance of the hand-coded CUDA programs.

2.3 Source-to-source translation for HTML5 application

Solutions already exist to translate code from native to HTML5 applications in order to resolve the software compatibility problem among multi-platforms. Code conversion, however, requires in-depth human knowledge, and high development costs due to differences between programming languages and runtime environments. Automatic source-to-source translation is a useful solution to this problem.

Emscripten [27] is an LLVM to JavaScript compiler. The Emscripten compiler translates C and C++ code into JavaScript code that can be run on the web. For example, Mozilla and Epic teams ported Unreal Engine 3, one of the most popular 3D game engines, to the web in only 4 days using Emscripten [28]. The Intel HTML5 App Porter Tool [29] is a source-to-source translator from Objective-C into JavaScript/HTML5 and also includes the Apple iOS application programming interfaces (APIs).

3 O2WebCL translator infrastructure

3.1 Overall structure

Figure 1 shows the overall structure of the O2WebCL translator infrastructure, which consists of the OpenCL-to-WebCL translator and the O2WebCL library. The OpenCL-to-WebCL translator is composed of the LLVM frontend, the Emscripten LLVM-to-JavaScript compiler, and the O2WebCL generator. The OpenCL host program (*.c), excluding the kernel code (*.cl), is translated into LLVM bitcode by the LLVM frontend. The Emscripten compiler then translates the LLVM bitcode into JavaScript code. Our O2WebCL generator creates index.html, which contains the kernel code and references to both the translated WebCL host program files and the O2WebCL library. The differences in syntax and semantics between OpenCL functions and WebCL methods are resolved by our O2WebCL library.
Fig. 1

Overall structure of OpenCL to WebCL translator framework

3.2 C-to-JavaScript translation

In our O2WebCL translator, the C-to-JavaScript translation is performed in sequence by the LLVM frontend and the Emscripten LLVM-to-JavaScript compiler. The Clang compiler [30], the LLVM frontend for the C, C++, and Objective-C programming languages, is employed to generate LLVM bitcode from the OpenCL host program code excluding the kernel code. The LLVM bitcode is then translated into JavaScript code by the Emscripten LLVM-to-JavaScript compiler.

As shown in Fig. 2, most LLVM bitcodes can be simply translated into JavaScript code with one-to-one mapping. Because JavaScript has no data type for variables, parameters, and function returns, variables in LLVM bitcode are declared in JavaScript code by using the var keyword. Dynamic memory allocation like malloc, however, is not available in JavaScript. In this case, the Emscripten compiler was designed to use an emulated stack. For example, dynamically allocated memory resides on an emulated stack with 32-bit wide entries, named HEAP32. Its pointer variable has an index value of the HEAP32 entry (This will be described in Sect. 4.3 in detail). Most bitcode operators, including function calls, are directly translated into JavaScript operators since they are almost completely compatible. The Emscripten compiler is able to construct high-level JavaScript loops and if-statements from low-level branching information of the bitcode. Finally, the compiler generates some glue code written in JavaScript for both the emulated stack and the C system APIs used by the input program.
Fig. 2

An example of LLVM bitcode to JavaScript code translation. a C code, b LLVM bitcode, c JavaScript code
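The pointer-as-index idea described above can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not Emscripten's actual output: only the name HEAP32 and the use of a typed array come from the text, while the bump allocator `allocInts`, the heap size, and the reserved low addresses are hypothetical. The sketch stores byte addresses in pointer variables and shifts them right by two to obtain the HEAP32 index, which is how Emscripten-generated code commonly addresses 32-bit values.

```javascript
// Minimal sketch of an Emscripten-style emulated memory (illustrative only).
// HEAP32 is the name used in the text; allocInts and the sizes below are
// hypothetical helpers introduced for this example.
const buffer = new ArrayBuffer(1024);   // backing store for emulated memory
const HEAP32 = new Int32Array(buffer);  // 32-bit wide view of that memory

let nextFreeByte = 8;                   // keep address 0 unused so it can act as NULL

// Stand-in for malloc(count * sizeof(int)): returns a byte address.
function allocInts(count) {
  const ptr = nextFreeByte;
  nextFreeByte += count * 4;
  if (nextFreeByte > buffer.byteLength) {
    throw new Error("emulated heap exhausted");
  }
  return ptr;
}

// C:  int *a = malloc(4 * sizeof(int));  for (i = 0; i < 4; i++) a[i] = i * i;
const a = allocInts(4);
for (let i = 0; i < 4; i++) {
  HEAP32[(a >> 2) + i] = i * i;         // byte address >> 2 gives the HEAP32 index
}

// C:  int sum = a[2] + a[3];
const sum = HEAP32[(a >> 2) + 2] + HEAP32[(a >> 2) + 3];
console.log(a, sum);
```

The variable `a` is an ordinary JavaScript number declared without a type, yet it plays the role of the C pointer because every dereference goes through HEAP32.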

We did not modify the LLVM frontend. However, the code generation portion of Emscripten was modified to provide type information for WebCL. More details will be described in Sect. 4.2.

3.3 O2WebCL generator

The O2WebCL generator combines the translated JavaScript codes from C and kernel codes (*.cl) into WebCL codes (index.html), and the WebCL codes are dynamically linked with the O2WebCL library. The kernel code is copied into the HTML file (index.html) without any modification, and the copied kernel code is surrounded by client-side script tags, <script type="text/x-opencl"> and </script>, as shown in Fig. 1.

The WebCL program accesses the kernel code wrapped with the script tags by using the loadKernel function in the O2WebCL library. The function returns the kernel code to the WebCL program in the form of a string.
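To make the generated page layout concrete, the following sketch shows how kernel source embedded in a script element of type text/x-opencl can be recovered as a string. It is an assumption-laden illustration: the file names `o2webcl.js` and `host.js`, the id `kernel0`, and the helper `extractKernel` are hypothetical, and the real loadKernel function would presumably read the element through the browser DOM rather than parse an HTML string as done here for the sake of a self-contained example.

```javascript
// Sketch: generated index.html carrying kernel source as inert text.
// File names, the id "kernel0", and extractKernel are assumptions for illustration.
const indexHtml = `
<html>
  <head>
    <script src="o2webcl.js"></script>
    <script src="host.js"></script>
    <script id="kernel0" type="text/x-opencl">
__kernel void scale(__global float *data, float k) {
  size_t i = get_global_id(0);
  data[i] = data[i] * k;
}
    </script>
  </head>
</html>`;

// Returns the text between the matching script tags, or null if no such block exists.
function extractKernel(html, id) {
  const pattern = new RegExp(
    '<script id="' + id + '" type="text/x-opencl">([\\s\\S]*?)</script>'
  );
  const match = pattern.exec(html);
  return match === null ? null : match[1].trim();
}

const source = extractKernel(indexHtml, "kernel0");
console.log(source.split("\n")[0]);
```

Because text/x-opencl is not a JavaScript MIME type, a browser treats the element as a data block and does not try to execute it, which is why the kernel can be copied in unmodified.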

4 O2WebCL library implementation

Our O2WebCL library was based on the Nokia WebCL prototype. Some features were added since the prototype methods and the translated OpenCL functions of the O2WebCL translator did not match on a one-to-one basis. In this section, we will explain these issues.

4.1 WebCL library interface

Figure 3a shows a sample of OpenCL host code and Fig. 3b shows the JavaScript code translated from the host code. Figure 3c shows wrapper functions, written in JavaScript, for interfaces between the translated codes and the WebCL libraries. Each wrapper function is defined by using the function keyword without a return type and has the same signature as the corresponding OpenCL function. In addition, each wrapper function has been implemented to perform the same functionality as the OpenCL function using WebCL methods, specifically the Nokia WebCL prototype methods. Because the WebCL methods take the form of JavaScript, which is an object-based language, the following issues should be carefully resolved. First, in each O2WebCL library function, a validity check on all the arguments should be performed. Second, if there is a pointer type argument indicating a specific memory space, code for transmitting data between caller and callee functions through an emulated stack needs to be appended (the emulated stack will be described in Sect. 4.3). Finally, in order to handle the new WebCL objects that are created by the WebCL methods, an object heap, HEAP32, is added.
Fig. 3

A code snippet of O2WebCL library for binding translated JavaScript to WebCL. a OpenCL host code, b translated JavaScript code, c O2WebCL library

Table 1 shows the differences between some OpenCL functions and their corresponding WebCL methods in the Nokia WebCL prototype. For example, the clGetPlatformIDs function is translated into the _clGetPlatformIDs function in JavaScript. The Nokia WebCL prototype, however, does not provide the _clGetPlatformIDs function. Therefore, in order to bind _clGetPlatformIDs with WebCL.getPlatformIDs in the Nokia WebCL prototype, the O2WebCL library provides the _clGetPlatformIDs wrapper function as shown in Fig. 3c.
Table 1

OpenCL and WebCL library comparison (partial)

| OpenCL function | WebCL class (Nokia prototype) | WebCL method (Nokia prototype) |
|---|---|---|
| clGetPlatformIDs | WebCL | getPlatformIDs |
| clCreateContext | WebCL | createContext |
| clGetDeviceIDs | WebCLPlatform | getDevices |
| clBuildProgram | WebCLProgram | buildProgram |
| clCreateKernel | WebCLProgram | createKernel |
| clCreateProgramWithSource | WebCLContext | createProgramWithSource |
| clCreateCommandQueue | WebCLContext | createCommandQueue |
| clCreateBuffer | WebCLContext | createBuffer |
| clSetKernelArg | WebCLKernel | setKernelArg |
| clEnqueueReadBuffer | WebCLCommandQueue | enqueueReadBuffer |
| clEnqueueWriteBuffer | WebCLCommandQueue | enqueueWriteBuffer |
| clEnqueueNDRangeKernel | WebCLCommandQueue | enqueueNDRangeKernel |
| clEnqueueTask | WebCLCommandQueue | enqueueTask |
| clFinish | WebCLCommandQueue | finish |
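The wrapper idea can be illustrated with a reduced version of the clGetPlatformIDs binding. This is a hedged sketch rather than the O2WebCL library code: the `WebCL` object below is a mock standing in for the Nokia prototype so that the example runs outside a browser, and the body of `_clGetPlatformIDs` is an assumption about how the argument validity check and the write-back through the emulated heap could look. The error conditions follow the OpenCL specification for clGetPlatformIDs; storing the platform objects themselves requires the indirect addressing of Sect. 4.3 and is deliberately left out.

```javascript
// OpenCL status codes as defined in cl.h.
const CL_SUCCESS = 0;
const CL_INVALID_VALUE = -30;

// Emulated 32-bit heap shared with the translated host code (size is arbitrary here).
const HEAP32 = new Int32Array(256);

// Mock of the prototype's entry point; a real page would use the browser-provided object.
const WebCL = {
  getPlatformIDs: () => [{ name: "mock platform A" }, { name: "mock platform B" }],
};

// Mirrors: cl_int clGetPlatformIDs(cl_uint num_entries,
//                                  cl_platform_id *platforms,
//                                  cl_uint *num_platforms);
// Pointer arguments arrive as byte addresses into the emulated heap; 0 means NULL.
function _clGetPlatformIDs(numEntries, pPlatforms, pNumPlatforms) {
  // Validity checks that the C library would otherwise perform.
  if (pPlatforms === 0 && pNumPlatforms === 0) return CL_INVALID_VALUE;
  if (pPlatforms !== 0 && numEntries === 0) return CL_INVALID_VALUE;

  const platforms = WebCL.getPlatformIDs();

  // Out-parameter: write the platform count where the caller's pointer points.
  if (pNumPlatforms !== 0) {
    HEAP32[pNumPlatforms >> 2] = platforms.length;
  }
  // Writing platform handles to pPlatforms is omitted in this sketch.
  return CL_SUCCESS;
}

// Translated host code asking only for the number of platforms.
const pCount = 16;                                  // hypothetical byte address
const status = _clGetPlatformIDs(0, 0, pCount);
console.log(status, HEAP32[pCount >> 2]);
```

The wrapper keeps the C calling convention (status code returned, results written through pointers) so that the translated host code does not need to change.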

4.2 Type conversion

As mentioned in Sect. 3.2, the Emscripten compiler uses emulated heaps with specific data types like Int32Array to handle data at runtime. However, the WebCL methods that move data between host and device memories, such as enqueueWriteBuffer and enqueueReadBuffer, take non-typed arrays as arguments rather than the typed arrays that back the emulated heaps.

In order to resolve the difference between data types, we implemented the clEnqueueWriteBuffer function in our O2WebCL library so that it copies data from a heap to a new JavaScript array without a data type. To read the data, the function must know which emulated heap holds it, but this cannot be determined from the function arguments because they carry no type information. Therefore, we modified the code generation of the Emscripten compiler to provide type information. For example, the clEnqueueWriteBuffer function has an additional parameter containing the type information of the data. Through this modification, the data from the emulated heap can be easily copied to a non-typed array. The enqueueWriteBuffer method then copies the new JavaScript array to a new OpenCL C array with a specific data type at runtime, as shown in Fig. 4a. Finally, the clEnqueueWriteBuffer function of the OpenCL library is executed.

These two layers of copying are clearly an unnecessary overhead. If the Nokia WebCL prototype supported typed arrays such as Int32Array and Float32Array [31], enqueueWriteBuffer could be used directly. Similarly, the clEnqueueReadBuffer function performs the data copying for type conversion as shown in Fig. 4b. The function copies the array data returned from enqueueReadBuffer to the heap being used by the JavaScript code that was generated by the Emscripten compiler.
Fig. 4

Data copying for buffer write and read operations. a Write buffer operation, b read buffer operation
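The role of the extra type parameter can be shown with a small sketch of both copy directions. The helper names (`heapToPlainArray`, `plainArrayToHeap`, `heapFor`) and the numeric type tags are hypothetical; the text only states that the modified code generator passes type information, not how it is encoded. HEAP32 and HEAPF32 are modeled as two typed-array views over the same ArrayBuffer, which is how Emscripten exposes its heap.

```javascript
// Two typed views over one block of emulated memory.
const buffer = new ArrayBuffer(64);
const HEAP32 = new Int32Array(buffer);
const HEAPF32 = new Float32Array(buffer);

// Hypothetical type tags that a modified code generator could append to the call.
const TYPE_INT32 = 0;
const TYPE_FLOAT32 = 1;

// Select the heap view that matches the type tag.
function heapFor(typeTag) {
  switch (typeTag) {
    case TYPE_INT32: return HEAP32;
    case TYPE_FLOAT32: return HEAPF32;
    default: throw new Error("unknown type tag " + typeTag);
  }
}

// Write path: typed heap -> plain (non-typed) JavaScript array.
function heapToPlainArray(ptr, byteSize, typeTag) {
  const heap = heapFor(typeTag);
  const start = ptr / heap.BYTES_PER_ELEMENT;
  const count = byteSize / heap.BYTES_PER_ELEMENT;
  return Array.from(heap.subarray(start, start + count));
}

// Read path: plain array returned by the device -> typed heap.
function plainArrayToHeap(values, ptr, typeTag) {
  const heap = heapFor(typeTag);
  heap.set(values, ptr / heap.BYTES_PER_ELEMENT);
}

// Host data: two floats stored at byte address 8.
HEAPF32[2] = 1.5;
HEAPF32[3] = -2.25;

const asFloats = heapToPlainArray(8, 8, TYPE_FLOAT32); // correct interpretation
const asInts = heapToPlainArray(8, 8, TYPE_INT32);     // same bytes, wrong type tag
console.log(asFloats, asInts);

// Results coming back from the device are copied into the heap at byte address 16.
plainArrayToHeap([7, 8, 9], 16, TYPE_INT32);
console.log(HEAP32[4], HEAP32[5], HEAP32[6]);
```

Reading the same bytes through the wrong view yields raw bit patterns instead of the stored values, which is why a bare pointer argument is not enough to perform the copy.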

4.3 Indirect addressing of WebCL objects

In the JavaScript code translated by Emscripten, an emulated heap such as HEAP16 or HEAP32 is used for handling pointer type variables and heap memory because JavaScript has no concept of pointers. In order to manage WebCL objects in the wrapper functions, the emulated heap should address the WebCL objects in the same manner. However, the emulated heap cannot hold object type values because it is based on typed arrays such as Int32Array. In order to resolve this problem, we used an indirect addressing mechanism. Figure 5 shows how to indirectly address the objects with a pointer type argument. For example, $pPlatform is a pointer type variable that indicates platform information. $pPlatform has only the first address, named HEAP32, of the two entries of the emulated heap.
Fig. 5

An example of indirect addressing of WebCL object

In this case, the getPlatformIDs method returns two WebCLPlatform objects, platforms[0] and platforms[1]. Each entry contains the addresses of the object heap entries for the actual objects. Consequently, we can only access the actual objects from $pPlatform through the indirect addressing mechanism.
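The indirection can be sketched as follows. The names `objectHeap`, `storeObject`, `loadObject`, and `storePlatforms` are hypothetical, the `WebCL` object is a mock, and the address chosen for `pPlatform` is arbitrary; the sketch only illustrates the mechanism described above, in which the emulated heap holds integer indices and a separate JavaScript array holds the actual objects.

```javascript
const HEAP32 = new Int32Array(64);

// A typed array cannot hold an object: the assignment coerces it to a number.
HEAP32[0] = { name: "not storable" };      // ends up stored as 0

// Separate object heap; slot 0 is reserved so that handle 0 can mean NULL.
const objectHeap = [null];

function storeObject(obj) {
  objectHeap.push(obj);
  return objectHeap.length - 1;            // integer handle that fits in HEAP32
}

function loadObject(handle) {
  return objectHeap[handle];
}

// Mock standing in for the prototype's WebCL.getPlatformIDs().
const WebCL = {
  getPlatformIDs: () => [{ name: "mock platform 0" }, { name: "mock platform 1" }],
};

// Writes one handle per platform into consecutive emulated-heap entries.
function storePlatforms(pPlatform) {
  const platforms = WebCL.getPlatformIDs();
  platforms.forEach((platform, i) => {
    HEAP32[(pPlatform >> 2) + i] = storeObject(platform);
  });
  return platforms.length;
}

const pPlatform = 32;                      // byte address of the first of two entries
storePlatforms(pPlatform);

// Two-step lookup: pointer -> handle in HEAP32 -> object in objectHeap.
const handle = HEAP32[(pPlatform >> 2) + 1];
console.log(handle, loadObject(handle).name);
```

The translated host code only ever sees integers, so pointer arithmetic and comparisons on handles keep working as they did in C.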

4.4 Buffered data transfer for using host pointer

Because the kernel code of a WebCL program can only use data on device memory, the input data to the kernel code should be provided from a host memory to the device memory prior to kernel execution. The clCreateBuffer function shown in Table 1 is an OpenCL function for allocating a data buffer on the device memory. It also can implicitly bring the input data, indicated by a host pointer, from the host memory to the buffer. If the host pointer is employed for the implicit data transfer, there will be no need to fill up the buffer using the command queue of OpenCL. The createBuffer method in the Nokia WebCL prototype corresponds to the clCreateBuffer function, but implicit data transfer is not supported since the method has no parameter for referring to the host pointer. Therefore, the wrapper function of clCreateBuffer is designed to temporarily keep the input data and transfer them to the data buffer on the device memory when the command queue becomes available. This buffered data transfer is performed by using the enqueueWriteBuffer method in the wrapper function of clCreateCommandQueue only when the host pointer is used.
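A deferred transfer of this kind can be sketched with two wrappers that share a list of pending writes. Everything below is an illustrative assumption: the mock context and queue stand in for WebCLContext and WebCLCommandQueue, their method signatures are simplified, the wrapper names and the `pendingWrites` list are hypothetical, and a real implementation would at least need to track pending data per context. The flag values are the ones defined in the OpenCL header.

```javascript
// Flag values from cl.h.
const CL_MEM_READ_ONLY = 1 << 2;
const CL_MEM_COPY_HOST_PTR = 1 << 5;

// Mock context and queue with simplified signatures, so the sketch runs standalone.
function makeMockContext() {
  return {
    createBuffer(flags, size) {
      return { flags, size, contents: null };     // no host-pointer parameter
    },
    createCommandQueue() {
      return {
        enqueueWriteBuffer(buf, blocking, offset, size, data) {
          buf.contents = data.slice();
        },
      };
    },
  };
}

// Host data waiting for a command queue to exist.
const pendingWrites = [];

// Wrapper for clCreateBuffer: creates the buffer and remembers the host data.
function clCreateBufferWrapper(context, flags, size, hostData) {
  const buf = context.createBuffer(flags & ~CL_MEM_COPY_HOST_PTR, size);
  if ((flags & CL_MEM_COPY_HOST_PTR) !== 0 && hostData !== null) {
    pendingWrites.push({ buf, size, data: hostData.slice() });
  }
  return buf;
}

// Wrapper for clCreateCommandQueue: flushes the remembered data once a queue exists.
function clCreateCommandQueueWrapper(context) {
  const queue = context.createCommandQueue();
  while (pendingWrites.length > 0) {
    const w = pendingWrites.shift();
    queue.enqueueWriteBuffer(w.buf, true, 0, w.size, w.data);
  }
  return queue;
}

const context = makeMockContext();
const input = clCreateBufferWrapper(
  context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, 16, [1, 2, 3, 4]);
const filledBeforeQueue = input.contents !== null;
clCreateCommandQueueWrapper(context);
console.log(filledBeforeQueue, input.contents);
```

The host data is copied when the buffer is created so that later changes to the caller's array do not alter what is eventually written, matching the copy semantics of CL_MEM_COPY_HOST_PTR.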

5 Performance analysis

In order to evaluate the O2WebCL performance, we used several OpenCL programs as shown in Table 2. To compare the performance among O2WebCL, OpenCL, C, and JavaScript benchmarks, the OpenCL benchmarks were translated into O2WebCL benchmarks by our O2WebCL program. We also translated the OpenCL benchmarks into C codes manually. Finally, the C benchmarks were translated into JavaScript benchmarks using Emscripten. For JavaScript execution, we enabled JIT as usual. We used an NVIDIA GeForce GTX 550 [32] GPU, which has 192 CUDA cores with a clock rate of 900 MHz and 1 GB of GDDR5. The host system consists of an Intel Core i7-2600K CPU @ 3.40 GHz.
Table 2

Benchmarks used for evaluating O2WebCL

| Benchmark | Description | Input data size and characteristics |
|---|---|---|
| Matrix transpose | In linear algebra, matrix transpose reflects each matrix element across the diagonal so that rows become columns and columns become rows | \(256\times 4{,}096\) matrix, memory intensive |
| Parallel reduce | In computer science, a reduce computes the global sum for a given large number of values. Parallel reduce is a parallel version of reduce using OpenCL | \(1{,}024\times 1{,}024\) integer, compute intensive |
| Conjugate gradient | In mathematics, conjugate gradient is one of the iterative methods that makes a series of guesses to approximate \(x\) in the equation \(Ax=b\) | \(612\times 612\) matrix, compute intensive |
| Steepest descent | In mathematics, steepest descent is one of the iterative methods of solving sparse matrix systems to find \(x\) in the equation \(Ax=b\) | \(612\times 612\) matrix, compute intensive |
| RDFT | In mathematics, the discrete Fourier transform (DFT) converts a function in the original domain to a function in the frequency domain. The benchmark is called RDFT because it performs a DFT on real-valued data | 8,192 real values, compute intensive |
| Bitonic sort | In computer science, bitonic sort is a comparison-based parallel sorting algorithm. It uses divide-and-conquer to convert an array of random numbers into a bitonic sequence, whose elements monotonically increase and then monotonically decrease | 2,097,152 ushort, memory intensive |

For evaluating the performance of JavaScript and O2WebCL, Firefox 15.0 and its WebCL extension were used. We compiled C and OpenCL benchmarks with Microsoft C/C++ Optimizing Compiler Version 15.00.30729.01 for x64 using option -O2. We compiled the translated JavaScript and O2WebCL codes using Emscripten and O2WebCL with option -O1. We could not use option -O2 for JavaScript and O2WebCL codes because option -O2 omits most parts of benchmark source code by dead code elimination.

We divided execution time into several components for clear performance analysis, as shown in Fig. 6. CL Execution indicates computation time in OpenCL and O2WebCL, including Host-to-GPU and GPU-to-Host memory transfer times. CL Initiation is the preparation time for executing kernels, such as getting device information, creating contexts, and loading kernel sources. Execution represents computation time in C and JavaScript corresponding to CL Execution. Finally, Other is the rest of the time, such as memory allocation, data initialization, memory deallocation, and so on.
Fig. 6

Terminologies for performance analysis

Figure 7 compares O2WebCL with OpenCL execution. The figure shows that O2WebCL introduces only 34 % overhead on average. A more detailed performance analysis of each benchmark is given in the following subsections.
Fig. 7

Execution time of O2WebCL benchmarks. Labels indicate the normalized elapsed time of O2WebCL with respect to OpenCL

5.1 Matrix transpose

Figure 8a shows the execution performance of the Matrix Transpose benchmark. Comparing C to JavaScript, JavaScript is approximately 4 and 5 times slower than C in Execution and Other, respectively. This is a result of JavaScript being an interpreted language in the web browser. CL Initiation of O2WebCL increased slightly compared to that of OpenCL due to calling O2WebCL APIs. Moreover, CL Execution of O2WebCL incurred more overhead than that of OpenCL. CL Execution consists of clEnqueueWriteBuffer time, clEnqueueReadBuffer time, and actual GPU calculation time. Because clEnqueueWriteBuffer and clEnqueueReadBuffer in O2WebCL need type conversions, such as converting typed to non-typed or non-typed to typed values, they consume additional time as mentioned in Sect. 4.2. As a result, O2WebCL achieved a 1.49\(\times \) speedup whereas OpenCL achieved a 2.07\(\times \) speedup.

5.2 Parallel reduce

Figure 8b shows the performance of the Parallel Reduce benchmark. In general, OpenCL programs create and execute only one kernel. However, Parallel Reduce creates several kernels and executes them many times. Therefore, CL Initiation of O2WebCL increased slightly compared to that of OpenCL due to creating multiple kernels. Moreover, CL Execution of O2WebCL took much more time than that of OpenCL because of executing multiple kernels. As a result, in Parallel Reduce, O2WebCL achieved a 2.97\(\times \) speedup whereas OpenCL achieved a 4.33\(\times \) speedup.

5.3 Conjugate gradient, steepest descent and RDFT

Figure 8c–e shows the execution performance of the Conjugate Gradient, Steepest Descent and RDFT benchmarks. These benchmarks create buffers with the clCreateBuffer function using option CL_MEM_COPY_HOST_PTR. As we mentioned in Sect. 4.4, the Nokia WebCL prototype does not support clCreateBuffer using option CL_MEM_COPY_HOST_PTR. Therefore, we provided an alternative method in order to transfer buffered data for using the host pointer. The Host-to-GPU memory transfer belongs to CL Initiation in these benchmarks. Because of this, CL Initiation of these O2WebCL benchmarks increased. As a result, Conjugate Gradient, Steepest Descent and RDFT achieved speedups of 1.32\(\times \), 1.62\(\times \), and 1.80\(\times \), respectively, with O2WebCL, whereas speedups of 1.48\(\times \), 1.69\(\times \), and 2.22\(\times \), respectively, were achieved with OpenCL.
Fig. 8

Performance comparison between C, JavaScript, OpenCL, and O2WebCL execution. All performance values are normalized to the C execution. a Matrix transpose, b parallel reduce, c conjugate gradient, d steepest descent, e real-valued DFT, f bitonic sort

5.4 Bitonic sort

Figure 8f shows the execution performance of the Bitonic Sort benchmark. The benchmark consists of six kernels, each of which is executed many times. The entire input data set is transferred from the host to the GPU before GPU execution, and the entire output data set is transferred back to the host afterward. Therefore, many type conversions are required. As such, a large amount of JavaScript code is executed to launch multiple kernels many times and to convert variable types. Consequently, O2WebCL performed 1.20 times faster than C, whereas OpenCL performed 2.38 times faster.
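The speedups reported in Sects. 5.1 to 5.4 can be cross-checked against the 34 % average overhead stated for Fig. 7. The sketch below divides each OpenCL speedup by the corresponding O2WebCL speedup to obtain a per-benchmark slowdown and then averages the six ratios. The text does not say which kind of average was used, so the choice of a geometric mean here is an assumption, and the variable names are illustrative.

```javascript
// Speedups over C as reported in Sects. 5.1 to 5.4.
const speedups = [
  { name: "Matrix transpose",   o2webcl: 1.49, opencl: 2.07 },
  { name: "Parallel reduce",    o2webcl: 2.97, opencl: 4.33 },
  { name: "Conjugate gradient", o2webcl: 1.32, opencl: 1.48 },
  { name: "Steepest descent",   o2webcl: 1.62, opencl: 1.69 },
  { name: "RDFT",               o2webcl: 1.80, opencl: 2.22 },
  { name: "Bitonic sort",       o2webcl: 1.20, opencl: 2.38 },
];

// Slowdown of O2WebCL relative to OpenCL for each benchmark.
const ratios = speedups.map((b) => b.opencl / b.o2webcl);

// Geometric mean (assumed averaging method) and arithmetic mean for comparison.
const geoMean = Math.exp(
  ratios.reduce((sum, r) => sum + Math.log(r), 0) / ratios.length
);
const arithMean = ratios.reduce((sum, r) => sum + r, 0) / ratios.length;

speedups.forEach((b, i) => console.log(b.name, ratios[i].toFixed(2)));
console.log("geometric mean slowdown:", geoMean.toFixed(2));
console.log("relative performance:", (1 / geoMean).toFixed(2));
console.log("arithmetic mean slowdown:", arithMean.toFixed(2));
```

Under this assumption the geometric mean comes out at roughly 1.34, whose reciprocal is roughly 0.75, in line with the average figures quoted in the text; the arithmetic mean of the same ratios is slightly higher, at about 1.37. Bitonic sort is the largest contributor, with a ratio close to 2.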

6 Conclusion

This paper introduced O2WebCL, consisting of a fully automated OpenCL-to-WebCL translator and O2WebCL library, for high performance web computing. O2WebCL can significantly improve code reusability and maintainability of existing OpenCL codes. OpenCL programmers can easily port their codes to HTML5 applications because the generated codes are encapsulated in *.js files.

In our O2WebCL translator, C-to-JavaScript translation was performed by LLVM-frontend, the Emscripten LLVM-to-JavaScript compiler, and an O2WebCL generator. The generated WebCL codes were dynamically linked with our O2WebCL library. We solved various issues occurring during translation such as type conversion, indirect addressing, data transfer, and so on. Finally, we evaluated the performance of WebCL benchmarks created by our O2WebCL. As a result, O2WebCL was only 1.34 times slower than OpenCL on average. Our results show that O2WebCL can be a powerful solution for porting OpenCL into HTML5 applications for web-based computing.

Notes

Acknowledgments

This work was supported by the Industrial Strategic Technology Development Program (10041664, The Development of Fusion Processor based on Multi-Shader GPU) funded by the Ministry of Trade, Industry and Energy (MI, Korea).

References

  1. Worth C, Packard K (2003) Cairo: cross-device rendering for vector graphics. In: Proceedings of the 2003 Linux symposium
  2. Pennington H (1999) GTK+/Gnome application development. New Riders, Indianapolis
  3. Dalheimer M (2002) Programming with QT: writing portable GUI applications on Unix and Win32. O’Reilly Media, Inc., USA
  4. Neuburg M, Hayes S (1999) Real basic: the definitive guide. O’Reilly & Associates, Inc., USA
  5. RAD Studio XE5. Available online http://www.embarcadero.com/products/rad-studio/
  6. Berjon R, Faulkner S, Leithead T, Navara ED, O’Connor E, Pfeiffer S, Hickson I (2013) HTML5: A vocabulary and associated APIs for HTML and XHTML. W3C Candidate Recommendation World Wide Web Consortium, W3C (February 4, 2014)
  7. Martinsen JK, Grahn H (2010) An alternative optimization technique for JavaScript engines. In: Proceedings of the third Swedish workshop on multi-core computing (MCC-10), pp 155–160
  8. Mehrara M, Hsu P-C, Samadi M, Mahlke S (2011) Dynamic parallelization of JavaScript applications using an ultra-lightweight speculation mechanism. In: 2011 IEEE 17th international symposium on high performance computer architecture (HPCA). IEEE, pp 87–98
  9. Fortuna E, Anderson O, Ceze L, Eggers S (2010) A limit study of JavaScript parallelism. In: 2010 IEEE international symposium on workload characterization (IISWC). IEEE, pp 1–10
  10. Khronos OpenCL Working Group (2010) The OpenCL Specification, Version 1.1. Document Revision, 44
  11. Pheatt C (2008) Intel threading building blocks. J Comput Sci Coll 23(4):298–298
  12.
  13. WebCL-Heterogeneous parallel computing in HTML5 web browsers. Available online http://www.khronos.org/webcl/
  14. Jeon W, Brutch T, Gibbs S (2012) WebCL for hardware-accelerated web applications. In: WWW’12 Dev. Lyon, France
  15. Vaughan-Nichols SJ (2008) The mobile web comes of age. IEEE Comput 41(11):15–17
  16. Nokia WebCL Extension for Firefox. Available online http://webcl.nokiaresearch.com/
  17. FireFox WebCL Branch. Available online http://hg.mozilla.org/projects/webcl/
  18. Node-webcl, an implementation of Khronos WebCL specification using Node.JS. Available online http://github.com/Motorola-Mobility/node-webcl/
  19. V8 JavaScript Engine. Available online http://code.google.com/p/v8/
  20. SpiderMonkey JavaScript Engine. Available online https://developer.mozilla.org/en-US/docs/SpiderMonkey
  21. WebKit: an open source web browser engine. Available online http://www.webkit.org/
  22. Ha J, Haghighat MR, Cong S, McKinley KS (2009) A concurrent trace-based just-in-time compiler for single-threaded JavaScript. PESPMA 2009:47
  23. Lee S-W, Moon S-M (2011) Selective just-in-time compilation for client-side mobile javascript engine. In: Proceedings of the 14th international conference on compilers, architectures and synthesis for embedded systems. ACM, pp 5–14
  24. Martinsen JK, Grahn H (2011) Thread-level speculation as an optimization technique in web applications—initial results. In: 2011 6th IEEE international symposium on industrial embedded systems (SIES). IEEE, pp 83–86
  25. Baskaran MM, Ramanujam J, Sadayappan P (2010) Automatic C-to-CUDA code generation for affine programs. In: Compiler construction. Springer, Berlin, pp 244–263
  26. Lee S, Eigenmann R (2010) OpenMPC: Extended OpenMP programming and tuning for GPUs. In: Proceedings of the 2010 ACM/IEEE international conference for high performance computing, networking, storage and analysis. IEEE Computer Society, pp 1–11
  27. Zakai A (2011) Emscripten: an LLVM-to-JavaScript compiler. In: Proceedings of the ACM international conference companion on object oriented programming systems languages and applications companion. ACM, pp 301–312
  28.
  29.
  30. Terry P (1985) CLANG—a simple teaching language. ACM SIGPLAN Not 20(12):54–63
  31. Typed Array Specification. Available online http://www.khronos.org/registry/typedarray/specs/latest/
  32.

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Korea University, Seoul, Republic of Korea
  2. Department of Electronic Engineering, Kyungil University, Gyeongsan-si, Republic of Korea
