7.1 Introduction

The IEEE standard SystemC language [13] is widely used for the specification, modeling, validation, and evaluation of electronic system level (ESL) models. The Accellera Systems Initiative not only maintains the official SystemC language definition, but also provides an open source proof-of-concept library that can be used to simulate SystemC design models [1]. However, since it implements the classic scheme of discrete event simulation (DES), this reference simulator runs sequentially and cannot utilize the parallel computing resources available on multi- and many-core processor hosts. This severely limits the execution speed of SystemC simulation.

In order to provide faster execution, parallel discrete event simulation (PDES) [8, 12] techniques can be applied. While significant obstacles exist specifically for the SystemC language [7], many parallel simulation approaches have been proposed [5, 11, 19, 21, 22, 23, 24]. Beyond these synchronous PDES techniques, out-of-order PDES [6] is even more aggressive. By localizing the simulation time to individual threads and carefully handling events at different times, the simulator engine can issue threads in parallel and ahead of time, following a partial ordering without loss of accuracy. This results in better exploitation of the available parallelism and thus higher simulation speed.

The Recoding Infrastructure for SystemC (RISC) project described in this paper implements out-of-order PDES for the IEEE SystemC language as open source. Specifically, RISC provides a dedicated SystemC compiler and a corresponding out-of-order parallel simulator [2, 8, 16]. Unlike other approaches, RISC automatically analyzes the SystemC source code, identifies all potential race conditions, and then instruments the model to prevent any conflicts. This transformation does not require any manual recoding or application-specific knowledge.

We share our RISC proof-of-concept implementation with the EDA community as an open source software project in order to facilitate evaluation, promote parallel SystemC simulation, and achieve fruitful collaboration [3, 4].

7.2 RISC Framework

While the RISC software framework may be used for many other analysis and transformation tasks on SystemC models, parallel simulation is its main purpose. To perform semantics-compliant parallel simulation with out-of-order scheduling, we introduce a dedicated SystemC compiler that works hand in hand with a new simulator. This is in contrast to the traditional SystemC simulation flow, where a SystemC-agnostic C++ compiler includes the SystemC headers and links the design model directly against the Accellera reference library.

As shown in Fig. 7.1, the RISC compiler acts as a frontend that processes the input model and generates an intermediate model with special instrumentation for conflict-free parallel execution. The instrumented model is then linked against the extended RISC SystemC library by the target compiler (a regular C++ compiler, such as GNU gcc or Intel icpc) in order to produce the output executable model. Out-of-order parallel simulation is then performed simply by running the generated executable model.

Fig. 7.1 RISC tool flow for out-of-order parallel simulation of SystemC models [16]

From the user's perspective, we simply replace the regular C++ compiler with the SystemC-aware RISC compiler (which in turn calls the underlying C++ compiler). Otherwise, the overall SystemC validation flow remains the same as the traditional tool flow; simulation is simply faster due to parallel execution. Note also that this process is fully automated: no user interaction or manual code transformation is necessary.

7.2.1 RISC Compiler

In order to produce a safe parallel model, the RISC compiler performs three major tasks, namely segment graph construction, conflict analysis, and finally source code instrumentation.

Segment Graph Construction

A segment graph (SG) [6] is a directed graph that represents the source code segments executed during the simulation between scheduling steps. More specifically, every segment is associated with a corresponding scheduler entry point, namely a wait statement in SystemC. All other statements in the SystemC source code become part of the segment nodes in which they may be executed, that is, the code that can be reached after the associated wait statement resumes and before the next wait statement is encountered.
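To make the notion concrete, the following C++ sketch models the segment graph of a small hypothetical thread body. The `Segment` and `SegmentGraph` types, the helper functions, and the example thread are illustrative assumptions, not RISC's actual data structures. The example shows that a statement following a loop that contains a wait statement can belong to more than one segment.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical segment graph node (not the RISC data structure).
// Segment 0 starts at thread entry; segment k starts where wait #k resumes.
struct Segment {
    std::set<std::string> statements;  // statements this segment may execute
    std::set<int> next;                // segments that may follow it
};
using SegmentGraph = std::map<int, Segment>;

// Example thread body:
//   x = 1;
//   wait(e1);            // wait #1
//   while (c) {
//     y = x;
//     wait(e2);          // wait #2
//   }
//   z = y;
SegmentGraph buildExampleGraph() {
    SegmentGraph g;
    g[0] = {{"x = 1"}, {1}};                        // thread start, ends at wait #1
    g[1] = {{"while (c)", "y = x", "z = y"}, {2}};  // resumes after wait #1
    g[2] = {{"while (c)", "y = x", "z = y"}, {2}};  // resumes after wait #2, may reach it again
    return g;
}

// IDs of all segments in which statement `stmt` may execute.
std::set<int> segmentsOf(const SegmentGraph& g, const std::string& stmt) {
    std::set<int> ids;
    for (const auto& [id, seg] : g)
        if (seg.statements.count(stmt)) ids.insert(id);
    return ids;
}

// Sanity check: every successor must be a segment of the graph.
void checkEdges(const SegmentGraph& g) {
    for (const auto& entry : g)
        for (int n : entry.second.next) assert(g.count(n) == 1);
}
```

Here `z = y` is part of both segment 1 and segment 2, because the loop may exit either right after wait #1 resumes or after any later resumption of wait #2.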

The segment graph construction is a fully automatic but complex process, which we do not describe here (see [6] for detailed coverage). As a prerequisite, however, the RISC compiler must first parse the SystemC input model into an Abstract Syntax Tree (AST). Since SystemC code is syntactically regular C++ code, RISC relies here on the ROSE compiler infrastructure [18]. The ROSE internal representation (IR) provides RISC with a powerful C/C++ compiler foundation that supports AST generation, traversal, analysis, and transformation.

As illustrated with the RISC software stack shown in Fig. 7.2, the RISC compiler then builds a SystemC IR on top of the ROSE IR which accurately reflects the SystemC structures, including the module and channel hierarchy, port connectivity, and other SystemC-specific constructs. On top of the SystemC IR, the compiler architecture then builds the Segment Graph generator and data structures, as well as all other RISC analysis and transformation functions.

Fig. 7.2 Software stack of the RISC compiler [8]

Conflict Analysis

The segment graph data structure serves as the foundation for segment conflict analysis. At run time, the scheduler in the simulator must ensure that every thread to be issued in parallel has no conflicts with any other threads currently in the READY and RUN queues. To make this run-time check fast, we use the RISC compiler to detect all possible conflicts between these threads already at compile time.
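As a minimal illustration of such a compile-time check, the C++ sketch below tests two segments for data hazards only, assuming each segment has already been summarized by the sets of variables it may read and write. The `Access` type and the `dataHazard` function are hypothetical; a complete analysis must also cover event and timing hazards and account for accesses through pointers and references.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical per-segment access summary: variables a segment may read/write.
struct Access {
    std::set<std::string> reads;
    std::set<std::string> writes;
};

// True if the two sets share at least one variable.
static bool intersects(const std::set<std::string>& a,
                       const std::set<std::string>& b) {
    for (const std::string& v : a)
        if (b.count(v)) return true;
    return false;
}

// Conservative data-hazard test between two segments: a shared variable
// that one segment writes and the other reads or writes (RAW, WAR, WAW).
bool dataHazard(const Access& s1, const Access& s2) {
    return intersects(s1.writes, s2.writes)   // write/write
        || intersects(s1.writes, s2.reads)    // s2 reads what s1 writes
        || intersects(s1.reads, s2.writes);   // s1 reads what s2 writes
}
```

Two segments that only read the same variable do not conflict, while a write in one segment to any variable accessed by the other does; the test is symmetric by construction.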

Potential conflicts in SystemC include data hazards, event hazards, and timing hazards, all of which may exist among the segments executed by the threads considered for parallel execution. Again, we refer to [6] for a detailed discussion of these hazards and their static or dynamic detection in RISC. Note, however, that ignoring any of these hazards would lead to race conditions at run time and jeopardize the correctness of the SystemC simulation.

Source Code Instrumentation

As a result of the conflict analysis, the RISC compiler generates a set of conflict and timing tables that describe all possible hazards between any two threads. Using this conservative conflict information, the simulator can then at run time quickly determine by a simple table look-up whether or not it is safe to issue a given thread in parallel or ahead of time.
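The table look-up itself can be sketched in C++ as follows. The `ConflictTable` class, its flat `std::vector<bool>` layout, and the `safeToIssue` helper are illustrative assumptions rather than the tables RISC actually generates; the sketch also assumes that conflicts are symmetric and that each thread is identified by the segment it is about to execute.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compile-time conflict table: conflicts(i, j) is true if
// segment i may conflict with segment j.
class ConflictTable {
public:
    explicit ConflictTable(std::size_t numSegments)
        : n_(numSegments), table_(numSegments * numSegments, false) {}

    // This sketch assumes symmetric conflicts, so both entries are set.
    void markConflict(std::size_t i, std::size_t j) {
        assert(i < n_ && j < n_);
        table_[i * n_ + j] = true;
        table_[j * n_ + i] = true;
    }

    // Constant-time look-up used by the scheduler at run time.
    bool conflicts(std::size_t i, std::size_t j) const {
        assert(i < n_ && j < n_);
        return table_[i * n_ + j];
    }

private:
    std::size_t n_;
    std::vector<bool> table_;
};

// A thread about to execute segment `candidate` may be issued only if it
// conflicts with none of the segments currently being executed.
bool safeToIssue(const ConflictTable& t, std::size_t candidate,
                 const std::vector<std::size_t>& running) {
    for (std::size_t s : running)
        if (t.conflicts(candidate, s)) return false;
    return true;
}
```

Because all expensive analysis happens at compile time, each run-time decision costs only one table access per running thread.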

As shown above in Fig. 7.1, the RISC compiler and simulator work closely together. The compiler performs conservative conflict analysis and passes the analysis results to the simulator, which can then make safe scheduling decisions quickly.

To pass information from the compiler to the simulator, we use automatic source code instrumentation. That is, the intermediate model generated by the compiler contains instrumented (automatically generated) code which the simulator can then safely rely on.

At the same time, the RISC compiler also instruments the SystemC wait statements with their corresponding segment IDs and furnishes user-defined channels with automatic protection against race conditions among communicating threads.
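How such channel protection might look is sketched below in C++, assuming a simple scheme in which every method of a user-defined channel acquires a per-instance mutex. The `CounterChannel` class and the `runWriters` driver are hypothetical examples, and the code RISC actually generates may differ; host threads stand in for simulation threads here.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical user-defined channel after instrumentation: each channel
// method locks a per-instance mutex, so threads communicating through the
// channel cannot interleave inside its methods.
class CounterChannel {
public:
    void increment() {
        std::lock_guard<std::mutex> guard(lock_);  // inserted protection
        ++count_;
    }
    int read() {
        std::lock_guard<std::mutex> guard(lock_);  // inserted protection
        return count_;
    }

private:
    std::mutex lock_;  // inserted member
    int count_ = 0;
};

// Several host threads use one channel concurrently; without the locks,
// the unsynchronized ++count_ would be a data race.
int runWriters(int numThreads, int incrementsPerThread) {
    assert(numThreads > 0 && incrementsPerThread >= 0);
    CounterChannel ch;
    std::vector<std::thread> writers;
    for (int t = 0; t < numThreads; ++t)
        writers.emplace_back([&ch, incrementsPerThread] {
            for (int i = 0; i < incrementsPerThread; ++i) ch.increment();
        });
    for (std::thread& w : writers) w.join();
    return ch.read();
}
```

Placing the lock inside the channel keeps the protection transparent to the modules that call it, which matches the goal of requiring no manual recoding.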

7.2.2 RISC Simulator

The RISC simulator supports out-of-order parallel discrete event simulation (OoO PDES) [6] for fast SystemC simulation. In OoO PDES, we break the strict order of time (the synchronous barrier) by localizing time stamps to each thread. Since each thread has its own time stamp, the OoO PDES scheduler relaxes the event and simulation time updates, allowing more threads (at different simulation cycles) to run in parallel and ahead of time. This results in a higher degree of parallelism and thus higher simulation speed. However, threads that run in parallel or ahead of time must not conflict with each other. We use advanced static compile-time analysis to identify all such potential conflicts. Based on this information (a simple table look-up is sufficient), the OoO PDES scheduler can then quickly decide at run time whether or not a set of threads has any conflicts with each other.
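The following C++ sketch illustrates a single, much simplified OoO scheduling step; it is not the actual RISC scheduler. Each hypothetical `ThreadState` carries its own local time stamp and the ID of the segment it executes next, and the `conflict` callback stands in for the compiler-generated tables and is assumed to already account for data, event, and timing hazards. Ready threads are visited in time-stamp order, and a thread is issued only if it conflicts with no running thread and no earlier ready thread, even when its local time is ahead of the running ones.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical thread record with a local time stamp instead of a global clock.
struct ThreadState {
    std::string name;
    std::uint64_t time;   // local simulated time
    std::uint64_t delta;  // local delta cycle
    int segment;          // segment the thread executes next
};

// One simplified scheduling step; returns the names of the issued threads.
std::vector<std::string> issueStep(std::vector<ThreadState> ready,
                                   const std::vector<ThreadState>& running,
                                   const std::function<bool(int, int)>& conflict,
                                   std::size_t numCores) {
    assert(running.size() <= numCores);
    // Visit ready threads in local time-stamp order (time, then delta).
    std::sort(ready.begin(), ready.end(),
              [](const ThreadState& a, const ThreadState& b) {
                  return a.time != b.time ? a.time < b.time : a.delta < b.delta;
              });
    // Segments a candidate must not conflict with: all running threads
    // plus every ready thread visited before it.
    std::vector<int> blockers;
    for (const ThreadState& r : running) blockers.push_back(r.segment);

    std::vector<std::string> issued;
    for (const ThreadState& t : ready) {
        if (running.size() + issued.size() >= numCores) break;  // no free core
        bool safe = std::none_of(blockers.begin(), blockers.end(),
                                 [&](int s) { return conflict(t.segment, s); });
        if (safe) issued.push_back(t.name);  // may run ahead of older threads
        blockers.push_back(t.segment);       // later threads must respect t, too
    }
    return issued;
}
```

For example, with thread A running at time 10 in segment 0, ready threads C (time 20, segment 1), D (time 25, segment 2), and B (time 30, segment 3), and a conflict only between segments 1 and 2, this step issues C and B while holding back D. Thread B thus starts ahead of time, which a synchronous scheduler with one global time would not allow.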

7.2.3 RISC Analysis and Transformation Tools

As an example of other SystemC analysis tools built on top of RISC, the visual tool [17] enables the user to visualize the SystemC module hierarchy. Its graphical user interface is implemented with the Gtk API, and it draws the module hierarchy of a specified SystemC source file using the Cairo API. The tool obtains module data from the SystemC IR in the RISC software stack, which contains information about nested modules; it can thus recursively iterate through the nested lists of child modules in order to obtain enough information to visualize the hierarchy of the entire SystemC source file. Because an input SystemC source file may contain thousands of lines of code, manually drawing a representation of the modules, ports, and channels it describes can be a difficult and time-consuming task. The visual tool addresses this issue by automatically generating a visual representation of a SystemC model in a very short time. Figure 7.3 shows the module visualization of a Canny edge detector application.
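The recursive traversal described above can be sketched in C++ as follows. The `Module` struct is a hypothetical stand-in for module nodes in the SystemC IR (the real RISC IR classes differ), and the function emits an indented text outline instead of calling Gtk or Cairo.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a module node of the SystemC IR:
// a module name plus its nested child modules.
struct Module {
    std::string name;
    std::vector<Module> children;
};

// Depth-first walk that appends one line per module, indented by two
// spaces per nesting level; a drawing routine could visit the modules in
// the same order to place the corresponding boxes.
void outlineHierarchy(const Module& m, int depth, std::string& out) {
    assert(depth >= 0);
    out += std::string(static_cast<std::size_t>(2 * depth), ' ') + m.name + '\n';
    for (const Module& child : m.children)
        outlineHierarchy(child, depth + 1, out);
}
```

Recursion mirrors the nesting of the model directly, so hierarchies of arbitrary depth need no special handling.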

Fig. 7.3 Module hierarchy visualization of a SystemC model of a Canny edge detector [17]

7.3 Experiments

We now evaluate the performance of the RISC simulator. The following experiments show the speedup on an Intel Xeon Phi™ Coprocessor 5110P many-core architecture. The coprocessor contains 60 cores, each with a 512-bit vectorization unit. To obtain unambiguous measurements, we turn CPU frequency scaling off for all experiments.

7.3.1 Mandelbrot Renderer

The Mandelbrot renderer is a parallel video application that computes the Mandelbrot set. Basically, the device under test (DUT) hosts a number of renderer units, each of which computes a different slice of the Mandelbrot image. The user defines the number of slices at compile time.

Figure 7.4 shows the simulation results [20]. Due to the minimal communication needs of this application, the highest speedups are reached. The 512-bit vectorization unit can execute up to eight double-precision floating-point operations in parallel, and a data-level speedup M of 6.9x is achieved. The thread-level speedup increases strongly up to the 60 cores, reaching a speedup N of 50x. Beyond that point, the speedup no longer scales as well, since only 60 physical cores are available and additional threads must run as hyper-threads. Notably, the combination of thread- and data-level parallelization (N × M) yields a speedup of up to 212x.
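The reported figures can be related to the hardware limits with a short calculation that uses only the numbers given above: the vector width sets the ideal data-level speedup, and the core count sets the ideal thread-level speedup on physical cores.

```latex
\begin{aligned}
\text{lanes} &= \frac{512\ \text{bit}}{64\ \text{bit}} = 8,
  & \frac{M}{\text{lanes}} &= \frac{6.9}{8} \approx 0.86,\\
\text{cores} &= 60,
  & \frac{N}{\text{cores}} &= \frac{50}{60} \approx 0.83,\\
M \cdot N &= 6.9 \times 50 = 345 .
\end{aligned}
```

Both levels thus reach roughly 83 to 86 percent of their ideal speedup on their own, while the measured combined speedup of 212x stays below the product of 345x, suggesting that the two levels of parallelism do not compose perfectly in this experiment.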

Fig. 7.4 Speedup of the Mandelbrot Renderer [20]

7.4 RISC Open Source Project

We make the Recoding Infrastructure for SystemC (RISC) described in this article freely available online as a software artifact [9]. Generally, an artifact is a software program together with an applicable data set and test suite that accompanies a research publication for the purpose of independent evaluation. The point here is that the proposed algorithms and data structures are made available as a proof-of-concept implementation that others can use and evaluate. Experimental results can be replicated and validated, the proposed approach can be compared against related work and, given the source code, even be extended. Without such artifacts, repeatability poses great challenges [15].

Specifically, the presented RISC compiler and simulator are available as open source on the web [2] and can be used without restrictions (BSD license terms). RISC can be downloaded in both source code and binary format.

7.4.1 Open Source Code and Documentation

In its current version [4], the RISC open source package consists of approximately 162,000 lines of code and includes the C++ source code for the RISC compiler and simulator, Linux build scripts and installation instructions, as well as comprehensive documentation of the compiler and simulator APIs and tool manual pages. Example SystemC models, such as an abstract DVD player and the Mandelbrot renderer, and a regression test bench are included as well.

Given a suitable Linux platform, the RISC source code package can be easily installed and then tested. After downloading the package and adjusting the installation Makefile, a simple make all command builds and installs the RISC framework and runs several demo examples. The user can then fully evaluate the software with other SystemC examples and even extend our proof-of-concept implementation with new features.

7.4.2 Binary Image for “Plug-and-Play” Evaluation

For a quick test run without compilation and installation, we also provide a Docker container [3] for using RISC in “plug-and-play” fashion. The Docker image contains RISC (and all needed libraries) in binary format and allows the user to test it with just a few Linux commands, as shown in Fig. 7.5.

Fig. 7.5 Linux commands to use RISC in a Docker container

7.5 Conclusion

The Recoding Infrastructure for SystemC (RISC) provides an automatic compiler-based framework to analyze and simulate IEEE SystemC models in parallel. In particular, we have introduced the RISC compiler and simulator. Using automatic conflict analysis based on the segment graph (SG) abstraction, OoO PDES can execute threads safely in parallel and out of order (ahead of time), and thus achieves high simulation speed while maintaining the classic SystemC modeling semantics. In order to foster collaboration in the EDA community, we provide the RISC framework as a free open source artifact for full evaluation and possible extension.

In the future, we intend to expand our open source efforts and hope to involve other members of the EDA community to use, evaluate, and extend the RISC framework.