Pushing the Limits of Parallel Discrete Event Simulation for SystemC

The IEEE SystemC language is widely used in industry and academia to model and simulate system-level designs. Despite the availability of multi- and many-core host processors, however, the Accellera reference simulator is still based on sequential discrete event simulation, utilizing only a single core at any time. While many advanced parallel simulation approaches have been proposed, most require modification of the SystemC source code so that the model is free from parallel access conflicts and rely on the designer to manually perform this difficult transformation.

tions, and then instruments the model to prevent any conflicts. This transformation does not require any manual recoding or application-specific knowledge.
We share our RISC proof-of-concept implementation with the EDA community as an open source software project in order to facilitate evaluation, promote parallel SystemC simulation, and achieve fruitful collaboration [3,4].

RISC Framework
While the RISC software framework may be used for many other analysis and transformation tasks on SystemC models, parallel simulation is the main purpose. To perform semantics-compliant parallel simulation with out-of-order scheduling, we introduce a dedicated SystemC compiler that works hand in hand with a new simulator. This is in contrast to the traditional SystemC simulation flow where a SystemC-agnostic C++ compiler includes the SystemC headers and links the design model directly against the Accellera reference library.
As shown in Fig. 7.1, the RISC compiler acts as a frontend that processes the input model and generates an intermediate model with special instrumentation for conflict-free parallel execution. The instrumented model is then linked against the extended RISC SystemC library by the target compiler (a regular C++ compiler, such as GNU gcc or Intel icpc) in order to produce the output executable model. Out-of-order parallel simulation is then performed simply by running the generated executable model.
From the user perspective, we simply replace the regular C++ compiler with the SystemC-aware RISC compiler (which in turn calls the underlying C++ compiler). Otherwise, the overall SystemC validation flow remains the same as the traditional tool flow. Simulation is just faster due to the parallel execution. Note also that this process is fully automated. No user interaction or manual code transformation is necessary.

RISC Compiler
In order to produce a safe parallel model, the RISC compiler performs three major tasks, namely segment graph construction, conflict analysis, and finally source code instrumentation.

Segment Graph Construction
A segment graph (SG) [6] is a directed graph that represents the source code segments executed during the simulation between scheduling steps. More specifically, every segment is associated with a corresponding scheduler entry point, namely a wait statement in SystemC. All other statements in the SystemC source code become part of those segment nodes where they are executed when the wait statement resumes its execution. The segment graph construction is a fully automatic but complex process which we will not describe here (see [6] for detailed coverage). However, the RISC compiler must parse the SystemC input model first into an Abstract Syntax Tree (AST). Since SystemC is a syntactically regular C++ code, RISC relies here on the ROSE compiler infrastructure [18]. The ROSE internal representation (IR) provides RISC with a powerful C/C++ compiler foundation that supports AST generation, traversal, analysis, and transformation.
As illustrated with the RISC software stack shown in Fig. 7.2, the RISC compiler then builds a SystemC IR on top of the ROSE IR which accurately reflects the SystemC structures, including the module and channel hierarchy, port connectivity, and other SystemC-specific constructs. On top of the SystemC IR, the compiler architecture then builds the Segment Graph generator and data structures, as well as all other RISC analysis and transformation functions.

Conflict Analysis
The segment graph data structure serves as the foundation for segment conflict analysis. At run time, the scheduler in the simulator must ensure that every parallel thread to be issued has no conflicts with any other threads currently in the READY and RU N queues. For this we use the RISC compiler to detect any possible conflicts between these threads already at compile time.
Potential conflicts in SystemC include data hazards, event hazards, and timing hazards, all of which may exist among the segments executed by the threads considered for parallel execution. Again, we refer to [6] for a detailed discussion of these hazards and their static or dynamic detection in RISC. However, we note that if the hazards would be ignored, this would lead to race conditions at run time and jeopardize the correctness of the SystemC simulation.

Source Code Instrumentation
As a result of the conflict analysis, the RISC compiler generates a set of conflict and timing tables that describe all possible hazards between any two threads. Using this conservative conflict information, the simulator can then at run time quickly determine by a simple table look-up whether or not it is safe to issue a given thread in parallel or ahead of time.
As shown above in Fig. 7.1, the RISC compiler and simulator work closely together. The compiler performs conservative conflict analysis and passes the analysis results to the simulator which then can make safe scheduling decisions quickly.
To pass information from the compiler to the simulator, we use automatic source code instrumentation. That is, the intermediate model generated by the compiler contains instrumented (automatically generated) code which the simulator can then safely rely on.
At the same time, the RISC compiler also instruments the SystemC wait statements with corresponding segment ID and furnishes user-defined channels with automatic protection against race conditions among communicating threads.

RISC Simulator
The RISC simulator supports out-of-order discrete event simulation (OoO PDES) [6] for fast SystemC simulation. In OoO PDES, we break the strict order of time (the synchronous barrier) by localizing time stamps to each thread. Since each thread has its own time stamp, the OoO PDES scheduler relaxes the event and simulation time updates, allowing more threads (at different simulation cycles) to run in parallel and ahead of time. This results in a higher degree of parallelism and thus higher simulation speed. We are using advanced static compile-time analysis to identify all such potential conflicts. Based on this information (a simple table look-up is sufficient), the OoO PDES scheduler can then at run time quickly decide whether or not a set of threads has any conflicts with each other.

RISC Analysis and Transformation Tools
As an example of other SystemC analysis tools built on top of RISC, visual [17] enables the user to visualize the SystemC module hierarchy. It supports a graphical user interface implemented with the Gtk API and renders a specified SystemC source file's module hierarchy, which is drawn using the Cairo API. The tool obtains module data from the SystemC IR in the RISC software stack which contains information about nested modules and thus can recursively iterate through nested lists of child modules in order to obtain enough information to visualize the hierarchy of the entire SystemC source file. The input SystemC source file may contain thousands of lines of code which can make manually drawing a representation of the modules, ports, and channels described by the code a difficult and time-consuming task. Thus the visual tool was created to address this issue. It can automatically generate a visual representation of a SystemC model in a very short period of time. Figure 7.3 shows the module visualization of a Canny edge detector application.

Experiments
We will now evaluate the performance of the RISC simulator. The following experiments show the speedup on an Intel Xeon Phi TM Coprocessor 5110P manycore architecture. The coprocessor contains 60 cores where each core has a vectorization unit of 512 bit. To obtain unambiguous measurements, we turn CPU frequency scaling off for all experiments.

Mandelbrot Renderer
The Mandelbrot renderer is a parallel video application to compute the Mandelbrot set. Basically, the device under test (DUT) hosts a number of renderer units. Each  [20] unit computes a different slice of the Mandelbrot image. At compile time, the user defines how many slices are available. Figure 7.4 shows the simulation results [20]. Due to the minimal communication needs in this application, highest speedups are reached. The vectorization unit with 512 bit can execute up to eight double-precision floating-point operations in parallel. A speedup M of 6.9x is achieved. The thread-level parallelization increases strongly on the 60 cores with a speedup N of 50x. Afterwards, the speed slows down due to the 60 physical cores and use of hyper-threads. Notably, the combination of the thread and data level parallelization N × M generates a speedup of up to 212x.

RISC Open Source Project
We make the Recoding Infrastructure for SystemC (RISC) described in this article freely available online as a software artifact [9]. Generally, an artifact is a software program together with an applicable data set and test suite that accompanies a research publication for the purpose of independent evaluation. 1 The point here is that the proposed algorithms and data structures are made available as proof-ofconcept implementation and can be used and evaluated by others. Experimental results may be replicated and validated. The proposed approach can also be compared against related work and in the presence of source code even be extended. Otherwise, great challenges are posed in repeatability [15].
Specifically, the presented RISC compiler and simulator are available as open source on the web [2] and can be used without restrictions (BSD license terms). RISC can be downloaded in both source code and binary format.

Open Source Code and Documentation
In its current version [4], the RISC open source package consists of approximately 162,000 lines of code and includes the C++ source code for the RISC compiler and simulator, Linux build scripts and installation instructions, as well as comprehensive documentation of the compiler and simulator APIs and tool manual pages. Example SystemC models, such as an abstract DVD player and the Mandelbrot renderer, and a regression test bench are included as well.
Given a suitable Linux platform, 2 the RISC source code package can be easily installed and then tested. After downloading and adjusting the installation Makefile, a simple make all command builds and installs the RISC framework and runs several demo examples. The user can then fully evaluate the software with other SystemC examples and even extend our proof-of-concept implementation with new features.

Binary Image for "Plug-and-Play" Evaluation
For a quick test run without compilation and installation, we also provide a Docker container [3] for using RISC in "plug-and-play" fashion. The Docker image contains RISC (and all needed libraries) in binary format and allows the user to test it with just a few Linux commands, as shown in Fig. 7.5.

Conclusion
The Recoding Infrastructure for SystemC (RISC) provides an automatic compilerbased framework to analyze and simulate IEEE SystemC models in parallel. In particular, we have introduced the RISC compiler and simulator. Using automatic conflict analysis based on segment graph (SG) abstraction, OoO PDES can execute threads safely in parallel and out-of-order (ahead of time) and thus achieves fastest simulation speed but nevertheless maintains the classic SystemC modeling semantics. In order to foster collaboration in the EDA community, we provide the RISC framework as a free open source artifact for full evaluation and possible extension.
For the future, we intend to expand our open source efforts and hope to involve other members of the EDA community to use, evaluate, and extend the RISC framework.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.