International Journal of Parallel Programming

Volume 41, Issue 6, pp 806–824

Extensible Recognition of Algorithmic Patterns in DSP Programs for Automatic Parallelization

Authors

  • Amin Shafiee Sarvestani
    • Department of Computer and Information Science, Linköping University
  • Erik Hansson
    • Department of Computer and Information Science, Linköping University
  • Christoph Kessler
    • Department of Computer and Information Science, Linköping University

DOI: 10.1007/s10766-012-0229-2

Cite this article as:
Shafiee Sarvestani, A., Hansson, E. & Kessler, C. Int J Parallel Prog (2013) 41: 806. doi:10.1007/s10766-012-0229-2

Abstract

We introduce an extensible knowledge-based tool for idiom (pattern) recognition in DSP (digital signal processing) programs. Our tool utilizes functionality provided by the Cetus compiler infrastructure for detecting certain computation patterns that frequently occur in DSP code. We focus on recognizing patterns for for-loops and statements in their bodies, as these often are the performance-critical constructs in DSP applications for which replacement by highly optimized, target-specific parallel algorithms will be most profitable. For better structuring and efficiency of pattern recognition, we classify patterns by different levels of complexity such that patterns at higher levels are defined in terms of lower level patterns. The tool works statically on the intermediate representation. For better extensibility and abstraction, most of the structural part of recognition rules is specified in XML form to separate the tool implementation from the pattern specifications. Information about detected patterns will later be used for optimized code generation by local algorithm replacement, e.g. for the low-power high-throughput multicore DSP architecture ePUMA.

Keywords

Automatic parallelization · Algorithmic pattern recognition · Cetus · DSP · DSP code parallelization · Compiler frameworks

1 Introduction

Modern chip multiprocessor architectures designed for special-purpose application areas such as digital signal processing (DSP) are highly optimized, heterogeneous and increasingly parallel systems that must fulfill very high demands on power efficiency. Architectural features include advanced instructions, SIMD computing, explicitly managed on-chip memory units, complex addressing modes, on-chip networks and reconfiguration options, all of which are exposed to the programmer and compiler. One example is the ePUMA architecture [9] being developed at Linköping University, a low-power high-throughput multicore DSP architecture designed for emerging applications in mobile telecommunication and multimedia.

This high architectural complexity makes it very hard for programmers and especially for compilers to generate efficient target code if starting from (even well-written) sequential legacy C code. Domain-specific languages for tools such as SPIRAL [22] are one possible way of enabling efficient platform-specific code generation but require rewriting of the program code from scratch in a new language. In this work, we propose instead an approach for automatic porting of applications by statically analyzing their source code for known frequently occurring programming idioms captured as patterns, with the goal of replacing recognized code parts by an equivalent implementation that is highly optimized for the target architecture, such as library code or code generated by autotuning tools such as SPIRAL.

For our work as presented here, we started from an earlier approach for pattern recognition in scientific source codes, the Paramat approach [13]. Our tool initially uses the same principle and matching algorithm as Paramat and also re-uses many of the lower level patterns specified in that earlier work; our tool is, however, much more modular and extensible, and accepts unrestricted standard C programs as input. Our tool is implemented in an object-oriented language, namely Java, allowing a modern object-oriented design for the specification of patterns and the use of e.g. reflection for convenient expression of recognition rules. We also developed an XML format for specifying patterns and the structural parts of their matching rules; this pattern specification parameterizes the matching tool, which separates the specification of recognition rules from the implementation of the matching tool itself, as no pattern is hardcoded and the list of patterns can be extended independently from the tool. Some of the patterns adopted from Paramat are modified, and new patterns are defined to also cover (part of) the DSP domain. Moreover, our tool is built on top of Cetus [12], a modern industrial-strength compiler framework with better support for source-level program analysis and transformation.

The rest of this paper is organized as follows: Sect. 2 contains a brief overview of related work, Sect. 3 describes our pattern recognition approach and how the patterns themselves are organized into a pattern hierarchy modeled as a graph. Section 4 describes how our tool is implemented on top of the Cetus compiler framework. Section 5 contains an evaluation of our tool. Finally, we conclude and give some ideas of future work in Sect. 6.

2 Related Work

Pattern recognition in the sense of programming construct detection and source code manipulation has been an ongoing and extensive research area. While the motivation can be quite specific to each project, the overall intention usually falls into the category of information gathering or execution performance enhancement [17]. The first usually tries to understand the program by collecting information through the detection of certain constructs and abstractions, while the second attempts to improve efficiency and performance by rewriting or replacing the recognized code segments [17]. Paramat [13], PAP [16], XARK [1] and MPIIMGEN [27] are various tools and frameworks developed in the last two decades which focus on the enhancement of existing parallelizing compilers.

Paramat [13] describes a hierarchical knowledge-based algorithmic pattern recognition framework which discovers instances of frequently occurring programming constructs in source code. The tool attempts to address the automatic parallelization problem by gathering as much information as possible about the source code through instances of defined patterns, so that these can be used as the source for code generation instead of the original source code. The Paramat approach is the foundation of the work described in this article.

The PAP recognizer [16] is a Prolog-based prototype tool for automatic parallelization with an emphasis on distributed-memory architectures. It introduces a plan-based technique to discover certain instances of programming concepts in code through a hierarchical process, exploiting the deductive inference-rule engine in Prolog. The tool, which was integrated in the Vienna Fortran Compilation System [5], was designed to address the shortcomings of the tools existing at the time. Both Paramat and the PAP recognizer take a hierarchical approach towards pattern recognition, but differ significantly in objectives and the applied methods. Di Martino and Kessler compare both frameworks in full detail in [7].

XARK [1] is another parallelization framework which applies a variation of concept recognition to source code with the goal of recognizing a collection of computational kernels occurring in applications. Instead of working directly on the intermediate representation (IR), the framework conducts a demand-driven analysis on top of the Gated Single Assignment (GSA) form of the IR tree in two levels. In the first level, the tool inspects the data dependencies among statements within each strongly connected component (SCC) [14] in the IR to identify the computational kernels corresponding to the execution of the statements in each SCC. The kernels are later reexamined to check whether they can be combined to form more complex kernels. This framework can target a wide collection of computational kernels, while other tools are usually bound to certain isolated kernels [1].

MPIIMGEN [27] introduces a code transformer framework that takes advantage of a pattern driven approach to parallelize sequential code in the scope of image and video processing. In terms of its general concept and approach, this tool is possibly the closest to our work; it actually uses technology inspired by Paramat. Their approach formulates a specific language for pattern definition where patterns can be extended hierarchically on top of each other. The defined patterns target image processing concepts which are categorized based on the similarity of the operations. For example, the max, min and median concepts are placed in a pattern group called Neighborhood. Each group is then represented by a specific generic pattern which refers to the whole group. The tool then applies a bottom-up traversal on the abstract syntax tree to find a sub-tree that matches one of the generic patterns and then compares the sub-tree in detail with all patterns in the category represented by the matched generic pattern.

Other works in this area include the Fortran idiom recognizer developed by Pottenger and Eigenmann [21] for the Polaris parallelizing compiler [3], a predecessor of Cetus, or the algorithm recognition approach taken by Metzger and Wen [17], which tries to prove the equivalence of two sub-programs through a set of semantics-preserving transformation techniques. For a more extensive survey of related work see e.g. the related work sections in [13] or [1].

3 Patterns and Pattern Hierarchy

A programming construct or abstraction is the manifestation of programmers’ goals and intentions through certain language and domain-specific terms and notions. As these notions are shared among developers, the resulting code constructs tend to resemble each other, especially when they target similar concepts within the same domain such as DSP or linear algebra.

A pattern is an abstraction of a computation that generally can be expressed in many different ways using a given programming language such as C, even if some restrictions (such as absence of pointers) apply. A pattern can be as simple as an assignment between two variables or as complex as certain specific DSP concepts such as a FIR filter. By specifying known recognition rules for patterns in terms of language elements such as for-loops, assignment statements and expression operators, a tool can, as far as enabled by the specified rules, identify occurrences of patterns in source code. Recognition is conservative, i.e. a pattern matches only if one of its recognition rules completely applies; if some constraint in a recognition rule cannot be checked statically, the entire rule fails. If the patterns and their recognition rules are carefully defined, the matching process is deterministic. This implies that any given snippet of source program code can match at most one pattern.

Many of the patterns that are candidates for replacement by optimized target-specific code are characterized by for-loops. As recognition works step-wise bottom-up and is conservative, all statements in a loop body must be matched by some pattern before the entire loop can be recognized.¹ Hence, we also need to define patterns for elementary program constructs such as constants or variable and array accesses, operators and intrinsic functions, expressions and assignments.

3.1 Pattern Categories

Patterns are categorized into different levels based on their complexity. Complexity, in the scope of pattern categories, refers to the number of nested loop levels in straightforward implementations of a pattern. At the lowest level we find simple data accesses such as constant, variable and array accesses,² which we call trivial patterns. The recognition of trivial patterns is important as they are the basis for the definition of patterns at higher levels. However, as they are quite simple, they are not included in the output of detected patterns after the recognition process.

Level0 patterns introduce simple programming operations such as assignments, simple mathematical functions, binary operations, etc. For example, a simple assignment of a constant to a scalar variable is easily recognized as an occurrence of a Level0 pattern called SINIT (scalar initialization). The assignment statement is later annotated with a pattern instance which could (e.g., for debugging purposes) be shown as an annotation of the program code.
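The original code figures are not reproduced in this version of the text. As a minimal sketch (the variable name and the printed annotation syntax are our assumptions, not necessarily the tool's exact output), such a statement and its annotated form could look like this:

    x = 5;

which after recognition is annotated as

    /* SINIT(x, 5) */
    x = 5;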
Recognition rules are formulated to match the bottom-up recognition process. Hence, rules for higher level patterns are defined in terms of instances of lower level patterns. Patterns in Level1 and Level2 often have rules that expect a lower level pattern inside a for-loop body. For example, if the SINIT pattern occurs inside a for-loop, we can find an instance of the VINIT pattern (vector initialization). In the annotated output, newly created symbols such as _i0 and _x0, constructed by the recognizer, denote a loop range object and a vector container, respectively, which internally hold the parameters about loop and access bounds etc. On the next level, the MINIT pattern (matrix initialization) is built on top of VINIT in a for-loop.
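Again, the original code figures are not reproduced. A plausible sketch of the two examples (loop bounds, variable names and the printed form of the annotations are our assumptions):

    /* VINIT over vector container _x0 with loop range _i0 = 0..n-1 */
    for (i = 0; i < n; i++)
        a[i] = 0;

    /* MINIT: a VINIT instance nested inside an outer for-loop */
    for (j = 0; j < m; j++)
        for (i = 0; i < n; i++)
            A[j][i] = 0;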
In general, each pattern refers to a valid concept within the C language, and the hierarchical process of pattern definition requires new patterns to be built on top of existing ones; however, more complex patterns require extra (new) low-level patterns to be defined before they can be matched. These extra “artificial” patterns may not necessarily form a concept of interest of their own and are only defined to facilitate the recognition of other patterns; hence they are called help patterns. Help patterns are treated the same way as any other pattern during the recognition process; however, they are not shown in the output. For example, consider the following statement: z=x+A1*A2*A3*A4;

In order to detect the whole statement as a pattern known as ADDMULTIMUL³ in our bottom-up approach, the first step is to recognize the A1*A2*A3*A4 expression. As the concept of multiplication with several factors is not in itself interesting in our domain, we do not want our code annotated with this pattern where the expression occurs independently in the code; this pattern exists only to support an intermediate step in bottom-up recognition.

3.2 Pattern Hierarchy Graph

For each pattern \(p\), there exists a list of next-level patterns or “super-patterns” which can be built from an occurrence of \(p\); for instance, VINIT is a super-pattern of SINIT, and MINIT is a super-pattern of VINIT. This next-level relationship forms a graph known as the pattern hierarchy graph (PHG), in which in principle every edge connects a pattern at level \(i\) to one of its super-patterns at level \(i+1\). The PHG acts as a guide during the pattern recognition process by providing, in each step, the list of candidate patterns which could be matched on the next level from the currently existing patterns.

3.3 Pattern Definition

The platform-independent nature of patterns, together with their hierarchical and extensible structure, requires any implementation choice for pattern definition to be simple, highly dynamic and language-independent. As XML is highly portable and extensible and can represent most data structures, it was selected as the syntax for pattern definition.

Each pattern recognition rule is defined as an XML node in the pattern specification file. The key features in each element are the name, the level and the structure of the recognition rule of each pattern. As a convention, the name of each pattern must be unique, as each pattern might be used in a recognition rule for other patterns in the hierarchy. At the lowest level (trivial patterns), the name represents the name of base node types in Cetus (such as Identifier, ArrayAccess and Literal) for comparison purposes. The structure tag encodes the structural⁴ part of one recognition rule for the pattern and describes the list of children (sub-patterns) in the syntactic hierarchy that the current pattern is built upon; this acts as a blueprint both for the dynamic generation of the PHG and for the pattern matching process. Figure 1 illustrates the XML definition (excerpt) of the scalar initialization pattern (SINIT). The definition shows that SINIT is a Level0 pattern with two children, where the first child is either Identifier or ArrayAccess and the second child is either FloatLiteral or IntegerLiteral.
Fig. 1

Excerpt of the definition of a recognition rule for the SINIT pattern in XML. An instance definition element is not necessary here because SINIT is so simple that the parameters are inferred from the tree structure in the recognition rule
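Since the figure is not reproduced here, the following is a hypothetical sketch of such a definition; the tag names are inferred from the description above and from Sect. 4.3, and the actual schema may differ:

    <pattern>
      <name>SINIT</name>
      <level>0</level>
      <!-- root-name refers to a Cetus IR node type; "|" marks
           alternative sub-patterns in this sketch -->
      <structure root-name="AssignmentExpression">
        <child>Identifier | ArrayAccess</child>
        <child>FloatLiteral | IntegerLiteral</child>
      </structure>
    </pattern>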

Already defined patterns can later be used to define patterns at higher levels in the hierarchy. Figure 2 shows an excerpt of the definition of VINIT (vector initialization) in XML, which is defined based on the SINIT pattern.
Fig. 2

Excerpt of the definition of one recognition rule for the VINIT pattern in XML. Also here, the instance creation can be implicit because of the simplicity of the recognition rule
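A corresponding hypothetical sketch for VINIT (same caveats as above) would wrap the SINIT child in a for-loop structure and flag the rule for additional semantic checking by an auxiliary function (cf. Sect. 4.1):

    <pattern>
      <name>VINIT</name>
      <level>1</level>
      <structure root-name="ForLoop">
        <child>SINIT</child>
      </structure>
      <!-- hypothetical flag: triggers the auxiliary matching function
           that is invoked by reflection (see Sect. 4) -->
      <semantic-check>true</semantic-check>
    </pattern>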

3.4 Pattern Variation

Certain computational concepts (patterns) can occur in the form of multiple syntactically different code snippets; see Fig. 3 for an example. These syntactically different forms which represent the same concept are known as the variations of that specific pattern. The definition of recognition rules for each pattern can be extended for any number of variations by simply adding new entries corresponding to new recognition rules for those variations in the pattern definition file.
Fig. 3

Different variations of the vector summation pattern. a Array based variation. b Pointer based variation
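The figure itself is not reproduced; the two variations could, for instance, look as follows (a plausible reconstruction rather than the paper's exact code):

    /* (a) array based variation */
    s = 0;
    for (i = 0; i < n; i++)
        s = s + a[i];

    /* (b) pointer based variation */
    s = 0;
    for (i = 0; i < n; i++)
        s = s + *p++;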

An alternative way to perform pattern matching and handle pattern variations would be to apply the matching process on the SSA (static single assignment) representation of the program (as e.g. done in XARK [1]) rather than working on the AST (abstract syntax tree). However, as we reconstruct data flow information from partially matched code on the fly, we would not gain any particular benefit from using the SSA form. In addition, as the emphasis in our tool is on usability and extensibility, the introduction of the SSA concept into structural recognition rules would require specific compiler knowledge, which would exclude a large group of potential users. Keeping the recognition rule definitions close to coding variants instead simplifies the pattern definition process for users. Moreover, tree matching is easier to implement than graph (SSA) matching.

3.5 Run-Time Generation of the PHG

The tool introduces an automatic mechanism to generate the PHG at (tool) run-time based on the structure of patterns defined in an XML file. Before the recognition process starts, the XML file is parsed and each node is converted into a custom Java object of the type Pattern. The hierarchical relation (sub-pattern, super-pattern) among patterns is extracted from the structure tags of all recognition rules using the following formula:
$$\begin{aligned} \mathit{Superpattern}(A)=\{\,\mathrm{Pattern}\ p : A \in p.\mathit{Children}\,\} \end{aligned}$$
(1)
where \(p.Children\) denotes the possible sub-patterns of \(p\) in all recognition rules for \(p\).

The system then utilizes a hash-table to hold the hierarchical relationship among the newly generated pattern objects by keeping a reference to the list of super-pattern objects for each pattern object. This hash table, which acts as the PHG for the pattern recognition process, could be extended at (tool) run-time by adding further relations or pattern objects e.g. from further XML files.
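A minimal Java sketch of this construction, assuming a hypothetical Pattern class whose getChildren() method returns all sub-patterns occurring in its recognition rules (the actual class layout in the tool may differ):

    import java.util.*;

    // Build the PHG as a hash table mapping each pattern to its
    // super-patterns, i.e. Superpattern(A) = { p : A occurs among
    // the children of p }, following Eq. (1).
    static Map<Pattern, List<Pattern>> buildPHG(List<Pattern> allPatterns) {
        Map<Pattern, List<Pattern>> phg = new HashMap<>();
        for (Pattern p : allPatterns)
            for (Pattern child : p.getChildren())
                phg.computeIfAbsent(child, k -> new ArrayList<>()).add(p);
        return phg;
    }

During matching, a lookup phg.get(detectedPattern) then yields exactly the candidate super-patterns to try at the next level.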

4 Implementation

Cetus [12] is an open source Java based compiler infrastructure for source to source transformation of software programs. The motivation behind the development of Cetus was the lack of parallelizing compilers for modern languages such as C and C++, which introduce considerable analysis complexity at compile time due to the presence of pointers and user-specific types [12].

Cetus includes a C parser based on ANTLR [18] and exploits object-oriented properties of Java in order to provide a high-level and user friendly API (Application Programming Interface) for transformation pass generation and IR manipulation. It provides optimization and analysis features such as data dependence analysis, induction variable recognition, reduction variable recognition, and points-to and alias analysis.

4.1 System Architecture

Our recognition tool is built on top of Cetus v1.2.1, which was the latest version at the beginning of the development phase. The tool architecture attempts to distribute the recognition task among modular components which interact directly with various Cetus components. Independent implementation of the individual components makes them more adaptable to future changes without jeopardizing the whole tool; for example, the syntax of pattern definition (XML) could be exchanged internally without any modification to other modules.

Despite Cetus’ open source nature, it is treated as a black box by our tool in order to remain compatible with upcoming versions of Cetus; however, any major change in its methods and functionality would require certain modifications to our components.

Figure 4 shows the internal architecture of both Cetus and our tool and the way they interact. The pattern recognition process begins with the generation of the IR tree from the input program source file by the Cetus parser. The pattern recognition tool then loads the patterns and creates the pattern hierarchy graph from the XML specification file. The generated IR tree and PHG are passed to the pattern matcher as inputs, but before the matching process starts, certain normalization transformations such as loop distribution and expression simplification are applied, which may modify the structure of the IR tree. The pattern matcher module then starts the bottom-up recognition process on the modified IR tree with the help of the generated PHG and stores an instance of each detected pattern at the corresponding node. The annotation module later retraverses the IR tree and annotates each node with a comment statement built from the corresponding detected pattern. The process finishes by calling the Cetus print function for the IR tree, which writes the modified source code and the annotation statements to the output file.
Fig. 4

Tool Architecture

Extensibility Extensibility refers to the process of adding new patterns without the need to modify the general architecture. As patterns are defined as XML elements which are managed independently from the matching process, the extension of the tool by new patterns can be accomplished easily. Each new recognition rule is added by defining its structural part in XML format which is automatically loaded during the PHG generation; however, if certain additional checking is required we only need to implement a function and define the set of additional rules to be checked. One example of semantic rule checking using reflection is the one for the recognition rule of the VINIT pattern shown in Fig. 2, where we check among other things that the loop index occurs in a linear index expression on the left hand side of the SINIT pattern instance. During the pattern matching process, if any pattern is flagged for semantic rule checking, the matching process calls the specific defined auxiliary matching function for this pattern by reflection in order to check the explicitly written semantic rules.
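A minimal sketch of this reflective dispatch, under the assumption of a naming convention of the form check<PatternName> (the class, method and parameter names here are ours, not the tool's actual API):

    import java.lang.reflect.Method;

    // Look up and invoke the auxiliary semantic-check function for a
    // pattern that is flagged for additional checking.
    Method m = SemanticRules.class.getMethod(
            "check" + pattern.getName(), PatternInstance.class);
    boolean ok = (Boolean) m.invoke(null, instance);  // static method call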

4.2 Pattern Matching Process

The pattern matcher module is responsible for the general matching process, which applies the algorithm introduced in Paramat [13] on the IR tree. At this point, the pattern hierarchy graph has been generated and any normalizing transformations have already been applied to the IR tree. The tool then traverses the IR tree in post-order, first considering the leaves of the tree.

Leaf nodes might match trivial patterns (of type Identifier, ArrayAccess or Literal in Cetus), which are the basis for the definition and recognition of other patterns. The recognition process then continues in other leaves until the root of the current sub-tree is visited. At the root level, the tool attempts to match the whole sub-tree with an instance of a pattern. It first fetches the list of patterns already detected in the children nodes and then inspects the PHG to get the list of candidate patterns that can be built atop the previously detected patterns. In general, the previously detected children patterns can belong to various levels in the pattern hierarchy; therefore, the tool selects the highest level pattern among the children patterns and passes it to the PHG in order to make sure that the recognition process continues in the right direction. The tool then compares the recognition rules defined in each candidate pattern with the current sub-tree and children patterns. If a pattern matches, a new data structure is created which holds a reference to the current sub-tree and another to the instance (summary data structure) of the detected pattern. If the matching fails for all candidates, this sub-tree is no longer considered and the matching process continues in other branches. This process continues until the whole tree has been traversed.
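One step of this process at a sub-tree root can be sketched as follows (a Java sketch; all helper names are our own, not the tool's actual API):

    // Attempt to match the sub-tree rooted at 'node' against candidates.
    List<Pattern> kids = detectedChildPatterns(node);  // from earlier steps
    Pattern highest = highestLevel(kids);              // guides the PHG lookup
    for (Pattern candidate : phg.get(highest)) {
        if (structureMatches(candidate, node, kids)) {
            annotate(node, candidate.createInstance(node, kids));
            break;  // matching is deterministic: at most one pattern fits
        }
    }
    // If no candidate matched, this sub-tree is abandoned and
    // matching continues in the remaining branches.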

Horizontal Pattern Matching Beyond vertical (normal) patterns, there are also patterns that cover (a merge of) several sibling nodes and thus require horizontal pattern matching [13]. For example, the three SCOPY pattern instances in the following code can be merged into an instance of a SWAP pattern on variables x and y using z.⁵
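The code figure is not preserved here; judging from the text, it was essentially a three-statement swap:

    z = x;    /* SCOPY */
    x = y;    /* SCOPY */
    y = z;    /* SCOPY */
    /* merged horizontally into SWAP(x, y), using z as temporary */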

Before the pattern recognition process at the root of any sub-tree starts, the list of the children patterns is independently examined for horizontal patterns and the possibility of being merged. The merging process updates the references for children patterns by removing the old pattern references and pointing to the newly detected horizontal pattern. Details of guiding horizontal matching by data flow edges can be found in [13].

4.3 Structure Comparison Between Patterns and Sub-Trees

The process of comparing a whole sub-tree with the structural constraints (extracted from XML tags) of a candidate pattern is done in multiple steps. It starts by comparing the type of the root node with the root-name tag in the XML definition of a recognition rule of the current candidate pattern. The root-name tag refers to one of the abstract syntax tree node types defined in Cetus and is set based on the concept represented by the pattern; for example, the root-name for the SINIT pattern is set to AssignmentExpression, as SINIT is essentially an assignment.

This specific convention simplifies the comparison process by introducing a one to one mapping between the XML definition of patterns and the nodes in the IR tree. A simple mismatch between the root-name and the type of root node of the sub-tree causes the comparison process to fail for this candidate pattern.

After passing the first step successfully, the tool compares the set of detected children patterns against the set of children found in the XML definition of the current candidate pattern. Each pattern defines its recognition rules by specifying the children that occur in the pattern. For example, Fig. 1 shows that the SINIT pattern expects two children (sub-patterns) in the children pattern list in order to be matched with the current sub-tree; if a mismatch is found, the comparison process fails.

The last step involves checking every detected pattern in the children list against the corresponding child in the structure of the pattern. The tool starts from the first detected child pattern and compares it against all possible variants of the first child as specified in the XML tag; if it matches any of the variants, the check passes for the current child pattern. Only if all children patterns pass this check does the comparison mechanism succeed.
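A condensed Java sketch of the three comparison steps (method names are our own, not the Cetus or tool API):

    // Structural comparison of a candidate rule against one sub-tree.
    static boolean structureMatches(Pattern cand, Traversable root,
                                    List<Pattern> kids) {
        // Step 1: the root node type must equal the rule's root-name tag.
        if (!cand.rootName().equals(root.getClass().getSimpleName()))
            return false;
        // Step 2: the number of detected child patterns must match.
        if (kids.size() != cand.childSlots().size())
            return false;
        // Step 3: each child pattern must be one of the variants
        // allowed for its slot in the XML definition.
        for (int i = 0; i < kids.size(); i++)
            if (!cand.childSlots().get(i).contains(kids.get(i).getName()))
                return false;
        return true;
    }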

4.4 Pattern Instance Creation

Our system supports implicit instance declarations that are automatically inferred from the (single) recognition rule's structure in simple cases, as a default mechanism to save some XML writing. In general, instance declarations can always be made explicitly in XML, although this is not done in the example of Fig. 2 for brevity. For each such pattern instance, the name, the type and a set of references are kept in memory which refer to AST nodes that map to elements in the structure tag for that pattern. For example, for a subtree that maps to an instance of a SINIT pattern, there exist two references to the nodes in this subtree that refer to the elements on the left and on the right side of the assignment. These references can later be exploited in the implicit process of pattern instance creation. By default, the tool uses a general convention to generate the pattern instance and its parameters implicitly from the pattern instance information kept in memory. For most Level1 and Level2 patterns, the information regarding the loop variable and its bounds is also preserved, which is used during the printing process in later steps. Any information needed regarding the parameters belonging to a sub-pattern can easily be retrieved by following the specified references.

There are also cases such as horizontal and certain complex patterns in which the recognition rules and their parameters cannot be fully expressed in the XML file due to their complex nature. In these cases instance creation is accomplished with special XML tags and the help of explicit user provided auxiliary matching functions which conform to a certain naming convention. This naming convention is used during the recognition process to find and invoke the user provided function for a specific pattern by reflection.

5 Evaluation

In this section we focus on the analysis and evaluation derived from testing the tool on various C files, with emphasis on the precision and efficiency of the recognition process. The term precision (in the scope of pattern recognition) refers to the degree to which instances of specified patterns and their variations are detected while invalid cases are always rejected. For this task, the tool was evaluated on a set of manually written test cases which included examples of the defined patterns. In total we have currently defined 140 different patterns: 85 at Level0, 36 at Level1, 14 at Level2, and the rest are trivial patterns. In total, 153 recognition rules have been specified.

Table 1 shows a summary of all implemented patterns; for details on each specific pattern we refer to [24].
Table 1
Summary of currently implemented patterns on each level

Level | Example of patterns | # patterns
0 | Scalar arithmetics, etc. | 85
1 | Elementwise vector operations (VADD, VMUL, VAADD, ...), scalar plus vector (VINC), scalar times vector (SV), vector swap (VSWAP), DOTPRODUCT, 1D-reduction (VSUM), vector maximization, vector minimization, vector initialization (VINIT, VASSIGN), minimum/maximum value or location (VMAXLOC, VMINLOC, VMAXVL, VMINVL, ...) | 36
2 | Elementwise matrix operations (MADD, MMUL, ...), scalar plus matrix (MINC), scalar times matrix (SM), matrix copy (MCOPY), matrix initialization (MINIT), OUTERPRODUCT, TRANSPOSE, FIR filter | 14

5.1 Relevant Code Blocks

Our first analysis of the results showed that there exists a group of statements in C which occur frequently in various source files but, due to the computational nature of our specified patterns, are not covered by pattern instances. These statements, which include I/O operations, type casting expressions, function calls and return statements, can be ignored for the pattern recognition process as they make no contribution to pattern matching at higher levels. We call such statements irrelevant statements.

The occurrence of irrelevant statements inside loops and code blocks can also make that particular block irrelevant. We therefore extend the definition of irrelevance to cover blocks and loops: a loop or a block of code is considered relevant if it contains at least one relevant statement.

Statements The cetus.hir.Statement type, which maps to the statement type in the C grammar, has been chosen as the basis for the analysis in this evaluation section. The tool counts both relevant and irrelevant statements as they are visited during the post-order recognition process.

Relevant Code Ratio The relevant code ratio is defined as the ratio of relevant statements to the total number of statements in a C source file, which gives an estimate of the domain in which potential patterns can be found. Benchmarks and test cases with a low average relevant code ratio are not good candidates for evaluation, as the number of possibly parallelizable statements is probably insignificant in comparison to the total number of statements.

Relevant Statement Coverage Relevant statement coverage is defined as the percentage of relevant statements that were fully matched by patterns during the recognition process, which gives an overview of how our tool operates on real data.

As the matching process works in post-order, it attempts to match every low level node or sub-tree with a pattern before moving up towards the root. A statement is fully matched by a pattern if the highest detected pattern refers to the root node of the statement's sub-tree. By convention, a statement whose highest detected pattern refers to a non-root node is considered a partially matched statement. A relevant statement coverage of 100 % means that all statements in the relevant code domain have been fully matched. More details about fully and partially matched statements can be found in [24].

Other metrics calculated for each source file are the matched for-loop percentage (MFP), the transformation time and the recognition time. The matched for-loop percentage gives the percentage of for-loops matched to Level1 and Level2 pattern instances, while the other two metrics show the time required by the tool for applying loop distribution (transformation time) and for executing the pattern recognition process (recognition time), respectively.

Finally, it should be noted that the emphasis in this project is on correctly recognizing instances of the defined patterns rather than on the relevant statement coverage ratio as such. A low or high coverage ratio only indicates how relevant the defined patterns are for the DSP domain.

5.2 Results and Discussion

A simple test suite has been composed which takes the path of the source files and runs the pattern recognition tool individually on each file. The Cetus parser can handle any C program that follows the ANSI C89/ISO C90 standard and requires all header files included in the code to be present during the parsing process; otherwise the parsing module of Cetus fails, which in turn terminates the whole tool even before our pattern recognition module is invoked.

The test case packages for this project were selected among certain open-source DSP applications which have been developed over the last two decades with different C compilers and different target platforms such as Windows or Linux. As a result, passing the (Cetus) parsing step successfully depends on finding the necessary header files and installing the required libraries before starting the pattern recognition process. In addition, simple source code modifications such as removing header-file statements have been applied in order to remedy parsing errors in the absence of specific header files. However, despite all the effort put into resolving the parsing errors, there were still certain files in different packages which did not pass the parsing step and thus have been excluded from the analysis results.

Table 2 presents the different test case packages with the number of files in each package.
Table 2
Test case packages

Name | Description | # files | # C statements
LastWave | Wavelet oriented signal processing software [2] | 161 | 56,357
BruteFIR | Program for applying long FIR filters to multi-channel digital audio [25] | 14 | 9,634
DRC | Program for generating correction filters for acoustic compensation of HiFi and audio systems [23] | 8 | 6,564
Fiview | Software for designing and viewing digital filters [19] | 6 | 2,464
Vocoder | Free channel vocoder software [4] | 10 | 936
Libsamplerate | Sample rate converter for audio [6] | 19 | 1,639
AlmusVCU | Converts a multi-channel sound card into a real-time versatile convolver unit [26] | 23 | 17,300
CIPS | Source code from the book Image Processing in C: Analyzing and Enhancing Digital Images [20] | 120 | 11,206

Due to the large number of files and packages, we describe two specific projects in detail; for the rest of the results we refer to [24].

5.2.1 The CIPS Package

Tables 3 and 4 present the analysis results for a few files in this package. The statement count column shows the total number of statements (both relevant and irrelevant) traversed during the recognition process. The matched patterns column indicates the number of non-trivial patterns recognized among the statements. Multiple pattern instances at different levels can be detected for a single statement as the recognition process moves up the AST for that statement. Since the matched patterns metric expresses the total number of non-trivial patterns, there might be cases where the total number of matched patterns exceeds the number of statements.

As discussed earlier, the relevant code ratio metric can provide some hints regarding the type of operations present in a source file. For example, in Table 3 the file HISTEQ.C has only an 11.5 % relevant code ratio, which indicates the presence of a relatively large number of function calls, return statements and casting expressions.

Table 4 presents the analysis information regarding the percentage of matched for-loops, the transformation time and the recognition time.
Table 3
Analysis result for the CIPS package

File | Statement count | Matched patterns | Relevant code ratio (%) | Relevant statement coverage (%)
flip.c | 58 | 62 | 67.2 | 76.9
CIPS5.C | 542 | 474 | 54.4 | 64.8
TXTRSUBS.C | 322 | 386 | 73.9 | 68.5
CIPS3.C | 82 | 79 | 68.3 | 67.9
xemboss.c | 77 | 46 | 52.0 | 67.5
HISTEQ.C | 26 | 4 | 11.5 | 66.7
EDGE3.C | 73 | 80 | 86.3 | 65.1

Table 4
For-loop analysis results for the CIPS package

File | Matched for-loops | Transformation time (sec) | Recognition time (sec)
flip.c | 18 out of 18 (100.0 %) | 2.81 | 0.11
CIPS5.C | 21 out of 62 (33.9 %) | 8.06 | 0.12
TXTRSUBS.C | 12 out of 40 (30.0 %) | 6.20 | 0.06
CIPS3.C | 18 out of 22 (81.8 %) | 0.25 | 0.02
xemboss.c | 3 out of 8 (37.5 %) | 0.74 | 0.02
HISTEQ.C | 1 out of 1 (100.0 %) | 0.01 | 0.01
EDGE3.C | 4 out of 12 (33.3 %) | 1.70 | 0.02

5.2.2 The LastWave Package

Tables 5 and 6 show the result of applying the recognition tool to a few files in the LastWave package. As can be seen, the analysis results can differ considerably from one package to another. The low matched for-loop percentage for some of the files is due to the presence of pointers and specific function calls.
Table 5
Analysis results for the LastWave package

File | Statement count | Matched patterns | Relevant code ratio (%) | Relevant statement coverage (%)
cv_misc.c | 459 | 350 | 45.9 | 55.5
nr_utilities.c | 189 | 175 | 61.4 | 55.2
stft_tabulate.c | 318 | 166 | 44.3 | 49.7
ext2_proj.c | 156 | 144 | 69.2 | 49.1
image_matrix.c | 479 | 344 | 60.8 | 48.5
owavelet2.c | 319 | 253 | 52.0 | 47.6
pf_lib.c | 1,155 | 421 | 53.0 | 46.9

Table 6
For-loop analysis results for the LastWave package

File | Matched for-loops | Transformation time (sec) | Recognition time (sec)
cv_misc.c | 15 out of 25 (60.0 %) | 1.25 | 0.41
nr_utilities.c | 0 out of 13 (0.0 %) | 6.50 | 0.30
stft_tabulate.c | 5 out of 21 (23.8 %) | 2.93 | 0.56
ext2_proj.c | 4 out of 14 (28.6 %) | 3.27 | 0.28
image_matrix.c | 14 out of 66 (21.2 %) | 12.17 | 0.42
owavelet2.c | 24 out of 38 (63.2 %) | 5.43 | 0.37
pf_lib.c | 7 out of 41 (17.1 %) | 2.41 | 0.57

5.2.3 Loop Distribution

The main reason for applying loop distribution as a normalization transformation is to separate independent statements in the body of for-loops and shrink the body, which enhances the chances of pattern recognition (as our defined for-loop patterns only include a few statements in the body). However, as loop distribution is applied before the recognition process, all possible loops are distributed, which does not necessarily result in the detection of new patterns and only increases the number of for-loops in the file; this has a negative effect on our MFP (matched for-loop percentage) metric, as sketched below.
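A minimal C illustration of this effect (our own example): the original loop body matches no single pattern as a whole, but after distribution each loop matches one of the defined patterns, while the for-loop count doubles.

    /* before distribution: the two statements are independent */
    for (i = 0; i < n; i++) {
        a[i] = 0;
        s = s + b[i];
    }

    /* after distribution */
    for (i = 0; i < n; i++)
        a[i] = 0;          /* VINIT */
    for (i = 0; i < n; i++)
        s = s + b[i];      /* VSUM (1D-reduction) */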

6 Conclusion and Future Work

This new pattern recognition tool exploits the power of a high-level compiler framework and implementation language, together with the extensibility and flexibility of XML tags, to create an adaptable architecture for pattern recognition in the C domain. The XML pattern specification language strikes a pragmatic balance between tool extensibility and the complexity of developing a full pattern language. Furthermore, the hierarchical structure of pattern definitions facilitates the specification and extension of new patterns by simply combining already defined patterns. We conducted a thorough analysis on a number of C based benchmark packages and presented the results in the form of statistics about matched relevant statements and loops. All the effort in this project has been directed towards the creation of an extensible architecture which can easily be maintained and extended without the need to understand the low-level details of the underlying compiler framework.

Future work can focus on the enhancement of the tool and on the implementation of code generation modules, to target e.g. the ePUMA architecture.

6.1 Optimized Code Generation from Pattern Instances

Our tool enables, in an extensible way and by introducing patterns, the automatic recognition of loop based algorithms written in the C programming language. This is the first stage in a platform specific optimizer. The second stage is to generate parallelized, optimized target specific code in a fully automatic way. There already exist tools for the second stage that generate high quality optimized parallel code; one example is SPIRAL [22]. Instead of starting from (legacy) source code, these tools expect a program in a domain specific language such as the operator language OL [8] or the signal processing language SPL [22], which puts the burden on the application programmer to manually port (legacy) code to these specific languages. These languages may in some cases not be suitable for an average programmer since they describe the calculations in a highly mathematical way which can be unintuitive for programmers who work with standard procedural languages such as C. It is possible to implement patterns for our tool that match the operators defined in OL. In fact, we have already implemented patterns that recognize, in C code, the four (basic) OL operators of [8]: \(\mathbf{r} = \mathrm{P}_{n}(\mathbf{x}, \mathbf{y})\), \(r = \mathrm{S}_{n}(\mathbf{x}, \mathbf{y})\) (dot product), \(\mathbf{r} = \mathrm{K}_{n \times m}(\mathbf{x}, \mathbf{y})\) (outer product) and \(\mathbf{r} = \mathrm{L}_{m}^{mn}(\mathbf{x})\) (transposition).
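The following C loops illustrate code shapes that such patterns match (a sketch with our own variable names; we read P_n as a pointwise product, which is our interpretation since the text does not gloss it, and we show the transposition on an explicit two-dimensional array rather than the linearized vector of the OL definition):

    for (i = 0; i < n; i++)            /* r = P_n(x, y) */
        r[i] = x[i] * y[i];

    s = 0;                             /* r = S_n(x, y): dot product */
    for (i = 0; i < n; i++)
        s = s + x[i] * y[i];

    for (i = 0; i < n; i++)            /* r = K_{n x m}(x, y): outer product */
        for (j = 0; j < m; j++)
            R[i][j] = x[i] * y[j];

    for (i = 0; i < m; i++)            /* r = L_m^{mn}(x): transposition */
        for (j = 0; j < n; j++)
            B[i][j] = A[j][i];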

Our pattern recognition tool is planned to be part of a compiler tool chain for the low-power high-throughput multicore DSP architecture ePUMA that is currently being developed at Linköping University for emerging applications in mobile telecommunication and multimedia [9]. Since the ePUMA architecture is extremely complex and there exists at the moment no optimizing compiler for it, programming has to be done manually at assembly level, which is tedious and error prone. The pattern information derived by the tool will help with optimized ePUMA code generation by automatically replacing the (non-trivial) pattern instances with expert-written computation kernels that are highly tuned for ePUMA. Although our pattern recognition tool is aimed at the ePUMA tool chain, it is completely target independent.

6.2 Pointer Analysis

Pointer or points-to analysis refers to the static process of resolving the possible values that a pointer may carry at run-time. Due to the significance of points-to analysis for the optimization of high-level languages, many efforts have been directed towards this subject; however, as this problem is NP-hard [11] and static analysis is generally undecidable [15], any suggested algorithm is implemented with an emphasis on either efficiency or precision [10]. While the current version of our program is restricted to pointer-free expressions and statements, Cetus provides an interprocedural points-to analysis framework which takes a flow sensitive approach to calculate the set of memory locations referred to by each pointer. The next version of the tool should investigate how the currently defined patterns can be extended to the pointer domain by taking advantage of the points-to and alias analysis features of Cetus.

Footnotes

1. Certain transformational techniques such as loop distribution are applied in order to enhance the recognition process in the presence of multiple statements in a for-loop body that together do not match any defined single pattern. Loop distribution factors out statements of the loop body with no cyclic data dependency into separate loops, which enhances the recognition process. The details of the whole process can be found in [24].

2. Although arrays are considered variables in most programming languages, Cetus handles them differently by defining a specific type (ArrayAccess) for them.

3. The ADDMULTIMUL pattern supports an arbitrary number of factors.

4. The non-structural part of recognition rules is handled by separate auxiliary matching functions called by reflection, which is explained in Sect. 4.4.

5. Generally, such merging of siblings is only possible if interference by other read or write accesses in other siblings “in between” can be statically excluded; see [13] for details.

Acknowledgments

This project was supported by SSF and SeRC. We would also like to thank the anonymous reviewers for their constructive comments.

Copyright information

© Springer Science+Business Media New York 2012