Search Results
-
Interactive Performance Visualization and Analysis of Execution Traces for Pattern-Based Parallel Programming
We introduce the design and implementation of a performance visualization system for high-level programming of heterogeneous parallel systems. The...
-
Analysis of Model Parallelism for AI Applications on a 64-core RV64 Server CPU
Massive data-parallel workloads, driven by inference on large ML models, are pushing hardware vendors to develop efficient and cost-effective...
-
Simulation-Based Parameter Optimization for Self-adaptive HPL on Parallel Systems
Computational benchmarks are essential for dependable systems, applications, and technologies across multiple domains. However, traditional...
-
Generating Sparse Matrices for Large-Scale Spectral Clustering on a Single GPU
Spectral clustering has many fundamental advantages over k-means clustering, but comes at much higher time complexity and memory requirements mainly...
-
Celerity-RSim: Porting Light Propagation Simulation to Accelerator Clusters Using a High-Level API
Time-of-Flight (ToF) camera systems are increasingly capable of analyzing larger 3D spaces and providing more detailed and precise results. To...
-
Advancing Interactive Parallelization: iCetus
Despite advancements in parallelization tools, optimizing scientific applications remains a complex and time-consuming task due to the iterative...
-
pi-par: A Dependently-Typed Parallel Language with Algorithmic Skeletons
Algorithmic skeletons are an effective, pattern-based approach for parallelising software. However, despite implementations for a range of languages...
-
Automatic Heterogeneous Runtime Using Signal Processing Domain-Specific and Parallel Patterns
Parallel and signal processing patterns for large-scale radio data applications have been captured with a new domain-specific language (DSL),...
-
Parallelizing RNA-Seq Analysis with BioSkel: A FastFlow Based Prototype
Over the past decade, the widespread adoption of RNA-seq methodology for transcript-level monitoring has resulted in a surge of biological data...
-
Fast Parallel CPU-GPU Approximate Spectral Clustering for Transcriptomics Data
Spectral clustering algorithms have been used in various research domains to discover structure and patterns in data. However, high computational and...
-
DyG-DPCD: A Distributed Parallel Community Detection Algorithm for Large-Scale Dynamic Graphs
Dynamic (temporal) graphs capture the valuable evolution of real-world systems, from the continuously evolving patterns of social interactions and...
-
PragFormer: Data-Driven Parallel Source Code Classification with Transformers
Multi-core shared memory architectures have become ubiquitous in computing hardware nowadays. As a result, there is a growing need to fully utilize...
-
Optimizing Three-Dimensional Stencil-Operations on Heterogeneous Computing Environments
Complex algorithms and enormous data sets require parallel execution of programs to attain results in a reasonable amount of time. Both aspects are...
-
High-Level Programming of FPGA-Accelerated Systems with Parallel Patterns
As a result of frequency and power limitations, multi-core processors and accelerators are becoming more and more prevalent in today’s systems. To...
-
A Hybrid Machine Learning Model for Code Optimization
The complexity of programming modern heterogeneous systems raises huge challenges. Over the past two decades, researchers have aimed to alleviate...
-
Calculation of Distributed-Order Fractional Derivative on Tensor Cores-Enabled GPU
Due to an increased computational complexity of calculating the values of the distributed-order Caputo fractional derivative compared to the...
-
Distributed Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments
Contemporary HPC hardware typically provides several levels of parallelism, e.g. multiple nodes, each having multiple cores (possibly with...
-
Portable C++ Code that can Look and Feel Like Fortran Code with Yet Another Kernel Launcher (YAKL)
This paper introduces the Yet Another Kernel Launcher (YAKL) C++ portability library, which strives to enable user-level code with the look and feel...
-
Generic Exact Combinatorial Search at HPC Scale
Exact combinatorial search is essential to a wide range of important applications, and there are many large problems that need to be solved quickly....
-
Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems
We analyze the performance portability of the skeleton-based, single-source multi-backend high-level programming framework SkePU across multiple...