1 Introduction

Modern high-performance computing (HPC) systems exhibit substantial energy demands when operating at peak capacity. To illustrate, consider the top five systems listed on the Top500 [1] list, which consume between 15 and 21 megawatts of power, exclusive of the energy required for cooling infrastructure. This level of energy consumption presents significant challenges both economically and environmentally. The search for solutions to improve energy efficiency has become a priority for the HPC community.

In response to the evolving demands in computing, there is growing interest in alternative hardware platforms that implement different Instruction Set Architectures (ISAs) with superior performance per watt. RISC-V [2] and ARM [3] are particularly noteworthy in this context. Both offer advantages over traditional architectures such as x86, including simplicity, scalability, and a reduced instruction set. In particular, RISC-V distinguishes itself as an open specification.

The primary objective of this document is to perform a comprehensive comparison of the energy efficiency and performance of hardware platforms implementing the RISC-V and ARM ISAs. It is essential to understand that although an ISA is a specification rather than a physical design, it indirectly influences the performance and power consumption of a system through its hardware implementations. Factors such as instruction efficiency, the complexity of executing certain instructions, and the overall architectural design associated with an ISA implementation can significantly impact a system’s performance and power usage. Our study examines these dynamics by evaluating SoCs based on ARM and RISC-V across a range of tests.

To achieve this goal, we will perform and analyze a range of benchmark tests to assess the performance and energy efficiency of these hardware implementations. The findings of this study will be invaluable to developers and researchers who are in the process of selecting a hardware platform for energy-efficient computing applications. Ultimately, our aim is to provide a thorough and rigorous analysis of hardware platforms implementing RISC-V and ARM ISAs and to determine which of these is more suitable for energy-efficient computing.

Expanding on this objective, we also examine the practical application of these architectures in the context of OpenFOAM [4], a Computational Fluid Dynamics (CFD) software package that plays a pivotal role in industries ranging from aerospace to automotive engineering. It simulates fluid flow and heat transfer phenomena, helping to optimize designs, predict performance, and solve complex fluid dynamics problems.

By incorporating OpenFOAM into our analysis, we aim to validate the applicability of our findings in the real world. This validation process not only helps us fine-tune the execution parameters but also underscores the importance of our results in solving practical problems. Furthermore, it allows us to assess the capabilities of these System-on-Chips (SoCs) when tackling compute-intensive applications like OpenFOAM.

The structure of this work is divided as follows: In Sect. 2, we provide a brief overview of the current state of energy efficiency studies in RISC-V architectures. In Sect. 3, we detail the methodology used and the benchmarks conducted in our comparative analysis.

Next, in Sect. 4, we present the experimental results obtained from comparing all evaluated System-on-Chip (SoC) implementations on various benchmarks. Furthermore, Sect. 5 reports the performance of our SoCs when tested with the OpenFOAM CFD application. Finally, in Sect. 6, we present our conclusions and discuss potential avenues for future work to further enhance the energy efficiency of these architectures.

Our approach is designed to advance energy-efficient computing by offering valuable insights that assist industry professionals and researchers in selecting the most appropriate hardware implementations of ISAs for their projects. This aims to maximize performance while minimizing energy consumption.

2 Related work

In recent years, there has been growing interest in the research and analysis of energy efficiency and performance in ARM-based systems. Studies such as the one conducted by Simakov and Deleon [5] have provided valuable insights into the current state of energy efficiency in ARM architectures. In their study, they presented a comprehensive analysis of performance and energy efficiency using various benchmarks and applications running on high-performance ARM systems. These applications covered a variety of computational paradigms, including HPCC [6] (assorted HPC benchmarks), NWChem [7] (ab initio chemistry), OpenFOAM [4] (partial differential equation solver), GROMACS [8] (molecular simulation), AI Benchmark Alpha [9] (AI benchmark), and Enzo [10] (adaptive mesh refinement). Although ARM performance is generally considered slower than that of current x86 counterparts, it has been shown in many cases to be comparable to, and sometimes even surpass, previous generations of x86 CPUs, as seen in [11]. Moreover, in terms of energy efficiency, considering both power consumption and execution time, ARM has proven more energy efficient than x86 processors in most instances. In our research, we expand this comparative analysis by introducing RISC-V architectures alongside ARM, thus broadening the scope of our investigation into energy efficiency and performance in these systems.

With our attention turned to RISC-V, studies in this field have also gained prominence. A notable study conducted by Zaruba [12] analyzed the performance and energy efficiency of a RISC-V core specifically designed for Linux systems. Using Ariane [13], an open-source implementation of the 64-bit variant of RISC-V, the results demonstrated exceptional energy efficiency, reaching up to 40 GOp/s per watt, compared with other similar cores reported in the scientific literature. This study emphasized the significant role of instruction extensions in improving computational performance, rather than relying solely on high-frequency operation.

In a more recent study by Elsadek and Tawfik [14], an extensive examination of open-source RISC-V cores was conducted, categorizing them into high-performance and resource-constrained categories. Subsequently, the most optimized cores for resource-constrained devices were selected, and comparisons were made on the basis of resource utilization and energy consumption. The results of this study identified the PicoRV32 [15] core as the most energy efficient option for resource-constrained devices. These studies collectively highlight the potential of RISC-V as an open and scalable processor instruction set architecture, enabling the design of highly energy-efficient cores.

In the context of this discussion, it is pertinent to refer to a research paper [16] that offers a comparative survey of open-source implementations of application-class RISC-V processors. The authors conduct an in-depth analysis of the most prominent open-source RISC-V projects, assessing them using identical benchmarks and configuration settings. Their analysis covers factors such as academic impact, community engagement, technology support, evaluation platforms for both FPGA and ASIC implementations, as well as performance, area, power consumption, and energy efficiency metrics. The findings identify Rocket [17] and CVA6 [13] (formerly known as Ariane) as the most successful implementations of RISC-V, with relevance to both commercial and academic projects. The insights gleaned from this research are instrumental in guiding decisions pertaining to RISC-V processor implementation across a diverse range of systems.

The combination of these studies, along with other research efforts, has generated a deeper understanding of the advantages of both ARM and RISC-V-based cores in terms of energy efficiency. This has further fueled interest in exploring and harnessing the full potential of these open and scalable architectures. However, it is essential to note that there is currently a gap in research directly comparing the energy efficiency of ARM and RISC-V architectures. This research gap serves as motivation for our study, which seeks to address this void and provide a comprehensive and up-to-date insight into the energy efficiency of both ARM and RISC-V systems. Through a detailed exploration and analysis of energy efficiency in both architectures, our aim is to make a significant contribution to existing knowledge and offer valuable insights for decision-making in the design and optimization of ARM and RISC-V-based systems.

3 Methodology

We compare the performance and energy efficiency of ARM and RISC-V-based implementations. To do this, we ran benchmark tests on three different hardware platforms with specifications shown in Table 1.

Table 1 Hardware platform specifications

Although the specifications of the hardware platforms differ in some aspects, we designed our tests to be as fair as possible so that these differences have minimal impact on the results.

3.1 Benchmarks description

We used two benchmark tests: the NAS Benchmark [18] (version 3.2, SER) and the TFLite Benchmark [19] (models from MobileNet [20] v1 to v3).

The NAS Benchmark is a set of programs, derived from computational fluid dynamics applications, designed to evaluate the performance of parallel computers across a range of computational kernels. The TFLite Benchmark, on the other hand, is a tool for evaluating the performance of machine learning models on mobile and embedded devices: it measures how quickly deep learning models converted to the TensorFlow Lite format can perform inference.

In the context of the NAS Benchmarks, we conducted tests using the SER (serial) version of the suite. Because the Nezha D1 has a single-core processor, parallel execution is not feasible on that machine, so running all platforms serially keeps the comparison uniform. We ran tests with Classes W, A, and B to obtain results across a range of problem sizes; the sizes of these problems can be seen in Table 2.

Table 2 Problem sizes and parameters for Classes W, A, B in NPB 3.3
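As a concrete illustration, the NPB SER build produces one binary per benchmark and class, named `<benchmark>.<CLASS>.x` (e.g. `cg.B.x`). A minimal Python helper of the kind a driver script might use to locate and launch them — the directory layout and timing approach are assumptions for illustration, not our actual scripts:

```python
import subprocess
import time

def npb_binary(bench: str, cls: str) -> str:
    """NPB SER binaries follow the <benchmark>.<CLASS>.x naming convention."""
    return f"{bench.lower()}.{cls.upper()}.x"

def run_npb(bin_dir: str, bench: str, cls: str) -> float:
    """Run one serial NPB kernel to completion and return wall-clock seconds."""
    start = time.monotonic()
    subprocess.run([f"{bin_dir}/{npb_binary(bench, cls)}"], check=True)
    return time.monotonic() - start
```

A driver would then iterate `run_npb(bin_dir, b, c)` over the chosen kernels for classes `W`, `A`, and `B`.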

In the case of TFLite, we conducted tests using models with an input size of 224×224 (ranging from MobileNet v1 to v3), without any limitation on the number of threads. This approach let each hardware platform use all the threads available on its processor. Additionally, we integrated the Xnnpack [21] delegate, a highly optimized library of floating-point neural network inference operators, into our testing methodology to assess its impact on energy consumption and performance.

These benchmarks serve as standard tools in computational assessments, widely recognized for gauging the performance of various computer systems, including processors, GPUs, and distributed systems.

3.2 A case study of a real application: OpenFOAM

We harnessed the capabilities of OpenFOAM [4], an open-source Computational Fluid Dynamics (CFD) software renowned for its versatility and robust capabilities in solving complex fluid dynamics problems. OpenFOAM is an invaluable tool widely adopted across various engineering fields, including automotive, aerospace, and environmental engineering. It excels in simulating and analyzing fluid flow and heat transfer phenomena.

In our study, we specifically utilized OpenFOAM version v1906 [22] for our evaluations. This choice of version was deliberate, as it was precompiled for the platforms we used, thereby eliminating the need for the tedious process of compiling for three different architectures. The package we employed was obtained directly from the Debian package repository, further ensuring the reliability and convenience of our computational setup. Through OpenFOAM, we conducted simulations of both the “motorBike” case and the “rotorDisk” case from the official examples, with minor modifications to ensure that they run sequentially instead of in parallel.

These simulations are pivotal benchmarks in our study, representing real-world scenarios that require substantial computational resources. They enable us to evaluate the performance and energy efficiency of the hardware platforms for the handling of computationally intensive applications, showcasing their capabilities for addressing the challenges posed by fluid flow and heat transfer analysis.

The “motorBike” problem [23] in OpenFOAM is a simulation that computes steady flow patterns around a motorcycle and its rider, with fluid entering at a speed of 20 m/s from the “inlet” region and leaving from the “outlet” region. The motorcycle’s surface is modeled as a no-slip wall, while the ground is assigned a velocity of 20 m/s. In particular, this simulation dynamically adjusts the number of parallel subdomains based on the selected number of processing cores.

In contrast, the “rotorDisk” [24] problem in OpenFOAM involves the application of cell-based momentum sources on velocity within a specified cylindrical region to approximate the mean effects of rotor forces. Here, the fluid flows in from the “inlet” region at 1 m/s in the direction of the Y-axis, exits from the “outlet,” and features a “rotatingZone” that spins at 1000 rpm around the Y-axis.

Although OpenFOAM is not a commonly employed tool on System-on-Chips (SoCs), it is more typically associated with high-performance computing environments because of its considerable computational requirements. Consequently, this case study not only delves into the intricacies of the problem itself but also extends our understanding of how SoCs can be harnessed to address these demanding computational challenges. Moreover, this research breaks new ground by exploring the utilization of OpenFOAM in architectures such as RISC-V, where research in this context is relatively limited. This highlights the adaptability of these applications and unveils fresh opportunities for their integration into less-explored domains of computing, further emphasizing their potential within the field of SoCs.

3.3 Test execution

To ensure the impartiality of our tests, we developed Python scripts capable of running tests concurrently on all hardware platforms. These scripts supervised the testing procedures and maintained a consistent temperature across all hardware platforms throughout the testing process. Before commencing each batch of tests, the scripts continuously monitored the hardware platform’s temperature and waited until all reached a predetermined base temperature. This base temperature represented the point when the hardware platforms were in an idle state, ensuring a uniform starting point for all tests. Only after reaching this base temperature did the scripts initiate the next set of tests. This strategy improves the fairness of our testing process and acts as a preventive measure against frequency fluctuations in our hardware platforms caused by temperature increases. It ensures our results remain highly reproducible and aligned with the specified standards.
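The cooldown gate at the heart of these scripts can be sketched as follows. The sysfs path and the 45 °C baseline are illustrative assumptions; in practice each platform had its own measured idle temperature:

```python
import time

def read_cpu_temp(path: str = "/sys/class/thermal/thermal_zone0/temp") -> float:
    """Linux typically exposes SoC die temperature in milli-degrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0

def wait_for_cooldown(read_temp, base_temp_c: float = 45.0,
                      poll_s: float = 5.0) -> None:
    """Block until the platform has cooled back to its idle baseline,
    so every batch of tests starts from the same thermal state."""
    while read_temp() > base_temp_c:
        time.sleep(poll_s)
```

Calling `wait_for_cooldown(read_cpu_temp)` before each batch prevents heat carried over from a previous run from throttling the next one.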

The tests were run three times on each hardware platform, and we averaged the results to get the final values. We plotted the results and performed statistical analysis to compare the energy efficiency and performance of the different hardware platforms. To acquire precise energy data for our study, we used AccelPowerCape [25], a combination of BeagleBone Black [26] and the Accelpower module [27]. This module incorporates INA219 [28] sensors to measure current, voltage, and power consumption. To achieve high-precision data collection, we used a customized version of the pmlib library [29], a server daemon purposely designed for monitoring energy consumption. This implementation was accessed through the EML [30] pmlib driver.
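The power samples delivered by the INA219 sensors arrive as timestamped watt readings; total energy is then the time integral of power, and each reported figure is the mean of the three runs. A minimal sketch of that post-processing (this is not the pmlib API itself, which we accessed through the EML driver):

```python
def energy_joules(timestamps_s: list[float], power_w: list[float]) -> float:
    """Integrate sampled power over time (trapezoidal rule) -> joules."""
    return sum((t1 - t0) * (p0 + p1) / 2.0
               for t0, t1, p0, p1 in zip(timestamps_s, timestamps_s[1:],
                                         power_w, power_w[1:]))

def mean_of_runs(values: list[float]) -> float:
    """Each final reported value is the average of three repetitions."""
    return sum(values) / len(values)
```

For example, a constant 2 W draw sampled once per second over two seconds integrates to 4 J.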

Our methodology involved establishing a physical connection by connecting a cable from the power source of the devices under scrutiny to the AccelPowerCape, which facilitates real-time monitoring of energy consumption on our target devices. This approach ensured the acquisition of accurate and reliable energy-related insights within the context of our study.

4 Benchmark results

This section will present the results of the NAS Benchmark and TFLite Benchmark tests on the SoCs mentioned above. We will analyze the performance and energy efficiency of these SoCs, providing a clear comparison of their capabilities.

4.1 Power consumption analysis

In the experimental results of power consumption displayed in Fig. 1a–d, we measured and compared the average power consumption in watts for each hardware platform. These figures illustrate the average rate at which each device consumed power while running the benchmark tests, providing a clear comparison of their power consumption profiles under test conditions.

Our observations reveal that the Nezha D1 consistently drew the least power in both the TFLite Benchmark and the NAS Benchmark, although this does not necessarily imply more efficient use of power in terms of performance per watt. In this context, we refer specifically to raw power consumption, not the effectiveness or productivity of each platform.

The Odroid XU4 ranked second in power consumption, while the Rock960 generally drew the most power; again, this refers solely to average power draw, without considering computational output.

It should be noted that the power consumption of the Nezha D1 remained stable throughout all scenarios, in contrast to the other hardware platforms, whose power consumption exhibited variations depending on the specific benchmark being executed.

Although these measurements offer crucial information on the raw power usage of each platform, they do not directly translate into assessments of power efficiency in terms of performance per watt. Performance per watt is a distinct metric that evaluates the computational output relative to energy consumption. Therefore, a lower power consumption, as observed on some platforms, does not necessarily imply a higher efficiency in this specific metric. This distinction is crucial for a comprehensive understanding of the energy characteristics of each platform.

Fig. 1 Power consumption results for NAS and TFLite Benchmarks

4.2 Performance analysis

In this section, we present the results of our performance analysis, which involved measuring the time each hardware platform takes to complete the tests, as detailed in Fig. 2a–d.

Upon rigorous scrutiny of execution times, a significant disparity emerged. Nezha D1 consistently exhibited the longest execution times among the hardware platforms tested. This finding suggests that Nezha D1 may not be the most efficient choice for tasks where rapid completion is a critical requirement, which warrants consideration of alternative options.

The Odroid XU4, while not the fastest platform overall, ranked as the second-slowest performer in our tests, suggesting that it may not excel in scenarios that require swift processing. It is noteworthy, however, that the Odroid XU4 outperformed the Rock960 in the CG problem test, indicating superior performance in specific computational tasks.

Conversely, the Rock960 emerged as the overall performance leader, surpassing the other hardware platforms evaluated in most scenarios. Its commendable speed makes the Rock960 an attractive choice for applications requiring rapid processing and execution.

Fig. 2 Performance results for NAS and TFLite Benchmarks

4.3 Total energy consumption analysis

In this section, we examine the total energy consumption, quantified in joules, of each hardware platform evaluated; for more granular details, we refer the reader to Fig. 3a–d. This metric accounts for the cumulative amount of energy used by each device throughout the entire execution of the benchmark tests, combining the rate of power consumption with the time the device took to complete each test.
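Because total energy combines power draw with runtime, the lowest-power platform can still record the highest energy figure. The following toy numbers (made up for illustration, not our measurements) show how a 2 W device that needs 900 s spends more energy than a 6 W device that finishes in 200 s:

```python
def total_energy_j(avg_power_w: float, runtime_s: float) -> float:
    """For a run at roughly constant power, energy = average power x time."""
    return avg_power_w * runtime_s

low_power_slow = total_energy_j(2.0, 900.0)    # 1800 J
high_power_fast = total_energy_j(6.0, 200.0)   # 1200 J
assert low_power_slow > high_power_fast
```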

Our analysis reveals compelling insights into the energy consumption profiles of these hardware platforms. In particular, Nezha D1 consistently exhibited the highest total energy consumption across all benchmarks. This outcome can be attributed to the extended completion times required by the Nezha D1, resulting in greater overall energy consumption than the other hardware platforms.

In contrast, the Odroid XU4 and the Rock960 displayed similar energy consumption patterns, with occasional variations where one platform consumed marginally more energy than the other and vice versa. This observation highlights a degree of parity in energy efficiency between the Odroid XU4 and the Rock960 despite differences in their performance characteristics.

By scrutinizing the total energy consumption figures, our analysis provides valuable insight into the power profiles of these hardware platforms and aids in selecting the most suitable platform based on the specific energy constraints and requirements of the intended application. These findings underscore the importance of considering power efficiency and performance when making hardware choices in computational scenarios.

Fig. 3 Energy consumption measurements for NAS and TFLite Benchmarks

4.4 Operations per second analysis

In the context of the NAS Benchmark, “operations per second” refers to the number of floating-point operations executed by the hardware platform per second. In contrast, the TFLite Benchmark indicates the number of inferences conducted by the hardware platform per second.

Upon a comprehensive examination of the results of the NAS test presented in Fig. 4a–c, it becomes evident that the hardware platform exhibiting the highest number of operations per second is the Rock960, which in most cases surpasses the Odroid XU4. Notably, the Nezha D1, while performing within its specified capabilities, demonstrated comparatively lower performance than its competitors.

Shifting our focus to the TFLite Benchmark results, as shown in Fig. 4d, we observe substantial disparities in the number of inferences per second among the evaluated hardware platforms. The Rock960 emerges as the best performer, consistently outperforming its rivals. This underscores the Rock960’s remarkable ability to achieve more inferences per second, rendering it an enticing choice for applications necessitating swift data processing. On the contrary, the Nezha D1 consistently exhibited fewer operations per second than the other hardware platforms, indicating a substantial deficit.

It should be noted that the inclusion of the Xnnpack delegate did not appear to significantly influence the overall results, suggesting that its impact on the number of operations per second was relatively negligible within the scope of these evaluations.

These insights from our analyses offer valuable guidance for selecting the most suitable hardware platform, considering the number of operations per second as a critical performance metric. Such considerations are crucial in a broad spectrum of computational applications, where optimizing resource utilization and achieving the desired level of operations per second are important.

Fig. 4 Operations per second measurements for NAS and TFLite Benchmarks

4.5 Energy efficiency analysis

In the NAS and TFLite Benchmarks, it is important to distinguish the concept of “energy efficiency.” In the NAS Benchmark, this term refers to the number of floating-point operations performed per second per watt, while in the TFLite Benchmark, it implies the number of inferences made per second per watt.

Examining the results of the NAS test presented in Fig. 5a–c, it is apparent that energy efficiency varies among the evaluated hardware platforms. In particular, the Odroid XU4 and the Rock960 exhibit competitive energy efficiency metrics across the different tests. On the contrary, the Nezha D1 consistently displays lower energy efficiency than the other hardware platforms.

A more detailed exploration of the TFLite results in Fig. 5d reveals a consistent pattern. The Odroid XU4 emerges as the hardware platform with superior energy efficiency in this benchmark, even though it does not deliver the highest raw performance in this test. This observation adds an intriguing dimension to our findings, highlighting that raw performance does not always correlate directly with energy efficiency. On the contrary, the Nezha D1 consistently lags in energy efficiency, indicating a notable shortfall in this crucial metric compared to its counterparts. Additionally, including the Xnnpack delegate did not substantially influence the results, occasionally yielding lower energy efficiency than the same model run without the delegate.

These results in terms of energy efficiency might seem surprising when considering that they do not align with the average power consumption observed for each hardware platform, as seen in Fig. 1a–d. This discrepancy arises because although one hardware platform may have consumed less power on average, if the number of operations performed per second is significantly lower, the energy efficiency will also be reduced. This highlights the importance of not only evaluating power consumption in isolation but also considering the overall performance in terms of operations per second to gain a true understanding of energy efficiency.
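The point can be made concrete: operations per second per watt reduces to operations per joule, so a platform that draws less power but takes much longer on the same workload can still end up less efficient. A sketch with hypothetical numbers (not our measurements):

```python
def ops_per_second_per_watt(ops: float, runtime_s: float,
                            avg_power_w: float) -> float:
    """(ops / runtime) / power == ops / (runtime * power) == ops per joule."""
    return (ops / runtime_s) / avg_power_w

# Same workload (1e9 operations) on two hypothetical platforms:
frugal = ops_per_second_per_watt(1e9, 900.0, 2.0)   # low power, long runtime
hungry = ops_per_second_per_watt(1e9, 200.0, 6.0)   # high power, short runtime
assert hungry > frugal  # higher power draw, yet more energy efficient
```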

These insights from our analyses offer valuable guidance for selecting the most suitable hardware platform, focusing on energy efficiency as a pivotal metric. Such considerations are paramount in the context of a broad spectrum of computational applications where optimizing resource utilization and achieving the desired level of energy efficiency are critical objectives.

Fig. 5 FLOPS per watt measurements for NAS and TFLite Benchmarks

4.6 Temperature analysis

In pursuing a comprehensive assessment of hardware platforms, we recognized the critical importance of incorporating temperature measurements into our analysis. Temperature, often overlooked but profoundly influential, can significantly affect a machine’s frequency, performance, and energy efficiency, making it pivotal for understanding how these platforms behave under different conditions. Detailed results of these temperature measurements, available in Fig. 6a–d, offer insight into how each platform’s temperature fluctuated during our experiments, providing valuable context for interpreting hardware performance under varying workloads and environmental conditions.

In particular, Nezha D1 consistently maintained the lowest temperatures in all tests, with modest temperature increases compared to other platforms. This suggests efficient thermal management. Surprisingly, despite active cooling fans, the Odroid XU4 experienced significant temperature spikes, likely due to its higher number of CPU cores. The Rock960 recorded the second-lowest temperatures, highlighting its thermal efficiency.

We conducted pre-stress tests to establish the upper temperature threshold at which these platforms could operate safely without performance degradation. This limit is represented by the dashed lines in the figures. As the results show, it was never reached during our experiments, which confirms that thermal throttling did not distort our measurements.

Fig. 6 Temperature measurements for NAS and TFLite Benchmarks

5 Computational results for the OpenFOAM case study

We conducted experiments using the OpenFOAM software to validate the transferability of our benchmark results to a real-world application. The results of these experiments are visually depicted in Fig. 7a–d, covering power consumption (“Power OpenFOAM”), performance (“Performance OpenFOAM”), energy consumption (“Energy OpenFOAM”), and temperature (“Temperature OpenFOAM”). Our results align closely with our expectations, particularly in the context of the two simulations, “motorBike” and “rotorDisk.” In terms of average power consumption, as illustrated in Fig. 7a, the Nezha platform demonstrated the lowest power usage, followed by the Odroid, while the Rock960 registered the highest power consumption.

Moreover, in terms of execution time, as depicted in Fig. 7b, the Nezha D1 exhibited the longest duration, with the Odroid XU4 ranking as the second slowest and the Rock960 showcasing the fastest execution times.

Regarding energy usage, as presented in Fig. 7c, it is worth noting that Nezha D1 displayed the highest energy consumption, followed by Odroid. However, their difference was not as pronounced as in our earlier benchmarking experiments. Finally, with respect to the temperature measurements featured in Fig. 7d, consistent with our previous experiments, the Nezha platform maintained the lowest temperature readings. At the same time, the Odroid recorded the highest temperatures, with the Rock960 falling in between.

These findings collectively yield valuable insights into the real-world performance and energy efficiency of these hardware platforms, strengthening the trends observed in our benchmarking exercises. They demonstrate the adaptability of these applications to less powerful machines, revealing their ability to perform effectively even if they do not achieve peak execution speed. In particular, their successful execution on relatively recent architectures such as RISC-V underscores their versatility and potential for deployment across diverse computing environments. This exploration sheds light on the resilience of these applications and opens exciting avenues for harnessing their capabilities in a broader spectrum of computing systems.

Fig. 7 OpenFOAM metrics

6 Conclusion

In conclusion, our benchmark results have consistently reflected the anticipated characteristics of each hardware platform. Both the Odroid XU4 and the Rock960 have demonstrated superior performance in the NAS and TFLite tests compared to the Nezha D1. However, it is essential to acknowledge that this improved performance also comes at the cost of higher average power consumption compared to the Nezha D1.

Interestingly, our analysis suggests that architectural differences did not play a dominant role in determining the outcomes. Instead, the variations in performance and power consumption are predominantly attributed to the unique features of each device rather than the underlying architecture. Notably, RISC-V architecture implementations are still in a relatively nascent stage compared to their more established counterparts. As RISC-V continues to evolve and undergo optimization, further enhancements can be anticipated in both performance and energy efficiency in the future.

When selecting a device, it is crucial to consider the specific use case and environmental constraints. For scenarios where power consumption is a critical concern, such as in environments with limited power supply, the Nezha D1 emerges as a more favorable choice. On the contrary, when prioritizing performance or energy efficiency in contexts where power constraints are not an issue, the Odroid XU4 and Rock960 devices represent more suitable alternatives.

Furthermore, our investigation revealed that the inclusion of the Xnnpack delegate did not exert a substantial influence on our benchmark results. In some instances, its use even yielded worse results than the baseline, indicating that the impact of this delegate may vary with the specific application and hardware configuration. Thus, the delegate’s utility should be carefully weighed when incorporating it into similar benchmarking and computational tasks.

In the context of practical applications and simulations, our OpenFOAM case study closely echoed our benchmark findings, underscoring the valuable insights these benchmarks offer when assessing hardware performance and energy efficiency. This synergy between synthetic benchmarks and real-world tasks provides a comprehensive perspective for developers and researchers looking to make informed decisions about the selection of hardware for their specific computational needs. Furthermore, our study contributes to the expansion of knowledge on the utilization of CFD tools like OpenFOAM, particularly on SoCs that use ARM and RISC-V architectures, which have been relatively less explored in the existing research landscape.