Abstract
With the arrival of the open-source RISC-V processor architecture, there is the chance to rethink Deep Neural Networks (DNNs) and information representation and processing. In this work, we exploit the following ideas: i) reduce the number of bits needed to represent the weights of DNNs, using our recent findings and implementation of the posit number system; ii) exploit RISC-V vectorization as much as possible to speed up format encoding/decoding, the evaluation of activation functions (using only arithmetic and logic operations, via approximated formulas) and the computation of the core matrix-vector operations of DNNs. The comparison with the well-established ARM Scalable Vector Extension is natural and challenging, given that architecture's closedness and maturity. The results show that it is possible to vectorize posit operations on RISC-V, gaining a substantial speedup on all the operations involved. Furthermore, the experimental outcomes highlight how the new architecture can catch up, in terms of performance, with the more mature ARM architecture. The present study is therefore important because it anticipates the results we expect to achieve once an open RISC-V hardware coprocessor capable of operating natively on posits is available.
1 Introduction
In recent years, RISC-V has started to emerge as an open-source alternative CPU architecture [4, 7, 27]. Being also royalty-free, it is the rising-star competitor of Intel, AMD and ARM CPUs (in both its 32- and 64-bit variants). Important software and hardware companies have endorsed and funded the project, including Intel, Microsoft and STMicroelectronics [6].
The main feature of RISC-V is its open instruction set architecture. This means that any user can extend it with their own instructions and functionalities: this possibility is strategic for designing very low-latency coprocessors and accelerators without having to treat them as external devices with memory mapping and interrupts. Furthermore, with the latest advancements in the development of the vector extension, RISC-V processors are able to accelerate several kernels for machine and deep learning (e.g., dot products, vector-matrix multiplications and image filtering). Lately, several real number representations have been proposed by industry and research, such as Intel with Flexpoint [23, 26], Google with BFLOAT16 [8] and Facebook AI [22]. Another very promising alternative to the IEEE 32-bit floating-point standard is the posit\(^{\text {TM}}\) number system, proposed by Gustafson [19]. This format has been shown to match single-precision accuracy using only 16 bits for the representation [9, 12, 16, 17, 24]. Furthermore, the first hardware implementations of this novel type are very promising in terms of energy consumption and area occupation [10, 20, 28].
In this work, we envision the adoption of these two disruptive innovations (vector extension and posit arithmetic) within the same architecture. Our ultimate goal is to extend RISC-V so that it can use a Posit Processing Unit (PPU) as a coprocessor, by extending the processor instruction set architecture (ISA). On the way towards this goal, we can nonetheless gain great benefits from the posit format. It is of particular interest to assess the potential benefits of a posit-based coprocessor in a killer application such as Deep Neural Networks.
In this work, we assess the quality of the vectorization of RISC-V operations when using posit numbers and we compare it with ARM SVE. The paper is organized as follows. In Sect. 2, we briefly present an overview of the posit format, along with recent improvements and findings achieved at the University of Pisa, in collaboration with MMI spa, on information processing using posit numbers. In Sect. 3, we summarise the core aspects of the RISC-V architecture, focusing on the RISC-V vector extension and on the principal components used in the rest of the work. In Sect. 4, we present the implementation of vectorized posit operations for RISC-V inside the cppPosit library (a C++ posit library developed and maintained by the authors), following the same approach as our previous implementation of ARM SVE vectorized operations [14]. In particular, we focus on the implementation of posit encoding and decoding from/to the floating-point format. In Sect. 5, we illustrate the benchmarks used to test our implementation. The tests cover different core operations of DNNs, including convolutions, dot products and matrix-matrix multiplications, as well as activation functions. The proposed activation functions are fast approximated versions of the real ones; they can be computed by an arithmetic-logic unit alone and are thus highly vectorizable. In Sect. 6, we outline our vision for hardware accelerators for posit operations on a RISC-V processor, focusing on the latency and implementation complexity of the different solutions. Finally, we also present a comparison with ARM SVE and future works. In Sect. 7, we draw some conclusions.
2 Posit arithmetic
The posit format [11, 12, 16, 19] is a configurable fixed-length format for real number representation; its configuration involves the total number of bits (nbits) and the maximum number of exponent bits (esbits).
2.1 Format overview
As shown in Fig. 1, a posit number is composed of a maximum of 4 fields: i) sign (1 bit), ii) regime (variable length), iii) exponent (at most esbits bits) and iv) fraction (variable length). Note that posits are encoded using 2’s complement. The regime field is a special one: its length is identified by a run of bits equal to 1 (or 0), terminated by a stop bit with the opposite value (or by the end of the bit string). The regime value k is then derived from the length m of this run: \(k = m - 1\) for a run of ones and \(k = -m\) for a run of zeros.
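Expression (1) shows the mathematical relation between the posit bit content (v) and the represented value (x). Here k is the regime value computed as described above and \(useed = 2^{2^{esbits}}\); e and f are, respectively, the exponent and fraction values decoded from the base-2 representation (Fig. 2), with \(f \in [0,1)\) and the fields read from the 2’s complement of v when the sign bit s is set:
\[ x = \begin{cases} 0, & v = 00\ldots0 \\ \mathrm{NaR}, & v = 10\ldots0 \\ (-1)^{s} \cdot useed^{k} \cdot 2^{e} \cdot (1 + f), & \text{otherwise} \end{cases} \qquad (1) \]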
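As a small worked example of (1), consider the Posit\(\left<8,0\right>\) bit string 0110 1000, for which \(useed = 2^{2^0} = 2\). The sign bit is 0; the regime is a run of two 1s terminated by the stop bit 0, giving \(k = 1\); with esbits = 0 there is no exponent field, so \(e = 0\); the remaining four bits 1000 form the fraction, so \(f = 8/16 = 0.5\). Hence:
\[ x = (-1)^{0} \cdot 2^{1} \cdot 2^{0} \cdot (1 + 0.5) = 3 \]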
2.2 Advantages over IEEE 32-bit Floats
As widely described in [18, 19], the IEEE Float32 format suffers from several issues that are addressed by this novel format:

Waste of bit patterns: IEEE Float32 wastes \(2^{24} - 2\) (about 16.7 million) bit patterns on NaN values;

Mathematically incorrect: two representations of zero (\(\pm 0\));

Non-configurable accuracy: predetermined number of exponent and mantissa bits;

Being a more nonlinear and compressed representation, the posit format allows more powerful bit manipulations (changing the bit string of a posit in an appropriate way, using the ALU alone, can produce interesting nonlinear transformations of the posit number itself).
The application of posit numbers to Deep Neural Networks (DNNs) has been shown, by the authors and independently by others, to perform as well as float numbers [15, 25] with half the bits (or even fewer), as reported in Table 1. The table compares different posit configurations with 32-bit floats on the German Traffic Sign Recognition Benchmark (GTSRB) dataset. In Table 1, a 10-bit posit arithmetic achieves the same detection and classification accuracy as 32-bit floats, while the accuracy loss of an 8-bit posit is limited to 0.2% (for a data size reduction by a factor of 4 with respect to 32-bit floats).
2.3 No exponent bit case
As shown by the authors in [13], the posit format gains interesting properties when configured with \(esbits=0\). This particular configuration allows the implementation of fast versions of common operations (a small approximation is introduced in some of them); these versions can be computed using just the arithmetic-logic unit (ALU) of the CPU, since they only involve bit manipulations and integer operations. Among the basic operations that can be accelerated this way are the double and half operators (2x and x/2), the inverse operator (1/x) and the one’s complement (\(1-x\)).
Furthermore, some commonly used activation functions for DNNs can be implemented this way. The basic building block is the fast approximation of the Sigmoid function (from [19]). Starting from it, we developed the others as simple combinations of the previous operators and the Sigmoid.
What we obtained are the fast approximation of the hyperbolic tangent (FastTanh [11, 13]) and the fast approximation of the exponential linear unit (FastELU [13]).
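To see why such combinations work, it helps to recall two exact identities linking these functions to the logistic sigmoid \(\sigma(x) = 1/(1 + e^{-x})\). They are given here as our own restatement, not necessarily the exact expressions used in [11, 13]; a fast version is obtained by replacing \(\sigma\) with the fast Sigmoid and each remaining operation (doubling, one’s complement, inverse, negation) with its ALU counterpart where one is available:
\[ \tanh(x) = 2\,\sigma(2x) - 1 \]
\[ \mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ e^{x} - 1 = \dfrac{1}{\sigma(-x)} - 2, & x \le 0 \end{cases} \]
The second line follows from \(1/\sigma(-x) = 1 + e^{x}\).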
The possibility of implementing several operations as simple ALU instructions has some interesting consequences:

It allows the ALU emulation of posit operations without specific hardware support.

It allows an operator to be implemented as a sequence of ALU instructions; hence, every operation implemented this way is easily vectorizable.
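The bit-level tricks behind such \(\mathcal {L}1\) operators can be sketched for Posit\(\left<16,0\right>\) as follows. This is a minimal illustration, not the cppPosit implementation: the function names are hypothetical, the sigmoid uses the commonly cited approximation "flip the sign bit, then shift right by two", and the other two operators are plain integer subtractions, valid only for the input ranges stated in the comments. Only integer ALU instructions are involved, which is what makes each of them easy to vectorize.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers operating on the raw bits of a Posit<16,0>.
// In this configuration 1.0 is 0x4000, 0.5 is 0x2000 and 2.0 is 0x6000.

// Fast sigmoid: flip the sign bit, then logical right shift by 2.
std::uint16_t fast_sigmoid16(std::uint16_t x) {
  return static_cast<std::uint16_t>((x ^ 0x8000u) >> 2);
}

// One's complement 1 - x, for 0 <= x <= 1 (bit patterns 0x0000..0x4000).
std::uint16_t one_minus16(std::uint16_t x) {
  return static_cast<std::uint16_t>(0x4000u - x);
}

// Approximate reciprocal 1/x for x > 0 (bit patterns 0x0001..0x7FFF):
// flipping all non-sign bits and adding one is the same as 0x8000 - x.
std::uint16_t approx_reciprocal16(std::uint16_t x) {
  return static_cast<std::uint16_t>(0x8000u - x);
}
```

For instance, fast_sigmoid16(0x4000) yields 0x3000, i.e. 0.75, against the exact \(\sigma(1) \approx 0.731\); approx_reciprocal16 maps 2.0 (0x6000) exactly to 0.5 (0x2000), but maps 1.5 (0x5000) to 0.75 instead of about 0.667, which is the kind of small approximation mentioned above.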
2.4 Past achievements concerning posit-based DNNs
In previous work, we have been able to:

develop our posit software library (cppPosit, see Sect. 4);

implement fast approximated activation functions, possible only when using posits, that exploit the ALU alone (no PPU required) [13, 15];

obtain fast matrix-vector multiplication thanks to vectorization (demonstrated on ARM CPUs using SVE [14]).
In this paper, we aim to present how we obtained the same achievements, this time on an open processor: RISC-V.
3 The RISC-V architecture
The RISC-V [4] architecture is a modular, open-source and royalty-free instruction set architecture (ISA) and comes in both 32- and 64-bit flavours. The overall ISA is composed of smaller sub-ISAs, among which are the base subsets. These subsets are referred to as base integer instruction sets and identified by the letter I. Besides, a RISC-V-based architecture can implement other extensions; some of them are referred to as frozen. This means that their encoding and behaviour have been ratified and will not change during the current draft of the architecture. These extensions include integer multiplication/division operations (M), single- (F) and double-precision (D) floating-point operations (following the IEEE 754 standard) and atomic instructions (A).
3.1 The RISC-V vector extension
A very interesting extension, still under development, is the vector one (V). This extension aims to provide single-instruction multiple-data (SIMD) capabilities to the RISC-V architecture. By design, this extension can seamlessly exploit either the CPU registers or a special vector coprocessor for hardware acceleration. Any RISC-V-based architecture implementing this extension defines some parameters:

Number of vector registers (standard is 32)

vlen: size (in bits) of the vector registers (e.g., 256)

elen: maximum supported size (in bits) of a single element (e.g., 64 for a 64-bit integer or double)
The idea behind the vector extension is the same as that of the ARM Scalable Vector Extension (SVE) architecture [1]: there is no predetermined vector length (as happens in the Intel SIMD extensions), but a special instruction, vsetvl. This instruction takes as input a requested vector length vreql and returns the granted vector length vgrant, which in the simplest case is \(vgrant = \min(vreql, VLMAX)\), where VLMAX is the maximum number of elements of the selected width that fit in the vector register(s).
This design allows porting an application between RISC-V architectures without rewriting a single line of code and, between binary-compatible architectures, even without recompilation. Moreover, it will help us later when simulating the same program with different vector configurations.
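The portability argument rests on the strip-mining pattern that vsetvl enables: the loop asks for as many elements as remain and processes whatever is granted. The scalar C++ sketch below emulates that pattern; emulated_vsetvl is a hypothetical stand-in for the instruction, implementing the simple policy \(vgrant = \min(vreql, vlen/SEW)\), and sew_bits (the selected element width) is a parameter name of our own. The same loop works unchanged for any vlen.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar stand-in for vsetvl: grants min(requested, VLMAX),
// where VLMAX = vlen / SEW elements fit in one vector register.
std::size_t emulated_vsetvl(std::size_t requested, std::size_t vlen_bits,
                            std::size_t sew_bits) {
  return std::min(requested, vlen_bits / sew_bits);
}

// Strip-mining loop over n elements; returns the granted length of each chunk.
std::vector<std::size_t> strip_mine(std::size_t n, std::size_t vlen_bits,
                                    std::size_t sew_bits) {
  std::vector<std::size_t> chunks;
  for (std::size_t remaining = n; remaining > 0;) {
    const std::size_t vl = emulated_vsetvl(remaining, vlen_bits, sew_bits);
    chunks.push_back(vl);  // a real kernel would load/compute/store vl elements
    remaining -= vl;
  }
  return chunks;
}
```

With vlen = 512, processing 100 elements takes seven chunks for 32-bit elements (six of 16 and a tail of 4) but only four for 16-bit ones, so narrower elements directly reduce the number of loop iterations.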
4 The cppPosit library
The support for posit arithmetic is offered by the cppPosit library. This library was developed in Pisa and is maintained by the authors of this work. It uses advanced C++14 templatization techniques to ease the definition of posit configurations at compile time. Posit operations are classified into four levels (\(\mathcal {L}1\) to \(\mathcal {L}4\)) of increasing computational complexity [13]. The simplest and fastest level, \(\mathcal {L}1\), comprises all the operators described in Sect. 2.3. For posit operations that cannot be emulated via the ALU, the library offers three backends:

Floating-point backend, using the standard FPU support for operations;

Fixed-point backend, exploiting big-integer support (64 or 128 bits) for operations;

Tabulated backend, generating lookup tables for most operations (suitable only for posits with 8 to 12 bits, Posit\(\left<[8,12],*\right>\), due to table sizes).
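To see why tabulation is limited to small posits, consider the size of a table for a binary operator indexed by both operands. The sketch below assumes one table per operator, with \(2^{2 \cdot nbits}\) entries, each stored in the smallest whole number of bytes that holds nbits bits; this layout is our own assumption for illustration, and cppPosit's actual tables may be organized differently.

```cpp
#include <cassert>
#include <cstdint>

// Entries of a lookup table indexed by both operands of a binary operator.
constexpr std::uint64_t binary_table_entries(unsigned nbits) {
  return std::uint64_t{1} << (2 * nbits);
}

// Bytes needed if each result occupies ceil(nbits / 8) bytes.
constexpr std::uint64_t binary_table_bytes(unsigned nbits) {
  return binary_table_entries(nbits) * ((nbits + 7) / 8);
}

static_assert(binary_table_bytes(8) == 65536, "64 KiB for 8-bit posits");
static_assert(binary_table_bytes(12) == 33554432, "32 MiB for 12-bit posits");
static_assert(binary_table_bytes(16) == 8589934592ull, "8 GiB for 16-bit posits");
```

Under these assumptions a table stays at 64 KiB for 8-bit posits and grows to 32 MiB at 12 bits, while a 16-bit table would need 8 GiB per operator.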
4.1 Posits and RISC-V vectorization
Vectorization of posit operations was already shown to be worthwhile in the ARM SVE environment [14]. As shown in that work, each function that vectorizes posit operations has three main parts: i) prologue, ii) body, iii) epilogue.
In the prologue, we prepare the vector containing posit data for the underlying vector architecture. This phase differs depending on whether we are implementing an \(\mathcal {L}1\) function or not. In the former case, the prologue is just a reinterpret_cast from the posit type to the underlying integer holder type. When dealing with a non-\(\mathcal {L}1\) function, instead, this phase is also devoted to the conversion from the posit type to a suitable backend type (e.g., for RISC-V we chose IEEE Float32). The body contains the actual implementation of the vectorized operation. Finally, in the epilogue, we build the posit vector back (the same considerations as for the prologue hold here).
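For an \(\mathcal {L}1\) function, the three phases can be illustrated with the scalar sketch below, which processes a Posit\(\left<16,0\right>\) array in chunks as a vector unit would. Posit16 is a hypothetical stand-in for a posit type (not cppPosit's class), the fixed-size lane array plays the role of one vector register, and the chunk length mimics the length granted by vsetvl; std::memcpy replaces reinterpret_cast to keep the reinterpretation well defined in standard C++. The body is the commonly cited fast sigmoid approximation (flip the sign bit, then shift right by two).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical posit<16,0> holder: just the raw 16-bit pattern.
struct Posit16 { std::uint16_t bits; };
static_assert(sizeof(Posit16) == sizeof(std::uint16_t), "no padding expected");

// Applies the fast sigmoid to n posits, one emulated register at a time.
// vlen_bits emulates the vector register length (assumed 16..1024).
void fast_sigmoid_vec(const Posit16* in, Posit16* out, std::size_t n,
                      std::size_t vlen_bits) {
  std::uint16_t lane[64];                           // one emulated register
  const std::size_t vlmax = std::min<std::size_t>(vlen_bits / 16, 64);
  assert(vlmax > 0);
  for (std::size_t i = 0; i < n;) {
    const std::size_t vl = std::min(n - i, vlmax);  // granted length
    // Prologue: reinterpret posits as integer holders (no conversion for L1).
    std::memcpy(lane, in + i, vl * sizeof(std::uint16_t));
    // Body: integer-only fast sigmoid on every lane.
    for (std::size_t j = 0; j < vl; ++j)
      lane[j] = static_cast<std::uint16_t>((lane[j] ^ 0x8000u) >> 2);
    // Epilogue: reinterpret the lanes back as posits and store them.
    std::memcpy(out + i, lane, vl * sizeof(std::uint16_t));
    i += vl;
  }
}
```

For a non-\(\mathcal {L}1\) function, the two memcpy calls would be replaced by posit-to-Float32 and Float32-to-posit conversions, which is where the encoding and decoding cost discussed in Sect. 5 appears.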
Particular attention must be paid to the prologue and epilogue phases: since they employ posits with an overall size of 16 or 8 bits, we are performing data compression by a factor of 2 or 4 with respect to 32-bit floats. Moreover, even if we use 32-bit floats to perform computations, the data expansion is performed only inside the vector processing unit. Therefore, we are transferring compressed data from the scalar CPU registers to the vector ones, and a single vector load instruction can transfer two to four times as much data. This means that simply using posits as compressed information storage can lead to great optimizations in data transfer. Figure 3 shows the overall idea behind posit vectorization in cppPosit.
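As a worked example of this compression argument, take a 512-bit vector register (the length used later for the comparison with the A64FX). The number of elements that one register, and hence one full-width vector load, can hold is vlen divided by the element size:
\[ \frac{512}{32} = 16 \ \text{(Float32)}, \qquad \frac{512}{16} = 32 \ \text{(Posit}\left<16,*\right>\text{)}, \qquad \frac{512}{8} = 64 \ \text{(Posit}\left<8,*\right>\text{)} \]
Expanding 64 Posit\(\left<8,*\right>\) values to Float32 inside the vector unit yields \(64 \cdot 32 = 2048\) bits of working data, while only 512 bits were moved by the load.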
5 Experimental results
In this section, we present the benchmarks of the novel RISC-V “V” vectorized approach on several kernels. Instead of tying our benchmarks to a particular DNN library or implementation, we decided to run them on simple, core building blocks of DNNs.
First, we present benchmarks of vectorized \(\mathcal {L}1\) activation functions such as the sigmoid, the hyperbolic tangent and the exponential linear unit. Second, we present benchmarks of matrix and vector operations such as the dot product, convolution and general matrix-matrix multiplication.
The RISC-V benchmark binaries were generated using the Barcelona Supercomputing Center (BSC) LLVM cross compiler (clang++ 11.0). At the time of writing, this is the only compiler providing high-level intrinsics for the RISC-V vector extension [3]. The RISC-V binaries were then executed on the RISC-V Spike simulator (riscv-isa-sim, RVV version 0.8) [5].
The ARM benchmark binaries were generated using the upstream branch of the GCC compiler (GCC 10.0) with SVE intrinsics support. They were executed on a static, user-space QEMU (QEMU 5.0) installation for ARMv8.2 that supports SVE instructions.
All the simulations were executed on an 8-core Intel(R) Core(TM) i7-9700 CPU with a 3.00 GHz base frequency.
For each benchmark, we analyzed the timing of the prologue, body and epilogue to assess the overhead introduced by the first and last parts (as already discussed in Sect. 4.1). Moreover, we compared the vectorized performance, for varying vector lengths, against the non-vectorized implementation (called naive from now on).
5.1 Vectorization of posit encoding and decoding
Since we could not rely on auto-vectorization due to compiler limitations, we needed to provide a vectorized implementation of posit encoding and decoding for the prologue and epilogue blocks.
Completely decoding a posit means taking its representing integer and extracting the four fields described in Sect. 2.1. This can be accomplished using just arithmetic and logic integer operations. Algorithm 1 shows the steps we used to unpack a posit into its (at most) four fields. Given these fields, we can build a float number.
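Let F be an IEEE 754 FP32 number, with sign bit \(s_F\), 8-bit biased exponent \(es_F\) and 23-bit mantissa \(m_F\):
\[ F = (s_F, es_F, m_F) \]
The represented value of a normal number (cf. (1)) is:
\[ x_F = (-1)^{s_F} \cdot 2^{es_F - 127} \cdot \left(1 + \frac{m_F}{2^{23}}\right) \]
Let P be a posit\(\left<nbits,esbits\right>\):
\[ P = (s_P, k_P, es_P, m_P) \]
where \(es \le esbits\) is the number of exponent bits actually present in P.
The represented real value is:
\[ x_P = (-1)^{s_P} \cdot 2^{k_P \cdot 2^{esbits} + es_P} \cdot \left(1 + \frac{m_P}{2^{fracbits}}\right) \]
where \(fracbits = nbits - 1 - regbits - es\) and regbits is the length of the regime field, stop bit included.
For the conversion, we want \(x_P = x_F\). This implies \(s_P = s_F\) and:
\[ k_P \cdot 2^{esbits} + es_P = es_F - 127 \qquad (2) \]
\[ \frac{m_P}{2^{fracbits}} = \frac{m_F}{2^{23}} \qquad (3) \]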
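A scalar version of this decoding, written for clarity rather than speed, is sketched below together with the value formula (1). It is not Algorithm 1 itself: PositFields, decode_posit and posit_to_double are hypothetical names, the regime is scanned with a loop where a vectorized version would rely on a leading-one search (the role of findLeftMostSet), and the value is returned as a double so it can be checked easily. Truncated exponent bits are read as zeros, as in the posit definition.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical container for the fields of a decoded posit.
struct PositFields {
  bool sign;
  int k;               // regime value
  std::uint32_t exp;   // exponent value (missing low bits read as 0)
  std::uint32_t frac;  // fraction bits, right-aligned
  int fracbits;        // number of fraction bits present
};

// Unpacks an nbits-wide posit (2 < nbits < 32) stored in the low bits of v.
// Assumes v is neither zero nor NaR.
PositFields decode_posit(std::uint32_t v, int nbits, int esbits) {
  const std::uint32_t mask = (1u << nbits) - 1;
  PositFields f{};
  f.sign = ((v >> (nbits - 1)) & 1u) != 0;
  if (f.sign) v = (~v + 1u) & mask;              // 2's complement -> magnitude
  int pos = nbits - 2;                           // first regime bit
  const std::uint32_t first = (v >> pos) & 1u;
  int run = 0;
  while (pos >= 0 && ((v >> pos) & 1u) == first) { ++run; --pos; }
  f.k = first ? run - 1 : -run;
  --pos;                                         // skip the stop bit, if any
  const int remaining = pos >= 0 ? pos + 1 : 0;  // bits left for exp + frac
  const int es = remaining < esbits ? remaining : esbits;
  const std::uint32_t rest = v & ((1u << remaining) - 1);
  f.exp = (es > 0 ? rest >> (remaining - es) : 0u) << (esbits - es);
  f.fracbits = remaining - es;
  f.frac = rest & ((1u << f.fracbits) - 1);
  return f;
}

// Value per (1): (-1)^s * 2^(k * 2^esbits + e) * (1 + frac / 2^fracbits).
double posit_to_double(std::uint32_t v, int nbits, int esbits) {
  v &= (1u << nbits) - 1;
  if (v == 0) return 0.0;
  if (v == (1u << (nbits - 1))) return NAN;      // NaR
  const PositFields f = decode_posit(v, nbits, esbits);
  const int scale = f.k * (1 << esbits) + static_cast<int>(f.exp);
  const double mant = 1.0 + std::ldexp(static_cast<double>(f.frac), -f.fracbits);
  const double x = std::ldexp(mant, scale);
  return f.sign ? -x : x;
}
```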
Encoding a posit means taking any real number representation (e.g., IEEE Float32) and computing the integer that represents the posit as described in Sect. 2.1. This can also be accomplished using just arithmetic and logic integer operations. Algorithm 2 shows the steps we used to build a posit from its (at most) four fields. Once we have decoded the floating-point number into its 3 fields, we can derive the posit fields. From Eq. (2), \(k_P\) is the quotient of the integer (floor) division of \(es_F - 127\) by \(2^{esbits}\), and \(es_P\) is its remainder. This means that we can retrieve the two fields as:
\[ k_P = \left\lfloor \frac{es_F - 127}{2^{esbits}} \right\rfloor, \qquad es_P = (es_F - 127) \bmod 2^{esbits} \]
Next, we consider the fraction. As shown in Eq. (3), retrieving \(m_P\) is equivalent to a logical right shift of \(m_F\) by \(23 - fracbits\) positions, which keeps its fracbits most significant bits.
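These relations can be turned into the scalar encoder sketched below. It is an illustration under stated assumptions, not Algorithm 2: float_to_posit is a hypothetical name, only normal floats and nbits up to 16 are handled, the fraction is truncated (exactly the right shift of Eq. (3)) rather than rounded to nearest as the posit standard requires, and magnitudes outside the posit range saturate to maxpos or minpos.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Encodes a float into posit<nbits, esbits> using Eqs. (2) and (3).
// Assumes 3 <= nbits <= 16; normal inputs only; the fraction is truncated.
std::uint32_t float_to_posit(float x, int nbits, int esbits) {
  const std::uint32_t mask = (1u << nbits) - 1;
  if (x == 0.0f) return 0;
  if (std::isnan(x) || std::isinf(x)) return 1u << (nbits - 1);  // NaR
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const bool sign = (bits >> 31) != 0;
  const int es_f = static_cast<int>((bits >> 23) & 0xFFu);  // biased exponent
  const std::uint32_t m_f = bits & 0x7FFFFFu;               // 23-bit mantissa
  // Eq. (2): floor quotient and remainder of (es_F - 127) by 2^esbits.
  const int scale = es_f - 127;
  const int d = 1 << esbits;
  const int k = scale >= 0 ? scale / d : -((-scale + d - 1) / d);
  const std::uint32_t e = static_cast<std::uint32_t>(scale - k * d);
  const int avail = nbits - 1;                   // bits after the sign
  const int reglen = k >= 0 ? k + 2 : -k + 1;    // run plus stop bit
  std::uint32_t body;
  if (reglen > avail) {
    body = k >= 0 ? (1u << avail) - 1 : 1u;      // saturate: maxpos / minpos
  } else {
    const std::uint32_t regime = k >= 0 ? ((1u << (k + 1)) - 1) << 1 : 1u;
    const int rem = avail - reglen;              // bits for exp + fraction
    const int es = rem < esbits ? rem : esbits;
    const int fracbits = rem - es;
    const std::uint32_t ebits = e >> (esbits - es);      // leading exp bits
    const std::uint32_t fbits = m_f >> (23 - fracbits);  // Eq. (3)
    body = (regime << rem) | (ebits << fracbits) | fbits;
  }
  return sign ? (~body + 1u) & mask : body;      // negative: 2's complement
}
```

A production encoder would add round-to-nearest-even on the discarded bits and subnormal handling; both are left out here to keep the correspondence with Eqs. (2) and (3) visible.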
We implemented both algorithms with hand-vectorization for the RISC-V platform, using the “V” extension intrinsics. The findLeftMostSet function is particularly interesting, since there was no native vectorized implementation of it in the RISC-V “V” extension at the time of development. Algorithm 3 shows our choice for its implementation, which we also wrote using the RISC-V vectorization intrinsics.
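One branch-free way to locate the leftmost set bit, which maps naturally onto per-lane vector compare, merge and shift instructions, is a binary search over halves of the word. The C++ sketch below illustrates the idea on scalars; find_left_most_set and find_left_most_set_all are hypothetical names, and this is not necessarily the scheme of Algorithm 3. Each step checks whether any bit above the lowest `width` bits is set and, if so, shifts them down and adds `width` to the result, so five fixed steps suffice for 32-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the index (0..31) of the most significant set bit, or -1 for 0.
int find_left_most_set(std::uint32_t x) {
  const bool zero = (x == 0);
  int pos = 0;
  for (int width = 16; width >= 1; width /= 2) {
    // 1 if any bit above the lowest `width` bits is set, else 0.
    const std::uint32_t upper = (x >> width) != 0 ? 1u : 0u;
    const int shift = static_cast<int>(upper) * width;
    pos += shift;
    x >>= shift;
  }
  return zero ? -1 : pos;
}

// Lane-wise application, as a vector unit would do on every element at once.
void find_left_most_set_all(const std::uint32_t* in, int* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = find_left_most_set(in[i]);
}
```

Because the iteration count is fixed and each step only needs a comparison and a select, all lanes follow the same instruction sequence, which is the property a vectorized decoder needs when regime lengths differ from element to element.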
5.2 Vectorized activation function benchmarks
In this subsection, we report in Figs. 4 and 5 the results of the benchmarks concerning the vectorization of the activation functions (Sigmoid and ELU, respectively). These benchmarks consisted of the execution of the vectorized activation functions on random vectors of 8192 items, varying the vector length and using Posit\(\left<16,0\right>\) and Posit\(\left<8,0\right>\).
These results are particularly significant: we reduce the processing time of an activation function by at least a factor of 10, simply by exploiting posit properties. Moreover, no comparable bit-level shortcut exists for the floating-point format that would achieve a similar speedup while varying the vector length and data size.
This is extremely important, since modern neural network architectures can require the computation of nonlinear activation functions like ELU on vectors with up to tens of thousands of elements.
5.3 Vectorized matrix-vector operation benchmarks
In this subsection, we show the results of the vectorized matrix-vector operation benchmarks. The benchmark parameters are the following:

Dot product: timing performance of a dot product between vectors of size 64.

Convolution: timing performance of a \(3\times 3\) convolution over matrices of size \(64\times 64\).

Matrix-matrix multiplication (GEMM): timing performance of a multiplication between matrices of size \(32\times 32\).
For each benchmark, we report two different types of result:

Performance comparison (Figs. 6, 7, 8): timing performance comparison for varying vector lengths and with vectorization disabled. We used a logarithmic scale to represent these results in order to increase plot readability.

Posit decoding/encoding overhead evaluation (Figs. 9, 10, 11): timing performance is decomposed to evaluate the contribution of each part of the algorithm.
5.4 Analysis of results and discussions
In this section, we analyse and discuss the results shown above.
First, as reported in Figs. 4 and 5, the speedup gained by the vectorization of activation functions is impressive in both cases, with vectorized execution times one order of magnitude smaller than the non-vectorized ones. Moreover, it is evident how much the algorithm benefits from increasing the vector length and halving the data size. Note that this speedup could not be achieved using the IEEE Float32 format: the series of arithmetic and logic operations can be applied only thanks to the posit representation with zero exponent bits.
Second, for matrix-vector operations, the speedup from vectorized algorithms is again massive, reducing processing time by more than 2 orders of magnitude (note the logarithmic scale of the plots in Figs. 6, 7 and 8).
Third, the same benchmarks were executed in the ARM environment described at the beginning of this section. In particular, we considered a 512-bit vector length for the comparison, to match the first ARM SVE hardware platform on the market (the Fujitsu A64FX processor [2]). As shown in Fig. 12, the vectorization results on the RISC-V platform are comparable to those on the ARM SVE platform, essentially resulting in a draw.
Finally, the newly proposed approach for vectorizing posit decoding and encoding brought a substantial speedup in the prologue and epilogue phases. As we can see from Figs. 9, 10 and 11, the vectorized phases benefit from increasing vector register lengths. However, some overhead still afflicts the epilogue, due to costly conversions between 32-bit integers (for the Float representation) and posit holder integers (16-bit integers in this case).
6 Future work
Having a dedicated posit processing unit is critical to increase the architecture performance by removing the software emulation bottleneck. There are three main ways to equip a RISC-V CPU with a hardware PPU:

1. include the PPU within the processor [21]. This requires the introduction of an additional instruction set and the instrumentation of the compiler.

2. use the PPU as a slave peripheral. The peripheral can be an external FPGA board connected via a PCI Express bus, similar to how GPUs are connected to CPUs nowadays. This solution has the highest latency, since we need to communicate with an external peripheral over a bus.

3. design the PPU as an IP core to be included within the processor chip. This can be viewed as a coprocessor approach, with PPU and CPU on the same SoC (system-on-chip). In terms of latency, this solution is intermediate.
We are currently working to implement a PPU for RISC-V following option 1. However, even without a hardware PPU, the solution proposed in this paper (vectorized posit operations, emulated on the ALU or FPU) is still of interest in particular situations, as discussed below.
Figure 13 shows some computing environments with increasing computing power, cost and energy consumption: i) microcontrollers, ii) CPUs without an FPU, iii) CPUs with an FPU, iv) many-core CPUs without GPUs and v) multi- (or many-) core CPUs with GPUs. By “many-core” CPU (MaCPU) we designate a CPU with more than 100 cores, while “multi-core” CPUs are those having up to 100 cores.
The solution proposed in this work is of interest for cases ii), iii) and iv). Indeed, in case ii) the use of ALU-emulated posit DNNs is particularly interesting and justified (see [13, 15]). In case iii) as well, the approach proposed here is a viable and appealing solution, provided that the CPU supports vectorization. Case iv) is where the solution proposed here is expected to obtain the best speedup, since we can exploit the massive data and instruction parallelism of many-core architectures to increase DNN processing capabilities.
In all cases, the use of posit numbers can at least halve the representation size (when using posit\(\left<16,x\right>\)), thus doubling the information bandwidth and improving cache usage. Moreover, reducing the information size is extremely useful when combined with vector engines: with half the element size, twice as many elements fit inside a vector register, as demonstrated in [14] for ARM CPUs and here for RISC-V CPUs. Finally, when a posit\(\left<8,x\right>\)-based DNN reaches a satisfactory accuracy, the benefit with respect to 32-bit floats is even greater.
7 Conclusions
In this paper, we presented the implementation of posit vector operations for DNNs on the RISC-V open-source hardware platform. We proposed an extension of our cppPosit library enabling dot-product, matrix-matrix and convolution operations in the RISC-V Vector extension environment. Moreover, we provided the implementation of fast approximated activation functions using only vectorized integer arithmetic and logic. As reported in the experimental results, we gained a significant speedup from the hand-vectorization of posit operations, including the encoding and decoding of the novel format to and from the standard IEEE 32-bit floating-point format. Furthermore, we managed to catch up with the ARM SVE results when vectorizing the same operations. These promising results suggest that open-source hardware platforms like RISC-V, along with open-source DNN software implementations, may enable a brand-new class of completely open DNN computing environments.
References
[1] ARM HPC tools for SVE. https://developer.arm.com/toolsandsoftware/serverandhpc/armarchitecturetools/documentation/introducingscalablevectorextensionsve. Accessed 7 July 2020
[2] Post-K Supercomputer with Fujitsu’s Original CPU, A64FX Powered by ARM ISA (2019). https://www.fujitsu.com/global/Images/postk_supercomputer_with_fujitsu’s_original_cpu_a64fx_powered_by_arm_isa.pdf. Accessed 7 July 2020
[3] V for vector: software exploration of the vector extension of RISC-V (2020). https://www.europeanprocessorinitiative.eu/vforvectorsoftwareexplorationofthevectorextensionofriscv/. Accessed 7 July 2020
[4] RISC-V ISA Specification. https://riscv.org/specifications/isaspecpdf/. Accessed 11 March 2020
[5] Spike, a RISC-V ISA Simulator. https://github.com/riscv/riscvisasim. Accessed 7 July 2020
[6] RISC-V History. https://riscv.org/riscvhistory/. Accessed 28 May 2020
[7] Asanović K, Patterson DA (2014) Instruction sets should be free: the case for RISC-V. EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2014-146
[8] Burgess N, Milanovic J, Stephens N, Monachopoulos K, Mansell D (2019) Bfloat16 processing for neural networks. In: 2019 IEEE 26th Symposium on Computer Arithmetic (ARITH), pp 88–91. https://doi.org/10.1109/ARITH.2019.00022
[9] Carmichael Z, Langroudi HF, Khazanov C, Lillie J, Gustafson JL, Kudithipudi D (2019) Deep positron: a deep neural network using the posit number system. In: 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp 1421–1426
[10] Chaurasiya R, Gustafson J, Shrestha R, Neudorfer J, Nambiar S, Niyogi K, Merchant F, Leupers R (2018) Parameterized posit arithmetic hardware generator. In: 2018 IEEE 36th International Conference on Computer Design (ICCD), pp 334–341. https://doi.org/10.1109/ICCD.2018.00057
[11] Cococcioni M, Rossi F, Ruffaldi E, Saponara S (2019) A fast approximation of the hyperbolic tangent when using posit numbers and its application to deep neural networks. In: Int. Workshop on Applic. in Electronics Pervading Ind., Envir. and Society (ApplePies’19). https://doi.org/10.1007/9783030372774_25
[12] Cococcioni M, Rossi F, Ruffaldi E, Saponara S (2019) Novel arithmetics to accelerate machine learning classifiers in autonomous driving applications. In: Proc. of the 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS’19), pp 779–782. https://doi.org/10.1109/ICECS46596.2019.8965031
[13] Cococcioni M, Rossi F, Ruffaldi E, Saponara S (2020) Fast approximations of activation functions in deep neural networks when using posit arithmetic. Sensors 20(5). https://www.mdpi.com/14248220/20/5/1515
[14] Cococcioni M, Rossi F, Ruffaldi E, Saponara S (2020) Fast deep neural networks for image processing using posits and ARM Scalable Vector Extension. Journal of Real-Time Image Processing, pp 1–13. https://doi.org/10.1007/s1155402000984x
[15] Cococcioni M, Rossi F, Ruffaldi E, Saponara S (2021) Novel arithmetics in deep neural networks signal processing for autonomous driving: challenges and opportunities. IEEE Signal Processing Magazine, Special Issue on Autonomous Driving, 38(1):97–110. https://doi.org/10.1109/MSP.2020.2988436
[16] Cococcioni M, Ruffaldi E, Saponara S (2018) Exploiting posit arithmetic for deep neural networks in autonomous driving applications. In: Proc. of the 2018 IEEE International Conference of Electrical and Electronic Technologies for Automotive (Automotive’18), pp 1–6. https://doi.org/10.23919/EETA.2018.8493233
[17] Fatemi Langroudi SH, Carmichael Z, Gustafson J, Kudithipudi D (2019) PositNN framework: tapered precision deep learning inference for the edge, pp 53–59. https://doi.org/10.1109/SpaceComp.2019.00011
[18] Gustafson JL (2015) The end of error: unum computing. Chapman and Hall/CRC, Cambridge
[19] Gustafson JL, Yonemoto IT (2017) Beating floating point at its own game: posit arithmetic. Supercomput Front Innov 4(2):71–86
[20] Jaiswal MK, So HKH (2018) Universal number posit arithmetic generator on FPGA. In: 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp 1159–1162. https://doi.org/10.23919/DATE.2018.8342187
[21] Jaiswal MK, So HKH (2019) PACoGen: a hardware posit arithmetic core generator. IEEE Access 7:74586–74601
[22] Johnson J (2018) Rethinking floating point for deep learning. CoRR. http://arxiv.org/abs/1811.01721. Accessed 7 July 2020
[23] Köster U, Webb T, Wang X, Nassar M, Bansal AK, Constable W, Elibol O, Gray S, Hall S, Hornof L et al (2017) Flexpoint: an adaptive numerical format for efficient training of deep neural networks. In: Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS’17), pp 1742–1752
[24] Lu J, Fang C, Xu M, Lin J, Wang Z (2020) Evaluations on deep neural networks training using posit number system. IEEE Transactions on Computers, pp 0–1
[25] Murillo R, Del Barrio AA, Botella G (2020) Deep pensieve: a deep learning framework based on the posit number system. Digital Signal Process 102:102762
[26] Popescu V, Nassar M, Wang X, Tumer E, Webb T (2018) Flexpoint: predictive numerics for deep learning. In: Proceedings of the 25th IEEE Symposium on Computer Arithmetic (ARITH’18), pp 1–4. https://doi.org/10.1109/ARITH.2018.8464801
[27] Waterman A, Lee Y, Patterson DA, Asanovic K (2011) The RISC-V instruction set manual, volume I: base user-level ISA. EECS Department, UC Berkeley, Technical Report UCB/EECS-2011-62 116
[28] Zhang H, Ko S (2020) Design of power efficient posit multiplier. IEEE Trans Circuits Syst II Express Briefs 67(5):861–865
Acknowledgements
The present study has been possible thanks to the support of the Barcelona Supercomputing Center team dedicated to the RISC-V extension. We conducted this work with their invaluable support, as a common partner in the same H2020 EPI project. To them goes our most sincere appreciation.
Funding
Open access funding provided by Università di Pisa within the CRUI-CARE Agreement.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Cococcioni, M., Rossi, F., Ruffaldi, E. et al. Vectorizing posit operations on RISC-V for faster deep neural networks: experiments and comparison with ARM SVE. Neural Comput & Applic 33, 10575–10585 (2021). https://doi.org/10.1007/s00521021058140
DOI: https://doi.org/10.1007/s00521021058140