A Fast Global AVF Calculation Methodology for Multi-core Reliability Assessment

  • Jiajia Jiao (corresponding author)
  • Dezhi Han
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 931)

Abstract

Soft-error-induced bit upsets have received increasing attention in reliable processor design. To measure processor reliability, the Architectural Vulnerability Factor (AVF) is often calculated by fast Architectural Correct Execution (ACE) analysis or by accurate fault injection in a single CPU core (e.g., Alpha, x86, ARM) or GPU. However, AVF calculation for an entire multicore system composed of several CPU cores, GPU, caches and memory banks mostly depends on time-consuming realistic laser tests or complex fault injection (days or years). To shorten the evaluation time, this paper presents a partition-based AVF calculation methodology that proceeds from local to global. The approach combines the local AVF of each component with Input-Output Masking (IOM) calculation between components, using probability theory to compute the global AVF quickly and cost-effectively. Comprehensive simulation results for a 7-way parallelized motion detection application demonstrate that the error location and the error propagation path affect global AVF values. The probability-driven global AVF estimation takes only on the order of seconds.
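
The local-to-global idea can be illustrated with a minimal Python sketch. It is not the authors' model: it assumes that masking events along a propagation path are independent, that an upset becomes a system-level error only if it survives its own component (local AVF) and is not masked by any downstream component (IOM), and that components are weighted by their bit counts. The function names, the bit counts and all numeric values are hypothetical.

```python
from math import prod


def path_avf(local_avf, downstream_iom):
    """Probability that an upset in one component reaches the system output.

    Assumption: each downstream component masks the error independently
    with probability given by its IOM value.
    """
    return local_avf * prod(1.0 - m for m in downstream_iom)


def global_avf(components):
    """Bit-weighted average of per-component path AVFs (assumed weighting).

    components: list of (bits, local_avf, downstream_iom) tuples.
    """
    total_bits = sum(bits for bits, _, _ in components)
    weighted = sum(bits * path_avf(avf, iom) for bits, avf, iom in components)
    return weighted / total_bits


# Hypothetical three-component system: a core feeding a cache feeding memory.
example = [
    (1024, 0.30, [0.5, 0.2]),  # core: error must pass through cache and memory
    (4096, 0.10, [0.2]),       # cache: error must pass through memory
    (8192, 0.05, []),          # memory: last stage, no downstream masking
]

if __name__ == "__main__":
    print(f"global AVF estimate: {global_avf(example):.4f}")
```

Because the estimate is a closed-form product and sum rather than a simulation campaign, it is cheap to evaluate, and changing either the faulty component or the list of downstream components changes the result, which matches the abstract's remark about error location and propagation path.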

Keywords

  • Multicore
  • Soft error
  • Architectural Vulnerability Factor
  • Input-Output Masking
  • Reliability assessment

Notes

Acknowledgement

This work is supported by the National Natural Science Foundation under grant No. 61502298 and by the Innovation Program of Shanghai Maritime University.


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Shanghai Maritime University, Shanghai, China
