Software Schemes for Resilience Against Soft Errors
The exponentially growing rate of soft errors makes reliability a major concern in modern processor design. Initial efforts to protect computation from soft errors were at the hardware level. However, software solutions offer flexible protection (protecting only critical applications or critical parts of an application) and can protect computation even on existing off-the-shelf processors. As a result, several software approaches for detecting soft errors have been developed over the last two to three decades. In this decade, we have been double-checking these software techniques and re-evaluating how good they really are. Existing software detection techniques can be classified into two main groups: i) duplication-based and ii) control flow checking. We have demonstrated that existing control flow checking techniques are not effective in detecting silent data corruptions (SDCs); in fact, they may actually increase a program’s vulnerability to soft errors. We have also shown that state-of-the-art instruction duplication techniques such as SWIFT are unable to detect many SDCs and are not as effective as advertised. From this work, we gather insights into the vulnerability windows of programs and components, and propose a technique, nZDC (nearly Zero silent Data Corruption), to detect (almost) all soft errors. Extensive fault injection experiments on almost all the unprotected microarchitectural components of a gem5-simulated ARM Cortex-A53 demonstrate that nZDC is extremely effective, without incurring any more performance penalty than the state of the art. Further, I will talk about our techniques to detect and recover from soft errors.
Prof. Aviral Shrivastava is an Associate Professor in the School of Computing, Informatics, and Decision Systems Engineering at Arizona State University, where he has established and heads the Compiler and Microarchitecture Labs (CML).
He received his Ph.D. and Master's degrees in Information and Computer Science from the University of California, Irvine, and his bachelor's degree in Computer Science and Engineering from the Indian Institute of Technology, Delhi. He is a 2011 NSF CAREER Award recipient and the recipient of the 2012 Outstanding Junior Researcher award in CSE at ASU. His work has received several best paper nominations, including at DAC 2017, and a best student paper award at VLSI 2016. His students have received the outstanding Ph.D. student award in CSE at ASU in 2017 and the outstanding MS student award in CSE at ASU in 2012 and 2010.
Prof. Shrivastava’s research lies in the broad area of “Software for Embedded and Cyber-Physical Systems.” More specifically, he is interested in i) compilers and microarchitectures for heterogeneous and many-core computing, ii) protecting computation from soft errors, and iii) precise timing for cyber-physical systems.
His research is funded by NSF, DOE, NIST, and several industry partners, including Microsoft, Raytheon Missile Systems, Intel, and Nvidia. He currently serves as an associate editor for ACM Transactions on Embedded Computing Systems (ACM TECS), IEEE Transactions on MultiScale Computing (IEEE TMSC), IEEE Transactions on Computer-Aided Design (IEEE TCAD), Springer International Journal on Parallel Processing (Springer IJPP), and Springer Design Automation for Embedded Systems (Springer DAEM), and as a guest editor for ACM Transactions on Cyber-Physical Systems (ACM TCPS). He has served as the program chair of CODES+ISSS 2017 and 2018, and of LCTES 2019.