Abstract
With continued technology scaling, circuit reliability has become extraordinarily challenging. In fact, we are at an inflection point where further scaling is hardly feasible for many designs, as evidenced by various SoC vendors that do not go beyond 14nm for some of their products. Although technology in the nano-CMOS era enables the design of more powerful on-chip systems, designers must face the fact that their circuits may not function as intended due to a wide range of degradation effects that occur during the lifetime and even from the very first moment of operation.
We focus on the following key categories of degradation effects:
1) Time-dependent Degradation Effects: They are due to aging phenomena that shift the electrical characteristics of transistors causing unreliable behavior of circuits over time.
2) Time-independent Degradation Effects: They are due to manufacturing variability that causes fluctuations in the geometry and electrical characteristics of transistors. In addition, noise due to extrinsic sources (e.g., energetic particles) or intrinsic sources causes unpredictable fluctuations that may suddenly degrade the reliability.
3) Degradation-induced Security Attacks: Technology scaling is approaching a limit at which displacing a few atoms within transistors due to aging phenomena may permanently or temporarily endanger the functionality of the entire on-chip system, leading to new kinds of security attacks. Unlike the traditional security attacks that have been intensively studied in the last few decades, such attacks cannot be traced back because the underlying mechanisms can naturally heal and recover after the attack ends. Therefore, such emerging security attacks due to degradation effects are significantly more challenging, and research on them is still in its infancy [9, 10].
Research Group Leader:
Dr.-Ing. Hussam Amrouch
Our goal is to study the aforementioned degradation effects from the physical level to the system level [1, 2] in order to provide designers at the system level with abstracted, yet sufficiently accurate reliability estimations. To achieve that, we collaborate with different international research groups with varied expertise.
We were the first to:
- Investigate instantaneous aging effects, revealing that aging effects may occur in the μs domain [3, 4]. This shifts aging from a solely long-term reliability challenge (as treated by the state of the art) to both a short-term and a long-term one.
- Investigate the interdependencies of degradation effects at the physical level [5, 6], demonstrating their impact on computing (i.e. designing more effective guardbands).
- Develop an infrared measurement setup for multi-core chips that provides accurate online thermal measurements, as it allows the infrared camera to directly measure the thermal emissions of the chip without any layer in between [7, 8].
Released Software Tools:
Our software tools, libraries and models are publicly available under an academic license for non-commercial use, allowing other researchers to employ them directly without any further modifications. They can be obtained from our download page.
Selected Publications:
Copyright disclaimer: The PDF documents contained in this page are included to ensure timely dissemination of scholarly work. Copyright and all rights therein are maintained by the authors or by other copyright holders such as IEEE, ACM, and others.
- [1] Hussam Amrouch, Behnam Khaleghi, Andreas Gerstlauer, Jörg Henkel, “Reliability-Aware Design to Suppress Aging”, in ACM/EDAC/IEEE 53rd Design Automation Conference (DAC’16), Austin, TX, USA, June 5-9, 2016 (accepted), PDF.
- [2] Hussam Amrouch, Javier Martin-Martinez, Victor van Santen, Miquel Moras, Rosana Rodriguez, Montserrat Nafria, Jörg Henkel, “Connecting the Physical and Application Level Towards Grasping Aging Effects”, in IEEE 53rd International Reliability Physics Symposium (IRPS’15), CA, USA, pp. 3D.1.1-3D.1.8, 2015, PDF.
- [3] Victor M. van Santen, Hussam Amrouch, Javier Martin-Martinez, Montserrat Nafria, Jörg Henkel, “Designing Guardbands for Instantaneous Aging Effects”, in ACM/EDAC/IEEE 53rd Design Automation Conference (DAC’16), Austin, TX, USA, June 5-9, 2016 (accepted), PDF.
- [4] Victor M. van Santen, Hussam Amrouch, Narendra Parihar, Souvik Mahapatra, Jörg Henkel, “Aging-Aware Voltage Scaling”, in IEEE/ACM 19th Design, Automation and Test in Europe Conference (DATE’16), Dresden, Germany, pp. 576-581, 2016, PDF.
- [5] Hussam Amrouch, Victor van Santen, Jörg Henkel, “Interdependencies of Degradation Effects and their Impact on Computing”, in IEEE Design & Test, 2016 (to appear), PDF.
- [6] Hussam Amrouch, Victor M. van Santen, Thomas Ebi, Volker Wenzel, Jörg Henkel, “Towards interdependencies of aging mechanisms”, in IEEE/ACM 33rd International Conference on Computer-Aided Design (ICCAD’14), San Jose, USA, pp. 478-485, 2014, PDF.
- [7] Hussam Amrouch, Jörg Henkel, “Lucid Infrared Thermography of Thermally-Constrained Processors”, in ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED’15), Rome, Italy, July 22-24, 2015, PDF.
- [8] Alok Prakash, Hussam Amrouch, Muhammad Shafique, Tulika Mitra, Jörg Henkel, “Improving Mobile Gaming Performance through Cooperative CPU-GPU Thermal Management”, in ACM/EDAC/IEEE 53rd Design Automation Conference (DAC’16), Austin, TX, USA, June 5-9, 2016 (Best Paper Candidate), PDF.
- [9] Hussam Amrouch, Prashanth Krishnamurthy, Naman Patel, Jörg Henkel, Ramesh Karri, Farshad Khorrami, “Emerging (Un-)Reliability Based Security Threats and Mitigations for Embedded Systems”, in IEEE/ACM International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES’17), Seoul, South Korea, October 15-20, 2017, PDF.
- [10] Interview in the LOOKKIT Magazine, PDF
Research Projects
Security for Cyber-physical Systems
There have been repeated documented cyber attacks on control processes in cyber-physical systems (CPSs), and this trend is on the rise. Attackers could inject unwanted or malicious software that runs in the background of the embedded control system. Such software could leak data, corrupt processes, provide access to unauthorized users and lead to catastrophic problems. Cyber security has therefore become a major concern for researchers, industry and governments alike.
In our project, we are the first to demonstrate thermal side-channel attacks using infrared images of processor chips. To achieve that, we built a unique setup in which the processor chip is cooled from underneath in order to obtain accurate, high-resolution thermal images of the die of a multi-core processor. Our project aims at providing the processor design industry with a new methodology for detecting attacks based on the run-time thermal behavior of processors. In this project, we collaborate with researchers at New York University who conduct research in cyber-physical systems.
In addition, we investigate, from the physical level all the way to the system level, how reliability degradation due to, e.g., aging mechanisms can open the door to emerging hardware security threats and attacks in current and upcoming technology nodes.
This is because technology scaling is approaching a limit at which displacing a few atoms within transistors due to aging phenomena may permanently or temporarily endanger the functionality of the entire on-chip system, leading to new kinds of security attacks. Unlike the traditional security attacks that have been intensively studied in the last few decades, such attacks cannot be traced back because the underlying mechanisms can naturally heal and recover after the attack ends. Therefore, such emerging security attacks due to degradation effects are significantly more challenging, and research on them is still in its infancy.
Contact:
Research Group Leader: Dr.-Ing. Hussam Amrouch
Publication:
[1] Hussam Amrouch, Prashanth Krishnamurthy, Naman Patel, Jörg Henkel, Ramesh Karri, Farshad Khorrami, "Emerging (Un-)Reliability Based Security Threats and Mitigations for Embedded Systems" in IEEE/ACM International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES'17), Seoul, South Korea, October 15-20, 2017.
[2] Hussam Amrouch, Jörg Henkel, "Lucid Infrared Thermography of Thermally-Constrained Processors" in ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'15), Rome, Italy, July 22-24, 2015.
[3] Interview in the LOOKKIT Magazine
Instantaneous Aging Effects
Instantaneous aging is a recent discovery that has large implications for circuit reliability, yet it has hardly been explored so far. Although aging in general has been known for a long time and its physical causes, such as bias temperature instability (BTI), have been extensively studied in the last decade, investigating the impact of instantaneous aging on circuit reliability is still in its infancy. Traditionally, aging is treated as a long-term effect that gradually degrades the chip’s performance over months or years. However, it has recently been reported that the effects of BTI may also be observed in a significantly shorter time domain, i.e. on the order of milliseconds or even microseconds. This is because technology scaling is reaching a limit at which displacing a few atoms within a transistor due to aging may endanger its entire functionality. This is a paradigm shift from viewing aging as a solely long-term reliability degradation, as in the traditional view, to viewing it also as a short-term and even instantaneous one.
Our model estimates the instantaneous effects of aging jointly with its long-term effects. The model can be employed within our SRAM reliability estimation tool to explore the impact of instantaneous aging effects on the reliability of SRAM cells.
Further details can be found in the following paper:
- Victor M. van Santen, Hussam Amrouch, Javier Martin-Martinez, Montserrat Nafria, Jörg Henkel “Designing Guardbands for Instantaneous Aging Effects” in ACM/EDAC/IEEE 53rd Design Automation Conference (DAC’16), Austin, TX, USA, June 5-9, 2016, PDF
Reliability-Aware Cell Libraries
Due to technology scaling, circuit reliability has become extraordinarily challenging. Reliability-aware circuit design flows virtually do not exist, and even research on them is in its infancy. In this project, we bring reliability awareness to EDA tool flows based on so-called degradation-aware cell libraries. These libraries include detailed delay and power information of gates/cells under the impact that degradation effects have on the electrical characteristics of pMOS and nMOS transistors, such as threshold voltage (Vth), carrier mobility (µ), drain current (Id), etc. Our modeling of degradation effects starts from the physical level, where the mechanisms behind them originate, and extends all the way up to the system level, where degradation effects manifest themselves as failures.
Degradation-aware cell libraries and tool flows are indispensable for not only accurately estimating the required timing guardbands of circuits, but also efficiently containing them. By considering degradation effects during logic synthesis, significantly more resilient circuits can be obtained.
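To make the notion of a timing guardband concrete, the following minimal sketch computes the critical-path delay of a tiny netlist twice, once with fresh and once with aged gate delays, and takes the difference as the guardband. It is an illustration only and not our tool flow: the netlist, the gate names and all delay values are hypothetical, and in practice the delays would come from the fresh and degradation-aware cell libraries via static timing analysis.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_delay(fanin, gate_delay):
    """Longest arrival time over a combinational netlist given as a DAG.

    fanin maps each gate to the list of gates driving it; gate_delay maps
    each gate to its propagation delay in picoseconds.
    """
    arrival = {}
    # static_order() yields every gate after all of its driving gates.
    for gate in TopologicalSorter(fanin).static_order():
        latest_input = max((arrival[d] for d in fanin.get(gate, [])), default=0.0)
        arrival[gate] = latest_input + gate_delay[gate]
    return max(arrival.values())


# Hypothetical four-gate netlist: g1 and g2 drive g3, which drives g4.
FANIN = {"g1": [], "g2": [], "g3": ["g1", "g2"], "g4": ["g3"]}
# Hypothetical per-gate delays (ps) taken from a fresh and an aged library.
FRESH_PS = {"g1": 20.0, "g2": 25.0, "g3": 30.0, "g4": 15.0}
AGED_PS = {"g1": 22.0, "g2": 29.0, "g3": 33.0, "g4": 16.0}

fresh = critical_path_delay(FANIN, FRESH_PS)
aged = critical_path_delay(FANIN, AGED_PS)
guardband_ps = aged - fresh  # extra timing margin needed to tolerate aging
print(f"fresh={fresh} ps, aged={aged} ps, guardband={guardband_ps} ps")
```

Because gates age differently depending on their stress, the critical path after aging need not be the same path as in the fresh circuit, which is why the longest path is recomputed rather than scaled by a single factor.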
In the current version of our cell libraries, we consider the Bias Temperature Instability (BTI) aging phenomenon. The degradation-aware cell libraries have been created under varied aging stress scenarios. Libraries are ready to be used within existing tool flows (e.g., Synopsys).
In the near future, we plan to include other degradation effects and release updated versions of our degradation-aware cell libraries.
Further details can be found in the following paper:
- Hussam Amrouch, Behnam Khaleghi, Andreas Gerstlauer, Jörg Henkel “Reliability-Aware Design to Suppress Aging” in ACM/EDAC/IEEE 53rd Design Automation Conference (DAC), Austin, TX, USA, June 5-9, 2016, PDF
Interdependencies of Degradation Effects
Transistor degradation such as aging, variability, etc. has been known for a long time, and some of the effects were discovered more than a century ago. Nevertheless, it was only around a decade ago that degradation effects became relevant to the reliability and lifetime of circuits, since feature sizes of transistors began to approach atomic levels.
Research that investigates the interdependencies of these and other degradation effects, however, is still in its infancy. Since the causes are of physical origin, it cannot be excluded that degradation effects influence each other, i.e. amplify or cancel each other at least to some degree. Hence, degradation effects should be investigated jointly and not separately.
In this work, we report that there are indeed non-negligible interdependencies of degradation effects. We quantify these interdependencies for the first time and show their impact on computing.
Our developed model considers the joint impact (from a physical point of view) of the following degradation effects:
- Bias Temperature Instability (BTI).
- Random Telegraph Noise (RTN).
- Hot Carrier Induced Degradation (HCID).
- Random Dopant Fluctuation (RDF).
- Process Variation in Transistor Geometry (PV).
- Intrinsic Noise Source: 1/f, Thermal, Shot Noise.
- Extrinsic Noise Source due to energetic particles (radiation).
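The following sketch gives a simple circuit-level intuition for why effects should be evaluated jointly rather than separately. It does not reproduce our physical model of the interdependencies; it only uses the well-known alpha-power law for gate delay, which is nonlinear in the threshold voltage, so the delay increase caused by two simultaneous Vth shifts differs from the sum of the increases each shift causes alone. The supply voltage, the exponent and the shift values are assumptions chosen for illustration.

```python
def gate_delay(vth, vdd=0.9, alpha=1.3, k=1.0):
    """Relative gate delay from the alpha-power law: k * vdd / (vdd - vth)**alpha."""
    if vth >= vdd:
        raise ValueError("threshold voltage must stay below the supply voltage")
    return k * vdd / (vdd - vth) ** alpha


VTH0 = 0.30     # assumed fresh threshold voltage (V)
SHIFT_A = 0.03  # assumed Vth shift from one effect, e.g. aging (V)
SHIFT_B = 0.02  # assumed Vth shift from another effect, e.g. variability (V)

base = gate_delay(VTH0)
# Evaluate each effect in isolation and add up the resulting delay increases.
separate = (gate_delay(VTH0 + SHIFT_A) - base) + (gate_delay(VTH0 + SHIFT_B) - base)
# Evaluate both effects acting on the same transistor at once.
joint = gate_delay(VTH0 + SHIFT_A + SHIFT_B) - base

print(f"sum of separate delay increases: {separate:.4f}")
print(f"joint delay increase:            {joint:.4f}")
```

Under these assumptions the joint increase is larger than the sum of the separate ones, so a guardband derived by adding up individually analyzed effects would be too small. The physical interdependencies reported in this work come in addition to this purely circuit-level nonlinearity.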
Further details can be found in the following paper:
- Hussam Amrouch, Victor van Santen, Jörg Henkel “Interdependencies of Degradation Effects and their Impact on Computing” in IEEE Design & Test, 2016 (to appear).
Reliability Estimation Tool for SRAM Cells
In this tool, we focus on abstracting the various degradation effects from the physical level all the way up to the system level. In addition, we model the impact of radiation on generating soft errors in SRAM cells using a physical particle simulator. We study different types of 6T SRAM cells (e.g. high performance, low power, etc.). The key goal is to provide designers and software developers at the system level with an abstracted, yet sufficiently accurate reliability estimation expressed as a probabilistic fault analysis. Such a failure probability is a more meaningful measure of reliability than the quantification metrics employed by the state of the art, such as SNM or Vth.
As an example, we study the register file of different processor architectures (MIPS, SPARCV8, ALPHA).
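As a rough illustration of how a per-cell failure probability can be lifted to a system-level figure, the sketch below computes the probability that at least one SRAM cell in a register file fails. It is not the tool itself: the register file size, the per-cell failure probability and, in particular, the assumption that cells fail independently of each other are simplifications made only for this example.

```python
import math


def prob_any_cell_fails(cell_fail_probs):
    """Probability that at least one cell fails, assuming independent cells."""
    # Sum log1p(-p) instead of multiplying (1 - p) to stay accurate for tiny p.
    log_all_ok = sum(math.log1p(-p) for p in cell_fail_probs)
    return -math.expm1(log_all_ok)


# Hypothetical register file: 32 registers of 32 bits, one 6T cell per bit.
REGISTERS = 32
BITS_PER_REGISTER = 32
P_CELL = 1e-6  # assumed failure probability of a single cell

cells = [P_CELL] * (REGISTERS * BITS_PER_REGISTER)
p_register_file = prob_any_cell_fails(cells)
print(f"P(at least one faulty cell) = {p_register_file:.6e}")
```

Passing a list of probabilities rather than a single value allows each cell to carry its own failure probability, for example when cells experience different stress depending on the data they store.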
Further details can be found in the following papers:
- Hussam Amrouch, Victor van Santen, Jörg Henkel “Interdependencies of Degradation Effects and their Impact on Computing” in IEEE Design & Test, 2016 (to appear), PDF
- Hussam Amrouch, Victor M. van Santen, Thomas Ebi, Volker Wenzel, Jörg Henkel, “Towards interdependencies of aging mechanisms”, IEEE/ACM 33rd International Conference on Computer-Aided Design (ICCAD’14), San Jose, USA, pp. 478-485, 2014, PDF
Lucid Infrared Thermography
Thermal analysis is a prerequisite for developing reliability-increasing techniques for thermally-constrained processors, i.e. processors with a high power density. For that purpose, infrared (IR) camera measurement setups have been deployed to provide direct feedback on the impact of thermal mitigation techniques. To obtain lucid IR images, the IR-opaque cooling must be removed, and hence an alternative IR-transparent cooling needs to be provided to protect the chip. To this end, the majority of state-of-the-art setups employ an IR-transparent coolant liquid to prevent the chip from overheating. The problem is that several aspects, such as thermal convection, may interfere with the measured IR radiation, resulting in equivocal IR images. This reduces the measurement accuracy and in turn leads to incorrect reliability estimates. To solve this problem, we introduce a cooling solution that cools the chip from its rear side, allowing the camera to capture the IR emissions clearly, as no additional layer in between impedes the radiation. It maintains the on-chip temperatures within a safe range equivalent to the original heat sink-based cooling. In our work, we demonstrate how inaccurate thermal analysis in the state of the art results in incorrectly estimating reliability. To our knowledge, our setup is the most accurate and least intrusive one that has been both proposed and actually applied to state-of-the-art multi-cores.
Following the success of our setup, we also built one for the research group of Prof. Tulika Mitra at the National University of Singapore (NUS), Singapore. That setup is mainly employed to study the thermal behavior of state-of-the-art mobile processors.
Further details can be found in the following papers:
- Hussam Amrouch, Jörg Henkel “Lucid Infrared Thermography of Thermally-Constrained Processors” in ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED’15), Rome, Italy, July 22-24, 2015, PDF
- Alok Prakash, Hussam Amrouch, Muhammad Shafique, Tulika Mitra, Jörg Henkel “Improving Mobile Gaming Performance through Cooperative CPU-GPU Thermal Management” in ACM/EDAC/IEEE 53rd Design Automation Conference (DAC), Austin, TX, USA, June 5-9, 2016, PDF
- Daniel Palomino, Muhammad Shafique, Hussam Amrouch, Altamiro Susin, Jörg Henkel, “hevcDTM: Application-Driven Dynamic Thermal Management for High Efficiency Video Coding” in The IEEE/ACM 17th Design Automation and Test in Europe Conference (DATE'14), Dresden, Germany, March 24-28, 2014.