
Dependable Hardware

Registration

Our software is provided free of charge under the academic license for non-commercial use. For commercial or industrial use, please contact us (amrouch∂kit.edu).

To download our software, please select the appropriate package and fill in the registration form. You will receive a confirmation email followed by a second email with the download link within 24 hours.

In case of any problem, please contact: amrouch∂kit.edu, victor.santen∂kit.edu


Abstract

Throughout technology scaling, circuit reliability has become extraordinarily challenging. In fact, we are at an inflection point where further technology scaling is hardly feasible for many designs, as documented by various SoC vendors that do not go beyond 14nm for some of their products. Although technology in the nano-CMOS era enables the design of more powerful on-chip systems, designers must face the inevitable challenge that their circuits may not function as intended due to a wide range of degradation effects that occur during the lifetime and even from the very first moment.

We focus on the following key categories of degradation effects:
1) Time-dependent Degradation Effects: They are due to aging phenomena that shift the electrical characteristics of transistors causing unreliable behavior of circuits over time.
2) Time-independent Degradation Effects: They are due to manufacturing variability that causes fluctuations in the geometry and electrical characteristics of transistors. In addition, noise due to extrinsic sources (e.g., energetic particles) or intrinsic sources causes unpredictable fluctuations that may suddenly degrade the reliability.

Our goal is to study the aforementioned degradation effects from the physical level to the system level [1, 2] in order to provide designers at the system level with abstracted, yet sufficiently accurate reliability estimations. To achieve that, we collaborate with different international research groups with varied expertise.

We were the first to:

  1. Investigate instantaneous aging effects, revealing that aging effects may occur in the μs domain [3, 4]. This shifts aging from a solely long-term challenge (as treated by the state of the art) to both a short-term and a long-term reliability challenge.
  2. Investigate the interdependencies of degradation effects at the physical level [5, 6], demonstrating their impact on computing (i.e. on designing more effective guardbands).
  3. Develop an infrared measurement setup for multi-core chips that provides accurate online thermal measurements, as it allows the infrared camera to directly measure the thermal emissions of the chip without any layer in between [7, 8].

Released Software Tools:
Our software tools, libraries and models are publicly available under the academic license for non-commercial use, allowing other researchers to employ them directly without requiring any further modifications. To visit our download page, please click here.

Research Group Leader:
Dr.-Ing. Hussam Amrouch, amrouch∂kit.edu.

Selected Publications:
Copyright disclaimer: The PDF documents contained in this page are included to ensure timely dissemination of scholarly work. Copyright and all rights therein are maintained by the authors or by other copyright holders such as IEEE, ACM, and others.

  • [1] Hussam Amrouch, Behnam Khaleghi, Andreas Gerstlauer, Jörg Henkel, “Reliability-Aware Design to Suppress Aging”, in ACM/EDAC/IEEE 53rd Design Automation Conference (DAC’16), Austin, TX, USA, June 5-9, 2016 (accepted), PDF.
  • [2] Hussam Amrouch, Javier Martin-Martinez, Victor van Santen, Miquel Moras, Rosana Rodriguez, Montserrat Nafria and Jörg Henkel, “Connecting the Physical and Application Level Towards Grasping Aging Effects”, in IEEE 53rd International Reliability Physics Symposium (IRPS’15), CA, USA, pp. 3D.1.1-3D.1.8, 2015, PDF.
  • [3] Victor M. van Santen, Hussam Amrouch, Javier Martin-Martinez, Montserrat Nafria, Jörg Henkel, “Designing Guardbands for Instantaneous Aging Effects”, in ACM/EDAC/IEEE 53rd Design Automation Conference (DAC’16), Austin, TX, USA, June 5-9, 2016 (accepted), PDF.
  • [4] Victor M. van Santen, Hussam Amrouch, Narendra Parihar, Souvik Mahapatra, Jörg Henkel, “Aging-Aware Voltage Scaling”, in IEEE/ACM 19th Design, Automation and Test in Europe Conference (DATE’16), Dresden, Germany, pp. 576-581, 2016, PDF.
  • [5] Hussam Amrouch, Victor van Santen, Jörg Henkel, “Interdependencies of Degradation Effects and their Impact on Computing”, in IEEE Design & Test, 2016 (to appear), PDF.
  • [6] Hussam Amrouch, Victor M. van Santen, Thomas Ebi, Volker Wenzel, Jörg Henkel, “Towards interdependencies of aging mechanisms”, in IEEE/ACM 33rd International Conference on Computer-Aided Design (ICCAD’14), San Jose, USA, pp. 478-485, 2014, PDF.
  • [7] Hussam Amrouch, Jörg Henkel, “Lucid Infrared Thermography of Thermally-Constrained Processors”, in ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED’15), Rome, Italy, July 22-24, 2015, PDF.
  • [8] Alok Prakash, Hussam Amrouch, Muhammad Shafique, Tulika Mitra, Jörg Henkel, “Improving Mobile Gaming Performance through Cooperative CPU-GPU Thermal Management”, in ACM/EDAC/IEEE 53rd Design Automation Conference (DAC’16), Austin, TX, USA, June 5-9, 2016 (Best Paper Candidate), PDF.

Research Projects

Instantaneous Aging Effects

Instantaneous aging is a recent discovery with large implications for circuit reliability, and it has hardly been explored so far. Though aging in general has been known for a long time and the physical causes behind it, such as bias temperature instability (BTI), have been extensively studied in the last decade, investigating the impact of instantaneous aging on circuit reliability is still in its infancy. Traditionally, aging is treated as a long-term effect that gradually degrades the chip’s performance over months or years. However, it has recently been reported that the effects of BTI may also be observed in a significantly shorter time domain, i.e. on the order of milliseconds or even microseconds. This is because technology scaling is reaching the limit where displacing a few atoms within a transistor due to aging may endanger its entire functionality. In fact, this is a paradigm shift in aging from a solely long-term reliability degradation, as in the traditional view, to a short-term and even instantaneous reliability degradation.

Our model estimates the instantaneous effects of aging jointly with its long-term effects. The model can be employed within our SRAM reliability estimation tool to explore the impact of instantaneous aging effects on the reliability of SRAM cells.

Further details can be found in the following paper:

  • Victor M. van Santen, Hussam Amrouch, Javier Martin-Martinez, Montserrat Nafria, Jörg Henkel “Designing Guardbands for Instantaneous Aging Effects” in ACM/EDAC/IEEE 53rd Design Automation Conference (DAC’16), Austin, TX, USA, June 5-9, 2016, PDF

Our software tool is publicly available under this link (click here).

 

Reliability-Aware Cell Libraries

Due to technology scaling, circuit reliability has become extraordinarily challenging. Reliability-aware circuit design flows virtually do not exist and even research is in its infancy. In this project, we bring reliability awareness to EDA tool flows based on so-called degradation-aware cell libraries. These libraries include detailed delay and power information of gates/cells under the impact that degradation effects have on the electrical characteristics of pMOS and nMOS transistors, such as threshold voltage (Vth), carrier mobility (µ), drain current (Id), etc. Our modeling of degradation effects starts from the lowest level, where the physical mechanisms behind them originate, i.e. from the physical level, and goes all the way up to the system level, where degradation effects manifest themselves as failures.

Degradation-aware cell libraries and tool flows are indispensable for not only accurately estimating the required timing guardbands of circuits, but also efficiently containing them. By considering degradation effects during logic synthesis, significantly more resilient circuits can be obtained.
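To illustrate why a threshold-voltage shift translates into a timing guardband, the following minimal Python sketch uses the textbook alpha-power law for gate delay. It is not the characterization method behind the cell libraries described here; the function names and all parameter values (supply voltage, fresh threshold voltage, shift, and the exponent alpha) are assumptions chosen only for illustration.

```python
def gate_delay(vdd, vth, alpha=1.3, k=1.0):
    """Alpha-power-law delay estimate (arbitrary units): k * Vdd / (Vdd - Vth)**alpha."""
    if vdd <= vth:
        raise ValueError("vdd must exceed vth for the gate to switch")
    return k * vdd / (vdd - vth) ** alpha


def relative_guardband(vdd, vth_fresh, delta_vth, alpha=1.3):
    """Fractional delay increase caused by a threshold-voltage shift delta_vth."""
    fresh = gate_delay(vdd, vth_fresh, alpha)
    aged = gate_delay(vdd, vth_fresh + delta_vth, alpha)
    return aged / fresh - 1.0


# Hypothetical example values: 1.0 V supply, 0.30 V fresh Vth, 50 mV shift.
guardband = relative_guardband(vdd=1.0, vth_fresh=0.30, delta_vth=0.05)
print(f"Relative timing guardband: {guardband:.1%}")
```

In this simplified model, the delay increase, and therefore the required guardband, grows as the shifted threshold voltage approaches the supply voltage.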

In the current version of our cell libraries, we consider the Bias Temperature Instability (BTI) aging phenomenon. The degradation-aware cell libraries have been created under varied aging stress scenarios. Libraries are ready to be used within existing tool flows (e.g., Synopsys).

In the near future, we plan to include other degradation effects and release updated versions of our degradation-aware cell libraries.

Further details can be found in the following paper:

  • Hussam Amrouch, Behnam Khaleghi, Andreas Gerstlauer, Jörg Henkel “Reliability-Aware Design to Suppress Aging” in ACM/EDAC/IEEE 53rd Design Automation Conference (DAC), Austin, TX, USA, June 5-9, 2016, PDF

Our libraries are publicly available under this link (click here).

 

Interdependencies of Degradation Effects

Transistor degradation such as aging, variability, etc. has been known for a long time, and some of the effects were even discovered more than a century ago. Nevertheless, it was only around a decade ago that degradation effects became relevant to the reliability and lifetime of circuits, since feature sizes of transistors began to approach atomic levels.
Research that investigates the interdependencies of these and other degradation effects, however, is still in its infancy. Since the causes are of physical origin, it cannot be excluded that degradation effects influence each other, i.e. amplify or cancel each other, at least to some degree. Hence, degradation effects should be investigated jointly rather than separately.
In this work, we report that there are indeed non-negligible interdependencies of degradation effects. We quantify these interdependencies for the first time and show their impact on computing.

Our developed model considers the joint impact (from a physical point of view) of the following degradation effects:
- Bias Temperature Instability (BTI).
- Random Telegraph Noise (RTN).
- Hot Carrier Induced Degradation (HCID).
- Random Dopant Fluctuation (RDF).
- Process Variation in Transistor Geometry (PV).
- Intrinsic Noise Source: 1/f, Thermal, Shot Noise.
- Extrinsic Noise Source due to energetic particles (radiation).

Further details can be found in the following paper:

  • Hussam Amrouch, Victor van Santen, Jörg Henkel “Interdependencies of Degradation Effects and their Impact on Computing” in IEEE Design & Test, 2016 (to appear).

Our software tool is publicly available under this link (click here).

 

Reliability Estimation Tool for SRAM Cells

In this tool, we focus on abstracting the various degradation effects from the physical level all the way up to the system level. In addition, we model the impact of radiation on generating soft errors in SRAM cells using a physical particle simulator. We study different types of 6T SRAM cells (e.g. high performance, low power, etc.). The key goal is to provide designers and software developers at the system level with an abstracted, yet sufficiently accurate reliability estimation expressed as a probabilistic fault analysis. The latter offers a more meaningful interpretation of reliability than the state of the art, which employs other quantification metrics such as SNM, Vth, etc.

As an example, we study the register file of different processor architectures (MIPS, SPARCV8, ALPHA).
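As a rough illustration of a probabilistic fault metric at the system level, the sketch below aggregates a per-cell failure probability into the probability that a register file contains at least one faulty cell. It assumes statistically independent cells with identical failure probability, which is a simplification made here and not necessarily what the tool does; the function name and the numbers are hypothetical.

```python
import math


def register_file_failure_probability(p_cell, num_registers, bits_per_register):
    """Probability that at least one cell fails, assuming independent, identical cells."""
    if not 0.0 <= p_cell < 1.0:
        raise ValueError("p_cell must be in [0, 1)")
    n_cells = num_registers * bits_per_register
    # 1 - (1 - p)^N, evaluated with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_cells * math.log1p(-p_cell))


# Hypothetical example: 32 registers of 32 bits, per-cell failure probability 1e-9.
p_rf = register_file_failure_probability(1e-9, num_registers=32, bits_per_register=32)
print(f"Register file failure probability: {p_rf:.3e}")
```

For very small per-cell probabilities the result is close to the number of cells times the per-cell probability, which is why even rare cell failures matter for large memory arrays.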

Further details can be found in the following papers:

  • Hussam Amrouch, Victor van Santen, Jörg Henkel “Interdependencies of Degradation Effects and their Impact on Computing” in IEEE Design & Test, 2016 (to appear), PDF
  • Hussam Amrouch, Victor M. van Santen, Thomas Ebi, Volker Wenzel, Jörg Henkel, “Towards interdependencies of aging mechanisms”, IEEE/ACM 33rd International Conference on Computer-Aided Design (ICCAD’14), San Jose, USA, pp. 478-485, 2014, PDF

Our software tool is publicly available under this link (click here).

 

Lucid Infrared Thermography

Thermal analysis is a prerequisite for developing reliability-increasing techniques for thermally-constrained processors, i.e. processors with a high power density. For that purpose, infrared (IR) camera measurement setups have been deployed to provide direct feedback on the impact of thermal mitigation techniques. To obtain lucid IR images, the IR-opaque cooling must be removed and hence an alternative IR-transparent cooling needs to be provided to protect the chip. To this end, the majority of the state of the art employs an IR-transparent coolant liquid to prevent the chip from overheating. The problem is that several aspects, such as thermal convection, may interfere with the measured IR radiation, resulting in equivocal IR images. This decreases the accuracy in a way that leads to incorrect reliability estimates. To solve this problem, we introduce an IR-transparent cooling that cools the chip from its rear side, allowing the camera to clearly capture the IR emissions, as no additional layer in between impedes the radiation. It maintains the on-chip temperatures within a safe range equivalent to the original heat sink-based cooling. In our work, we demonstrate how the inaccurate thermal analysis of the state of the art leads to incorrect reliability estimates. Our setup is the most accurate, least intrusive one that has been both proposed and actually applied to state-of-the-art multi-cores.

After the success of our setup, we also built the setup for the research group of Prof. Tulika Mitra at the National University of Singapore (NUS), Singapore. There, the setup is mainly employed to study the thermal behavior of state-of-the-art mobile processors.

Further details can be found in the following papers:

  • Hussam Amrouch, Jörg Henkel “Lucid Infrared Thermography of Thermally-Constrained Processors” in ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED’15), Rome, Italy, July 22-24, 2015, PDF
  • Alok Prakash, Hussam Amrouch, Muhammad Shafique, Tulika Mitra, Jörg Henkel “Improving Mobile Gaming Performance through Cooperative CPU-GPU Thermal Management” in ACM/EDAC/IEEE 53rd Design Automation Conference (DAC), Austin, TX, USA, June 5-9, 2016, PDF
  • Daniel Palomino, Muhammad Shafique, Hussam Amrouch, Altamiro Susin, Jörg Henkel, “hevcDTM: Application-Driven Dynamic Thermal Management for High Efficiency Video Coding” in The IEEE/ACM 17th Design Automation and Test in Europe Conference (DATE'14), Dresden, Germany, March 24-28, 2014.

If you are interested in building the setup, please contact us (amrouch∂kit.edu).

 

Contact us

Dr.-Ing. Hussam Amrouch (Research Group Leader)          

Email: amrouch∂kit.edu

Webpage: http://ces.itec.kit.edu/~amrouch/
 

 

 

Dipl. Inform. Victor van Santen

Email: victor.santen∂kit.edu

Webpage: http://ces.itec.kit.edu/~vansanten