TALK: Temperature Aware Multiprocessing with Reliability Considerations

  • Speaker:
    Dr. Devendra Rai

    ETH Zurich, Switzerland

  • Location:

    H120,
    Technologiefabrik,
    Karlsruhe

  • Date: June 29th, 2015, 10:30am

Abstract:
Today, we expect each new generation of computer processors to deliver higher computational performance than its predecessor. Thus, as users of recent generations of mobile phones, we can now play games with rich graphics, enjoy rich multimedia, navigate the world, and create and edit movies, none of which was possible only a few years ago. Similarly, our notebook computers are now as powerful as the desktops of yesterday, allowing us to work effectively on the move. However, the impressive computational performance of state-of-the-art processors is not without its own design challenges, which must be properly understood and overcome. Specifically, such processors are increasingly vulnerable to overheating, a major contributor being non-ideal technology scaling. In the long term, processors that either run too hot or experience rapid changes in temperature are less reliable than processors whose operating temperature has been carefully controlled. In the short term, the performance of tasks and services executing on a hot processor may deteriorate to an unacceptable level, and in extreme cases, hosted tasks and services may become unavailable if the processor shuts itself down to avoid thermally induced damage.
Many techniques to avoid processor overheating are already available, at both the hardware and the software level. However, these approaches rely on novel processor architectures, novel materials, or buffering tasks. In my talk, I will discuss avoiding processor overheating by accurately predicting the temperature of state-of-the-art processors, given limited runtime information. The strength of this approach is that it is applicable to most commercially available state-of-the-art processors and can be combined with other temperature management approaches. To this end, I will present a calibration-based approach to construct the thermal model of a given processor, which includes the effects of system fan(s). I will also briefly talk about an analytical approach to estimate the worst-case peak temperature. The calibration-based technique has been validated through prototyping on multiple state-of-the-art multicore processors, such as the Intel Xeon E5 series and the Intel i7 mobile processor.

Speaker biography:
Devendra Rai (MS '09, University of Virginia) recently completed his PhD at ETH Zurich. Most of his research efforts have been directed towards developing three mutually orthogonal but complementary approaches to avoid, tolerate, and recover from faults in multi- and many-core systems, with an emphasis on thermally induced faults. One of his papers, titled "Power agnostic technique for efficient temperature estimation of multicore embedded systems", was presented at ESWeek in 2012. Towards the end of his PhD, Devendra also started exploring how traditional information security constructs can be defeated by an adversary relying on thermal modulation techniques. In his spare time, he is developing an end-to-end secured communication algorithm with a strong emphasis on user privacy.