The complexity and elevated requirements of modern applications have driven the development of computing systems characterized by an increased number of processing cores, heterogeneity, and complex communication interconnects. For these systems to deliver their maximum performance, novel resource management mechanisms are required. In this direction, this presentation outlines the relevant research activities of the Microprocessors and Digital VLSI Systems Laboratory (Microlab) of the National Technical University of Athens, which aim at systematically capturing and tackling the intricacies of efficiently managing contemporary computing systems. As a representative analysis, the building blocks and design decisions of a run-time resource manager targeting many-core computing systems with a Network-on-Chip (NoC) interconnect are presented. Because dynamically mapping many concurrently running applications is highly complex and requires fast responses, a novel run-time resource management framework is introduced, which aims to provide a scalable solution based on distributed decision-making mechanisms. This Distributed Run-Time Resource Management (DRTRM) framework is implemented and evaluated on the Intel SCC, an actual many-core, NoC-based computing platform. Apart from performance optimization, the work also addresses the increased probability of hardware errors, a side effect of the tight integration of many processing elements on the same system. To this end, SoftRM is introduced: a DRTRM augmented with fault-tolerant features that offers dynamic recovery from manifested errors. SoftRM extends concepts from the well-known Paxos consensus algorithm, providing dynamic self-organization and workload-aware error mitigation.
The joint goal of performance and reliability continues to drive the research efforts of Microlab, extending these notions to heterogeneous architectures (FPGA-based systems), hierarchical deployment of devices (Edge computing), and dominant application fields such as Machine Learning. The last part of the presentation outlines the relevant results achieved, open issues, and research directions, including the alignment of these directions with the requirements and needs of Microlab’s industrial and academic partners.
Vasileios Tsoutsouras received his Diploma and Ph.D. degrees in Electrical and Computer Engineering from the Microprocessors and Digital Systems Laboratory of the National Technical University of Athens, Greece, in 2013 and 2018, respectively. His main research topics include dynamic resource management of many-core computing systems, Edge computing in Internet of Things architectures, and HW/SW co-design. He has published over 20 technical and research papers in scientific books, international conferences, and journals. Since 2013, he has also worked as a research associate at the Institute of Communication and Computer Systems (ICCS) on three European-funded projects on run-time resource management of medical embedded devices and Cloud infrastructure.