"ENVELOPE - Effizienz und Zuverlässigkeit: Selbstorganisation in HPC-Systemen" (Efficiency and Reliability: Self-Organization in HPC Systems) is a BMBF project funded under the call for fundamental research on HPC software in high-performance computing, within the funding programme "IKT 2020 - Forschung für Innovationen". The project started on 1 January 2017 and runs for three years, until 31 December 2019. The overarching goal of ENVELOPE is to achieve system-independent self-organization for trading off and optimizing efficiency and reliability in heterogeneous HPC systems. Key to this is combining a node-level view of the system with a global view of the system and with techniques at the application level. Reliability in particular is of crucial importance: the number of nodes, and thus of components, in HPC systems will continue to grow, and with it the probability of a component failure. Classical methods for increasing reliability cannot be used because of their enormous cost and resource consumption. ENVELOPE therefore aims to detect component failures proactively, so that measures to ensure correct application execution can be taken in advance, and only when actually necessary. The resulting complexity of dealing with heterogeneous systems is to be hidden from the application programmer.

Periscope Tuning Framework

Periscope is an automatic performance analysis and tuning tool for large-scale parallel systems. It consists of a frontend and a hierarchy of communication and analysis agents. Each of the analysis agents, i.e., the nodes of the agent hierarchy, searches autonomously for inefficiencies in a subset of the application processes. Periscope also supports tuning plugins that automatically search for the best settings for a set of tuning parameters.

InvasIC - Invasive Computing (2010-2022)

The goal of the Transregional Collaborative Research Centre 89, funded by the German Research Foundation (DFG), is to investigate a completely novel paradigm for designing and programming future parallel computing systems, called invasive computing. Parallel applications can actively invade and retreat from resources, i.e., compute cores, memory and network resources, to adapt to the degree of available parallelism and other requirements. Our focus is on invasive computing for HPC. We collaborate with the Chair for Scientific Computing at TUM on the development of invasive HPC applications and of invasive versions of OpenMP and MPI.

READEX - Runtime Exploitation of Application Dynamism for Energy-efficient Exascale Computing (2015-2018)

The goal of the READEX project is to improve the energy efficiency of applications in the field of High-Performance Computing. The project brings together European experts from different ends of the computing spectrum to develop a tools-aided methodology for dynamic auto-tuning, allowing users to automatically exploit the dynamic behaviour of their applications by adjusting the system to the actual resource requirements. The READEX approach consists of two steps: 1. Design Time Analysis and 2. Runtime Application Tuning. TUM focuses on Design Time Analysis and applies the Periscope Tuning Framework to precompute a Tuning Model. This tuning model is then input to the READEX Runtime Library and guides the dynamic switching of system configurations.
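
The two steps can be illustrated with a minimal Python sketch. It is not the READEX or Periscope API: the parameter names, the region descriptions, and the analytic energy model in `measure_energy` are all assumptions made up for illustration, standing in for real measurements. Step 1 searches the configuration space for each program region and stores the best configuration in a tuning model; step 2 looks that configuration up when a region is entered at runtime.

```python
from itertools import product

# Hypothetical tuning parameters and candidate values (illustrative only).
PARAMETERS = {
    "cpu_freq_ghz": [1.2, 2.0, 2.5],
    "threads": [4, 8],
}

# Hypothetical program regions; freq_sensitivity 1.0 models a compute-bound
# region, 0.0 a memory-bound region whose runtime ignores the core frequency.
REGIONS = {
    "compute_kernel": {"work": 100.0, "freq_sensitivity": 1.0},
    "memory_copy": {"work": 100.0, "freq_sensitivity": 0.0},
}

DEFAULT = {"cpu_freq_ghz": 2.5, "threads": 8}


def measure_energy(region, config):
    """Toy stand-in for an energy measurement: runtime times power."""
    freq, threads = config["cpu_freq_ghz"], config["threads"]
    runtime = region["work"] / (freq ** region["freq_sensitivity"] * threads)
    power = 100.0 + threads * freq ** 3  # static part plus dynamic part
    return runtime * power


def design_time_analysis(regions):
    """Step 1: exhaustive search per region, producing a tuning model."""
    names = list(PARAMETERS)
    candidates = [dict(zip(names, values))
                  for values in product(*PARAMETERS.values())]
    return {
        name: min(candidates, key=lambda c: measure_energy(region, c))
        for name, region in regions.items()
    }


def on_region_enter(tuning_model, region_name, default):
    """Step 2: at runtime, pick the stored configuration for the region."""
    return tuning_model.get(region_name, default)


if __name__ == "__main__":
    model = design_time_analysis(REGIONS)
    for name in ["compute_kernel", "memory_copy", "unknown_region"]:
        print(name, on_region_enter(model, name, DEFAULT))
```

The exhaustive search is used here only because the toy space is tiny. Separating the expensive search from the cheap runtime lookup is the point of the sketch.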

Software Architecture Analysis for Parallelization (2014-2018)

Siemens Corporate Technology, the central research and development department within Siemens, is working towards enabling technologies that help Siemens business units to migrate sequential applications to multicore processors. However, parallelizing existing applications is an intricate and tedious task that involves several steps. One of the first steps is to analyze the applications' software architectures in order to determine appropriate starting points for parallelization. This is necessary since most industrial applications consist of a large amount of code that cannot be parallelized by focusing on a few hot spots only. Control and data dependencies spanning different components or even the whole system usually require architectural changes, which entail extensive refactoring. To solve these problems, Siemens Corporate Technology is researching, in this collaboration project with TU München, methods and tools that support software architects as well as developers in redesigning their applications with a focus on concurrency.

Automatic Tuning of Cloud Applications (2016-2020)

In collaboration with Instana and BMW, we are investigating automatic techniques for tuning cloud applications. Instana provides a scalable, machine-learning-based monitoring and incident management infrastructure, on top of which an automatic tuning infrastructure is currently under development. The infrastructure dynamically optimizes for given SLAs and budget constraints. In collaboration with BMW, we are investigating automatic application partitioning for hybrid clouds supporting mixed-criticality services and data. The project is funded by the DAAD, Instana and BMW.

Trends in Accelerator Technologies (2017-2018)

The TUM Incentive Fund supports a collaboration project with TUM's EuroTech partners DTU and EPFL and the Bauman State University in Moscow, investigating novel accelerator technologies for Cloud and HPC. Part of this effort is the joint research between TUM and Bauman State University towards a smart cache for graph algorithms.

Score-P (2009-2018)

For many years, TUM has been collaborating with partners in the VI-HPS on developing a joint monitoring infrastructure for performance analysis and tuning tools. This work is currently funded by the READEX project. Previous projects, funded by the BMBF, were Score-E, SILC, and LMAC. Score-P provides offline profiling and tracing support, as well as online profiling and tuning. It is a joint development of Forschungszentrum Jülich, RWTH Aachen, TU Darmstadt, TU Dresden, TU München, and GNS. As part of the VI-HPS Tuning Workshops, TUM's Periscope Tuning Framework is presented to HPC experts in Europe as one of the tools using Score-P.