Prof. Dr. Michael Gerndt

Architecture of Parallel and Distributed Systems


Technische Universität München
Fakultät für Informatik
Boltzmann Str. 3
85748 Garching

Phone: +49 (89) 289 17652

Cloud and IoT Systems

Cloud systems enable on-demand provisioning of IT resources on a pay-per-use basis. Clouds are built on autonomous management technologies that enable large-scale service infrastructures. Resources are distributed throughout cloud systems to provide scalability and fault tolerance. More recently, cloud systems have also been used in the context of the Internet of Things, where they enable scalable processing of dynamic workloads by combining centralized high-performance hosts and clusters with edge devices for data preprocessing and reduced latency. Our research currently focuses on autoscaling concepts for cloud applications and on the architecture of cloud-based IoT systems.


  • Predictive Autoscaling Engine (2016-2020, DAAD): The project designs and implements a Predictive Autoscaling Engine for cloud-based distributed web applications that receive user requests and process them in the backend. The software will derive and execute a time-bound sequence of scaling actions that meets the service-level objectives (SLOs) while minimizing the cost of the cloud services and resources used, by provisioning the required cloud capacity in time.
  • AI for Smart Cloud Operations (2018-2020, BMBF Software Campus): The goal of the project is to significantly increase the reliability of cloud-based digital infrastructure by means of AI-based autonomous maintenance, thus enabling cloud services in critical domains such as healthcare, logistics, and manufacturing. This project is run in collaboration with Huawei Research.
  • Cloud and IoT Industry Training: This project offers courses for industry in the field of cloud and IoT.
  • Public Private Cloud Tuning (2018, BMW): Industry will use private and public clouds in combination to protect critical data while still enabling extreme scaling. We investigate methods for the automatic decomposition of applications across public and private clouds.
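
The idea behind the Predictive Autoscaling Engine, deriving a time-bound sequence of scaling actions from an expected load, can be illustrated with a minimal sketch. It is not the project's implementation: the function name, the request-rate forecast, and the fixed per-instance capacity are all assumptions made for illustration, and a real engine would also have to account for provisioning delays and pricing.

```python
import math
from typing import List, Tuple


def plan_scaling_actions(
    forecast_rps: List[float],
    capacity_per_instance_rps: float,
    current_instances: int,
    min_instances: int = 1,
) -> List[Tuple[int, int]]:
    """Return (interval_index, instance_delta) pairs for a request-rate forecast.

    Hypothetical model: every instance serves a fixed number of requests per
    second. For each interval the smallest instance count that covers the
    forecast load is chosen, so capacity is met with as few instances as
    possible under this assumption.
    """
    if capacity_per_instance_rps <= 0:
        raise ValueError("capacity_per_instance_rps must be positive")
    actions = []
    instances = current_instances
    for step, load in enumerate(forecast_rps):
        needed = max(min_instances, math.ceil(load / capacity_per_instance_rps))
        if needed != instances:
            # Record only changes, so the result is a sequence of scaling actions.
            actions.append((step, needed - instances))
            instances = needed
    return actions


# Made-up forecast in requests per second for five consecutive intervals.
forecast = [120.0, 180.0, 450.0, 430.0, 90.0]
plan = plan_scaling_actions(forecast, capacity_per_instance_rps=100.0, current_instances=2)
print(plan)
```

With these sample numbers the plan adds three instances before the third interval and removes four before the fifth.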


  • Lecture on Cloud Computing (IN2073), WS, 4 Credits
  • Master Lab on IoT (IN2106, IN4224), WS+SS, 10 Credits

Theses and Student Projects

  • Budgeting Requirements Prediction for Cloud Services, Latifah Mahna, Master Thesis 2018
  • Real-time container clusters for seamless computing, Fariz Huseynli, Master Thesis 2018


  • Anshul Jindal, Vladimir Podolskiy, and Michael Gerndt. 2018. Autoscaling Performance Measurement Tool. In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering (ICPE '18). ACM, New York, NY, USA, 91-92.
  • A. Jindal, V. Podolskiy, M. Gerndt: Multilayered Cloud Applications Autoscaling Performance Estimation, IEEE 7th International Symposium on Cloud and Service Computing (SC2), Outstanding Paper Award, doi: 10.1109/SC2.2017.12, 2017


  • Vladimir Podolskiy (Ph.D. candidate)
  • Anshul Jindal (Master Student)
  • Anastasia Myasnichenko (Master Student)

High Performance Computing

High Performance Computing has long been a pillar of scientific and industrial research. More recently, HPC has also become important in machine learning and big data analytics. Architectures leverage the latest technologies, such as highly parallel multicore processors, accelerators like GPGPUs or FPGAs, high-bandwidth memory, and high-performance networks. Utilizing these resources effectively requires careful tuning of applications. Our work focuses on performance analysis and tuning tools for HPC systems.


  • Periscope Tuning Framework: Periscope is an automatic performance analysis and tuning tool for large-scale parallel systems. It consists of a frontend and a hierarchy of communication and analysis agents. Each of the analysis agents, i.e., the nodes of the agent hierarchy, searches autonomously for inefficiencies in a subset of the application processes. It supports tuning plugins that automatically search for the best settings of a set of tuning parameters.
  • InvasIC - Invasive Computing (2010-2022, DFG): The goal of the Transregional Collaborative Research Centre 89, funded by the German Science Foundation (DFG), is to investigate a completely novel paradigm for designing and programming future parallel computing systems called invasive computing. Parallel applications can actively invade and retreat from resources, i.e., compute cores, memory, and network resources, to adapt to the degree of available parallelism and other requirements. Our focus is on invasive computing for HPC. We collaborate with the Chair for Scientific Computing at TUM in the development of invasive HPC applications and in the development of invasive versions of OpenMP and MPI. (Subproject D3)
  • READEX - Runtime Exploitation of Application Dynamism for Energy-efficient Exascale Computing (2015-2018, EU): The goal of the READEX project is to improve the energy efficiency of applications in the field of High-Performance Computing. The project brings together European experts from different ends of the computing spectrum to develop a tools-aided methodology for dynamic auto-tuning, allowing users to automatically exploit the dynamic behaviour of their applications by adjusting the system to the actual resource requirements. The READEX approach consists of two steps: 1. Design Time Analysis and 2. Runtime Application Tuning. TUM focuses on Design Time Analysis and applies the Periscope Tuning Framework to precompute a Tuning Model. This tuning model is then input to the READEX Runtime Library and will guide the dynamic switching of system configurations.
  • Software Architecture Analysis for Parallelization (2014-2018): Siemens Corporate Technology, the central research and development department within Siemens, is working on enabling technologies that help Siemens business units migrate sequential applications to multicore processors. However, parallelizing existing applications is an intricate and tedious task that involves several steps. One of the first steps is to analyze the applications' software architectures in order to determine appropriate starting points for parallelization. This is necessary because most industrial applications consist of a large amount of code that cannot be parallelized by focusing on a few hot spots only. Control and data dependencies spanning different components or even the whole system usually require architectural changes that entail extensive refactoring. To solve these problems, Siemens Corporate Technology, in this collaboration project with TU München, is researching methods and tools that support software architects and developers in redesigning their applications with a focus on concurrency.
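
The two-step idea described for tuning plugins and for READEX, searching a set of tuning parameters at design time and storing the best setting per program region in a tuning model, can be sketched as follows. This is an illustration only and does not use the Periscope or READEX interfaces: the function names, the region names, the parameter names, and the energy readings are hypothetical, and exhaustive search stands in for whatever search strategy a real plugin applies.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def search_best_config(
    parameter_space: Dict[str, List[int]],
    measure: Callable[[Config], float],
) -> Tuple[Config, float]:
    """Exhaustively evaluate every parameter combination and keep the cheapest."""
    names = sorted(parameter_space)
    best_config, best_cost = None, float("inf")
    for values in itertools.product(*(parameter_space[n] for n in names)):
        config = dict(zip(names, values))
        cost = measure(config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    if best_config is None:
        raise ValueError("parameter space contains no configurations")
    return best_config, best_cost


# Hypothetical energy readings in joules, keyed by (core_freq_mhz, threads).
# They stand in for measurements a real tool would take while the code runs.
SAMPLE_ENERGY = {
    "compute_kernel": {(1200, 4): 95.0, (1200, 8): 70.0, (2400, 4): 80.0, (2400, 8): 60.0},
    "memory_copy": {(1200, 4): 40.0, (1200, 8): 45.0, (2400, 4): 55.0, (2400, 8): 65.0},
}


def build_tuning_model(parameter_space: Dict[str, List[int]]) -> Dict[str, Config]:
    """Design-time step: record the best configuration for each region."""
    model = {}
    for region, readings in SAMPLE_ENERGY.items():
        def measure(config: Config, readings=readings) -> float:
            return readings[(config["core_freq_mhz"], config["threads"])]
        model[region], _ = search_best_config(parameter_space, measure)
    return model


space = {"core_freq_mhz": [1200, 2400], "threads": [4, 8]}
tuning_model = build_tuning_model(space)

# Runtime step: on entering a region, look up and apply its stored configuration.
print(tuning_model["memory_copy"])
```

In this toy data the compute-bound region prefers the high frequency with eight threads, while the memory-bound region prefers the low frequency with four, which is the kind of per-region difference dynamic switching is meant to exploit.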


  • Lecture on Advanced Computer Architecture (IN2067), WS, 6 Credits
  • Lecture on Parallel Programming (IN2147), SS, 5 Credits
  • Lecture on Parallel Program Engineering (IN2310), SS, 5 Credits
  • Lab course on Efficient Programming of Multicore Processors and HPC systems (IN2106), SS, 10 Credits
  • Lab course Programming of Supercomputers (IN2190), WS, 5 Credits

Theses and Student Projects

  • Automatic Sensitivity Analysis of Energy-Efficiency Tuning on Input Characteristics, Shristi Dasgupta, Master Thesis 2018
  • The Elastic Phase Oriented Programming Model for Elastic HPC Applications, Jophin John, Master Thesis 2018
  • Integration of Apache Spark with Invasive Resource Manager, Jeeta Ann Chacko, Master Thesis 2018




Michael Gerndt received a Ph.D. in Computer Science in 1989 from the University of Bonn. He developed SUPERB, the first automatic parallelizer for distributed-memory parallel machines. For two years, in 1990 and 1991, he held a postdoc position at the University of Vienna. He joined Research Centre Juelich in 1992, where he concentrated on programming and implementation issues of shared virtual memory systems. This research led to his habilitation in 1998 at Technische Universität München (TUM). Since 2000 he has been professor for architecture of parallel and distributed systems at TUM. His current research focuses on programming models and tools for scalable parallel architectures. He leads the development of the automatic performance analysis tool Periscope and of iOMP, an extension of OpenMP for invasive computing. iOMP is a research project in the new Transregional Collaborative Research Center InvasIC (TR 89) funded by the German Science Foundation. In addition, he heads projects on parallel programming languages and their implementation on multicore processors as well as on resource management in Cloud environments, funded by public and industry sources. Since October 2011 he has been the coordinator of the European FP7 project AutoTune on automatic online tuning. Michael Gerndt is the contact person of the Faculty of Informatics for international affairs.

Previous Projects

  • Score-E - Scalable Tools for the Analysis and Optimization of Energy Consumption in HPC (2013-2016) The main objective of the Score-E project, funded under the 3rd "HPC software for scalable parallel computers" call of the Federal Ministry of Education and Research (BMBF), is to provide user-friendly analysis and optimization tools for the energy consumption of HPC applications. TUM focuses on the implementation of tuning support in the Score-P monitoring system and of energy tuning plugins in the Periscope Tuning Framework.
  • AutoTune (2011-2014) The AutoTune project focuses on extending Periscope to the Periscope Tuning Framework combining performance and energy efficiency analysis with automatic tuning plugins. The project is coordinated by TUM. The project partners are University of Vienna, Universitat Autonoma de Barcelona, CAPS entreprises, Leibniz Computing Centre, and University of Galway.
  • ISAR - Integrated System and Application analysis for massively parallel systems in the petascale Range (2009-2011) The goal of the three-year BMBF project ISAR is the realization of an integrated scalable system and application analysis software for the use in production environments. This software is based on the Periscope toolkit.
  • SILC - Scalable Infrastructure for the Automated Performance Analysis of Parallel Codes (2009-2011) Funded by the German Ministry for Education and Research, the SILC project aims to design and implement a scalable and easy-to-use performance measurement infrastructure for supercomputing applications as a basis for several existing performance analysis tools developed by partner institutions.
  • MAPCO - Multicore Architecture and Programming Model Co-Optimization (2009-2012) MAPCO investigates the performance of OpenMP and MPI programs on available and future HPC multicore processors. In order to perform efficient design space exploration and performance evaluation, appropriate extensions to existing multicore simulation environments for shared and distributed address spaces are developed.
  • Efficient Parallel Strategies in Computational Modelling of Materials (2009-2012) The project will develop a new paradigm for the parallelisation of density functional theory (DFT) methods for electronic structure calculations and implement this new strategy. Advanced embedding techniques will account for environment effects (e.g. solvent, support) on a system. We propose a strong modularisation of the DFT approach, facilitating task specific parallelisation, memory management, and low-level optimisation.
  • Cloud Computing, a Scientific and Economic Enabler for Morocco (2011-2013) This collaboration project with Moroccan universities and research institutions focuses on Cloud Computing. We jointly develop course material for university courses, workshops for industry, and a German-Moroccan Cloud. The Cloud platform will be used for Cloud research and for compute-intensive applications. The project is funded by the International Office of the Federal Ministry of Education and Research.
  • Ensemble Programming (2009-2012) Current programming models for HPC architectures force application developers to structure their applications in a way that optimizes the code for the hierarchical structure. In this research project we develop a new programming model that is based on a fine-grained program, e.g., on the level of individual molecules, and that enables automatic aggregation to optimize the performance on hierarchical systems.
  • CAS2 - Intelligent Policy Driven Business Service Management (2008-2010) Intelligent policy driven business service management is an innovation that can be considered as a reference architecture for service-based IT management of advanced analytical applications and solutions with similar performance characteristics. A prototypical implementation of this reference architecture is intended for early customer validation. This project is funded by the IBM Center for Advanced Studies.
  • Performance Analysis Tools for Peta-Scale Computing Systems (2007-2008) The goal of this project is to cooperate in the development of performance analysis tools for peta-scale computers. It is a cooperation project between Center for Advanced Computing Research at Caltech, Center for Applied Scientific Computing at Lawrence Livermore National Laboratory, and LRR/TUM. It is funded by BaCaTec.
  • ARM (2006-2008) The objective of this project is to develop an Autonomous Resource Management Framework for large-scale applications based on Grid and Open System Management Standards which allows for flexible integration of different business applications as well as IT-Resource providers. The project is funded by IBM until 2008.
  • Periscope (2005-2008) The Periscope project is funded by DFG. The goal is to further enhance the prototype of an automatic performance analysis environment developed in the PERIDOT project.
  • PALMA (2004-2006) The PALMA project (Performance Monitoring and Analysis of Large Distributed Systems using Mobile Agents) is a collaboration project with Jadavpur University in Kolkata, India. The goal is to develop an SLA-based performance tuning environment for Grid applications.
  • CrossGrid (2002-2005) The CrossGrid project will develop, implement and exploit new Grid components for interactive compute- and data-intensive applications like simulation and visualisation for surgical procedures, flooding crisis, team decision support systems, distributed data analysis in high-energy physics, and air pollution combined with weather forecasting. It is a project with 21 European partners and is closely related to the DataGrid project. Our task is to develop high-level performance analysis support that can evaluate performance properties defined in the APART specification language based on low-level performance data.
  • EP-Cache (2002-2005) The EP-Cache project is funded by the German Federal Ministry for Education and Research. It is a collaboration of four German partners. The goal is to develop new hardware monitoring concepts and, on top of those, new performance analysis tools and program tuning techniques.
  • APART (1999-2004) The Esprit Working Group APART (Automatic Performance Analysis: Resources and Tools) is a group of 7 European and 3 American partners. The working group will explore all issues in automatic performance analysis support for parallel machines and grids.
  • Working Group on Tools for Porting Applications to SMP-Clusters The working group will collect requirements for tools for porting applications from SMP systems to SMP-clusters, identify state-of-the-art tools that facilitate the transformation and deployment of shared-memory applications for SMP-Clusters, and identify enhancements for existing tools as well as new tools.
  • Peridot (2001-2004) The Peridot project (PERformance Indication and Detection of bottlenecks Occurring on Teraflop computers) is funded by KONWIHR. The goal is to implement an automatic performance analysis environment for the Hitachi SR8000.