Dai Yang, M.Sc.


M.Sc. Dai Yang

Technische Universität München

Informatics 10 - Chair of Computer Architecture and Parallel Systems (Prof. Schulz)


Boltzmannstr. 3
85748 Garching b. München

Ph.D. Candidate, Informatics
Research Associate @ CAPS 
Acting System Administrator @ CAPS



Short CV

  • B.Sc. and M.Sc. Informatics @TUM (Minor: Mechanical Engineering)
  • Sys/SW Engineering @ Airbus DS Electronics/HENSOLDT
    Software Engineer, Airbus Defence and Space
  • IT Operations, including the new Entrepreneurship Center building (Neubau) @ UnternehmerTUM
  • Various advisory work for student projects

Research Interests

  • Fault Tolerance, Fault Prediction and Fault Avoidance, System Reliability and Safety
  • Self-organisation in distributed systems
  • Performance Engineering in Parallel and Distributed Systems
  • Avionics Systems, Flight Control
  • Space Software
  • Machine Learning Automation



  • BfS Project Big Data for Gas Turbine


  • Project MOVE-II
  • Neubau (new construction) UnternehmerTUM


Conference and Journal

  • Alvaro Frank, Dai Yang, Tim Süß, Martin Schulz, and André Brinkmann. Reducing False Node Failure Predictions in HPC. 26th IEEE International Conference on High Performance Computing, Data and Analytics (HiPC) 2019. Accepted for publication.
  • Bengisu Elis, Dai Yang, and Martin Schulz. 2019. QMPI: A Next Generation MPI Profiling Interface for Modern HPC Platforms. In Proceedings of the 26th European MPI Users’ Group Meeting (EuroMPI ’19), Torsten Hoefler and Jesper Larsson Träff (Eds.). ACM, New York, NY, USA, Article 4, 10 pages.
  • David Jauk, Dai Yang, and Martin Schulz. Predicting Faults in High Performance Computing Systems: An In-Depth Survey of the State-of-the-Practice. In SC ’19: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2019, Denver, Colorado, United States. Accepted for publication.
  • Dai Yang, Josef Weidendorfer, Tilman Küstner, Carsten Trinitis and Sibylle Ziegler. Enabling Application-Integrated Proactive Fault Tolerance. ParCo 2017, Bologna, Italy. Accepted for publication. (Preprint)
  • Martin Schulz, Marc-André Hermanns, Michael Knobloch, Kathryn Mohror, Nathan T. Hjelm, Bengisu Elis, Karlo Kraljic, and Dai Yang: The MPI Tool Interfaces: Past, Present, and Future — Capabilities and Prospects.

Short Papers, Workshop Papers and Posters

  • Amir Raoofy, Dai Yang, Josef Weidendorfer, Carsten Trinitis and Martin Schulz: Enabling Malleability for Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics using LAIK. PARS Workshop 2019
  • Thomas Becker, Nico Rudolf, Dai Yang and Wolfgang Karl: Symptom-based Fault Detection in Modern Computer Systems. PARS Workshop 2019
  • Tejas Kale and Dai Yang: Event Driven Programming for Embedded Systems. 10th European CubeSat Symposium 2018. Toulouse, France.
  • Thomas Becker, Dai Yang, Tilman Küstner and Martin Schulz. Co-Scheduling in a Task-Based Programming Model. Workshop on Co-Scheduling of HPC Applications (COSH'18) in Conjunction with HiPEAC Conference 2018. January 2018, Manchester, United Kingdom
  • Josef Weidendorfer, Dai Yang and Carsten Trinitis: LAIK: A Library for Fault Tolerant Distribution of Global Data for Parallel Applications. PARS Workshop 2017 (Preprint)
  • Dai Yang, Moritz Dötterl, Sebastian Rückerl and Amir Raoofy: Hardening the Linux Kernel against Soft Errors. Poster for The 13th International School on the Effects of Radiation on Embedded Systems for Space Applications (SERESSA'17), Garching, Germany.


  • Absolventenfest / Tag der Informatik 2017: Ein Studium fürs Leben.
  • PAR-CO 2017: Enabling Application Integrated Fault Tolerance
  • PARS 2017: LAIK: A Library for Fault Tolerant Distribution of Global Data


Bachelor's Thesis: Emulation of VANET communication through virtualization of ECUs, Prof. Knoll, Dipl.-Ing. Manuel Schiller

Master's Thesis: Hazards from High System Entropy: An Explorative Analysis of Case Reports, Prof. Broy, Dr. Mario Gleirscher



Interdisciplinary Project, Guided Research

  • Design und Implementierung einer leichtgewichtigen fehlertoleranten Datenhaltungskomponente für HPC Systeme (Rosskopf)
  • Design und Implementierung eines Hardwareausfallsimulators am Beispiel eines Low-Budget IoT System (Jonischkeit)
  • Event Driven Sensor Data Processing for Embedded Systems (Kale)
  • Design and Implementation of a Portal for Visualization of HPC Workload Distribution (Podanev)

Invited Talks in Lectures:

  • Betriebssystem in der Spieleentwicklung (WS2017)
  • Einführung in die Rechnerarchitektur (WS2017)
  • Mikroprozessoren (WS2017, WS2018)
  • Lab Sessions: Virtualization Techniques (WS2018)

Practical Course:

  • Advanced Topics in Computer Architecture and Parallel Systems (WS2018)
  • Systemnahe Programmierung bei der Spieleentwicklung (WS2017)


  • Seminar: Trending Topics in HPC (WS2018, WS2019)
  • Seminar: Rechnertechnik in der Raumfahrt (SS2018, SS2019)
  • Proseminar: Mehrkernarchitektur (SS2017)
  • Seminar und Proseminar: Geschichte der Rechnerarchitektur (SS2017, SS2018, SS2019)
  • Seminar: Virtualisierungstechnik (SS2017)

Student Tutor:

  • Vorkurs Mathematik für Informatik


Reviewer, Paper Referee

  • COSH 2018, 2019
  • ARCS 2018
  • ISC 2018
  • IPDPS 2018
  • Computing Frontiers 2017
  • ParCo 2017
  • IEEE Transactions on Parallel and Distributed Systems 2017

Conference and Workshop Organization

  • SERESSA 2017, Local Organization Chair & Web Chair
  • CompSpace 2018, Chair & PC
  • CompSpace 2019, Chair & PC

Further Information

For further information, please see my LinkedIn page.