Performance Optimisation of Numerical Simulation Codes

General aims of the course

Numerical simulations have taken on an ever-increasing role as a research tool and have to meet increasingly demanding expectations on their performance.

Usually, much of the focus is on modelling research problems and developing numerical algorithms. However, these algorithms must also be implemented efficiently in order to meet their performance requirements.

Therefore, knowledge of performance optimisation techniques is becoming increasingly important even for application developers, who in the past could rely on compilers and other software tools to unleash the computational power of their systems. This course provides the background and practical knowledge needed to develop efficient numerical simulation programs that achieve high performance on current computer systems.

Intended learning outcomes

After the course you will be able

  • to understand important properties of modern computer architectures and their influence on the performance of numerical simulations,
  • to identify performance bottlenecks in numerical simulations,
  • to implement numerical algorithms that execute efficiently and with high performance, and to improve existing numerical simulation applications.

Course main content

The course is an introduction to performance optimisation methods for numerical simulations with respect to their execution on the underlying hardware platforms.

Starting from current computer architectures, methods for tuning the performance of serial programs will be discussed. The central topics are the efficient use of memory hierarchies and arithmetic logic units: caches, loop optimisations, vectorisation and other implementation techniques.
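
As an illustration of one such loop optimisation, the following minimal sketch in C shows loop interchange for summing a matrix. It assumes a contiguous n × n array of double stored in row-major order; the function names sum_column_order and sum_row_order are hypothetical and not taken from the course material.

```c
#include <stddef.h>

/* Sum all elements of an n x n matrix stored in row-major order.
 * Cache-unfriendly version: the inner loop walks down a column, so
 * consecutive accesses are n doubles apart in memory (stride n) and
 * most of each loaded cache line is unused before it is evicted. */
double sum_column_order(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < n; ++i)
            s += a[i * n + j];
    return s;
}

/* After loop interchange: the inner loop walks along a row, so accesses
 * are contiguous (stride 1) and every element of a loaded cache line is
 * used. Unit-stride access is also what vectorisation usually requires. */
double sum_row_order(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            s += a[i * n + j];
    return s;
}
```

Both functions visit the same elements, but the row-order version typically runs much faster once the matrix no longer fits in cache. Because the interchange changes the order of the floating-point additions, the two results can differ in the last bits for general data.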

Subsequently, the optimisation of parallel programs will be covered. This comprises the use of shared-memory computers with OpenMP and Pthreads and the associated challenges of non-uniform memory access (NUMA) and cache coherency. The optimised use of MPI, the Message Passing Interface, will be discussed as a means of improving performance on distributed-memory computers.
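
One common cache-coherency problem on shared-memory computers is false sharing: threads that write to different variables located in the same cache line force that line to move back and forth between cores. The following minimal Pthreads sketch avoids it by giving each thread's counter its own cache line. It assumes a 64-byte cache line (typical for current x86-64 processors, but not universal) and a C11 compiler; count_in_parallel, MAX_THREADS and the other names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64
#define MAX_THREADS 8

/* One counter per thread. _Alignas forces each counter to start on a
 * cache-line boundary, which also pads the struct to a full line, so no
 * two counters share a line. volatile keeps the compiler from collapsing
 * the increment loop into a single store, so the memory traffic stays
 * visible; each counter is still written by only one thread. */
struct padded_counter {
    _Alignas(CACHE_LINE_SIZE) volatile long value;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE_SIZE,
              "each counter should occupy exactly one cache line");

/* Hypothetical work description for one worker thread. */
struct worker_arg {
    struct padded_counter *counter;
    long iterations;
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    for (long i = 0; i < arg->iterations; ++i)
        arg->counter->value++;   /* touches only this thread's cache line */
    return NULL;
}

/* Starts nthreads workers, each incrementing its own counter, and returns
 * the total after all threads have been joined. Returns -1 for an invalid
 * thread count or if a thread could not be created. */
long count_in_parallel(int nthreads, long iterations)
{
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    struct padded_counter counters[MAX_THREADS] = {{0}};
    struct worker_arg args[MAX_THREADS];
    pthread_t threads[MAX_THREADS];
    int created = 0;

    for (int t = 0; t < nthreads; ++t) {
        args[t].counter = &counters[t];
        args[t].iterations = iterations;
        if (pthread_create(&threads[t], NULL, worker, &args[t]) != 0)
            break;
        ++created;
    }

    long total = 0;
    for (int t = 0; t < created; ++t) {
        pthread_join(threads[t], NULL);   /* join makes the writes visible */
        total += counters[t].value;
    }
    return created == nthreads ? total : -1;
}
```

Compile with, for example, gcc -std=c11 -O2 -pthread. If _Alignas and the static_assert are removed, the counters are packed into a few shared cache lines; the result stays the same, but on multi-core machines the loop typically runs noticeably slower.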

Finally, hybrid parallelisation approaches will be discussed.

Prerequisites

Knowledge of

  • one of the programming languages C, C++ or Fortran, and
  • parallel programming (for example with OpenMP, MPI or Pthreads).

Furthermore, basic knowledge of the following is beneficial but optional:

  • numerical analysis and computer science, and
  • the use of Linux, Unix or Mac OS X (especially the bash shell).

You will need to bring a laptop, and you will need access to eduroam.


Literature will be announced here in good time before the course starts.

Course schedule

First course week: Introductory lectures and presence labs
week 46 (2015-11-09 – 2015-11-13) at KTH in Stockholm

Second course week: Homework assignments covering single aspects of performance optimisation
week 47 (2015-11-16 – 2015-11-20)

Third course week: Project work: complete optimisation of a provided numerical simulation
from week 48 (2015-11-23…)

Examination

  • Participation in lectures and presence labs
  • Homework assignments
  • Project work

The number of credits for the course is 5.0 ECTS.


Michael Schliephake, PDC Center for High Performance Computing, KTH
michs at kth.se


Deadline for application: 2015-10-26.