Message Passing Interface
MPI provides an API for distributed-memory message passing in parallel programs. Supported programming languages are Fortran, C, C++, Java, Perl and Python. This document describes the features common to all MPI implementations used on the LRZ HPC systems and points to documentation on the MPI standard.
Table of contents
- Introductory remarks
- Standardization
- Parallel environments on LRZ systems
- Compiling and linking
- Executing MPI binaries
- Interactive vs. Batch
- Startup mechanisms
- SPMD mode
- MPMD execution
- Hybrid parallel programs
- MPI-2 and other special topics
- Troubleshooting MPI
- General MPI Documentation
- Standard documents
- Off-site MPI information
- Tutorials
Introductory remarks
The message passing interface is at present the most heavily used paradigm for parallel programming on LRZ's HPC systems. In particular, to fully exploit the capabilities of the specialized interconnects used in supercomputers, a number of proprietary MPI implementations are deployed at LRZ. A full list of available MPI environments is provided below.
Standardization
To guarantee portability and to allow vendors to produce well-optimized implementations, the interface is standardized. The most up-to-date release of the standard is Version 2.2. Basic functionality covered by MPI includes
- point-to-point communication in blocking, nonblocking, buffered and unbuffered modes
- collective communication (e.g., all-to-all, scatter/gather, reduction operations)
- basic and derived MPI data types
- execution on a static processor configuration
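To make the data movement behind the collective operations more concrete, the following sketch models scatter, gather, a sum reduction and all-to-all in plain Python. It is not MPI code and performs no real communication: each "rank" is simply an entry in a list, and the function names are hypothetical stand-ins that mirror what MPI_Scatter, MPI_Gather, MPI_Reduce (with MPI_SUM) and MPI_Alltoall deliver for equally sized contributions.

```python
def scatter(data, nranks):
    """Root splits `data` into nranks equal chunks; rank r receives chunk r."""
    if len(data) % nranks != 0:
        raise ValueError("data length must be divisible by the number of ranks")
    chunk = len(data) // nranks
    return [data[r * chunk:(r + 1) * chunk] for r in range(nranks)]


def gather(per_rank):
    """Root concatenates the contributions of all ranks in rank order."""
    return [item for contribution in per_rank for item in contribution]


def reduce_sum(per_rank):
    """Element-wise sum over all ranks, as a sum reduction delivers it to the root."""
    return [sum(values) for values in zip(*per_rank)]


def alltoall(send):
    """Rank i sends send[i][j] to rank j; rank j receives one item from every rank."""
    n = len(send)
    return [[send[i][j] for i in range(n)] for j in range(n)]


# Model four ranks, each receiving two elements from the root.
chunks = scatter(list(range(8)), 4)
print(chunks)
print(gather(chunks))
print(reduce_sum(chunks))
print(alltoall([[1, 2], [3, 4]]))
```

In a real MPI program each rank holds only its own buffer; the lists of lists used here exist only so that the result on every rank can be inspected in one place.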
More advanced functionality (which may not be fully implemented by a given real-world implementation) is
- parallel I/O operations (MPI-IO)
- dynamic process generation
- one-sided communication routines
- extended collective operations
- external interfaces, and improved language bindings.
Parallel environments on LRZ systems
A parallel environment is set up automatically at login by loading an environment module appropriate to the system used. Alternative MPI environments are normally also available; these can be accessed by switching to a different module.
A parallel environment may not be usable on all systems; in such a case loading the environment module will fail with an error message.
All environments are listed in the following table; links to individual subdocuments which contain specifics on the implementation are also provided.
Fully supported MPI environments

| Hardware Interface | Supported Compiler(s) | MPI flavour | Environment Module Name |
|---|---|---|---|
| cache-coherent NUMAlink | Intel Fortran, C, C++ | SGI MPT | mpi.altix |
| Infiniband | Intel Fortran, C, C++ | SGI MPT (default environment on Nehalem-based ICE systems) | mpi.mpt |
| Any | Intel Compilers, GCC, PGI Compilers | Parastation MPI (default environment on IA64 and x86_64 cluster systems) | mpi.parastation |
| Any | Intel Fortran, C, C++ (others are possible) | | mpi.intel |

Experimental MPI environments

| Hardware Interface | Supported Compiler(s) | MPI flavour | Environment Module Name |
|---|---|---|---|
| Any, but may only partially work or have reduced performance | Intel Fortran, C, C++ | | mpi.ompi |
| Any, but may have reduced performance for distributed systems | Intel Compilers, GCC (others are possible) | MPICH2 | mpi.mpich2 |

If multiple compilers are supported, this is typically encoded in the module name. For example, the module for Parastation MPI built for the PGI compilers is called mpi.parastation/5.0/pgi, where the number 5.0 refers to the Parastation release. The module requires that suitable fortran/pgi and ccomp/pgi modules be loaded, and performs its setup depending on the loaded version.
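The naming scheme in this example can be sketched in a few lines of Python. This is only an illustration under the assumption that the pattern flavour/release/compiler seen in mpi.parastation/5.0/pgi carries over to other combinations, which the text does not guarantee. The helper names are hypothetical, and the generated `module load` lines are the generic environment-modules command, not an LRZ-specific recipe.

```python
def module_name(flavour, release, compiler=None):
    """Compose a module name such as 'mpi.parastation/5.0/pgi' (assumed pattern)."""
    parts = [f"mpi.{flavour}", release]
    if compiler:
        parts.append(compiler)
    return "/".join(parts)


def required_compiler_modules(compiler):
    """Compiler modules the text says must be loaded first (Fortran and C/C++)."""
    return [f"fortran/{compiler}", f"ccomp/{compiler}"]


def module_load_commands(flavour, release, compiler):
    """Shell lines that load the compiler modules, then the MPI module."""
    modules = required_compiler_modules(compiler)
    modules.append(module_name(flavour, release, compiler))
    return [f"module load {m}" for m in modules]


for line in module_load_commands("parastation", "5.0", "pgi"):
    print(line)
```

Loading the compiler modules before the MPI module follows the requirement stated above; whether the order matters in practice depends on how the individual module files are written.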
Finally, it should be remarked that different parallel environments are normally not binary compatible, so switching over to an alternative MPI requires
- complete recompilation
- relinking, and
- in most cases also using the same environment for execution
In particular, binaries built on an Itanium-based system will not run on a standard x86_64 CPU and vice versa.
Compiling and linking
For compilation and linkage, compiler wrappers are provided. They should be used because they automatically add the correct include paths and required libraries. The following table illustrates how compilation and linkage might be performed for all supported languages:
| Language | Compile | Link |
|---|---|---|
| Fortran | mpif90 -c -o foo.o foo.f90 | mpif90 -o myprog.exe myprog.o foo.o |
| C | mpicc -c -o bar.o bar.c | mpicc -o cprog.exe cprog.o bar.o |
| C++ | mpiCC -c -o stuff.o stuff.cpp | mpiCC -o Cprog.exe cprog.o stuff.o |
Of course, suitable application-specific include paths, macros and library paths need to be added (typically via the -I, -D and -L switches, respectively), as well as compiler-specific optimization and/or debugging switches.
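As a rough illustration of how these switches combine with the wrapper commands from the table, the following Python sketch assembles compile and link command lines as argument lists. The helper names, the example paths and the macro are hypothetical; the `-l` switch for naming a library is an addition not mentioned above, and the order of flags shown is one common convention rather than a requirement of any particular wrapper.

```python
# Wrapper names as given in the table above.
WRAPPERS = {"fortran": "mpif90", "c": "mpicc", "c++": "mpiCC"}


def compile_command(language, source, obj,
                    include_dirs=(), macros=(), extra_flags=()):
    """Build a 'wrapper -c ... -o obj source' command as an argument list."""
    cmd = [WRAPPERS[language], "-c"]
    cmd += list(extra_flags)                    # e.g. optimization or debug flags
    cmd += [f"-I{d}" for d in include_dirs]     # application include paths
    cmd += [f"-D{m}" for m in macros]           # preprocessor macros
    cmd += ["-o", obj, source]
    return cmd


def link_command(language, executable, objects,
                 library_dirs=(), libraries=(), extra_flags=()):
    """Build a 'wrapper -o exe objs -Ldir -llib' command as an argument list."""
    cmd = [WRAPPERS[language]] + list(extra_flags)
    cmd += ["-o", executable] + list(objects)
    cmd += [f"-L{d}" for d in library_dirs]     # library search paths
    cmd += [f"-l{name}" for name in libraries]  # libraries to link (assumption)
    return cmd


# Hypothetical include path, macro and library, for illustration only.
print(" ".join(compile_command("c", "bar.c", "bar.o",
                               include_dirs=["/path/to/include"],
                               macros=["USE_MPI"], extra_flags=["-O2"])))
print(" ".join(link_command("c", "cprog.exe", ["cprog.o", "bar.o"],
                            library_dirs=["/path/to/lib"], libraries=["mylib"])))
```

The argument lists can be passed directly to `subprocess.run`, which avoids shell quoting problems for paths that contain spaces.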
Executing MPI binaries
Interactive vs. Batch
Generally, running parallel programs interactively is discouraged since interactive resources are shared among all users logged in to the system, and are targeted at performing program development (editing and compiling code and preparing input data and job scripts for production work). So, anything running longer than a few minutes and with more than 4 MPI tasks should be submitted as a batch job. Please consult the batch documents for the Supercomputer and the Cluster systems for details on how to use the respective batch facilities.
Startup mechanisms
SPMD mode
The standard way of starting up MPI programs in SPMD mode is to use the mpiexec command:
mpiexec -n 128 ./myprog.exe
will execute the single program myprog.exe with 128 MPI tasks on as many cores (provided sufficient resources are available!). In some cases it may also be necessary to use the legacy mpirun command:
mpirun -np 128 ./myprog.exe
Please consult the vendor-specific subdocument for details or vendor-specific extensions on the startup mechanism.
MPMD execution
For multiple program multiple data mode, the standard way to start up is to specify multiple clauses to mpiexec:
mpiexec -n 12 ./calculate.exe : -n 4 ./control.exe
will start up 16 MPI tasks in its MPI_COMM_WORLD, where 12 are run with the binary calculate.exe and 4 are run with the binary control.exe. The binaries must of course have a consistent communication structure.
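The colon-separated clause syntax can be generated programmatically, which is convenient when task counts are computed in a job script. The sketch below is a minimal Python illustration; the function names are hypothetical, and it covers only the `-n` option shown above, not any vendor-specific extensions.

```python
def mpmd_command(clauses, launcher="mpiexec"):
    """Build an MPMD launch line from (ntasks, binary) pairs.

    Clauses are joined with ':' as in
    'mpiexec -n 12 ./calculate.exe : -n 4 ./control.exe'.
    """
    if not clauses:
        raise ValueError("at least one (ntasks, binary) clause is required")
    cmd = [launcher]
    for index, (ntasks, binary) in enumerate(clauses):
        if ntasks < 1:
            raise ValueError("each clause needs at least one task")
        if index > 0:
            cmd.append(":")          # separator between program clauses
        cmd += ["-n", str(ntasks), binary]
    return cmd


def world_size(clauses):
    """Total number of tasks that end up in MPI_COMM_WORLD."""
    return sum(ntasks for ntasks, _ in clauses)


clauses = [(12, "./calculate.exe"), (4, "./control.exe")]
print(" ".join(mpmd_command(clauses)))
print(world_size(clauses))
```

With a single clause the function reduces to the SPMD form, so the same helper can serve both startup modes.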
Hybrid parallel programs
For execution of hybrid parallel MPI programs (for example in conjunction with OpenMP), the startup mechanism depends on the MPI implementation as well as the compiler used; also, it may be necessary to link with a thread-safe version of the MPI libraries. While a setup like
export OMP_NUM_THREADS=4
mpiexec -n 12 ./myprog.exe
might work, starting 12 tasks using 4 threads each (with a resource requirement of 48 cores), there's a good chance that performance will be bad due to incorrect placement of tasks and/or threads. So please consult the vendor-specific subdocument and/or the vendor-specific documentation for further information on how to optimize hybrid execution.
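The resource arithmetic for such a hybrid run (tasks times threads per task) and the environment setup can be sketched as follows. This is a minimal Python illustration with a hypothetical helper name; it mirrors only the generic setup shown above and does nothing about task or thread placement, which remains implementation-specific.

```python
import os


def hybrid_launch(tasks, threads_per_task, binary, base_env=None):
    """Return (command, environment, cores) for a hybrid MPI/OpenMP run.

    The environment is a copy, so the caller's environment is not modified.
    """
    if tasks < 1 or threads_per_task < 1:
        raise ValueError("tasks and threads_per_task must be positive")
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(threads_per_task)   # threads per MPI task
    cmd = ["mpiexec", "-n", str(tasks), binary]
    cores = tasks * threads_per_task                 # total cores required
    return cmd, env, cores


cmd, env, cores = hybrid_launch(12, 4, "./myprog.exe")
print(" ".join(cmd), "with OMP_NUM_THREADS =", env["OMP_NUM_THREADS"])
print("cores required:", cores)
```

The command and environment could be handed to `subprocess.run(cmd, env=env)`. Whether OMP_NUM_THREADS is propagated to tasks started on other nodes depends on the MPI implementation, so this should be checked against the vendor-specific documentation.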
MPI-2 and other special topics
This subsection is in preparation and will contain links to additional pages describing specific MPI-2 features.
Troubleshooting MPI
General MPI Documentation
Standard documents
- MPI 2.1 (3.2 MB PDF)
- MPI 2.2 (3.3 MB PDF)
- MPI Reference Card (for C interface, PostScript)
Note: Version 2.2 was released in September 2009; existing implementations will not yet contain any of the (few) new features incorporated in that version.
Off-site MPI information
The MPI Home page provides general information about MPI.
The standard is developed by the MPI Forum; the release of the next version (3.0) is expected in 2010.
There exists a Wikipedia article about MPI which also contains some example programs.
Tutorials
- Parallel programming courses at HLRS
- Introduction to MPI (University of Cambridge computing services)
- MPI course at EPCC
- A User's Guide to MPI, by Peter Pacheco. A brief tutorial introduction to important features of MPI for C programmers (original is located at ftp://math.usfca.edu/pub/MPI/mpi.guide.ps).
- Examples from the book: "Using MPI"
- Examples from the book: "Using MPI-2"
Please also consult the LRZ HPC training page for the latest course documents.