ELIMINATING THE LAUNCHER
The information on this page is useful for users who want to launch each process of an application manually or use their own launcher to start an application.
The Current Launcher
It is possible to launch an MPICH application without using the provided launcher. First we need to know what the launcher does and then we can show how to launch an application without it. MPICH.NT uses environment variables to communicate with the spawned processes, so any launcher that can provide the required environment variables could launch an MPICH.NT application.
What the launcher does:
1) Create the first process
Process zero acquires a port to listen on and then communicates this port number back to the launcher.
2) Create the rest of the processes
The launcher then creates all the rest of the processes, informing them of which port the first process is listening on through an environment variable.
Here are the environment variables set by the launcher:
Required | |
MPICH_JOBID | A string, unique across all machines, used to create named objects such as mutexes and shared memory queues. I create this string by appending a number to the root hostname (e.g. fry14). The launcher uses this value as a registry key under which it stores information about running MPICH applications. |
MPICH_IPROC | The rank of the current process. |
MPICH_NPROC | The total number of processes. |
MPICH_ROOT | The hostname of the root process and the port where it is listening. Use a colon to separate the host name and port: hostA:port or a.b.c.d:port |
MPICH_EXTRA | Only valid on the root process. The name of a temporary file used to communicate the port number from the root process to the launcher. |
Conditional | |
MPICH_SHM_LOW | The lowest rank that the current process can reach through shared memory queues. |
MPICH_SHM_HIGH | The highest rank the current process can reach through shared memory queues. |
MPICH_COMNIC | The name of the network card used for MPI communication connections, if it is different from the name that gethostname returns. |
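A custom launcher has to assemble these variables for every process it starts. The following Python sketch shows one way to do that. The function name build_env and its parameters are hypothetical and are not part of MPICH.NT. It leaves out MPICH_EXTRA on the assumption that the root port is chosen in advance rather than reported back through a temporary file.

```python
def build_env(jobid, rank, nproc, root_host, root_port,
              shm_low=None, shm_high=None, comnic=None):
    """Return the MPICH.NT variables for one process as a dict of strings.

    Hypothetical helper: the variable names come from the table above,
    everything else is an illustration.
    """
    if not 0 <= rank < nproc:
        raise ValueError("rank must be in the range [0, nproc)")

    env = {
        "MPICH_JOBID": jobid,
        "MPICH_IPROC": str(rank),
        "MPICH_NPROC": str(nproc),
        # Host name and port of the root process, separated by a colon.
        "MPICH_ROOT": f"{root_host}:{root_port}",
    }

    # Conditional variables are added only when the caller supplies them.
    if shm_low is not None and shm_high is not None:
        env["MPICH_SHM_LOW"] = str(shm_low)
        env["MPICH_SHM_HIGH"] = str(shm_high)
    if comnic is not None:
        env["MPICH_COMNIC"] = comnic
    return env
```

For example, build_env("fry.123", 1, 2, "fry", 12345) returns the four required variables for the second of two processes.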
Without the Launcher
The key to eliminating the launcher is to remove the interaction with the first process. If you set MPICH_ROOT in the environment of the first process to its host name and an available port number (host:port), the process will listen on this port and will not attempt to write the number out to the file described by MPICH_EXTRA.
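One way to find an available port number is to ask the operating system for one. The Python sketch below binds a socket to port 0 and reads back the port that was assigned. The name find_free_port is hypothetical. The port is released again before the function returns, so another program could claim it before the first process starts; treat the result as a good guess, not a reservation.

```python
import socket


def find_free_port(host=""):
    """Return a TCP port number that was unused at the time of the call."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Port 0 asks the operating system to pick any unused port.
        s.bind((host, 0))
        return s.getsockname()[1]
```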
Here is an example.
I brought up two command prompts on two separate machines, set the environment variables and ran an application according to the charts below:
Host | Fry | Jazz |
Environment | MPICH_JOBID=fry.123 MPICH_IPROC=0 MPICH_NPROC=2 MPICH_ROOT=fry:12345 | MPICH_JOBID=fry.123 MPICH_IPROC=1 MPICH_NPROC=2 MPICH_ROOT=fry:12345 |
Command | netpipe.exe | netpipe.exe |
Here is the same example on a single machine which uses shared memory:
Host | Fry | Fry |
Environment | MPICH_JOBID=fry.2000 MPICH_IPROC=0 MPICH_NPROC=2 MPICH_ROOT=fry:12345 MPICH_SHM_LOW=0 MPICH_SHM_HIGH=1 | MPICH_JOBID=fry.2000 MPICH_IPROC=1 MPICH_NPROC=2 MPICH_ROOT=fry:12345 MPICH_SHM_LOW=0 MPICH_SHM_HIGH=1 |
Command | netpipe.exe | netpipe.exe |
Here is an example of four processes on two machines which mixes shared memory and socket communication:
Host | Fry | Fry | Jazz | Jazz |
Environment | MPICH_JOBID=fry.100 MPICH_IPROC=0 MPICH_NPROC=4 MPICH_ROOT=fry:12345 MPICH_SHM_LOW=0 MPICH_SHM_HIGH=1 | MPICH_JOBID=fry.100 MPICH_IPROC=1 MPICH_NPROC=4 MPICH_ROOT=fry:12345 MPICH_SHM_LOW=0 MPICH_SHM_HIGH=1 | MPICH_JOBID=fry.100 MPICH_IPROC=2 MPICH_NPROC=4 MPICH_ROOT=fry:12345 MPICH_SHM_LOW=2 MPICH_SHM_HIGH=3 | MPICH_JOBID=fry.100 MPICH_IPROC=3 MPICH_NPROC=4 MPICH_ROOT=fry:12345 MPICH_SHM_LOW=2 MPICH_SHM_HIGH=3 |
Command | mandel.exe | mandel.exe | mandel.exe | mandel.exe |
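The MPICH_SHM_LOW and MPICH_SHM_HIGH values in the table above follow from where each rank runs. The Python sketch below derives them from a list of host names indexed by rank. The function name shm_ranges is hypothetical, and the sketch assumes that ranks placed on the same host are numbered contiguously, as they are in the example.

```python
def shm_ranges(hosts):
    """Return (low, high) for every rank, where hosts[i] is rank i's host.

    Assumes ranks that share a host form one contiguous block.
    """
    ranges = []
    for rank, host in enumerate(hosts):
        # Walk down to the first rank of this host's block.
        low = rank
        while low > 0 and hosts[low - 1] == host:
            low -= 1
        # Walk up to the last rank of this host's block.
        high = rank
        while high + 1 < len(hosts) and hosts[high + 1] == host:
            high += 1
        ranges.append((low, high))
    return ranges


print(shm_ranges(["fry", "fry", "jazz", "jazz"]))
# → [(0, 1), (0, 1), (2, 3), (2, 3)]
```

A rank that is alone on its host gets low and high equal to its own rank. The first example on this page leaves the two variables unset in that situation.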
These are the exact commands for the first example, entered at a command prompt:
On Fry
C:\Temp>set MPICH_JOBID=fry.123
C:\Temp>set MPICH_IPROC=0
C:\Temp>set MPICH_NPROC=2
C:\Temp>set MPICH_ROOT=fry:12345
C:\Temp>netpipe.exe
On Jazz
C:\Temp>set MPICH_JOBID=fry.123
C:\Temp>set MPICH_IPROC=1
C:\Temp>set MPICH_NPROC=2
C:\Temp>set MPICH_ROOT=fry:12345
C:\Temp>netpipe.exe
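The manual steps can also be scripted. The Python sketch below is a minimal custom launcher for the single-machine case: it starts every process on the local host with the variables from the shared memory example. The name launch_local and its parameters are hypothetical, and the sketch does not cover remote hosts or any clean-up the provided launcher may perform.

```python
import os
import subprocess


def launch_local(command, jobid, nproc, root_port, root_host="localhost"):
    """Start nproc copies of command on this machine and return them.

    command is an argument list such as ["netpipe.exe"]. Rank 0 is
    started first; all ranks are marked as reachable through shared memory.
    """
    procs = []
    for rank in range(nproc):
        # Start from the current environment and add the MPICH.NT variables.
        env = os.environ.copy()
        env.update({
            "MPICH_JOBID": jobid,
            "MPICH_IPROC": str(rank),
            "MPICH_NPROC": str(nproc),
            "MPICH_ROOT": f"{root_host}:{root_port}",
            "MPICH_SHM_LOW": "0",
            "MPICH_SHM_HIGH": str(nproc - 1),
        })
        procs.append(subprocess.Popen(command, env=env))
    return procs
```

Calling launch_local(["netpipe.exe"], "fry.2000", 2, 12345, root_host="fry") and then waiting on each returned process corresponds to the single-machine example above.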