Documentation
Job submission scenario
- The user must first join the grid by invoking mpiboot, which spawns the
MPD process. The MPD makes the local node join the P2P-MPI group if the
group exists, or creates it otherwise.
- The job is then submitted by invoking the run command, which starts the
process of rank 0 of the MPI application on the local host:
    p2pmpirun nproc nreplica files executable
- Discovery: the local MPD issues a search request to find the pipe advertisements of other MPDs. When enough advertisements have been found, the local MPD sends, through each discovered pipe, the socket at which the MPI program can be contacted.
- Handshake: the remote peer sends the ports of its FT (file transfer) and FD (fault detection) services directly to the submitter's MPI process.
- File transfer: the program and its data are downloaded from the submitter's host via the FT service.
- Execution notification: once the transfer is complete, the FT service on the remote host notifies its MPD to execute the downloaded program.
- Remote executable launch: the MPD executes the downloaded program, which thereby joins the execution platform.
- Execution preamble: all processes in the execution platform exchange
their IP addresses so that each one can build its local communication table.
- Fault detection: MPI processes register with their local FD service and start. The FD services then exchange heartbeat messages and notify the MPI processes when they become aware of a node failure.
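The join-or-create behaviour of the MPD spawned by mpiboot can be illustrated with a minimal Python sketch. It is not P2P-MPI code: the function name `mpiboot` and the in-memory `groups` dictionary are hypothetical stand-ins for the peer-to-peer group lookup the real MPD performs.

```python
def mpiboot(groups: dict[str, set[str]], group: str, node: str) -> str:
    """Hypothetical model of the MPD start-up: join `group` if it exists,
    otherwise create it with this node as its first member.

    `groups` is an in-memory stand-in for the peer-to-peer group lookup.
    """
    if group in groups:
        groups[group].add(node)  # group found: the local node joins it
        return "joined"
    groups[group] = {node}  # no such group yet: this node creates it
    return "created"


groups: dict[str, set[str]] = {}
print(mpiboot(groups, "P2P-MPI", "node-1"))  # created
print(mpiboot(groups, "P2P-MPI", "node-2"))  # joined
```

In a real peer-to-peer overlay the "does the group exist?" test is itself a distributed search, so two nodes booting at the same moment might both create a group; the dictionary hides that race.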
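The discovery step can be sketched as follows, assuming search responses arrive as a stream of advertisements and that "enough" means a fixed number `needed` of distinct remote MPDs (how P2P-MPI actually derives that number is not stated here). `PipeAdvertisement` and `discover` are hypothetical names; "sending" into a pipe is modelled as returning a (pipe, socket) message.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PipeAdvertisement:
    """Hypothetical record for a remote MPD's pipe advertisement."""
    peer_id: str
    pipe_id: str


def discover(
    advertisements: Iterable[PipeAdvertisement],
    needed: int,
    submitter_socket: tuple[str, int],
) -> list[tuple[str, tuple[str, int]]]:
    """Collect advertisements from distinct peers until `needed` are found,
    then return one (pipe_id, socket) message per discovered pipe."""
    found: dict[str, PipeAdvertisement] = {}
    for adv in advertisements:
        if len(found) >= needed:
            break  # enough peers found: stop consuming search results
        found.setdefault(adv.peer_id, adv)  # keep one pipe per remote MPD
    if len(found) < needed:
        raise RuntimeError(f"only {len(found)} of {needed} peers found")
    # Each message carries the socket where the submitter's MPI program listens.
    return [(adv.pipe_id, submitter_socket) for adv in found.values()]
```

For example, with advertisements from peers A, A, B, C and `needed=2`, the duplicate from A is ignored and the search stops after B, so only the pipes of A and B receive the socket.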
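On the remote side, the handshake, file transfer, execution notification and launch steps form a strict sequence. The sketch below models one remote host as a small state machine that rejects out-of-order events. `RemoteHost`, `Stage` and the method names are hypothetical, the ports are arbitrary, and no program is actually downloaded or spawned.

```python
from enum import Enum, auto
from typing import Optional


class Stage(Enum):
    WAITING = auto()     # MPD listening on its pipe
    CONTACTED = auto()   # handshake done: FT and FD ports sent to the submitter
    DOWNLOADED = auto()  # FT service has fetched the program and data
    RUNNING = auto()     # MPD has launched the downloaded program


class RemoteHost:
    """Hypothetical model of one remote peer: its MPD plus its FT service."""

    def __init__(self, ft_port: int, fd_port: int) -> None:
        self.ft_port = ft_port
        self.fd_port = fd_port
        self.stage = Stage.WAITING
        self.program: Optional[str] = None
        self.data_files: list[str] = []

    def _expect(self, stage: Stage) -> None:
        # Reject events that arrive out of the scenario's order.
        if self.stage is not stage:
            raise RuntimeError(f"unexpected event in stage {self.stage.name}")

    def on_invitation(self, submitter: tuple[str, int]) -> tuple[tuple[str, int], dict[str, int]]:
        """Handshake: reply directly to the submitter's MPI process with our ports."""
        self._expect(Stage.WAITING)
        self.stage = Stage.CONTACTED
        return submitter, {"ft": self.ft_port, "fd": self.fd_port}

    def on_transfer_complete(self, program: str, data_files: list[str]) -> None:
        """Called by the FT service once the program and data are downloaded."""
        self._expect(Stage.CONTACTED)
        self.stage = Stage.DOWNLOADED
        self.data_files = list(data_files)
        self._mpd_execute(program)  # execution notification to the local MPD

    def _mpd_execute(self, program: str) -> None:
        """Remote executable launch; only recorded here, nothing is spawned."""
        self._expect(Stage.DOWNLOADED)
        self.program = program
        self.stage = Stage.RUNNING
```

Making the FT completion callback drive the launch mirrors the text: the MPD does not poll for files, it waits to be notified by its FT service.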
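The execution preamble amounts to an all-gather of addresses. The following minimal simulation assumes a naive all-to-all exchange in which every rank sends its IP address to every other rank; the function `exchange_addresses` is hypothetical and does not describe the exchange pattern P2P-MPI actually uses.

```python
def exchange_addresses(addresses: dict[int, str]) -> dict[int, dict[int, str]]:
    """Simulate the execution preamble with a naive all-to-all exchange.

    `addresses` maps each MPI rank to its IP address. Every rank sends its
    address to every other rank, and each receiver records what it gets in
    its own communication table (rank -> IP address).
    """
    # Each rank starts out knowing only its own address.
    tables = {rank: {rank: ip} for rank, ip in addresses.items()}
    for sender, ip in addresses.items():
        for receiver in addresses:
            if receiver != sender:
                tables[receiver][sender] = ip  # receiver learns sender's address
    return tables
```

With n processes this exchange sends n(n-1) messages. Gathering every address at rank 0 and broadcasting the finished table would need only 2(n-1), at the cost of serialising through rank 0.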
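Fault detection by heartbeats can be sketched with a plain timeout rule: a peer is suspected once no heartbeat has arrived from it for longer than `timeout`, and every registered MPI process is notified once. This is an assumed, simplified detector (the class `HeartbeatDetector` is hypothetical and may differ from the real FD protocol). Time is passed in explicitly so the behaviour is deterministic.

```python
from typing import Callable, Iterable


class HeartbeatDetector:
    """Hypothetical local FD service using a plain timeout rule."""

    def __init__(self, peers: Iterable[str], timeout: float, start: float = 0.0) -> None:
        self.timeout = timeout
        self.last_seen = {peer: start for peer in peers}
        self.suspected: set[str] = set()
        self.listeners: list[Callable[[str], None]] = []

    def register(self, callback: Callable[[str], None]) -> None:
        """An MPI process registers to be told about node failures."""
        self.listeners.append(callback)

    def on_heartbeat(self, peer: str, now: float) -> None:
        """Record a heartbeat message received from another FD service."""
        self.last_seen[peer] = now

    def check(self, now: float) -> None:
        """Suspect every peer silent for longer than `timeout`; notify once."""
        for peer, seen in self.last_seen.items():
            if peer not in self.suspected and now - seen > self.timeout:
                self.suspected.add(peer)
                for notify in self.listeners:
                    notify(peer)
```

A real FD service would call `check` periodically from a timer. Choosing `timeout` trades detection speed against false suspicions on slow networks; this sketch never withdraws a suspicion.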