POE stands for “POR with out-of-order and elusive interleavings".
MPI programs suffer from “elusive" interleavings in the presence of MPI
wildcard Receive R(*) . Issuing the MPI sends in different orders does not
help as the MPI runtime provides no guarantees based on when the Send is
actually issued. POE employs dynamic source re-writing in place of wildcard
so that every interleaving explored by POE algorithm is deterministic. Also,
the presence of collectives such as MPI\_Barrier can cause traditional
runtime verification techniques such as “Dynamic Partial Order Reduction" to
fail since it is not always possible to issue instructions in different orders
in the presence of collectives. POE also addresses this issue by creating
“match sets".
POE algorithm works by creating a graph structure with the MPI operations of
processes as nodes and by adding Intra-Completes-Before (IntraCB) edges
between these nodes. As the name suggests, IntraCB edges are only added between
the MPI operations within the MPI processes. The IntraCB edges are added
between the MPI operations based on the MPI semantics of the operations. MPI
functions like MPI_Barriers, MPI_Wait etc have the semantics,
that no later MPI operations can be issued until these MPI operations complete.
We call such operations as fence operations. Note that instructions
issued before the fence operations can linger even after the fence operation
is complete.
The IntraCB edges are added based on the following rules: Let i and j be the MPI operations of a process P such that i < j (i.e., i comes before j in the program order. There is an IntraCB edge between i and j is one of the following is true:
If there is a path from i to j then i is called the ancestor of j. POE algorithm uses these IntraCB edges to form match-sets to be issued into the MPI runtime. The following is the (informal) algorithm:
An MPI operation is “matched" if it has been issued to the MPI runtime by the POE algorithm.
MPI_Wait, then ⟨ i ⟩ forms a match set.
MPI_Finalize, check if any wildcard match sets are yet to be explored. If yes, then restart the processes and goto step 1, otherwise, exit.
MPI_Finalize, then there is a deadlock. Flag a deadlock and kill all the MPI processes.
ADDITIONAL DETAILS: Additional details of POE algorithm can be found here.
This document was translated from LATEX by HEVEA.