The matrix is torus-wrap mapped onto the processors (transparently to the user), and the factorization uses partial pivoting. Each processor holds a portion of the matrix and of the right-hand sides, determined by a distribution function that balances computation and communication during the factorization. The general prescription is that no processor may hold more than one row or column more (or fewer) than any other processor. Because the input matrix is not torus-wrapped, the results are permuted to "unwrap" them; this step is also transparent to the user.
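The load-balance rule above can be illustrated with a small sketch. It assumes that torus-wrap mapping means a cyclic assignment (global row i goes to mesh row i mod P_r, global column j to mesh column j mod P_c); the function names and the mesh sizes are hypothetical and are not part of the solver's interface.

```python
def torus_wrap_owner(i, j, mesh_rows, mesh_cols):
    """Return the (mesh_row, mesh_col) that owns global entry (i, j).

    Assumption: torus-wrap is modelled as a plain cyclic (modulo) mapping.
    """
    return (i % mesh_rows, j % mesh_cols)


def local_counts(n, mesh_dim):
    """Count how many of n rows (or columns) land on each mesh row (or column)."""
    counts = [0] * mesh_dim
    for k in range(n):
        counts[k % mesh_dim] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical case: 10 matrix rows spread over 4 mesh rows.
    counts = local_counts(10, 4)
    print(counts)                      # rows held by each mesh row
    print(max(counts) - min(counts))   # imbalance between mesh rows
```

With a cyclic assignment of this kind, the counts for any two mesh rows (or mesh columns) differ by at most one, which matches the prescription stated above.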
As stated previously, the right-hand side is appended to the matrix. When there is a single right-hand side, it is attached to the first column of the processor mesh. This is shown in the next figure.
The first set of interfaces uses Epetra vectors. The memory requirements for the 4-processor example are given below, together with the packing of the data into the vectors. Only the first two processors are shown explicitly. Note that the matrix is packed in column order.
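As a rough illustration of column-order packing, the sketch below flattens one processor's local block column by column and then appends its share of the right-hand side. Plain Python lists stand in for the Epetra vector, the helper name is hypothetical, and placing the right-hand side directly after the last matrix column is an assumption based on the description above.

```python
def pack_column_order(local_block, local_rhs=None):
    """Flatten a local block (given as a list of rows) in column order.

    Assumption: processors in the first mesh column append their
    right-hand-side entries after the matrix entries; other processors
    pass local_rhs=None and store only matrix entries.
    """
    my_rows = len(local_block)
    my_cols = len(local_block[0]) if my_rows else 0
    # Walk down each column in turn (column-major order).
    packed = [local_block[r][c] for c in range(my_cols) for r in range(my_rows)]
    if local_rhs is not None:
        packed.extend(local_rhs)
    return packed


if __name__ == "__main__":
    # Hypothetical 2 x 2 local block with two right-hand-side entries.
    block = [[1.0, 2.0],
             [3.0, 4.0]]
    print(pack_column_order(block, [9.0, 8.0]))
```

Under these assumptions, a processor holding my_rows rows, my_cols columns, and one right-hand side needs a vector of length my_rows * (my_cols + 1); a processor without a right-hand side needs my_rows * my_cols.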
After the solution process, the answers are retrieved from the positions where the right-hand sides were stored.
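Retrieving the answers can be sketched as slicing the packed vector at the offset where the right-hand sides were stored. This assumes the layout described earlier (matrix entries in column order, followed by the right-hand-side entries); the function name and arguments are hypothetical, and a plain list stands in for the Epetra vector.

```python
def extract_solution(packed, my_rows, my_cols, my_rhs=1):
    """Return the solution columns stored where the right-hand sides were.

    Assumption: the first my_rows * my_cols entries hold the matrix, and
    the next my_rows * my_rhs entries hold the right-hand sides, which
    the solver overwrites with the answers.
    """
    start = my_rows * my_cols
    end = start + my_rows * my_rhs
    tail = packed[start:end]
    # Split the tail into one list per right-hand side.
    return [tail[k * my_rows:(k + 1) * my_rows] for k in range(my_rhs)]


if __name__ == "__main__":
    # Hypothetical packed vector after a solve: 2 x 2 block, one solution column.
    packed = [1.0, 3.0, 2.0, 4.0, 0.5, 1.5]
    print(extract_solution(packed, my_rows=2, my_cols=2))
```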