This material is part of the course Practical Introduction to Parallel Programming and meant for educational purposes only.
Task
Implement a parallel program that executes the Gram-Schmidt algorithm to make the columns of an \(N \times N\) matrix \(\mathbf{A}\) orthonormal through a series of dot products and vector subtractions.
The pseudo-code of the Gram-Schmidt algorithm reads as follows:
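One standard formulation (the classical variant, written directly in terms of the columns of \(\mathbf{A}\)) is:

for \(j = 1, \dots, N\)
    for \(k = 1, \dots, j - 1\)
        \(\mathbf{A}_{j} \leftarrow \mathbf{A}_{j} - \left(\mathbf{A}_{k}^{\top} \mathbf{A}_{j}\right) \mathbf{A}_{k}\)
    end for
    \(\mathbf{A}_{j} \leftarrow \mathbf{A}_{j} \,/\, \sqrt{\mathbf{A}_{j}^{\top} \mathbf{A}_{j}}\)
end for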
Note that \(\mathbf{A}_{j}\) denotes the \(j\)-th column of the matrix \(\mathbf{A}\) and superscript \(\top\) stands for taking the transpose.
The code below shows a serial implementation of the Gram-Schmidt algorithm.
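As a minimal sketch, assuming the matrix is stored as an array of row pointers so that a[i][j] is the element in row i and column j, a serial version could look like this:

#include <math.h>

/* Orthonormalize the columns of the n x n matrix a in place
   (classical Gram-Schmidt; columns are assumed linearly independent). */
void gram_schmidt(double **a, int n)
{
    for (int j = 0; j < n; j++) {
        /* Subtract the projections onto the previously orthonormalized columns. */
        for (int k = 0; k < j; k++) {
            double dot = 0.0;
            for (int i = 0; i < n; i++)
                dot += a[i][k] * a[i][j];      /* A_k^T A_j */
            for (int i = 0; i < n; i++)
                a[i][j] -= dot * a[i][k];      /* A_j <- A_j - (A_k^T A_j) A_k */
        }
        /* Normalize column j. */
        double norm = 0.0;
        for (int i = 0; i < n; i++)
            norm += a[i][j] * a[i][j];
        norm = sqrt(norm);
        for (int i = 0; i < n; i++)
            a[i][j] /= norm;
    }
}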
In the parallel implementation of the Gram-Schmidt algorithm, the matrix \(\mathbf{A}\) is distributed row-wise over the threads:
This means that each thread handles \(\frac{N}{p}\) rows, where \(p\) is the number of threads.
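With a block distribution of the rows, each thread can compute its own row range from its thread index. A small sketch (which also handles the case where \(N\) is not divisible by \(p\)) could be:

/* Rows [ *first, *last ] owned by thread t when N rows are distributed
   as evenly as possible over p threads (block distribution). */
static void row_range(int N, int p, int t, int *first, int *last)
{
    *first = (t * N) / p;
    *last  = ((t + 1) * N) / p - 1;
}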
Hints
Use the functions matrix_alloc and matrix_free to create and destroy a matrix with arbitrary dimensions.
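These helpers are provided with the course material; their exact signatures may differ from the sketch below, which shows one common way to implement them (an array of row pointers into a single contiguous block, error checking omitted for brevity):

#include <stdlib.h>

/* Allocate a rows x cols matrix as one contiguous block plus row pointers. */
double **matrix_alloc(int rows, int cols)
{
    double **m = malloc(rows * sizeof *m);
    m[0] = malloc((size_t)rows * cols * sizeof **m);
    for (int i = 1; i < rows; i++)
        m[i] = m[0] + (size_t)i * cols;
    return m;
}

/* Free a matrix created by matrix_alloc. */
void matrix_free(double **m)
{
    free(m[0]);
    free(m);
}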
Define a struct data_t that is to be passed to each thread. It should contain at least the matrix size, a pointer to the matrix, a pointer to a shared reduction variable, and the first and last row indices to be processed by a thread.
You must define a data_t instance for each thread!
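One possible layout for this struct (the field names are just one choice) is:

/* Per-thread argument block; create one instance per thread. */
typedef struct {
    int      n;      /* matrix size N                            */
    double **a;      /* pointer to the shared matrix             */
    double  *dot;    /* pointer to the shared reduction variable */
    int      first;  /* first row handled by this thread         */
    int      last;   /* last row handled by this thread          */
} data_t;

If the threads are created with pthread_create, a pointer to the thread's own data_t instance is passed as the last argument and cast back to data_t * inside the thread function.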
What performance do you get? How could you improve it?