We present and analyze an implementation of Strassen's matrix multiplication algorithm. Our method is twice as fast as the standard multiplication for matrices of size $512 \times 512$.
In 1969 Strassen discovered that it's possible to multiply $n \times n$ matrices in sub-cubic running time $O(n^\log 7)$. Since then, a lot of effort was spend in designing algorithms with better asymptotic running time. The current record $O(n^2.37)$ is due to Winograd and Coppersmith (1990).
It's not clear if any of these advanced methods are of practical use. Therefore, we investigate Strassen's divide and conquer algorithm. We apply a recursive storage scheme to avoid computational overhead for accessing sub-matrices. Also, we use the standard multiplication in combination with loop unrolling to speed up the recursion end.
We give an estimate for the number of computations and evaluate the efficiency of our optimizations. Finally, we compare our constants to experimental results.
Selected References: