After the release of EPD 6.0, which now links numpy against the Intel MKL library (10.2), I wanted some insight into the performance impact of using the MKL.

**What impact does the MKL have on numpy performance?**

I have very roughly started a basic benchmark comparing EPD 5.1 with EPD 6.0. The former uses numpy 1.3 with ATLAS, the latter numpy 1.4 with the MKL. I am using a Thinkpad T60 with an Intel dual-core 2 GHz CPU running 32-bit Windows.
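To check which BLAS/LAPACK implementation a given numpy build is linked against, numpy can print its build configuration (a quick sanity check before benchmarking):

```python
import numpy

# Prints the BLAS/LAPACK libraries numpy was built against;
# an MKL-linked build lists the MKL libraries in this output.
numpy.show_config()
print(numpy.__version__)
```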

Note: the benchmarking methodology is quite crude and could be made much more realistic, but it gives a first insight.

Contrary to what I said at the last LFPUG meeting on Wednesday, you can control the maximum number of threads used by the MKL with the OMP_NUM_THREADS environment variable. I have updated the benchmark script to print its value when running.
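For example, to limit a run to a single thread, the variable must be set before numpy (and hence the MKL) is first imported, either in the shell or at the very top of the script:

```python
import os

# Must be set before numpy/MKL is first imported; changing it later
# has no effect on an already-initialised threading runtime.
os.environ["OMP_NUM_THREADS"] = "1"

import numpy  # imported deliberately after the variable is set
```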

Here are some results:

**Testing linear algebra functions**

I took some of the most frequently used functions and compared their CPU time using IPython's %timeit command.

Example 1: eigenvalues

```python
import numpy
from numpy.random import random

def test_eigenvalue():
    i = 500
    data = random((i, i))
    result = numpy.linalg.eig(data)
```
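The figures below come from IPython's %timeit; outside IPython, the standard-library timeit module gives comparable numbers. A minimal sketch timing the eigenvalue test:

```python
import timeit

import numpy
from numpy.random import random

def test_eigenvalue():
    data = random((500, 500))
    numpy.linalg.eig(data)

# Best-of-three wall-clock time, similar to what %timeit reports.
best = min(timeit.repeat(test_eigenvalue, repeat=3, number=1))
print("eig: %.0f ms" % (best * 1000))
```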

The results are interesting: 752 ms for the MKL version versus 3376 ms for the ATLAS version. That is 4.5x faster. Testing the very same code in Matlab 7.4 (R2007a) gives a timing of 790 ms.

Example 2: singular value decomposition

```python
def test_svd():
    i = 1000
    data = random((i, i))
    result = numpy.linalg.svd(data)
    result = numpy.linalg.svd(data, full_matrices=False)
```

Results are 4608 ms with the MKL versus 15990 ms without. This is nearly 3.5x faster.

Example 3: matrix inversion

```python
def test_inv():
    i = 1000
    data = random((i, i))
    result = numpy.linalg.inv(data)
```

Results are 418 ms with the MKL versus 1457 ms without. This is 3.5x faster.

Example 4: det()

```python
def test_det():
    i = 1000
    data = random((i, i))
    result = numpy.linalg.det(data)
```

Results are 186 ms with the MKL versus 400 ms without. This is 2x faster.

Example 5: dot()

```python
def test_dot():
    i = 1000
    a = random((i, i))
    b = numpy.linalg.inv(a)
    result = numpy.dot(a, b) - numpy.eye(i)
```

Results are 666 ms with the MKL versus 2444 ms without. This is 3.5x faster.

**Conclusion:**

Linear algebra functions show a clear performance improvement. I am open to collecting more information if you have some home-made benchmarks. If enough information comes in, we should consider publishing the results as an official benchmark somewhere.

| Function        | Without MKL | With MKL | Speed-up |
|-----------------|-------------|----------|----------|
| test_eigenvalue | 3376 ms     | 752 ms   | 4.5x     |
| test_svd        | 15990 ms    | 4608 ms  | 3.5x     |
| test_inv        | 1457 ms     | 418 ms   | 3.5x     |
| test_det        | 400 ms      | 186 ms   | 2x       |
| test_dot        | 2444 ms     | 666 ms   | 3.5x     |

For those of you wanting to test your environment, feel free to use the script below.
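As a self-contained starting point, here is a minimal version of the five tests above with a simple best-of-three timing harness (a sketch, not necessarily identical to the original script):

```python
import os
import timeit

import numpy
from numpy.random import random

def test_eigenvalue():
    data = random((500, 500))
    numpy.linalg.eig(data)

def test_svd():
    data = random((1000, 1000))
    numpy.linalg.svd(data)
    numpy.linalg.svd(data, full_matrices=False)

def test_inv():
    data = random((1000, 1000))
    numpy.linalg.inv(data)

def test_det():
    data = random((1000, 1000))
    numpy.linalg.det(data)

def test_dot():
    a = random((1000, 1000))
    b = numpy.linalg.inv(a)
    numpy.dot(a, b) - numpy.eye(1000)

# Report the thread setting, then the best of three runs per test.
print("OMP_NUM_THREADS:", os.environ.get("OMP_NUM_THREADS", "(not set)"))
for test in (test_eigenvalue, test_svd, test_inv, test_det, test_dot):
    best = min(timeit.repeat(test, repeat=3, number=1))
    print("%-16s %8.0f ms" % (test.__name__, best * 1000))
```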
