With a background as a researcher (4 years at UCL working on IWRM modelling) and an entrepreneur (I ran my own consulting company for two years), I am now running the European branch of Enthought.
This blog is the place where I store potentially valuable information. I would be more than happy to get feedback on my posts!
Hello,
I have the new EPD and am trying to use fft with fastnumpy. Have you run benchmarks with fft? I’ve tried what I can to get fft to use both cores on a Core2 Duo, but no luck: utilization stays at 50% across both cores, and performance is similar to vanilla numpy 1.6.1.
Script results:
True
Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Product Build 20101110 for 32-bit applications
Intel cpu_clocks: 8903351421090
Intel cpu_frequency: 1.66251
max Intel threads: 2
using numpy 1.5.1
(2, 65536) items
simple loop 2.54858477242
____ Test script: _________
import time

import numpy
import numpy.fft as fft
#from scipy.fftpack import fft
import mkl

# Report whether fastnumpy is active, then the MKL configuration
print numpy.use_fastnumpy
print 'Intel MKL version:', mkl.get_version_string()
print 'Intel cpu_clocks:', mkl.get_cpu_clocks()
print 'Intel cpu_frequency:', mkl.get_cpu_frequency()
#print 'Intel MKL, freeing buffer memory:', mkl.thread_free_buffers()
print 'max Intel threads:', mkl.get_max_threads()
mkl.set_num_threads(2)

N = 2**20
print 'using numpy', numpy.__version__
a = numpy.random.rand(2, N)
print a.shape, 'items'

# Time an empty loop so its overhead can be subtracted below
t0 = time.clock()
for i in range(10):
    continue
base = time.clock() - t0

# Time 10 FFTs of length N along axis 1
fftn = fft.fftn
t0 = time.clock()
for i in range(10):
    r = fftn(a, (N,), (1,))
print 'simple loop', time.clock() - t0 - base
Testing your script on my MacBook Pro:
dpinte:tmp dpinte$ python test_fastnumpy.py
True
Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Product Build 20101110 for 32-bit applications
Intel cpu_clocks: 10504847091303
Intel cpu_frequency: 2.66015408791
max Intel threads: 2
using numpy 1.5.1
(2, 1048576) items
simple loop 1.95484
Decreasing the number of threads to 1 makes it go much faster:
dpinte:tmp dpinte$ python test_fastnumpy.py
True
Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Product Build 20101110 for 32-bit applications
Intel cpu_clocks: 11993346847969
Intel cpu_frequency: 2.66027246734
max Intel threads: 1
using numpy 1.5.1
(2, 1048576) items
simple loop 1.218312
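A caveat about the timing itself: the script uses `time.clock()`, which on Unix systems such as Mac OS X reports processor time for the whole process rather than wall-clock time, so a run that keeps two cores busy may appear slower than it really is. Below is a minimal sketch of the same measurement (empty-loop overhead subtracted from the timed loop) using a wall-clock timer. It assumes Python 3, where `time.perf_counter()` is available; `time_loop` and `workload` are hypothetical names of mine, and the workload is a pure-Python stand-in for the `fftn` call so that the sketch runs without numpy or MKL.

```python
import time


def time_loop(func, repeats=10, timer=time.perf_counter):
    """Return the seconds spent calling func `repeats` times,
    minus the overhead of an empty loop of the same length."""
    # Measure the empty loop first, as the original script does.
    t0 = timer()
    for _ in range(repeats):
        continue
    base = timer() - t0

    # Measure the loop that actually does the work.
    t0 = timer()
    for _ in range(repeats):
        func()
    return timer() - t0 - base


def workload():
    # Stand-in for the FFT call (assumption: any callable works here).
    return sum(i * i for i in range(10000))


elapsed = time_loop(workload)
print('simple loop', elapsed)
```

With numpy available, one would pass something like `lambda: fftn(a, (N,), (1,))` as `func` instead of the stand-in workload.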
And on my Windows VM, I get good results:
Without fastnumpy:
Z:\dpinte\tmp>python test_fastnumpy.py
False
Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Produ
01110 for Intel(R) 64 architecture applications
Intel cpu_clocks: 206824057234718
Intel cpu_frequency: 2.66014196505
max Intel threads: 1
using numpy 1.5.1
(2L, 1048576L) items
simple loop 2.48767119843
With fastnumpy:
Z:\dpinte\tmp>python test_fastnumpy.py
True
Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Produ
01110 for Intel(R) 64 architecture applications
Intel cpu_clocks: 206936224606227
Intel cpu_frequency: 2.66014026949
max Intel threads: 1
using numpy 1.5.1
(2L, 1048576L) items
simple loop 1.16259915716
That is roughly half the time taken by the standard version of numpy (1.16 s versus 2.49 s).
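To put numbers on these comparisons, the ratios can be worked out directly from the "simple loop" timings printed above. This is only a small sketch: the helper name `speedup` is mine and is not part of numpy or mkl, and the values are copied by hand from the runs in this thread.

```python
def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate run is than the baseline."""
    return baseline_seconds / candidate_seconds


# "simple loop" timings copied from the runs above (seconds).
windows_plain = 2.48767119843    # Windows VM, use_fastnumpy False
windows_fast = 1.16259915716     # Windows VM, use_fastnumpy True
mac_two_threads = 1.95484        # MacBook Pro, max Intel threads: 2
mac_one_thread = 1.218312        # MacBook Pro, max Intel threads: 1

print('fastnumpy vs plain numpy: %.2fx' % speedup(windows_plain, windows_fast))
print('1 thread vs 2 threads:    %.2fx' % speedup(mac_two_threads, mac_one_thread))
```

On these figures, fastnumpy comes out a little over 2 times faster on the Windows VM, and the single-threaded run on the Mac is reported as about 1.6 times faster than the two-threaded one.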