## About Me

With a background as a researcher (4 years at UCL working on IWRM modelling) and as an entrepreneur (I ran my own consulting company for two years), I am now running the European branch of Enthought.

This blog is where I store potentially valuable information. I would be more than happy to get feedback on my posts!


Hello,

I have the new EPD and am trying to run fft with fastnumpy. Have you run benchmarks with fft? I've tried what I can to get fft to use both cores on a Core2 Duo, but no luck: only 50% utilization across both cores, and performance similar to vanilla numpy 1.6.1.

Script results:

    True
    Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Product Build 20101110 for 32-bit applications
    Intel cpu_clocks: 8903351421090
    Intel cpu_frequency: 1.66251
    max Intel threads: 2
    using numpy 1.5.1
    (2, 65536) items
    simple loop 2.54858477242

Test script:

    # Python 2 script (print statements, time.clock)
    import time

    import numpy
    import numpy.fft as fft
    #from scipy.fftpack import fft
    import mkl

    # Reports whether fastnumpy is active
    print numpy.use_fastnumpy

    print 'Intel MKL version:', mkl.get_version_string()
    print 'Intel cpu_clocks:', mkl.get_cpu_clocks()
    print 'Intel cpu_frequency:', mkl.get_cpu_frequency()
    #print 'Intel MKL, freeing buffer memory:', mkl.thread_free_buffers()
    print 'max Intel threads:', mkl.get_max_threads()
    mkl.set_num_threads(2)

    N = 2**20
    print 'using numpy', numpy.__version__
    a = numpy.random.rand(2, N)
    print a.shape, 'items'

    # Time an empty loop so its overhead can be subtracted below
    t0 = time.clock()
    for i in range(10):
        continue
    base = time.clock() - t0

    # Time 10 FFTs of length N along axis 1
    fftn = fft.fftn
    t0 = time.clock()
    for i in range(10):
        r = fftn(a, (N,), (1,))
    print 'simple loop', time.clock() - t0 - base
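For anyone rerunning this on a current Python 3 and NumPy, here is a minimal sketch of the same measurement. It is not the script used for the numbers in this thread: it leaves out the EPD-specific `mkl` module and `numpy.use_fastnumpy`, and it replaces `time.clock()` (removed in Python 3.8) with `timeit`. The function name `time_fft` and its parameters are made up for illustration.

```python
import timeit

import numpy as np


def time_fft(n=2**20, rows=2, repeats=10):
    """Return the total seconds spent on `repeats` FFTs of a (rows, n) array.

    The transform runs along axis 1 with length n, which matches the
    fftn(a, (N,), (1,)) call in the script above.
    """
    a = np.random.rand(rows, n)
    # timeit calls the lambda `repeats` times and returns the total elapsed time.
    return timeit.timeit(lambda: np.fft.fft(a, n=n, axis=1), number=repeats)


if __name__ == "__main__":
    print("using numpy", np.__version__)
    print("simple loop", time_fft())
```

Because `timeit` only times the call itself, there is no empty-loop baseline to subtract.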

Testing your script on my MacBook Pro:

    dpinte:tmp dpinte$ python test_fastnumpy.py
    True
    Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Product Build 20101110 for 32-bit applications
    Intel cpu_clocks: 10504847091303
    Intel cpu_frequency: 2.66015408791
    max Intel threads: 2
    using numpy 1.5.1
    (2, 1048576) items
    simple loop 1.95484

Decreasing the number of threads to 1 makes it go much faster:

    dpinte:tmp dpinte$ python test_fastnumpy.py
    True
    Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Product Build 20101110 for 32-bit applications
    Intel cpu_clocks: 11993346847969
    Intel cpu_frequency: 2.66027246734
    max Intel threads: 1
    using numpy 1.5.1
    (2, 1048576) items
    simple loop 1.218312

And on my Windows VM, I got good results:

Without fastnumpy:

    Z:\dpinte\tmp>python test_fastnumpy.py
    False
    Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Produ
    01110 for Intel(R) 64 architecture applications
    Intel cpu_clocks: 206824057234718
    Intel cpu_frequency: 2.66014196505
    max Intel threads: 1
    using numpy 1.5.1
    (2L, 1048576L) items
    simple loop 2.48767119843

With fastnumpy:

    Z:\dpinte\tmp>python test_fastnumpy.py
    True
    Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Produ
    01110 for Intel(R) 64 architecture applications
    Intel cpu_clocks: 206936224606227
    Intel cpu_frequency: 2.66014026949
    max Intel threads: 1
    using numpy 1.5.1
    (2L, 1048576L) items
    simple loop 1.16259915716

That is less than half the time taken by the standard version of numpy.
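To put numbers on the comparisons, the short sketch below computes the ratios from the timings reported in this thread. The dictionary labels and the `speedup` helper are made up for illustration; the timing values are copied from the runs above.

```python
# Timings in seconds, copied from the "simple loop" lines reported above.
timings = {
    "MacBook Pro, 2 threads": 1.95484,
    "MacBook Pro, 1 thread": 1.218312,
    "Windows VM, without fastnumpy": 2.48767119843,
    "Windows VM, with fastnumpy": 1.16259915716,
}


def speedup(slow, fast):
    """Return how many times faster the `fast` run is than the `slow` run."""
    return slow / fast


# Ratio between the two MacBook Pro runs (2 threads vs. 1 thread).
mac_ratio = speedup(timings["MacBook Pro, 2 threads"],
                    timings["MacBook Pro, 1 thread"])
# Ratio between the two Windows VM runs (standard numpy vs. fastnumpy).
win_ratio = speedup(timings["Windows VM, without fastnumpy"],
                    timings["Windows VM, with fastnumpy"])

print("1 thread vs 2 threads on the MacBook Pro: %.2fx" % mac_ratio)
print("fastnumpy vs standard numpy on the Windows VM: %.2fx" % win_ratio)
```

These are single runs, so the ratios are only a rough indication: about 1.6x for one thread over two on the MacBook Pro, and about 2.1x for fastnumpy over standard numpy on the Windows VM.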