About Me

With a background as a researcher (4 years at UCL working on IWRM modelling) and as an entrepreneur (I ran my own consulting company for two years), I now run the European branch of Enthought.

This blog is where I store potentially valuable information. I would be more than happy to get feedback on my posts!


2 Responses to About Me

  1. Hello,

    I have the new EPD and am trying to run FFTs with fastnumpy. Have you run benchmarks with fft? I’ve tried what I can to get fft to use both cores on a Core2 Duo, but no luck: only 50% utilization across both cores, and performance similar to vanilla numpy 1.6.1.

    Script results:
    True
    Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Product Build 20101110 for 32-bit applications
    Intel cpu_clocks: 8903351421090
    Intel cpu_frequency: 1.66251
    max Intel threads: 2
    using numpy 1.5.1
    (2, 65536) items
    simple loop 2.54858477242

    ____ Test script: _________
    import numpy
    import numpy.fft as fft
    # True when EPD's fastnumpy is active
    print numpy.use_fastnumpy
    import time
    #from scipy.fftpack import fft
    import mkl

    print 'Intel MKL version:', mkl.get_version_string()
    print 'Intel cpu_clocks:', mkl.get_cpu_clocks()
    print 'Intel cpu_frequency:', mkl.get_cpu_frequency()
    #print 'Intel MKL, freeing buffer memory:', mkl.thread_free_buffers()

    print 'max Intel threads:', mkl.get_max_threads()
    mkl.set_num_threads(2)

    N = 2**20

    print 'using numpy', numpy.__version__
    a = numpy.random.rand(2, N)
    print a.shape, 'items'

    # Time an empty loop so its overhead can be subtracted below
    t0 = time.clock()
    for i in range(10):
        continue
    base = time.clock()-t0

    # Time 10 FFTs of length N along axis 1
    fftn = fft.fftn
    t0 = time.clock()
    for i in range(10):
        r = fftn(a, (N,), (1,))
    print 'simple loop', time.clock()-t0-base

    • dpinte says:

      Testing your script on my MacBook Pro:

      dpinte:tmp dpinte$ python test_fastnumpy.py
      True
      Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Product Build 20101110 for 32-bit applications
      Intel cpu_clocks: 10504847091303
      Intel cpu_frequency: 2.66015408791
      max Intel threads: 2
      using numpy 1.5.1
      (2, 1048576) items
      simple loop 1.95484

      Decreasing the number of threads to 1 makes it run much faster:

      dpinte:tmp dpinte$ python test_fastnumpy.py
      True
      Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Product Build 20101110 for 32-bit applications
      Intel cpu_clocks: 11993346847969
      Intel cpu_frequency: 2.66027246734
      max Intel threads: 1
      using numpy 1.5.1
      (2, 1048576) items
      simple loop 1.218312

      And on my Windows VM, I get good results.

      Without fastnumpy:
      Z:\dpinte\tmp>python test_fastnumpy.py
      False
      Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Produ
      01110 for Intel(R) 64 architecture applications
      Intel cpu_clocks: 206824057234718
      Intel cpu_frequency: 2.66014196505
      max Intel threads: 1
      using numpy 1.5.1
      (2L, 1048576L) items
      simple loop 2.48767119843

      With fastnumpy:
      Z:\dpinte\tmp>python test_fastnumpy.py
      True
      Intel MKL version: Intel(R) Math Kernel Library Version 10.3.1 Produ
      01110 for Intel(R) 64 architecture applications
      Intel cpu_clocks: 206936224606227
      Intel cpu_frequency: 2.66014026949
      max Intel threads: 1
      using numpy 1.5.1
      (2L, 1048576L) items
      simple loop 1.16259915716

      That is less than half the time taken by the standard version of numpy.
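
To put the timings quoted in this thread side by side, here is a minimal sketch that computes the speedup ratios. The numbers are copied from the "simple loop" lines above; the dictionary keys and the `speedup` helper are hypothetical names chosen for illustration, not part of the original test script.

```python
# Timings in seconds, copied from the "simple loop" lines quoted above.
# The key names are illustrative labels, not taken from the original script.
timings = {
    "mac_2_threads": 1.95484,
    "mac_1_thread": 1.218312,
    "win_numpy": 2.48767119843,
    "win_fastnumpy": 1.16259915716,
}


def speedup(baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`."""
    return baseline / candidate


# Ratio > 1 means the second configuration finished sooner.
thread_ratio = speedup(timings["mac_2_threads"], timings["mac_1_thread"])
fastnumpy_ratio = speedup(timings["win_numpy"], timings["win_fastnumpy"])

print("1 thread vs 2 threads (Mac): %.2fx" % thread_ratio)
print("fastnumpy vs plain numpy (Windows VM): %.2fx" % fastnumpy_ratio)
```

On these figures, one thread comes out roughly 1.6 times faster than two on the Mac, and fastnumpy roughly 2.1 times faster than plain numpy on the Windows VM. Each figure comes from a single run, so the ratios are indicative only.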
