London Financial Python User Group meets on February 3

January 20, 2010

The 3rd meeting of the LFPUG is to be held on February 3 at 7PM.

Topics :

  • Improving NumPy performance with the Intel MKL – Didrik Pinte
  • Python to Excel bridges :
    • “PyXLL, a user friendly Python-Excel bridge” – Tony Roberts
    • Discussion on connecting Python and Excel (xlrd/xlwt, pyinex, win32com, pyxll, …)
  • Speeding up Python code using Cython – Didrik Pinte

Location: The event will be hosted by KBC Financial Products, who kindly offered us a meeting room. For security reasons, please RSVP to dpinte@enthought.com so that you can be confirmed on the attendee list. Address of the day : 111 Old Broad Street, EC2N 1FP (just opposite Tower 42)

The details are available on the Python.org wiki page.


NumPy performance improvement with the MKL

January 15, 2010

After the release of EPD 6.0, which now links numpy against the Intel MKL library (10.2), I wanted to get some insight into the performance impact of using the MKL.

What impact does the MKL have on numpy performance ?

I have put together a rough, basic benchmark comparing EPD 5.1 with EPD 6.0. The former uses numpy 1.3 with BLAS (ATLAS) and the latter numpy 1.4 with the MKL. I am using a Thinkpad T60 with an Intel dual-core 2 GHz CPU running 32-bit Windows.

Note : the benchmarking methodology is really poor and could be made much more realistic, but it gives a first insight.

Contrary to what I said at the last LFPUG meeting on Wednesday, you can control the maximum number of threads used by the system with the OMP_NUM_THREADS environment variable. I have updated the benchmark script to show its value when running it.
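
As a rough illustration of that last point, a script can report the variable like this. The helper name describe_thread_limit is made up for this sketch and is not taken from the benchmark script. My understanding is that the value is read when the threading runtime starts, so it is safest to set it in the shell before launching Python.

```python
import os


def describe_thread_limit(environ=os.environ):
    """Return a one-line description of the OMP_NUM_THREADS setting."""
    # Hypothetical helper: only reads the variable, it does not change it.
    value = environ.get("OMP_NUM_THREADS")
    if value is None:
        return "OMP_NUM_THREADS is not set"
    return "OMP_NUM_THREADS = %s" % value


print(describe_thread_limit())
```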

Here are some results :

  1. Testing linear algebra functions

I took some of the most commonly used functions and simply compared the CPU time using the IPython timeit command.

Example 1 : eigenvalues

import numpy
from numpy.random import random  # used by every example in this post

def test_eigenvalue():
    i = 500
    data = random((i, i))
    result = numpy.linalg.eig(data)

The results are interesting : 752ms for the MKL version versus 3376ms for the ATLAS version. That is 4.5x faster. Testing the very same code on Matlab 7.4 (R2007a) gives a timing of 790ms.

Example 2 : singular value decomposition

def test_svd():
    i = 1000
    data = random((i, i))
    # full decomposition first, then the reduced form
    result = numpy.linalg.svd(data)
    result = numpy.linalg.svd(data, full_matrices=False)

Results are 4608ms with the MKL versus 15990ms without. This is nearly 3.5x faster.

Example 3 : matrix inversion

def test_inv():
    i = 1000
    data = random((i, i))
    result = numpy.linalg.inv(data)

Results are 418ms with the MKL versus 1457ms without. This is 3.5x faster.

Example 4 : det()

def test_det():
    i = 1000
    data = random((i, i))
    result = numpy.linalg.det(data)

Results are 186ms with the MKL versus 400ms without. This is 2x faster.

Example 5 : dot()

def test_dot():
    i = 1000
    a = random((i, i))
    b = numpy.linalg.inv(a)
    # a . inv(a) should be the identity, so this leaves the numerical residual
    result = numpy.dot(a, b) - numpy.eye(i)

Results are 666ms with the MKL versus 2444ms without. This is 3.5x faster.

Conclusion :

Linear algebra functions show a clear performance improvement. I am happy to collect more information on this if you have some home-made benchmarks. If enough information comes in, we should consider publishing the results as an official benchmark somewhere.

Function          Without MKL   With MKL   Speed up
test_eigenvalue   3376ms        752ms      4.5x
test_svd          15990ms       4608ms     3.5x
test_inv          1457ms        418ms      3.5x
test_det          400ms         186ms      2x
test_dot          2444ms        666ms      3.5x
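
The speed-up column is simply the time without the MKL divided by the time with it, rounded. A small sketch of that arithmetic, using the timings from this post (the speed_up helper is only illustrative), prints the unrounded ratios:

```python
# Timings in milliseconds, copied from the table: (without MKL, with MKL).
timings_ms = {
    "test_eigenvalue": (3376, 752),
    "test_svd": (15990, 4608),
    "test_inv": (1457, 418),
    "test_det": (400, 186),
    "test_dot": (2444, 666),
}


def speed_up(without_mkl, with_mkl):
    """Ratio of the two timings; values above 1 mean the MKL build is faster."""
    return without_mkl / float(with_mkl)


for name in sorted(timings_ms):
    without_mkl, with_mkl = timings_ms[name]
    print("%-16s %.2fx" % (name, speed_up(without_mkl, with_mkl)))
```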

For those of you wanting to test your environment, feel free to use the script here below.
Read the rest of this entry »
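
The full script sits behind the link above. For readers who just want a starting point, here is a minimal sketch of a timing harness built on the standard timeit module. It is not the original script: best_time_ms is a name invented here, only two of the five tests are included, and it measures wall-clock time rather than CPU time.

```python
import timeit

import numpy
from numpy.random import random


def test_inv(i=1000):
    data = random((i, i))
    return numpy.linalg.inv(data)


def test_det(i=1000):
    data = random((i, i))
    return numpy.linalg.det(data)


def best_time_ms(func, repeat=3):
    """Best wall-clock time of a single call to func, in milliseconds."""
    return min(timeit.repeat(func, number=1, repeat=repeat)) * 1000.0


if __name__ == "__main__":
    for func in (test_inv, test_det):
        print("%s: %.0f ms" % (func.__name__, best_time_ms(func)))
```

Taking the minimum over a few repeats reduces the influence of whatever else the machine is doing at the time.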


Reading SRTM hgt files using numpy

January 15, 2010

Reading about the SRTM datasets, I thought of using numpy to open and parse them efficiently.

The hgt data format is well described here : http://www2.jpl.nasa.gov/srtm/faq.html.

Files can be read directly with numpy like this :

import numpy

# Read an International 3-arc-second file: 1201 rows of 1201 samples each.
# SRTM samples are big-endian signed 16-bit integers, hence '>i2'.
srtm_dtype = numpy.dtype([('data', '>i2', 1201)])
image = numpy.fromfile('N00E072.hgt', dtype=srtm_dtype)

If the file were bigger, you could also use a memmap, which lets you work with huge files without loading them entirely into memory (samples are big-endian signed 16-bit integers, and read-only mode is enough here) :
image = numpy.memmap('N00E072.hgt', dtype='>i2', mode='r', shape=(1201, 1201))

And there you have it … you can easily adapt that to United States 1-arc-second files by updating the shape in the dtype (3601 instead of 1201).
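
To avoid hard-coding the shape, the tile size can be derived from the file size, since each sample takes two bytes. The read_hgt helper below is a sketch under that assumption, not part of any library. It assumes a square tile with no header.

```python
import os

import numpy


def read_hgt(path):
    """Load an SRTM .hgt tile as a square 2-D array of elevations."""
    n_samples = os.path.getsize(path) // 2   # two bytes per sample
    side = int(round(n_samples ** 0.5))      # 1201 or 3601 for real tiles
    if side * side != n_samples:
        raise ValueError("%s is not a square grid of 16-bit samples" % path)
    # '>i2' = big-endian signed 16-bit integers
    return numpy.fromfile(path, dtype='>i2').reshape((side, side))
```

For example, read_hgt('N00E072.hgt') should return a (1201, 1201) array for a 3-arc-second tile.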


using gdal/ogr with epd

August 12, 2009

I had to add the gdal and ogr libraries to a fresh EPD 4.3.0 Windows install. Here is how I did it and the problems I encountered :

  1. Install the python module : easy_install gdal
  2. Add the dll libraries : Download and uncompress to c:\gdalwin32-1.6 from http://download.osgeo.org/gdal/win32/1.6/
  3. Update your path

This is where you have to do some manual updates. I have not found the exact culprit, but there is a problem with the OpenSSL dynamic libraries (libeay32.dll and ssleay32.dll). There is one version in c:\Windows\System32, one in c:\Python25 and one in the c:\gdalwin32-1.6\bin directory.

If you do not update the path, you get the following error when loading ogr :

The ordinal 3873 could not be located in the dynamic library LIBEAY32.DLL

and the library cannot be loaded.

Just put your c:\gdalwin32-1.6\bin\ directory in first place in your path :

PATH=c:\gdalwin32-1.6\bin\;%PATH%
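
If you would rather not change the system-wide setting, the same reordering can be sketched from Python before ogr is imported. This is an assumption on my side rather than something I tested with this install, and prepend_to_path is a hypothetical helper. On recent Python versions (3.8 and later) Windows DLL lookup no longer follows PATH for extension modules, and os.add_dll_directory is the documented route instead.

```python
import os


def prepend_to_path(directory, environ=os.environ):
    """Move (or add) directory to the front of the PATH variable."""
    parts = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    # Drop any existing occurrence so the directory appears only once.
    parts = [directory] + [p for p in parts if p != directory]
    environ["PATH"] = os.pathsep.join(parts)
    return environ["PATH"]


# Usage, before importing ogr:
# prepend_to_path(r"c:\gdalwin32-1.6\bin")
```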

Hope this will help.


Debugging OpenLayers scripts

November 14, 2008

While developing applications using the great OpenLayers library, I often face the problem of JavaScript debugging. I found some very nice applications to help me do it :


using OpenLayers filters with Cluster strategy

November 7, 2008

OpenLayers filters allow you to apply complex styling to your layers. I wondered how I could use the Cluster strategy with a specific styling that applies when a cluster contains only one point.

Read the rest of this entry »


pyYaml dump options

October 31, 2008

Using pyYaml to create input files for a small db migration project, I did not find any place describing the exhaustive list of keyword arguments that the yaml.dump method accepts. Here is a first guess :

Read the rest of this entry »
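
The full list is behind the link above. As a small taste, here is a sketch using a handful of yaml.dump keyword arguments that I am confident exist in PyYAML. The record contents and the dump_block_style name are made up for the example.

```python
import yaml  # PyYAML

# Hypothetical input record for a migration script.
record = {"table": "customers", "columns": ["id", "name"], "batch": 500}


def dump_block_style(data):
    """Serialize data as block-style YAML with an explicit document start."""
    return yaml.dump(
        data,
        default_flow_style=False,  # one item per line instead of {a: 1, b: 2}
        indent=2,
        width=80,
        allow_unicode=True,
        explicit_start=True,       # emit the leading '---' marker
    )


print(dump_block_style(record))
```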


hr_timesheet : impossible to edit a new entry

September 22, 2008

I am currently implementing an OpenERP system for my company and have the following problem with the timesheet module. I can add new entries to my timesheet but I cannot edit them : all the fields of the new entry are greyed out. It seems to be a problem of access rights. Looking at the “Access controls” tab, I tried to find out which models I had to enable, but that is where the problem lies : there is no information on which model should be given “write” permission.

If someone has an idea on the configuration that must be set, I would be very interested ;-)

I’ve posted on the OpenERP forum : http://www.openerp.com/forum/topic7672.html

Update : running a dev version was the origin of the problem. The bug has been corrected in the trunk. Thanks to the TinyERP team for the quick support.


mdb-export and decimal values

September 19, 2008

mdb-export is a very nice tool for working with MS Access databases under Linux. For one of my projects, I have to automate the import of an mdb file into a MySQL database.

You need to pay attention to the data types in the original Access file, especially for decimal values.

In our case, the decimal values were truncated as if they were integers. They were defined as “réel simple” in French, which is “Single” in the English version. Changing the type to “réel double” (“Double”) solved the problem !
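
A quick way to spot the problem in an automated import is to check whether a column that should hold decimals only ever contains whole numbers. The sketch below assumes mdb-export writes CSV with a header row to standard output and that the output is UTF-8. Both function names are invented for this example.

```python
import csv
import io
import subprocess
from decimal import Decimal


def export_table(mdb_path, table):
    """Run mdb-export (from mdbtools) and return its CSV output as text."""
    output = subprocess.check_output(["mdb-export", mdb_path, table])
    return output.decode("utf-8")  # assumption: UTF-8 output


def looks_truncated(csv_text, column):
    """True when every non-empty value in column is a whole number."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [Decimal(row[column]) for row in rows if row[column] != ""]
    return bool(values) and all(v == v.to_integral_value() for v in values)
```

A True result is only a hint, since a column can legitimately contain nothing but whole numbers.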


Extract documents from Maildir files – munpack

September 8, 2008

Just to mention a very nice tool : munpack unpacks messages in MIME or split-uuencode format (from the munpack man page).

For example, it allowed me to extract files from a mail that could not be read by an egroupware instance.

