The second example I wrote while testing how easy it is to use a C library from Python using Cython is the Gnu Scientific Library, gsl. The goal here is to compare how easy it is to do it using Cython or weave.
The first example is a bessel function (from gsl/gsl_sf_bessel.h):
double gsl_sf_bessel_J0(const double x);
The Cython interface is (gsl_test.pyx):
cdef extern from "gsl/gsl_sf_bessel.h": double gsl_sf_bessel_J0(double x) def gsl_sf_bessel_jo(double x): return gsl_sf_bessel_J0 (x)
The weave interface is defined in a pure python file. There is no need to have a full compilation round (setup.py ,etc) :
from distutils.core import setup from distutils.extension import Extension from Cython.Distutils import build_ext import numpy import sys if sys.platform == "linux2" : include_gsl_dir = "/usr/local/include/" lib_gsl_dir = "/usr/local/lib/" elif sys.platform == "win32": include_gsl_dir = r"c:\msys\1.0\local\include" lib_gsl_dir = r"c:\msys\1.0\local\lib" ext = Extension("gsl", ["gsl.pyx"], include_dirs=[numpy.get_include(), include_gsl_dir], library_dirs=[lib_gsl_dir], libraries=["gsl"] ) setup(ext_modules=[ext], cmdclass = {'build_ext': build_ext})
The next step is to build you Cython module :
python setup.py build_ext --inplace
There is nothing to do with the weave extension that is built on the fly.
Finally all what you need is a function to compare the results from the two methods :
""" Test the GSL cython interface """ from nose.tools import assert_almost_equal import numpy import gsl_test import weave_test def test_cython_gsl_bessel(): x = 5.0 assert_almost_equal(gsl_test.gsl_sf_bessel_jo(x), -1.775967713143382920e-01, 15) def test_weave_gsl_bessel(): x = 5.0 assert_almost_equal(weave_test.weave_gsl_bessel(x), -1.775967713143382920e-01, 15)
The two tests passes ok on my Windows system.
Going to the next step with more complex calls where we use numpy arrays with some stats function :
# importing the needed function definition cdef extern from "gsl/gsl_sort.h": void gsl_sort (double * data, size_t stride, size_t n) cdef extern from "gsl/gsl_statistics_double.h": double gsl_stats_median_from_sorted_data (double sorted_data[], size_t stride, size_t n) double gsl_stats_quantile_from_sorted_data (double sorted_data[], size_t stride, size_t n, double f) def gsl_stat_demo(np.ndarray[np.float_t, ndim=1] inreal): """ Computing some statistics using GSL on a one dimensional numpy array """ cdef double median, upperq, lowerq gsl_sort (<double *>inreal.data, 1, 5) median = gsl_stats_median_from_sorted_data (<double *>inreal.data, 1, 5) upperq = gsl_stats_quantile_from_sorted_data (<double *>inreal.data, 1, 5, 0.75); lowerq = gsl_stats_quantile_from_sorted_data (<double *>inreal.data, 1, 5, 0.25); return (median, upperq, lowerq)
The same code in using weave is :
def weave_gsl_stat(ary): gsl_stat_code = """ double median, upperq, lowerq; py::tuple results(3); gsl_sort (ary, 1, 5); median = gsl_stats_median_from_sorted_data (ary, 1, 5); upperq = gsl_stats_quantile_from_sorted_data (ary, 1, 5, 0.75); lowerq = gsl_stats_quantile_from_sorted_data (ary, 1, 5, 0.25); results[0] = median; results[1] = upperq; results[2] = lowerq; return_val = results; """ return inline(gsl_stat_code, ['ary'], headers=['<gsl/gsl_statistics.h>', '<gsl/gsl_sort.h>'], libraries=['gsl'], include_dirs=[include_gsl_dir], library_dirs=[lib_gsl_dir], compiler='gcc')
Writing two tests to check that we have the right results will confirm that the two version run ok :
def test_cython_gsl_stats(): a = numpy.array([17.2, 18.1, 16.5, 18.3, 12.6]) (median, upperq, lowerq) = gsl.gsl_stat_demo(a) assert_almost_equal(median, 17.2, 1) assert_almost_equal(upperq, 18.1, 1) assert_almost_equal(lowerq, 16.5, 1) def test_weave_gsl_stats(): a = numpy.array([17.2, 18.1, 16.5, 18.3, 12.6]) (median, upperq, lowerq) = weave_test.weave_gsl_stat(a) assert_almost_equal(median, 17.2, 1) assert_almost_equal(upperq, 18.1, 1) assert_almost_equal(lowerq, 16.5, 1)
In conclusion, even if interfacing gsl with Python using Cython was quiet easy, it seems that using weave removes one layer of “complexity” (no setup.py file). This is convenient but requires that the environment where you run the file has a compiler, the needed headers and ibraries.
Hi dpinte !
Thank you for your great article/tutorial about Cython. I use Python/Numpy for scientific computing since a year now and it is definitely my language of choice.
– It is a friendly language
– Development is very fast
– Execution time is “fast” (if Numpy is used correctly).
However, sometimes it is good to get C/C++ speed in your program. I tried interfacing C/C++ with SWIG, Weave and now Cython.
– SWIG is in my opinion too complicated to use as you have to learn a new language and because it had one layer of complexity.
– Weave is nice, but I don’t like in-lining C code in a string. Moreover, you said that it remove on layer of “complexity”. I don’t see that as you have to create an “inline” object containing all information you would have put in a “setup.py” file for Cython.
– Finally, Cython is very easy to use has you basically just have to define your variable and use C functions.
To conclude, I wrote a script that compute a distance matrix (O^2) in Python with Numpy arrays and the same script in Cython. It took me 10 minutes to figure it out how Cython works and I gained a speed up of 550 times !!! Amazing 🙂
Kev
Hey kfrancoi,
I should probably have used another word than complexity but anyway the development/deployment cycle with Cython requires more work :
1. having a setup.py file
2. every change requires a rebuilt of the module (python setup.py build_ext –inplace)
3. the Cython interface is strong typed !
4. the deployment of your module is platform specific
Using weave, you only manage pure python file. That’s much easier (but they are other drawbacks 😉 )
Can you comment on a comparison between Weave/Cython vs ctypes? Using ctypes, I can call the bessel function similarly to the way you call your Cython module.
But in ctypes you can not access c++ source files.
You can access a function in C++ if you define extern “C”. In this function you can then call any C++ libraries you want so C++ works with ctypes just fine.
For this example (and my own project) to work, I needed to include both the gsl and gslcblas libraries, to wit:
libraries = [“gsl”, “gslcblas”]
as in the example here:
https://github.com/dagss/euroscipy2010/blob/master/gsl/setup.py
I and my buddies appeared to be viewing the great techniques on the blog and so then I got an awful suspicion I never thanked you for those strategies.
Most of the people had been certainly glad to learn all of them and now have in truth been
taking pleasure in those things. Thanks for truly
being simply accommodating and then for going for certain superior themes most people are really
wanting to learn about. Our honest regret for not saying thanks
to sooner.
[…] Unfortunately, the documentation isn’t great, but there are plenty of examples out there (SageMath have a couple of very clear ones). It should definitely be your first port of call if all you want to do is speed up part of your Python code. It’s not so good if you want to interface with legacy code, or have something more complicated that you need to do in C (e.g. something that’s split into different functions). See also Cython, which seems to be a bit more flexible, but not too much more difficult to use. There’s a nice speed comparison with NumPy, Weave blitz and inline, and MATLAB here, and a Cython/Weave example (with benchmarks) here. There’s also an interesting blog post on getting GSL to work with Weave/Cython. […]
[…] Allows you to write C inline, the compiles […]