# Optimization with numpy and numexpr

Using numpy and numexpr to optimize numerical expression evaluation in Python.


## Optimization with numpy and numexpr - Measuring Performance

Nearly everyone who programs in Python has used numpy at some point. Numpy is an essential data science tool, part of the 'scientific stack', which also includes pandas, matplotlib, and several other powerful libraries commonly used in scientific computing.

Numpy provides an array type and (highly optimized) functions for array operations. Many of us know that optimized execution (i.e., speed) is one of the advantages of numpy, but have never paid much attention to how much performance we can actually gain from numpy's types, functions, and related optimization tricks. We'll use execution timing to take a more detailed look at the potential performance improvements available with numpy. We'll also look at numexpr, a numerical array expression evaluator, which can be used on its own or in combination with numpy to achieve some dramatic performance improvements.

First, let's do the necessary imports. Notice that we only import the functions we need from the math library. See this thread for why you should avoid using 'import *'.

In :
```from math import log, cos
import numpy as np
```

We will first do some common numeric computation without numpy. This will give us a performance baseline against which to measure various improvements. You have probably written something like the following code many times, performing some key computation step inside a for loop. In this case, we are using a list comprehension, which gives the same result as a for loop but provides a more concise and readable syntax.

We'll try doing a basic numerical computation for each of a series of numbers, using two of Python's built-in types - list and range.

In :
```loops = 25000000

a, b = range(1, loops), [i for i in range(1, loops)]
print(f'Type of a is {type(a)}; type of b is {type(b)}')

def f(x):
    return 3 * log(x) + cos(x) ** 2

%timeit r = [f(x) for x in a]
%timeit r = [f(x) for x in b]
```

```Type of a is <class 'range'>; type of b is <class 'list'>
1 loop, best of 3: 11.5 s per loop
1 loop, best of 3: 11.1 s per loop
```

The results are similar for each type - it takes ~11.5 s to do the calculation for all the items in the list or range. Intuitively, this seems pretty slow, but we would like to quantify exactly how much improvement is reasonably possible. Now let's do the same calculation using numpy.

In :
```a = np.arange(1, loops)
print(f'The type of a is {type(a)}')

%timeit r = 3 * np.log(a) + np.cos(a) ** 2
```

```The type of a is <class 'numpy.ndarray'>
1 loop, best of 3: 1.05 s per loop
```

You can see that the execution time is much improved (by an order of magnitude) when using the numpy array type and the arange function (which produces the same integer sequence as range(), but returns an array).
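As a quick sanity check (a small illustrative snippet, not from the notebook above), np.arange with integer arguments yields exactly the values range() would, but as an ndarray that supports vectorized arithmetic directly:

```python
import numpy as np

# np.arange with integer arguments produces the same values as range()
assert list(np.arange(1, 6)) == list(range(1, 6))

# but the array supports elementwise (vectorized) arithmetic directly
squares = np.arange(1, 6) ** 2
print(squares)  # [ 1  4  9 16 25]
```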

Notice also that there is another advantage to using numpy besides just speed - our code is cleaner. Because numpy functions use the array type, a single call to a numpy function performs work that would require a loop or other structure if using a built-in type. In the example above, we only save a couple of lines, since f() is very simple. Let's look at an example where the code reduction (and the speedup) is more significant.

In :
```list_1, list_2 = [np.random.random() for i in range(10000)], [np.random.random() for i in range(10000)]
array_1, array_2 = np.asarray(list_1), np.asarray(list_2)

def dot(a, b):
    # pure-Python dot product: sum of products of corresponding elements
    total = 0
    for j, k in zip(a, b):
        total += j * k
    return total

%timeit r = dot(list_1, list_2)
print('\n')
%timeit r = np.dot(array_1, array_2)
```

```1 loop, best of 3: 5 s per loop

The slowest run took 57.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.84 µs per loop
```

Here we use numpy's dot function to calculate the dot product of two vectors. In this case, the performance improvement is even more dramatic: even the slowest numpy run (~160 µs, per the caching note above) is orders of magnitude faster than the pure-Python version.

We also see a real advantage here in terms of making the code more compact and readable. Defining our own dot() function requires several lines, whereas we can do the same calculation in one line with np.dot(). When you have to perform numeric computation of this sort, always ask yourself if there is a numpy function that will take care of it for you. If you find yourself writing a lot of code to perform a numerical calculation, particularly involving nested loops, then you are almost certainly doing it the hard way. Using numpy's functions can often save dozens of lines of code.
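As another illustration of this rule of thumb (a hypothetical example, not from the notebook above): a loop accumulating the sum of squared differences between two sequences collapses into a single vectorized numpy expression.

```python
import numpy as np

a = np.random.random(1000)
b = np.random.random(1000)

# loop version: sum of squared differences between corresponding elements
def ssd_loop(a, b):
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
    return total

# numpy version: one line, and much faster on large arrays
ssd_np = np.sum((a - b) ** 2)

assert np.isclose(ssd_loop(a, b), ssd_np)
```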

Making full use of numpy has numerous advantages:

• saving development time
• making your code execute faster
• making your code more compact and readable

### Faster numerical expression evaluation with numexpr

Another tool for speeding up execution of numeric calculations is numexpr, a fast numerical expression evaluator. The full details of how numexpr achieves these improvements are beyond the scope of this article, but here's a quick overview: expressions are compiled to bytecode and executed on a virtual machine written in C, which processes blocks of elements at a time (making use of vector registers) for efficient execution.

In :
```import numexpr as ne

ne.set_num_threads(1)
f = '3 * log(a) + cos(a) ** 2'
%timeit r = ne.evaluate(f)
```

```1 loop, best of 3: 502 ms per loop
```

We see that numexpr provides quite a performance improvement - evaluating the expression is roughly twice as fast (~0.5 s vs ~1 s) with numexpr as with numpy. Numexpr also makes use of threading to further optimize execution. Let's use four threads instead of just one this time.

In :
```ne.set_num_threads(4)
%timeit r = ne.evaluate(f)
```

```1 loop, best of 3: 229 ms per loop
```

We have cut execution time in half yet again, down to around 230 ms. From the original ~11.5 s execution time, we have cut the time down by roughly a factor of 50 using numexpr with threading. Numpy and numexpr can also be combined to achieve further speed improvements. For more information on combining numpy and numexpr, see Numpy micro-optimization and numexpr.
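One simple way to combine the two is sketched below (the out= keyword is part of numexpr's evaluate API): preallocate the result array with numpy and have numexpr write into it, avoiding a fresh allocation on every evaluation.

```python
import numpy as np
import numexpr as ne

a = np.arange(1, 1000, dtype=np.float64)
r = np.empty_like(a)  # preallocated output buffer

# numexpr writes the result into r instead of allocating a new array
ne.evaluate('3 * log(a) + cos(a) ** 2', out=r)

# the result matches the equivalent pure-numpy computation
assert np.allclose(r, 3 * np.log(a) + np.cos(a) ** 2)
```

Reusing an output buffer like this matters most when the same expression is evaluated repeatedly over large arrays, such as inside a simulation loop.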

Notice that timeit dynamically determines the number of test runs based on execution time. For more info, see the timeit docs.
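For reproducible measurements in a plain script, outside IPython's %timeit magic, the standard library's timeit module exposes the same machinery, letting you set the run counts explicitly:

```python
import timeit

# run the statement 1000 times, repeat the whole measurement 5 times,
# and report the best (least noisy) total
times = timeit.repeat('sum(range(1000))', number=1000, repeat=5)
print(f'best: {min(times):.4f} s for 1000 runs')
```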

We'll go into more detail on these topics in another article.

Finally, a note on the question of Python's purported slowness.

### How slow (or not) is Python?

Python has a reputation for being 'slow'.

In a shallow sense this view is correct: if we consider only execution time, speed is not Python's strong suit in comparison to C/C++ and other languages. But when development time is considered, speed becomes a big advantage for Python. I've been working on a project involving an old, mostly C++ codebase that is being converted to Python to bring it up to date, while we also make small short-term improvements to the existing codebase. On several issues, I've spent hours or days doing something in C++ that could be done in minutes in Python.

I'd like to help make the point that Python doesn't have to be slow. Familiarizing yourself with some basic techniques can go a long way towards improving the performance of your Python applications. This article introduces a few ways to optimize numerical computation. As we have seen, some simple techniques can provide large reductions in execution time (an order of magnitude or more).

One common technique for improving application performance is to combine Python with C/C++. There are many, many tools and techniques for building hybrid applications, so we'll cover those in a separate article.