Spot the Difference — It’s NumPy!

My first brush with NumPy happened over writing a block of code to make a plot using pylab. ⇣


pylab is part of matplotlib (in matplotlib.pylab) and tries to give you a MatLab like environment. matplotlib has a number of dependencies, among them numpy which it imports under the common alias np. scipy is not a dependency of matplotlib.


I had a tuple (of lows and highs of temperature) of lengh 2 with 31 entries in each (the number of days in the month of July), parsed from this text file:

Boston July Temperatures
-------------------------
Day High Low
------------
1 91 70
2 84 69
3 86 68
4 84 68
5 83 70
6 80 68
7 86 73
8 89 71
9 84 67
10 83 65
11 80 66
12 86 63
13 90 69
14 91 72
15 91 72
16 88 72
17 97 76
18 89 70
19 74 66
20 71 64
21 74 61
22 84 61
23 86 66
24 91 68
25 83 65
26 84 66
27 79 64
28 72 63
29 73 64
30 81 63
31 73 63
view raw julyTemps.txt hosted with ❤ by GitHub

Given below, are 2 sets of code that do the same thing; one without NumPy and the other with NumPy. They output the following graph using PyLab:

differenceTemp

Code without NumPy

import pylab
def loadfile():
inFile = open('julyTemps.txt', 'r')
high =[]; low = []
for line in inFile:
fields = line.split()
if len(fields) < 3 or not fields[0].isdigit():
pass
else:
high.append(int(fields[1]))
low.append(int(fields[2]))
return low, high
def producePlot(lowTemps, highTemps):
diffTemps = [highTemps[i] - lowTemps[i] for i in range(len(lowTemps))]
pylab.title('Day by Day Ranges in Temperature in Boston in July 2012')
pylab.xlabel('Days')
pylab.ylabel('Temperature Ranges')
return pylab.plot(range(1,32),diffTemps)
producePlot(loadfile()[1], loadfile()[0])
view raw withoutNumPy.py hosted with ❤ by GitHub

Code with NumPy
import pylab
import numpy as np
def loadFile():
inFile = open('julyTemps.txt')
high = [];vlow = []
for line in inFile:
fields = line.split()
if len(fields) != 3 or 'Boston' == fields[0] or 'Day' == fields[0]:
continue
else:
high.append(int(fields[1]))
low.append(int(fields[2]))
return (low, high)
def producePlot(lowTemps, highTemps):
diffTemps = list(np.array(highTemps) - np.array(lowTemps))
pylab.plot(range(1,32), diffTemps)
pylab.title('Day by Day Ranges in Temperature in Boston in July 2012')
pylab.xlabel('Days')
pylab.ylabel('Temperature Ranges')
pylab.show()
(low, high) = loadFile()
producePlot(low, high)
view raw withNumPy.py hosted with ❤ by GitHub

The difference in code lies in how the variable diffTemps is calculated.

diffTemps = list(np.array(highTemps) - np.array(lowTemps))

seems more readable than

diffTemps = [highTemps[i] - lowTemps[i] for i in range(len(lowTemps))]

Notice how straight forward it is with NumPy. At the core of the NumPy package, is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. element-by-element operations are the “default mode” when an ndarray is involved, but the element-by-element operation is speedily executed by pre-compiled C code.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s