My first brush with NumPy came while writing a block of code to make a plot using pylab. pylab is part of matplotlib (in matplotlib.pylab) and tries to give you a MATLAB-like environment. matplotlib has a number of dependencies, among them numpy, which it imports under the common alias np. scipy is not a dependency of matplotlib.
I had a tuple of two lists (the low and the high temperatures), each with 31 entries (one per day of July), parsed from this text file:
    Boston July Temperatures
    -------------------------
    Day High Low
    ------------
    1 91 70
    2 84 69
    3 86 68
    4 84 68
    5 83 70
    6 80 68
    7 86 73
    8 89 71
    9 84 67
    10 83 65
    11 80 66
    12 86 63
    13 90 69
    14 91 72
    15 91 72
    16 88 72
    17 97 76
    18 89 70
    19 74 66
    20 71 64
    21 74 61
    22 84 61
    23 86 66
    24 91 68
    25 83 65
    26 84 66
    27 79 64
    28 72 63
    29 73 64
    30 81 63
    31 73 63
Below are two versions of the code that do the same thing: one without NumPy and one with it. Both use pylab to plot the day-by-day temperature range.
Code without NumPy
    import pylab

    def loadfile():
        """Read julyTemps.txt and return (low, high) lists of temperatures."""
        high = []; low = []
        with open('julyTemps.txt', 'r') as inFile:
            for line in inFile:
                fields = line.split()
                # Skip the title, header, and separator lines.
                if len(fields) < 3 or not fields[0].isdigit():
                    continue
                high.append(int(fields[1]))
                low.append(int(fields[2]))
        return low, high

    def producePlot(lowTemps, highTemps):
        diffTemps = [highTemps[i] - lowTemps[i] for i in range(len(lowTemps))]
        pylab.plot(range(1,32), diffTemps)
        pylab.title('Day by Day Ranges in Temperature in Boston in July 2012')
        pylab.xlabel('Days')
        pylab.ylabel('Temperature Ranges')
        pylab.show()

    # loadfile returns (low, high), in that order; read the file only once.
    low, high = loadfile()
    producePlot(low, high)
Code with NumPy
    import pylab
    import numpy as np

    def loadFile():
        """Read julyTemps.txt and return (low, high) lists of temperatures."""
        high = []; low = []
        with open('julyTemps.txt') as inFile:
            for line in inFile:
                fields = line.split()
                # Skip the separator, title, and column-header lines.
                if len(fields) != 3 or 'Boston' == fields[0] or 'Day' == fields[0]:
                    continue
                high.append(int(fields[1]))
                low.append(int(fields[2]))
        return (low, high)

    def producePlot(lowTemps, highTemps):
        # Subtracting two arrays works element by element.
        diffTemps = list(np.array(highTemps) - np.array(lowTemps))
        pylab.plot(range(1,32), diffTemps)
        pylab.title('Day by Day Ranges in Temperature in Boston in July 2012')
        pylab.xlabel('Days')
        pylab.ylabel('Temperature Ranges')
        pylab.show()

    (low, high) = loadFile()
    producePlot(low, high)
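As an aside, the hand-written parsing loop could probably be replaced by NumPy's own loader, np.loadtxt. The sketch below is only an illustration, not part of either version above: it inlines the first few lines of the file in a StringIO object so that it runs on its own, and it assumes that exactly four header lines precede the data rows. The names sample and diff_temps are made up for this example.

```python
import io
import numpy as np

# First lines of the temperature file, inlined so the sketch is self-contained.
sample = io.StringIO(
    "Boston July Temperatures\n"
    "-------------------------\n"
    "Day High Low\n"
    "------------\n"
    "1 91 70\n"
    "2 84 69\n"
    "3 86 68\n"
)

# Assumption: exactly four header lines come before the data rows.
# unpack=True returns one array per column instead of one array per row.
days, high, low = np.loadtxt(sample, dtype=int, skiprows=4, unpack=True)

# Element-by-element subtraction of the two columns.
diff_temps = high - low
print(diff_temps)
```

With the real file, the StringIO object would be swapped for the file name, provided the header assumption holds.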
The difference between the two versions lies in how the variable diffTemps is calculated.
diffTemps = list(np.array(highTemps) - np.array(lowTemps))
seems more readable than
diffTemps = [highTemps[i] - lowTemps[i] for i in range(len(lowTemps))]
Notice how straightforward it is with NumPy. At the core of the NumPy package is the ndarray object, which encapsulates n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for performance. Element-by-element operations are the “default mode” when an ndarray is involved, and they are executed quickly by pre-compiled C code.
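To make the element-by-element behaviour concrete, here is a minimal sketch using the highs and lows of the first five days from the table above. It shows that plain Python lists do not support subtraction, while arrays do, and that the array result matches the list comprehension. The variable names are chosen for this example only.

```python
import numpy as np

# Highs and lows for the first five days of the temperature file.
high = [91, 84, 86, 84, 83]
low = [70, 69, 68, 68, 70]

# Plain lists do not define subtraction, so this raises TypeError.
try:
    high - low
    list_subtraction_works = True
except TypeError:
    list_subtraction_works = False

# Converting to ndarrays makes "-" operate element by element.
high_arr = np.array(high)
low_arr = np.array(low)
diff_arr = high_arr - low_arr

# The same result, computed the long way with a list comprehension.
diff_list = [high[i] - low[i] for i in range(len(low))]

print(list_subtraction_works)          # False
print(diff_arr.tolist() == diff_list)  # True
print(high_arr.shape)                  # (5,)
print(high_arr.dtype)                  # one integer type shared by every element
```

The dtype line reflects the "homogeneous data types" point: every element of the array shares one type, and the exact integer width depends on the platform.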