How to become a Data Scientist in 6 months

June 15, 2016June 17, 2016 Anirudh Non Technical Data Science, Kaggle, Machine Learning, PyData, Python

Disclaimer: I’m not a data scientist yet. That’s still a work in progress, but I’d recommend this excellent talk given by Tetiana Ivanova to put an enthusiast’s data science journey in perspective.

My First Data Science Hackathon

December 20, 2015December 20, 2015 Anirudh Non Technical Analytics Vidhya, Data Science, Hackathon, Python, R

I participated in https://t.co/alLuY7JjjT
Finished 24th/54. It was my first ever #datascience #hackathon. Determined to get better at this.

— Anirudh (@anirudhjay) December 20, 2015

So after 8 months of playing around with R and Python and blog post after blog post, I found myself finally hacking away at a problem set from the 17th storey of the Hindustan Times building at Connaught Place. I had entered my first ever data science hackathon, conducted by Analytics Vidhya, a pioneer in analytics learning in India. Pizzas and Pepsi were on the house. Like any predictive analysis hackathon, this one accepted unlimited entries till submission time. It ran from 2pm to 4:30pm today: 2.5 hours, of which I wasted 1.5 trying to make my first submission, which hit submission error after submission error until the problem was finally fixed post lunch. That left me 1 hour to try my best. It wasn’t the best performance, but I thought of blogging this experience anyway, as a reminder of the work that awaits me. I want to be the one winning prize money at the end of the day.

🙂

Solutions to Machine Learning Programming Assignments

November 24, 2015July 25, 2016 Anirudh Technical Andrew Ng, Code Snippets, Coding, Machine Learning, Octave, Python, Solutions

This post contains links to a bunch of code that I have written to complete Andrew Ng’s famous machine learning course which includes several interesting machine learning problems that needed to be solved using the Octave / Matlab programming language. I’m not sure I’d ever be programming in Octave after this course, but learning Octave just so that I could complete this course seemed worth the time and effort. I would usually work on the programming assignments on Sundays and spend several hours coding in Octave, telling myself that I would later replicate the exercises in Python.

If you’ve taken this course and found some of the assignments hard to complete, I think it might not hurt to go check online on how a particular function was implemented. If you end up copying the entire code, it’s probably your loss in the long run. But then John Maynard Keynes once said, ‘In the long run we are all dead‘. Yeah, and we wonder why people call Economics the dismal science!

Most people disregard Coursera’s feeble attempt at reining in plagiarism by creating an Honor Code, precisely because this so-called code of conduct can be easily circumvented. I don’t mind posting solutions to a course’s programming assignments because GitHub is full to the brim with such content. Plus, it’s always good to read others’ code even if you implemented a function correctly. It helps you understand the different ways of tackling a given programming problem.

ex1
ex2
ex3
ex4
ex5
ex6
ex7
ex8

Enjoy!

Spot the Difference — It’s NumPy!

October 22, 2015October 25, 2015 Anirudh Technical Code Snippets, Data Visualization, NumPy, Python

My first brush with NumPy happened while writing a block of code to make a plot using pylab.

pylab is part of matplotlib (in matplotlib.pylab) and tries to give you a MATLAB-like environment. matplotlib has a number of dependencies, among them numpy, which it imports under the common alias np. scipy is not a dependency of matplotlib.

I had a tuple of length 2 (the lows and highs of temperature) with 31 entries in each (the number of days in the month of July), parsed from this text file:

	Boston July Temperatures
	-------------------------

	Day High Low
	------------

	1 91 70
	2 84 69
	3 86 68
	4 84 68
	5 83 70
	6 80 68
	7 86 73
	8 89 71
	9 84 67
	10 83 65
	11 80 66
	12 86 63
	13 90 69
	14 91 72
	15 91 72
	16 88 72
	17 97 76
	18 89 70
	19 74 66
	20 71 64
	21 74 61
	22 84 61
	23 86 66
	24 91 68
	25 83 65
	26 84 66
	27 79 64
	28 72 63
	29 73 64
	30 81 63
	31 73 63


Given below are two versions of the code that do the same thing, one without NumPy and the other with NumPy. Both produce the same graph using pylab:

Code without NumPy

	import pylab

	def loadfile():
	    """Parse julyTemps.txt and return the (low, high) temperature lists."""
	    high = []; low = []
	    with open('julyTemps.txt', 'r') as inFile:
	        for line in inFile:
	            fields = line.split()
	            # skip titles, rules and blank lines; data rows start with a day number
	            if len(fields) < 3 or not fields[0].isdigit():
	                continue
	            high.append(int(fields[1]))
	            low.append(int(fields[2]))
	    return low, high

	def producePlot(lowTemps, highTemps):
	    diffTemps = [highTemps[i] - lowTemps[i] for i in range(len(lowTemps))]
	    pylab.title('Day by Day Ranges in Temperature in Boston in July 2012')
	    pylab.xlabel('Days')
	    pylab.ylabel('Temperature Ranges')
	    pylab.plot(range(1,32),diffTemps)
	    pylab.show()

	# loadfile() returns (low, high), the same order producePlot expects
	low, high = loadfile()
	producePlot(low, high)


Code with NumPy

	import pylab
	import numpy as np

	def loadFile():
	    """Parse julyTemps.txt and return the (low, high) temperature lists."""
	    high = []; low = []
	    with open('julyTemps.txt') as inFile:
	        for line in inFile:
	            fields = line.split()
	            # skip blank lines, rules and the two header lines
	            if len(fields) != 3 or 'Boston' == fields[0] or 'Day' == fields[0]:
	                continue
	            else:
	                high.append(int(fields[1]))
	                low.append(int(fields[2]))
	    return (low, high)

	def producePlot(lowTemps, highTemps):
	    diffTemps = list(np.array(highTemps) - np.array(lowTemps))
	    pylab.plot(range(1,32), diffTemps)
	    pylab.title('Day by Day Ranges in Temperature in Boston in July 2012')
	    pylab.xlabel('Days')
	    pylab.ylabel('Temperature Ranges')
	    pylab.show()


	(low, high) = loadFile()
	producePlot(low, high)


The difference in code lies in how the variable diffTemps is calculated.

diffTemps = list(np.array(highTemps) - np.array(lowTemps))

seems more readable than

diffTemps = [highTemps[i] - lowTemps[i] for i in range(len(lowTemps))]

Notice how straightforward it is with NumPy. At the core of the NumPy package is the ndarray object, which encapsulates n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for performance. Element-by-element operations are the “default mode” when an ndarray is involved, and they are executed quickly by pre-compiled C code.
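To see the element-by-element behaviour in isolation, here is a small sketch using only the first three rows of julyTemps.txt. The variable names are mine, and the zip version is just one more way of writing the plain-Python difference, not something from the original scripts.

```python
import numpy as np

# First three days from julyTemps.txt (highs and lows)
high_temps = [91, 84, 86]
low_temps = [70, 69, 68]

# Plain Python: pair the values up with zip instead of indexing
diff_zip = [h - l for h, l in zip(high_temps, low_temps)]

# NumPy: the minus sign applies to the whole arrays, element by element
diff_np = np.array(high_temps) - np.array(low_temps)

print(diff_zip)          # [21, 15, 18]
print(diff_np.tolist())  # [21, 15, 18]
```

Subtracting two plain lists would raise a TypeError, which is why both temperature lists are converted to arrays first.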

MITx 6.00.2x Introduction to Computational Thinking and Data Science (Fall 2015)

October 21, 2015October 21, 2015 Anirudh Non Technical Coding, Data Science, Data Visualization, edX, MIT, MOOC, Python

MIT’s Fall 2015 iteration of 6.00.2x starts today. After an enriching learning experience with 6.00.1x, I have great expectations from this course. As the course website mildly puts it, 6.00.2x is an introduction to using computation to understand real-world phenomena. MIT OpenCourseware (OCW) mirroring the material covered in 6.00.1x and 6.00.2x can be found here.

The course follows this book by John Guttag (who happens to be one of the instructors for this course). However, purchasing the book isn’t a necessity for this course.

One thing I loved about 6.00.1x was its dedicated Facebook group, which gave a community / classroom-peergroup feel to the course. 6.00.2x also has a Facebook group. Here’s a sneak peek:

The syllabus and schedule for this course are shown below. The course is spread out over 2 months, which includes 7 weeks of lectures.

The prerequisites for this course are pretty much covered in this set of tutorial videos that have been created by one of the TAs for 6.00.1x. If you’ve not taken 6.00.1x in the past, you can go through these videos (running time < 1hr) to judge whether or not to go ahead with 6.00.2x.

So much for the update. Got work to do! 🙂

Funny Python

October 17, 2015October 17, 2015 Anirudh Non Technical funny, Monty Python, Python

If a programming language is named after a sketch comedy troupe, one knows what to expect. Python IS a funny language with its own bag of surprises.

pythonMonty — Monty Python’s Flying Circus

For instance, if you’ve just moved from a language such as C to Python and you’re missing curly braces (how can one not want whitespaces!!), and you try this:

>>> from __future__ import braces

Or say, if you try importing this.

>>> import this

Or if you ever wanted to know why XKCD’s Cueball left Perl for Python, you should know, that it was for gravity defying stunts that he couldn’t perform anywhere else. Just import antigravity!

>>> import antigravity

You’re led to this webcomic on your browser.

So the upshot is that you can get tickled and trolled by Python every now and then, keeping in line with its rich tradition of doing so (check out video below).

Comedians!

Python to the Rescue

October 16, 2015October 16, 2015 Anirudh Non Technical Coding, Development Economics, Economics, MIT, Python

Another journal-like entry

Programming as a profession is only moderately interesting. It can be a good job, but you could make about the same money and be happier running a fast food joint. You’re much better off using code as your secret weapon in another profession.

People who can code in the world of technology companies are a dime a dozen and get no respect. People who can code in biology, medicine, government, sociology, physics, history, and mathematics are respected and can do amazing things to advance those disciplines.

– Advice from an Old Programmer

I was reading a paper today, written by MIT’s Esther Duflo, part of a homework assignment on a MOOC on development policy (Foundations of Development Policy: Advanced Development Economics) offered by Duflo and Abhijit Banerjee. So I opened the paper and started copying important lines from the PDF to a text editor to make notes. I could copy the text, but when I pasted it onto a text editor, it turned out to be gibberish (you can try it too!).

For instance, instead of pasting

Between 1973 and 1978, the Indonesian Government constructed over 61,000 primary schools throughout the country

I got:

Ehwzhhq 4<:6 dqg 4<:;/ wkh Lqgrqhvldq Jryhuqphqw frqv wuxfwhg ryhu 94/333 sulpdu| vfkrrov wkurxjkrxw wkh frxqwu|

It was a good thing the cipher used for this text wasn’t too complicated. After some perusal, I found that ‘B’ became ‘E’, ‘e’ became ‘h’, ‘t’ became ‘w’ and so on. So I copied the entire content of the PDF to a text file and named the encrypted file estherDuflo.txt. I noticed that the encryption had been implemented only on the first 1475 lines. The remaining was plain English.

So I wrote a Python script to decrypt the gibberish, rather than simply typing out my notes. It took 20 minutes writing the code and 8 ms to execute (of course!). I didn’t want to spend a lot of time ensuring a thorough decryption, so the result wasn’t perfect, but then I’m going to make do. I named the decrypted file estherDufloDecrypted.txt.

Sample from the Encrypted File

	5U LL?} @?_ w@MLh @h!i| L?ti^ i?Uit Lu 5U LL
	L?t|h U|L? ? W?_L?it@G ,_i?Ui uhL4 @? N? t @* L*U)
	, Tih4i?|
	,t| ih # L
	W
	Devwudfw
	Ehwzhhq 4<:6 dqg 4<:;/ wkh Lqgrqhvldq Jryhuqphqw frqvwuxfwhg ryhu 94/333 sulpdu|
	vfkrrov wkurxjkrxw wkh frxqwu|1 Wklv lv rqh ri wkh odujhvw vfkrro frqvwuxfwlrq surjudpv rq
	uhfrug1 L hydoxdwh wkh hhfw ri wklv surjudp rq hgxfdwlrq dqg zdjhv e| frpelqlqj glhuhqfhv
	dfurvv uhjlrqv lq wkh qxpehu ri vfkrrov frqvwuxfwhg zlwk glhuhqfhv dfurvv frkruwv lqgxfhg
	e| wkh wlplqj ri wkh surjudp1 Wkh hvwlpdwhv vxjjhvw wkdw wkh frqvwuxfwlrq ri sulpdu| vfkrrov
	ohg wr dq lqfuhdvh lq hgxfdwlrq dqg hduqlqjv1 Fkloguhq djhg 5 wr 9 lq 4<:7 uhfhlyhg 3145 wr
	314< pruh |hduv ri hgxfdwlrq iru hdfk vfkrro frqvwuxfwhg shu 4/333 fkloguhq lq wkhlu uhjlrq
	ri eluwk1 Xvlqj wkh yduldwlrqv lq vfkrrolqj jhqhudwhg e| wklv srolf| dv lqvwuxphqwdo yduldeohv
	iru wkh lpsdfw ri hgxfdwlrq rq zdjhv jhqhudwhv hvwlpdwhv ri hfrqrplf uhwxuqv wr hgxfdwlrq
	udqjlqj iurp 91; shufhqw wr 4319 shufhqw1 +MHO L5/ M64/ R48/ R55,
	Wkh txhvwlrq ri zkhwkhu lqyhvwphqw lq lqiudvwuxfwxuh lqfuhdvhv kxpdq fdslwdo dqg uhgxfhv
	sryhuw| kdv orqj ehhq d frqfhuq wr ghyhorsphqw hfrqrplvwv dqg srolf|pdnhuv1 Iru h{dpsoh/
	dydlodelolw| ri vfkrrolqj lqiudvwuxfwxuh kdv ehhq vkrzq wr eh srvlwlyho| fruuhodwhg zlwk frpsohwhg
	vfkrrolqj ru hquroophqw e| Qdqf| Elugvdoo +4<;8, lq xuedq Eud}lo/ Ghqqlv GhWud| +4<;;, dqg Ohh


My Code

	from string import ascii_lowercase

	# create decipher dictionary: every cipher letter maps back 3 places
	alphabet = ascii_lowercase
	shifted = "".join([alphabet[(i+3)%26] for i in range(len(alphabet))])
	decipher = dict(zip(shifted, alphabet))

	# open and read encrypted text
	filename = 'estherDuflo.txt'
	f = open(filename, 'r')
	lines = f.readlines()
	lines = [line.rstrip('\n') for line in lines]
	# use first 1475 lines only
	newlines = lines[:1475]

	# apply decryption on those 1475 lines
	decipheredLines = []
	for line in newlines:
	    x = line.lower()
	    s = []
	    for letter in x:
	        # only letters are mapped back; digits and punctuation pass through
	        if letter in decipher:
	            s.append(decipher[letter])
	        else:
	            s.append(letter)
	    s.append('\n')
	    decipheredLines.append(''.join(s))

	# write deciphered text to new text file
	decipheredFile = 'estherDufloDecrypted.txt'
	df = open(decipheredFile, 'w')
	for line in decipheredLines:
	    df.write("%s" % line)

	# close both text files
	f.close()
	df.close()


Sample from the Decrypted File

	5r ii?} @?_ t@jie @e!f| i?qf^ f?rfq ir 5r ii
	i?q|e r|i? ? t?_i?fq@d ,_f?rf rei4 @? k? q @* i*r)
	, qfe4f?|
	,q| fe # i
	t
	abstract
	between 4<:6 and 4<:;/ the indonesian government constructed over 94/333 primar|
	schools throughout the countr|1 this is one of the largest school construction programs on
	record1 i evaluate the eect of this program on education and wages b| combining dierences
	across regions in the number of schools constructed with dierences across cohorts induced
	b| the timing of the program1 the estimates suggest that the construction of primar| schools
	led to an increase in education and earnings1 children aged 5 to 9 in 4<:7 received 3145 to
	314< more |ears of education for each school constructed per 4/333 children in their region
	of birth1 using the variations in schooling generated b| this polic| as instrumental variables
	for the impact of education on wages generates estimates of economic returns to education
	ranging from 91; percent to 4319 percent1 +jel i5/ j64/ o48/ o55,
	the question of whether investment in infrastructure increases human capital and reduces
	povert| has long been a concern to development economists and polic|makers1 for e{ample/
	availabilit| of schooling infrastructure has been shown to be positivel| correlated with completed
	schooling or enrollment b| nanc| birdsall +4<;8, in urban bra}il/ dennis detra| +4<;;, and lee

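The decrypted sample still shows shifted digits and punctuation (4<:6 where the paper says 1973, and | where a y should be), because only letters were mapped back. Judging purely from the samples in this post, the scrambling looks like a uniform shift of three character codes applied to everything except whitespace. That is an inference from the samples, not something the PDF documents. A minimal sketch under that assumption (the function name shift_back is made up for illustration):

```python
def shift_back(text, offset=3):
    """Undo a uniform character-code shift, leaving whitespace untouched.

    Assumes every non-whitespace character was moved up by `offset`
    code points, which is what the samples in this post suggest.
    """
    out = []
    for ch in text:
        if ch.isspace():
            out.append(ch)
        else:
            out.append(chr(ord(ch) - offset))
    return "".join(out)


sample = "Ehwzhhq 4<:6 dqg 4<:;/ wkh Lqgrqhvldq Jryhuqphqw"
print(shift_back(sample))
# Between 1973 and 1978, the Indonesian Government
```

Unlike the dictionary approach, this keeps the original capitalisation and also recovers digits and punctuation. It will not help with the first few lines of the file, which appear to be scrambled differently.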

Karatsuba Multiplication Algorithm – Python Code

October 13, 2015October 14, 2015 Anirudh Technical Algorithms, Code Snippets, Coursera, Karatsuba, Math, Python

Motivation for this blog post

I’ve enrolled in Stanford Professor Tim Roughgarden’s Coursera MOOC on the design and analysis of algorithms, and while he covers the theory and intuition behind the algorithms in a surprising amount of detail, we’re left to implement them in a programming language of our choice.

~~And I’m going to post Python code for all the algorithms covered during the course!~~

The Karatsuba Multiplication Algorithm

Karatsuba’s algorithm reduces the multiplication of two n-digit numbers to at most $n^{\log_2 3}\approx n^{1.585}$ single-digit multiplications in general (and exactly $n^{\log_2 3}$ when n is a power of 2). The familiar grade school algorithm, which is how we work through multiplication in our day-to-day lives, is slower in comparison ( $\Theta(n^2)$ ), but only on a computer, of course!
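Where does $n^{\log_2 3}$ come from? Karatsuba’s recursion replaces one n-digit product with three products of roughly n/2 digits, so for n a power of 2 the count of single-digit multiplications triples every time n doubles. A rough sketch of that count next to the grade school $n^2$ (the function names are mine, and the additions are ignored):

```python
import math

def karatsuba_mults(n):
    """Single-digit multiplications for two n-digit numbers (n a power of 2)."""
    if n == 1:
        return 1
    # three recursive products on operands with half as many digits
    return 3 * karatsuba_mults(n // 2)

def grade_school_mults(n):
    """Every digit of one number meets every digit of the other."""
    return n * n

for n in (1, 2, 4, 8, 16, 1024):
    print(n, karatsuba_mults(n), round(n ** math.log2(3)), grade_school_mults(n))
```

For 1024-digit numbers this gives 59049 single-digit multiplications against 1048576 for the grade school method.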

Here’s how the grade school algorithm looks:
(The following slides have been taken from Tim Roughgarden’s notes. They serve as a good illustration. I hope he doesn’t mind my sharing them.)

…and this is how Karatsuba Multiplication works on the same problem:

A More General Treatment

Let $x$ and $y$ be represented as $n$ -digit strings in some base $B$ . For any positive integer $m$ less than $n$ , one can write the two given numbers as

$x = x_1B^m + x_0$
$y = y_1B^m + y_0$ ,

where $x_0$ and $y_0$ are less than $B^m$ . The product is then

$xy = (x_1B^m + x_0)(y_1B^m + y_0)$
$xy = z_2B^{2m} + z_1B^m + z_0$

where

$z_2 = x_1y_1$
$z_1 = x_1y_0 + x_0y_1$
$z_0 = x_0y_0$

These formulae require four multiplications, and were known to Charles Babbage. Karatsuba observed that $xy$ can be computed in only three multiplications, at the cost of a few extra additions. With $z_0$ and $z_2$ as before we can calculate

$z_1 = (x_1 + x_0)(y_1 + y_0) - z_2 - z_0$

which holds since

$z_1 = x_1y_0 + x_0y_1$
$z_1 = (x_1 + x_0)(y_1 + y_0) - x_1y_1 - x_0y_0$

A more efficient implementation of Karatsuba multiplication can be set as $xy = (b^2 + b)x_1y_1 - b(x_1 - x_0)(y_1 - y_0) + (b + 1)x_0y_0$ , where $b = B^m$ .

Example

To compute the product of 12345 and 6789, choose B = 10 and m = 3. Then we decompose the input operands using the resulting base (B^m = 1000), as:

12345 = 12 · 1000 + 345

6789 = 6 · 1000 + 789

Only three multiplications, which operate on smaller integers, are used to compute three partial results:

z₂ = 12 × 6 = 72

z₀ = 345 × 789 = 272205

z₁ = (12 + 345) × (6 + 789) − z₂ − z₀ = 357 × 795 − 72 − 272205 = 283815 − 72 − 272205 = 11538

We get the result by adding these three partial results, shifted accordingly (carries are then handled by decomposing the three partial results in base 1000, just as was done for the input operands):

result = z₂ · B^(2m) + z₁ · B^m + z₀, i.e.

result = 72 · 1000² + 11538 · 1000 + 272205 = 83810205.
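The arithmetic above is easy to check in a few lines of Python. This is only a one-level sketch of the worked example (no recursion), with variable names chosen to match the formulas:

```python
x, y = 12345, 6789
B, m = 10, 3

# split each operand into a high and a low part around B**m = 1000
x1, x0 = divmod(x, B ** m)   # 12, 345
y1, y0 = divmod(y, B ** m)   # 6, 789

# the three multiplications
z2 = x1 * y1
z0 = x0 * y0
z1 = (x1 + x0) * (y1 + y0) - z2 - z0

result = z2 * B ** (2 * m) + z1 * B ** m + z0
print(z2, z1, z0, result)    # 72 11538 272205 83810205
print(result == x * y)       # True
```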

Pseudocode and Python code

	procedure karatsuba(num1, num2)
	    if (num1 < 10) or (num2 < 10)
	        return num1*num2
	    /* calculates the size of the numbers */
	    m = max(size_base10(num1), size_base10(num2))
	    m2 = floor(m/2)
	    /* split the digit sequences about the middle */
	    high1, low1 = split_at(num1, m2)
	    high2, low2 = split_at(num2, m2)
	    /* 3 calls made to numbers approximately half the size */
	    z0 = karatsuba(low1,low2)
	    z1 = karatsuba((low1+high1),(low2+high2))
	    z2 = karatsuba(high1,high2)
	    return (z2*10^(2*m2))+((z1-z2-z0)*10^(m2))+(z0)


	def karatsuba(x,y):
	    """Function to multiply 2 numbers in a more efficient manner than the grade school algorithm"""
	    if len(str(x)) == 1 or len(str(y)) == 1:
	        return x*y
	    else:
	        n = max(len(str(x)),len(str(y)))
	        # integer division, so the split point stays an int
	        nby2 = n // 2

	        a = x // 10**(nby2)
	        b = x % 10**(nby2)
	        c = y // 10**(nby2)
	        d = y % 10**(nby2)

	        ac = karatsuba(a,c)
	        bd = karatsuba(b,d)
	        ad_plus_bc = karatsuba(a+b,c+d) - ac - bd

	        # this little trick, writing n as 2*nby2 takes care of both even and odd n
	        prod = ac * 10**(2*nby2) + (ad_plus_bc * 10**nby2) + bd

	        return prod


Teach Yourself Machine Learning the Hard Way!

October 9, 2015October 12, 2015 Anirudh Non Technical Algorithms, Data Science, Machine Learning, Python

This formula is kick-ass!

Darshan Hegde

It has been 3 years since I steered my interests towards Machine Learning. I had just graduated from college with a Bachelor of Engineering in Electronics and Communication Engineering. Which is another way of saying that I:

was a toddler in programming.
had little / no knowledge of algorithms.
had studied engineering math, but it was rusty.
had no knowledge of modern optimization.
had zero knowledge of statistical inference.

I think most of this is true for many engineering graduates (especially in India!), unless you studied mathematics and computing for undergrad.

Lucky for me, I had a great mentor and a lot of online materials on these topics. This post will list many such materials I found useful while I was learning it the hard way!

All the courses that I’m listing below have homework assignments. Make sure you work through each one of them.

1. Learn Python

If you are new to programming…


Why Parselmouth Harry Potter is also Parsermouth Harry Potter

September 27, 2015September 27, 2015 Anirudh Non Technical cartoon, Code Snippets, funny, Python, Stack Overflow

If you’re a Pythonista or just a coder, you may have come across this web cartoon:

A comic I did for @Webs is up! pic.twitter.com/CZHAv0eUVI #codehumor #python

— Ryan Sawyer (@EightballArt) September 25, 2014

Its creator Ryan Sawyer has been working as a full-time graphic designer and freelance illustrator for the past 10 years. His projects have been featured on websites such as /Film, io9, BoingBoing, Uproxx, MusicRadar, SuperPunch, IGN, and PackagingDigest.

I recently came across an interesting thread on Reddit on the origins of this cartoon. Basically, the cartoonist, ergo Python-speaking-Harry, got their code from this Stack Overflow forum for short, useful Python code snippets! Convenient, right?!

What’s funny is that the forum later got closed as it was deemed not constructive!

ParsermouthStackOverflow — Click Image to Enlarge

The code is supposed to print a recursive count of lines of python source code from the current working directory, including an ignore list – so as to print total sloc. Don’t blame me though, if the code doesn’t work!

	# prints recursive count of lines of python source code from current directory
	# includes an ignore_list. also prints total sloc

	import os
	cur_path = os.getcwd()
	ignore_set = set(["__init__.py", "count_sourcelines.py"])

	loclist = []

	for pydir, _, pyfiles in os.walk(cur_path):
	    for pyfile in pyfiles:
	        if pyfile.endswith(".py") and pyfile not in ignore_set:
	            totalpath = os.path.join(pydir, pyfile)
	            loclist.append( ( len(open(totalpath, "r").read().splitlines()),
	                               totalpath.split(cur_path)[1]) )

	for linenumbercount, filename in loclist:
	    print("%05d lines in %s" % (linenumbercount, filename))

	print("\nTotal: %s lines (%s)" % (sum([x[0] for x in loclist]), cur_path))



Discovering Python & R

— my journey as a worker bee in quant finance
