MOOC – Anirudh Jayaraman

Abu-Mostafa’s Machine Learning MOOC – Now on EdX

This was in the pipeline for quite some time now. I have been waiting for his lectures on a platform such as EdX or Coursera, and the day has arrived. You can enroll and start with week 1’s lectures as they’re live now.

This course is taught by none other than Dr. Yaser S. Abu-Mostafa, whose textbook on machine learning, Learning from Data, is the #1 bestselling textbook (Amazon) in all categories of Computer Science. His online course has been offered earlier over here.

Teaching

Dr. Abu-Mostafa received the Clauser Prize for the most original doctoral thesis at Caltech. He received the ASCIT Teaching Awards in 1986, 1989 and 1991, the GSC Teaching Awards in 1995 and 2002, and the Richard P. Feynman prize for excellence in teaching in 1996.

Live ‘One-take’ Recordings

The lectures have been recorded from a live broadcast (including Q&A, which will let you gauge the level of Caltech students taking this course). In fact, it almost seems as though Abu-Mostafa takes a direct jab at Andrew Ng’s popular Coursera MOOC by stating the obvious on his course page.

A real Caltech course, not a watered-down version

Again, while enrolling, note that this is what Abu-Mostafa had to say about the online course: “A Caltech course does not cater to short attention spans, and it may not provide instant gratification…[like] many MOOCs out there that are quite simple and have a ‘video game’ feel to them.” Unsurprisingly, many online students have dropped out in the past, but some of those students who “complained early on but decided to stick with the course had very flattering words to say at the end”.

Prerequisites

Basic probability
Basic matrices
Basic calculus
Some programming language/platform (I choose Python!)

If you’re looking for a challenging machine learning course, this is probably one you must take.

September 24, 2016

Deterministic Selection Algorithm Python Code

Through this post, I’m sharing Python code implementing the median of medians algorithm, an algorithm that resembles quickselect, differing only in the way in which the pivot is chosen, i.e., deterministically instead of at random.

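As usually presented, the median of medians algorithm runs in O(n) time even in the worst case, whereas randomized quickselect is O(n) on average and O(n²) in the worst case.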
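For reference, here is a sketch of the standard analysis. It assumes the textbook variant, in which the pivot is the exact median of the group-of-five medians, found by a recursive call to the selection routine itself. The constant c is just an illustrative name for the per-element cost of grouping and partitioning, and rounding is ignored. Note that ChoosePivot in the listing below recurses on itself rather than on DSelect, so this bound describes the textbook variant and may not carry over unchanged.

```latex
\begin{align*}
T(n) &\le \underbrace{T(n/5)}_{\text{median of the medians}}
        + \underbrace{T(7n/10)}_{\text{recursion on the larger side}}
        + cn \\
\text{Guess } T(n) \le 10cn:\quad
T(n) &\le 10c \cdot \tfrac{n}{5} + 10c \cdot \tfrac{7n}{10} + cn
      = 2cn + 7cn + cn = 10cn
\end{align*}
```

The guess is consistent, so T(n) = O(n) under these assumptions.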

I don’t have a formal education in CS, and came across this algorithm while going through Tim Roughgarden’s Coursera MOOC on the design and analysis of algorithms. Check out my implementation in Python.

	def merge_tuple(a, b):
	    """ Function to merge two sorted arrays of (value, index) tuples """
	    c = []
	    while len(a) != 0 and len(b) != 0:
	        # Compare on the value, i.e. the first entry of each tuple
	        if a[0][0] < b[0][0]:
	            c.append(a.pop(0))
	        else:
	            c.append(b.pop(0))
	    # One of the two arrays is empty now; append what is left of the other
	    if len(a) == 0:
	        c += b
	    else:
	        c += a
	    return c

	def mergesort_tuple(x):
	    """ Function to sort an array of tuples using merge sort algorithm """
	    if len(x) == 0 or len(x) == 1:
	        return x
	    else:
	        middle = len(x) // 2  # integer division (Python 3)
	        a = mergesort_tuple(x[:middle])
	        b = mergesort_tuple(x[middle:])
	        return merge_tuple(a, b)

	def lol(x, k):
	    """ Function to divide a list into a list of lists of size k each. """
	    return [x[i:i+k] for i in range(0, len(x), k)]

	def preprocess(x):
	    """ Function to assign an index to each element of a list of integers, outputting a list of tuples"""
	    return list(zip(x, range(len(x))))

	def middle_index(x):
	    """ Index of the middle element of a list (lower middle for even lengths).
	    This helper is missing from the listing; the definition here is an assumption. """
	    return (len(x) - 1) // 2

	def partition(x, pivot_index=0):
	    """ Function to partition an unsorted array (in place) around a pivot"""
	    i = 0
	    if pivot_index != 0:
	        x[0], x[pivot_index] = x[pivot_index], x[0]
	    for j in range(len(x) - 1):
	        if x[j+1] < x[0]:
	            x[j+1], x[i+1] = x[i+1], x[j+1]
	            i += 1
	    # Move the pivot to its final position
	    x[0], x[i] = x[i], x[0]
	    return x, i

	def ChoosePivot(x):
	    """ Function to choose pivot element of an unsorted array using 'Median of Medians' method. """
	    if len(x) <= 5:
	        return mergesort_tuple(x)[middle_index(x)]
	    else:
	        lst = lol(x, 5)
	        lst = [mergesort_tuple(el) for el in lst]
	        C = [el[middle_index(el)] for el in lst]
	        return ChoosePivot(C)

	def DSelect(x, k):
	    """ Function to return the k-th smallest element (counting from 0) of an unsorted array """
	    if len(x) == 1:
	        return x[0]
	    else:
	        # ChoosePivot returns a (value, index) tuple; partition needs the index
	        xpart = partition(x, ChoosePivot(preprocess(x))[1])
	        x = xpart[0]  # partitioned array
	        j = xpart[1]  # pivot index
	        if j == k:
	            return x[j]
	        elif j > k:
	            return DSelect(x[:j], k)
	        else:
	            k = k - j - 1
	            return DSelect(x[(j+1):], k)

	arr = list(range(100, 0, -1))
	print(DSelect(arr, 50))
	# In IPython, time it with: %timeit DSelect(arr, 50)


I get the following output:

51
100 loops, best of 3: 2.38 ms per loop

Note that on the same input, quickselect is faster, giving us:

1000 loops, best of 3: 254 µs per loop
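The quickselect code used for that timing isn’t shown in this post. For comparison, here is a minimal sketch of randomized quickselect. It is an illustration rather than the exact code I timed, and the function name and the `rng` parameter are made up for this example.

```python
import random

def quickselect(items, k, rng=random):
    """Return the k-th smallest element (counting from 0) of items.

    Illustrative sketch: the pivot is picked uniformly at random instead
    of by median of medians. The input list is not modified.
    """
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    items = list(items)
    while True:
        pivot = items[rng.randrange(len(items))]
        smaller = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        if k < len(smaller):
            # The answer lies among the elements below the pivot
            items = smaller
        elif k < len(smaller) + len(equal):
            # The answer is the pivot itself
            return pivot
        else:
            # Skip everything up to and including the pivot values
            k -= len(smaller) + len(equal)
            items = [v for v in items if v > pivot]

print(quickselect(list(range(100, 0, -1)), 50))
```

On the same input as above, this also prints 51. The random pivot is cheap to pick, which is the main reason quickselect tends to win on small inputs like this one.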

July 22, 2016

Google’s New Deep Learning MOOC Using TensorFlow

Deep learning became a hot topic in machine learning in the last 3-4 years (see inset below) and recently, Google released TensorFlow (a Python-based deep learning toolkit) as an open source project to bring deep learning to everyone.

deep_learning_google_trends — Interest in the Google search term *Deep Learning* over time

If you have wanted to get your hands dirty with TensorFlow or needed more direction with that, here’s some good news – Google is offering an open MOOC on deep learning methods using TensorFlow here. This course has been developed with Vincent Vanhoucke, Principal Scientist at Google, and technical lead in the Google Brain team. However, this is an intermediate to advanced level course and assumes you have taken a first course in machine learning, or that you are at least familiar with supervised learning methods.

Google’s overall goal in designing this course is to provide the machine learning enthusiast a rapid and direct path to solving real and interesting problems with deep learning techniques.

What is Deep Learning?

Course Overview

January 24, 2016

Statistical Learning – 2016

On January 12, 2016, Stanford University professors Trevor Hastie and Rob Tibshirani will offer the 3rd iteration of Statistical Learning, a MOOC which first began in January 2014, and has become quite a popular course among data scientists. It is a great place to learn statistical learning (machine learning) methods using the R programming language. For a quick course on R, check this out – Introduction to R Programming

Slides and videos for the Statistical Learning MOOC by Hastie and Tibshirani are available separately here. Slides and video tutorials related to this book by Abass Al Sharif can be downloaded here.

The course covers the following book which is available for free as a PDF copy.

Logistics and Effort:

Rough Outline of Schedule (based on last year’s course offering):

Week 1: Introduction and Overview of Statistical Learning (Chapters 1-2)
Week 2: Linear Regression (Chapter 3)
Week 3: Classification (Chapter 4)
Week 4: Resampling Methods (Chapter 5)
Week 5: Linear Model Selection and Regularization (Chapter 6)
Week 6: Moving Beyond Linearity (Chapter 7)
Week 7: Tree-based Methods (Chapter 8)
Week 8: Support Vector Machines (Chapter 9)
Week 9: Unsupervised Learning (Chapter 10)

Prerequisites: First courses in statistics, linear algebra, and computing.

December 13, 2015

MITx 6.00.2x Introduction to Computational Thinking and Data Science (Fall 2015)

MIT’s Fall 2015 iteration of 6.00.2x starts today. After an enriching learning experience with 6.00.1x, I have great expectations from this course. As the course website mildly puts it, 6.00.2x is an introduction to using computation to understand real-world phenomena. MIT OpenCourseware (OCW) mirroring the material covered in 6.00.1x and 6.00.2x can be found here.

The course follows this book by John Guttag (who happens to be one of the instructors for this course). However, purchasing the book isn’t a necessity for this course.

One thing I loved about 6.00.1x was its dedicated Facebook group, which gave a community / classroom-peergroup feel to the course. 6.00.2x also has a Facebook group. Here’s a sneak peek:

The syllabus and schedule for this course are shown below. The course is spread out over 2 months, which includes 7 weeks of lectures.

The prerequisites for this course are pretty much covered in this set of tutorial videos that have been created by one of the TAs for 6.00.1x. If you’ve not taken 6.00.1x in the past, you can go through these videos (running time < 1hr) to judge whether or not to go ahead with 6.00.2x.

So much for the update. Got work to do! 🙂

October 21, 2015

Machine Learning — New Coursera Specialization from the University of Washington

I have finally embarked on my first machine learning MOOC / Specialization. I love Python, and this course uses Python as the language of choice. Also, the instructors assert that Python is widely used in industry, and is becoming the de facto language for data science in industry. They use IPython Notebook in their assignments and videos.

The specialization offered by the University of Washington consists of 5 courses and a capstone project spread across about 8 months (September through April). The specialization’s first iteration kicked off yesterday.

The first course, Machine Learning Foundations: A Case Study Approach is 6 weeks long, running from September 22 through November 9.

The Instructors:

Emily Fox and Carlos Guestrin

Key Learning Outcomes
– Identify potential applications of machine learning in practice.
– Describe the core differences in analyses enabled by regression, classification, and clustering.
– Select the appropriate machine learning task for a potential application.
– Apply regression, classification, clustering, retrieval, recommender systems, and deep learning.
– Represent your data as features to serve as input to machine learning models.
– Assess the model quality in terms of relevant error metrics for each task.
– Utilize a dataset to fit a model to analyze new data.
– Build an end-to-end application that uses machine learning at its core.
– Implement these techniques in Python.

Week-by-Week
Week 1: Introductory welcome videos and the instructors’ views on the future of intelligent applications
Week 2: Predicting House Prices (Regression)
Week 3: Classification (Sentiment Analysis)
Week 4: Clustering and Similarity: Retrieving Documents
Week 5: Recommending Products
Week 6: Deep Learning: Searching for Images

EDIT

It’s been 3 days since the course began, and here’s what the classmate demographic looks like:

September 23, 2015

MOOC Review: Introduction to Computer Science and Programming Using Python (6.00.1x)

I enrolled in Introduction to Computer Science and Programming Using Python with the primary objective of learning to code using Python. This course, as the name suggests, is more than just about Python. It uses Python as a tool to teach computational thinking and serves as an introduction to computer science. The fact that it is a course offered by MIT, makes it special.

As a matter of fact, this course is aimed at students with little or no prior programming experience who feel the need to understand computational approaches to problem solving. Eric Grimson is an excellent teacher (also Chancellor of MIT) and he delves into the subject matter in a surprising amount of detail.

The video lectures are based on select chapters from an excellent book by John Guttag. While the book isn’t mandatory for the course (the video lectures do a great job of explaining the material on their own), I benefited greatly from reading the textbook. There are a couple of instances where the code isn’t presented properly in the slides (typos or indentation gone wrong when pasting code to the slides), but the correct code / study material can be found in the textbook. Also, for explanations that are more in-depth, the book comes in handy.

MIT offers this course in 2 parts via edX. While 6.00.1x is an introduction to computer science as a tool to solve real-world analytical problems, 6.00.2x is an introduction to computation in data science. For a general look and feel of the course, this OCW link may be a good starting point. It contains material including video lectures and problem sets that are closely related to 6.00.1x and 6.00.2x.

Each week’s material of 6.00.1x consists of 2 topics, followed by a Problem Set. Problem Sets account for 40% of your grade. Video lectures are followed by finger exercises that can be attempted any number of times. Finger exercises account for 10% of your grade. The Quiz (kind of like a mid-term exam) and the Final Exam account for 25% each. The course is of 8 weeks duration and covers the following topics (along with corresponding readings from John Guttag’s textbook).

From the questions posted on forums, it was apparent that the section of this course that most people found challenging, was efficiency and orders of growth – and in particular, the Big-O asymptotic notation and problems on algorithmic complexity.

Lectures on Classes, Inheritance and Object Oriented Programming (OOP) were covered really well in over 100 minutes of video time. I enjoyed the problem set that followed, requiring the student to build an Internet news filter alerting the user when it noticed a news story that matched that user’s interests.

The final week had lectures on the concept of Trees, which were done hurriedly when compared to the depth of detail the instructor had earlier gone to, while explaining concepts from previous weeks. However, this material was covered quite well in Guttag’s textbook and the code for tree search algorithms was provided for perusal as part of the courseware.

At the end of the course, there were some interesting add-on videos to tickle the curiosity of the learner on the applications of computation in diverse fields such as medicine, robotics, databases and 3D graphics.

The Wiki tab for this course (in the edX platform) is laden with useful links to complement each week of lectures. I never got around to reading those, but I’m going through them now, and they’re quite interesting. It’s a section that nerds would love to skim through.

I learnt a great deal from this course (scored well too), putting in close to 6 hours a week of study. It is being offered again on August 26, 2015. In the meantime, I’m keeping my eyes open for MIT’s data science course (6.00.2x), which is likely to be offered in October, in continuation of 6.00.1x.

August 17, 2015

Statistics: The Sexiest Job of the Decade

Anyone who’s got a formal education in economics knows who Hal Varian is. He’s most popularly known for his book Intermediate Microeconomics. He’s also the Chief Economist at Google. He famously stated, more or less, that statisticians and data analysts would have the sexiest jobs of the next decade.

That has come true, to a great extent, and we’ll be seeing more.

Great places to learn more about data science and statistical learning:
1] Statistical Learning (Stanford)
2] The Analytics Edge (MIT)

In a paper called ‘Big Data: New Tricks for Econometrics’, Varian goes on to say that:

In fact, my standard advice to graduate students these days is “go to the computer science department and take a class in machine learning.” There have been very fruitful collaborations between computer scientists and statisticians in the last decade or so, and I expect collaborations between computer scientists and econometricians will also be productive in the future.

July 7, 2015

The Merge Sort — Python Code

I have just begun working on a MOOC on algorithms offered by Stanford. Since this course gives us the liberty to choose a programming language, there isn’t any code discussed in those lectures. I plan to convert any algorithm discussed in those lectures into Python code. Since Merge Sort was the first algorithm discussed, I’m starting with that.

Merge Sort is supposedly a good introduction to divide and conquer algorithms, greatly improving upon selection, insertion and bubble sort techniques, especially when input size increases.

Pseudocode:

— Recursively sort the first half of the input array.
— Recursively sort the second half of the input array.
— Merge two sorted sub-lists into one list.

C = output [length = n]
A = 1st sorted array [n/2]
B = 2nd sorted array [n/2]
i = 0 or 1 (depending on the programming language)
j = 0 or 1 (depending on the programming language)

for k = 1 to n

if A(i) < B(j)
C(k) = A(i)
i = i + 1

else if A(i) > B(j)
C(k) = B(j)
j = j + 1

Note: the pseudocode for the merge operation ignores the end cases.

Visualizing the algorithm can be done in 2 stages: first, the recursive splitting of the arrays into 2 halves at a time, and second, the merge operation.

Here’s the Python code to merge sort an array.

	# Code for the merge subroutine

	def merge(a, b):
	    """ Function to merge two sorted arrays """
	    c = []
	    while len(a) != 0 and len(b) != 0:
	        # Move the smaller of the two front elements to the output
	        if a[0] < b[0]:
	            c.append(a.pop(0))
	        else:
	            c.append(b.pop(0))
	    # End case: one array is empty, so append what is left of the other
	    if len(a) == 0:
	        c += b
	    else:
	        c += a
	    return c

	# Code for merge sort

	def mergesort(x):
	    """ Function to sort an array using merge sort algorithm """
	    if len(x) == 0 or len(x) == 1:
	        return x
	    else:
	        middle = len(x) // 2  # integer division (Python 3)
	        a = mergesort(x[:middle])
	        b = mergesort(x[middle:])
	        return merge(a, b)


We can divide a list in half log₂ n times, where n is the length of the list. The second process is the merge. Each item in the list will eventually be processed and placed on the sorted list, so a merge operation that results in a list of size n requires n operations. The result of this analysis is log₂ n levels of splits, each of which costs n to merge, for a total of n log₂ n operations.
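To check that count, here is a small sketch that tallies how many items the merge steps place on an output list. The names `merge_count` and `mergesort_count` are made up for this illustration, and the merge walks the inputs with indices instead of removing elements from the front. When n is a power of two, the tally comes out to exactly n log₂ n.

```python
import math

def merge_count(a, b):
    """Merge two sorted lists; return (merged list, items placed)."""
    merged = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # End cases: one input is exhausted, copy the rest of the other
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged, len(merged)

def mergesort_count(x):
    """Merge sort that also returns the total items placed by all merges."""
    if len(x) <= 1:
        return list(x), 0
    middle = len(x) // 2
    left, ops_left = mergesort_count(x[:middle])
    right, ops_right = mergesort_count(x[middle:])
    merged, ops_merge = merge_count(left, right)
    return merged, ops_left + ops_right + ops_merge

n = 1024
result, ops = mergesort_count(list(range(n, 0, -1)))
print(ops, n * int(math.log2(n)))
```

Both numbers printed are 10240, i.e. 1024 × 10.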

Other Algorithms:
Karatsuba Integer Multiplication Algorithm
Quick Sort Python Code

July 5, 2015

Review: An Introduction to Interactive Programming in Python (Part 1)

This class (Part 1 of a 2-part course on interactive programming using Python, and the first course of the Fundamentals of Computing Specialization offered by Rice University) was an excellent introduction to programming because of its focus on building interactive (and fun) applications with the lessons learned each week. Most introductory coding classes start with text-based (boring?) programs, while all through this course you’re required to build a series of projects that get progressively more complicated with every passing week. I’m not to be mistaken to be trashing conventional pedagogy, but then again, how many gifted coders do you know who learned to code after completing all the exercises, cover to cover, of some programming textbook? The best way to learn to enjoy coding would be to build interactive stuff, and this course scores full points on that.

A short introduction to the class in a charmingly nerdy way

The mini-projects / assignments during the course are implemented on a cloud-based environment called CodeSkulptor (built by Scott Rixner, one of the instructors for this course). I found CodeSkulptor unique, in that it allows you to share your code (because it’s browser based) with just about anyone with an Internet connection and makes you work with a graphic user interface (GUI) module similar to Pygame, called Simplegui. It also had a debugging tool, called Viz Mode that helped visualize the process. It eases the task of debugging your code and you’ll realize how cool it is as you start using it more.

Since the course mini-projects were peer-reviewed, evaluating other people’s code also became a more straightforward affair, as everyone has their code on the same version of Python. This ensures that the focus is on learning to code, without wasting time on the logistics of programming environment (tuning differences in versions or IDEs). I especially enjoyed peer grading – for each mini project we completed, we had to evaluate and grade the work of 5 others. This was very rewarding – because I got the opportunity to fix bugs in others’ code (which makes you a better coder, I guess) and also got to see better implementations than the ones I had coded, further enriching the learning experience. Indeed, the benefits of peer grading and assessment have been well studied and documented.

Of all the assignments, the one I loved the most was implementing the classic arcade game Pong. You could try playing a version of the game I implemented here. It is a 2-player implementation, but you can play it as a single-player game, only if you imagine yourself to be answering this somewhat cheeky question! Which Pong character are you? Left or Right?

The principal reason behind my joining this course was the way it is structured and taught. We had to watch two sets of videos (part a and part b) and then complete one quiz for each set. The main task for each week was to complete a mini-project that was due along with the quizzes early Sunday morning, followed by assessment of peers’ mini-projects on the following Sunday-Wednesday. The instructors clearly put in A LOT OF WORK to make the lecture videos interesting, laced with humor, with just enough to get you going on your own with the week’s mini-project. That way you’d spend less time viewing the lecture videos, spending more time on actually getting the code for your mini-project to work. So in a way, one might say this course doesn’t follow standard pedagogy for an introductory programming course, but then, as Scott Rixner assures, “You’d know enough to be dangerous!”

The projects that were completed in Part 1 of this course were indeed exciting:

– Rock Paper Scissors Lizard Spock: A simple implementation played with the computer. This project covers basics on statements, expressions and variables, functions, logic and conditionals [I’m a huge fan of The Big Bang Theory, so I was obviously eager to complete this game. Instead of a series of if-elif-else clauses, this implementation used modular logic, all of which is taught in a really fun way. A great way to start off the course].
– Guess the Number: Computer chooses a random number between 1 and 100 and you guess that number. It covered event-driven programming, local and global variables, buttons and input fields [This game although fun, might have been more interesting to code if the computer had to guess the number that the player chose, using bisection search].
– Stopwatch: This was the first project that used a graphic user interface, using some modular arithmetic to get the digits of the ticking seconds in place. A game was also built on it where the player had to stop the watch right at the start of a second to score points. This game tested your reaction-time. It covered static drawing, timers and interactive drawing.
– Pong: The last project of Part 1 and the most fun. Creating the game required only a minor step-up from learnings from previous weeks. It covered knowledge of lists, keyboard input, motion, positional/velocity control. Coding the ball physics where you put to use high-school physics knowledge of elasticity and collisions was very enjoyable. In my game, I set elasticity = 1 (for perfectly elastic collisions)
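The variation on Guess the Number mentioned above, where the computer does the guessing, can be sketched with bisection search. This is not part of the course project. The function name is made up, and I’m assuming the secret number lies between 1 and 100 inclusive.

```python
def computer_guesses(secret, low=1, high=100):
    """Return the list of guesses the computer makes until it hits secret.

    Bisection search: always guess the midpoint of the remaining range,
    then discard the half that cannot contain the secret number.
    """
    guesses = []
    while low <= high:
        guess = (low + high) // 2
        guesses.append(guess)
        if guess == secret:
            return guesses
        elif guess < secret:
            low = guess + 1   # secret is higher: drop the lower half
        else:
            high = guess - 1  # secret is lower: drop the upper half
    raise ValueError("secret is outside the given range")

print(computer_guesses(37))  # [50, 25, 37]
```

With 100 possible numbers, the range halves on every guess, so the computer never needs more than 7 guesses.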

In an interview, the founders of this MOOC say that they spent over 1000 hours building it (Part 1 and Part 2 combined, I guess). That’s an awful lot of effort and it all shows in how brilliantly the class is executed. The support system in the class is excellent. You’ll always find help available within minutes of posting your doubts and queries on the forums. I’ve seen Joe Warren (one of the main instructors of the course) replying to forum posts quite regularly. In addition, there was enough supplementary material in the form of pages on concepts and examples, practice exercises, and video content created by students from previous iterations of the class to better explain concepts and aspects of game-building, improving upon the lecture material.

Concepts and Examples

Practice Exercises

Student-created Videos Explaining Concepts

Overall, I had a great learning experience. I completed Part 1 with a 100 per cent score even though I had a minor hiccup while building the game Pong, which was the most satisfying of all the projects in Part 1. I will review Part 2 when I’m done with it in August this year. I’d easily recommend this course to anyone wishing to start off with Python. It is a great place to be introduced to Python, but it shouldn’t be your ONLY resource. I have been taking MIT’s 6.00.1x introductory Python course side-by-side. I shall review that course as soon as I’m through with it. That course is pedagogically more text-bookish, and indeed they do profess the use of their textbook to complement the course. I’m 4 weeks into that course and finding that enjoyable too – albeit in a different way. I still haven’t lost a point on any of the assignments or finger exercises there, and hope the trend continues:

PS: In one of the forum threads, Joe posted a list of resources that could be referred to in addition to the class.

Python Books:

Byte of Python
Building Skills in Python
Building Skills in Object-Oriented Design (Python)
Data Structures and Algorithms in Python
Dive into Python
Dive into Python 3
Google’s Python Class
Hacking Secret Cyphers with Python – Al Sweigart
Hitchhiker’s Guide to Python!
How to Think Like a Computer Scientist: Learning with Python
How to Think Like a Computer Scientist: Learning with Python, Interactive Edition
Introduction to Programming Using Python – Cody Jackson
Invent Your Own Computer Games With Python – Al Sweigart
Learn Python The Hard Way
Lectures on scientific computing with python – J.R. Johansson
Making Games with Python & Pygame – Al Sweigart
Natural Language Processing with Python
Non-Programmer’s Tutorial for Python
Official Python Tutorial
Porting to Python 3: An In-Depth Guide
Problem Solving with Algorithms and Data Structures
Python Bibliotheca
Python Cookbook – David Beazley
Python for Fun
Python for Informatics: Exploring Information
Python for you and me
Python Koans
Python Module of the Week
Python Practice Book
Python Programming – PDF
Python Programming – Wikibooks
Python Scientific Lecture Notes
Python Standard Library – Fredrik Lundh
Snake Wrangling For Kids
Test-Driven Web Development with Python
Text Processing in Python
The Art and Craft of Programming
The Programming Historian – William J. Turkel, Adam Crymble and Alan MacEachern
Think Python – Allen B. Downey

Another List of Books:

http://pythonbooks.revolunet.com/ – about 50 books – Another good list of free python books that is kept up to date, and I believe are all free or open-source: (I won’t repeat all the books on the list here, just go check it out! Some are also on the list above, but not all)

Further Online Learning:

www.codecademy.com
www.khanacademy.com for math help
http://code.google.com/edu/languages/google-python-class/
http://pythontutor.com/visualize.html Python Tutor – the basis for CodeSkulptor’s Viz Mode

July 2, 2015

Tag: MOOC
