**Disclaimer:** I’m not a data scientist yet; that’s still a work in progress. In the meantime, I’d recommend this excellent talk by Tetiana Ivanova to put an enthusiast’s data science journey in perspective.

# Machine Learning

# Google’s New Deep Learning MOOC Using TensorFlow

Deep learning has become a hot topic in machine learning over the last 3-4 years (see inset below), and recently Google released TensorFlow, a Python-based deep learning toolkit, as an open source project to bring deep learning to everyone.

If you’ve been wanting to get your hands dirty with TensorFlow, or needed more direction with it, here’s some good news: Google is offering an open MOOC on deep learning methods using TensorFlow here. The course was developed with Vincent Vanhoucke, Principal Scientist at Google and technical lead on the Google Brain team. Note, however, that this is an intermediate-to-advanced level course, and it assumes you have taken a first course in machine learning, or at least that you are familiar with supervised learning methods.

Google’s overall goal in designing this course is to give the machine learning enthusiast a rapid and direct path to solving real and interesting problems with deep learning techniques.
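To give a flavor of the kind of building block such a course works with, here is a softmax function (a staple of deep learning classifiers) in plain numpy. This sketch is my own illustration, not course material:

```python
import numpy as np

def softmax(scores):
    """Convert a vector of raw scores into probabilities that sum to 1.

    Subtracting the max score first keeps np.exp from overflowing
    on large inputs without changing the result.
    """
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

print(softmax(np.array([2.0, 1.0, 0.1])))
```

The highest score gets the highest probability, and the outputs always sum to one, which is what lets a network's raw outputs be read as class probabilities.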

**What is Deep Learning?**

**Course Overview**

# Statistical Learning – 2016

On **January 12, 2016**, Stanford University professors Trevor Hastie and Rob Tibshirani will offer the third iteration of Statistical Learning, a MOOC that first ran in January 2014 and has become quite popular among data scientists. It is a great place to learn statistical learning (machine learning) methods using the **R** programming language. For a quick course on R, check out Introduction to R Programming.

Slides and videos for the Statistical Learning MOOC by Hastie and Tibshirani are available separately here. Slides and video tutorials related to this book, by Abass Al Sharif, can be downloaded here.

The course covers the following book, which is available for free as a PDF:

**Logistics and Effort:**

**Rough Outline of Schedule** (based on last year’s course offering):

**Week 1:** Introduction and Overview of Statistical Learning (Chapters 1-2)

**Week 2:** Linear Regression (Chapter 3)

**Week 3:** Classification (Chapter 4)

**Week 4:** Resampling Methods (Chapter 5)

**Week 5:** Linear Model Selection and Regularization (Chapter 6)

**Week 6:** Moving Beyond Linearity (Chapter 7)

**Week 7:** Tree-based Methods (Chapter 8)

**Week 8:** Support Vector Machines (Chapter 9)

**Week 9:** Unsupervised Learning (Chapter 10)

**Prerequisites:** First courses in statistics, linear algebra, and computing.
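Week 2’s material, for instance, can be tried out in just a few lines. The course itself uses R, but as a language-agnostic sketch (my own illustration, not course material), here is an ordinary least squares fit on synthetic data in numpy:

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=100)

# Design matrix with an intercept column; lstsq solves the
# least squares problem (numerically safer than inverting X^T X)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
print(f"intercept ~ {intercept:.2f}, slope ~ {slope:.2f}")
```

The recovered coefficients land close to the true values of 1 and 2, which is the basic sanity check the early chapters walk you through (in R, the equivalent one-liner is `lm(y ~ x)`).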

# Supplementary Material to Andrew Ng’s Machine Learning MOOC

Although the lecture videos and lecture notes from **Andrew Ng**’s Coursera MOOC are sufficient for the online version of the course, if you’re interested in more of the mathematics or want to be challenged further, you can work through the following notes and problem sets from **CS 229**, the 10-week course he teaches at Stanford (which also happens to be the most-enrolled course on campus). It’s not hard to end up with a 100% score on his MOOC, which is obviously a (much) watered-down version of the Stanford course, at least in terms of difficulty. If you don’t believe me, just have a go at the problem sets linked below.

**Lecture Notes**

- Lecture notes 1 (ps) (pdf) Supervised Learning, Discriminative Algorithms
- Lecture notes 2 (ps) (pdf) Generative Algorithms
- Lecture notes 3 (ps) (pdf) Support Vector Machines
- Lecture notes 4 (ps) (pdf) Learning Theory
- Lecture notes 5 (ps) (pdf) Regularization and Model Selection
- Lecture notes 6 (ps) (pdf) Online Learning and the Perceptron Algorithm. (optional reading)
- Lecture notes 7a (ps) (pdf) Unsupervised Learning, k-means clustering.
- Lecture notes 7b (ps) (pdf) Mixture of Gaussians
- Lecture notes 8 (ps) (pdf) The EM Algorithm
- Lecture notes 9 (ps) (pdf) Factor Analysis
- Lecture notes 10 (ps) (pdf) Principal Components Analysis
- Lecture notes 11 (ps) (pdf) Independent Components Analysis
- Lecture notes 12 (ps) (pdf) Reinforcement Learning and Control

**Section Notes**

- Section notes 1 (pdf) Linear Algebra Review and Reference
- Section notes 2 (pdf) Probability Theory Review
- Files for the Matlab tutorial: sigmoid.m, logistic_grad_ascent.m, matlab_session.m
- Section notes 4 (ps) (pdf) Convex Optimization Overview, Part I
- Section notes 5 (ps) (pdf) Convex Optimization Overview, Part II
- Section notes 6 (ps) (pdf) Hidden Markov Models
- Section notes 7 (pdf) The Multivariate Gaussian Distribution
- Section notes 8 (pdf) More on Gaussian Distribution
- Section notes 9 (pdf) Gaussian Processes

**Handouts and Problem Sets**

- Handout #1: Course Information (HTML) (pdf)
- Handout #2: Course Schedule (HTML) (pdf)
- Handout #3: Cover Sheet
- Handout #4: Practice Midterm 1 Solution: Solution
- Handout #5: Practice Midterm 2 Solution: Solution
- Problem Set 1 (pdf) Data: q1x.dat, q1y.dat, q2x.dat, q2y.dat Solution: Solution (pdf)
- Problem Set 2 (pdf) Data: ps2.zip Solution: Solution (pdf)
- Problem Set 3 (pdf) Solution: Solution (pdf)
- Problem Set 4 (pdf)

# Solutions to Machine Learning Programming Assignments

This post contains links to a bunch of code I wrote to complete Andrew Ng’s famous machine learning course, which includes several interesting machine learning problems to be solved in the Octave/Matlab programming language. I’m not sure I’ll ever program in Octave again after this course, but learning it just to complete the course seemed worth the time and effort. I would usually work on the programming assignments on Sundays, spending several hours coding in Octave while telling myself that I would later replicate the exercises in **Python**.
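As a taste of what such a Python port might look like, here is the core of the logistic regression assignment (the cross-entropy cost and its gradient) sketched in numpy. The variable names and the tiny sanity check are my own, not the assignment’s:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Cross-entropy cost and gradient for logistic regression.

    X is (m, n) with a leading column of ones, y is (m,) of 0/1 labels.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    cost = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m
    return cost, grad

# Sanity check: with theta = 0, every prediction is 0.5,
# so the cost is exactly log(2) regardless of the labels.
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
cost, grad = cost_and_grad(np.zeros(2), X, y)
print(cost)  # ~ 0.693
```

Feeding `cost_and_grad` to any gradient-based optimizer (e.g. `scipy.optimize.minimize`) plays the role that `fminunc` does in the Octave version.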

If you’ve taken this course and found some of the assignments hard to complete, I think it can’t hurt to check online how a particular function was implemented. If you end up copying the entire code, it’s probably your loss in the long run. But then John Maynard Keynes once said, ‘*In the long run we are all dead*’. Yeah, and we wonder why people call economics the dismal science!

Most people disregard Coursera’s feeble attempt at reining in plagiarism by creating an **Honor Code**, precisely because this so-called code of conduct can be easily circumvented. I don’t mind posting solutions to a course’s programming assignments because GitHub is full to the brim with such content. Plus, it’s always good to read others’ code even if you implemented a function correctly yourself; it helps you understand the different ways of tackling a given programming problem.

- ex1
- ex2
- ex3
- ex4
- ex5
- ex6
- ex7
- ex8

Enjoy!

# Teach Yourself Machine Learning the Hard Way!

This formula is kick-ass!

It has been 3 years since I steered my interests toward machine learning. I had just graduated from college with a Bachelor of Engineering in Electronics and Communication Engineering, which is another way of saying that I:

- was a toddler in programming,
- had little to no knowledge of algorithms,
- had studied engineering math, but it was rusty,
- had no knowledge of modern optimization, and
- had zero knowledge of statistical inference.

I think most of this is true for many engineering graduates (especially in India!), unless you studied mathematics and computing as an undergrad.

Lucky for me, I had a great mentor and a lot of online material on these topics. This post lists many such materials that I found useful while I was learning it the hard way!

All the courses that I’m listing below have homework assignments. Make sure you work through each one of them.

**1. Learn Python**

If you are new to programming…


# Scatter Plot Bug Fix in Dato’s GraphLab Create ML Package in Python

I have been using Dato’s GraphLab Create for Coursera’s new Machine Learning Specialization, which uses Python. If, like me, you’ve been having **trouble obtaining scatter plots on your canvas in GraphLab Create** despite running the following code:

```python
graphlab.canvas.set_target('ipynb')
```

…then no worries: there is a quick fix. I’ve been deliberately lousy with the presentation, so sorry about that; chances are no one’s going to end up reading this anyway. I saw this problem being discussed on a Dato forum, so I decided to blog about the fix.

**EDIT:** Note that this problem affects *GraphLab Create v1.6* only. Dato released *v1.6.1* a few days after the problem was escalated on their forum, so a good option is simply to upgrade GraphLab Create.

**The problem you face should look something like this** (*click the images below to enlarge*):

**To solve the problem:**

Locate **sframe.py** under your home directory by searching for it from your desktop environment (this applies to **Windows** users too). I found it at the following path on my computer:

*~/anaconda/lib/python2.7/site-packages/graphlab/canvas/views*
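If searching from the desktop environment feels clumsy, a few lines of standard-library Python can hunt the file down from the Anaconda root instead. This is a generic sketch of my own; adjust the starting directory to wherever your install lives:

```python
import os

def find_file(root, name):
    """Walk the directory tree under root and yield every path to `name`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            yield os.path.join(dirpath, name)

# e.g. search the Anaconda tree for GraphLab's sframe.py
for path in find_file(os.path.expanduser("~/anaconda"), "sframe.py"):
    print(path)
```
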

The file **sframe.py** should look like this:

Then replace the code in **lines 255-227** of the opened **.py** file with the code **highlighted below**:

This should take care of the problem for good.

**Now you have your desired result:**