How to become a Data Scientist in 6 months

Disclaimer: I’m not a data scientist yet. That’s still work in progress, but I’d recommend this excellent talk given by  Tetiana Ivanova to put an enthusiast’s data science journey in perspective.

Advertisements

Google’s New Deep Learning MOOC Using TensorFlow

Deep learning became a hot topic in machine learning in the last 3-4 years (see inset below) and recently, Google released TensorFlow (a Python based deep learning toolkit) as an open source project to bring deep learning to everyone.

deep_learning_google_trends
Interest in the Google search term Deep Learning over time

If you have wanted to get your hands dirty with TensorFlow or needed more direction with that, here’s some good news – Google is offering an open MOOC on deep learning methods using TensorFlow here. This course has been developed with Vincent Vanhoucke, Principal Scientist at Google, and technical lead in the Google Brain team. However, this is an intermediate to advanced level course and assumes you have taken a first course in machine learning, or that you are at least familiar with supervised learning methods.

Google’s overall goal in designing this course is to provide the machine learning enthusiast a rapid and direct path to solving real and interesting problems with deep learning techniques.

What is Deep Learning?

Course Overview

Statistical Learning – 2016

On January 12, 2016, Stanford University professors Trevor Hastie and Rob Tibshirani will offer the 3rd iteration of Statistical Learning, a MOOC which first began in January 2014, and has become quite a popular course among data scientists. It is a great place to learn statistical learning (machine learning) methods using the R programming language. For a quick course on R, check this out – Introduction to R Programming

Slides and videos for Statistical Learning MOOC by Hastie and Tibshirani available separately here. Slides and video tutorials related to this book by Abass Al Sharif can be downloaded here.

The course covers the following book which is available for free as a PDF copy.

Logistics and Effort:

statLearnEffort

Rough Outline of Schedule (based on last year’s course offering):

Week 1: Introduction and Overview of Statistical Learning (Chapters 1-2)
Week 2: Linear Regression (Chapter 3)
Week 3: Classification (Chapter 4)
Week 4: Resampling Methods (Chapter 5)
Week 5: Linear Model Selection and Regularization (Chapter 6)
Week 6: Moving Beyond Linearity (Chapter 7)
Week 7: Tree-based Methods (Chapter 8)
Week 8: Support Vector Machines (Chapter 9)
Week 9: Unsupervised Learning (Chapter 10)

Prerequisites: First courses in statistics, linear algebra, and computing.

 

Supplementary Material to Andrew Ng’s Machine Learning MOOC

Although the lecture videos and lecture notes from Andrew Ng‘s Coursera MOOC are sufficient for the online version of the course, if you’re interested in more mathematical stuff or want to be challenged further, you can go through the following notes and problem sets from CS 229, a 10-week course that he teaches at Stanford (which also happens to be the most enrolled course on campus). It’s not hard to end up with a 100% score on his MOOC which is obviously a (much) watered down version of the course he teaches at Stanford, at least in terms of difficulty. If you don’t believe me, just have a go at the problem sets from the links below.

Lecture Notes

Section Notes

Handouts and Problem Sets

Solutions to Machine Learning Programming Assignments

This post contains links to a bunch of code that I have written to complete Andrew Ng’s famous machine learning course which includes several interesting machine learning problems that needed to be solved using the Octave / Matlab programming language. I’m not sure I’d ever be programming in Octave after this course, but learning Octave just so that I could complete this course seemed worth the time and effort. I would usually work on the programming assignments on Sundays and spend several hours coding in Octave, telling myself that I would later replicate the exercises in Python.

If you’ve taken this course and found some of the assignments hard to complete, I think it might not hurt to go check online on how a particular function was implemented. If you end up copying the entire code, it’s probably your loss in the long run. But then John Maynard Keynes once said, ‘In the long run we are all dead‘. Yeah, and we wonder why people call Economics the dismal science!

Most people disregard Coursera’s feeble attempt at reigning in plagiarism by creating an Honor Code, precisely because this so-called code-of-conduct can be easily circumvented. I don’t mind posting solutions to a course’s programming assignments because GitHub is full to the brim with such content. Plus, it’s always good to read others’ code even if you implemented a function correctly. It helps understand the different ways of tackling a given programming problem.

ex1
ex2
ex3
ex4
ex5
ex6
ex7
ex8

Enjoy!

 

Teach Yourself Machine Learning the Hard Way!

This formula is kick-ass!

Darshan Hegde

It has been 3 years since I have steered my interests towards Machine Learning. I had just graduated from college with a Bachelor of Engineering in Electronics and Communication Engineering. Which is, other way of saying that I was:

  • a toddler in programming.
  • little / no knowledge of algorithms.
  • studied engineering math, but it was rusty.
  • no knowledge of modern optimization.
  • zero knowledge of statistical inference.

I think, most of it is true for many engineering graduates (especially, in India !). Unless, you studied mathematics and computing for undergrad.

Lucky for me, I had a great mentor and lot of online materials on these topics. This post will list many such materials I found useful, while I was learning it the hard way !

All the courses that I’m listing below have homework assignments. Make sure you work through each one of them.

1. Learn Python

If you are new to programming…

View original post 507 more words

Scatter Plot Bug Fix in Dato’s GraphLab Create ML Package in Python

I have been using Dato’s GraphLab Create for Coursera’s new Machine Learning Specialization that uses Python. Like me, if you’ve been facing trouble obtaining scatter plots on your canvas in GraphLab Create despite the following code:

graphlab.canvas.set_target('ipynb')

…then no worries, there is a quick fix. I’ve been deliberately lousy with the presentation, so sorry about that. Chances are that no one’s going to end up reading this anyway. I saw this problem being discussed on a Dato forum, so I decided to blog about the fix.

EDIT: Note that this problem is in GraphLab Create v1.6 only. They came up with v1.6.1 a few days after the problem was escalated on their forum, so a good option would be to upgrade GraphLab Create.

The problem you face should looks something like this (click images below to enlarge):Screenshot from 2015-09-25 14:10:20

To solve the problem:

Locate sframe.py from your home directory by searching for it from your desktop environment (applies to Windows users too). I found it in the following path on my computer:

~/anaconda/lib/python2.7/site-packages/graphlab/canvas/views

The file sframe.py should look like this:

Screenshot from 2015-09-25 15:49:44

Then replace the code in lines 255-227 of the opened .py file with the code highlighted below:

Screenshot from 2015-09-25 15:53:19

This should take care of the problem for good.

Now you have your desired result:

Screenshot from 2015-09-25 16:18:41