Getting Started

I have been searching for good MOOCs to get me started with R and Python programming languages. I’ve already begun the Johns Hopkins University Data Science Specialization on Coursera. It consists of 9 courses (including Data Scientist’s Toolbox, R programming, Getting and Cleaning Data, Exploratory Data Analysis, Reproducible Research, Statistical Inference, Regression Models, Practical Machine Learning and Developing Data Products), ending with a 7-week Capstone Project that I’m MOST excited about. I want to get there fast.

The Capstone would consist of :
  • Building a predictive data model for analyzing large textual data sets
  • Cleaning real-world data and perform complex regressions
  • Creating visualizations to communicate data analyses
  • Building a final data product in collaboration with SwiftKey, award-winning developer of leading keyboard apps for smartphones

I started with the R programming course where I found the programming assignments to be moderately difficult. They were good practice and also time-consuming for me since I haven’t yet gotten used to the R syntax, which is supposedly unintuitive. Anyway, I completed the course with distinction (90+ marks) scoring 95 on 100, losing 5 because I hadn’t familiarized myself with Git / GitHub. I did this course for a verified certificate, which cost me $29, and looks like this:

Coursera rprog 2015

I won’t be paying for any of the remaining courses though, but still will get a certificate of accomplishment for each course I pass. I have alredy begun with Getting and Cleaning Data and Data Scientist’s Toolbox.

I checked today, and it seems Andrew Ng’s Machine Learning course has gone open to all and is self-paced. A lot of people have gone on to participate in Kaggle competitions with what they learnt in his course, so I’d like to experience it — even though it’s taught with Octave / MATLAB. My very short term goal is to start participating in these competitions ASAP.

Kaggle Competitions

I will be learning the basics of Git this week and along with that, about reading from MySQL, HDF5, the web and APIs. I intend to start reading Trevor Hastie’s highly recommended book, Introduction to Statistical Learning.

ISL Cover 2


Meanwhile, I need to get started with Git and GitHub too, and I found a very useful blog by Kevin Markham and his short concise videos are great introductory material.

Incidentally, I was in a dilemma whether to start with Hastie’s material or Andrew Ng’s course first. This is what Kevin had to say

Hastie or Ng

The only reason I have reservations against Andrew Ng’s course is that its instruction isn’t in R or Python. Also, CTO and co-founder of Kaggle, Ben Hamner mentions here how useful R and Python are vis-à-vis Matlab / Octave.

Ben Hamner on Python R Matlab v2


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s