I have been searching for good MOOCs to get me started with R and Python programming languages. I’ve already begun the Johns Hopkins University Data Science Specialization on Coursera. It consists of 9 courses (including Data Scientist’s Toolbox, R programming, Getting and Cleaning Data, Exploratory Data Analysis, Reproducible Research, Statistical Inference, Regression Models, Practical Machine Learning and Developing Data Products), ending with a 7-week Capstone Project that I’m MOST excited about. I want to get there fast.
The Capstone would consist of :
- Building a predictive data model for analyzing large textual data sets
- Cleaning real-world data and perform complex regressions
- Creating visualizations to communicate data analyses
- Building a final data product in collaboration with SwiftKey, award-winning developer of leading keyboard apps for smartphones
I started with the R programming course where I found the programming assignments to be moderately difficult. They were good practice and also time-consuming for me since I haven’t yet gotten used to the R syntax, which is supposedly unintuitive. Anyway, I completed the course with distinction (90+ marks) scoring 95 on 100, losing 5 because I hadn’t familiarized myself with Git / GitHub. I did this course for a verified certificate, which cost me $29, and looks like this:
I won’t be paying for any of the remaining courses though, but still will get a certificate of accomplishment for each course I pass. I have alredy begun with Getting and Cleaning Data and Data Scientist’s Toolbox.
I checked today, and it seems Andrew Ng’s Machine Learning course has gone open to all and is self-paced. A lot of people have gone on to participate in Kaggle competitions with what they learnt in his course, so I’d like to experience it — even though it’s taught with Octave / MATLAB. My very short term goal is to start participating in these competitions ASAP.
I will be learning the basics of Git this week and along with that, about reading from MySQL, HDF5, the web and APIs. I intend to start reading Trevor Hastie’s highly recommended book, Introduction to Statistical Learning.
[DOWNLOAD LINK TO THE BOOK]
Meanwhile, I need to get started with Git and GitHub too, and I found a very useful blog by Kevin Markham and his short concise videos are great introductory material.
Incidentally, I was in a dilemma whether to start with Hastie’s material or Andrew Ng’s course first. This is what Kevin had to say
The only reason I have reservations against Andrew Ng’s course is that its instruction isn’t in R or Python. Also, CTO and co-founder of Kaggle, Ben Hamner mentions here how useful R and Python are vis-à-vis Matlab / Octave.