MITx: 6.008.1x Computational Probability and Inference

I got really interested in Computational Probability and Inference (6.008.1x) for the following reasons:

  1. I love probability and have solved countless problems on probability ever since I learned math
  2. …and yet I’ve never coded up probabilistic models!
  3. The assignments and project work for this course are to be implemented in Python!

You don’t need to have prior experience in either probability or inference, but you should be comfortable with basic Python programming and calculus.

The course covers:
– Basic discrete probability theory
– Graphical models as a data structure for representing probability distributions
– Algorithms for prediction and inference
– How to model real-world problems in terms of probabilistic inference

The course started on September 12, is 12 weeks long, and is structured as follows:

Week 1 (9/12 – 9/16): Introduction to probability and computation
A first look at basic discrete probability, how to interpret it, what probability spaces and random variables are, and how to code these up and do basic simulations and visualizations.

Week 2 (9/19 – 9/23): Incorporating observations
Incorporating observations using jointly distributed random variables and using events. Three classic probability puzzles are presented to help elucidate how to interpret probability: Simpson’s paradox, Monty Hall, boy or girl paradox.
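
Since Monty Hall is on the list, here is a minimal sketch of how one might simulate it in Python. The function names `play_monty_hall` and `win_rate`, the trial count, and the seed are my own choices for illustration and are not taken from the course material.

```python
import random

def play_monty_hall(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Fraction of simulated rounds won under a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play_monty_hall(switch, rng) for _ in range(trials))
    return wins / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```

With enough trials the two rates should settle near 1/3 for staying and 2/3 for switching, which is the classic answer to the puzzle.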

Week 3 (9/26 – 9/30): Introduction to inference, structure in distributions, and information measures
The product rule and inference with Bayes’ theorem. Independence: A structure in distributions. Measures of randomness: entropy and information divergence. Mutual information.
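
To make the Week 3 topics a little more concrete, here is a rough sketch of Bayes' theorem and entropy over a finite set of outcomes. The helper names `posterior` and `entropy` and the coin example are assumptions of mine, not something from the course.

```python
from math import log2

def posterior(prior, likelihood):
    """Bayes' theorem over a finite set of hypotheses.

    prior: dict hypothesis -> P(h)
    likelihood: dict hypothesis -> P(observation | h)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(observation)
    return {h: p / evidence for h, p in unnormalized.items()}

def entropy(dist):
    """Shannon entropy in bits of a dict outcome -> probability."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Made-up example: a coin is either fair or two-headed, and we observe heads.
prior = {"fair": 0.5, "two_headed": 0.5}
likelihood_heads = {"fair": 0.5, "two_headed": 1.0}
post = posterior(prior, likelihood_heads)
print(post)
print(entropy(prior), entropy(post))
```

Observing heads shifts belief towards the two-headed coin, and the entropy of the posterior comes out lower than that of the uniform prior.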

Week 4 (10/3 – 10/7): Expectations, and driving to infinity in modeling uncertainty
Expected values of random variables. Classic puzzle: the two envelope problem. Probability spaces and random variables that take on a countably infinite number of values and inference with these random variables.

Week 5 (10/10 – 10/14): Efficient representations of probability distributions on a computer
Introduction to undirected graphical models as a data structure for representing probability distributions and the benefits/drawbacks of these graphical models. Incorporating observations with graphical models.

Week 6 (10/17 – 10/21): Inference with graphical models, part I
Computing marginal distributions in undirected graphical models, including hidden Markov models.

Week 7 (10/24 – 10/28): Inference with graphical models, part II
Computing most probable configurations with graphical models including hidden Markov models.

Week 8 (10/31 – 11/4): Introduction to learning probability distributions
Learning an underlying unknown probability distribution from observations using maximum likelihood. Three examples: estimating the bias of a coin, the German tank problem, and email spam detection.
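
Two of these examples are small enough to sketch right away. The snippet below is my own illustration (function names and data are made up): the maximum likelihood estimate of a coin's bias is the observed fraction of heads, and for the German tank problem, assuming serial numbers are drawn uniformly without replacement from 1..N, the maximum likelihood estimate of N is the largest serial number seen.

```python
def coin_bias_mle(flips):
    """Maximum likelihood estimate of P(heads) from a list of 'H'/'T' flips."""
    return flips.count("H") / len(flips)

def german_tank_mle(serials):
    """Maximum likelihood estimate of N for serials drawn uniformly without
    replacement from 1..N: the likelihood is zero for N below the largest
    observed serial and decreases for every N above it."""
    return max(serials)

flips = list("HHTHTHHT")     # made-up data: 5 heads in 8 flips
serials = [19, 40, 42, 60]   # made-up data

print(coin_bias_mle(flips))
print(german_tank_mle(serials))
```

Note that the maximum likelihood answer for the tank problem can never exceed the true N, so it tends to underestimate; other estimators correct for that, but they are beyond this sketch.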

Week 9 (11/7 – 11/11): Parameter estimation in graphical models
Given the graph structure of an undirected graphical model, we examine how to estimate all the tables associated with the graphical model.

Week 10 (11/14 – 11/18): Model selection with information theory
Learning both the graph structure and the tables of an undirected graphical model with the help of information theory. Mutual information of random variables.
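
For a feel of what mutual information measures, here is a small sketch that computes it from a joint probability table. The function name `mutual_information` and the two toy tables are my own assumptions for illustration.

```python
from math import log2

def mutual_information(joint):
    """Mutual information I(X; Y) in bits.

    joint: dict mapping (x, y) pairs to probabilities that sum to 1.
    """
    p_x, p_y = {}, {}
    # Marginalize the joint table to get P(x) and P(y).
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(
        p * log2(p / (p_x[x] * p_y[y]))
        for (x, y), p in joint.items()
        if p > 0
    )

# Two independent fair bits versus two bits that are always equal.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
identical = {(0, 0): 0.5, (1, 1): 0.5}

mi_independent = mutual_information(independent)
mi_identical = mutual_information(identical)
print(mi_independent, mi_identical)
```

Independent variables share no information, while two identical fair bits share one full bit. That contrast is what makes mutual information a candidate score for deciding which variables to connect in a graph.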

Week 11 (11/21 – 11/25): Final project
Final project assigned

Week 12 (11/28 – 12/2): Final project


I’m SO taking this course. Hope this interests you as well!

How to become a Data Scientist in 6 months

Disclaimer: I’m not a data scientist yet. That’s still a work in progress, but I’d recommend this excellent talk given by Tetiana Ivanova to put an enthusiast’s data science journey in perspective.

MITx 15.071x (Analytics Edge) – 2016

I am currently auditing this course and have just completed its 2nd assignment. It’s probably one of the best courses out there for learning R beyond the syntax, with an objective in mind: doing analytics and running machine learning algorithms to derive insight from data. It differs from machine learning courses such as Andrew Ng’s in that it doesn’t focus on coding the algorithms; instead, it emphasizes applying them right away using the libraries that the R programming language already provides.

Take a look at the course logistics. And hey, they’ve got a Kaggle competition!


There’s still time to enroll and grab a certificate (or simply audit). The course is offered once a year. I met a bunch of people who did well at a data hackathon I had gone to recently, who had learned the ropes in data science thanks to Analytics Edge.

My First Data Science Hackathon

So after 8 months of playing around with R and Python and writing blog post after blog post, I finally found myself hacking away at a problem set on the 17th storey of the Hindustan Times building at Connaught Place. I had entered my first ever data science hackathon, conducted by Analytics Vidhya, a pioneer in analytics learning in India. Pizzas and Pepsi were on the house. Like any predictive analysis hackathon, this one accepted unlimited entries until submission time. It ran from 2pm to 4:30pm today: 2.5 hours, of which I wasted 1.5 trying to make my first submission, which hit submission error after submission error until the problem was finally fixed post lunch. That left me 1 hour to try my best. It wasn’t the best performance, but I thought of blogging this experience anyway, as a reminder of the work that awaits me. I want to be the one winning prize money at the end of the day.




Statistical Learning – 2016

On January 12, 2016, Stanford University professors Trevor Hastie and Rob Tibshirani will offer the 3rd iteration of Statistical Learning, a MOOC that first ran in January 2014 and has become quite popular among data scientists. It is a great place to learn statistical learning (machine learning) methods using the R programming language. For a quick course on R, check this out – Introduction to R Programming

Slides and videos for the Statistical Learning MOOC by Hastie and Tibshirani are available separately here. Slides and video tutorials by Abass Al Sharif related to the book can be downloaded here.

The course covers the following book which is available for free as a PDF copy.

Logistics and Effort:


Rough Outline of Schedule (based on last year’s course offering):

Week 1: Introduction and Overview of Statistical Learning (Chapters 1-2)
Week 2: Linear Regression (Chapter 3)
Week 3: Classification (Chapter 4)
Week 4: Resampling Methods (Chapter 5)
Week 5: Linear Model Selection and Regularization (Chapter 6)
Week 6: Moving Beyond Linearity (Chapter 7)
Week 7: Tree-based Methods (Chapter 8)
Week 8: Support Vector Machines (Chapter 9)
Week 9: Unsupervised Learning (Chapter 10)

Prerequisites: First courses in statistics, linear algebra, and computing.


MITx 6.00.2x Introduction to Computational Thinking and Data Science (Fall 2015)

MIT’s Fall 2015 iteration of 6.00.2x starts today. After an enriching learning experience with 6.00.1x, I have great expectations of this course. As the course website mildly puts it, 6.00.2x is an introduction to using computation to understand real-world phenomena. MIT OpenCourseWare (OCW) material mirroring what is covered in 6.00.1x and 6.00.2x can be found here.

The course follows this book by John Guttag (who happens to be one of the instructors for this course). However, purchasing the book isn’t a necessity for this course.

Introduction to Computation and Programming Using Python

One thing I loved about 6.00.1x was its dedicated Facebook group, which gave a community / classroom-peergroup feel to the course. 6.00.2x also has a Facebook group. Here’s a sneak peek:


The syllabus and schedule for this course are shown below. The course is spread out over 2 months, which include 7 weeks of lectures.

MITx 6.00.2x Fall 2015 Course Calendar

The prerequisites for this course are pretty much covered in this set of tutorial videos that have been created by one of the TAs for 6.00.1x. If you’ve not taken 6.00.1x in the past, you can go through these videos (running time < 1hr) to judge whether or not to go ahead with 6.00.2x.

So much for the update. Got work to do! 🙂

Funny Python

If a programming language is named after a sketch comedy troupe, one knows what to expect. Python IS a funny language with its own bag of surprises.

Monty Python’s Flying Circus

For instance, if you’ve just moved from a language such as C to Python and you’re missing curly braces (how can one not want whitespace!!), you might try this:

>>> from __future__ import braces
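
If you’d rather see the reply from inside a script than at the prompt, here is a small sketch that compiles the same statement and captures the error message. It assumes CPython; the variable name `message` and the `"<demo>"` filename are just placeholders of mine.

```python
# Compile the statement as source text so the SyntaxError can be caught
# at run time instead of stopping the script at parse time.
try:
    compile("from __future__ import braces", "<demo>", "exec")
except SyntaxError as err:
    message = err.msg
else:
    message = None  # would only happen if braces were ever accepted

print(message)  # on CPython: not a chance
```

So no, the braces are not coming.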


Or say, if you try importing this.

>>> import this

A sense of humour is required for proper interpretation

Or if you ever wanted to know why XKCD’s Cueball left Perl for Python, you should know that it was for gravity-defying stunts that he couldn’t perform anywhere else. Just import antigravity!

>>> import antigravity

You’re led to this webcomic in your browser.


So the upshot is that you can get tickled and trolled by Python every now and then, keeping in line with its rich tradition of doing so (check out video below).