This was a hackathon + workshop conducted by Analytics Vidhya in which I took part and made it to #1 on the leaderboard. The data set was straightforward and quite clean, needing only minor missing value treatment. This post might be useful for people who want a walk-through of the steps involved in data munging and developing machine-learned models.
The workshop ended with a basic hackathon. The data covered the age, education, working class, occupation, marital status and gender of individuals, and the task was to predict each individual's income bracket.
I’ve posted the data and my code and solutions in this GitHub repo. An IPython Notebook has also been shared.
I approached the problem first by attempting some feature engineering (beyond missing value treatment) on the data, and then ran a basic logistic classifier and a random forest classifier. However, it turned out that these models performed better without the feature engineering, which suggests the dataset was already quite clean and informative for this competition.
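For readers who want to see the shape of this step, here is a minimal sketch using scikit-learn. It is not the code from the repo: the data is synthetic, the column names and category values are made up for illustration, and the imputation strategies (median for numbers, most frequent value for categories) and the accuracy metric are my assumptions for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data: columns and values are hypothetical,
# not the actual hackathon data set.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 70, n).astype(float),
    "education": pd.Series(rng.choice(["HS", "Bachelors", "Masters"], n), dtype=object),
    "occupation": pd.Series(rng.choice(["Sales", "Tech", "Clerical"], n), dtype=object),
})

# Toy income-bracket target, built before any values are removed.
y = ((df["age"] > 40) | (df["education"] == "Masters")).astype(int)

# Knock out a few values to mimic the missing data that needs treatment.
df.loc[rng.choice(n, 20, replace=False), "occupation"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "age"] = np.nan

# Missing value treatment plus encoding, wrapped so it is refit per CV fold.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", categorical, ["education", "occupation"]),
])

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Compare both classifiers with 5-fold cross-validated accuracy.
scores = {}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    scores[name] = cross_val_score(pipe, df, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

Keeping the imputation inside the pipeline means each cross-validation fold learns its fill values from its own training split, so the comparison between the two models is not flattered by leakage.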
I later attempted gradient boosting with parameter tuning to maximize scores.
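As a rough illustration of what tuning a gradient boosting model can look like, here is a small sketch with scikit-learn's `GradientBoostingClassifier` and `GridSearchCV`. The data is generated with `make_classification`, and the parameter grid, the 3-fold setup and the accuracy metric are assumptions chosen to keep the example quick. They are not the settings from my notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data standing in for the real set.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical grid: a handful of values for the most influential knobs.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# Try every combination with 3-fold cross-validation on the training split.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
# GridSearchCV refits the best model on all training data by default,
# so it can be scored directly on the held-out split.
test_accuracy = search.score(X_test, y_test)

print("best parameters:", best_params)
print(f"cross-validated accuracy: {search.best_score_:.3f}")
print(f"held-out accuracy: {test_accuracy:.3f}")
```

A grid this small is only meant to show the mechanics. On a real leaderboard problem a wider grid or a randomized search is usually worth the extra compute.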