Binning and Categorization


The Binning Process


We decided to try a different type of model to see if it would give us better results. To start down this new path, we binned the salaries into ranges. We thought this would be an easier task for a model, since it would no longer be trying to predict an exact amount. But how many bins should we use?

We started by looking at our salary range and breaking it into equal chunks, which gave us nine bins. We noticed that the lower three bins held significantly more players than the last six, so we combined the last six bins into one so that every bin had a player count in the thousands. From there we added a column holding the bin code. For example, if you made less than $1 million, your salary code was 0.



Salary Bin               Number of Players in Bin   Binning Code
Less than $1 million     7437                       0
$1 to $5 million         5172                       1
$5 to $10 million        1248                       2
More than $10 million    1156                       3
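
The bin-code column described above could be built along these lines. This is only a sketch, not our exact code: the salaries are made-up values, the column names `salary` and `salary_code` are hypothetical, and putting a salary of exactly $1, $5, or $10 million into the higher bin is an assumption, since the table does not say which side the boundaries fall on.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries in dollars, used only to illustrate the binning.
players = pd.DataFrame(
    {"salary": [550_000, 1_000_000, 4_200_000, 7_500_000, 10_000_000, 22_000_000]}
)

# Bin edges taken from the table above. right=False makes every bin include
# its lower edge, so exactly $1 million lands in bin 1 (an assumption).
edges = [0, 1_000_000, 5_000_000, 10_000_000, np.inf]
players["salary_code"] = pd.cut(
    players["salary"], bins=edges, labels=[0, 1, 2, 3], right=False
).astype(int)

print(players)
# Count how many players fall into each bin code.
print(players["salary_code"].value_counts().sort_index())
```

Running `value_counts` on the real data is a quick way to check the result against the player counts in the table.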

A Slew of Classifiers Couldn't Help Us


Now we have a classification problem, so our algorithms will be classifiers, not regressors! We tried KMeans (strictly speaking a clustering algorithm rather than a classifier), KNN, RandomForest, ExtraTrees, and a Support Vector Machine (SVM). Upshot: no classifier achieved better than 50% accuracy. Again, we were disappointed with the results…
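
One way to put that 50% figure in context is a majority-class baseline: the accuracy you would get by always guessing the most common bin. The sketch below computes it from the player counts in the table. It assumes the data used for evaluation had the same mix of bins as the full table, which may not have been the case.

```python
# Player counts per bin code, copied from the table above.
bin_counts = {0: 7437, 1: 5172, 2: 1248, 3: 1156}

total = sum(bin_counts.values())

# The majority class is the bin code with the most players.
majority_code = max(bin_counts, key=bin_counts.get)

# Always predicting the majority class is right this fraction of the time.
baseline_accuracy = bin_counts[majority_code] / total

print(f"Players in total: {total}")
print(f"Most common bin code: {majority_code}")
print(f"Accuracy of always guessing that bin: {baseline_accuracy:.1%}")
```

Under that assumption, always guessing bin 0 is right a little under half the time, so a classifier stuck at or below 50% accuracy is doing about as well as that trivial guess.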