Categorization

The Binning Process

We decided to try a different type of model to see if it would give us better results. To start down this new path, we binned the salaries into ranges. We thought this would make for a better model, since it would no longer be trying to predict an exact amount. But how many bins should we use?

We started by looking at our range of salaries and breaking it into equal chunks, which gave us nine bins. One thing we noticed was that the lowest three bins held far more players than the other six. We decided to combine those six bins into one so that every bin held at least a thousand players. From there we added a column holding the bin code. For example, if you made less than $1 million, your salary code was 0.

Salary Bin	Number of Players in Bin	Binning Code
Less than $1 million	7437	0
$1 to $5 million	5172	1
$5 to $10 million	1248	2
More than $10 million	1156	3
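
The bin-code column described above can be sketched with pandas. This is a minimal illustration, not our original code: the `players` frame, the `salary` and `salary_code` column names, and the sample salaries are all hypothetical, and the choice to put a salary that sits exactly on an edge (say, exactly $1 million) into the higher bin is an assumption the table does not settle.

```python
import pandas as pd

# Hypothetical salaries in dollars, made up for illustration only.
players = pd.DataFrame(
    {"salary": [550_000, 1_000_000, 4_200_000, 5_000_000, 9_999_999, 25_000_000]}
)

# Bin edges in dollars, taken from the table above.
edges = [0, 1_000_000, 5_000_000, 10_000_000, float("inf")]
codes = [0, 1, 2, 3]

# right=False makes each bin include its lower edge and exclude its upper edge.
# That edge handling is an assumption, not something the table specifies.
players["salary_code"] = pd.cut(
    players["salary"], bins=edges, labels=codes, right=False
).astype(int)

print(players)
```

Passing `labels=codes` makes `pd.cut` return the integer code directly, so no separate mapping from range name to code is needed.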

A Slew of Classifiers Couldn't Help Us

Now we have a classification problem, so our algorithms will be classifiers, not regressors! We tried KMeans, KNN, RandomForest, ExtraTrees, and Support Vector Machine (SVM). Upshot: no classifier achieved more than 50% accuracy. Again, we were disappointed with the results…
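
For context on that 50% figure, it helps to know what a model scores if it ignores the features and always guesses the most common bin. The short calculation below uses only the player counts from the table above. It is a rough yardstick rather than part of our original analysis, and the share of each bin in a held-out test split could differ slightly from these overall counts.

```python
# Player counts per bin code, copied from the table above.
bin_counts = {0: 7437, 1: 5172, 2: 1248, 3: 1156}

total_players = sum(bin_counts.values())
majority_code = max(bin_counts, key=bin_counts.get)

# Accuracy of a "model" that always predicts the most common bin.
baseline_accuracy = bin_counts[majority_code] / total_players

print(f"Total players: {total_players}")
print(f"Always guessing code {majority_code} scores {baseline_accuracy:.1%}")
```

With these counts, always guessing code 0 is right about 49.5% of the time, so a classifier stuck just under 50% is doing about as well as that constant guess.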
