First, Linear Regression of Each Independent Variable vs. ADJ Salary
We ran each variable of interest, one at a time, through several regressors (LinearRegression(), GradientBoostingRegressor(), RandomForestRegressor(), and smf.ols()) to get some benchmark scores.
No surprises … no single variable achieved an R² higher than 0.17.
- We are not explaining much variance in our target.
- Our salary predictions on these models won't be very good.
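A minimal sketch of this kind of one-variable-at-a-time benchmark follows. It is not our actual notebook code: the data is synthetic, and the column names (`HR`, `RBI`, `OBP`, `adj_salary`) and the helper `benchmark_single_features` are hypothetical stand-ins. It covers only the three scikit-learn regressors and scores each on a held-out split; smf.ols() could be benchmarked the same way.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: column names and values are hypothetical.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "HR": rng.poisson(15, n),
    "RBI": rng.poisson(60, n),
    "OBP": rng.normal(0.33, 0.03, n),
})
df["adj_salary"] = 50_000 * df["HR"] + rng.normal(0, 2_000_000, n)

models = {
    "LinearRegression": LinearRegression(),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
}


def benchmark_single_features(data, target, models):
    """Return held-out R^2 scores, one row per (feature, model) pair."""
    rows = []
    y = data[target]
    for feature in data.columns.drop(target):
        X = data[[feature]]  # a single-column design matrix
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.25, random_state=0
        )
        for name, model in models.items():
            model.fit(X_train, y_train)  # refits the model for this feature
            rows.append({
                "feature": feature,
                "model": name,
                "r2": model.score(X_test, y_test),  # R^2 on the test split
            })
    return pd.DataFrame(rows)


scores = benchmark_single_features(df, "adj_salary", models)
print(scores.pivot(index="feature", columns="model", values="r2").round(3))
```

Scoring on a held-out split rather than the training data matters most for the tree-based models, which can look far better in-sample than they really are.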
Next, Attempt a Multivariate Linear Regression with All of the Independent Variables
No dice … explained variance was only around 0.20. At this point we were getting discouraged … maybe predicting salary from offensive performance stats isn't really possible.
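For reference, a multivariate fit of this kind can be sketched roughly as below. This is an illustration on synthetic data, not our actual model: the feature names and the use of 5-fold cross-validation are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: feature names and values are hypothetical.
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "HR": rng.poisson(15, n),
    "RBI": rng.poisson(60, n),
    "OBP": rng.normal(0.33, 0.03, n),
})
y = 40_000 * X["HR"] + 10_000 * X["RBI"] + rng.normal(0, 2_000_000, n)

# Fit on all features at once and score R^2 on each of 5 held-out folds.
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(f"mean R^2 across folds: {cv_r2.mean():.3f}")
```

Averaging R² over folds gives a steadier estimate of explained variance than a single train/test split.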
So we sought advice from Dom and the TAs. TA Colin suggested that we bin the ADJ Salary target and turn the task into a classification problem, which opens the door to a whole slew of different machine learning algorithms!
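The binning idea can be sketched as follows. The specifics here are assumptions for illustration only: three equal-frequency bins via `pd.qcut`, a `RandomForestClassifier`, and synthetic data with hypothetical column names. The actual bin count and classifier are choices to experiment with.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: column names and values are hypothetical.
rng = np.random.default_rng(2)
n = 300
X = pd.DataFrame({
    "HR": rng.poisson(15, n),
    "RBI": rng.poisson(60, n),
    "OBP": rng.normal(0.33, 0.03, n),
})
adj_salary = 50_000 * X["HR"] + rng.normal(0, 2_000_000, n)

# Bin the continuous target into 3 equal-frequency classes (0 = lowest).
salary_bin = pd.qcut(adj_salary, q=3, labels=False)

# Stratify so each salary bin is represented proportionally in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, salary_bin, test_size=0.25, stratify=salary_bin, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {accuracy:.3f} (chance is about 0.333 with 3 balanced bins)")
```

With equal-frequency bins the classes are balanced, so plain accuracy is a fair first metric and chance level is simply one over the number of bins.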
We can do better!