Six weeks ago I wrote about a college basketball projection "system" that I had started working on, which uses a team's previous year's KenPom rating along with how many minutes they had returning to project how strong they'd be in the following year.
As with any simple equation that tries to predict something extremely complicated, there were many flaws with this, probably the most obvious being that it completely ignored incoming recruits. In an effort to change that, I went through the Scout archives and added the number of four- and five-star recruits each team had each year to the data set. The results looked like this:
Definitely promising; 5* recruits are worth a little less than twice as much as four-stars, and both of those variables are statistically significant. Something bothered me about this though: if a team returned most of their minutes and also brought in some really good recruits, the regression would consistently overrate them. The reason for this seemed simple enough: there are only so many minutes to go around, so if a team has a lot of starters returning, the freshmen are going to have less of an impact.
To fix that I created a rating from a formula that considered RetMin%: (five-star * 2 + four-star * 1) * (1- RetMin%). I then had a burst of creativity, called that rating "RECRUIT", and included it in a new regression:
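For anyone who wants to play with the formula, here is a minimal Python sketch of it. The function name `recruit_rating` and its argument names are mine, purely for illustration, and I'm assuming RetMin% is expressed as a fraction between 0 and 1 (so 90% returning minutes is 0.9). The example teams are hypothetical, not rows from the data set.

```python
def recruit_rating(five_star: int, four_star: int, ret_min_pct: float) -> float:
    """Compute (five-star * 2 + four-star * 1) * (1 - RetMin%).

    Assumption: ret_min_pct is a fraction in [0, 1], not a 0-100 percentage.
    """
    if not 0.0 <= ret_min_pct <= 1.0:
        raise ValueError("ret_min_pct must be a fraction between 0 and 1")
    # Five-stars count double; the whole thing is scaled down by
    # the share of minutes already claimed by returning players.
    return (five_star * 2 + four_star * 1) * (1.0 - ret_min_pct)


# Hypothetical teams with the same recruiting class (two 5*, one 4*):
half_returning = recruit_rating(2, 1, 0.5)   # half the minutes are open
most_returning = recruit_rating(2, 1, 0.9)   # only 10% of minutes are open
print(half_returning, round(most_returning, 3))
```

The same class counts for five times as much on the team with half its minutes open as on the team returning 90%, which is exactly the effect the formula is meant to capture.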
I couldn't have asked for much better results than that, with "RECRUIT" fitting into the regression perfectly and even bumping the adjusted R-squared up by about .02.
I would like to continue improving on this. Two realistic additions have come to mind so far. The first is adding some combination of KenPom rating in year N-2 and how many minutes the team is returning from that year. I tried this with just the KP rating from two years prior, and it didn't improve things much, but I think it might be worthwhile if we knew how many of those players were coming back, and thus how relevant that rating is. This would also prevent Indiana from having an unreasonably good '09-'10 projection; Eric Gordon and D.J. White are not walking through that door.
The other thing I'd like to add eventually is RetPts%, which I think would make a small but meaningful difference. Just for fun, here are the 10 best '09-'10 projections using the "RECRUIT" regression described above, only looking at conferences I've collected data for (ACC, B10, B12, SEC):
As I noted on Twitter, Tennessee really does return all but nine minutes. Hopefully our SEC preview will be up a week from Monday.





7 comments:
.996 on Kansas presented without comment? That's a pretty comical rating.
Can you add coaching X's and O's to the ranking? Bye bye Tennessee, Oklahoma...
I like the process here, but is there any evidence that KenPom ratings are more predictive than basic score differential ratings? In theory, they are more advanced and *should* be more accurate. But in practice (for in-season predictions) I have not found this to be true. I do not have a favorite rating system to promote, but I think there are some issues with KenPom that make it not very beneficial for predictions. Would like to see some kind of comparison to other rating systems, if somehow that became feasible.
".996 on Kansas presented without comment? That's a pretty comical rating."
I mean, this isn't much of a comment itself. Kansas has already been discussed in the Big 12 rating; I'm not sure what else you want me to say. They are returning basically everything to a .951 team and are adding two 5* recruits and a 4*. They are going to do very well in a simple rating system like this, and they could have easily come out over 1.000. There is a reason we're manually adjusting these ourselves.
"I like the process here, but is there any evidence that KenPom ratings are more predictive that basic score differential ratings?"
I think the more relevant question would be whether there's any evidence that any other ratings system is substantially better than KenPom, since the KP ratings are so readily available. As far as I know the answer is "no", but I'm certainly open to arguments to the contrary.
Any plans to adjust for a basketball team that's involved in an on-going turf war with the football team?
I'm interested in how adding a coach variable would change the regression results. As a test, I went back and calculated projections for past Illinois seasons and found that Illinois beat its projection every single year from 2005-09. Based on those results, I wonder if a team beating its projection the previous year is correlated with beating the projection the next year, and if this could be used as a measure of coaching in a regression.