Tuesday, October 2, 2007

Evaluating April MLB Predictions (Again)

This was a lot of fun, and there's more to look at, so I thought it deserved another post.

First, the final standings. The methodology is slightly different this time: I'm using RMSE (Root Mean Squared Error) rather than plain average error. RMSE penalizes large misses more heavily (sorry, Buster), and it's the more widely used measure.
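
To see why RMSE punishes big misses, here's a minimal Python sketch comparing it to plain average (absolute) error on two made-up sets of predictions. The function names and win totals are illustrative only, not taken from the actual standings.

```python
import math

def mean_absolute_error(predicted, actual):
    """Average size of the miss, ignoring direction."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error: square each miss, average the squares, take the root."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical win totals for four teams (made-up numbers, not 2007 results).
actual = [90, 85, 80, 75]
steady = [94, 81, 84, 71]        # misses every team by exactly 4
one_big_miss = [90, 85, 80, 91]  # perfect on three teams, off by 16 on the fourth

print(mean_absolute_error(steady, actual), rmse(steady, actual))              # 4.0 4.0
print(mean_absolute_error(one_big_miss, actual), rmse(one_big_miss, actual))  # 4.0 8.0
```

Both sets of picks have the same average error, but the one with a single 16-win miss ends up with twice the RMSE.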

As we'll see, Neyer comes out on top pretty much any way you measure it. Notice that the top five are all based, at least in part, on statistical systems. Be sure to remember that when you're watching Baseball Tonight next March. Your time is probably better spent subscribing to BP.

Just for fun, I also looked at how many of the Vegas over/unders each person called correctly. A lot of this is luck: if the line is 83.5 and your prediction is 84, you get the same credit whether that team wins 84 or 104. (BTW, Silver is Nate Silver from BP; he's the "PECOTA guy", among other things. As I understand it, he took the PECOTA predictions and made adjustments where he saw fit.)

Again, pretty meaningless, but Caple and Law both jump way up. Gammons is last, but I wouldn't be too worried about that: I'm pretty confident he didn't place any wagers.
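
A minimal Python sketch of the over/under tally, assuming a pick counts as "over" whenever the predicted win total is above the line. The helper names are hypothetical; the 83.5/84 numbers are the example from the paragraph above.

```python
def picked_correctly(line, prediction, actual_wins):
    """True if the prediction and the actual result fall on the same side of the line.

    Assumes half-win lines such as 83.5, so nothing lands exactly on the line (no pushes).
    """
    return (prediction > line) == (actual_wins > line)

def count_correct(lines, predictions, actual_wins):
    """Number of over/unders called correctly across all teams."""
    return sum(
        picked_correctly(line, pred, wins)
        for line, pred, wins in zip(lines, predictions, actual_wins)
    )

# A line of 83.5 and a prediction of 84 wins.
print(picked_correctly(83.5, 84, 84))   # True
print(picked_correctly(83.5, 84, 104))  # True: same credit as 84
print(picked_correctly(83.5, 84, 83))   # False
```

This is exactly why the tally is mostly luck: being off by one win or by twenty on the correct side of the line scores the same.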

Now, the following is very interesting. Sky beat me to this (although I believe he used straight differences rather than RMSE), but if we really want to know how good these predictions were, we should compare them to each team's Pythag record.

Basically, Pythag record is how many games a team should have won, based on how many runs it scored and allowed. It's better at predicting future performance than actual record, and thus is a better indicator of team strength. It's great for this exercise, so let's take a look.
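
For reference, a minimal Python sketch of the Pythagorean expectation. The post doesn't say which exponent was used; this assumes Bill James' original exponent of 2 (many analysts use about 1.83 instead), and the 800/700 run totals are a hypothetical team.

```python
def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (Pythagorean expectation).

    exponent=2 is Bill James' original formula; ~1.83 is a common refinement.
    The exponent here is an assumption, not necessarily the one behind the post's table.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)

# A hypothetical team that scored 800 runs and allowed 700.
print(round(pythag_wins(800, 700), 1))  # 91.8
print(pythag_wins(700, 700))            # 81.0
```

Scoring the predictions against Pythag rather than actual record just means using these expected win totals as the target when computing RMSE.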

These numbers are noticeably lower. This makes sense: there is (I think) less variation in Pythag record than in actual record. I found this table very interesting. Sports Interaction jumps up considerably, which doesn't surprise me. Think about how much money these guys have at stake. It's their job to post a number that will entice people to bet equally on the over and the under. I would hope they're good at it; it's a lot different from Steve Phillips handing in a list of completely arbitrary numbers to some ESPN editor.

Beyond that, the list is once again dominated by the "numbers guys". I am somewhat surprised that Olney did so poorly.

The obvious next step is to see who got lucky, and whose picks were better than they originally appeared. Since the average Pythag miss was 1.39 wins smaller than the average actual miss, I have adjusted for that in the final column. A negative "Adj Diff" means you got lucky.
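
The post doesn't spell out the exact formula behind "Adj Diff", so this Python sketch is one plausible reading consistent with its sign convention: compare each predictor's miss against actual wins with their Pythag miss plus the 1.39-win average gap. The function name and the error values in the examples are hypothetical.

```python
AVG_GAP = 1.39  # average actual miss minus average Pythag miss, from the post

def adjusted_diff(actual_miss, pythag_miss, avg_gap=AVG_GAP):
    """Actual-record miss minus the miss we'd expect from the Pythag miss plus the average gap.

    Negative: the picks matched the actual standings better than expected (lucky).
    Positive: the actual standings were less kind than expected (unlucky).
    """
    return actual_miss - (pythag_miss + avg_gap)

# Hypothetical predictors (made-up error values, not the post's table).
print(round(adjusted_diff(9.0, 8.2), 2))   # -0.59 -> lucky
print(round(adjusted_diff(10.0, 7.5), 2))  # 1.11 -> unlucky
```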

Phillips' picks were poor to begin with, and he got unlucky on top of that (which is only fair, considering his recent good luck in other areas). Olney had the worst picks according to Pythag by a pretty wide margin, but had luck on his side, which allowed him to almost catch up to Phillips.

The numerical systems did better than the others on every metric I've looked at, but they were also on the lucky side. I'd be interested to see whether this holds in previous years too (which would indicate that it's not actually luck). The PECOTA predictions for each year since 2003 are readily available, so when I have time I'll look into those.
