Monday, September 24, 2007

Evaluating April MLB Predictions

Everybody makes baseball predictions in late March/early April. A lot of people just predict who will win each division, and who will advance to the World Series. Anyone can do this- you really only have to have a general knowledge of the top teams.

There are also people who predict how many wins each of the 30 teams will have. There are various complications with this (Jayson Stark's predictions have the average team winning 83.6 games, which is quite unlikely), but the thing about this is you actually have to know what you are doing. People make these predictions differently- some rely strictly on numbers, others on "feel".

I found 13 sets of these predictions- 10 from ESPN (Gammons, Stark, Crasnick, Olney, Neyer, Kurkjian, Phillips, Law, Caple, Karabell), two from BP (PECOTA and BP Hit List), and also the over/unders from SportsInteraction.com (via SoSH). I thought I'd take a look at some of the best and worst individual predictions, as well as whose overall predictions were most accurate.

(Note: These lists aren't just based on who was the closest- I also factored in how far off the other predictions were. So predicting a team within two games when the average prediction was eight games off would rank higher than predicting a team exactly when the average was just three games off.)
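The post never spells out the formula behind these lists, so the following is only a sketch of one scoring rule that behaves the way the note describes. The function name `relative_score` and the "field error minus own error" rule are assumptions for illustration, not the author's actual method.

```python
from statistics import mean


def relative_score(own_pred, other_preds, actual):
    """Hypothetical score: how much closer this pick was than the rest of the field.

    Positive means the pick beat the average of the other predictions.
    """
    own_error = abs(own_pred - actual)
    field_error = mean(abs(p - actual) for p in other_preds)
    return field_error - own_error


# Made-up numbers for a team that finishes with 80 wins.
# Case 1: within two games while everyone else is eight games off.
within_two = relative_score(82, [72, 88, 72, 88], 80)
# Case 2: exactly right while everyone else is three games off.
exact = relative_score(80, [77, 83, 77, 83], 80)

print(within_two, exact)
```

Under this rule the first pick scores higher than the second, which matches the ordering described in the note.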

The Best

1. PECOTA, Chicago White Sox
Predicted wins: 72
On pace for: 71.0

The over/under for the White Sox was 89.5, and the ESPN analysts' average prediction was 84.6. Chicago won 90 games in '06 after winning 99 in 2005. Much was made of PECOTA's pessimism, but this turned out to be, pretty easily, the best prediction of the year.

2. Jayson Stark, Seattle Mariners
Predicted wins: 85
On pace for: 86.6

The average for everyone else was 76.5, and PECOTA had them winning only 73 games. They only won 78 games in '06, while finishing last in the AL West. ESPN's preview had JJ Putz under "Bust", as they were worried about his elbow pains. I feel like that turned out OK for him.

3. Steve Phillips, Minnesota Twins
Predicted wins: 78
On pace for: 78.6

Steve Phillips: Not Smart! Well, for now at least. Nobody else at ESPN had the Twins winning fewer than 83 games, and PECOTA pegged them at 90. On the ESPN Message Boards, Twinsdude08 remarked that, "The Twins just have too much talent to not win the division." I don't know how Phillips came to 78 wins, but, as we'll see later, this accuracy certainly isn't a trend.

4. Rob Neyer, Washington Nationals
Predicted wins: 69
On pace for: 71.1

People (especially Buster Olney) thought the Nationals were going to be really bad. The second most optimistic ESPN prediction was 64 wins; six had them losing over 100 games. Neyer, who always refers to his predictions as "running the numbers", was more realistic- it's hard to lose 100 games in the NL, since all the other teams are really bad too.

5. Peter Gammons, Colorado Rockies
Predicted wins: 84
On pace for: 87.2

The Rockies have far exceeded all expectations- their over/under at SportsInteraction was 74.5 wins, and nobody else had them winning even 80 games. Even Gammons didn't see this coming, but everybody else was so far off that his prediction makes the list.

The Rest (Predictor, Team, Prediction, Actual Pace)
6. Phillips, Orioles, 70, 69.4
7. Karabell, Pirates, 69, 68.9
8. Caple, Marlins, 68, 68.6
9. PECOTA, Oakland, 80, 77.3
10. Stark, Red Sox, 96, 96

Now for the fun part...

The Worst

1. Buster Olney, Washington Nationals
Predicted wins: 49
On pace for: 71.7

Pretty much everyone was a little off on the Nats, but this one stands out. Sure, things didn't look good back in March, but 113 losses? No NL team lost more than 96 games in '05 or '06- it would be quite amazing if someone was actually that bad. Olney is a smart guy, but I'm not sure where he got 49 wins from.

2. Jim Caple, Kansas City Royals
Predicted wins: 54
On pace for: 70

I don't know, maybe people just think it's funny to pick teams to be amusingly bad. I kind of see Caple's reasoning here, as he predicted the other four AL Central teams to average 89 wins. But seriously, how did he see this playing out? Did he figure they would all go like 16-3 against the Royals? That's the only way they could average 89 wins, since they have to play each other so many times.

3. Steve Phillips, Boston Red Sox
Predicted wins: 82
On pace for: 96

This only came out third in my little formula, but that may be generous. Boston was a mess in '06, and they still managed 86 wins. Nobody else had the Red Sox winning fewer than 90 games. Between this and repeatedly predicting in August that the Yankees would miss the playoffs, I feel like Phillips just makes predictions for the shock value of them.

4. Keith Law, Seattle Mariners
Predicted wins: 65
On pace for: 86.6

Law and Stark didn't quite see eye to eye on this one, as their predictions were 20 wins apart, the highest such margin. Seattle has surprised people, but their over/under was 79.5 wins; there really wasn't any reason to think they would approach 100 losses.

The Rest (Predictor, Team, Prediction, Actual Pace)

5. Phillips, Diamondbacks, 78, 90.8
6. Phillips, White Sox, 92, 71
7. PECOTA, Devil Rays, 78, 66.4
8. Karabell, Cubs, 75, 86
9. Stark, Reds, 85, 74.2
10. Karabell, Astros, 88, 70.5

Now, let's look at whose overall predictions were the most accurate. The table on the right is ranked by how close people were on average, across all 30 predictions.

The top three are all predictions based on numbers. PECOTA is 100% quantitative, and both Neyer and the Hit List rely heavily on numerical predictions.

Those are the only three that did better than Vegas. Neyer did really well- his picks are 19-10-1 against the over/unders so far. Even more impressive, of his seven predictions that had large discrepancies with Sports Interaction, he was right on six of them.

On the other end of the spectrum is, not surprisingly, Mr. Phillips. If you watch Baseball Tonight and SportsCenter (or are a Mets fan...) this probably doesn't come as much of a surprise. Luckily, Steve Phillips isn't paid a lot of money to analyze baseball for a living- if he was, his incompetence would be pretty embarrassing.

Pictures: PECOTA, Phillips, Olney, Law.

15 comments:

Eugene said...

Great blog entry.

Have you considered looking at BP's Predictatron and taking the average to get a "wisdom of the crowds"? Or comparing each of these pundits to Predictatron scoring?

Vegas Watch said...

I had actually forgotten about BP's Predictatron contest- that probably would have been a good thing to incorporate.

As for the "wisdom of crowds", I did use an average of the 13 predictions that I looked at to see how close the general consensus was, which essentially accomplishes the same thing.

Sky said...

Absolutely fantastic article -- especially the last bit that looks at the overall accuracy of projections. Nice to actually hold people accountable. I can't wait until the BPro "I told you so" article about the White Sox.

I assume you used absolute error for the list... Do things change at all using an error calculation that penalizes large misses more?

I'd love to have access to your raw data if you don't mind. Thanks.

Anonymous said...

Diamond-Mind's projections: http://www.diamond-mind.com/articles/proj2007.htm

Vegas Watch said...

Thanks, Anon. Diamond Mind comes in with an average error of 5.48, which would put them fourth.

Sky- if you want to send me an e-mail I'll send over the spreadsheet.

Anonymous said...

How do you calculate the overall score?

These are my pre-season predictions: Neyer can verify them. I am curious as to how I rank.

NYY 93-69
BOS 90-72
TOR 84-78
BAL 78-84
TB 74-88
CLE 89-73
MIN 84-78
DET 79-83
CWS 77-85
KC 71-91
OAK 86-76
ANA 84-78
TEX 80-82
SEA 77-85
ATL 85-77
PHI 85-77
NYM 84-78
FLO 79-83
WAS 69-93
STL 85-77
MIL 84-78
CHC 81-81
HOU 78-84
PIT 76-86
CIN 71-91
SD 89-73
ARI 84-78
LA 82-80
COL 77-85
SF 73-89

MGL

Vegas Watch said...

I just took how far off they were on average.

For the record, MGL came in at 5.4, putting him in fourth.
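"How far off they were on average" is a mean absolute error. Here is a minimal sketch of that calculation, alongside the root-mean-square error Sky asked about, which penalizes large misses more. It uses MGL's pre-season picks above and the actual win totals from the fourth column of the table MGL posts later in this thread. That pairing is an assumption made for illustration: the 5.4 figure quoted here was computed against win paces at the time of the post, so the result may differ slightly.

```python
from math import sqrt

# team: (MGL pre-season wins, actual wins), both as quoted in this thread
RESULTS = {
    "NYY": (93, 93), "BOS": (90, 96), "TOR": (84, 83), "BAL": (78, 69), "TB": (74, 67),
    "CLE": (89, 97), "MIN": (84, 79), "DET": (79, 88), "CWS": (77, 70), "KC": (71, 70),
    "OAK": (86, 77), "ANA": (84, 94), "TEX": (80, 76), "SEA": (77, 86),
    "ATL": (85, 86), "PHI": (85, 88), "NYM": (84, 90), "FLO": (79, 69), "WAS": (69, 73),
    "STL": (85, 75), "MIL": (84, 84), "CHC": (81, 86), "HOU": (78, 71), "PIT": (76, 69),
    "CIN": (71, 73),
    "SD": (89, 89), "ARI": (84, 91), "LA": (82, 83), "COL": (77, 88), "SF": (73, 72),
}


def mean_abs_error(pairs):
    """Average miss in wins, treating a 2-game miss as twice as bad as a 1-game miss."""
    return sum(abs(pred - actual) for pred, actual in pairs) / len(pairs)


def root_mean_sq_error(pairs):
    """Squares each miss before averaging, so big misses count disproportionately."""
    return sqrt(sum((pred - actual) ** 2 for pred, actual in pairs) / len(pairs))


pairs = list(RESULTS.values())
mae = mean_abs_error(pairs)
rmse = root_mean_sq_error(pairs)
print(f"MAE {mae:.2f}, RMSE {rmse:.2f}")
```

With these inputs the mean absolute error comes out at about 5.3 wins, in line with the figures discussed in the comments.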

Anonymous said...

Actually, I mis-calculated. I think it is 5.3 or a solid tie for 3rd. I feel a lot better now.

I am writing an article, possibly for THT (Hardball Times) looking at how each team "should have done" had each forecaster known everyone's playing time. I am using my own Superlwts and pitcher projections to estimate each team's average "talent" (wp) per game and then "playing out a season" 100,000 times or so.

These estimates of w/l are obviously more indicative of how "good" a forecaster is, unless you want to say that estimate of playing time, roles, injuries, etc. should be part of a forecaster's skill set (which is a legitimate argument I suppose).

That argument aside, obviously you cannot hold against any forecaster the fact that, for example, Harden hardly pitches at all, or that the Cardinals lose Carpenter for the season.

Here are the preliminary numbers for how many games each team "should have won" given my own pre-season projections for each player, and given their actual playing time. I don't account for the fact that so-and-so may have played with an injury for all or part of a season, unless you want to say that that is somehow included in the projection in the first place, which it may or may not have been. For batters, I used hitting, fielding (UZR) and baserunning projections. For pitchers I tried to use their roles (in a crude way) to estimate the leverage they pitched in. For example, for most closers I "doubled" their IP since they tend to pitch in leverages that are around 2.0.

Anyway...

ARI 81
ATL 87
CHN 80
CIN 71
COL 81
FLO 78
HOU 78
LAN 82
MIL 85
NYN 83
PHI 82
PIT 78
SDN 89
SLN 82
SFN 77
WAS 72
ALA 83
BAL 75
BOS 89
CHA 76
CLE 91
DET 80
KCA 73
MIN 83
NYA 95
OAK 81
SEA 81
TBA 81
TEX 76
TOR 82

If you used those projections, you would be off by an average of 4.7, which would put you firmly in first place in the prediction contest herein. Of course that would be major cheating.

One useful thing about these kinds of numbers is that if you are going to even start talking about the job that a manager did with a team, at the very least you have to start with these numbers. IOW, these are the wins that each team would have with a monkey/computer (or at least an average manager) at the helm. I am not trying to say that any differences are due to the manager, but certainly if you want to even try and evaluate a manager based on their team's w/l record as compared to some kind of expectation with a "neutral" manager at the helm, you would have to use something like these numbers as a starting point.

For example, I heard on the radio the other day some commentator say something like, "You have to give the Nats manager a lot of credit as most people predicted the Nats would win no more than 60 games." Of course, what idiot predicted them to win only 60 games? That same idiot probably SHOULD give him credit for their 72 or so wins. However, as you can see from above, a more accurate prediction, given the players' playing time and roles, was actually 72 wins, exactly equal to their actual expected wins! If you are forced to give credit/discredit to a manager, it would have to be those in charge of the Mets, Angels, Marlins, and Tigers, for example.

MGL

Vegas Watch said...

Great stuff, MGL. After seeing the reaction to this post, I was going to write another one next week looking at both actual records, and Pythag/3rd Order records. That would also give us a better idea of how good these predictions actually were, as it accounts for teams that got lucky/unlucky and don't have final records indicative of their actual levels.

Anonymous said...

Another interesting question, to put the "accuracy scores" of the various forecasters in perspective is:

If you knew the exact true wp of each team for every game (for example, when Boston plays TB, Boston is a .553 team and TB is a .464 team), on the average, at the end of the season, how far off would you still be on your w/l predictions, solely based on random fluctuation?

The answer is:

Around 5 games. IOW, that is the average "accuracy score" for the perfect forecaster. So scores in the low to mid 5's are not that much worse than perfect. Then again, as you can see from the scores from guys like Phillips, even a rudimentary, but poor, idea of the relative strengths of the teams going into the season will give you a "score" that is not that much worse than the perfect forecaster.
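The "around 5 games" figure can be sanity-checked with a simple model. This is a sketch under stated assumptions, not MGL's actual calculation: it treats a single team as a true .500 club and every game as an independent coin flip, then computes exactly how far the final win total lands from 81 on average. The function name `average_miss` is made up for this example.

```python
from math import comb


def average_miss(games=162, p=0.5):
    """Mean absolute deviation of season wins from their expected value.

    Assumes each game is an independent win with probability p (a simplification).
    """
    expected = games * p
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k) * abs(k - expected)
        for k in range(games + 1)
    )


miss = average_miss()
print(f"A perfect forecaster of a .500 team misses by {miss:.2f} wins on average")
```

Under these assumptions the average miss lands very close to 5 wins, which is consistent with the claim above that scores in the low to mid 5's are not far from perfect.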

Another interesting thing to calculate would be how often a bad forecaster would beat a good one? IOW, what is the standard error of these forecasts? I don't have the time to do that now.

MGL

Anonymous said...

Here is one more interesting post:

These are 4 columns of wins for each team. One is my pre-season win totals. Two is my pre-season win totals by prorating each player projection by their actual playing time (and pitcher roles). Three is converting actual player performance and playing time into a win total (like a pythag record, but a 3rd order, or maybe 2nd order, record). Fourth is actual record. You can infer whatever you want about the players, team, manager, etc., from these.

Team, pre-season, pre-season adjusted for playing time, performance-based, actual
NYY 93 94 96 93
BOS 90 90 94 96
TOR 84 82 88 83
BAL 78 75 78 69
TB 74 81 68 67

CLE 89 91 90 97
MIN 84 83 80 79
DET 79 80 87 88
CWS 77 75 72 70
KC 71 73 70 70

OAK 86 81 90 77
ANA 84 83 83 94
TEX 80 76 75 76
SEA 77 81 76 86

ATL 85 87 87 86
PHI 85 83 85 88
NYM 84 84 95 90
FLO 79 78 74 69
WAS 69 72 64 73

STL 85 81 75 75
MIL 84 85 88 84
CHC 81 80 91 86
HOU 78 78 70 71
PIT 76 78 73 69
CIN 71 70 65 73

SD 89 89 93 89
ARI 84 81 80 91
LA 82 82 80 83
COL 77 80 91 88
SF 73 77 75 72

Interestingly, the only teams that benefited a lot (more than 4 expected wins) or were hurt a lot by either injuries or changes in roles (I used the BP depth charts and playing time estimates before the season started) were TB and OAK, respectively.

Teams that were helped or hurt 4 wins were TEX and STL hurt and SEA and SF helped. The rest were within 3 wins of their pre-season projection based on pre-season estimates of personnel, playing time, and roles, and actual personnel, playing time, roles, etc. That does not mean that most of the teams had the same or almost the same personnel, playing time, etc. as BP predicted.

The difference between the second and third columns is essentially the difference between how the players on the team were expected to perform based on their pre-season projections and how they actually performed. Maybe, just maybe, you can give the manager and coaches some credit if there is a substantial difference, good or bad. Then again, the projections are essentially based on the last 4 years' performance, so maybe you only want to give credit (or demerits) to the first or second year managers and coaches. Who knows.

Teams with the greatest differences between how they were expected to perform and how they did are:

A lot better than expected:

TOR, DET, OAK, METS, CUBS, COL

A lot worse than expected:

TB, STL, HOU, CIN

Finally, you can infer anything you want from the difference between the 3rd and 4th columns, which is essentially the difference between their expected number of wins based on their overall offensive and defensive performance and their actual record. Obviously a lot of the difference must be attributed to luck (there is around a 6.5 game random SD in 162 games, right?). Maybe some of it can be attributed to other things that the players and/or manager and coaches have control over. Again, who knows.
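The random SD mentioned above can be checked with the binomial formula. This is a minimal sketch that assumes every game is an independent trial with a fixed win probability, which real seasons only approximate; the function name `win_total_sd` is made up for this example.

```python
from math import sqrt


def win_total_sd(p, games=162):
    """SD of season wins if each game is an independent win with probability p."""
    return sqrt(games * p * (1 - p))


for p in (0.40, 0.50, 0.60):
    print(f"true win pct {p:.3f}: SD {win_total_sd(p):.2f} wins")
```

For a .500 team the formula gives sqrt(162 x 0.25), a little under 6.4 wins, and it shrinks only slightly for teams at .400 or .600, so "around 6.5" is a fair round number.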

Teams that have won a lot fewer games than they "should have" given their underlying performance, not in runs, but in actual component stats, like lwts for the batters, and component ERA for the pitchers:

BAL, OAK (they take the cake), FLO, and to some extent Mets and Cubs.

Teams that have overperformed are:

CLE, ANA, SEA, WAS, and CIN.

Finally, the only teams who both performed better than expected based on their players' projections AND won more games than expected based on their underlying performance were Boston, Detroit, and Philly.

The teams that performed worse than expected AND won fewer games than expected from their performance were:

TB, MIN, CWS, FLO, PIT, and SF.

If I absolutely had to give out best and worst manager awards based on these numbers (and I believe that evaluating managers is an impossible task and MOY awards are a bigger joke than GG awards), I would give the best to Leyland (DET) and the worst (by far) to Maddon (TB).

Sky said...

Here are three other "monkey predictions" to use as a baseline:

- last year's record
- regress last year's record some appropriate amount
- predict everyone at 81 wins

Vegas Watch said...

Good point, Sky- I was going to do that but never got around to it.

The average misses:

81 wins: 8.4
2006 record: 8.14
2006 Pythag: 6.91
Regressing 2006 Pythag: 6.76 (slightly better than Phillips)
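For anyone who wants to build the "regress last year's record" baseline themselves, here is a rough sketch. The 0.5 weight is a placeholder assumption, not the amount used for the numbers above (which regress 2006 Pythag rather than raw wins), and `regress_to_mean` is a made-up name. The 2006 win totals are the three quoted in the post.

```python
def regress_to_mean(prev_wins, weight=0.5, league_mean=81.0):
    """Pull last year's win total toward 81.

    weight=1.0 keeps last year's record as-is; weight=0.0 predicts 81 for everyone.
    The default of 0.5 is only a placeholder.
    """
    return league_mean + weight * (prev_wins - league_mean)


# 2006 win totals mentioned in the post
wins_2006 = {"White Sox": 90, "Mariners": 78, "Red Sox": 86}

baseline_2007 = {team: regress_to_mean(w) for team, w in wins_2006.items()}
for team, wins in baseline_2007.items():
    print(f"{team}: {wins:.1f}")
```

The two simpler monkey predictions fall out of the same function: a weight of 1.0 is "last year's record" and a weight of 0.0 is "everyone at 81 wins".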

Sky said...

Diamond-Mind comes in at 5.3

Anonymous said...

Another interesting tidbit:

If the top 4 forecasters other than Sports InterAction (the "Vegas Line") had to bet into the Vegas line, the results would be:

Neyer: 20-10
MGL 18-12
BP 17-13
Crasnick 17-13

For a total of 72-48, or a 60% win percentage. Even with a 30 cent line (you lose $1.15 or win $1.00), if you bet to win $100 per team, a 60% record over 30 teams (18-12) would show a profit of $420, or around a 14% return on your money!
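The betting arithmetic is easy to reproduce. This sketch assumes every bet risks $115 to win $100, which is one reading of the "30 cent line" described above; the function name `betting_profit` is made up for this example.

```python
def betting_profit(wins, losses, win_amount=100.0, risk_amount=115.0):
    """Net profit when each winning bet pays win_amount and each loser costs risk_amount."""
    return wins * win_amount - losses * risk_amount


# Records against the over/unders, as listed above
records = {"Neyer": (20, 10), "MGL": (18, 12), "BP": (17, 13), "Crasnick": (17, 13)}

profits = {name: betting_profit(w, l) for name, (w, l) in records.items()}
total = sum(profits.values())
average = total / len(profits)

for name, profit in profits.items():
    print(f"{name}: ${profit:.0f}")
print(f"Total ${total:.0f}, average ${average:.0f} per forecaster")
```

On these assumptions Neyer nets $850, MGL $420, and BP and Crasnick $205 each, for $1,680 in total. The $420 average per forecaster is the same as an 18-12 (60%) record, and it works out to 14% of the $3,000 each set of 30 bets was trying to win.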
