I'd like to know if the effect is magnified in the first four games or something, but that's going to take some more data collection.
Well, I got around to doing the data collection. I split the season into quarters, and did four separate regressions. In each one, the independent variable was "Differential", with the dependent variable being the team's ATS records in Q1 for one, Q2 for another, etc., etc. Here are the results:
I would like to make it clear that I didn't make this data up. Because if I had, it probably would have looked very similar. As expected, the effect is largest in the first four weeks; in fact, that's the only time it's statistically significant. It's possible, if not likely, that there's also an effect in games 5-16, but we'd need more data to show that with any confidence.
Just for fun, here are the five biggest discrepancies between opponents in September matchups. As before, a positive number means that team is overrated by the public, and negative means they're underrated.
The single most important aspect of sports wagering is gauging public perception, and comparing it to reality. If you find a team that's widely overrated, there's a good chance that there will be some value in betting against them. The easy part of this is figuring out the public perception; just watch ESPN, or listen to your buddies at work talk about their picks to win it all.
The more difficult part is figuring out the real truth. For MLB, we've found that Baseball Prospectus' PECOTA is the best measure of this; in the NFL, Pro Football Prospectus is very accurate. So, with the NFL season quickly approaching, I've been thinking about quantifying both "public perception" and "reality", and seeing how that plays out on the field.
For "reality", I took the PFP win projections from the last two years, and turned those into rankings for each of the 32 teams. (This isn't perfect, since SOS is also a factor in their predicted records, but it'll have to do for now). For "public perception", I went to the ESPN.com preseason power rankings (2006, 2007). Subtracting a team's ESPN rank from their PFP rank gives us the metric we'll use; if a team is overrated by the public, their differential will be positive. I then used that as the independent variable in a regression, with the dependent variable being each team's win % against the spread for the whole season. Here's what it gives us:
You couldn't ask for much better results than these. The P-value is 0.0003, so our "Differential" variable is clearly statistically significant. And the slope is in the direction we'd expect; a positive differential means a team is overrated, so the negative slope tells us that those teams have generally performed poorly against the spread. Each rank difference is worth .004 points on win %; if a team is rated #5 by ESPN, and #15 by PFP, their difference is 10, so we'd expect them to have an ATS W% of .459.
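To make the arithmetic in that example concrete, here is a minimal sketch of the fitted line. The post only gives the rounded slope (.004 per rank) and the one worked result (.459 at a differential of 10), so the intercept and the unrounded slope below are assumptions chosen to reproduce that example, not the actual regression output.

```python
# Sketch of the fitted line described above.
# INTERCEPT and SLOPE are assumed values, picked to match the worked example.
INTERCEPT = 0.500   # assumption: a correctly rated team breaks even ATS
SLOPE = -0.0041     # assumption: unrounded version of the ".004" in the text


def differential(espn_rank: int, pfp_rank: int) -> int:
    """PFP rank minus ESPN rank; positive means the public overrates the team."""
    return pfp_rank - espn_rank


def expected_ats_pct(diff: float) -> float:
    """Expected win % against the spread for a given differential."""
    return INTERCEPT + SLOPE * diff


diff = differential(espn_rank=5, pfp_rank=15)
expected = expected_ats_pct(diff)
print(diff, round(expected, 3))
```

With these assumed coefficients, a team ranked #5 by ESPN and #15 by PFP comes out at the .459 quoted above.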
Obviously, one shouldn't blindly bet based on these numbers. But the differentials should give us an idea of which teams have value, especially at the beginning of the season. (I'd like to know if the effect is magnified in the first four games or something, but that's going to take some more data collection.) Here are the four expected ATS win percentages at each extreme for 2008:
Again, this is only a starting point. But if the Browns look like a sure thing in Week 1, you should probably think twice.
Okay, maybe that was a little too much information all at once. The last two numbers are the important part--the rotation has been excellent, much better than expected.
Dave Cameron wrote about Lee over at FanGraphs, explaining how unlucky he's been since the end of April. His May BABIP was .346, and as of last week it was .437 (!) in June. As you can see if you wade through the numbers above, his FIP (2.47) is actually right with his ERA (2.45).
Sabathia's FIP has also been very impressive; that isn't weighed down (or up, I guess) as much by those first four starts as his ERA is. He has been ridiculously good in his last 12 starts--88.1 IP, 6 HR, 16 BB, 93 K, 2.14 ERA.
So, they have two good starters, at least for now. Sadly, you need five. It's hard to have such a high ERA/FIP with a 3:1 K:BB ratio, but Byrd manages by never striking anyone out, and giving up a lot of homers. Laffey isn't nearly as good as his current 2.98 ERA, but he's serviceable. Sowers was very effective in AAA, but has gotten lit up in the majors in five starts.
Rest of way:
Sabathia (125 IP), 3.40 ERA
Lee (95 IP), 3.90 ERA (?)
Byrd (85 IP), 5.20 ERA
Laffey (85 IP), 4.45 ERA
Sowers (70 IP), 5.35 ERA
Carmona (40 IP), 4.20 ERA
Rest of way: 4.32 ERA (4.73 RA)
Bullpen
Overall PECOTA: 4.10 ERA
Actual: 4.87 ERA, 4.59 FIP
I am not going to go through every reliever, since the makeup of the bullpen is essentially the same as three months ago, but obviously this has been an area of weakness, with a -3.64 WPA. Their FIP suggests that they haven't been as bad as they've looked, but they're clearly not going back to last year's numbers (3.75 ERA).
Rest of way: 4.35 ERA (4.76 RA)
That puts the pitching staff's RA at 4.74. In making the conversion from ERA to RA, I've tried to make an adjustment for what looks like a slightly below average defense, multiplying ERA by 1.095 rather than the AL average of 1.077.
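For anyone who wants to check the rotation numbers, here is a small sketch that weights each starter's projected ERA by his projected innings and then applies the 1.095 ERA-to-RA factor. The figures come from the rest-of-way list above; the function and variable names are just for illustration.

```python
# Rest-of-season rotation projections from the post: (innings, ERA).
rotation = {
    "Sabathia": (125, 3.40),
    "Lee": (95, 3.90),
    "Byrd": (85, 5.20),
    "Laffey": (85, 4.45),
    "Sowers": (70, 5.35),
    "Carmona": (40, 4.20),
}
DEFENSE_FACTOR = 1.095  # ERA -> RA multiplier used in the post (AL average: 1.077)


def weighted_era(staff):
    """Innings-weighted ERA: total earned runs per nine innings."""
    total_ip = sum(ip for ip, _ in staff.values())
    earned_runs = sum(ip * era / 9 for ip, era in staff.values())
    return 9 * earned_runs / total_ip


rotation_era = weighted_era(rotation)
rotation_ra = rotation_era * DEFENSE_FACTOR
print(round(rotation_era, 2), round(rotation_ra, 2))
```

This reproduces the 4.32 ERA and 4.73 RA quoted for the rotation.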
Combining this with yesterday's post, we have the Indians scoring 4.76 R/G while allowing 4.74. In terms of Pythag, that's a .502 winning percentage, which is actually quite close to the .497 that the BP PECOTA Postseason Odds is using. In light of that, the Indians' chances of reaching the playoffs are probably around 8 or 9 percent.
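The Pythag step can be sketched in a few lines. This assumes the classic exponent of 2; the post doesn't say which exponent was used, though the common 1.83 variant also rounds to .502 for these inputs.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation; exponent=2.0 is an assumption (classic form)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


indians = pythag_win_pct(4.76, 4.74)
print(round(indians, 3))
```

Scoring 4.76 R/G while allowing 4.74 works out to the .502 winning percentage mentioned above.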
WSAS asked me last night if I thought they should trade Sabathia. As of right now, I think the answer is no, you hold on to him, since they're still in the race. A month from now I'm guessing my answer will be different, assuming they've either failed to make up ground on the three teams ahead of them, or fallen even further back. Even then, though, it's going to depend on who they can get back in a trade, weighing that against the picks they'd receive if they lost Sabathia to free agency, and determining if that difference is worth giving up last year's Cy Young winner.
It's just too complicated a situation to blindly jump to either "trade him" or "keep him". If, right now, Colletti offered Billingsley and Kershaw (he might!), they'd have to do it. If the Indians have a 1% chance of reaching the playoffs on July 31, and the best offer is some B- prospect, they shouldn't. The end of this whole saga will almost certainly come between those two extremes, but without knowing exactly where it will fall, it's silly to jump onto one side or the other at this point.
The Marlins have gotten off to a very nice start, which has bumped them almost seven games above their PECOTA projection. This will not continue. They've only outscored their opponents by 13 runs, and even that is a product of their schedule. By the way, how about Scott Olsen's season so far? 60.2 innings, 26 walks, 27 strikeouts. .213 BABIP, 81.5% LOB%, 6.4% HR/FB ratio. 2.82 ERA. I wonder what happens next?
There's not much of interest to report here- the three contenders are within a game of each other. The Braves have had a great run differential thus far (+39), but that hasn't yet translated into wins. The Mets have been mediocre (although they've played a significantly harder schedule than Atlanta), but their lofty PECOTA allows them to still be the favorite.
Quite a gap here. That'll happen when you only have one real challenger coming into the season, and they dig themselves a seven game hole over the first seven weeks.
This isn't to say that the Cubs will run away with the division. It's very likely that at least one of the four teams below them will exceed their expected performance, and at least make things interesting. But they're certainly in the driver's seat at this point.
As an aside, the Reds have won six in a row to get within two games of .500, but are still 25:1 to win the division at BetUS.
The bottom three in this division are really awful. The Rockies, Giants, and Padres have been outscored by 39, 59, and 61 runs, respectively. The Padres are expected to finish 23 games out- that's the second largest margin in both their division, and all of baseball. So it's a two team race.
Justin Upton currently has a 138 OPS+. Hopefully he can keep that up- in the history of baseball, 12 guys have had an OPS+ of 135 or greater in at least 450 PAs at the age of 20 or younger. With any luck, he ends up being better than Dick Hoblitzell.
Here is the 40-game update of the PECOTA/Pythag combination I wrote about a few weeks ago. I'm now using 84.5% PECOTA, and 15.5% Pythagorean record to determine each team's winning percentage the rest of the way. The NL standings will be up sometime in the near future.
Unless you are really high on the Rays, the Red Sox are probably in better shape than this, which certainly paints an optimistic picture for the Yankees. PECOTA was obviously a big fan, but it seems silly to expect them to be better than Boston from here on out. One could blame the mediocre start on injuries, but is that really reasonable? With the exception of Generation TreyUno, this is a very old team, and old people get hurt. That's just how the world works.
The Rays' odds to win it all at Sportsbook have finally been changed. Did "The Chosen Rob" cause this? We may never know. I would like to add that I saw most of the game today, and watching Kazmir pitch is an extremely painful experience. It is 2-2 or 3-2 on every single hitter.
Things are looking good for the Indians, but probably not this good. They have managed to outscore their opponents by 31 runs so far, but their third-order record is below .500. Their batting line (.235/.316/.359) is actually worse than their pitching line (.253/.315/.377). Things change with 2 outs and RISP however, when their offense has a .829 OPS, and their pitching staff has allowed a .509 OPS.
Their third-order record is almost identical to Detroit's, so based on that and PECOTA we should expect the teams to perform about equally from here on out. Because of their current six-game lead, the Indians still have the edge, but it's likely less dramatic than shown above.
I can't bring myself to take any of the other teams seriously, at least not yet. The Twins' third-order W% is .438. By almost any metric, the White Sox are overwhelmingly mediocre. And the Royals are still the Royals. Soria is awesome though, I'll give them that much.
So, you're saying that the Mariners aren't going to win 92 games? Really?
Well the Oakland bandwagon filled up quickly, didn't it. The best you can do on them to win the division at this point is +220. Their current record isn't an empty one either- they have the best run differential in the league. How long can you keep scoring a sufficient number of runs with a .358 SLG though? Probably about as long as you keep up a .408 OBP with RISP, I guess.
84 wins is probably about right for the A's, but I'd guess the Angels are better than a .521 squad, especially with Lackey returning.
"What does any of it mean? Is Lee a Cy Young candidate now? Let's assume 1) he's reasonably healthy for the rest of the season, and 2) beginning today he merely hits his career numbers: six innings per start, 4.37 ERA. If those things happen, Lee finishes the season with a 3.26 ERA, which last year would have been sixth-best in the league. Cy Young-worthy? C.C. Sabathia won the award last year with a 3.21 ERA; Josh Beckett finished second with a 3.27 ERA.
So is Lee a Cy Young candidate right now? Yeah, I think he is. Based purely on what he's done throughout his career, and not just this spring."
Before I get into this I want to quickly talk about this whole, "Yeah, but he hasn't faced any real offenses" criticism. It's true that Lee hasn't faced particularly stellar lineups: Oakland (twice), Minnesota, Kansas City, Seattle, New York, and Toronto. But how much of a difference has that made?
By AEQR, here is the strength of each offense, in R/G, that Lee has faced, after taking out their performance against him.
He faced the Yankees without A-Rod or Posada, and Toronto without Wells*. So knock them down to 4.60 and 4.50, respectively. Weighting the A's twice, that averages out to 4.32. The AL average in non-Cliff Lee starts is 4.58. So, yeah, he's faced bad offenses- 0.26 runs below average. So that bumps his ERA up to 0.93, his FIP up to 2.11, and his QERA up to 3.03. Somehow, I think he'll be okay.
So, is Cliff Lee a Cy Young candidate? To get an idea of where he stands, I used each player's PECOTA to finish out the season, assuming each pitcher would make 26 more starts. I added that to their current stats, and plugged it into the Cy Young Predictor formula (which takes into account wins, losses, IP, ER, Ks, and shutouts). I looked at both perennial contenders, and guys off to quick starts this year. Halladay and Saunders were also included, but didn't make the cut for the table below. Remember, this is being outrageously pessimistic about Lee going forward- PECOTA had his ERA at 4.95.
(Quick note: I understand that wins are not a good measure of pitcher value. I get it. But the question is whether Cliff Lee is a Cy Young candidate, rather than whether we should expect Cliff Lee to be the most valuable pitcher in the league this year. And, in the coming months, sports books will have odds on the former criterion, and not the latter.)
Here are the predicted standings:
Matsuzaka is the only guy who combines a tremendous start in the Cy Young categories (6-0, 2.45 ERA, 40 Ks) with an optimistic PECOTA (4.00 ERA). Whether he can keep this up while walking six guys per game is another story entirely.
The only guy that seems out of place here is Wang. He is 6-1 with a 3.12 ERA, and has a career ERA of 3.69. But PECOTA was down on him, probably because of the low K rate, expecting Wang to have a 4.40 ERA. So he should probably be higher on the list.
That's not the focus here though. The point is that, as Neyer hypothesizes, Lee is still a Cy Young candidate even if he goes back to his mediocre form of years past. Starting off 6-0 with a 0.67 ERA will have that effect. He's not at the top of the list, but he's in the conversation- he finishes 15-10, with a 3.74 ERA, and 158 Ks in 190 IP. Not your typical Cy Young numbers, but remember these are all averages, so nothing is going to jump off the page.
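Here is a quick sketch of how that 3.74 ERA falls out of "current stats plus PECOTA the rest of the way". It assumes Lee has allowed 4 earned runs so far (which is what a 0.67 ERA over 53.2 IP implies) and that he finishes at the 190 IP quoted above; the helper name is made up for illustration.

```python
def innings(ip_notation: str) -> float:
    """Convert box-score notation ('53.2' = 53 2/3 innings) to a decimal."""
    whole, _, thirds = ip_notation.partition(".")
    return int(whole) + (int(thirds) if thirds else 0) / 3


current_ip = innings("53.2")
current_er = 4      # assumption: implied by a 0.67 ERA over 53.2 IP
final_ip = 190      # season total quoted in the post
pecota_era = 4.95   # PECOTA's ERA for Lee, applied to the remaining innings

rest_ip = final_ip - current_ip
rest_er = pecota_era * rest_ip / 9
final_era = 9 * (current_er + rest_er) / final_ip
print(round(final_era, 2))
```

Even with a 4.95 ERA over his remaining innings, the head start is enough to leave him at 3.74 for the season.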
But what about if we're a little more optimistic about Lee's final 26 starts? How about a 4.00 ERA, 6.5 K/9, and wins in 38% of his starts.
That puts Lee's Cy Young Predictor score at 140.8, far ahead of his competitors- 16-7, 3.19 ERA. And, considering his FIP and QERA after 53.2 IP, I don't think a 4.00 ERA is an unreasonable expectation. In conclusion, not only is Cliff Lee a Cy Young candidate, he may even be the favorite at this point.
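For reference, here is a rough sketch of the Cy Young Predictor as I understand the published James/Neyer formula. The example plugs in the more pessimistic line from earlier (15-10, 3.74 ERA, 158 Ks in 190 IP); the shutout count and the division-winner bonus are not given there, so both are set to zero as an assumption, and the result won't necessarily match the scores used in this post.

```python
def cy_young_points(ip, er, so, w, l, shutouts=0, saves=0, division_bonus=0):
    """Cy Young Predictor score (formula as commonly published; treat as a sketch)."""
    return (
        (5 * ip / 9 - er)      # run prevention relative to innings pitched
        + so / 12              # strikeouts
        + saves * 2.5          # zero for a starter
        + shutouts
        + (w * 6 - l * 2)      # wins and losses
        + division_bonus       # bonus for pitching on a division winner
    )


ip = 190
er = 3.74 * ip / 9  # earned runs implied by a 3.74 ERA over 190 IP
score = cy_young_points(ip=ip, er=er, so=158, w=15, l=10)
print(round(score, 1))
```

The wins term dominates: each win is worth six points, which is why a hot start in the W column carries so much weight here.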
*I noticed, while looking through his game logs, that Lee has pitched the second game of a doubleheader twice this season. In those two starts, against Kansas City and Toronto, he's gone 18 innings, and hasn't allowed a run while striking out 14 and walking 2. I wonder if scoring is lower the second game of a doubleheader than on average. My guess is that it probably is- you have some guys sitting out, and others have already played a game earlier in the day. Wouldn't really make a difference, but interesting nonetheless.
Did you really think Tampa's six game winning streak would go unmentioned on this site?
At this point, every piece of purely objective analysis indicates that they will finish over .500. PECOTA pegged them at 88 wins in the preseason. They're currently 14-11, which works out to 90.7 wins over a full season. Their record in the adjusted standings is even a bit better than their actual record- based solely on their play so far, they'd be expected to win 87.6 games.
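The full-season pace is just the current winning percentage stretched over 162 games; a tiny sketch:

```python
def full_season_pace(wins, losses, season_games=162):
    """Wins over a full season if the current winning percentage held."""
    return wins / (wins + losses) * season_games


rays_pace = full_season_pace(14, 11)
print(round(rays_pace, 1))
```

At 14-11, that comes to the 90.7 wins quoted above.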
And yet...being "purely objective" is hard. They have never won over 70 games. They allowed 944 runs last year, thanks to a laughably bad defense. Their team payroll is $44MM- the entire pitching staff is making about as much as Barry Zito. None of this is particularly relevant, but it's tough to ignore.
If we were able to ignore the second paragraph, and just went by the information in the first one, what would Tampa's odds for winning the AL East be? Definitely better than the Blue Jays, right? Well, at both BetUS and Bodog, Toronto is 5:1 to win the division. Tampa is 25:1 at BetUS. For comparison, the Royals are 22:1 to win the Central, and the Marlins are 15:1 to win their division.
25:1 is too high. Those are very good odds. Should they be +233, as PECOTA suggests? No, that'd be ridiculous. But there's a whole lot of room between +233 and +2500.
Matchbook is a very good place to look when considering things like this, since you can bet either side- you can bet that the Rays don't win the division. Currently, that prop is being offered at -1500. This presents an arbitrage situation- one can bet on the Rays at +2500 at BetUS, and against them at -1500 on Matchbook, and lock in about 2.5% profit.
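Here is a sketch of the arbitrage math. It converts each American price to an implied probability and sizes the two bets so the payout is the same either way. It ignores any exchange commission on the Matchbook side, which is an assumption; the $100 bankroll is only an example.

```python
def implied_prob(american_odds):
    """Break-even probability implied by an American moneyline price."""
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)


p_for = implied_prob(2500)       # Rays win the division, +2500 at BetUS
p_against = implied_prob(-1500)  # Rays don't win it, -1500 on Matchbook
total = p_for + p_against        # below 1.0 means an arbitrage exists

profit = 1 / total - 1           # locked-in return on the combined stake

bankroll = 100.0                 # example amount, not from the post
stake_for = bankroll * p_for / total
stake_against = bankroll * p_against / total
print(round(profit * 100, 1), round(stake_for, 2), round(stake_against, 2))
```

The two implied probabilities sum to a bit under 100%, and that gap is where the roughly 2.5% comes from.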
But I'm more interested in what the market thinks the Rays' chances are. That -1500 has been available all afternoon, and so has Tampa winning the division at +860. That nobody has jumped at either tells us that their true odds are between 6.3% and 10.4%. Let's be conservative and say 7%. This would put their true odds at 13:1. A far cry from PECOTA's +233, but not close to 25:1 either.
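The bounds above come straight from the two untouched prices. A minimal sketch, again ignoring commission (an assumption):

```python
def implied_prob(american_odds):
    """Break-even probability implied by an American moneyline price."""
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)


def fair_odds_to_one(prob):
    """Fair 'X to 1' odds for an event with the given probability."""
    return (1 - prob) / prob


# Nobody will lay -1500 against the Rays, so the market thinks their
# chances are at least this high (the post rounds it to 6.3%).
lower = 1 - implied_prob(-1500)
# Nobody will take the Rays at +860, so their chances are at most this high.
upper = implied_prob(860)

print(f"{lower:.2%} to {upper:.2%}")
print(round(fair_odds_to_one(0.07), 1))
```

At the conservative 7% estimate, the fair price works out to a little over 13:1.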
As you probably noticed when I mentioned the odds for the Royals and Marlins to win their division, it's rare that you can find a decent longshot in the "To Win Division" odds at sites like BetUS and Bodog. The 2008 Rays are a pretty rare team, though. Even so, I doubt they will be 25:1 for long- people are starting to pay attention now that they're winning baseball games on the field, rather than just looking good according to some crazy computer.
I like to think that I understand the concept of "small sample size" fairly well. I'm aware that we're only about 1/8 of the way into the season, and it's too early to get caught up in the order of the current standings.
I try to put things in perspective by looking at BP's PECOTA Playoff Odds report. See, that's better. The Orioles only have a 1.25% chance of reaching the playoffs. The world makes more sense now.
But I find even this analysis lacking at this point in the season. If you look at the "Pct3" column, you will notice that each team's win percentage is the same as it was in the preseason. Now, I don't think we should expect the Tigers to win 31.6% of their games the rest of the way, but 56.2% seems a little high, doesn't it? It seems like there should be some kind of happy medium.
So, I decided to try to determine what this "happy medium" is. I went and took the PECOTA projections from the past five seasons, along with each team's Pythagorean record through their first 20 games. I used these two as the independent variables in a regression, with the dependent variable being each team's winning percentage in games 21-162. I did this again at the 40-, 60-, 80-, 100-, and 120-game marks. (Note: This took forever.) Here is what I came up with:
This is interesting- notice that, except for the 80-game mark (which is just weird), the PYTHAG variable slowly rises as more games are played. This is what we would expect. Looking at the P-value, it doesn't become clearly significant until after 100 games, but I think it'd be hard to argue that it's not significant before that. I'm pretty confident that if you did this for the last 10 or 15 seasons, rather than just the last 5, we would be sure that they are significant.
This data is probably better seen in a graph. Here is the weight we should give to PECOTA, versus the weight we should give to PYTHAG, at each point:
The weirdness at the 80-game mark continues to be an annoyance, but I think this gets the general point across. Right now, it's about a 90/10 PECOTA/PYTHAG split. The PYTHAG portion increases by about 5.5% after each 20-game stretch, until we're at a 63-37 split in mid-August. It's hard to look any further than that, because you start trying to predict a really small sample, but in October it looks like it ends up at almost an even split, if this trend continues linearly.
Using the 90-10 split, we can create what I think are pretty accurate projected standings right now. These take into account the team's record so far; the W% column is their expected winning percentage the rest of the way. The next column is how many games PECOTA predicted them to win prior to the season, and then finally the difference between their current win prediction and PECOTA's original one.
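A minimal sketch of that weighting scheme, treating it as a straight line (10% PYTHAG at 20 games, plus about 5.5 points per additional 20 games). This is a linear reading of the trend described above, not the regression output itself, and the example team is hypothetical.

```python
def pythag_weight(games_played):
    """Approximate PYTHAG weight: 10% at 20 games, +5.5 points per 20 games."""
    return 0.10 + 0.055 * (games_played - 20) / 20


def rest_of_season_pct(pecota_pct, pythag_pct, games_played):
    """Blend the preseason projection with the Pythagorean record so far."""
    w = pythag_weight(games_played)
    return (1 - w) * pecota_pct + w * pythag_pct


for games in (20, 40, 60, 80, 100, 120):
    print(games, round(pythag_weight(games), 3))

# Hypothetical team: projected at .500, playing like a .600 team after 20 games.
print(round(rest_of_season_pct(0.500, 0.600, 20), 3))
```

On this line, the PYTHAG share is 15.5% at the 40-game mark, which matches the 84.5/15.5 split used in the 40-game update, and 37.5% at 120 games.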
In the East, PECOTA still expects the Yankees to be the superior team the rest of the way, but Boston's current 3.5 game lead means both teams have an equal chance of winning the division. The Rays have outscored opponents by 6 runs, so their expected W% hasn't been significantly decreased, but their 8-11 start has dropped their expected W total by 2.5.
The West has gone essentially according to plan- the part I'm interested in here is the Central. BP's PECOTA Playoff Odds Report paints an optimistic picture for the two current cellar dwellers, expecting them to win the division 80% of the time. That changes significantly when we look at it this way- the White Sox are 11-7, and their Pythagorean record is even better, at 12-6. This has caused their expected W% to rise from .475 to .495, a very significant boost. I am not smart enough to run a Monte Carlo simulation, but I'd guess that the division champ breakdown for CLE/DET/CHW would be around 40/30/25, with about 5% left over for the Royals and Twins.
Now for the NL:
The Braves and Phillies started the year each expected to win 86 games, and Atlanta is currently only one game ahead of Philadelphia in the standings. But the Braves have outscored their opponents by an impressive 35 runs, while the Phillies are only +2. This causes Atlanta's predicted record to be about 3.5 games better.
Not much has changed in the Central, except for the fact that the Cardinals' hot start has allowed them to vault ahead of the Reds. Impressively, the Pirates have been even worse than expected- they've been outscored by 42 runs already (allowing 6.7 R/G will have this effect).
In the West, the Diamondbacks have gone from a dead heat with the Dodgers in the preseason to having an expected 8 game cushion. Arizona has been the best team in the majors to date, outscoring their opponents 116-65. Meanwhile, the Dodgers have been somewhat unlucky, with their Pythag record (11-8) being three games better than their actual record (8-11).
Comparing preseason predictions from various sources is a lot of fun, and also a pretty good way to get a feel for what's expected from teams from various sections of the media. What I've done here is take predictions from a few different places- 5 ESPN analysts in their season preview, three Yahoo! guys, the SI staff, Joe Sheehan (AL, NL), and PECOTA- and find the biggest discrepancies between win totals for each team. The five largest differences are below, followed by a discussion of why there's such a lack of consensus, and who looks to be correct.
By coincidence, this ended up being all AL teams, which is fine by me.
That being said, 75 is really low. The PECOTA projection does come with something of a disclaimer, since it has Ichiro hitting .303/.346/.384. In 4774 career ABs, Suzuki has hit .333/.379/.437; this is his age 34 season, but that's a huge drop, and Ichiro has outperformed his PECOTA pretty much every year.
Still, even if we bump PECOTA's projection up to 77, that's a 15 win difference. And this is far from an isolated incident. The four ESPN guys (Stark, Kurkjian, Olney, and Phillips) and the three Yahoo! guys (Henson, Brown, and Passan) have them winning an average of 90 games. Sheehan, Law, and various computer projections predict an average of 79 victories.
They won 88 games last year, while being outscored by 19 runs. The high predictions employ the "88 wins + Bedard" logic. The others are starting with a baseline of 79, and giving them a boost for Bedard but factoring in some regression for their aging lineup. I don't think it's particularly hard to figure out who to side with here.
Tampa Bay Rays Average: 77.3 High: PECOTA, 88 Low: Steve Henson (Yahoo!), 72
The thing you have to love about PECOTA is that it's 100% unbiased. When it runs the numbers and comes up with 88 wins for a team that's never won 70, it doesn't adjust that to something that seems a little more reasonable. This paid off with the White Sox prediction last year; considering its history of success (not limited to that one example, obviously), the extreme predictions for Seattle and Tampa are hard to ignore.
I don't really know who this Steve Henson fellow is, but that's okay- he's got some wacky predictions, which are always appreciated. Here is his analysis on the Rays:
"The Rays are improving but are still middle-school level to the Red Sox graduate students."
This is a little over the top, but I think that's the mainstream consensus. Personally, I have no idea how many games this team is going to win (although I'd certainly take the over on 72). There's no big Pythag gap here- last year their expected record was 67-95, and their actual record was 66-96. Three things are causing the huge expected jump- a vastly improved defense, additions to the bullpen, and the development of young players. They were a horrible fielding team last year, but PECOTA expects them to be a little above average this season. The biggest upgrade is going from Brendan Harris (-19 in Dewan's system) embarrassing himself at short to Jason Bartlett's +18 glove. They also have Upton finally spending a full year in center, and the (eventual) addition of Longoria to the lineup will allow Iwamura to slide over to second.
Combine that with the addition of Matt Garza, and the progress of Kazmir, Shields, Sonnanstine and Co., and it's easy to see that their run prevention will be much improved. PECOTA has a team that allowed 944 runs last year decreasing that by a whopping 226 runs. Without looking it up, I'm going to go ahead and assume that that'd be the largest reduction in the history of baseball; that's about three months worth of runs for the Giants' offense.
Henson's prediction of 72 wins for the Rays is insanely low; 88 is high, but not that high. It's hard to both see and quantify these internal improvements- switching up defensive alignments, young players improving, old ones regressing- which is why PECOTA is so far off from the general consensus.
Texas Rangers Average: 73.0 High: Joe Sheehan (Baseball Prospectus), 80 Low: Steve Phillips, 64
This is not a fair fight. I watched Phillips' "analysis" of the Rangers on their ESPN season preview page, and I must say, he didn't really enlighten me. He doesn't think Millwood and Padilla are top of the rotation starters, which is reasonable. He goes on to explain that Texas is going to have to outslug their opponents. I don't know how he came to that 64 number (he probably doesn't either), but we should remember that they do get to play almost 60 games against that increasingly horrific division.
Sheehan is bullish on their offense; he has Texas scoring 840 runs, which is 60 higher than PECOTA. He seems to be high on Blalock, who absolutely tore it up (.313/.405/.656) after returning from a three-month absence last year. Because of his disappointing '05 and '06 campaigns, PECOTA is very down on Blalock with a projected .263/.331/.436 line, so that's probably causing a decent amount of the difference. Because of how unique he is, Josh Hamilton is obviously a hard guy to find comparisons for; PECOTA has him going from .292/.368/.554 last season to .283/.349/.481 this year. This makes some sense, since last year was in the easier league and a better hitters park, but it still seems low. In writing this paragraph, I have convinced myself that the Rangers are going to score a whole lot of runs this season, and certainly win a lot more than 64 games.
Baltimore Orioles Average: 63.6 High: Steve Henson, 70 Low: Buster Olney (ESPN), 56
Olney does love the extreme predictions- 49 wins for the Nationals last year is one I'll never forget. This one is much more sane though. They have a decent outfield, but they forgot about the whole "shortstop" thing, and that is a truly awful rotation in an impossible division. Our new friend Henson thinks they will win just two fewer games than the Rays; now that is a bet I'd like to make.
Toronto Blue Jays Average: 86.2 High: Joe Sheehan, 91 Low: PECOTA, 78
This is very interesting- a third huge discrepancy in the East, but this time between two "people" that look at things similarly. These are the only two sets of projections that also offer RS/RA, which is helpful. PECOTA has Toronto at 762/775, while Sheehan predicts 761/676. So it's pretty clear where the disagreement is here.
This may be partially caused by different opinions on their defense- they are good, it's just a question of how good. But I think it's mostly their top 3 starters. Burnett can opt out of his deal at the end of the year (thanks, Keith). PECOTA has him throwing 185 innings with a 3.83 ERA; it's worth noting that in 2005, his last contract year, he threw 209 innings with a 3.44 ERA. That's certainly too optimistic an expectation, but it's been shown that players perform better in contract years, and I don't believe PECOTA takes that into account. So that's something to keep in mind. Staying healthy is the first step, obviously.
PECOTA has Halladay at a 4.06 ERA, which is certainly conservative, as his career ERA is 3.63- I'm assuming that's caused by his relatively weak peripherals.
Finally, PECOTA is very low on McGowan, with a 4.60 ERA. Obviously, it hasn't been reading The Baseball Analysts. Beyond that intriguing article, I've read a few other things on McGowan. I think he's expected to improve on last year's 4.08 ERA, and certainly beat his PECOTA projection. So yeah, it looks like Toronto will have some excellent run prevention this year, as one can reasonably expect their top three starters to be significantly better than what PECOTA suggests.
‘’Then if you’re a nice guy, they are going to treat you the same way. [Expletive] it, be an ####### then. I would rather be an ####### winning than be a nice guy [expletive] losing. Give me an ####### who can win, don’t give me a nice guy who can [expletive] lose.’"
ESPN jumped the gun on the UAB-Memphis game. How does that happen? Is it really so difficult to wait two minutes?
The 2007 Seattle Mariners won 88 games, finishing second in the AL West. In the offseason they added Erik Bedard and Carlos Silva to a rotation that saw Jeff Weaver, Horacio Ramirez, Cha Seung Baek, and Ryan Feierabend combine to make 68 starts last year while compiling a 6.49 ERA.
They did lose a few guys from last year's team, including Jose Guillen (signed with Royals), Ben Broussard (traded to Rangers), Jeff Weaver (unsigned), and George Sherrill (Bedard trade). But the additions to the rotation clearly make up for these losses. In other words, Steve Phillips will likely predict that the Mariners win 90 games.
The issue is not their pitching, which PECOTA has as a little better than league average. The problem is they are projected to score 691 runs, which is the lowest number in the AL.
How does an 88-win team seemingly improve, yet become a 73-win team? A few possibilities:
Pythagorean Record
This is the obvious one. Seattle was outscored by 19 runs last year, and their third-order record was just 78-84. If you are setting the baseline on the 2008 team at 88, you are wildly overestimating their true talent level.
Ichiro
PECOTA has Ichiro hitting .304/.346/.384, for a VORP of just 14.7. This is a guy with a career line of .333/.379/.437, whose average VORP over the last three years has been 48.1. PECOTA is consistently down on Ichiro (last year it had him at .310/.353/.400; he hit .351/.396/.431), so you have to think they are being unfairly docked a few wins here.
Age
The average age of the 30 MLB teams last year was about 28. Weighted for playing time (from here), the average age of the 2008 Mariners' lineup is 30.3, with Ichiro, Ibanez, Sexson, Vidro, Johjima, and Wilkerson all on the wrong side of 30. This is another thing that goes unnoticed by the mainstream media. There is a significant difference between going from 28 to 29, and 30 to 31, and I think that is part of the reason their projection is as low as it is.
Defense
Seattle was 27th in the majors last year with a defensive efficiency of .678. This also goes back to their age- it is likely that they will be just as bad, if not worse, this year, as everybody is a year older, and they traded away a very strong defender in Jones.
73 wins seems a little extreme, but I think a projection of around 76 is entirely reasonable. Either way, articles like this are sure to be written, but we'll have to wait and see if there's a post like this come September.
Sportsbook has posted odds on who will lead the majors in HRs in 2008. This kind of thing is extremely difficult to handicap but, with the help of PECOTA, I thought I'd give it a shot.
Here are a few of the bets I think there's some value in.
Ryan Braun Odds: 15:1 PECOTA homers: 39, 2nd in MLB
Braun was an absolute beast last year, hitting 34 HRs in only 492 PAs en route to winning the NL RoY (the selection was obviously debatable, but for reasons other than his bat). That's a homer every 14.5 PAs. His odds are the clear outlier among the '08 predicted HR leaders- the best you can do on anyone else in the top four is 5:1.
Rick Ankiel Odds: 300-1 PECOTA homers: 30, 13th in MLB
Yes, seriously. PECOTA is very bullish on Ankiel- other projections don't have him nearly as high. Will he lead the majors in HRs? No, probably not. But it's important to remember how good these odds are. Betting $20 would net you $6000. And hey, stranger things have happened.
Carlos Pena Odds: 50-1 PECOTA homers: 33, 6th in MLB
Carlos Pena hit 46 home runs last year. Vlad Guerrero hit 27- he's never topped 45, and hasn't even reached 40 since 2000 (with a team that no longer exists). Yet they both have the same odds. I could do this comparison with Pena and pretty much anyone else, since his remarkable '07 season really flew under the radar. Sure, it doesn't really match up with his career stats, but it's hard to argue with getting the guy who was 4th in the majors in HR at 50-1.
Alfonso Soriano Odds: 25-1 PECOTA homers: 35, 5th in MLB
The value here is because Soriano is coming off a "down year", hitting only 33 homers (mostly because he missed 27 games). Still, he hit 46 in '06, while playing half his games at RFK. He should probably be in the range of 10-1, but the line is skewed by last year's low output.
Other decent bets: David Wright (100:1), Adrian Gonzalez (200:1), Nick Swisher (75:1), Grady Sizemore (300:1). Edit: Dunn (18:1), too. Good point.
I've posted all the odds here; feel free to post your pick(s) in the comments. The full PECOTAs are available here; a BP subscription is required.
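For anyone who wants to check the arithmetic on these lines, here is a minimal Python sketch. It assumes the odds are plain "X to 1" fractional odds and ignores the book's cut; the function names are just illustrative. The break-even probability is how often a bet has to hit for it to have zero expected value, so a bet has value only if you think the true chance is higher than that.

```python
def profit(stake, odds_to_one):
    """Profit on a winning bet at fractional odds of odds_to_one:1."""
    return stake * odds_to_one


def break_even_probability(odds_to_one):
    """Win probability at which a bet at odds_to_one:1 has zero expected value."""
    return 1.0 / (odds_to_one + 1.0)


# Odds quoted in the post for the four featured bets.
odds = {"Braun": 15, "Soriano": 25, "Pena": 50, "Ankiel": 300}

for player, o in odds.items():
    print(f"{player}: ${profit(20, o)} profit on a $20 bet, "
          f"break-even at {break_even_probability(o):.2%}")
```

So the Ankiel bet only needs to come in about once every 301 tries to break even, which is the whole appeal of a 300-1 line.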
This was a lot of fun, and there's more to look at, so I thought it deserved another post.
First, the final standings. Methodology is slightly different, as I'm using RMSE (Root Mean Squared Error) rather than just average error. This penalizes large misses more (sorry, Buster), and is more widely used.
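To show why RMSE is harder on big misses than a plain average error, here is a small Python sketch. The win totals are made up for illustration and are not anyone's actual predictions.

```python
import math


def mean_abs_error(predicted, actual):
    """Average size of the miss, ignoring direction."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root mean squared error: square each miss, average, take the root."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical win totals for four teams.
actual = [90, 80, 70, 85]
steady = [85, 85, 75, 80]    # misses every team by 5
streaky = [90, 80, 90, 85]   # nails three teams, misses one by 20

print(mean_abs_error(steady, actual), rmse(steady, actual))
print(mean_abs_error(streaky, actual), rmse(streaky, actual))
```

Both predictors miss by five games on average, but the one with a single 20-game whiff ends up with twice the RMSE.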
As we'll see, Neyer comes out on top pretty much any way you do this. Notice the top five are all at least partly based on statistical systems. Be sure to remember that when you're watching Baseball Tonight next March. Your time is probably better spent subscribing to BP.
Just for fun, I also looked at how many of the Vegas over/unders everybody chose correctly. A lot of this is luck- if the line is 83.5, and your prediction is 84, you get the same credit if that team wins 84 or 104. (BTW, Silver is Nate Silver from BP; he's the "PECOTA guy", among other things. As I understand it, he took the PECOTA predictions and just made adjustments where he saw fit.)
Again, pretty meaningless, but Caple and Law both jump way up. Gammons is last, but I wouldn't be too worried about that- I'm pretty confident he didn't place any wagers.
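The over/under scoring can be sketched in a few lines of Python. This is an assumption about how the tally works, not the exact spreadsheet: a prediction above the line counts as an "over" pick, below as an "under", and anything that lands exactly on the line (either the prediction or the final win total) is treated as a push.

```python
def over_under_record(predictions, lines, actuals):
    """Return (correct, wrong, pushes) for a set of win predictions."""
    correct = wrong = pushes = 0
    for pred, line, actual in zip(predictions, lines, actuals):
        if pred == line or actual == line:
            pushes += 1          # no side taken, or the result landed on the line
        elif (pred > line) == (actual > line):
            correct += 1         # picked the same side the team finished on
        else:
            wrong += 1
    return correct, wrong, pushes


# The example from the text: line of 83.5, prediction of 84.
print(over_under_record([84], [83.5], [84]))
print(over_under_record([84], [83.5], [104]))
```

Both calls give the same one-win record, which is exactly why this measure rewards being on the right side rather than being close.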
Now, the following is very interesting. Sky beat me to this (although I believe he did straight difference rather than RMSE), but if we really want to know how good these predictions were, we should look at Pythag record.
Basically, Pythag record is how many games a team should have won, based on how many runs they scored and allowed. It's better at predicting future performance than actual record, and thus is a better indicator of team strength. It's great for this exercise, so let's take a look.
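For reference, here is a minimal Python sketch of the Pythagorean calculation. It uses the classic exponent of 2; other versions of the formula use an exponent closer to 1.83, and which one was used for the numbers in this post is not stated, so treat the exponent as an assumption. The run totals below are made up.

```python
def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (Pythagorean expectation)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Hypothetical teams: one that breaks even on runs, one that outscores
# its opponents by 100.
print(round(pythag_wins(800, 800), 1))
print(round(pythag_wins(800, 700), 1))
```

A team that scores exactly as many runs as it allows comes out at 81 wins, and every extra run of differential pushes the expected total up from there.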
These numbers are noticeably lower. This makes sense- there is less variation in Pythag record than actual record (I think). I found this table to be very interesting. Sports Interaction jumps up considerably, which doesn't surprise me. Think about how much money these guys have at stake with this stuff. It's their job to post a number that will entice people to bet equally on both the over and the under. I would hope they are good at it- it's a lot different than Steve Phillips handing in a list of completely arbitrary numbers to some ESPN editor.
Beyond that, the list is once again dominated by the "numbers guys". I am somewhat surprised that Olney did so poorly.
The obvious next step is to see who got lucky, and whose picks were better than they originally appeared. Since the average Pythag miss was 1.39 smaller than the average actual miss, I have taken that into account in the final column. A negative "Adj Diff" means you got lucky.
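One way a column like "Adj Diff" could be computed is sketched below in Python. The exact formula is not spelled out in the post, so this is a guess that matches the description: take each predictor's actual miss minus their Pythag miss, then subtract the 1.39 average gap so that a typical predictor lands at zero. The sample misses are invented.

```python
AVG_GAP = 1.39  # average (actual miss - Pythag miss) across predictors, per the post


def adj_diff(actual_miss, pythag_miss, avg_gap=AVG_GAP):
    """Negative: the real standings flattered these picks (lucky).
    Positive: the picks were better than the real standings made them look."""
    return (actual_miss - pythag_miss) - avg_gap


# Hypothetical predictors.
print(round(adj_diff(10.0, 9.0), 2))   # gap of 1.00, smaller than usual
print(round(adj_diff(12.0, 9.0), 2))   # gap of 3.00, larger than usual
```

Under this reading, a predictor whose actual miss is only one game worse than their Pythag miss comes out negative, since everyone else's gap was 1.39 on average.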
Phillips' picks were poor to begin with, and he got unlucky on top of that (which is only fair, considering his recent good luck in other areas). Olney had the worst picks according to Pythag by a pretty wide margin, but had luck on his side, which allowed him to almost catch up to Phillips.
The numerical systems all obviously did better than others regardless of what metric I've looked at, but they were also on the lucky side. I would be interested to see if this is also true in previous years (which would indicate that it's not actually luck). The PECOTA predictions for each year since 2003 are readily available, so when I have time I figure I'll look into those.
Everybody makes baseball predictions in late March/early April. A lot of people just predict who will win each division, and who will advance to the World Series. Anyone can do this- you really only have to have a general knowledge of the top teams.
There are also people who predict how many wins each of the 30 teams will have. There are various complications with this (Jayson Stark's predictions have the average team winning 83.6 games, which is quite unlikely), but the thing about this is you actually have to know what you are doing. People make these predictions differently- some rely strictly on numbers, others on "feel".
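The problem with an 83.6-win average is simple bookkeeping, and a short Python sketch makes it concrete. It assumes all 30 teams play a full 162-game schedule, so every game hands out exactly one win and the league average has to be 81.

```python
TEAMS = 30
GAMES_PER_TEAM = 162


def total_wins_available(teams=TEAMS, games=GAMES_PER_TEAM):
    """Each game involves two teams and produces one win."""
    return teams * games // 2


def excess_wins(avg_predicted_wins, teams=TEAMS, games=GAMES_PER_TEAM):
    """How many more wins a set of predictions hands out than actually exist."""
    return avg_predicted_wins * teams - total_wins_available(teams, games)


print(total_wins_available())        # 2430
print(round(excess_wins(83.6)))      # 78
```

In other words, a set of predictions averaging 83.6 wins gives out about 78 wins that nobody can actually collect.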
I found 13 sets of these predictions- 10 from ESPN (Gammons, Stark, Crasnick, Olney, Neyer, Kurkjian, Phillips, Law, Caple, Karabell), two from BP (PECOTA and BP Hit List), and also the over/unders from SportsInteraction.com (via SoSH). I thought I'd take a look at some of the best and worst individual predictions, as well as whose overall predictions were most accurate.
(Note: These lists aren't just based on who was the closest- I also factored in how far off the other predictions were. So predicting a team within two games if the average prediction was eight games off would be higher than predicting a team exactly if the average was just three games off.)
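The actual formula behind these lists isn't given here, but a simple difference is one hypothetical scoring rule that reproduces the ordering in the note above. A minimal Python sketch, with `relative_score` as an illustrative name:

```python
def relative_score(own_miss, others_avg_miss):
    """How many games closer a prediction was than the average prediction."""
    return others_avg_miss - own_miss


# The two cases from the note.
within_two = relative_score(own_miss=2, others_avg_miss=8)
exact = relative_score(own_miss=0, others_avg_miss=3)
print(within_two, exact)
```

The two-game miss scores 6 and the exact hit scores 3, so the first ranks higher. A large negative score would flag a candidate for the worst list.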
The Best
1. PECOTA, Chicago White Sox Predicted wins: 72 On pace for: 71.0
The over/under for the White Sox was 89.5, and the ESPN analysts' average prediction was 84.6. Chicago won 90 games in '06 after winning 99 in 2005. Much was made of PECOTA's pessimism, but this turned out to be, pretty easily, the best prediction of the year.
2. Jayson Stark, Seattle Mariners Predicted wins: 85 On pace for: 86.6
The average for everyone else was 76.5, and PECOTA had them winning only 73 games. They only won 78 games in '06, while finishing last in the AL West. ESPN's preview had JJ Putz under "Bust", as they were worried about his elbow pains. I feel like that turned out OK for him.
3. Steve Phillips, Minnesota Twins Predicted wins: 78 On pace for: 78.6
Steve Phillips: Not Smart! Well, for now at least. Nobody else at ESPN had the Twins winning fewer than 83 games, and PECOTA pegged them at 90. On the ESPN Message Boards, Twinsdude08 remarked that, "The Twins just have too much talent to not win the division." I don't know how Phillips came to 78 wins, but, as we'll see later, this accuracy certainly isn't a trend.
4. Rob Neyer, Washington Nationals Predicted wins: 69 On pace for: 71.1
People (especially Buster Olney) thought the Nationals were going to be really bad. The second most optimistic ESPN prediction was 64 wins; six had them losing over 100 games. Neyer, who always refers to his predictions as "running the numbers", was more realistic- it's hard to lose 100 games in the NL, since all the other teams are really bad too.
5. Peter Gammons, Colorado Rockies Predicted wins: 84 On pace for: 87.2
The Rockies have far exceeded all expectations- their over/under at SportsInteraction was 74.5 wins, and nobody else had them winning even 80 games. Even Gammons didn't see this coming, but everybody else was so far off that his prediction makes the list.
The Worst
1. Buster Olney, Washington Nationals Predicted wins: 49 On pace for: 71.7
Pretty much everyone was a little off on the Nats, but this one stands out. Sure, things didn't look good back in March, but 113 losses? No NL team lost more than 96 games in '05 or '06- it would be quite amazing if someone was actually that bad. Olney is a smart guy, but I'm not sure where he got 49 wins from.
2. Jim Caple, Kansas City Royals Predicted wins: 54 On pace for: 70
I don't know, maybe people just think it's funny to pick teams to be amusingly bad. I kind of see Caple's reasoning here, as he predicted the other four AL Central teams to average 89 wins. But seriously, how did he see this playing out? Did he figure they would all go like 16-3 against the Royals? That's the only way they could average 89 wins, since they have to play each other so many times.
3. Steve Phillips, Boston Red Sox Predicted wins: 82 On pace for: 96
This only came out third in my little formula, but that may be generous. Boston was a mess in '06, and they still managed 86 wins. Nobody else had the Red Sox winning less than 90 games. Between this and repeatedly predicting the Yankees to miss the playoffs in August, I feel like Phillips just makes predictions for the shock value of them.
4. Keith Law, Seattle Mariners Predicted wins: 65 On pace for: 86.6
Law and Stark didn't quite see eye to eye on this one, as their predictions were 20 wins apart, the highest such margin. Seattle has surprised people, but their over/under was 79.5 wins; there really wasn't any reason to think they would approach 100 losses.
The Rest (Predictor, Team, Prediction, Actual Pace)
5. Phillips, Diamondbacks, 78, 90.8
6. Phillips, White Sox, 92, 71
7. PECOTA, Devil Rays, 78, 66.4
8. Karabell, Cubs, 75, 86
9. Stark, Reds, 85, 74.2
10. Karabell, Astros, 88, 70.5
Now, let's look at whose overall predictions were the most accurate. The table on the right is ranked by how close people were, on average, across all 30 predictions.
The top three are all predictions based on numbers. PECOTA is 100% quantitative, and both Neyer and the Hit List rely heavily on numerical predictions.
Those are the only three that did better than Vegas. Neyer did really well- his picks are 19-10-1 against the over/unders so far. Even more impressive, of his seven predictions that had large discrepancies with Sports Interaction, he was right on six of them.
On the other end of the spectrum is Mr. Phillips. If you watch Baseball Tonight and SportsCenter (or are a Mets fan...) this probably doesn't come as much of a surprise. Luckily, Steve Phillips isn't paid a lot of money to analyze baseball for a living- if he were, his incompetence would be pretty embarrassing.