Finding The Happy Medium<br />April 21, 2008<br /><br />I like to think that I understand the concept of "small sample size" fairly well. I'm aware that we're only about 1/8 of the way into the season, and it's too early to get caught up in the order of the <a href="http://mlb.mlb.com/mlb/standings/index.jsp">current standings</a>.<br /><br />I try to put things in perspective by looking at BP's PECOTA <a href="http://www.baseballprospectus.com/statistics/ps_oddspec.php">Playoff Odds report</a>. See, that's better. The Orioles only have a 1.25% chance of reaching the playoffs. The world makes more sense now.<br /><br />But I find even this analysis lacking at this point in the season. If you look at the "Pct3" column, you will notice that each team's winning percentage is the same as it was <a href="http://baseballprospectus.com/fantasy/dc/index.php?">in the preseason</a>. Now, I don't think we should expect the Tigers to keep winning only 31.6% of their games the rest of the way, but 56.2% seems a little high, doesn't it? It seems like there should be some kind of happy medium.<br /><br />So, I decided to try to determine what this "happy medium" is. I took the PECOTA projections from the past five seasons, along with each team's Pythagorean record through its first 20 games. I used these two as the independent variables in a regression, with the dependent variable being each team's winning percentage in games 21-162. I repeated this at the 40-, 60-, 80-, 100-, and 120-game marks. (Note: This took <span style="font-style: italic;">forever</span>.) 
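<br /><br />The regression setup above can be sketched in Python. This is not the author's code, just a minimal illustration under a few assumptions: the Pythagorean exponent of 2 (the classic Bill James form; the post doesn't say which exponent it used), a small standard-library least-squares solver in place of whatever tool was actually used, and the hypothetical `pythag_share` helper, which reads the PECOTA/PYTHAG "weights" as each coefficient's share of their sum. Each data row would be one team-season: PECOTA's preseason W%, the team's Pythagorean W% through N games, and its actual W% the rest of the way.

```python
# Hypothetical sketch of the regression described in the post.
# Assumptions: exponent 2 for Pythagorean W%, and the PECOTA/PYTHAG
# "weights" read as each coefficient's share of the two coefficients.

def pythag_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Pythagorean winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def ols_two_predictors(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by ordinary least squares.

    Builds the 3x3 normal equations (X^T X) b = X^T y and solves them with
    Gauss-Jordan elimination, so only the standard library is needed.
    Returns [b0, b1, b2].
    """
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    # Augmented matrix [X^T X | X^T y].
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(3)
    ]
    for col in range(3):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Zero out this column in every other row.
        for k in range(3):
            if k != col:
                factor = m[k][col] / m[col][col]
                m[k] = [a - factor * b for a, b in zip(m[k], m[col])]
    # The matrix is now diagonal, so each coefficient is a single division.
    return [m[i][3] / m[i][i] for i in range(3)]


def pythag_share(b_pecota: float, b_pythag: float) -> float:
    """Fraction of the combined coefficient weight that falls on Pythag."""
    return b_pythag / (b_pecota + b_pythag)
```

With one row per team-season, `ols_two_predictors(pecota, pythag, rest_of_season)` returns the intercept and the two coefficients for one checkpoint; running it again at 40, 60, ... 120 games traces how the Pythag weight grows. As a quick check of the Pythagorean piece, Arizona's 116-65 run differential gives `pythag_pct(116, 65)` of about .761.<br /><br />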
Here is what I came up with:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_Xtn5bp8dZEg/SA1Vp0Joo7I/AAAAAAAABZA/mH8Evh92um4/s1600-h/pecotapythagtable.bmp"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_Xtn5bp8dZEg/SA1Vp0Joo7I/AAAAAAAABZA/mH8Evh92um4/s400/pecotapythagtable.bmp" alt="" id="BLOGGER_PHOTO_ID_5191900122290103218" border="0" /></a>This is interesting: notice that, except for the 80-game mark (which is just weird), the PYTHAG coefficient slowly rises as more games are played. This is what we would expect. Judging by the p-values, it doesn't become clearly significant until after 100 games, but I think it'd be hard to argue that it's <span>not</span> significant before that. I'm pretty confident that if you did this for the last 10 or 15 seasons, rather than just the last 5, we could be sure that the earlier coefficients are significant too.<br /><br />This data is probably better seen in a graph. Here is the weight we should give to PECOTA, versus the weight we should give to PYTHAG, at each point:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Xtn5bp8dZEg/SA1W3EJoo8I/AAAAAAAABZI/avOp14FlkpU/s1600-h/pecotapythaggraph.bmp"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_Xtn5bp8dZEg/SA1W3EJoo8I/AAAAAAAABZI/avOp14FlkpU/s400/pecotapythaggraph.bmp" alt="" id="BLOGGER_PHOTO_ID_5191901449434997698" border="0" /></a>The weirdness at the 80-game mark continues to be an annoyance, but I think this gets the general point across. Right now, it's about a 90/10 PECOTA/PYTHAG split. The PYTHAG portion increases by about 5.5 percentage points after each 20-game stretch, until we're at a 63/37 split in mid-August. 
It's hard to look any further than that, because the remaining sample of games gets really small, but if this trend continues linearly, it ends up at almost an even split by October.<br /><br />Using the 90/10 split, we can create what I think are pretty accurate projected standings right now. These take into account each team's record so far; the W% column is its expected winning percentage the rest of the way. The next column is how many games PECOTA predicted it to win before the season, and the last column is the difference between its current win projection and PECOTA's original one.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Xtn5bp8dZEg/SA1c5UJoo9I/AAAAAAAABZQ/4avb17gRD7s/s1600-h/alstandings.bmp"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_Xtn5bp8dZEg/SA1c5UJoo9I/AAAAAAAABZQ/4avb17gRD7s/s400/alstandings.bmp" alt="" id="BLOGGER_PHOTO_ID_5191908085159470034" border="0" /></a>In the East, PECOTA still expects the Yankees to be the superior team the rest of the way, but Boston's current 3.5-game lead means both teams have about an equal chance of winning the division. The Rays have outscored opponents by 6 runs, so their expected W% hasn't dropped significantly, but their 8-11 start has lowered their expected win total by 2.5.<br /><br />The West has gone essentially according to plan; the part I'm interested in here is the Central. BP's PECOTA Playoff Odds Report paints an optimistic picture for the two current cellar dwellers, giving them a combined 80% chance of winning the division. That changes significantly when we look at it this way: the White Sox are 11-7, and their Pythagorean record is even better, at 12-6. This has raised their expected W% from .475 to .495, a very significant boost. 
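<br /><br />The projection arithmetic can also be sketched in Python. This is a hypothetical reconstruction, not the spreadsheet behind the tables: `pythag_weight` assumes the weight trend described above is linear (10% Pythag at 20 games, plus about 5.5 points per 20 games), and `projected_wins` assumes each projected total is the current win count plus the expected W% times the games left in a 162-game schedule.

```python
# Hypothetical sketch of the blended projection. The linear weight schedule
# approximates the trend described in the post; it is not the regression
# output itself.

def pythag_weight(games_played: float) -> float:
    """Approximate weight on Pythagorean W% after `games_played` games.

    Only meant for roughly the 20- to 120-game range the regressions covered.
    """
    return 0.10 + 0.055 * (games_played - 20) / 20


def blended_pct(pecota_pct: float, pythag_pct: float, w_pythag: float) -> float:
    """Expected rest-of-season W% as a weighted mix of PECOTA and Pythag."""
    return (1 - w_pythag) * pecota_pct + w_pythag * pythag_pct


def projected_wins(wins: int, losses: int, rest_pct: float, season: int = 162) -> float:
    """Wins so far plus expected wins over the remaining games."""
    remaining = season - wins - losses
    return wins + rest_pct * remaining


# White Sox example from the text: PECOTA .475, Pythagorean record 12-6,
# using the current 90/10 split.
chw_pct = blended_pct(0.475, 12 / 18, 0.10)
print(f"CHW rest-of-season W%: {chw_pct:.3f}")  # prints "CHW rest-of-season W%: 0.494"
```

The 90/10 blend lands at about .494 for the White Sox, within rounding of the .495 figure above. Under the linear reading, `pythag_weight(120)` comes out to 0.375, which matches the roughly 63/37 split at the 120-game mark.<br /><br />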
I am not smart enough to run a Monte Carlo simulation, but I'd guess that the division-title breakdown for CLE/DET/CHW would be around 40/30/25, with about 5% left over for the Royals and Twins.<br /><br />Now for the NL:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Xtn5bp8dZEg/SA1iDUJoo-I/AAAAAAAABZY/7XfGGWx-R5g/s1600-h/nlstandings.bmp"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_Xtn5bp8dZEg/SA1iDUJoo-I/AAAAAAAABZY/7XfGGWx-R5g/s400/nlstandings.bmp" alt="" id="BLOGGER_PHOTO_ID_5191913754516300770" border="0" /></a>The Braves and Phillies each started the year expected to win 86 games, and Atlanta is currently only one game ahead of Philadelphia in the standings. But the Braves have outscored their opponents by an impressive 35 runs, while the Phillies are only +2. As a result, Atlanta's projected record is about 3.5 games better.<br /><br />Not much has changed in the Central, except that the Cardinals' hot start has allowed them to vault ahead of the Reds. Impressively, the Pirates have been even worse than expected: they've been outscored by 42 runs already (allowing 6.7 R/G will have this effect).<br /><br />In the West, the Diamondbacks have gone from a dead heat with the Dodgers in the preseason to an expected 8-game cushion. Arizona has been the best team in the majors to date, outscoring its opponents 116-65. Meanwhile, the Dodgers have been somewhat unlucky, with their Pythagorean record (11-8) three games better than their actual record (8-11).<br /><br />Posted by Vegas Watch