I like to think that I understand the concept of "small sample size" fairly well. I'm aware that we're only about 1/8 of the way into the season, and it's too early to get caught up in the order of the current standings.
I try to put things in perspective by looking at BP's PECOTA Playoff Odds report. See, that's better. The Orioles only have a 1.25% chance of reaching the playoffs. The world makes more sense now.
But I find even this analysis lacking at this point in the season. If you look at the "Pct3" column, you will notice that each team's win percentage is the same as it was in the preseason. Now, I don't think we should expect the Tigers to win 31.6% of their games the rest of the way, but 56.2% seems a little high, doesn't it? It seems like there should be some kind of happy medium.
So, I decided to try to determine what this "happy medium" is. I went and took the PECOTA projections from the past five seasons, along with each team's Pythagorean record through their first 20 games. I used these two as the independent variables in a regression, with the dependent variable being each team's winning percentage in games 21-162. I did this again at the 40-, 60-, 80-, 100-, and 120-game marks. (Note: This took forever.) Here is what I came up with:
This is interesting: notice that, except for the 80-game mark (which is just weird), the PYTHAG variable slowly rises as more games are played. This is what we would expect. Looking at the P-value, it doesn't become clearly significant until after 100 games, but I think it'd be hard to argue that it's not significant before that. I'm pretty confident that if I did this for the last 10 or 15 seasons, rather than just the last 5, the earlier marks would come out clearly significant too.
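For anyone who wants to repeat the exercise, here is a minimal sketch of the regression setup described above. It assumes plain ordinary least squares with an intercept, which the post does not confirm, and it solves the two-predictor case in closed form so no libraries are needed. The function name and argument names are hypothetical, and no real team data is included.

```python
def fit_happy_medium(pecota_wpct, pythag_wpct, rest_wpct):
    """Regress rest-of-season W% on preseason PECOTA W% and to-date Pythag W%.

    Each argument is a list with one entry per team-season.
    Returns (intercept, pecota_coef, pythag_coef).
    Assumption: ordinary least squares with an intercept.
    """
    n = len(rest_wpct)
    mean_p = sum(pecota_wpct) / n
    mean_q = sum(pythag_wpct) / n
    mean_y = sum(rest_wpct) / n

    # Centered sums of squares and cross-products.
    s_pp = sum((p - mean_p) ** 2 for p in pecota_wpct)
    s_qq = sum((q - mean_q) ** 2 for q in pythag_wpct)
    s_pq = sum((p - mean_p) * (q - mean_q)
               for p, q in zip(pecota_wpct, pythag_wpct))
    s_py = sum((p - mean_p) * (y - mean_y)
               for p, y in zip(pecota_wpct, rest_wpct))
    s_qy = sum((q - mean_q) * (y - mean_y)
               for q, y in zip(pythag_wpct, rest_wpct))

    det = s_pp * s_qq - s_pq ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")

    # Solve the 2x2 normal equations for the two slopes.
    pecota_coef = (s_qq * s_py - s_pq * s_qy) / det
    pythag_coef = (s_pp * s_qy - s_pq * s_py) / det
    intercept = mean_y - pecota_coef * mean_p - pythag_coef * mean_q
    return intercept, pecota_coef, pythag_coef
```

Running this once per checkpoint (20, 40, 60 games and so on) would give one pair of coefficients per checkpoint. It does not compute P-values; those would need a statistics package.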
This data is probably better seen in a graph. Here is the weight we should give to PECOTA, versus the weight we should give to PYTHAG, at each point:
The weirdness at the 80-game mark continues to be an annoyance, but I think this gets the general point across. Right now, it's about a 90/10 PECOTA/PYTHAG split. The PYTHAG portion increases by about 5.5 percentage points after each 20-game stretch, until we're at about a 63/37 split in mid-August. It's hard to look any further than that, because you start trying to predict a really small sample, but if this trend continues linearly, it looks like it ends up at almost an even split in October.
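The rule of thumb in this paragraph can be written as a one-line function. This is only a sketch of the rough linear trend (10% at the 20-game mark, about 5.5 points more per 20 games), not the fitted regression coefficients, and the function name is hypothetical.

```python
def pythag_weight(games_played):
    """Approximate weight on Pythag W% after `games_played` games.

    Assumption: a straight line through 10% at 20 games with a slope of
    5.5 percentage points per 20 games, as eyeballed in the post.
    """
    return 0.10 + 0.055 * (games_played - 20) / 20


# The PECOTA weight is whatever is left over.
for games in (20, 60, 120, 162):
    w = pythag_weight(games)
    print(f"{games:>3} games: PECOTA {1 - w:.1%}, PYTHAG {w:.1%}")
```

Anything past the 120-game mark is extrapolation, since the regressions stop there.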
Using the 90-10 split, we can create what I think are pretty accurate projected standings right now. These take into account the team's record so far; the W% column is their expected winning percentage the rest of the way. The next column is how many games PECOTA predicted them to win prior to the season, and then finally the difference between their current win prediction and PECOTA's original one.
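As a rough sketch of how each row of those standings can be built: blend PECOTA and Pythag at 90/10 to get a rest-of-season W%, apply it to the remaining games, and add the wins already banked. The function names are hypothetical, and the Pythagorean exponent of 2 is an assumption, since the post does not say which exponent was used. The sample call uses the White Sox figures quoted later in the post (11-7 actual, 12-6 Pythag, .475 PECOTA).

```python
def pythag_wpct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean W% from runs scored and allowed (exponent is an assumption)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def projected_wins(wins, losses, pecota_wpct, pythag, pythag_w=0.10,
                   season_games=162):
    """Return (rest-of-season W%, projected final win total).

    Actual wins are kept as-is; only the remaining games use the blend.
    """
    rest_wpct = (1 - pythag_w) * pecota_wpct + pythag_w * pythag
    remaining = season_games - wins - losses
    return rest_wpct, wins + rest_wpct * remaining


# White Sox: 11-7 so far, 12-6 Pythag record, .475 preseason PECOTA W%.
rest, total = projected_wins(11, 7, 0.475, 12 / 18)
print(f"rest-of-season W%: {rest:.3f}, projected wins: {total:.1f}")
```

A straight 90/10 blend gives the White Sox roughly .494 the rest of the way, close to the .495 quoted in the post; the small gap presumably comes from the actual regression coefficients not being exactly 90/10.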
In the East, PECOTA still expects the Yankees to be the superior team the rest of the way, but Boston's current 3.5 game lead means both teams have an equal chance of winning the division. The Rays have outscored opponents by 6 runs, so their expected W% hasn't been significantly decreased, but their 8-11 start has dropped their expected W total by 2.5.
The West has gone essentially according to plan; the part I'm interested in here is the Central. BP's PECOTA Playoff Odds Report paints an optimistic picture for the two current cellar dwellers, expecting them to win the division 80% of the time. That changes significantly when we look at it this way: the White Sox are 11-7, and their Pythagorean record is even better, at 12-6. This has caused their expected W% to rise from .475 to .495, a very significant boost. I am not smart enough to run a Monte Carlo simulation, but I'd guess that the division champ breakdown for CLE/DET/CHW would be around 40/30/25, with about 5% left over for the Royals and Twins.
Now for the NL:
The Braves and Phillies started the year each expected to win 86 games, and Atlanta is currently only one game ahead of Philadelphia in the standings. But the Braves have outscored their opponents by an impressive 35 runs, while the Phillies are only +2. This causes Atlanta's predicted record to be about 3.5 games better.
Not much has changed in the Central, except that the Cardinals' hot start has allowed them to vault ahead of the Reds. Impressively, the Pirates have been even worse than expected: they've been outscored by 42 runs already (allowing 6.7 R/G will have this effect).
In the West, the Diamondbacks have gone from a dead heat with the Dodgers in the preseason to having an expected 8 game cushion. Arizona has been the best team in the majors to date, outscoring their opponents 116-65. Meanwhile, the Dodgers have been somewhat unlucky, with their Pythag record (11-8) being three games better than their actual record (8-11).
Monday, April 21, 2008
Finding The Happy Medium
18 comments:
This post is the kind of stuff to shove up Rick Reilly's ass. Good stuff.
Good work. It's interesting how pecota is still weighted much more heavily even late into the season.
I'm surprised pct3 is the same. On the bottom of the pecota adjusted postseason odds page, it says they are taking the aeqr pythag wpct and regressing that toward the pecota projections, but they obviously aren't.
One more thing - the pecota adjusted playoff odds page says it is regressing performance toward the pecota predicted winning percent, then running a simulation of the rest of the season using the log5 method to determine who wins the game, but the pecota projected winning percent is based on simulations of the entire season. So the pecota projected winning percent is not an approximation of a team's adjusted winning percent (like for example, the kenpom.com pythag wpct), but an approximation of their winning percent given their schedule. So it would then assume the cubs and red sox (both projected to go 91-71) are equal by pecota, while pecota actually finds the red sox better, but having a harder schedule.
Excellent post. Nothing else needs to be said
2nd to evil monkey. muchas gracias
Check-plus.
Excellent, excellent post. Thanks.
"I am not smart enough to run a Monte Carlo simulation."
Hogwash. If you are smart enough to do regression analyses, you are twice smart enough to do a Monte Carlo. Take your projected wpct and run through the remaining schedule an arbitrarily large number of times. (BP does one million.) The log5 method is a quick and effective way to get a result for each game based on the two teams' wpct.
All you need:
1) A database with the 2008 MLB schedule.
2) Your projected wpct for each team for the remainder of the schedule.
3) A program to read the database and your projections, run the simulations, and report the results.
4) A snack to munch on while the program runs.
If you are saying you are not smart enough for #3, I still say hogwash. But if you'll give me (or point me to) #1, I'll make you #3.
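A minimal sketch of item #3, assuming the schedule and projections have already been loaded into plain Python structures. The function and argument names are hypothetical, and the simulation is simplified: it treats all listed teams as one group competing for first place, ignores home field, and splits ties evenly. A real version would group teams by division and handle the wild card.

```python
import random
from collections import Counter


def log5(a, b):
    """Probability that a team with W% `a` beats a team with W% `b`.

    Both inputs are assumed to be strictly between 0 and 1.
    """
    return (a - a * b) / (a + b - 2 * a * b)


def simulate_first_place(current_wins, rest_wpct, schedule,
                         n_sims=10_000, seed=0):
    """Estimate how often each team finishes with the most wins.

    current_wins: dict of team -> wins so far
    rest_wpct:    dict of team -> projected W% for the remaining games
    schedule:     list of (team_a, team_b) pairs, one per remaining game;
                  every team named here must appear in both dicts
    """
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        wins = dict(current_wins)
        for team_a, team_b in schedule:
            p_a = log5(rest_wpct[team_a], rest_wpct[team_b])
            winner = team_a if rng.random() < p_a else team_b
            wins[winner] += 1
        best = max(wins.values())
        leaders = [team for team, w in wins.items() if w == best]
        for team in leaders:
            titles[team] += 1 / len(leaders)  # split ties evenly
    return {team: titles[team] / n_sims for team in current_wins}
```

For example, log5 gives a .600 team about a 69% chance of beating a .400 team in a single game. One million runs, as BP does, would take a while in pure Python, so the default here is much smaller.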
Awesome.
Now, how do those projected win totals compare to current online odds? Can we make some money by taking advantage of others' overreactions?
Fastness, Evilmonkeycma, DCThrowback, Passive Voice, and Bobby S- Thanks for the insightful comments. (I'm kidding, I'm glad you like the post.)
Sky- Looking at Matchbook, the ones that jump out are ATL O85.5 +125, and KC U72.5 +150. I'm not sure where else win totals are still posted.
Skoor- This would be fun. My only concern is that the PECOTA numbers already account for SOS, so teams that play hard schedules would be getting penalized twice. I could do #1, although it would take a while unless there is something like that freely available.
You should be using actual record, not Pythag. If you are trying to predict what the actual record of the team is going to be at the end of the season at this stage, just take their actual record and pro-rate the rest of the season using whatever projection system you want. Pythag tells us how many games the team should have won up until this point, but frankly, we don't care how many games the team should have won since we KNOW how many games the team has won. Whether a team is 21-1 by winning all one-run games or 21-1 by blowing out their opponents, I don't care. I know the team won 21 games. I don't need to correct for that, nor should I. The proration of the projection system over the next 140 games is going to take care of that regression to the mean in luck.
"These take into account the team's record so far; the W% column is their expected winning percentage the rest of the way."
Actual record is what is used for games already played. The Pythag record is only used to help determine their winning percentage from here on out.
"My only concern is that the PECOTA numbers already account for SOS, so teams that play hard schedules would be getting penalized twice."
That's a concern of mine, but it's not the only one. You are also using Pythag based on actual RS/RA, rather than AEQR/AEQRA (expected RS/RA via the batting lines produced, adjusted by SOS). Both of these concerns relate to #2 on my list of things needed to do a Monte Carlo sim.
Davenport's solution is to regress the AEQR/AEQRA Pythag to .500, then take a random sample from a normal distribution around the regressed WPCT. The flaw in that, IMHO, is that teams do not regress to .500. Your work above indicates that teams regress strongly to their PECOTA projections. And that's exactly what Davenport does in his PECOTA-adjusted Playoff Odds. Perhaps we should find out if he is using regression values similar to the ones you derived.
"That's a concern of mine, but it's not the only one. You are also using Pythag based on actual RS/RA, rather than AEQR/AEQRA (expected RS/RA via the batting lines produced, adjusted by SOS)."
I wanted to use that, but figured I wouldn't be able to find it for the past data.
What I did is essentially the same thing as combining Davenport's original Playoff Odds, and his PECOTA Playoff odds. For example, the original odds have the Diamondbacks winning 95.4 games, and PECOTA has them winning 90.9 games. My prediction of 93.3 comes down right in between- it's higher than (90.9+(95.4-90.9)*.1) because he is regressing to .500 to get that 95.4 number; if he was using straight Pythag or straight AEQR/AEQRA, I'd assume they would be expected to win more than 95.4 games at this point.
Interesting article.
It would be a nightmare to compile all of the necessary data, but I wonder if weighting the pythagorean projections based upon the schedule a given team has already played versus the schedule a given team has yet to play would provide more accurate predictions.
For instance, the Twins have played the Royals 6 times thus far. In those games, they scored a total of 19 runs and allowed 15. The pythagorean expectation would project them to win about 62% of their remaining 12 games against the Royals. Given that they have 140 games left, you could give .62 a weight of 12/140. Rinse and repeat for the rest of the teams, applying the overall pythagorean projection towards any teams that they haven't played. Then, sum up the respective pythagorean shares to determine a weighted pythagorean expectation.
Then again, that's a lot of work. In any case, great article.
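The weighting described in this comment could be sketched as follows. The function names and input layout are hypothetical, and the Pythagorean exponent of 2 is an assumption (it reproduces the "about 62%" figure for 19 runs scored and 15 allowed). Opponents not yet faced fall back to the team's overall Pythag W%.

```python
def pythag_wpct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean W% from runs scored and allowed (exponent is an assumption)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def schedule_weighted_pythag(overall_rs, overall_ra, head_to_head, remaining):
    """Pythag W% weighted by the remaining schedule.

    head_to_head: dict of opponent -> (runs scored, runs allowed) so far
    remaining:    dict of opponent -> games left against that opponent
    """
    total_games = sum(remaining.values())
    overall = pythag_wpct(overall_rs, overall_ra)
    weighted = 0.0
    for opponent, games in remaining.items():
        if opponent in head_to_head:
            wpct = pythag_wpct(*head_to_head[opponent])
        else:
            wpct = overall  # no games played yet against this opponent
        weighted += wpct * games / total_games
    return weighted


# Twins vs. Royals so far: 19 runs scored, 15 allowed.
print(f"{pythag_wpct(19, 15):.3f}")
```

With 12 of the Twins' 140 remaining games against the Royals, that single matchup would contribute its Pythag W% times 12/140 to the weighted total.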
"It would be a nightmare to compile all of the necessary data, but I wonder if weighting the pythagorean projections based upon the schedule a given team has already played versus the schedule a given team has yet to play would provide more accurate predictions."
I think it probably would be more accurate. I also think it'd probably take hundreds of hours to do that. I'd like to incorporate some kind of schedule component, but have yet to figure out some kind of reasonable way to do it.
If you have the MLB schedule in a database, it is not that hard to implement chuckdickens's approach. But would it really be more predictive? The sample size for a single team v team matchup is vanishingly small, especially for interdivisional (or even interleague) matchups, and hence especially prone to results very deviant from the mean. Would the deviations even out over an entire schedule? Perhaps, but isn't that a tacit admission that the value of Pythag prediction is in the aggregate?
What I would want to do if we incorporated strength of schedule is take the PECOTA predictions and figure out how difficult each team's schedule has been so far. Then, incorporate it either as a separate variable, or to help figure out how much weight to give to Pythag W%.
Wish I could add something as lucid as those commenters above, but all I have is a request to keep us updated at the 40 game mark.
Great work.