Opening Day and the first weekend of baseball are over, but the season has just begun. It’s been half a year since regular-season baseball was played, and since humans love to draw meaning from events regardless of whether it makes sense to do so, it’s a natural response to make predictions about the rest of the year from these few games.
Unfortunately, and frustratingly, the games that have been played tell us essentially nothing about what the rest of the season will look like. Just about anything can happen in a three- or four-game stretch. In a 162-game season, we need more information before we can start to accurately evaluate a team.
How much more information? That’s the trick, isn’t it? Famously, the Royals started 2009 at 18-11 and proceeded to win just 35% of their remaining 133 games. More recently on these digital pages, Max wrote that historical trends suggested the Royals’ hot start in April of 2021 probably meant that the team would be good. They weren’t: after playing at a .630 winning percentage in their first 20 games, they finished with a .457 winning percentage overall.
To help answer this question, let’s look at more data: a whole year’s worth, to be precise. I looked at the standings on the first day of every month of the 2022 season and compared each team’s winning percentage on that date to its final winning percentage. The result is a handful of graphs. First up are the May 1 standings. How close are they to the final standings?
Standings on May 1, 2022
The answer is... not as close as you might think.
For those of you unfamiliar with r squared, it’s basically a number that describes how much of the variation in one variable can be explained by another variable. An r squared of 1.0 means that 100% of the variance in one variable can be explained by the other, and an r squared of 0.5 means that 50% of it can be.
On May 1, 2022, the relationship between a team’s winning percentage and its final winning percentage produced an r squared of only 0.338; in other words, about 66% of the variance in teams’ final winning percentages was left unexplained by their records at that point. Without getting too far into the weeds here (a team’s record is itself a function of the talent on the team, which is the same talent that drives those other factors, and so on and so on), it’s clear that there are just a lot of weird outliers.
The biggest outlier is the Cincinnati Reds, whose .136 winning percentage on May 1 was nearly .250 below their final winning percentage of .383. Other big outliers include the White Sox (.381 in May vs. .500 at the end of the year), the Marlins (.571 in May vs. .426 at the end of the year), the Phillies (.478 in May vs. .537 at the end of the year), the Orioles (.364 in May vs. .512 at the end of the year), and the Guardians (.455 in May vs. .568 at the end of the year).
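For readers who want to check this kind of number themselves, here is a minimal sketch of the calculation. It assumes the r squared in these graphs is the usual one for a straight-line fit with an intercept, which equals the square of the Pearson correlation between the two columns. The function names `r_squared` and `unexplained_share` are hypothetical, not from any particular tool.

```python
def r_squared(xs: list[float], ys: list[float]) -> float:
    """Square of the Pearson correlation between xs and ys.

    For a least-squares straight line with an intercept, this equals the
    share of the variance in ys explained by xs.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists of at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and squared deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r squared is undefined when a variable never varies")
    return sxy * sxy / (sxx * syy)


def unexplained_share(r2: float) -> float:
    """Share of the variance NOT explained by the other variable."""
    return 1.0 - r2
```

To reproduce a figure like the ones in this piece, you would pass all 30 teams’ winning percentages on a given date as `xs` and their final winning percentages as `ys`. The leftover share is just `1 - r2`, so an r squared of 0.338 leaves about 0.662, or roughly 66%, unexplained.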
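Since both the May 1 and final percentages are listed above, the size of each miss is a simple subtraction. This short sketch recomputes the gaps from those six pairs (no other data is assumed) and ranks them from largest to smallest.

```python
# May 1 and final 2022 winning percentages, exactly as listed in the text above
teams = {
    "Reds": (0.136, 0.383),
    "White Sox": (0.381, 0.500),
    "Marlins": (0.571, 0.426),
    "Phillies": (0.478, 0.537),
    "Orioles": (0.364, 0.512),
    "Guardians": (0.455, 0.568),
}


def gap(may_pct: float, final_pct: float) -> float:
    """Final winning percentage minus May 1 winning percentage."""
    return final_pct - may_pct


# Rank teams by the size of the miss, largest first, ignoring direction
ranked = sorted(teams, key=lambda name: abs(gap(*teams[name])), reverse=True)
for name in ranked:
    print(f"{name:<10} {gap(*teams[name]):+.3f}")
```

The Reds’ gap comes out to .247, the largest of the six, and the Phillies’ to .059, the smallest. The Marlins are the only team in the group whose gap is negative, since they finished worse than they started.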
Is June better? The answer is yes, a lot better:
Standings on June 1, 2022
By June 1, the standings are still not destiny: at that point, seven teams had a winning record and finished with a losing one, or vice versa. But the flip side is also true: on June 1, 23 of 30 teams (77%) already had the type of record, over or under .500, that they would finish the season with.
Additionally, by June 1, the gap between a team’s current and final winning percentages narrows considerably. Last year, the median team’s June 1 winning percentage differed from its final mark by .050, which translates to about eight wins one way or the other over a full season. Eight wins is certainly enough to make or break a playoff run, but after two months of baseball you know a heck of a lot about a team. Indeed, the r squared between teams’ June 1 records and their end-of-season records is 0.587. In other words, a team’s June 1 record accounts for nearly 60% of the variance in its final record.
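The 23-of-30 count boils down to comparing which side of .500 each team is on at two points in time. Here is a minimal sketch, assuming each team is given as a pair of winning percentages (checkpoint, final) and assuming a record of exactly .500 counts as its own category; how the original count handled .500 teams isn’t stated, and the function names are hypothetical.

```python
def side_of_500(pct: float) -> int:
    """+1 for a winning record, -1 for a losing record, 0 for exactly .500."""
    return (pct > 0.5) - (pct < 0.5)


def count_same_side(pairs: list[tuple[float, float]]) -> tuple[int, int]:
    """Count teams whose final record lands on the same side of .500 as at the checkpoint.

    Each pair is (winning pct at the checkpoint, final winning pct).
    Returns (teams on the same side, total teams).
    """
    same = sum(1 for now, final in pairs if side_of_500(now) == side_of_500(final))
    return same, len(pairs)
```

With 23 matches among 30 teams, 23 / 30 ≈ 0.767, which rounds to 77%.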
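Turning a winning-percentage gap into wins is a single multiplication by the number of games. Here is a small sketch, assuming a 162-game season and, for the median, a list of (June 1, final) winning-percentage pairs for every team; the 2022 pairs themselves aren’t reproduced here, and the helper names are hypothetical.

```python
from statistics import median

SEASON_GAMES = 162


def pct_gap_to_wins(pct_gap: float, games: int = SEASON_GAMES) -> float:
    """Convert a winning-percentage gap into wins over a season of `games` games."""
    return pct_gap * games


def median_abs_gap(pairs: list[tuple[float, float]]) -> float:
    """Median absolute difference between checkpoint and final winning percentage.

    Each pair is (winning pct at the checkpoint, final winning pct).
    """
    return median([abs(final - now) for now, final in pairs])
```

`pct_gap_to_wins(0.050)` comes to about 8.1, which is where the “about eight wins” figure comes from. Using the absolute value before taking the median treats a team that overshot its June 1 pace the same as one that fell short.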
Standings on July 1, 2022
As you might expect, by July broad results are mostly locked in. Last year, only two teams with a winning record on July 1 ended up with a losing record, and only two teams with a losing record on July 1 ended with a winning record. Furthermore, the r squared is over 0.75. That is not as big a jump as the one from May to June, but as the number of games remaining shrinks, it becomes harder and harder for a team to deviate from its current winning percentage.
So, what does this mean for the Royals this season? When can we judge how good they are? Late-season surges are always a possibility, as are late-season slumps: the Royals carried a winning record past the midway point in 2016 and 2017 only to see it slip away in the second half of the year.
However, based on data from across the league, we won’t have a solid grasp of how good the Royals will be until June. There’s just too much noise in the data before then, and while it feels good to complain about, say, getting swept in the first series of the year, the first three games of the year just aren’t very predictive on their own.
If you want the best idea of where the Royals will probably end up before we get to June, projection systems are still your safest bet. And those projection systems all say the same thing: the 2023 Royals are probably going to be really bad. It’ll take a few months of data to start proving those projection systems wrong, if they are wrong. The baseball season is a marathon. Expect big variance early in the year; it’s part of the beauty of the game.
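One way to see why records get sticky as the season goes on: a team’s final winning percentage is a games-weighted average of its record so far and its record the rest of the way. The sketch below assumes a 162-game season; the function name and the example numbers are hypothetical illustrations, not 2022 data.

```python
SEASON_GAMES = 162


def final_pct(current_pct: float, games_played: int, rest_pct: float,
              season_games: int = SEASON_GAMES) -> float:
    """Final winning pct as a weighted average of the record so far and the rest of the way.

    The weight on the current record is games_played / season_games, so it
    grows as the season goes on and the number of remaining games shrinks.
    """
    if not 0 <= games_played <= season_games:
        raise ValueError("games_played must be between 0 and season_games")
    weight_now = games_played / season_games
    return weight_now * current_pct + (1 - weight_now) * rest_pct
```

For example, a hypothetical team playing .600 ball that slumps to .450 the rest of the way finishes at .475 if the slump starts after 27 games, but at .525 if it starts after 81. The same slump sinks the first team below .500 and leaves the second one above it.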