Probability: Why Hitting Streaks are Impressive (And Why They’re Not)

As seems to happen every year, baseball was recently aflutter with a hitting streak chasing Joe DiMaggio’s legendary 56-game streak. This time, it was Jackie Bradley Jr.’s 29-game hitting streak, which ran from April 24 to May 25. During this stretch, Bradley collected 44 hits, including 20 XBHs, triple-slashed .415/.488/.783, and raised his batting average for the season from .222 to .350.

Then his teammate Xander Bogaerts followed it up with a 26-game hitting streak of his own from May 6 to June 2. During this time, Bogaerts collected 45 hits, 13 of which were XBHs, and triple-slashed .385/.419/.581.

Past Hitting Streaks of 25 Games or More

According to Baseball-Reference’s Play Index, Bradley and Bogaerts are two of the 39 batters to achieve hitting streaks of 25+ games since 2000, and they are the first pair of teammates to have streaks of that length start and end in the same season since Luis Castillo and Kevin Millar did it for the 2002 Marlins. Remarkably, they are also the 7th and 8th Red Sox players to achieve a hitting streak of at least 25 games since 2000, joining Nomar Garciaparra (26 games, 2003), Johnny Damon (29 games, 2005), Manny Ramirez (27 games, 2006), Victor Martinez (25 games, 2009), Dustin Pedroia (25 games, 2011), and David Ortiz (27 games, 2012-2013). Long streaks don’t just seem to happen in baseball every year; they seem to happen for the Red Sox alone nearly that often!

A large part of the compulsion to watch hitting streaks is how invested fans become in the chase of DiMaggio’s streak. There are breakaway coverage and live look-ins to at-bats, and constant updates on where the batter is in the lineup or in which inning he collects the most hits. Another part of the interest comes from the sheer skill needed to amass such a streak.

Determining the Probability of a Hitting Streak

Long streaks seem to indicate that a player is “locked in at the plate” or “seeing the ball well”. And that is likely true, to some extent. Just as players go through slumps when they’re hurt, they may also be in particularly good condition for a stretch, or face pitchers that they match up against well. There is conflicting evidence as to whether a “hot-hand effect” truly exists. But one thing is certain: whether or not a player is “hot”, the probability of a hitting streak reaching a given length drops quickly as that length grows.

To build a model of hitting streak probability, let us first imagine a simple coin flip. When flipping the coin there are two possibilities, heads and tails, each with equal likelihood. If we were to flip the coin four times, there would be a certain probability that at least one flip out of the four was heads (we’ll call this a “head series”). If we did this series of 4 flips 162 times (that is, 648 times total, looked at in sets of 4), there would be a certain probability that there would be a long stretch of consecutive head series.
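The coin-flip thought experiment above can be tried out with a quick simulation. This is only a rough sketch: the function names, the seed, and the number of simulated seasons are choices made here for illustration, not part of the original model.

```python
import random

def longest_run(outcomes):
    """Length of the longest stretch of consecutive True values."""
    best = current = 0
    for success in outcomes:
        current = current + 1 if success else 0
        best = max(best, current)
    return best

def simulate_season(p_head=0.5, flips_per_series=4, n_series=162, rng=random):
    """Longest run of 'head series' (at least one head in a set of flips)."""
    series = [
        any(rng.random() < p_head for _ in range(flips_per_series))
        for _ in range(n_series)
    ]
    return longest_run(series)

# Simulate many 162-series "seasons" and see how often a long run shows up.
rng = random.Random(2016)  # fixed seed so the run is repeatable
runs = [simulate_season(rng=rng) for _ in range(2_000)]
share = sum(run >= 25 for run in runs) / len(runs)
print(f"Share of seasons with a run of 25+ head series: {share:.3f}")
```

With a fair coin, a head series is very likely, so long runs are common here; the interesting cases come once the per-flip probability drops to baseball-like levels.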

To adapt this model to baseball, we’ll change the “flips” to “plate appearances”, the “series” to “games”, and set the probability to “hitting average”.

It is important to note the difference between “hitting average” and “batting average”. While batting average is an actual statistic that gets tracked, hitting average is a statistic that we’ve developed for the purpose of this model. A player is very likely to get at least 4 plate appearances in a game, but if they earn a walk in 2 of them, then they will only be credited with 2 at-bats, the statistic that gets used to calculate batting average. In other words, batting average (H/AB) tracks the probability that a player will get a hit when they are trying to get a hit, whereas hitting average (H/PA) tracks the probability that a player will get a hit in a given plate appearance, regardless of game context. For reference, the hitting average across the 4,367 player-seasons since 2000 with at least 300 PAs is .243 (the corresponding batting average is .270).
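To make the distinction concrete, here is a small sketch of the two statistics side by side. The season line used below (150 hits, 550 at-bats, 620 plate appearances) is made up purely for illustration.

```python
def batting_average(hits, at_bats):
    """Traditional batting average: hits per at-bat (H/AB)."""
    return hits / at_bats

def hitting_average(hits, plate_appearances):
    """The model's 'hitting average': hits per plate appearance (H/PA)."""
    return hits / plate_appearances

# Hypothetical season line, not a real player's numbers.
hits, at_bats, plate_appearances = 150, 550, 620

ba = batting_average(hits, at_bats)
ha = hitting_average(hits, plate_appearances)
print(f"Batting average: {ba:.3f}")
print(f"Hitting average: {ha:.3f}")
print(f"Gap:             {ba - ha:.3f}")
```

For this invented line the gap comes out close to 30 points, in the same neighborhood as the league-wide difference between .270 and .243.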

From here, we will determine the probability that a player gets at least one hit in any one game. To do this, we subtract the probability that they will get no hits in 4 plate appearances from 1. For example, if a player has a .250 hitting average, the probability that they will get 0 hits in 4 plate appearances is the probability of not getting a hit $(1 - 0.250 = 0.750)$ to the fourth power: $0.750^{4} \approx 0.3164$. Subtracting this from 1 gives $1 - 0.3164 = 0.6836$, which means that this player has a 68.36% chance of getting at least one hit in any one game.
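The per-game calculation can be written as a one-line function. This is a minimal sketch that assumes, as the model does, exactly 4 independent plate appearances per game; the function name is chosen here for illustration.

```python
def game_hit_probability(hitting_average, plate_appearances=4):
    """Chance of at least one hit in a game, treating PAs as independent."""
    p_no_hit = (1 - hitting_average) ** plate_appearances
    return 1 - p_no_hit

for avg in (0.200, 0.250, 0.300):
    print(f"{avg:.3f} hitting average -> {game_hit_probability(avg):.4f}")
```

Raising `plate_appearances` to 5 shows how much a spot near the top of the lineup could help a streak along, though the model in this article keeps it fixed at 4.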

What we want to figure out is the probability that they will get a long stretch of hits over a season. It is reasonable to note, however, that hitting streaks are not season sensitive. For instance, Ortiz’s 27 game streak occurred almost equally between the end of his 2012 season and the beginning of 2013. The longest recent streak, by Jimmy Rollins at 38 games, occurred mostly in 2005 and extended into 2006. But in the scope of modeling how likely it is to happen each year, we will limit the calculations to a 162 game stretch.

Estimating the probability that a 25-game hitting streak will occur at some point during a 162-game season is different from estimating the probability that a player will get a hit in 25 specific consecutive games. To use the analogy from before, the probability of 25 consecutive heads in 25 flips is different from the probability of a stretch of 25 consecutive heads somewhere among a total of 162 flips. Knowing this, we will use the work of French mathematician Abraham de Moivre from 1738 (before the invention of baseball, but it still applies!).
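One way to compute the chance of a run of successes somewhere inside a longer sequence is a small dynamic program that tracks the length of the current run. To be clear, this is a sketch of the same quantity, not de Moivre's closed-form expression and not necessarily the exact calculation behind the chart; the function name and the fixed 4 plate appearances per game are assumptions.

```python
def streak_probability(p_game, streak_len, n_games=162):
    """P(at least one run of >= streak_len successes in n_games trials)."""
    if streak_len <= 0:
        return 1.0
    if streak_len > n_games:
        return 0.0
    # state[j]: probability the current run is exactly j games long
    # and no run of streak_len has been completed so far.
    state = [0.0] * streak_len
    state[0] = 1.0
    achieved = 0.0
    for _ in range(n_games):
        nxt = [0.0] * streak_len
        nxt[0] = (1 - p_game) * sum(state)          # hitless game resets the run
        for j in range(streak_len - 1):
            nxt[j + 1] = p_game * state[j]          # a hit extends the run
        achieved += p_game * state[streak_len - 1]  # run reaches streak_len
        state = nxt
    return achieved

# Example: a .209 hitting average, 4 PAs per game, 29-game streak.
p_game = 1 - (1 - 0.209) ** 4
print(f"{streak_probability(p_game, 29):.6%}")
```

Because each game is treated as an independent trial with the same probability, this ignores days off, pinch-hit appearances, and any hot-hand effect.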

Charting the Probability of Hitting Streaks by Average

By running different hitting averages and streak lengths through the calculation, we can chart the data to see how likely it is for a player to get a hitting streak of a certain length. To use this chart with the more common batting average, simply subtract about .030 (30 points) from the batting average to estimate the hitting average.

To find the probability of a hitting streak, subtract 30 points from the player’s batting average to create their “hitting average” (or talent-level batting average). This normalizes for the effect of BBs and HBPs to turn BA into H/PA instead of H/AB.

Interesting things to note from the chart:

1. A player with a .200 hitting average is more likely to go an entire season without a hit (0.9997 probability of a hitting streak of at least 1 means 0.0003 of no hits) than Joe DiMaggio was to get a 56-game hitting streak (his .289 hitting average translates to a probability below 0.0001).
2. After about 20 games, it becomes substantially more unlikely for the streak to continue, and this accelerates after 25 games. This bears out what we observe in the fanbase and media attention to streaks.
3. Medium length streaks (~15 games) are much more likely and common than we probably realize as fans.

Why Hitting Streaks are Impressive

Analyzing Jackie Bradley Jr.’s Hitting Streak

With this chart, we see how improbable Jackie Bradley Jr.’s hitting streak really was. Even considering the fact that the streak greatly increased both his batting and hitting averages, his career hitting average through a little more than 1000 PAs since his debut in 2013 is only .209. Consulting our chart, we can see that there is approximately a 1-in-33,333 (0.003%) chance of a hitting streak of this length for someone of that profile. In real world terms, this is like flipping a coin 15 times and getting all heads.

However, in fairness to Bradley, his true talent may be much higher than this number. Throughout his minor league career, Bradley maintained a .249 hitting average. To split the difference, we’ll combine his minor league and major league stats to put his hitting average at .233. This changes the odds to about 1-in-4,750 (0.021%), or about 7 times more likely. Again, in real world terms, the probability of getting struck by lightning at some point in your life is 1-in-3,000, or about 1.5 times as likely. In some ways then, Bradley’s streak was like catching lightning in a bottle!

Analyzing Xander Bogaerts’s Hitting Streak

Almost completely counter to the Bradley example, however, Xander Bogaerts’s streak was pretty unsurprising. Bogaerts’s career hitting average is .271. A 26-game hitting streak with this hitting average has a probability of 0.6%, or 1-in-160. In real world terms, Xander was 3 times more likely to accumulate this hitting streak than he would be to get dealt a flush in 5-card poker (about 1-in-505, or 0.2%).

Since 2000, there have been 1477 player-seasons with a player having a hitting average of at least .270. Of these, only 5 have contained hitting streaks of at least 26 games, for a percentage of 0.34%. If we extend this to hitting averages of at least .260 (since Bogaerts’s is currently a little higher due to the streak itself), we see that the percentages line up almost exactly, with 10 instances, coming to 0.64%.

Essentially, a hitting streak like Xander Bogaerts’s seems to happen almost every year because at any time there are about 100 players in the league with hitting averages similar to his. At roughly 0.6% apiece, we expect about 0.6 of them per season, or a little better than one every other year, to have a 26-game hitting streak.
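The league-wide arithmetic can be sketched in a few lines. The inputs are the rounded figures from the text (about 100 comparable players, about a 0.6% chance each); treating the players as independent of one another is an added simplifying assumption.

```python
n_players = 100    # rough count of comparable hitters, from the text
p_streak = 0.006   # per-player chance of a 26-game streak, from the text

# Expected number of such streaks in one season.
expected_streaks = n_players * p_streak

# Chance that at least one of the players pulls it off,
# assuming the players' seasons are independent.
p_at_least_one = 1 - (1 - p_streak) ** n_players

print(f"Expected streaks per season: {expected_streaks:.2f}")
print(f"Chance of at least one:      {p_at_least_one:.1%}")
```

Under these assumptions the chance of seeing at least one such streak in a given season lands a bit under one-half, which fits the sense that these streaks turn up every year or two.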

Why Hitting Streaks Aren’t Impressive

The unlikeliness of a hitting streak like Jackie Bradley Jr.’s simultaneously shows both why hitting streaks are impressive and why they are not. For someone like Bradley to have a hitting streak that long against such great odds, it would seem to require great skill. But the very fact that he does not possess great (relative) skill is what makes the odds so long. Looking at someone like Xander Bogaerts makes this easy to see: the much greater likelihood of his hitting streak diminishes the impressiveness of the feat.

The case most frequently used against hitting streaks states that a hitter could amass a streak (of any length) simply by triple-slashing .200/.200/.200. This would obviously hurt his team, but the hitting streak would be maintained. While this argument is valid, nothing quite close to it has happened in recent memory.

Since 2000, the worst batting average over a streak of at least 25 games was Casey Blake’s .317, the worst OBP was Nolan Arenado’s .383, and the worst SLG was Willy Taveras’s .426. On its own, each of these is more than acceptable production from any level of player, so the argument that a player can hurt their team during a hitting streak, while true in theory, doesn’t hold up to the evidence.

The argument that does hold weight, however, is that a hitting streak does not imply that the team itself is performing well, even though the player might be. The aggregate record across the 39 instances is 587-493, for a .544 winning percentage. This prorates to about 88 wins for the team in a year, certainly a respectable season. But this number is skewed by some very good stretches from teams, including the 2016 Red Sox. In a little over a third of the cases (15/39), the team had a .500 record or worse. These include some of the more high-profile streaks, such as Chase Utley’s 35-game streak in 2006 (the Phillies went 17-18) and Andre Ethier’s 30-game streak in 2011 (the Dodgers went 13-17). So even in some of the hitting streaks with the most media attention, team success is not guaranteed. This could be due to any number of reasons, including opposing teams overexerting to stop the team in question, good players playing on bad teams, etc.

Hitting Streaks: the Ultimate Verdict

Summing it all up, there remains conflicting evidence as to whether hitting streaks are statistically impressive. On the one hand, they are probabilistically difficult for any one hitter to accomplish. But at the same time, there can be over 100 active players at any one time ready to go on a long streak. When players like Jackie Bradley Jr. go on long streaks, it causes us to re-evaluate what their baseline skills are. But if a player like Bradley can go on such a long tear, doesn’t this also mean that it might not be that impressive? Finally, how much does the team’s outcome during the stretch matter? There are plenty of teams that have fallen on their faces despite the success of an individual.

Ultimately, it seems that this will be a debate that will continue indefinitely. There are too many variables to definitively state whether or not the hitting streak is a statistically impressive feat.

Readers, what do you think? Is the hitting streak statistically impressive, or just a stretch of good luck? What factors do you use to make your decision? Which hitting streaks of the past few years have been your favorite to watch? Which player do you think is likely to pull off the next long streak? Leave your thoughts in the comments below or on Twitter @SaberBallBlog.