Written By: Jack Hinde
Edited By: Jovan Popovic & Matt Fuda
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".
It’s no secret that batters are striking out at an all-time high rate. Each MLB season in the years leading up to 2020 set a new record for total strikeouts. Although the 2020 season was shortened to 60 games, its strikeout rate, extended over a 162-game season, would have produced the second-highest total ever, behind only 2019. Here is a look at MLB strikeout totals since the expansion to 30 teams.
Much has been said about these rising totals, with many claiming that strikeouts are playing a major part in killing the game of baseball. It isn’t hard to understand the sentiment (there’s no entertainment in watching your favourite player make that long walk back to the dugout), but could there be value in the way the game is evolving?
The bottom line is that nobody ever wants to make an out, but is striking out the worst way to do it? It’s tough to deny that it’s the most embarrassing, but could a situation exist in which this is actually the lesser of the evils?
To evaluate this, this article analyzes a statistic called run expectancy. Run expectancy is based on the fact that any moment in a baseball game can be described by its game state. The number of outs and the combination of occupied and empty bases are part of the game state; the inning, the score and the ball-strike count are its other attributes, though they will not be used in this analysis. For example, on October 14, 2015, the Toronto Blue Jays were playing the Texas Rangers in Game 5 of the ALDS when Jose Bautista came up in the following game state: bottom of the seventh inning, score tied 3-3, runners on first and third base, and a 1-1 count. What happened next is remembered as perhaps one of the most exciting plays in MLB history.
What is important about game states, specifically the combinations of baserunners and outs (referred to from this point on as base/out states), is that there are finitely many of them. Each of the three bases is either occupied or empty, giving 2 × 2 × 2 = 8 runner configurations, and a batter can come up with 0, 1 or 2 outs, so there are 3 × 8 = 24 base/out states in total.
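As a quick check on the count, the Python sketch below enumerates every base/out state. It uses the three-character 0/1 encoding of the bases (first, second, third) that the article adopts below; the function name `base_out_states` is our own and does not come from Marchi, Albert and Baumer (2019).

```python
from itertools import product


def base_out_states():
    """Return every (bases, outs) pair, with bases encoded as e.g. "101"."""
    states = []
    for outs in range(3):  # a batter can come up with 0, 1 or 2 outs
        # Each base (first, second, third) is either empty "0" or occupied "1".
        for occupied in product("01", repeat=3):
            states.append(("".join(occupied), outs))
    return states


states = base_out_states()
print(len(states))  # 24, i.e. 3 out counts x 8 runner configurations
```

Keeping only the states with `outs < 2` leaves 16, the number of states the analysis later focuses on.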
Using Retrosheet’s play-by-play data, one can look at each of these base/out states and examine what happened after it over the rest of the inning. If we focus on how many runs were scored from that point on, we can average the number of runs actually scored to get the state’s run expectancy: the expected number of runs a team will score in the rest of the inning after entering the state. Using the play-by-play data from the 2018, 2019 and 2020 MLB seasons, we can compute all 24 run expectancies and arrange them in a run expectancy matrix.
We adopt the convention that 0 indicates an empty base and 1 an occupied base, with the bases ordered first, second, then third. For example, 011 indicates runners on second and third base. The code used to compute these run expectancy values can be found in Marchi, Albert and Baumer (2019) [1].
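The article relies on the R code in Marchi, Albert and Baumer (2019) [1] for this step. As a rough illustration only, the Python sketch below averages, for each base/out state, the runs scored from the start of a plate appearance to the end of its half-inning. The input layout (dictionaries with `inning_id`, `bases`, `outs` and `runs` keys) is hypothetical rather than Retrosheet’s actual format, and the toy plays are made up. A full implementation would likely also need to handle details this sketch ignores, such as half-innings that end early on a walk-off.

```python
from collections import defaultdict


def run_expectancy(plays):
    """Mean runs scored from each base/out state to the end of the half-inning.

    Each play is a dict with hypothetical keys:
      inning_id: identifies the half-inning
      bases:     three-digit string such as "011" (first, second, third)
      outs:      outs before the play (0, 1 or 2)
      runs:      runs that scored on the play
    Plays must be in game order within each half-inning.
    """
    # Total runs scored in each half-inning.
    inning_total = defaultdict(int)
    for p in plays:
        inning_total[p["inning_id"]] += p["runs"]

    scored_before = defaultdict(int)  # runs scored earlier in the half-inning
    run_sum = defaultdict(float)
    count = defaultdict(int)
    for p in plays:
        state = (p["bases"], p["outs"])
        # Runs from the start of this plate appearance to the end of the half-inning.
        run_sum[state] += inning_total[p["inning_id"]] - scored_before[p["inning_id"]]
        count[state] += 1
        scored_before[p["inning_id"]] += p["runs"]
    return {state: run_sum[state] / count[state] for state in run_sum}


# Made-up plays from two half-innings, for illustration only.
plays = [
    {"inning_id": "A", "bases": "000", "outs": 0, "runs": 0},  # single
    {"inning_id": "A", "bases": "100", "outs": 0, "runs": 2},  # two-run home run
    {"inning_id": "A", "bases": "000", "outs": 0, "runs": 0},  # out
    {"inning_id": "A", "bases": "000", "outs": 1, "runs": 0},  # out
    {"inning_id": "A", "bases": "000", "outs": 2, "runs": 0},  # out
    {"inning_id": "B", "bases": "000", "outs": 0, "runs": 0},  # out
    {"inning_id": "B", "bases": "000", "outs": 1, "runs": 0},  # out
    {"inning_id": "B", "bases": "000", "outs": 2, "runs": 0},  # out
]
re_matrix = run_expectancy(plays)
print(re_matrix[("100", 0)])  # 2.0
print(round(re_matrix[("000", 0)], 3))  # 0.667
```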
Every plate appearance starts in one base/out state and ends in a new one, and the resulting change in run expectancy is known as the run value. If runs score on the play, the run value also includes those runs. The formula, from Marchi, Albert and Baumer (2019) [1], is:
Run Value = Runs Scored + New Run Expectancy - Original Run Expectancy
We will look at an example to show how this is applied. Consider a plate appearance that begins with a runner on second base and one out. The batter hits a single and the runner scores; the batter tries to take second on the throw home, but the throw is cut off by an infielder and relayed to second base, where the batter is thrown out for the second out of the inning. The original state was 010 with 1 out, and since the batter was thrown out at second, the bases end up empty, so the new state is 000 with 2 outs, and 1 run scored on the play. The run value is then 1 + 0.11 - 0.71 = 0.4 runs. These run values are all in terms of expected runs, but the sample size is very large, which gives a good basis for prediction. In many cases a play has a negative run value; this mostly happens when the batting team records an out.
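The formula translates directly into code. The minimal Python sketch below reproduces the worked example above; the function name `run_value` is our own, and the run expectancies are the two values quoted in the text (0.71 for 010 with 1 out, 0.11 for 000 with 2 outs).

```python
def run_value(runs_scored, re_before, re_after):
    """Run Value = Runs Scored + New Run Expectancy - Original Run Expectancy."""
    return runs_scored + re_after - re_before


# Runner on second, 1 out (010, 1 out): a run scores and the batter is thrown
# out at second, leaving the bases empty with 2 outs (000, 2 outs).
rv = run_value(runs_scored=1, re_before=0.71, re_after=0.11)
print(round(rv, 2))  # 0.4
```

When a play ends the inning, the new run expectancy is 0, since no more runs can score in that inning.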
This is where the different types of outs come in. We can look at the run values of every out to determine whether there is a base/out state in which a strikeout is the most productive out in terms of run value. We divide the outs from 2018-2020 into three categories (strikeouts, groundouts and flyouts) and then compare them. From 2018 to 2020 there were 99,616 strikeouts, 94,817 groundouts and 98,880 flyouts (including popouts and caught line drives).
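The article does not spell out how plays were assigned to the three categories. One plausible approach, sketched below in Python, assumes event records in the style produced by Chadwick’s `cwevent` tool for Retrosheet files, where event type 3 is a strikeout and 2 a generic out, and batted-ball types are G (ground ball), L (line drive), F (fly ball) and P (pop-up). These codes are our assumption and should be checked against the Chadwick documentation before use; plays such as fielder’s choices are simply left out here.

```python
from typing import Optional

# Assumed event-type codes (Chadwick/Retrosheet convention; verify before use).
GENERIC_OUT = 2
STRIKEOUT = 3


def out_type(event_cd: int, batted_ball: str) -> Optional[str]:
    """Classify a play as 'strikeout', 'groundout' or 'flyout'; None otherwise."""
    if event_cd == STRIKEOUT:
        return "strikeout"
    if event_cd == GENERIC_OUT:
        if batted_ball == "G":
            return "groundout"
        # Fly balls, pop-ups and caught line drives all count as flyouts.
        if batted_ball in ("F", "P", "L"):
            return "flyout"
    return None  # hits, walks, fielder's choices, unknown batted-ball types, ...


print(out_type(2, "G"))  # groundout
print(out_type(2, "L"))  # flyout
print(out_type(3, ""))   # strikeout
```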
We can limit our analysis to base/out states with 0 or 1 outs: an out made with 2 outs ends the inning, so its run value is essentially the same no matter how the out was made. Therefore, we only need to consider 16 different base/out states.
It is possible to get an intuitive sense of how the type of out might affect the run value by reviewing the rules for how baserunners may advance on a play. On a flyout, a baserunner may “tag up”: once the ball has been caught, they retouch the base they occupied before the play and then try to advance. If they are tagged before reaching the next base, they are out, but if the ball is hit deep to an outfielder, the throw will be too long to have a chance of catching the baserunner. In that situation, a runner on third may tag up and run home, scoring a run. On a groundout, runners may advance as soon as the ball is hit. If every base behind a runner is occupied (with the batter heading to first), that runner must run, as they are in a “force play”, so the defense only needs to get the ball to a fielder standing on the base they are running to. Runners may score on a groundout. On a strikeout, baserunners may try to advance at their own risk, but because the catcher already has the ball, they are ready to make a throw as soon as they see somebody running. On the rare occasion that the catcher drops the third strike, the batter may still reach first base safely if first base is unoccupied (or there are two outs) and they can beat the throw there.
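The force-play rule lends itself to a short check. The Python sketch below, using the article’s 0/1 base encoding, lists which runners are forced to advance when the batter puts the ball on the ground: because the batter always runs to first, a runner is forced exactly when every base behind them is occupied. The function name `forced_runners` is hypothetical, and the sketch looks only at the base state.

```python
def forced_runners(bases: str) -> list[int]:
    """Bases (1 = first, 2 = second, 3 = third) whose runners are forced to run.

    `bases` uses the article's encoding, e.g. "101" = runners on first and third.
    The batter runs to first, so the chain of forces starts at first base and
    stops at the first empty base.
    """
    forced = []
    for i, occupied in enumerate(bases):
        if occupied != "1":
            break  # an empty base ends the chain of forced runners
        forced.append(i + 1)
    return forced


print(forced_runners("101"))  # [1]: the runner on third is not forced
print(forced_runners("111"))  # [1, 2, 3]: bases loaded, every runner is forced
```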
With an intuitive sense of how the type of out may affect scoring, we look at the mean run value of each of the three types of out across all states.
So we see that the strikeout is in fact not the most counterproductive out. The groundout had the largest cost to teams’ expected run output. This is perhaps because double plays, in which another runner is forced out on the bases, hurt a team’s scoring so much. Given the choice between always striking out and always grounding out, a team would on average save 0.1 runs every time it chose to strike out. Of course, this is an oversimplification, as not all situations are the same.
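Computing these averages is a simple group-by. In the Python sketch below, each play is assumed to be an `(out_type, run_value)` pair (a hypothetical layout), and the run values are made up for illustration rather than taken from the 2018-2020 data.

```python
from collections import defaultdict
from statistics import mean


def mean_run_value_by_type(plays):
    """Mean run value of each out type, from (out_type, run_value) pairs."""
    by_type = defaultdict(list)
    for kind, rv in plays:
        by_type[kind].append(rv)
    return {kind: mean(values) for kind, values in by_type.items()}


# Made-up run values, for illustration only.
toy = [
    ("strikeout", -0.25), ("strikeout", -0.27),
    ("groundout", -0.40), ("groundout", -0.30),
    ("flyout", -0.20), ("flyout", -0.24),
]
means = mean_run_value_by_type(toy)
# Average runs saved per out by striking out instead of grounding out.
savings = means["strikeout"] - means["groundout"]
print(round(savings, 2))  # 0.09
```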
Next, we consider each base/out state individually and evaluate which type of out is least harmful in that state by comparing mean run values. In every base/out state, the mean run value of every type of out is negative, so the absolute value is plotted instead. The following plot should be interpreted in terms of cost to a team’s expected runs. The base/out state is labelled just as in the matrix above, with the number at the end giving the number of outs.
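A sketch of how such a per-state comparison could be tabulated in Python follows. Each play is assumed to be a `(state, out_type, run_value)` tuple, with a state label such as "100_1" (our assumed format: bases, then outs); the layout and toy values are hypothetical. The absolute value is taken of each mean rather than of each play, because individual plays can have positive run values even when the average for that out type is negative.

```python
from collections import defaultdict


def cost_by_state(plays):
    """Cost of each out type in each state: |mean run value| over its plays.

    `plays` holds (state, out_type, run_value) tuples, e.g. ("100_1", "flyout", -0.3).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for state, kind, rv in plays:
        sums[(state, kind)] += rv
        counts[(state, kind)] += 1
    table = defaultdict(dict)
    for (state, kind), total in sums.items():
        # Average first, then take the absolute value.
        table[state][kind] = abs(total / counts[(state, kind)])
    return dict(table)


# Made-up plays, for illustration only; one flyout has a positive run value.
toy = [
    ("001_1", "flyout", 0.10), ("001_1", "flyout", -0.50),
    ("001_1", "strikeout", -0.70),
]
costs = cost_by_state(toy)
print(costs["001_1"])
```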
Clearly, the type of out matters much more in some states than in others. When nobody is on base, the type of out has no effect on the run value, since every out leaves the bases empty with one more out. Counting the states in which each type of out is the least harmful, strikeouts are the least harmful in 3 of the 16 states, groundouts in 4 of the 16 and flyouts in 7 of the 16. Note that with 1 out and a runner on first base, a flyout and a strikeout have the same effect. Strikeouts are the most harmful in 8 of the 16 states and groundouts in 5 of the 16, while flyouts are never the most harmful with one out and a runner on first base.
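Counting which out type is least and most harmful in each state can be sketched in Python as below, given a table mapping each state to the cost of each out type (a hypothetical layout; the costs are made up). The article does not say how ties were counted, so this version makes an assumption: states where all three types cost the same are skipped, and a tie for least or most harmful credits every tied type.

```python
from collections import Counter


def tally_extremes(table, tol=1e-9):
    """Count how often each out type is the least and the most harmful.

    `table` maps a state label to {out_type: cost}.
    """
    least, most = Counter(), Counter()
    for costs in table.values():
        lo, hi = min(costs.values()), max(costs.values())
        if hi - lo <= tol:
            continue  # e.g. bases empty: every out leads to the same new state
        least.update(k for k, c in costs.items() if c - lo <= tol)
        most.update(k for k, c in costs.items() if hi - c <= tol)
    return least, most


# Made-up costs, for illustration only.
table = {
    "000_0": {"strikeout": 0.25, "groundout": 0.25, "flyout": 0.25},
    "100_1": {"strikeout": 0.30, "groundout": 0.45, "flyout": 0.30},
    "001_1": {"strikeout": 0.70, "groundout": 0.40, "flyout": 0.35},
}
least, most = tally_extremes(table)
print(dict(least), dict(most))
```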
When there is a runner on third base, or runners on second and third, a strikeout has a much more negative effect than putting the ball in play for an out. This is especially pronounced with 1 out: the runner on third fails to score, and with two outs the defense only needs a force out at first base to end the inning. These are certainly situations in which it is advantageous to have a hitter who rarely strikes out.
With a runner on first base, grounding out is the most harmful out to make. This makes sense, as a double play quickly hurts a team’s chance of scoring. There is virtually no difference between striking out and flying out, although on a flyout the runner on first base has the option to tag up and try for second base. This is rarely attempted with 1 out, as getting thrown out on the basepaths would end the inning. With 0 outs, however, runners trying for second base cost enough to make a flyout more harmful on average than a strikeout. Second base is the closest base to the outfield, and thus the shortest throw an outfielder has to make to a base. If the runner on first is exceptionally fast, or the fielder has a weak arm, tagging up may be an advisable move, but as a whole, runners tagging up hurt teams enough that, on average, teams should prefer that their player strike out rather than take that chance.
With runners on first and third and 0 outs, the groundout and flyout are quite close in terms of run value, as a double play will still score a run. With 1 out, the difference becomes much larger, because the same double play ends the inning.
With 1 out and runners on first and third, or with the bases loaded, a flyout is by far the least harmful out. In these situations, the ideal hitter is one who can consistently put the ball in the air.
We now have an estimate of how much each type of out hurts a team in a given situation. Another question is how much each type of out actually hurt teams from 2018-2020. The frequency of each type of out in each base/out state over the three seasons can be calculated from Retrosheet’s data, and since we know how much each occurrence lowers a team’s run expectancy, we can compute the total expected runs lost by MLB teams to each type of out in each state from 2018-2020. This is plotted below.
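Since the total is just frequency multiplied by average cost, it amounts to a few lines of Python. The dictionaries keyed by `(state, out_type)` are a hypothetical layout, and the counts and costs below are illustrative numbers, not the article’s data. Equivalently, when the mean run value is negative, summing the negated run values of every such play gives the same total directly.

```python
def total_runs_lost(frequency, cost):
    """Expected runs lost = number of occurrences x mean cost per occurrence.

    Both arguments map (state, out_type) keys to numbers.
    """
    return {key: frequency[key] * cost[key] for key in frequency if key in cost}


# Illustrative numbers only, not 2018-2020 results.
frequency = {("100_0", "groundout"): 1000, ("100_0", "strikeout"): 1200}
cost = {("100_0", "groundout"): 0.60, ("100_0", "strikeout"): 0.30}
lost = total_runs_lost(frequency, cost)
extra = lost[("100_0", "groundout")] - lost[("100_0", "strikeout")]
print(round(extra))  # 240
```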
It should be no surprise that the most costly base/out states are those with no runners on base, as the majority of plate appearances take place in these states: only 19 players have ever finished a season with an on-base percentage over 0.500, and nobody has finished their career reaching base at a rate over 0.500.
What is interesting in this graph is the base/out state with a runner on first base. From 2018-2020, with a runner on first and nobody out, MLB teams lost nearly 1000 more runs to the groundout than to the strikeout, and a large number more with one out.
“Never let the fear of striking out keep you from playing the game” – Babe Ruth
When Babe Ruth said this, he obviously did not have in mind the marginal differences between the ways of making an out. Nevertheless, what we have shown is that if an out is certain, it is not always in the batting team’s interest to avoid the strikeout. Of course, this analysis is baseball in a bubble. It is never certain before a plate appearance starts that the batter will make an out, and if such a batter existed, they would have no business playing in the Major Leagues. In reality, the job of the batter is much more complicated than choosing which out costs their team the least in a given situation. The batter is at the plate to produce runs for their team, and some players do this often enough to be undeniably better hitters than others.
For the sake of analysis, we have also made an assumption that does not hold in real baseball: that when a ball is hit, it is already known whether it will be a groundout or a flyout. This is simply not the case. What makes baseball exciting is that a ground ball that isn’t fielded is no longer just a groundout; it is either an error or a hit. The same is true of a fly ball: if it isn’t caught, it can be an error or a hit for any number of bases, and the difference between a flyout and a home run can come down to whether or not the ball travelled over the fence. The overwhelming majority of the time, though, a strikeout is just that: a strikeout.
All of that is to say that strikeouts and outs made by a fielder are different. There is always a chance that the ball finds a gap between the fielders, giving the batter a base hit or more. There is always a chance that the batter reaches first base before the throw, “beating out a hit”. There is always a chance the fielder makes an error. So this is not to say that striking out is some hidden strategy that nobody has ever thought of, and it is certainly never the most entertaining play; it is not that simple, because on any given play anything can happen. To summarize, sometimes a strikeout could have been something worse, but other times it is actually the least negative outcome.
[1] Marchi, M., Albert, J., and Baumer, B. (2019). Analyzing Baseball Data with R, Second Edition. CRC Press.