MSC

A Dive into Baseball Statistics: Part 1


I thought it might be fun to talk about some issues with Major League Baseball’s most famous statistics. Why would I think this would be fun? I like baseball far too much. More importantly for you, why would you find this even remotely fun? Maybe because you miss baseball. But chances are, your favourite thing about this article will be that it doesn’t have anything to do with William Nylander. Either way, you’re still reading and that’s good enough for me.


One of the most prestigious individual accomplishments in baseball for a hitter is winning the Triple Crown, which means having the highest batting average, most home runs and highest RBI total in their league in a season. Miguel Cabrera won the Triple Crown in 2012, and before that, the last hitter to accomplish it was Carl Yastrzemski in 1967. While it is certainly an impressive accomplishment, the three statistics that make up the Triple Crown are, while commonly used, not very good at describing how valuable a hitter actually is.


Miguel Cabrera: Triple Crown winner, 2012


Home Runs:


Let’s start by looking at the easy one, home runs. Home runs are exciting, and furthermore, almost every time that a batter hits one, it seriously affects the outcome of the game. Every run in the final game of this past World Series was scored via a home run. Yet, home runs alone cannot tell you how valuable a hitter is, because, even for the most prolific home run hitters, most plate appearances do not end in home runs.


I’m going to give you guys an example, but before I do, I feel obligated to give you all a warning. I’m going to be a bit mean to beloved former Blue Jay Edwin Encarnacion in this next bit, and that might get some of you feeling a little defensive. So before I go any further, I want to remind you all that he signed with the Indians for 3 years and 60 million dollars after turning down the Jays’ offer of 4 years/$80M. So really, I’m standing up for you guys. I’m not the bad guy here.


With that out of the way: in 2018, Mookie Betts and Edwin Encarnacion both hit 32 home runs. If that was the only stat that you knew about both players, you might think that they had similar seasons in terms of the value that they provided to their teams. Here are some of their other stats from this past season:



In 2018, Mookie Betts had a batting average that was 100 points higher than Encarnacion’s. Betts struck out 41 fewer times than Encarnacion. Betts got on base in 43.8% of his plate appearances, more than ten percentage points higher than Encarnacion’s 33.6%. Betts stole 30 bases; Encarnacion stole 3. These were not equivalent seasons. Home runs might give a good indication of how well a player can hit for power, but they offer no information about his contact ability, his ability to reach base or his speed. So while home runs are incredibly impactful, and their value must be weighed when determining how valuable a player is, they are only one piece of that puzzle.
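The on-base gap quoted above can be read two ways: as an absolute difference in percentage points, or as a relative difference. Here is a quick sketch of both, using only the two on-base figures from the paragraph; the variable names are just for illustration.

```python
# On-base rates quoted in the paragraph above (2018 season).
betts_obp = 0.438
encarnacion_obp = 0.336

# Absolute gap, measured in percentage points.
gap_points = (betts_obp - encarnacion_obp) * 100

# Relative gap: how much more often Betts reached base,
# expressed as a share of Encarnacion's rate.
relative_gap = (betts_obp / encarnacion_obp - 1) * 100

print(f"Absolute gap: {gap_points:.1f} percentage points")
print(f"Relative gap: {relative_gap:.1f}% more often")
```

Either way you slice it, Betts reached base a lot more often.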


Mookie might have struck out fewer times, but in terms of racking up strikes, he has the lead if you count his career bowling statistics.


RBI:


Next, let’s consider the run batted in, or RBI, the most flawed hitting stat in baseball. Quick refresher: a hitter records an RBI when a run scores as a result of his plate appearance. The only exceptions are when the batter reaches base on an error or hits into a double play; in the latter case, it is assumed that the defense allowed the run to score in exchange for the extra out. The issue with RBIs is that the batter relies on his teammates to get on base and provide him with opportunities to drive them in. A leadoff hitter will generally record fewer RBIs than the batter hitting third in the order, because the leadoff hitter always comes up with the bases empty in his first at-bat, and afterwards, the hitters who precede him are the ones at the bottom of the lineup, who tend to be the weakest hitters. A batter in the third spot in the order, however, has the luxury of having two of the team’s best hitters preceding him, so he will come up to bat with runners on base much more often. A hitter cannot control how often the players who hit immediately before him in the lineup get on base, so his RBI total is not an accurate measure of his true hitting talent.


Coming back to our previous comparison, Edwin Encarnacion had the ninth-highest RBI total in baseball in 2018; Mookie Betts had the 49th-highest. Encarnacion usually hit fourth in the order, behind both leadoff hitter Francisco Lindor and José Ramírez, who usually hit third. Lindor and Ramírez are two of the top 15 position players in Major League Baseball. Betts, on the other hand, hit after the eighth and ninth hitters in the Red Sox order, who usually struggled to get on base. In any case, I promise I’m done picking on Encarnacion from here on out, since it’s starting to feel a bit mean-spirited.


The RBI stat has way too many external factors and moving parts built into it to be a useful evaluation of a hitter’s overall talent. If you are looking for a stat to specifically describe how good a hitter has been at driving in runners who are on base, it probably makes more sense to look at their batting average with runners in scoring position (which is to say on second or third base). The reason is that since this is a rate stat, it is not influenced by how often the hitter finds themselves in those situations, but only by their performance when they do come up to the plate with runners on second and/or third. Just keep in mind that you will be dealing with smaller sample sizes.
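To see why a rate stat sidesteps the opportunity problem, here is a minimal sketch with two made-up hitters. The `RispLine` class and every number in it are invented for illustration and do not describe real players.

```python
from dataclasses import dataclass


@dataclass
class RispLine:
    """Hypothetical season line with runners in scoring position (RISP)."""
    name: str
    at_bats: int  # at-bats taken with a runner on second and/or third
    hits: int     # hits recorded in those at-bats

    def average(self) -> float:
        # Rate stat: hits per opportunity, independent of how many
        # opportunities the lineup happened to provide.
        return self.hits / self.at_bats


# Invented numbers, not real players: same skill, different opportunity.
leadoff = RispLine("Leadoff hitter", at_bats=100, hits=30)
cleanup = RispLine("Cleanup hitter", at_bats=160, hits=48)

for hitter in (leadoff, cleanup):
    print(f"{hitter.name}: {hitter.hits} hits with RISP, "
          f"average {hitter.average():.3f}")
```

Both hitters come out with the same average with runners in scoring position, but the cleanup hitter got 60 more chances and so collected 18 more hits in those spots. A counting stat like RBI would credit him for that extra opportunity, while the rate stat treats the two as equals.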


Batting Average:

Finally, let’s take a look at batting average, the most prevalent hitting statistic in baseball. Batting average has two serious shortcomings in its ability to describe the overall value of a hitter: it does not account for walks, and it weights each hit equally, whether it’s a single or a home run. Batting average is popular because it’s intuitive: it answers the question of “what percentage of the time does this player record a hit rather than an out?”, which is probably the most useful thing to know if you are watching a baseball game, since most viewers want a general idea of the chances of a hit occurring in the current at-bat. This is especially true with runners in scoring position, as mentioned previously. However, in terms of quantifying a player’s value as a hitter, we could probably do a bit better, at the cost of some of the intuitive sense of what the statistic means.


First, let’s consider each of the two problems in isolation. In order to account for how often a player reaches base, via hits or other means (most commonly walks and hit-by-pitches), we can use on-base percentage, or OBP. A player’s OBP represents what percentage of their plate appearances result in them reaching base, and it better rewards players who don’t get many pitches in the strike zone to swing at and who use their plate discipline to turn that into walks.
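For the curious, here is a rough sketch of how the two stats differ in practice. It uses the standard OBP formula, (hits + walks + hit-by-pitches) divided by (at-bats + walks + hit-by-pitches + sacrifice flies), which the article does not spell out; the two stat lines are hypothetical and chosen so that both hitters have the same batting average.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits per at-bat; walks and hit-by-pitches are ignored entirely."""
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sac_flies: int) -> float:
    """Standard OBP formula: times on base over qualifying plate appearances."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sac_flies
    return times_on_base / opportunities


# Two hypothetical hitters (invented numbers) with identical batting averages.
free_swinger = dict(hits=150, walks=20, hit_by_pitch=2, at_bats=500, sac_flies=3)
patient_hitter = dict(hits=150, walks=90, hit_by_pitch=5, at_bats=500, sac_flies=5)

for label, line in (("Free swinger", free_swinger),
                    ("Patient hitter", patient_hitter)):
    avg = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(**line)
    print(f"{label}: AVG {avg:.3f}, OBP {obp:.3f}")
```

Batting average cannot tell these two apart, but OBP picks up all the extra walks the patient hitter draws.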


The other players we need to consider are power hitters. Batting average gives them no extra credit for the fact that a higher percentage of their hits are normally doubles and home runs. The solution is to use slugging percentage (SLG), a stat that works exactly like batting average except that where batting average counts hits, slugging counts the total bases of each hit. So while a single and a home run each count as one hit for batting average, for slugging percentage they count for one and four bases respectively.

This means that while a perfect batting average is 1.000, a perfect slugging percentage is 4.000. A player’s slugging percentage tells you how many bases they get in each at-bat on average (to be clear, this only includes the bases they reach on their own hit, not any bases they might advance to later in the inning). If you would prefer to consider how many bases a player collects on each hit, you can divide their slugging percentage by their batting average. So a player with a .450 SLG and a .300 batting average would get 1.5 bases per hit.
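Here is a small sketch of the slugging arithmetic described above. The function names and the 100 at-bat stat line are invented for illustration; the line was picked so that it reproduces the .450 SLG and .300 batting average from the example.

```python
def total_bases(singles: int, doubles: int, triples: int, home_runs: int) -> int:
    """Each hit counts for the number of bases the batter reached on it."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def slugging(singles: int, doubles: int, triples: int, home_runs: int,
             at_bats: int) -> float:
    """Total bases per at-bat."""
    return total_bases(singles, doubles, triples, home_runs) / at_bats


def bases_per_hit(slg: float, avg: float) -> float:
    """SLG divided by AVG: the at-bats cancel, leaving total bases per hit."""
    return slg / avg


# Hypothetical line: 100 at-bats, 30 hits (21 singles, 6 doubles, 3 home runs).
at_bats = 100
hits = 21 + 6 + 0 + 3
avg = hits / at_bats
slg = slugging(singles=21, doubles=6, triples=0, home_runs=3, at_bats=at_bats)

print(f"AVG {avg:.3f}, SLG {slg:.3f}, bases per hit {bases_per_hit(slg, avg):.2f}")
```

Dividing SLG by batting average works because both share the same denominator (at-bats), so what is left over is total bases divided by hits.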

Combined:

So now we have two stats, OBP and SLG, which, when combined with batting average, can give us a much better impression of a player’s overall hitting talent than batting average alone. However, OBP and SLG each only solve one of the two issues with batting average, which is why I want to introduce one final stat, OPS, which stands for on-base plus slugging. It is simply the sum of a hitter’s OBP and SLG, so it rewards hitters for both their ability to reach base and their power, making it the best statistic to assess a player’s overall hitting skill that we’ve discussed in this article. OPS’ biggest drawback is that it’s quite abstract: while it is easy to look at a player’s batting average and know what it means about the player, looking at a player’s OPS does not intuitively give you a sense of how that player’s at-bat might end. Here’s a trick for evaluating OPS: treat OPS like school grades.


This method is not perfect (average OPS is more like .700-.766), but it gives you a general idea of how good a player’s OPS is compared to the league as a whole and is easy to apply at a glance.
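Since OPS is nothing more than a sum, it is easy to sketch. The two slash lines below are hypothetical, invented to show how a low-average power hitter can out-OPS a high-average contact hitter; they are not taken from any real player.

```python
def ops(obp: float, slg: float) -> float:
    """On-base plus slugging: the plain sum of the two rates."""
    return obp + slg


# Hypothetical slash lines (AVG, OBP, SLG), invented for illustration.
contact_hitter = {"avg": 0.300, "obp": 0.340, "slg": 0.380}
power_hitter = {"avg": 0.210, "obp": 0.330, "slg": 0.540}

for label, line in (("Contact hitter", contact_hitter),
                    ("Power hitter", power_hitter)):
    total = ops(line["obp"], line["slg"])
    print(f"{label}: AVG {line['avg']:.3f}, OPS {total:.3f}")
```

The contact hitter wins batting average by 90 points, yet the power hitter finishes well ahead in OPS because of the gap in slugging.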


To make this all more concrete, I’m going to compare the 2017 seasons of two hitters. Let me introduce you to Miami’s Dee Gordon and Texas’ Joey Gallo. Gordon collects a lot of hits, but he doesn’t draw many walks and he doesn’t hit the ball very hard, so almost all of his hits are singles (with one notable exception). Gordon is the type of player whose batting average makes him look way better than he is. Gallo is the opposite: he doesn’t get too many hits, but an astonishingly high percentage of them are home runs. Because of this, pitchers try to avoid giving him pitches in the strike zone, so he provides the rest of his hitting value through taking walks. Therefore, both his OBP and SLG are going to look way more impressive than his batting average. Here is a table of both players’ relevant statistics from 2017:


Gordon has almost a 100-point lead in batting average, and yet when you look at their on-base percentages, Gallo reaches base almost as frequently. Thanks to Gallo taking more walks, the gap is much narrower than the batting averages might suggest. Furthermore, Gallo has a sizable lead in slugging percentage, collecting more than two and a half bases per hit, whereas Gordon averages only about one and a quarter bases per hit. Altogether, Gallo has a significant lead in OPS despite a batting average that is only about two-thirds of Gordon’s. By our grade scale, Gallo is a well above average hitter, whereas Gordon is probably a slightly below average hitter, despite sporting a batting average above .300. The takeaway here is that if you want to look at a single stat to assess hitting talent, the best choice on the table above would be OPS. However, looking at a player’s batting average, OBP and SLG together gives you the best idea of the player’s specific strengths and weaknesses as a hitter.


Gallo has quite the power stroke. Could he be a blueline presence the Maple Leafs are looking at trading for to anchor the power play?


I hope that this article has shed some light on some of the serious shortcomings of traditional baseball statistics. Batting average, home runs and runs batted in are still valuable statistics that do a good job of describing what a player has accomplished in a season, but they largely fail to accurately describe a player’s overall value as a hitter when considered in isolation. In the next couple of weeks, I’m going to write a follow-up piece introducing some useful advanced statistics, where I’ll argue that despite Miguel Cabrera being the first player in 45 years to win the Triple Crown, he was a less valuable player than Mike Trout that year, and as such Cabrera should not have won the MVP. Or maybe I’ll just speculate about which NHL teams are most in need of a first-line winger.


All statistics were retrieved from Fangraphs.com


By: Thomas Nachshen and Khashayar Akbari
