Introduction
Over the past few years, interest in evaluating individual
pitches using Statcast data has grown considerably. Pitch-grading models, especially
the ones created by Eno Sarris and Cameron Grove, have become very popular and
are commonly quoted when debating how good a pitcher is relative to others.
However, while grading individual pitches and averaging to give an overall
score for a pitcher’s arsenal is common, there has been relatively little work
done on using individual pitch grades to evaluate batter performance. In this
article, I want to evaluate a hitter’s plate discipline by comparing their
actual chase rate with the chase rate expected given the quality of the pitches
they saw, as derived from Statcast data. I call the difference “Chase Rate Over
Expected”.
Methodology
The dataset I used consisted of all pitches thrown in the
“chase zone” from 2018 through 2022. The Statcast data provided by Baseball
Savant does not directly say if a pitch was in the “chase zone,” so I manually calculated
it based on the dimensions defined by Tom Tango (cited below).
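As a rough illustration of that manual calculation, the sketch below flags chase-zone pitches from the Statcast columns `plate_x`, `plate_z`, `sz_top`, and `sz_bot` (all in feet). The boundaries are assumptions for illustration, not necessarily the exact values used for this article: the chase zone is taken to be the band between 133% and 200% of the strike zone’s half-width and half-height, with a half-width of 10 inches. The function name `in_chase_zone` is hypothetical, and the cutoffs should be checked against the chart cited below.

```python
def in_chase_zone(plate_x, plate_z, sz_top, sz_bot,
                  half_width_ft=10 / 12, inner=4 / 3, outer=2.0):
    """Return True if a pitch lands in the (assumed) chase band.

    plate_x, plate_z: pitch location at the plate, in feet.
    sz_top, sz_bot: top and bottom of the batter's strike zone, in feet.
    half_width_ft, inner, outer: assumed boundaries, not taken from the article.
    """
    mid = (sz_top + sz_bot) / 2
    half_height = (sz_top - sz_bot) / 2

    # Express the location as a fraction of the zone's half-width / half-height.
    x_scaled = abs(plate_x) / half_width_ft
    z_scaled = abs(plate_z - mid) / half_height

    # The pitch belongs to the band set by whichever direction it misses more.
    farthest = max(x_scaled, z_scaled)
    return inner < farthest <= outer


# A pitch 1.4 ft off the center of the plate at mid-zone height.
print(in_chase_zone(1.4, 2.5, sz_top=3.5, sz_bot=1.5))
```

Scaling the vertical distance by each batter’s own `sz_top` and `sz_bot` keeps the zone proportional to the hitter’s height rather than fixed in feet.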
To model the probability of a pitch in the chase zone
getting swung at, I used a generalized additive model (GAM), with a mix of smooth
and non-smooth terms. The smooth terms are an interaction term of vertical and
horizontal location and an interaction term for vertical and horizontal
movement of the pitch, and the non-smooth terms are speed, spin rate,
extension, the count, and a binary term that is 1 if the batter and pitcher have
the same handedness and 0 otherwise.
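Written out, the model described above looks roughly like the following. The logit link and the treatment of the count as a categorical effect are my assumptions for illustration; the symbols are introduced here only for this sketch, with (x, z) the horizontal and vertical location and (h, v) the horizontal and vertical movement.

```latex
\operatorname{logit} P(\text{swing}) =
    \beta_0
    + f_1(x, z)
    + f_2(h, v)
    + \beta_1\,\text{speed}
    + \beta_2\,\text{spin}
    + \beta_3\,\text{extension}
    + \gamma_{\text{count}}
    + \beta_4\,\text{same\_hand}
```

Here f_1 and f_2 are the two smooth interaction terms, and the remaining terms enter the model without smoothing.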
Results
Below are the top 5 and bottom 5 hitters in chase rate over expected (CROE) last
year, minimum 100 pitches seen in the chase zone. A negative CROE means the
hitter chased less often than expected.
Top 5
Name | Chase | xChase | CROE
Austin Barnes | 11.1% | 28.1% | -17.0%
Cavan Biggio | 10.9% | 26.7% | -15.8%
Max Muncy | 12.4% | 27.8% | -15.4%
Sam Hilliard | 14.6% | 28.2% | -13.7%
Triston Casas | 11.5% | 25.2% | -13.7%
Bottom 5
Name | Chase | xChase | CROE
Francisco Mejia | 54.9% | 27.1% | 27.8%
Javier Baez | 51.3% | 25.5% | 25.8%
Oscar Gonzalez | 49.1% | 25.7% | 23.3%
Hanser Alberto | 48.1% | 25.6% | 22.5%
Harold Ramirez | 48.5% | 27.5% | 21.0%
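Each row is just actual chase rate minus expected chase rate; for Austin Barnes, 11.1% - 28.1% = -17.0%. A minimal sketch of that aggregation is below, assuming each chase-zone pitch has already been scored with a predicted swing probability. The function name `croe_by_batter` and the tuple layout are hypothetical, not the code used for this article.

```python
from collections import defaultdict


def croe_by_batter(pitches, min_pitches=100):
    """Aggregate chase-zone pitches into Chase, xChase, and CROE per batter.

    pitches: iterable of (batter, swung, p_swing) tuples, where swung is 0/1
    and p_swing is the model's predicted swing probability for that pitch.
    """
    # Per batter: [pitch count, swing count, sum of predicted probabilities]
    totals = defaultdict(lambda: [0, 0, 0.0])
    for batter, swung, p_swing in pitches:
        t = totals[batter]
        t[0] += 1
        t[1] += int(swung)
        t[2] += p_swing

    results = {}
    for batter, (n, swings, p_sum) in totals.items():
        if n < min_pitches:
            continue  # drop small samples, as in the tables above
        chase = swings / n      # observed chase rate
        xchase = p_sum / n      # expected chase rate
        results[batter] = {"chase": chase, "xchase": xchase,
                           "croe": chase - xchase}
    return results
```

Averaging the per-pitch probabilities gives the expected chase rate for the specific mix of pitches a hitter saw, which is what lets two hitters with the same raw chase rate end up with different CROE.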
How stable is CROE year over year? Regressing a hitter’s CROE in one season on
his CROE in the previous season gives an R^2 of 0.58, which indicates moderate year-to-year stability.
How stable is chase rate year over year? Fairly stable, but regressing next season’s chase rate on the previous season’s chase rate gives a lower R^2 than the same regression for CROE.
Finally, does the previous season’s CROE forecast next season’s chase rate better than the previous season’s chase rate does? The regression on prior CROE has a slightly higher R^2 than the one on prior chase rate, which suggests that CROE is the better predictor of future chase rate.
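For readers who want to run the same comparison, here is a minimal sketch of the R^2 calculation for a one-predictor linear regression, which equals the squared Pearson correlation between the two seasons’ values. The function name `r_squared` and the way the inputs are arranged are my assumptions, not the code used for this article.

```python
def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (with intercept).

    x: prior-season values, one per hitter (e.g. CROE or chase rate).
    y: next-season values for the same hitters, in the same order.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # With a single predictor, R^2 is the squared correlation coefficient.
    return sxy ** 2 / (sxx * syy)
```

The three comparisons above then correspond to `r_squared(prior_croe, next_croe)`, `r_squared(prior_chase, next_chase)`, and `r_squared(prior_croe, next_chase)`, where each list holds one value per hitter who qualified in both seasons (these list names are placeholders).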
One implicit assumption of this methodology is that all of the credit for
the residuals goes to the batter. If a batter is expected to chase 20% of the
time and chases only 17%, the entire 3-percentage-point improvement is
attributed to the batter’s skill. This is a strong assumption.
I do not think the batter should receive 100% of the credit, because the
underlying xChase model carries natural uncertainty and cannot be that precise,
but I do think the batter should receive most of it. The two main factors the
pitcher controls, deception and tunneling, matter for so few pitchers that I did
not think they would drastically affect any one hitter’s rating. However, there
are some interesting articles on the drawbacks of “over-expected” metrics in
football, and I would encourage readers to check them out. I linked to Robby
Greer’s post below.
Conclusion
Overall, I think CROE is a good first step in evaluating
hitters’ swing decisions based on the quality of the pitches seen. In the
future, it would be interesting to extend this to swing decisions on all
pitches while controlling for a hitter’s swing path and the characteristics of
each pitch. For instance, it may make sense for a hitter with a
flat bat path to swing at more pitches up in the zone than a hitter with a
steep bat path. In general, using pitch grade models to evaluate not just
pitchers but hitters as well is an important next step in the Statcast era of
sabermetrics.
Tom Tango post on defining “Chase Zone”
http://tangotiger.net/strikezone/zone%20chart.png
Robby Greer post on “Over Expected” metrics
https://www.nfeloapp.com/analysis/over-expected-explained-what-are-cpoe-ryoe-and-yacoe/