Sunday, January 1, 2023

Chase Rate Over Expected

Introduction

Over the past few years, interest in grading individual pitches with Statcast data has grown considerably. Pitch-grade models, especially those created by Eno Sarris and Cameron Grove, have become very popular and are commonly cited in debates about how good a pitcher is relative to others. However, while grading individual pitches and averaging the grades into an overall score for a pitcher’s arsenal is common, relatively little work has used pitch-level models to evaluate batters. In this article, I evaluate a hitter’s plate discipline by comparing his actual chase rate against the chase rate that would be expected given the quality of the pitches he saw, as derived from Statcast data. I call the difference “Chase Rate Over Expected”.

Methodology

The dataset consists of all pitches thrown in the “chase zone” from 2018 through 2022. The Statcast data provided by Baseball Savant does not directly flag whether a pitch was in the chase zone, so I computed that from each pitch’s location using the zone dimensions defined by Tom Tango (linked below).
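As a rough illustration of that zone calculation, here is a minimal Python sketch. It assumes the Statcast columns plate_x, plate_z, sz_top, and sz_bot (all in feet), and it assumes my reading of the linked chart: the chase zone is the band between about 13.3 and 20 inches horizontally from the center of the plate, or between 133% and 200% of the half-height of the batter’s strike zone vertically. The function name and the exact thresholds are assumptions for illustration and should be checked against the chart; this is not necessarily the exact calculation used for the results here.

```python
# Assumed zone boundaries, read off Tom Tango's chart (verify before use).
SHADOW_X_IN = 13.3    # outer horizontal edge of the shadow zone, inches
CHASE_X_IN = 20.0     # outer horizontal edge of the chase zone, inches
SHADOW_Y_PCT = 133.0  # outer vertical edge of the shadow zone, % of half-height
CHASE_Y_PCT = 200.0   # outer vertical edge of the chase zone, % of half-height


def is_chase_zone(plate_x, plate_z, sz_top, sz_bot):
    """Return True if a pitch location falls in the (assumed) chase zone.

    plate_x, plate_z: pitch location at the plate, in feet.
    sz_top, sz_bot: top and bottom of the batter's strike zone, in feet.
    """
    half_height = (sz_top - sz_bot) / 2.0
    if half_height <= 0:
        raise ValueError("sz_top must be greater than sz_bot")
    mid_height = (sz_top + sz_bot) / 2.0

    x_in = abs(plate_x) * 12.0                               # feet -> inches
    y_pct = abs(plate_z - mid_height) / half_height * 100.0  # % of half-height

    inside_chase_box = x_in <= CHASE_X_IN and y_pct <= CHASE_Y_PCT
    outside_shadow_box = x_in > SHADOW_X_IN or y_pct > SHADOW_Y_PCT
    return inside_chase_box and outside_shadow_box
```

For example, a pitch at the vertical middle of the zone but 18 inches off the center of the plate would be flagged as a chase-zone pitch under these assumed thresholds, while a pitch down the middle would not.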

To model the probability that a pitch in the chase zone gets swung at, I used a generalized additive model (GAM) with a mix of smooth and non-smooth terms. The smooth terms are a two-dimensional interaction of horizontal and vertical pitch location and a two-dimensional interaction of horizontal and vertical pitch movement. The non-smooth terms are pitch speed, spin rate, extension, the count, and a binary term that is 1 if the batter and pitcher have the same handedness and 0 otherwise.
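Written out, the model has roughly the following form. This is a sketch under assumptions: the logit link, the treatment of the count as a categorical effect, and all of the symbols are my notation for illustration, not something taken from the fitted model.

```latex
\[
\operatorname{logit} P(\text{swing}_i = 1)
  = \beta_0
  + f_1(x_i, z_i)
  + f_2(m^{x}_i, m^{z}_i)
  + \beta_1\,\text{speed}_i
  + \beta_2\,\text{spin}_i
  + \beta_3\,\text{extension}_i
  + \gamma_{c(i)}
  + \beta_4\,\text{samehand}_i
\]
```

Here $f_1$ is the smooth surface over horizontal and vertical location $(x_i, z_i)$, $f_2$ is the smooth surface over horizontal and vertical movement $(m^{x}_i, m^{z}_i)$, $\gamma_{c(i)}$ is the effect of the count $c(i)$ in which pitch $i$ was thrown, and $\text{samehand}_i$ is the same-handedness indicator. The fitted probability for each pitch is what feeds the expected chase rate.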

Results

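The tables below show the top 5 and bottom 5 hitters in chase rate over expected (CROE) last season, with a minimum of 100 pitches seen in the chase zone. CROE is a hitter’s actual chase rate minus his expected chase rate, so a negative value means he chased less often than expected.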

Top 5

Name              Chase    xChase    CROE
Austin Barnes     11.1%    28.1%     -17.0%
Cavan Biggio      10.9%    26.7%     -15.8%
Max Muncy         12.4%    27.8%     -15.4%
Sam Hilliard      14.6%    28.2%     -13.7%
Triston Casas     11.5%    25.2%     -13.7%

 

Bottom 5

Name              Chase    xChase    CROE
Francisco Mejia   54.9%    27.1%     27.8%
Javier Baez       51.3%    25.5%     25.8%
Oscar Gonzalez    49.1%    25.7%     23.3%
Hanser Alberto    48.1%    25.6%     22.5%
Harold Ramirez    48.5%    27.5%     21.0%
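The three numeric columns above can be computed from pitch-level data along the following lines. This is a minimal sketch, assuming each chase-zone pitch comes with a batter name, a 0/1 swing indicator, and the model’s predicted swing probability, and assuming that xChase is the average of those predicted probabilities. The function name and record layout are hypothetical.

```python
from collections import defaultdict


def croe_by_batter(pitches, min_pitches=100):
    """Aggregate chase-zone pitches into per-batter Chase, xChase, and CROE.

    pitches: iterable of (batter, swung, p_swing) tuples, where swung is
    0 or 1 and p_swing is the predicted swing probability for that pitch.
    Returns a list of dicts sorted from lowest to highest CROE.
    """
    totals = defaultdict(lambda: [0, 0, 0.0])  # pitches, swings, sum of p_swing
    for batter, swung, p_swing in pitches:
        entry = totals[batter]
        entry[0] += 1
        entry[1] += int(swung)
        entry[2] += p_swing

    rows = []
    for batter, (n, swings, p_sum) in totals.items():
        if n < min_pitches:
            continue  # apply the minimum-pitches cutoff
        chase = swings / n   # actual chase rate
        x_chase = p_sum / n  # expected chase rate (assumed: mean predicted prob.)
        rows.append({
            "batter": batter,
            "pitches": n,
            "chase": chase,
            "xchase": x_chase,
            "croe": chase - x_chase,
        })
    return sorted(rows, key=lambda row: row["croe"])
```

With this ordering, the hitters who chase least relative to expectation come first and the hitters who chase most relative to expectation come last.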

 

How stable is CROE year over year? Regressing a hitter’s CROE in one season on his CROE in the previous season gives an R^2 of 0.58, which indicates a fair amount of year-to-year stability.

How stable is raw chase rate year over year? It is also fairly stable, but regressing next season’s chase rate on the previous season’s chase rate gives a lower R^2 than the CROE year-over-year regression.

Finally, can the previous season’s CROE forecast next season’s chase rate better than the previous season’s chase rate can? Regressing next season’s chase rate on prior-season CROE gives a slightly higher R^2 than regressing it on prior-season chase rate, which suggests that CROE is a somewhat better predictor of future chase rate than chase rate itself.
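Each of these comparisons is a simple one-predictor linear regression, for which R^2 equals the squared Pearson correlation between the two variables. Below is a minimal sketch of that calculation in plain Python. The function name is hypothetical, and the inputs are assumed to be two equal-length lists aligned so that position i in each list refers to the same hitter in consecutive seasons.

```python
def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (with an intercept).

    For a single predictor this equals the squared Pearson correlation.
    x and y must be aligned: x[i] and y[i] belong to the same hitter.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")

    return sxy ** 2 / (sxx * syy)
```

The three comparisons would then be r_squared(prev_croe, next_croe), r_squared(prev_chase, next_chase), and r_squared(prev_croe, next_chase), where those list names are placeholders for the paired season values.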


Is it fair to give all of the credit to the batter?

One of the implicit assumptions in this methodology is that all of the credit for the residuals goes to the batter. If a batter is expected to chase 20% of the time and he only chases 17%, that 3-percentage-point improvement is attributed entirely to the batter’s skill. This is a strong assumption. I do not think the batter should receive 100% of the credit, because no model is that precise and the underlying xChase model carries its own uncertainty, but I do think the batter should receive most of it. The two main factors that the pitcher controls and the model does not capture, deception and tunneling, apply to relatively few pitchers, so I did not think they would drastically affect any one hitter’s rating. However, there are some interesting articles on the drawbacks of “over-expected” ratings in football, and I would encourage readers to check them out. I linked to Robby Greer’s post below.

Conclusion

Overall, I think CROE is a good first step toward evaluating hitters’ swing decisions based on the quality of the pitches they see. In the future, it would be interesting to evaluate swing decisions on all pitches, given their characteristics, while controlling for a hitter’s swing path. For instance, it may make sense for a hitter with a flat bat path to swing at more pitches up in the zone than a hitter with a steep bat path. More generally, using pitch-grade models to evaluate hitters as well as pitchers is an important next step in the Statcast era of sabermetrics.

Tom Tango post on defining “Chase Zone”

http://tangotiger.net/strikezone/zone%20chart.png

Robby Greer post on “Over Expected” metrics

https://www.nfeloapp.com/analysis/over-expected-explained-what-are-cpoe-ryoe-and-yacoe/