Wednesday, March 23, 2022

Stacking Models-Properly Weighing Scout Opinion with an Analytical Model

As someone who does a good amount of programming and math, but also spends a lot of time scouting players in person, I think a lot about how the two skills interact. The best mental model I have come up with so far is to think of each group as an element of a stacking algorithm, a common technique in machine learning that answers the question: “Given multiple machine learning models that are skillful on a problem, but in different ways, how do you choose which model to use (trust)?” (Machine Learning Mastery). I think this best sums up the question at hand.

For stacking to work best, you want to combine models that each have predictive skill but whose errors are only weakly correlated with each other. In this framework, the two models we have are the scouting reports and a statistical model. Both have predictive power, but scouts tend to be more optimistic about toolsy players, whereas models tend to prefer safer players. A stacking algorithm finds value in both and figures out what weight to give each. Even if a model has a better overall track record of picking major leaguers than a scout, if the model has blind spots that the scout is able to cover, stacking will pick up on that and return a projection that accounts for those blind spots. It is a mutually beneficial relationship, and one I think about when scouting live. If I know a model is good at projecting tool X but struggles with tool Y, I will spend my time at the park prioritizing tool Y so that I can provide value. If there is a proper stacking model in place, both sides will be valued properly.
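To make the idea concrete, here is a minimal stacking sketch in Python with entirely made-up data: two "base models," a scout's numeric grades and a statistical model's projections, are combined by a meta-model that learns from history how much to trust each source. The variable names and noise levels are illustrative assumptions, not a real scouting dataset.

```python
# Minimal stacking sketch on hypothetical data: a linear meta-model
# learns weights for two imperfect predictors of the same quantity.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical true future value (e.g., career WAR) for 200 prospects.
true_value = rng.normal(2.0, 1.0, size=200)

# Each "model" is skillful but noisy in its own way; the noise terms
# are independent, so the two sources' errors are weakly correlated.
scout_grade = true_value + rng.normal(0, 0.8, size=200)  # noisier source
stat_model = true_value + rng.normal(0, 0.5, size=200)   # more accurate source

# The meta-model sees both predictions and learns a weight for each.
X = np.column_stack([scout_grade, stat_model])
meta = LinearRegression().fit(X, true_value)

# Both weights come out positive (both sources add value), with the
# more reliable source earning the larger weight.
print(meta.coef_)
```

In a real front office the meta-model would be trained on historical reports and outcomes rather than simulated noise, and the base learners could be anything from pitch-level models to individual scouts' grade histories.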

It's not groundbreaking or controversial that eye-test opinion and model predictions should both be listened to. Where this conversation sometimes loses nuance is in the default assumption that we should weigh each group equally. That is a bad idea, and it reflects a lack of understanding of each group's skill set. If I know I struggle with a certain demographic, I do not want my opinion weighed equally with a more skilled predictor. On the flip side, if I have a history of projecting something well, I want that opinion weighed more heavily. Committing to back-testing, so that you know what your scouting, player development, and analytical groups do well and where they struggle, is difficult but worthwhile: it ensures everyone's input is valued appropriately and produces the best possible prediction.
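As a toy illustration of weighing by track record instead of defaulting to a 50/50 split, the sketch below uses inverse-error weighting: each group's weight is proportional to one over its back-tested mean squared error. The error numbers and the grades being blended are hypothetical.

```python
# Hypothetical sketch: derive unequal weights from each group's
# back-tested error rather than defaulting to an equal split.

# Assumed back-test results: mean squared error of past projections
# for one demographic (e.g., toolsy high-school hitters).
mse_scouts = 1.4  # scouts struggled with this demographic
mse_model = 0.6   # the model has the better track record here

# Inverse-error weighting: trust is proportional to 1 / MSE.
w_scouts = (1 / mse_scouts) / (1 / mse_scouts + 1 / mse_model)
w_model = 1 - w_scouts

# Blend two hypothetical grades on the 20-80 scouting scale.
blended = w_scouts * 55 + w_model * 45
print(round(w_scouts, 2), round(w_model, 2), round(blended, 1))  # → 0.3 0.7 48.0
```

The point is not the specific formula but the habit: the weights come from measured performance, so the group with the better history on this demographic pulls the blend toward its number.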

Further Reading:

https://machinelearningmastery.com/stacking-ensemble-machine-learning-with-python/
