Implementation and validation of a probabilistic open source baseball engine (POSBE): Modeling hitters and pitchers

Rhett Tracy Schaefer, Purdue University

Abstract

This manuscript details the implementation and validation of a probabilistic open source baseball engine (POSBE), focusing on the hitter and pitcher models of the simulation. The simulation produced outcomes that parallel those observed in actual Major League Baseball games. The observed data were taken from the nineteen games played between the New York Yankees (NYY) and Boston Red Sox (BOS) during the 2015 season. The hitter/pitcher outcomes of interest were singles, doubles, triples, home runs, walks, hit-by-pitch, and strikeouts. The nineteen-game series was simulated 1,000 times, for a total of 19,000 simulated games. The eighteen hitters and twenty-seven pitchers were divided into four groups based on similar characteristics: Hitters 1-5 in the batting order, Hitters 6-9 in the batting order, Starting Pitchers, and Relief Pitchers. The simulated data were compared against the observed data using the Kolmogorov-Smirnov test. All calculated p-values were greater than 0.05, so the test found no statistically significant difference between the simulated and observed distributions, indicating that POSBE's hitter and pitcher outcomes are consistent with empirical observation.
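
The validation idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the POSBE implementation: the outcome probabilities in `OUTCOME_PROBS`, the four plate appearances per game, and the helper names `simulate_game_counts` and `ks_statistic` are all hypothetical, and the "observed" sample here is a stand-in drawn from the same toy model rather than the 2015 NYY/BOS data. The sketch draws plate-appearance outcomes from a categorical distribution and computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical distribution functions.

```python
import random
from bisect import bisect_right

# Hypothetical per-plate-appearance probabilities (illustrative placeholders,
# NOT values from the thesis). "out" collects every other result.
OUTCOME_PROBS = {
    "single": 0.15,
    "double": 0.05,
    "triple": 0.005,
    "home_run": 0.03,
    "walk": 0.08,
    "hit_by_pitch": 0.01,
    "strikeout": 0.20,
    "out": 0.475,
}


def simulate_game_counts(rng, outcome, plate_appearances=4):
    """Count how often `outcome` occurs for one hitter in one simulated game.

    The default of four plate appearances per game is an assumption.
    """
    names = list(OUTCOME_PROBS)
    weights = list(OUTCOME_PROBS.values())
    draws = rng.choices(names, weights=weights, k=plate_appearances)
    return draws.count(outcome)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


rng = random.Random(2015)  # fixed seed so the sketch is repeatable
# 1,000 simulated games versus a 19-game stand-in for the observed series.
simulated = [simulate_game_counts(rng, "strikeout") for _ in range(1000)]
observed_like = [simulate_game_counts(rng, "strikeout") for _ in range(19)]
d = ks_statistic(simulated, observed_like)
print(f"KS statistic D = {d:.3f}")
```

A small D means the two empirical distributions are close. Turning D into a p-value requires the Kolmogorov distribution, which a statistics library would normally supply; per-game counts are also discrete, so a KS test applied to them tends to be conservative.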

Degree

M.S.

Advisors

Whittinghill, Purdue University.

Subject Area

Sports Management|Statistics|Computer science

