Are Baseball Batting Lineups Optimal? (Part 2)
Part 1. Background
In Reference 1 we ran a large number of permutations of baseball batting lineups to determine whether the current, standard lineups were optimal. The analysis combined a realistic baseball simulation with systematic permutations of batting orders to see whether a specific team could do better than its standard lineup against a particular opposing team and pitcher.
The results looked extremely promising: re-arranging the batting order based on an opposing team's starting pitcher could generate more wins, or at least a higher probability of winning. Because baseball, like all other sports, is subject to odd outliers, the best we can do is look at long-term averages and use the lineup that performs best on average.
Now let's revisit the results to confirm whether a change in batting order helps or hurts.
Part 2. Batter Lineup
Recall that Reference 1 started from a standard batting order based on sabermetric guidelines (Ref. 2) and permuted it to see whether we could do better.
Traditionally, batting lineups have been built as follows (Ref. 2):
Lineup spot: Objective
1: Fastest player on the team; running ability is the most important quality
2: All about bat control; this player likely isn't your best hitter; they just need to make consistent contact and advance the leadoff baserunner before the big boys take the plate
3: Likely the best overall hitter on the team; highest batting average and very high RBI; should be able to drive in the leadoff hitter
4: The coveted cleanup spot; the most powerful hitter on the team; slugging ability is more important than the ability to get on base or consistency
5: Good hitter with strong RBI numbers
6: Average hitter with reasonable RBI numbers
7: Average hitter; often the weakest baserunner on the team
8: Average hitter; ability to make contact is prioritized; backup for the #2 spot
9: Worst hitter on the team; likely the pitcher in NL games (before the National League adopted the designated hitter)
Is this the best way to approach an upcoming game, an upcoming pitcher, or even every game? Why not tailor the lineup for each team and pitcher that you are going to face?
Part 3. Analysis Methodology
Recall that the Python baseball simulator of Reference 3 was used for both the previous study and this one. The previous study compared simulated results for the standard team batting orders with actual results and found that the simulator, using sabermetric lineup rules, reproduced actual win percentages to within about 0.01. In addition, the simulator identified batting orders that should outperform the standard lineups against specific teams and pitchers.
For this study, we took the results already generated and plotted them differently to show their statistical spread: in addition to averages, we plot plus and minus one standard deviation for each set of results. (In baseball, as in many sports, some outcomes will fall outside one standard deviation, but if you try to plan for those outliers you may never settle on a good solution.)
As described in Reference 1, we simulated the 2022 Los Angeles Dodgers against the 2022 Milwaukee Brewers, and the 2022 Pittsburgh Pirates against the 2022 Milwaukee Brewers. During the 2022 season the Dodgers had a 0.571 win percentage against the Brewers, and the simulator predicted 0.573. The Pirates had a 0.421 win percentage against the Brewers, and the simulator produced 0.433. Both numbers gave us confidence in the simulator. In addition, the Pirates averaged 3.65 runs per game, while the simulator yielded 3.86 runs per game, which is reasonably close given that the exact pitching matchups were not reproduced. Together, these results suggest that the simulator is a very good predictor of outcomes on average, and a useful tool for studying batting lineups and other decisions a team must make.
For each selected team, its most commonly used lineup served as the "zero run" or "index run," the baseline against which all other lineups were compared. The batting-order permutations start from this lineup.
Separate simulations were run permuting only the first three batters, only the first four, only the first five, and only the first six. For each batting lineup, two hundred games were played against each of the opposing team's top five starting pitchers, which shows which lineup a team should use against a specific opposing pitcher. During these two hundred games, the permuting team's own starting pitchers followed the team's normal rotation.
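The number of lineups grows factorially with the number of batters permuted: 3! = 6, 4! = 24, 5! = 120 and 6! = 720. A minimal Python sketch of that enumeration follows. It assumes a lineup is simply a list of nine player identifiers; the function name and the placeholder names B1 through B9 are hypothetical and are not taken from the simulator of Reference 3.

```python
from itertools import permutations
from math import factorial


def permute_top_of_order(lineup, k):
    """Yield every lineup obtained by reordering the first k batters.

    Batters k+1 through 9 keep their spots from the base lineup.
    """
    head, tail = lineup[:k], lineup[k:]
    for order in permutations(head):
        yield list(order) + tail


# Hypothetical placeholder names, not an actual 2022 lineup.
base_lineup = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B9"]

for k in (3, 4, 5, 6):
    lineups = list(permute_top_of_order(base_lineup, k))
    assert len(lineups) == factorial(k)  # 6, 24, 120, 720
    print(f"first {k} batters permuted: {len(lineups)} lineups")
```

Because `itertools.permutations` yields the original order first, the first lineup produced is the unchanged base lineup, which plays the role of the "index run."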
Part 4. Additional Results
Reference 1 produced several realistic and interesting results. Specifically, the simulator reproduced the actual 2022 head-to-head win percentages. In addition, it generated batting orders for both the 2022 Dodgers and the 2022 Pirates that could substantially increase their overall win percentages.
The results presented here expand on those results to include standard deviations for the averaged results, as well as the impact of batting lineup on run production. We focus only on the Pittsburgh Pirates, since they are a cellar team in the National League Central and could use every tool available to climb out of the cellar.
Figure 1 shows results presented in Reference 1 to illustrate the methodology. The Pirates' standard lineup is shown as the heavy black line, and other permutations of the first six batters are shown in various colors. We ignore the transients at the beginning of the figure, where the running average rests on only a few games, and average only across the last 100 of the 200 simulated games. This gives the Pirates a chance to face each of the five Milwaukee starting pitchers several times, with random matchups against the Pirates' own starting pitchers. For each of the 720 batting-order permutations, the Pirates' win percentage settles toward its own average level.
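The running average in a plot like Figure 1, and the "last 100 games" average, can be sketched in a few lines of Python. The article does not spell out exactly how the last-100-game average is computed, so this sketch assumes it is the win fraction over games 101 through 200. The game results are random placeholders with a made-up 0.43 win probability, not simulator output, and the function names are hypothetical.

```python
import random


def running_win_pct(results):
    """Cumulative win percentage after each game (1 = win, 0 = loss)."""
    wins = 0
    curve = []
    for game_number, result in enumerate(results, start=1):
        wins += result
        curve.append(wins / game_number)
    return curve


def settled_win_pct(results, window=100):
    """Win percentage over the last `window` games, skipping early transients."""
    tail = results[-window:]
    return sum(tail) / len(tail)


# Placeholder for one lineup's 200 simulated games; 0.43 is a made-up value.
rng = random.Random(2022)
results = [1 if rng.random() < 0.43 else 0 for _ in range(200)]

curve = running_win_pct(results)
print(f"after 10 games:  {curve[9]:.3f}")
print(f"after 200 games: {curve[-1]:.3f}")
print(f"last 100 games:  {settled_win_pct(results):.3f}")
```

Early values of the running curve swing widely because each game moves the average by a large amount. Dropping the first half of the games removes most of that transient.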
Figure 2 shows typical average results for the three-batter permutations, along with standard deviations. Again, the black dots show the win percentage of the Pirates' standard lineup against each of the first five Brewers starting pitchers. The green dots show the average win percentage across all of the three-batter permutations (six lineups), and the green dashes show that average plus and minus one standard deviation. Comparing these two sets of data, the Pirates' typical batting order is essentially an average batting order: the sabermetric rules for building a lineup produce a more or less average result. However, the blue symbols show the maximum found among the three-batter permutations (average and plus/minus one standard deviation), and the red symbols show the minimum. Simply reordering the first three hitters of the standard lineup can change the winning percentage by 0.100.
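The across-lineup statistics plotted for one opposing pitcher (standard lineup, mean, one standard deviation, best and worst lineups) can be sketched in Python as follows. The win percentages are made-up placeholders for the six orderings of three hypothetical batters A, B and C. Using the population standard deviation (`pstdev`) rather than the sample version (`stdev`) is also an assumption, since the article does not say which one was used.

```python
from statistics import mean, pstdev


def summarize_lineups(win_pct_by_lineup, standard_key):
    """Compare the standard lineup with the spread of all permuted lineups.

    win_pct_by_lineup maps a lineup label to its averaged win percentage
    against a single opposing pitcher.
    """
    values = list(win_pct_by_lineup.values())
    best = max(win_pct_by_lineup, key=win_pct_by_lineup.get)
    worst = min(win_pct_by_lineup, key=win_pct_by_lineup.get)
    return {
        "standard": win_pct_by_lineup[standard_key],
        "mean": mean(values),
        "std": pstdev(values),
        "best": (best, win_pct_by_lineup[best]),
        "worst": (worst, win_pct_by_lineup[worst]),
    }


# Made-up win percentages for the six orderings of batters A, B, C.
example = {
    "ABC": 0.42, "ACB": 0.40, "BAC": 0.45,
    "BCA": 0.38, "CAB": 0.48, "CBA": 0.41,
}
summary = summarize_lineups(example, standard_key="ABC")
for name, value in summary.items():
    print(name, value)
```

Running the same summary once per opposing pitcher gives one column of dots and dashes per pitcher, as in the figures.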
Figure 3 shows the typical average results for the six-batter permutations, along with standard deviations. Recall that permuting six batters means simulating 6! = 720 different batting orders over 200 games for each of the opposing pitchers. The blue symbols again represent the maximum averages (and standard deviations) across those 720 batting orders. In this case, the Pirates can increase their win percentage by almost 0.200. Applied to just the 76 games against National League Central opponents, a 0.200 gain is about 15 additional wins, which would turn the 62-win, 100-loss Pirates into roughly a 77-win, 85-loss team. That is before counting any improvement in the other 86 games they play against opponents outside the division.
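A season-level projection like this is simple arithmetic, so it is worth making the assumptions explicit. This Python sketch assumes 76 division games in 2022 (19 against each of four NL Central rivals, consistent with 86 non-division games in a 162-game season). It also assumes the win-percentage gain applies uniformly to every division game. The function name is hypothetical.

```python
def projected_wins(baseline_wins, games_affected, win_pct_gain):
    """Season wins if the win percentage rises by win_pct_gain over games_affected games."""
    return baseline_wins + games_affected * win_pct_gain


SEASON_GAMES = 162
division_games = SEASON_GAMES - 86  # 76 games against NL Central rivals

wins = projected_wins(62, division_games, 0.200)
print(f"{wins:.1f} wins, {SEASON_GAMES - wins:.1f} losses")

# Division-only gain needed to reach an 81-81 record.
needed_gain = (81 - 62) / division_games
print(f"gain needed for .500: {needed_gain:.3f}")
```

Any improvement in the 86 non-division games would add to this projection.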
Part 5. Next Steps (Similar to Part 1)
Current baseball lineups may follow accepted practices and rules, but they may not be optimal depending upon the opponent and the specific pitcher starting the game.
To determine an optimal batting order, you would not want to run every permutation each time. Permuting all nine spots means 9! = 362,880 lineups, each of which needs hundreds of simulated games against each opposing pitcher. It would be better to find a logical, algorithmic way to determine the batting order, and then identify the "rules" that the algorithm uses to set the lineup.
Thus, the next steps are to perform a moderate rewrite of the baseball simulator code and wrap an optimization routine or artificial intelligence (AI) algorithm around it. Currently the simulator can run games, but it is not set up to serve as a fitness function for an optimization algorithm. The rewrite is needed because the simulator was not written in a modular way and uses global variables throughout, which makes it difficult to call repeatedly from an optimization routine. Once the rewrite is complete and debugged, a genetic algorithm or AI algorithm can repeatedly call the simulator for function evaluations in search of the best batting order. A genetic algorithm is appropriate here because no gradient information is available; choosing a batting order is, after all, a discrete optimization problem over permutations. On the AI side, a deep learning algorithm or a Monte Carlo Tree Search algorithm is envisioned as the most appropriate choice.
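To make the genetic-algorithm idea concrete, here is a minimal Python sketch of a permutation-based GA. It assumes the simulator has been rewritten into a function that takes a lineup and returns a win percentage. That function does not exist yet, so `toy_fitness` and the player ratings are hypothetical stand-ins, not part of the simulator of Reference 3. The simplified order crossover, swap mutation and truncation selection with elitism are common choices for permutation problems, picked here for illustration only.

```python
import random


def toy_fitness(lineup, ratings):
    """Hypothetical stand-in for the simulator: rewards strong hitters batting early.

    In the study described above, this would instead be the win percentage
    returned by the rewritten simulator for the given lineup.
    """
    # Earlier spots come to the plate more often, so they get larger weights.
    return sum(ratings[player] * (9 - spot) for spot, player in enumerate(lineup))


def order_crossover(parent_a, parent_b, rng):
    """Simplified order crossover: keep a slice of parent_a, fill gaps in parent_b's order."""
    n = len(parent_a)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = parent_a[i:j + 1]
    kept = set(parent_a[i:j + 1])
    fill = (p for p in parent_b if p not in kept)
    return [c if c is not None else next(fill) for c in child]


def swap_mutation(lineup, rng, rate=0.2):
    """With probability `rate`, swap two batting spots."""
    lineup = list(lineup)
    if rng.random() < rate:
        a, b = rng.sample(range(len(lineup)), 2)
        lineup[a], lineup[b] = lineup[b], lineup[a]
    return lineup


def genetic_search(standard, fitness, generations=60, pop_size=40, seed=1):
    """Search batting orders with a basic GA (truncation selection plus elitism)."""
    rng = random.Random(seed)
    # Start from the standard lineup plus random shuffles of it.
    population = [list(standard)]
    population += [rng.sample(standard, len(standard)) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[:2]  # the two best lineups survive unchanged
        parents = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(swap_mutation(order_crossover(a, b, rng), rng))
        population = elite + children
    return max(population, key=fitness)


# Hypothetical ratings for nine placeholder batters (higher = better hitter).
ratings = {"B1": 5, "B2": 3, "B3": 8, "B4": 9, "B5": 6,
           "B6": 4, "B7": 2, "B8": 3, "B9": 1}
standard = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B9"]


def fit(lineup):
    return toy_fitness(lineup, ratings)


best = genetic_search(standard, fit)
print(best, fit(best), fit(standard))
```

Because the standard lineup is in the initial population and the best lineups always survive, the result can never score worse than the standard lineup. A real simulator-based fitness would be noisy, so each evaluation would need many simulated games (such as the 200 used here). Caching fitness values would avoid re-simulating lineups that carry over between generations.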
References
1. Manning, Ray, "Are Baseball Batting Lineups Optimal?", Medium, https://medium.com/@ray-90807/are-baseball-batting-lineups-optimal-fa7c3a8dbf27.
2. Cwiklinski, Patrick, "How Sabermetrics Influence Batting Order Strategy", 9 May 2024, https://www.sportsbettingdime.com/guides/strategy/batting-order-sabermetrics/.
3. Ryan, B., "Baseball-Simulator", 2020, https://github.com/benryan03/baseball_simulator/.