Are Baseball Batting Lineups Optimal?

Ray Manning
Jun 18, 2024

Part 1. Background

I read Sayar Banerjee’s article on the statistical basis at the heart of the book and the movie Moneyball as well as the entire baseball sabermetrics movement (Ref. 1). The main point of moneyball (i.e., baseball sabermetrics) is to determine ways to win Major League Baseball (MLB) games and get into the playoffs with a limited budget, and to select undervalued players based upon the needs of the team.

Mr. Banerjee’s article gives some background of the moneyball push and explains the basis of the movie. He then gives his own statistical checks and Python code snippets to back up the numbers tossed around in the movie.

Banerjee’s article covers how to select players to trade for or to remove. But how do you select the batting lineup once you have a team selected?

Patrick Cwiklinski (Ref. 2) provided methods of how sabermetrics determined the batting order of a lineup with pre-selected players. Is it best? Can we do better?

The results discussed below were generated in support of a future project, as background material for determining batter lineups. The future project involves utilizing a genetic algorithm or artificial intelligence methods to determine an optimal batting lineup against an opposing team and pitcher.

Part 2. Batter Lineup

In the past, batter lineups have been determined by (Ref. 2):

Lineup spot: Objective

1: Fastest player on the team; running ability is the most important quality

2: All about bat control; this player likely isn’t your best; they just need to make consistent contact and advance the leadoff baserunner before the big boys take the plate

3: Likely the best overall hitter on the team; highest batting average and very high RBI; should be able to drive in the leadoff hitter

4: The coveted clean-up spot; the most powerful hitter on the team; slugging ability is more important than ability to get on base or consistency

5: Good hitter with strong RBI numbers

6: Average hitter with reasonable RBI numbers

7: Average hitter; often the weakest baserunner on the team

8: Average hitter; ability to make contact is prioritized; backup for #2 spot

9: Worst hitter on the team; likely the pitcher in NL games (before the use of the designated hitter)

Is this the best way to approach an upcoming game, an upcoming pitcher, or even every game? Why not tailor the lineup for each team and pitcher that you are going to face?

Or why not try some new things like Billy Martin pulling the batting order out of a hat on numerous occasions? (Ref. 3)

Part 2. Baseball Simulator

In order to try out different batting lineup strategies, a baseball simulation is needed. Reference 4 contains a Python “program that simulates a complete baseball game”. Two teams are chosen from specific years and the baseball simulator runs through the complete game, pitch-by-pitch, to try and duplicate the outcome of that game. The game is probabilistic so blowouts and upsets can occur, but on average, the simulator should replicate a team’s performance for a given year.

As Reference 4 states:

“For each at-bat, the program takes into account the batter’s batting average, the pitcher’s earned run average, and the pitcher’s pitch count to determine whether the batter or pitcher has an edge, and how big of an edge. A player with an edge over the other will be more likely to have a positive result for each pitch, and that likelihood increases with larger edge percentages.”

The code scrapes data from baseball-reference.com (Ref. 5) to get player performance data for batters and pitchers for a selected team and year. In theory one could play the 1968 Detroit Tigers against the 2020 Los Angeles Dodgers. But we’ll be restricting our lineup studies to teams and games within a single season.

In addition, the Python baseball simulator uses actual pitch distribution data accumulated over the last 25 years as described in Reference 6.

The basic code from Reference 4 was modified to run multiple games for averaging purposes and modified to allow the permutation of players in a batting order. After debugging and making sure that everything was working properly, the printing of summary data after each game, such as box scores, was commented out to cut down on runtime. (More on this later.)
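As a rough illustration of the "permutation of players in a batting order" modification, here is a minimal sketch. It is not the code from Reference 4; the function name `permute_top_of_order` and the placeholder batter labels are assumptions made for this example. It reorders only the first k batters and leaves the rest of the lineup in place.

```python
from itertools import permutations

def permute_top_of_order(lineup, k):
    """Yield every batting order that reorders the first k batters
    and keeps the remaining batters in their original spots."""
    head, tail = lineup[:k], lineup[k:]
    for order in permutations(head):
        yield list(order) + list(tail)

# Hypothetical nine-player lineup; the labels are placeholders, not roster data.
baseline = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B9"]

orders = list(permute_top_of_order(baseline, 3))
print(len(orders))   # 6 orders for the three-batter case
print(orders[0])     # the first order produced is the baseline itself
```

Because `itertools.permutations` emits the input order first, the baseline lineup is always the first one generated, which is convenient for an "index run".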

Part 3. Analysis Methodology

The most common lineup for each team selected was taken from Reference 5 and used as the “zero run” or “index run” to use as a comparison with other lineups. From here the permutations of the batting order begin.

Different simulations were made by permuting the first three batters in the lineup only, the first four batters only, the first five batters only, and the first six batters only. For each simulation with different batting lineups, two hundred games were played against each of the top five starting pitchers for the opposing team. This gives one the ability to see which batting lineup a team should use against a specific opposing pitcher. For these two hundred games, the permuting team’s starting pitcher followed the team’s normal rotation.
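The experiment grid described above (every permuted lineup against each of five opposing starters, 200 games per pairing) could be organized roughly as follows. This is only a sketch under stated assumptions: `simulate_game` is a hypothetical coin-flip stand-in for the real simulator, and the pitcher and batter labels are placeholders.

```python
import random
from itertools import permutations

def simulate_game(lineup, pitcher, rng):
    """Hypothetical stand-in for one simulated game; returns True on a win.
    A fair coin flip replaces the real pitch-by-pitch simulator here."""
    return rng.random() < 0.5

def win_pct(lineup, pitcher, n_games, rng):
    """Fraction of n_games won by this lineup against this pitcher."""
    wins = sum(simulate_game(lineup, pitcher, rng) for _ in range(n_games))
    return wins / n_games

def best_lineup_per_pitcher(baseline, k, pitchers, n_games=200, seed=0):
    """For each opposing pitcher, try every permutation of the first k
    batters and keep the (win percentage, lineup) pair that scored highest."""
    rng = random.Random(seed)
    best = {}
    for pitcher in pitchers:
        scored = []
        for head in permutations(baseline[:k]):
            lineup = list(head) + baseline[k:]
            scored.append((win_pct(lineup, pitcher, n_games, rng), lineup))
        best[pitcher] = max(scored, key=lambda pair: pair[0])
    return best

baseline = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B9"]
pitchers = ["P1", "P2", "P3", "P4", "P5"]
best = best_lineup_per_pitcher(baseline, 3, pitchers)
for pitcher, (pct, lineup) in best.items():
    print(pitcher, f"{pct:.3f}", lineup)
```

With a coin flip as the game model, the "best" lineup for each pitcher is pure noise; the structure only becomes meaningful once a real simulator replaces `simulate_game`.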

For this study we looked at trying to improve the batting lineups of two different teams against one opposing team. We tried to improve the batting lineup of the 2022 Los Angeles Dodgers against the 2022 Milwaukee Brewers as a test case. This was a good test case since the Dodgers finished with an overall 111 wins and 51 losses, a win percentage of 0.685 and the best in baseball that season. We matched them with the Milwaukee Brewers, who had an overall 86 wins and 76 losses, a 0.531 win percentage. Head-to-head, the Dodgers had 4 wins and 3 losses against the Brewers, a 0.571 win percentage. Could we find a lineup that would make the Dodgers win more games against the Brewers?

We also looked at trying to find the best batter lineup for the Pittsburgh Pirates. The Pirates had 62 wins and 100 losses, among the worst records in baseball. The Pirates went 8 and 11 against the Brewers that season for a win percentage of 0.421. Could we find a lineup that might help the Pirates get out of the cellar in their division?
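The win percentages quoted for the three teams can be checked with a few lines of arithmetic. The Dodgers' season record is entered as 111 wins and 51 losses, which is the record that yields the quoted 0.685; the helper name `win_pct` is just for this example.

```python
def win_pct(wins, losses):
    """Win percentage as wins divided by games played."""
    return wins / (wins + losses)

records = {
    "Dodgers, 2022 season": (111, 51),
    "Brewers, 2022 season": (86, 76),
    "Dodgers vs. Brewers": (4, 3),
    "Pirates, 2022 season": (62, 100),
    "Pirates vs. Brewers": (8, 11),
}

for label, (wins, losses) in records.items():
    print(f"{label}: {win_pct(wins, losses):.3f}")
```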

For each of these studies, the total number of games simulated was calculated as

3 perms: 6 batting orders times 5 opposing Brewer pitchers times 200 games = 6000

4 perms: 24 batting orders times 5 opposing Brewer pitchers times 200 games = 24000

5 perms: 120 batting orders times 5 opposing Brewer pitchers times 200 games = 120000

6 perms: 720 batting orders times 5 opposing Brewer pitchers times 200 games = 720000
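The totals above are k! batting orders times 5 pitchers times 200 games. A short check, with constant and function names chosen only for this example:

```python
from math import factorial

PITCHERS = 5             # opposing Brewers starters per study
GAMES_PER_MATCHUP = 200  # simulated games per lineup and pitcher pairing

def total_games(k):
    """Games needed when the first k batters are permuted (k! batting orders)."""
    return factorial(k) * PITCHERS * GAMES_PER_MATCHUP

for k in (3, 4, 5, 6):
    print(f"{k} perms: {factorial(k)} batting orders -> {total_games(k)} games")
```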

The 720000 games needed to run 720 different batting orders against 5 pitchers for 200 games took about 13 hours to run on an Intel Xeon v3 with 4 cores and 8 logical processors. Even though this is more than 900 games per minute, this was long enough that we decided not to continue with 7, 8, or 9 permutations, as run times would be excessive. (As if they already were not.)
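To see why 7, 8, and 9 permutations were ruled out, the measured throughput can be extrapolated. This sketch assumes throughput stays constant at the rate measured for the six-permutation run, which is an assumption made here rather than something that was measured.

```python
from math import factorial

MEASURED_GAMES = 720_000   # six-permutation study
MEASURED_HOURS = 13        # approximate wall-clock time reported above
games_per_hour = MEASURED_GAMES / MEASURED_HOURS

def estimated_hours(k, pitchers=5, games=200):
    """Projected run time for permuting the first k batters,
    assuming the same games-per-hour rate as the measured run."""
    return factorial(k) * pitchers * games / games_per_hour

print(f"throughput: {games_per_hour / 60:.0f} games per minute")
for k in (7, 8, 9):
    hours = estimated_hours(k)
    print(f"{k} perms: about {hours:.0f} hours ({hours / 24:.1f} days)")
```

Each additional batter multiplies the run time by k, so seven permutations would take roughly 7 x 13 = 91 hours, and nine would take on the order of nine months.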

Part 4. 2022 Dodgers Results

The various Los Angeles Dodgers’ batting order permutations against each opposing pitcher were simulated as described above. A typical set of simulations for the 3 permutation case is shown in Figure 1. Note that we are only permuting the first three batters from a typical Dodgers’ lineup to give six curves.

Figure 1. Dodgers Complete Ensemble, 3 Permutations, Pitcher 1

The heavy black line represents the 200 games played with the typical Dodgers’ lineup against the first Milwaukee pitcher. After the initial “low game win percentage” transient dies out, the long-term average for this lineup against this pitcher is right near 0.500, which means each game is a toss-up. A gain of almost 0.100 in win percentage can be obtained using the light blue batting order. It doesn’t seem like much, but remember that for this set of simulations we only permuted the first three batters.
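Curves like these can be built as a running win percentage: cumulative wins divided by games played so far. The sketch below uses made-up coin-flip results, and it assumes that the "average over the last 100 games" is the mean of the running curve over games 101 to 200; the exact definition used for the figures is not spelled out, so treat `tail_average` as one plausible reading.

```python
import random
from itertools import accumulate

def running_win_pct(results):
    """results is a sequence of 1 (win) or 0 (loss).
    Returns the cumulative win percentage after each game."""
    return [wins / n for n, wins in enumerate(accumulate(results), start=1)]

def tail_average(curve, last=100):
    """Mean of the running win percentage over the final `last` games."""
    tail = curve[-last:]
    return sum(tail) / len(tail)

# Made-up results for illustration: 200 fair coin flips, not simulator output.
rng = random.Random(42)
results = [1 if rng.random() < 0.5 else 0 for _ in range(200)]

curve = running_win_pct(results)
print(f"after 10 games:  {curve[9]:.3f}")
print(f"after 200 games: {curve[-1]:.3f}")
print(f"average over last 100 games: {tail_average(curve):.3f}")
```

The early part of such a curve swings widely because each game moves the ratio a lot, which is the transient visible at the left of the figures.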

Figure 2 shows the same type of results for the six-batter permutations, i.e., all permutations of the first six batters in the lineup with the last three left alone. This figure shows the complete set of 720 batting lineups against Milwaukee pitcher 3 for 200 games each.

Figure 2. Dodgers’ Complete Ensemble, 6 Permutations, Pitcher 3

In this case the typical Dodgers’ batting order win percentage over the last 100 games is just above 0.500. Note that many batting orders produce averages over the last 100 games that are 0.100 to 0.200 higher.

Figure 3 shows some overall results. The baseline run, with a typical Dodgers’ batting lineup, produces an average win percentage over the last 100 games of 0.573. This compares well with the actual win percentage from 2022 of 0.571. The closeness of the simulation to the actual results gives us confidence in the baseball simulator rules and use of sabermetric statistics in deciding game outcomes. Three permutations across all 5 Milwaukee pitchers produce an average win percentage of 0.653.

Note also that as you change the batting order more, you get higher average win percentages. The six permutation average of 0.719 is greater than the averages for 5, 4, and 3 permutations. As you would expect, more freedom in setting the batting order makes it possible to find combinations of hitters that work best together.

Figure 3. Overall Dodgers’ Win Percentage with Lineup Permutations

Finally, Figure 4 shows the win percentage trends against Milwaukee pitcher 1 as you change more batters in the Dodgers’ batting order. Again the heavy black line shows the 0.490 win percentage of the Dodgers’ typical batting order against pitcher 1. As you increase the number of permutations you can find better lineups against pitcher 1. But you can also find worse batting orders against pitcher 1. A complete set of these trends for all five starting Milwaukee pitchers used is available upon request.

Figure 4. Dodgers’ Win Percentage Trend with Permutation Size

Part 5. 2022 Pirates Results

The same methodology and parameters used in the Dodgers’ batting order determination were used for the 2022 Pittsburgh Pirates. Recall that the Pirates were 8 and 11 against the 2022 Milwaukee Brewers for a win percentage of 0.421.

The baseline batting order in the baseball simulator gives an average of 0.433 compared with the actual 0.421. This is close enough to give us confidence that the baseball simulator is adequately predicting game outcomes.

Figure 5 shows the results of the six batting orders associated with permuting the first three players from a Pirates’ typical lineup. The heavy black line is the baseline Pirates’ lineup. In this case, there isn’t much of a difference when you can only change the first three batters.

Figure 5. Pirates’ Complete Ensemble, 3 Permutations, Pitcher 1

Figure 6 shows the six permutation runs against pitcher 4 of the Milwaukee Brewers. Again, this shows 720 traces for the six permutations across 200 games. For these 200 games, the Pirates’ starting pitcher followed a normal rotation.

In this case, permuting the first six Pirates’ batters can produce an average of 0.627 across the last 100 games of each simulation and across all five potential Milwaukee starting pitchers. If this were realized, then the Pirates would have won 12 of the 19 games against Milwaukee instead of the 8 that they actually won. This may not seem like much, but add this up for each division rival and the Pirates would be approaching an 81 win and 81 loss record (compared to the actual 62 and 100 record). That’s only using the best batting order for their own division.
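The arithmetic behind this projection can be laid out explicitly. The division-wide step below rests on two assumptions that were not simulated: that the Pirates played 19 games against each of 4 division rivals in 2022, and that the same gain of wins would be found against every one of them.

```python
GAMES_VS_BREWERS = 19
ACTUAL_WINS_VS_BREWERS = 8
SIM_BEST_WIN_PCT = 0.627          # six-permutation average quoted above
ACTUAL_SEASON_WINS = 62
SEASON_GAMES = 162

# Projected wins against Milwaukee at the simulated win percentage.
projected_wins = round(SIM_BEST_WIN_PCT * GAMES_VS_BREWERS)
gain_per_rival = projected_wins - ACTUAL_WINS_VS_BREWERS

# Assumption: the same gain applies to each of 4 division rivals.
DIVISION_RIVALS = 4
projected_season_wins = ACTUAL_SEASON_WINS + gain_per_rival * DIVISION_RIVALS
projected_season_losses = SEASON_GAMES - projected_season_wins

print(f"projected wins vs. Brewers: {projected_wins} of {GAMES_VS_BREWERS}")
print(f"gain per division rival: {gain_per_rival}")
print(f"projected season record: {projected_season_wins} and {projected_season_losses}")
```

Under these assumptions the projection lands at 78 wins and 84 losses, which is the sense in which the record would be approaching 81 and 81.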

Figure 6. Pirates’ Complete Ensemble, 6 Permutations, Pitcher 4

Figure 7 shows a breakdown of the permutation runs for Pittsburgh against Milwaukee. The baseline runs from the baseball simulator produce a 0.433 win percentage compared with the actual 0.421. This again gives us confidence in the reasonableness of the baseball simulator. Again, in general, permuting more of the batting order can produce both better and worse win percentages.

Figure 7. Overall Pirates’ Win Percentage with Lineup Permutations

Figure 8 shows the Pirates’ win percentage against Milwaukee pitcher 2 as you change more batters in the Pirates’ lineup. The heavy black line shows the baseline Pirates’ win percentage against pitcher 2 near 0.430. Increased permutations of the batting order can generate much better win percentages but also can generate worse ones. The important thing is that we know which of these batting orders improve the situation and which ones degrade it. Trends for the various permutations of Pirates’ batting order against all five of the Milwaukee pitchers are available upon request.

Figure 8. Pirates’ Win Percentage Trend with Permutation Size

Part 6. Next Steps

Current baseball lineups may follow accepted practices and rules, but they may not be optimal depending upon the opponent and the specific pitcher starting the game.

To determine an optimal batting order, you wouldn’t want to just run all of the permutations each time. It would be better to find a logical, algorithmic way to determine the batting order, and then determine the “rules” that the algorithm is using to set the lineup.

Thus, the next steps are to perform a moderate rewrite of the baseball simulator code and wrap an optimization routine or artificial intelligence algorithm around it. Currently the baseball simulator can run games but is not set up to be a fitness function for use with any sort of optimization algorithm. A rewrite is needed because the baseball simulator was not written in a modular way and has global variables throughout (which can be difficult to work with from an optimization routine). Once the rewrite is complete and debugged, a genetic algorithm or artificial intelligence algorithm can repeatedly call the baseball simulator for function evaluations in search of the best batting order. The genetic algorithm would be appropriate since there is no gradient information available. (It is, after all, a discrete integer optimization problem.) Regarding artificial intelligence, it is envisioned that a deep learning algorithm or a Monte Carlo Tree Search algorithm would be most appropriate for this effort.
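To make the genetic algorithm idea concrete, here is a minimal sketch of a permutation-encoded search using order crossover and swap mutation. It is not the planned implementation: `toy_fitness` is a hypothetical stand-in for a call to the rewritten simulator, the players are represented only by made-up integer skill ratings, and the population size, generation count, and mutation rate are arbitrary choices for illustration.

```python
import random

def order_crossover(p1, p2, rng):
    """Order crossover (OX): copy a slice from p1, then fill the open
    spots with the remaining players in the order they appear in p2."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    kept = set(p1[a:b + 1])
    fill = iter(g for g in p2 if g not in kept)
    for i in range(n):
        if child[i] is None:
            child[i] = next(fill)
    return child

def swap_mutation(order, rng, rate=0.2):
    """With probability `rate`, swap two lineup spots in a copy of the order."""
    order = order[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order

def toy_fitness(order):
    """Hypothetical stand-in for the simulator: rewards batting the
    higher-rated players earlier. A real run would return a win percentage."""
    return sum(skill * (len(order) - pos) for pos, skill in enumerate(order))

def genetic_search(players, fitness, generations=50, pop_size=30, seed=0):
    """Keep the better half of the population each generation and
    refill it with mutated crossover children of the survivors."""
    rng = random.Random(seed)
    population = [rng.sample(players, len(players)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        population = elite + children
    return max(population, key=fitness)

players = list(range(1, 10))   # made-up skill ratings 1 through 9
best = genetic_search(players, toy_fitness)
print(best, toy_fitness(best))
```

Both operators always return a valid permutation, so every candidate is a legal batting order. With a simulator as the fitness function, each evaluation would be noisy and expensive, so caching scores per lineup and averaging over many games would matter far more than in this toy version.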

References

1. Banerjee, Sayar, “Linear Regression: Moneyball — Part 1”, 15 April 2018, https://towardsdatascience.com/linear-regression-moneyball-part-1-b93b3b9f5b53.

2. Cwiklinski, Patrick, “How Sabermetrics Influence Batting Order Strategy”, 9 May 2024, https://www.sportsbettingdime.com/guides/strategy/batting-order-sabermetrics/.

3. Weinreb, M., “Throwback Thursday: Billy Martin Picks the New York Yankees’ Lineup Out of a Hat”, Vice, 21 April 2016, https://www.vice.com/en/article/qky8m7/throwback-thursday-billy-martin-picks-the-new-york-yankees-lineup-out-of-a-hat.

4. Ryan, B., “Baseball-Simulator”, https://github.com/benryan03/baseball_simulator/, 2020.

5. “Baseball Reference”, https://www.baseball-reference.com.

6. Bouldin, J., “Pitch Outcome Distribution Over 25 Years”, https://www.baseball-fever.com/forum/general-baseball/statistics-analysis-sabermetrics/81427-pitch-outcome-distribution-over-25-years.
