Predicting the Outcome of NFL Games (Other Methods)

Ray Manning
Jun 25, 2018

Part 1. Background

I previously wrote about applying neural networks to predicting the outcomes of National Football League (NFL) games. After some trial and error, I derived neural networks that could be trained quickly and could predict the outcomes of more than 75% of the games to within 1 or 2 points. I considered that a success and started looking at other methods for predicting the outcomes of these games.

As a side note, I was only using NFL game statistics and outcomes as an experiment. My main goal wasn’t to beat the Las Vegas spread. I was more interested in developing generic methods for deriving outcomes from known input statistics. Ultimately, I wanted to be able to predict child abuse cases, or who is susceptible to human trafficking, based upon known demographic and economic statistics. But that data is harder to come by, so I started the “generic tool development effort” with NFL football statistics and outcomes, both of which are widely known to a high level of confidence.

So I’ll still be looking at NFL statistics, now trying to predict game outcomes and confidence levels with other methods of analysis. The dataset I’ll start with is the set of team statistics and points differentials for each game of the 2017 season, which was completed about six months ago. A total of 256 games and 58 team statistics are available.

Part 2. Basics

Everyone who’s taken a basic statistics class thinks, “Oh, let me just compute the correlation between some statistics and the outcomes and I’ll have the problem solved.” That’s certainly the first step in our journey, so Figure 1 shows the relationship between the visiting team’s points differential and the visiting team’s passer rating.

For example, Reference 1, entitled “NFL quarterback passer rating is best predictor of wins”, states:

You can directly correlate a team’s likelihood to win a game based solely on the performance of the quarterback. Wins and losses correlate almost perfectly with passer rating. Of course with any data set you will have outliers, but there is no dismissing passer rating as a prime, if not THE PRIME predictor of team victory.

Yet if you compute a simple linear (Pearson) correlation between passer rating and points differential, you get a correlation coefficient of only about 0.56 for the 2017 season. Yes, passer rating is a strong correlate of success in a game, but it doesn’t “correlate almost perfectly” with results.
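For readers who want to run this kind of check themselves, here is a minimal sketch of computing the Pearson correlation coefficient and the least-squares line in Python with NumPy. The data is synthetic (randomly generated), not the 2017 season statistics, and the function names are my own.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson (linear) correlation coefficient of two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(dx @ dy / np.sqrt((dx @ dx) * (dy @ dy)))

def fit_line(x, y):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)

# Synthetic stand-in for one season: 256 games, passer rating vs. points differential.
rng = np.random.default_rng(2017)
rating = rng.normal(90.0, 18.0, size=256)
diff = 0.35 * (rating - 90.0) + rng.normal(0.0, 11.0, size=256)

r = pearson_r(rating, diff)
slope, intercept = fit_line(rating, diff)
print(f"r = {r:.2f}; fitted line: diff = {slope:.2f} * rating + {intercept:.1f}")
```

Squaring r gives the fraction of the variance in points differential that the straight line explains; for r = 0.56 that is only about 31%.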

Figure 1 plots a team’s margin of victory against its passer rating. Each red dot is an individual game from the 2017 season, and the blue line is the linear fit, with a correlation coefficient of 0.56. (Just ignore the orange dots for now.) This is far from a perfect correlation: a passer rating near 120 could yield a 25-point win or a 10-point loss. Passer ratings as high as 134 yielded a ten-point loss, and passer ratings as low as 60 yielded a 6-point win. (This ignores the far outlier of a passer rating of 48 yielding a 21-point win!)

Figure 1. Correlation Between Passer Rating and Margin of Victory

One could always argue that you need to combine the passer rating with other statistics to obtain the final outcome of a football game. I wholeheartedly agree. But I don’t yet know which combination of statistics, and with what coefficients, would yield a “perfect correlation”.
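One standard way to search for such coefficients (not necessarily the approach this series will end up using) is ordinary least squares over several statistics at once. A minimal sketch on synthetic data follows; `turnover_margin` is a hypothetical second statistic chosen only for illustration, and the “true” weights are made up.

```python
import numpy as np

def fit_combination(stats, outcome):
    """Ordinary least squares: weights w and bias b so that stats @ w + b ~ outcome."""
    stats = np.asarray(stats, dtype=float)
    design = np.column_stack([stats, np.ones(len(stats))])   # extra column for the bias
    coef, *_ = np.linalg.lstsq(design, np.asarray(outcome, dtype=float), rcond=None)
    return coef[:-1], float(coef[-1])

# Synthetic games; the weights 0.3 and 4.0 below are invented for the demo.
rng = np.random.default_rng(0)
n_games = 256
passer_rating = rng.normal(90.0, 18.0, n_games)
turnover_margin = rng.integers(-3, 4, n_games).astype(float)   # hypothetical statistic
margin = 0.3 * (passer_rating - 90.0) + 4.0 * turnover_margin + rng.normal(0.0, 6.0, n_games)

stats = np.column_stack([passer_rating, turnover_margin])
w, b = fit_combination(stats, margin)
predicted = stats @ w + b
r_single = np.corrcoef(passer_rating, margin)[0, 1]
r_combined = np.corrcoef(predicted, margin)[0, 1]
print("weights:", np.round(w, 2), "bias:", round(b, 1))
print(f"r with passer rating alone: {r_single:.2f}; r with both: {r_combined:.2f}")
```

When the second statistic really does carry information, the combined prediction correlates more strongly with the margin than passer rating alone, but a linear combination of this kind still assumes each statistic contributes a fixed amount per unit.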

Part 3. Experiment Time

I initially set out thinking that an n-dimensional ellipsoid, with n equal to the number of available statistics, could give a good representation of the outcome of a game (or of any other outcome one is trying to model, whether related to child abuse, economic performance, or stock price). However, the mathematics of a general n-dimensional ellipsoid (n semi-axis lengths, n center coordinates, and n(n-1)/2 angles of orientation) proved daunting.
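A quick way to see why: a general n-dimensional ellipsoid has n semi-axis lengths, n center coordinates, and n(n-1)/2 orientation angles (one per pair of axes), or n(n+3)/2 free parameters in all. The small sketch below (my own illustration) simply does that count.

```python
def ellipsoid_parameter_count(n):
    """Free parameters of a general ellipsoid in n dimensions."""
    semi_axes = n                   # one length per axis
    center = n                      # one coordinate per axis
    orientation = n * (n - 1) // 2  # one rotation angle per pair of axes
    return semi_axes + center + orientation

print(ellipsoid_parameter_count(2))    # 5 for an ordinary 2-D ellipse
print(ellipsoid_parameter_count(58))   # 1769 for all 58 statistics at once
```

Fitting 58 separate two-dimensional ellipses, by contrast, needs 58 × 5 = 290 parameters, split into small independent problems.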

Thus I set out to derive n two-dimensional ellipses instead, one for each of the n statistics, each plotted against the points differential. These are the orange dots back in Figure 1. For the quarterback’s passer rating, you derive the minimum-area ellipse that encloses the data. In Figure 1 you can see that a few of the game results (red dots) land exactly on the ellipse points (orange dots). This was a good indication that I had found the minimum-area ellipse enclosing the data.
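The post doesn’t say which algorithm produced the minimum-area ellipses. One standard choice, assumed here, is Khachiyan’s algorithm for the minimum-volume enclosing ellipsoid, sketched below with NumPy. It works in any dimension; for Figure 1 each point would be a (passer rating, points differential) pair. The sample data is synthetic.

```python
import numpy as np

def min_area_ellipse(points, tol=1e-4, max_iter=50_000):
    """Minimum-volume enclosing ellipse via Khachiyan's algorithm.

    Returns (A, c) such that every point p satisfies
    (p - c)^T A (p - c) <= 1, up to the tolerance.
    """
    P = np.asarray(points, dtype=float)          # shape (N, d)
    N, d = P.shape
    Q = np.vstack([P.T, np.ones(N)])             # lift points to d + 1 dimensions
    u = np.full(N, 1.0 / N)                      # weight on each point
    for _ in range(max_iter):
        X = (Q * u) @ Q.T                        # weighted scatter matrix
        M = np.einsum("ij,ji->i", Q.T @ np.linalg.inv(X), Q)
        j = int(np.argmax(M))                    # point farthest outside the current ellipse
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step                         # shift weight toward that point
        done = np.linalg.norm(new_u - u) < tol
        u = new_u
        if done:
            break
    c = P.T @ u                                  # ellipse center
    A = np.linalg.inv(P.T @ (P * u[:, None]) - np.outer(c, c)) / d
    return A, c

# Synthetic (passer rating, points differential) pairs, not real game data.
rng = np.random.default_rng(1)
games = np.column_stack([rng.normal(90.0, 18.0, 60), rng.normal(0.0, 13.0, 60)])
A, c = min_area_ellipse(games)
diff = games - c
d2 = np.einsum("ij,jk,ik->i", diff, A, diff)     # <= 1 means inside the ellipse
print("center:", c.round(1), "games on the boundary:", int(np.sum(d2 > 0.99)))
```

Points with a value of (p - c)^T A (p - c) close to 1 are the ones touching the ellipse, which is consistent with the observation that a few red dots sit exactly on the boundary. A smaller `tol` gives a more exact ellipse at the cost of more iterations.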

In addition, as shown in Figure 2, you can fit a minimum-area ellipse that captures “most” of the data and treats the actual game results (red dots) that fall outside it as outliers from the model. Thus you capture 90% or so of the games while allowing for “freak games”.
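The post does not describe how the outlier-tolerant ellipse was fitted. One simple way to get a similar effect, shown here as an assumption rather than the author’s method, is to shape an ellipse by the data’s covariance and scale it until a chosen fraction (90% here) of the games fall inside; the rest are flagged as outliers. Note that this is not a minimum-area fit; a minimum-area version would instead apply a minimum-enclosing-ellipse routine to the games left after trimming. The data is synthetic.

```python
import numpy as np

def coverage_ellipse(points, coverage=0.9):
    """Covariance-shaped ellipse scaled so that `coverage` of the points fall inside.

    A point p is inside when (p - center)^T inv(cov) (p - center) <= r2.
    Returns center, cov, r2 and a boolean mask of the points inside.
    """
    P = np.asarray(points, dtype=float)
    center = P.mean(axis=0)
    cov = np.cov(P, rowvar=False)
    diff = P - center
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)  # squared Mahalanobis distances
    r2 = float(np.quantile(d2, coverage))
    return center, cov, r2, d2 <= r2

# Synthetic (passer rating, points differential) games plus one planted "freak game".
rng = np.random.default_rng(7)
rating = rng.normal(90.0, 18.0, 199)
diff = 0.4 * (rating - 90.0) + rng.normal(0.0, 10.0, 199)
games = np.vstack([np.column_stack([rating, diff]), [[40.0, 35.0]]])

center, cov, r2, inside = coverage_ellipse(games, coverage=0.9)
area = np.pi * r2 * np.sqrt(np.linalg.det(cov))  # area of the 2-D ellipse
print(f"{inside.mean():.0%} of games inside; area {area:.0f}; freak game inside: {bool(inside[-1])}")
```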

Figure 2. Two Dimensional Statistical Ellipse Allowing Outliers

If you think about combining the ellipse shown in Figure 2 with other statistics, you might reach a point where multi-valued game results (i.e., 20-point wins and 10-point losses corresponding to the same quarterback rating) can be resolved via other dimensions in the statistical space. For example, did a team happen to return a punt and an interception for touchdowns, which alone accounts for a 14-point swing in the game’s outcome?

I then moved on to Gaussian mixture models, probabilistic models that assume all of the data is generated from a mixture of a finite number of Gaussian distributions with unknown parameters. A Gaussian mixture model can be thought of as a generalization of k-means clustering that incorporates the covariance of the data as well as the centers of the Gaussian distributions.
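As a sketch of what fitting such a model involves, here is a bare-bones expectation-maximization (EM) fit of a full-covariance Gaussian mixture in NumPy. In practice a library routine such as scikit-learn’s `GaussianMixture` would normally be used; this version assumes only NumPy, uses a deterministic start of my own choosing (k points at evenly spaced quantiles of the first statistic), and runs on synthetic data. The per-point log-likelihood it returns is one way to flag outliers: the games the fitted mixture considers least likely.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log density of a multivariate normal at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    mahal = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + mahal)

def e_step(X, weights, means, covs):
    """Responsibilities and per-point mixture log-likelihood."""
    log_p = np.column_stack([np.log(w) + log_gaussian(X, m, c)
                             for w, m, c in zip(weights, means, covs)])
    top = log_p.max(axis=1, keepdims=True)                 # log-sum-exp for stability
    log_lik = top[:, 0] + np.log(np.exp(log_p - top).sum(axis=1))
    return np.exp(log_p - log_lik[:, None]), log_lik

def fit_gmm(X, k, n_iter=200, reg=1e-6):
    """Fit a k-component full-covariance Gaussian mixture with EM."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Deterministic start: points at evenly spaced quantiles of column 0,
    # each component starting with the data's diagonal variance.
    order = np.argsort(X[:, 0])
    means = X[order[((np.arange(k) + 0.5) * n / k).astype(int)]]
    covs = np.array([np.diag(X.var(axis=0)) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp, _ = e_step(X, weights, means, covs)
        nk = resp.sum(axis=0) + 1e-12                       # effective points per component
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    resp, log_lik = e_step(X, weights, means, covs)
    return weights, means, covs, resp.argmax(axis=1), log_lik

# Synthetic games: two groups of (passer rating, points differential) plus one freak game.
rng = np.random.default_rng(3)
low = rng.normal([70.0, -10.0], 5.0, size=(100, 2))
high = rng.normal([110.0, 10.0], 5.0, size=(100, 2))
games = np.vstack([low, high, [[48.0, 21.0]]])

weights, means, covs, labels, log_lik = fit_gmm(games, k=2)
print("component means:\n", means.round(1))
print("least likely game:", games[np.argmin(log_lik)])
```

Hard labels come from the largest responsibility; thresholding `log_lik` (for example, at a low percentile) is a simple, assumed rule for deciding which games count as outliers.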

The Gaussian mixture models tend to tell you which data points (i.e., game outcomes) are outliers. Figure 3 shows the results of running a Gaussian mixture model on a single statistic for an NFL football game, namely, the quarterback passer rating.

Figure 3. Gaussian Mixture Model

In Figure 3 the results are distinguished by color (representing the mean points differential of each ellipse) and by the size of each game’s dot (a larger dot representing a larger score differential). The Gaussian mixture model clusters a number of outcomes together based upon the statistic shown (i.e., visitor passer rating) and also shows the outliers (i.e., game results represented as dots NOT enclosed within an ellipse).

As expected, there is a trend up and to the right, since a higher quarterback rating corresponds to a better points differential. Yet there are a number of outlier games, such as a quarterback rating near 115 with a -25-point result or a quarterback rating near 48 with a +20-point result.

These results support the idea that the outcome of a National Football League game is not a single-dimensional (single-statistic) result. Other factors must be combined to yield the overall result.

Part 4. Next Steps

We have developed some additional tools, besides neural networks, that can help explain the outcomes of National Football League games based upon each team’s statistical performance.

The next steps include:

1. How do you combine the results of individual statistical graphs to yield an overall outcome?

2. Is this technique “generalizable” to other, unrelated fields, such as predicting a company’s stock price from its sales/employment statistics, the probability that a child’s injury is due to abuse based upon interview/medical records, or the probability that a “sex worker” has been trafficked based upon interviews and other relevant inputs?


1. Rodgers, Luke. “NFL quarterback passer rating is best predictor of wins.” 247Sports, 18 November 2015.