College Football by the Numbers: A Methodology of the Matrix

I've described aspects of the Matrix as it has evolved, but I think it's about time that I give it one coherent description for anyone interested.

The Matrix uses three ratings: a general performance rating based on margin of victory (which is used for rankings), a recent performance rating, and a win/loss rating. The general rating and win/loss rating are calculated with a progressive adjustment model derived from the Elo chess rating system. Ratings are adjusted according to the improbability that a given outcome would occur. The model simulates the season a few hundred times, allowing smaller adjustments with each round, until, through automated trial and error, it arrives at the ratings associated with the least improbability.
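To make the idea of shrinking adjustments concrete, here is a minimal sketch in Python. It is not the Matrix's actual code: the function name fit_ratings, the 14-point standard deviation, the starting step size, and the way "surprise" is measured are all assumptions chosen for illustration.

```python
from statistics import NormalDist


def fit_ratings(games, teams, rounds=300, start_step=8.0, margin_sd=14.0):
    """Replay the season many times, nudging ratings toward less surprising values.

    games: list of (team_a, team_b, margin) where margin is team_a's points
           minus team_b's points.
    All default values are illustrative assumptions, not the Matrix's settings.
    """
    ratings = {team: 0.0 for team in teams}
    curve = NormalDist(0.0, margin_sd)
    for round_number in range(rounds):
        # Each replay of the season is allowed a smaller adjustment.
        step = start_step / (1 + round_number)
        for team_a, team_b, margin in games:
            expected = ratings[team_a] - ratings[team_b]
            # Surprise runs from -0.5 to 0.5 and is zero when the result
            # matches the expected margin exactly.
            surprise = curve.cdf(margin - expected) - 0.5
            ratings[team_a] += step * surprise
            ratings[team_b] -= step * surprise
    return ratings


season = [("A", "B", 20), ("B", "C", 10), ("A", "C", 30)]
ratings = fit_ratings(season, ["A", "B", "C"])
```

Because every adjustment is added to one team and subtracted from the other, the ratings stay centered on zero, and only the gaps between teams carry meaning.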

For both the general performance rating and win/loss rating, the model assumes that a team's performance will vary and the probability of a particular performance level will fall somewhere on the normal curve. The ratings, therefore, theoretically represent the mean. The larger the point margin, the less effect an additional point will have on ratings, so the effect of "running up the score" is minimal. When estimating the improbability of an event, the model barely differentiates an 18-point win and a 40-point win.
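The diminishing value of extra points follows directly from the shape of the normal curve. The sketch below assumes a standard deviation of 14 points and an even matchup; both numbers, and the function names, are hypothetical and only there to show the shape of the effect.

```python
from statistics import NormalDist

# Assumed spread of game-to-game performance, in points (illustrative only).
PERFORMANCE = NormalDist(mu=0.0, sigma=14.0)


def outcome_percentile(margin, expected_margin=0.0):
    """Share of possible outcomes that fall below the observed margin."""
    return PERFORMANCE.cdf(margin - expected_margin)


def marginal_effect(margin, expected_margin=0.0):
    """How much one additional point moves the percentile at this margin."""
    return PERFORMANCE.pdf(margin - expected_margin)


# The first 18 points move the percentile far more than the next 22 do.
first_18 = outcome_percentile(18) - outcome_percentile(0)
next_22 = outcome_percentile(40) - outcome_percentile(18)
```

Under these assumptions, a point scored at a 40-point margin changes the estimate much less than a point scored at a 3-point margin, which is why running up the score earns so little.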

The win/loss rating, obviously, uses only wins and losses and ignores the margin of victory. The factor actually has very little effect on the outcome of the model, but I have included it for the sake of comprehensiveness. For the most part, close games really are decided primarily by luck, and so it is best that the model does not overemphasize winning the game. Because the model uses a marginal progressive adjustment method, it is able to handle undefeated teams without the problems faced by maximum likelihood (MLE) approaches, where an unbeaten team's win/loss rating has no finite best estimate.

After the general and win/loss ratings have been calculated, a recent performance rating is calculated using the deviation of a team's margin of victory from the expected margin of victory. Obviously, greater weight is given to more recent games.
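One simple way to weight recent games more heavily is an exponentially decaying average of each game's deviation from expectation. The decay value of 0.7 and the function name below are assumptions for illustration; the post does not say which weighting scheme the Matrix uses.

```python
def recent_performance(deviations, decay=0.7):
    """Weighted average of (actual margin - expected margin), oldest game first.

    The most recent game gets weight 1, the one before it gets `decay`,
    the one before that gets decay squared, and so on.
    """
    if not deviations:
        return 0.0
    count = len(deviations)
    weights = [decay ** (count - 1 - index) for index in range(count)]
    weighted_sum = sum(w * d for w, d in zip(weights, deviations))
    return weighted_sum / sum(weights)


# A team that underperformed early but has beaten expectations lately is "hot".
heating_up = recent_performance([-10, 0, 10])
cooling_off = recent_performance([10, 0, -10])
```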

The final component of the Matrix is the Navy adjustment factors. Essentially, these factors compare a team's opponent against past opponents in terms of its relative dependency on the pass and run and then adjust the expected outcome to match any advantages or disadvantages a team may experience in match-ups. For example, if a team plays terrible pass defense and now has to play Texas Tech, it should be expected to under-perform relative to its general, recent and win/loss performance ratings.

The general performance rating, win/loss rating, recent performance rating, and Navy adjustment factors are then weighted and used to estimate the margin of victory (along with an adjustment for home field advantage). Finally, I use a consistency rating (how predictable a team's performance has been) to estimate the probability of a suggested outcome (of a team winning or covering the spread).
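As a rough sketch of this last step, the code below combines the components into a predicted margin and then turns that margin into a probability. The weights, the 3-point home edge, and the choice to treat the consistency rating as a standard deviation are all assumptions made for the example, not the Matrix's actual values.

```python
from statistics import NormalDist


def predict_margin(general_diff, recent_diff, winloss_diff, matchup_adj,
                   at_home=True, home_edge=3.0, weights=(1.0, 0.3, 0.1)):
    """Weighted sum of rating differences plus match-up and home adjustments.

    Each *_diff is the team's rating minus its opponent's rating.
    """
    w_general, w_recent, w_winloss = weights
    margin = (w_general * general_diff
              + w_recent * recent_diff
              + w_winloss * winloss_diff
              + matchup_adj)
    return margin + (home_edge if at_home else -home_edge)


def cover_probability(predicted_margin, spread, consistency_sd):
    """Chance the team wins by more than `spread` points.

    A smaller consistency_sd means a more predictable team.
    Use spread=0 for the plain probability of winning.
    """
    return 1.0 - NormalDist(predicted_margin, consistency_sd).cdf(spread)


margin = predict_margin(general_diff=12.0, recent_diff=2.0,
                        winloss_diff=1.0, matchup_adj=-1.5)
steady_favorite = cover_probability(margin, spread=7.0, consistency_sd=8.0)
erratic_favorite = cover_probability(margin, spread=7.0, consistency_sd=16.0)
```

With the same predicted margin, the more consistent team is the likelier one to cover when it is favored by more than the spread.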

Results:

These results reflect only games played before week 11 of the 2007 season.

Top 5 overall:
1. Ohio State
2. Oregon
3. West Virginia
4. LSU
5. Missouri

(Note: After OSU lost and WVU struggled against Louisville, Oregon has taken the top spot and Oklahoma and Kansas have moved into the top 5.)

Oklahoma fans might see a problem in Missouri being ranked higher than their own Sooners. This is a good example, though, of the model punishing Oklahoma more for the greater improbability of its loss to Colorado. Because both teams have only one loss and Missouri lost to a better team than Colorado (which just happens to be Oklahoma), Missouri is ranked higher. Oklahoma is 6th and only two-tenths of a point behind the Tigers.

Top 5 Win/Loss
1. Ohio State
2. Kansas
3. Hawaii
4. LSU
4. Oklahoma
4. Arizona State

Obviously, a win/loss rating should give extra kudos to undefeated teams. The three-way tie for 4th is a bit of an anomaly, but here the Sooners have the advantage over Missouri.

Top 5 Consistency
1. Kansas
2. Florida International
3. Utah State
4. Arizona State
5. Ohio State

Two types of teams find themselves among the most consistent: the surprisingly successful teams that just seem to win every week and the really, really bad teams that will always play poorly against D1A competition. I thought it was interesting that Kansas has been the most consistent team this season and is 9-0 against the spread this year.

The five most unpredictable teams -
1. UCLA
2. Utah
3. Central Michigan
4. Iowa State
5. UNLV

Fitting.

Navy adjustment factor:

You can't produce a ranking from the adjustment factor, but we can guess which teams are going to have a tough match-up this weekend. The team most likely to get unusually lit up through the air this week was, coincidentally, Navy, which gave up almost 500 passing yards and 62 points in a winning effort against the 1-7 (now 1-8) Mean Green of North Texas.

Recent Performance:

Again, it doesn't make much sense to rank teams on their recent performances, because it is relative to their general performance, but the hottest team going into this weekend was Iowa State (relative to their performance all season). Unfortunately for Boston College, another very hot team is Clemson - and a cold team is, well, BC.

When dealing with all these factors, I think it is important to consider their relative importance. The Matrix has the power to explain about 65% of the variance of point margins for games involving D1A teams this season. About 61% is explained by the general performance rating alone and the other 4% by the other adjustment factors and ratings. The win/loss rating barely makes an appearance, and is really just included so the model can be comprehensive and "hybrid," which is such a popular term in sports rating these days.
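For readers who want to check a figure like this against their own predictions, here is one common way to compute the share of variance explained. It assumes the usual R-squared definition (one minus residual variation over total variation); the post does not say exactly how the 65% was calculated, and the sample numbers are made up.

```python
def variance_explained(actual_margins, predicted_margins):
    """R-squared: 1 minus (unexplained variation / total variation)."""
    mean_margin = sum(actual_margins) / len(actual_margins)
    total = sum((a - mean_margin) ** 2 for a in actual_margins)
    unexplained = sum((a - p) ** 2
                      for a, p in zip(actual_margins, predicted_margins))
    return 1.0 - unexplained / total


# Made-up margins purely to show the call.
actual = [21, -3, 10, -14, 7]
predicted = [17, 1, 6, -10, 3]
share = variance_explained(actual, predicted)
```

Running the same function once with only the general rating's predictions and once with the full model's predictions would show how much the extra factors add.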

The model is still somewhat fluid as I make minor adjustments to deal with problems as they arise, but these are the general principles on which it is based. I will continue to publish rankings and predictions, and I will add other stats - consistency, recent performance, match-up warnings, unexpected results, etc.

P.S. According to the Matrix, the most unlikely outcome involving two D1A teams was Notre Dame over UNLV, and #2 was UNLV over Utah.

1 Comment:

matt said...: Very nice blog. I'm curious as to how the Matrix has fared picking winners against the spread. How does it fare against significantly larger spreads (20+ points)? It would seem logical to me that consistent teams (Kansas) can cover those large spreads because, well, they are so consistent. Perhaps an inconsistent team that is a very large favorite would be a good bet to not cover?

Also concerning the inconsistency rating, I am not surprised that UCLA, Utah, and UNLV are in the top 5 (how did UNLV beat Utah 27-0?!). What about Central Michigan? It seems they have been pretty consistent in conference play (undefeated) and pretty consistent in nonconference play (lost every game save Army). Do your ratings for consistency adjust for strength of opposition? That may explain some of Central Michigan's inconsistency.; November 12, 2007 at 10:02 AM

Sunday, November 11, 2007