
Wednesday, October 31, 2007

The Matrix - Week 10 Predictions

I feel like I've got some really interesting numbers in my last post, and I suggest that visitors give it a look, but I'm pretty stoked about this post, too. This week I am presenting the Matrix, the culmination of my experiments with college football prediction models (plus whatever refinements I might want to make later).

A quick overview of the Matrix: it uses two general ratings, one for play throughout the season and one for play weighted toward the most recent games. To find the best ratings, I use an automated trial-and-error system that runs the teams through the season a few hundred times to find the best fit. I then adjust these two ratings for the matchups in the game (offense and defense against the run and the pass) and for home-field advantage.
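The Matrix's code isn't published here, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: each rating is a point-margin rating, the "trial and error" is a simple random hill-climb that keeps a nudge only when it shrinks the squared prediction error, home-field advantage is a fixed (hypothetical) 3 points, and the recent-games rating uses a hypothetical exponential recency weight. The function names are made up for illustration.

```python
import random

HOME_FIELD = 3.0  # hypothetical fixed home-field advantage, in points


def fit_ratings(games, teams, weight=lambda week: 1.0, passes=300, step=1.0, seed=0):
    """Trial-and-error fit of one point-margin rating per team.

    games: list of (week, home_team, away_team, home_margin) tuples,
    where home_margin = home points - away points.
    Each pass nudges every team's rating by a random amount and keeps the
    nudge only if the weighted squared error of the predicted margins drops.
    """
    rng = random.Random(seed)
    ratings = {team: 0.0 for team in teams}

    def error():
        return sum(
            weight(week) * (ratings[home] - ratings[away] + HOME_FIELD - margin) ** 2
            for week, home, away, margin in games
        )

    best = error()
    for _ in range(passes):  # run the teams through the season again
        for team in teams:
            nudge = rng.uniform(-step, step)
            ratings[team] += nudge
            trial = error()
            if trial < best:
                best = trial            # keep the improvement
            else:
                ratings[team] -= nudge  # undo the nudge
    return ratings


def recency_weight(current_week, decay=0.8):
    """Hypothetical weighting that counts recent games more heavily."""
    return lambda week: decay ** (current_week - week)


games = [(1, "A", "B", 13.0), (2, "B", "C", 13.0)]
season_rating = fit_ratings(games, ["A", "B", "C"])
recent_rating = fit_ratings(games, ["A", "B", "C"], weight=recency_weight(current_week=2))
```

Running the same fit twice, once with equal weights and once with recency weights, yields the two general ratings; any matchup and home-field adjustments would then be layered on top of them.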

Before we get to this week's picks, I want to quickly review my picks from last week. I finished the day 2-3, but to my credit, Boston College did not deserve to win. Morelli proved me right in Happy Valley, and Sanchez showed that USC's problems go much deeper than the play of the quarterback. I still have a hard time believing the PAC 10 is any good with Arizona State undefeated and Oregon in second (after the beat down they got last year in the Holiday Bowl). If USC is having an off year, the entire PAC 10 drops a few rungs in my opinion.

Now for a few notable picks (in a less exciting week).

Game 1. LSU @ Alabama
The only reason the spread is under 10 is that the game is at Alabama and people are waiting anxiously for Saban to perform some wonder. Keep waiting. I've heard talk that Nick will bring his A game against his old team, but the coach doesn't play, and his players don't have much of an A game. The hard-hitting, low-scoring game (both teams will complete less than 60% of their passes) will look closer than it really is.

The Matrix:
LSU by 13 points, 66% chance of covering a -7.5 point spread

Game 2. Oregon @ Arizona State
It's disappointing to think that the Ducks, after franchise-establishing wins at Michigan and against USC, could fall to the Sun Devils and out of national championship contention. It is harder to believe that Arizona State is actually good. I've heard that Oregon has "far and away the best offense in the country." That's a bunch of baloney, but they'll still win this one.

The Matrix:
Oregon by less than a point (51% chance of winning), 31% chance against the 7 point spread

Game 3. Rutgers @ Connecticut
I ragged on Connecticut last week and they proved me wrong. Rutgers' performance, on the other hand, made me look like a prophet. The Matrix favors Rutgers in every statistical category it measures but the score.

The Matrix:
Connecticut by 2.2 (57% chance of winning), toss-up against the spread (-2)

Game 4. Wisconsin @ Ohio State
Ohio State will beat up on another very weak Big 10 opponent. The Ohio State University might have a legitimately good team, but the rest of the Big 10 is soft. I understand why the South moans about a lack of balance in college football: if BC and OSU play for the national championship at the end of the season, I don't see any good reason to recognize the winner as the best team in the country. In my opinion, if the national championship game does not include West Virginia or LSU (both teams that lost only once, on the road, against more talented teams than any that BC and Ohio State play all year), or if it includes any team other than those two, Oregon, or Ohio State, I won't bother to watch. But Ohio State, with powerful wins over Akron and Kent State (which is better known for a shooting than for football), has played well this season.

The Matrix:
Ohio State by 23 (94% chance of winning), 70% chance against the spread (-15.5)

Game 5. Texas A&M @ Oklahoma
It's a bad week for college football and I could only find four games of real importance, so I picked the fifth game as a homer. Stat of the game: the Matrix sees Oklahoma holding the Aggies to 3 yards per carry. I'd like to console myself by saying that A&M was only one stupid call away from beating Oklahoma last year, but the coach who made that stupid call (kicking the field goal) made the same call last week against Kansas and seems, in fact, to be perfecting the art of stupid calls.

The Matrix:
Oklahoma by 20 (92% chance of winning), 47% chance against the spread (-21)
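The post doesn't say how the Matrix turns a predicted margin into win and cover probabilities. One common approach, assumed here and not necessarily what the Matrix does, treats the actual margin as normally distributed around the predicted margin. The 14-point standard deviation below is a hypothetical value. With it, LSU by 13 comes out at about a 65% chance of covering -7.5, and Oklahoma by 20 at about 47% against -21, which is close to the figures above.

```python
import math

SIGMA = 14.0  # hypothetical spread of real outcomes around the predicted margin, in points


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(predicted_margin, sigma=SIGMA):
    """Chance that the team predicted to win by `predicted_margin` actually wins."""
    return normal_cdf(predicted_margin / sigma)


def cover_probability(predicted_margin, spread, sigma=SIGMA):
    """Chance of covering. `spread` is the line for the same team, e.g. -7.5 when it
    is a 7.5-point favorite; the team covers when its actual margin beats -spread."""
    return normal_cdf((predicted_margin + spread) / sigma)


lsu_cover = cover_probability(13, -7.5)        # LSU by 13 against -7.5
osu_win = win_probability(23)                  # Ohio State by 23
oklahoma_cover = cover_probability(20, -21)    # Oklahoma by 20 against -21
```

A single fixed standard deviation is the simplest choice; a more careful version could let it vary with the teams involved or the size of the spread.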

The most interesting game this week is Navy at Notre Dame. The Midshipmen haven't pulled out a win in the series in a little less than half a century (since 1963, I believe), but this year they get to play the JV. Navy is favored by 3.5 and the Matrix gives them less than a point, but my gut tells me it won't be that close. Navy's running attack is good enough to put up points against anyone, and Notre Dame's offense is bad enough to stop itself on an empty field.

Picks of the week:
I'm not a gambler, and I don't personally recommend it, but I do like outsmarting the folks in Vegas, so I present three games where the Matrix believes Vegas is off the mark.

Illinois is favored by 12, and still the Matrix gives them a 72% chance of covering at Minnesota. Not only do the Gophers have an appallingly bad defense, but their offense isn't half as good as the Big 10 likes to believe it is, though they did manage 21 against the mighty Bison of North Dakota State.

UTEP is favored by 7 at Rice, but they'll win by more than 15 (70% chance of covering). Even Texas, which has struggled against almost everyone this season (including powerhouses like Arkansas State and UCF), blew out Rice.

Iowa State has played tougher in recent weeks against Missouri and, especially, Oklahoma, but they are still one of the worst teams in FBS. Kansas State will cover the 14-point spread and win by 30+ (90% chance against the spread). Need I mention that Iowa State is the only other team against which Texas has looked competent?

The rest of the picks are available as a CSV file and a sortable table. Rankings based on the Matrix will be released starting next week.


Monday, October 29, 2007

Why Some Teams are Good, Part 2 - The Importance of Population

Obviously, a team has a better chance of landing a recruit if he lives nearby (or, in the case of Joe McKnight, they might be wishing they had stayed closer to home). In this post I provide some evidence for a claim I made in part 1: that a growing population enlarges the talent pool and, therefore, leads to better football teams.

I picked 8 states more or less at random. I tried to include states from a variety of regions, of a variety of sizes, and with a variety of population trends. I have included both Nebraska and Oklahoma, and, honestly, I don't know why.

Ratings come from Soren Sorenson, whom you will find listed in the Statistics Hall of Fame. I have added 5000 to all scores so that they are all positive (Sorenson's system ranges from -4000 to +4000, give or take a thousand). Population data are drawn from the Census. Census data are collected every ten years, and I have used my own estimates to fill in the gaps.
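The post doesn't say how the between-census estimates were made. Straight-line interpolation between the two bracketing census years is one simple option; the sketch below assumes that, clamps years outside the census range to the nearest census, and uses hypothetical round numbers rather than real Census data.

```python
from bisect import bisect_right


def estimate_population(census, year):
    """Straight-line estimate between the census years that bracket `year`.

    census: dict mapping census year (e.g. 1950, 1960, ...) to population.
    """
    years = sorted(census)
    if year <= years[0]:
        return census[years[0]]
    if year >= years[-1]:
        return census[years[-1]]
    i = bisect_right(years, year)  # years[i - 1] <= year < years[i]
    y0, y1 = years[i - 1], years[i]
    p0, p1 = census[y0], census[y1]
    return p0 + (p1 - p0) * (year - y0) / (y1 - y0)


hypothetical_census = {1950: 1_000_000, 1960: 1_300_000}
estimate_1953 = estimate_population(hypothetical_census, 1953)  # 30% of the way to 1960
```

A smoother curve (for example, constant percentage growth within each decade) would be a reasonable alternative for fast-growing states like Florida and Arizona.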

I have looked at states as a whole, adding together the ratings of all teams in each state, because teams in the same state recruit from the same talent pool. For now, I am ignoring population growth in the surrounding region (e.g. Georgia benefits from population growth in Florida) and characteristics of the population (e.g. old people in Arizona don't play football), but someday I will look at those issues in more detail.

So, we begin in 1950.

The 8 states are Nebraska and Oklahoma, which I already mentioned, plus Florida, Arizona, New York, Indiana, North Carolina and Alabama. The graphic on the left shows the teams as they were ranked in 1950, color-coded by state. Florida State, UCF, USF, and Buffalo did not have Division I programs at the time (or, in some cases, did not have a football team, or had just started admitting men to the school).

This chart is important because, from here on, I will focus on each state's values indexed to 1950, so every indexed value refers back to this starting point. For example, 1950 was a good year for Oklahoma and Army (perhaps the two best teams in the country), so Oklahoma and New York start from a high baseline. This will be important to keep in mind.

The next chart demonstrates an important principle as well. It compares each state's share of all the rating points available (with its teams' scores added together) against its share of the US population. So New York, despite Army's success, was underperforming. Anyone who has been to a high school football game in both Dallas and Rochester knows why. It shouldn't surprise anyone that Oklahoma performed the best given its population size. Alabama faced a unique challenge in segregation: it would be another 20 years before Sam "Bam" Cunningham would convince Bear Bryant to integrate, allowing Alabama to dip much deeper into its talent pool.
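Here is a minimal sketch of the comparison this chart makes, assuming the share of points is taken over all teams nationally and the population share is against the US total. The +5000 offset comes from the post; the team names, ratings, states, and populations in the example are hypothetical.

```python
OFFSET = 5000  # from the post: added to every rating so all scores are positive


def state_points(team_ratings, team_state):
    """Add up the shifted ratings of every team in each state."""
    totals = {}
    for team, rating in team_ratings.items():
        state = team_state[team]
        totals[state] = totals.get(state, 0) + rating + OFFSET
    return totals


def performance_ratio(points, state_population, us_population):
    """Share of all rating points divided by share of the US population.

    Above 1: the state outperforms its population. Below 1: it underperforms.
    Assumes `points` covers every team in the country, so its sum is the national total.
    """
    total_points = sum(points.values())
    return {
        state: (points[state] / total_points) / (state_population[state] / us_population)
        for state in points
    }


ratings = {"Team A": 1000, "Team B": -1000, "Team C": 0}
states = {"Team A": "X", "Team B": "X", "Team C": "Y"}
points = state_points(ratings, states)                        # X: 10000, Y: 5000
ratios = performance_ratio(points, {"X": 1.0, "Y": 2.0}, 3.0)  # X: 2.0, Y: 0.5
```

In this toy example state X holds two thirds of the points with one third of the population, so its ratio is 2, the kind of overperformance the post attributes to Oklahoma.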

The population of most states grew over the next 50 years, but some grew much faster than others. Florida and Arizona are good examples of states whose populations blew up, while New York's stagnated.

In the following charts, I present data on each state's performance and population over the five decades from 1950 to 2000. The black line is the state's performance. It is a running four-year average, which I use on the assumption that players from a cohort will play for a team for four years. The red line is the indexed population, where 100 equals the population in 1950. The blue line is based on the same principle but uses the state's percent of the US population, so if a state's population is growing, but more slowly than the entire US, the blue line will fall while the red line rises. The red line, therefore, represents the absolute talent pool and the blue line the relative talent pool, and because teams are good relative to each other, we should focus on the blue line.
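The three lines can be sketched as follows, with two assumptions the post doesn't spell out: the running four-year average uses however many years are available at the start of the series, and each index is simply 100 times the value divided by its 1950 value. The example uses hypothetical numbers for a state that grows 10% while the US grows 20%, so the red line rises while the blue line falls.

```python
def running_average(values, window=4):
    """Average of this year and up to window-1 previous years (fewer at the start)."""
    averages = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        averages.append(sum(recent) / len(recent))
    return averages


def index_to_first(series):
    """Rescale a series so its first value (1950 in the post) equals 100."""
    base = series[0]
    return [100 * value / base for value in series]


def population_lines(state_population, us_population):
    """Red line: indexed state population. Blue line: indexed share of the US population."""
    red = index_to_first(state_population)
    share = [s / u for s, u in zip(state_population, us_population)]
    blue = index_to_first(share)
    return red, blue


black = running_average([1, 2, 3, 4, 5])               # [1.0, 1.5, 2.0, 2.5, 3.5]
red, blue = population_lines([100, 110], [1000, 1200])  # red rises to 110, blue falls to about 91.7
```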


Nebraska had some kicking teams in the '70s and the mid to late '90s, which shows up in its chart. Nebraska's share of the US population was actually going down, but Nebraska kept spitting out world-class teams. It makes me think that Osborne may have been a much better coach than we give him credit for. Arizona's performance isn't improving with its rapidly growing population. I think two things are at issue. First, Arizona doesn't have as strong a football culture as the rest of the South and, second, Arizona's programs might be experiencing a bit of a lag.

I was a little surprised to see how well Indiana fits the pattern. Notre Dame has a unique ability to recruit nationally and should be able to overcome general demographic shifts. Notre Dame claims its challenges are rooted in high academic standards, so I guess I'll have to look at that claim another day.

Alabama has generally outperformed its population since the '50s but, like all the others, its performance is generally falling with the decline in its relative population size. The effect of integration on performance is still a little unclear, but it is something I will definitely look at more closely in the future.

But the overall results from this little experiment are clear: population trends in a region definitely affect the performance of that region's teams. The black lines tend to go wherever the blue lines are going. The charts also show that we can't ignore culture, quality of coaching, and the power of programs to attract players from long distances.

Sunday, October 28, 2007

Week 9 Rankings


Here are rankings for week 9 from some of the major polls. I should have my own rankings in the next week or two for comparison.

The image on the left contains rankings from a number of different polls. Below are rankings from the major polls.




Team            Mine  BCS  Coaches  AP   Sagarin  Massey
Ohio St         -     1    1        1    3        1
Boston College  -     2    2        2    10       7
LSU             -     3    3        3    1        2
Arizona St      -     4    7        7    4        6
Oregon          -     5    5        5    6        4
Oklahoma        -     6    4        4    9        11
West Virginia   -     7    6        6    7        5
Virginia Tech   -     8    9        8    20       10
Kansas          -     9    10       12   2        3
South Florida   -     10   12       11   5        9
Florida         -     11   11       9    8        8
USC             -     12   8        9    24       20
Missouri        -     13   13       13   13       12
Kentucky        -     14   15       14   12       14
Virginia        -     15   18       21   22       18
South Carolina  -     16   17       15   14       15
Hawaii          -     17   14       16   48       46
Georgia         -     18   19       20   19       16
Texas           -     19   16       17   31       34
Michigan        -     20   21       19   21       21
California      -     21   20       18   17       19
Auburn          -     22   23       23   11       13
Connecticut     -     23   -        -    16       23
Alabama         -     24   24       22   25       17
Penn St         -     25   22       24   29       24
Wake Forest     -     26   -        -    28       31
UCLA            -     27   -        -    15       25
Rutgers         -     28   -        25   23       35
Boise St        -     29   -        -    45       49
Purdue          -     30   -        -    32       38
Texas A&M       -     31   -        -    34       42
Georgia Tech    -     32   -        -    26       26
Tennessee       -     33   -        -    35       27
Oklahoma St     -     34   -        -    30       33
Maryland        -     35   -        -    43       44
Clemson         -     36   -        -    38       32
Wisconsin       -     37   25       -    53       50
Air Force       -     38   -        -    50       48
Illinois        -     39   -        -    44       43
Kansas St       -     40   -        -    18       22
BYU             -     41   -        -    40       36
Texas Tech      -     42   -        -    33       30
Oregon St       -     43   -        -    36       40
Michigan St     -     44   -        -    41       47
Florida St      -     45   -        -    42       39
Cincinnati      -     46   -        -    27       29
Miami FL        -     47   -        -    47       41
Vanderbilt      -     48   -        -    46       37
Colorado        -     49   -        -    39       51
Navy            -     50   -        -    57       62
Nebraska        -     51   -        -    61       69
Troy            -     52   -        -    63       45
Arkansas        -     53   -        -    37       28
New Mexico      -     54   -        -    59       56
Fresno St       -     55   -        -    58       52
Utah            -     56   -        -    52       53
Mississippi St  -     57   -        -    60       55
Northwestern    -     58   -        -    68       63
Wyoming         -     59   -        -    67       71
Washington      -     60   -        -    49       57
Bowling Green   -     61   -        -    69       72
Stanford        -     62   -        -    56       59
Indiana         -     63   -        -    55       58
East Carolina   -     64   -        -    71       65
Ball St         -     65   -        -    66       68
FL Atlantic     -     66   -        -    75       70
UCF             -     67   -        -    64       64
C Michigan      -     68   -        -    74       73
Pittsburgh      -     69   -        -    62       66
Louisville      -     70   -        -    51       54
North Carolina  -     71   -        -    54       61
Houston         -     72   -        -    65       60
Tulsa           -     73   -        -    81       74
TCU             -     74   -        -    70       67
Miami OH        -     75   -        -    80       79
UTEP            -     76   -        -    86       85
Akron           -     77   -        -    82       88
Notre Dame      -     78   -        -    78       77
W Kentucky      -     79   -        -    84       87
Iowa            -     80   -        -    77       80
Duke            -     81   -        -    72       84
NC State        -     82   -        -    73       76
Army            -     83   -        -    95       102
Mississippi     -     84   -        -    76       75
Kent            -     85   -        -    98       98
Southern Miss   -     86   -        -    93       82
Baylor          -     87   -        -    96       95
Nevada          -     88   -        -    85       86
New Mexico St   -     89   -        -    101      96
Temple          -     90   -        -    99       100
Syracuse        -     91   -        -    97       90
Washington St   -     92   -        -    91       81
Middle Tenn St  -     93   -        -    83       83
W Michigan      -     94   -        -    89       92
Buffalo         -     95   -        -    94       101
Arizona         -     96   -        -    79       78
San Jose St     -     97   -        -    102      93
Louisiana Tech  -     98   -        -    92       99
Toledo          -     99   -        -    105      111
Minnesota       -     100  -        -    87       103
San Diego St    -     101  -        -    88       89
Arkansas St     -     102  -        -    103      97
UNLV            -     103  -        -    100      94
Ohio            -     104  -        -    106      109
UAB             -     105  -        -    108      105
Memphis         -     106  -        -    107      104
LA Monroe       -     107  -        -    110      106
Iowa St         -     108  -        -    104      110
Colorado St     -     109  -        -    90       91
Tulane          -     110  -        -    111      107
E Michigan      -     111  -        -    109      108
Rice            -     112  -        -    115      112
North Texas     -     113  -        -    120      116
Florida Intl    -     114  -        -    118      120
LA Lafayette    -     115  -        -    116      117
SMU             -     116  -        -    119      119
N Illinois      -     117  -        -    114      114
Marshall        -     118  -        -    112      113
Idaho           -     119  -        -    117      118
Utah St         -     120  -        -    113      115

Thursday, October 25, 2007

Week 9 Picks and Prediction Model (PM) 3.0

This week I will start with picks and then describe the prediction model I used to generate them. I'm also getting a big head, so I thought someone might be interested in my own picks, and that way we can see if I'm smarter than my own computer. The prediction model (PM 3.0 this week) and I will go head to head on 5 games a week, picking winners and against the spread, and then I will also post PM 3.0's picks for the rest of Division I-A (aka FBS).

If you are interested in spreads, covers.com is the place to go. I have included the handicap for the home team in parentheses.

Game 1. Ohio State @ Penn State (+4)
I don't think this game will be as close as it looks like it should be. Sure, it's in Happy Valley, and, sure, Ohio State and Penn State look very similar statistically, except in one very important area: the win/loss record. Watch Morelli crack like Woodson at SC, and OSU will win this walking away.

Me:
To Win: Ohio State
Against the Spread: Ohio State

PM 3.0:
To Win: Ohio State
Against the Spread: Ohio State

Game 2. West Virginia @ Rutgers (+6.5)
Again, the better team is on the road. Pat White will be healthy (or as healthy as he ever is) and West Virginia will be flying around the field again. It is important in this game to consider matchups. South Florida beat WV (at home) because they had the speed on defense to contain Slaton and White. Rutgers beat South Florida (at home) because that speed didn't translate well when Rice was slamming it down their throats. Rutgers, so far, has been a flat, uninspiring team with the exception of one Thursday night. West Virginia will break it open in the second half and score too many points for Rice to keep up.

Me:
To Win: West Virginia
Against the Spread: West Virginia

PM 3.0:
To Win: Rutgers
Against the Spread: Rutgers

Game 3. South Florida @ Connecticut (+4.5)
I have included this game only because I can. Who would have predicted at the beginning of the year that this game would pit two ranked, one-loss teams against each other with Big East title hopes alive? But seriously, I can't bring myself to believe that UConn has a good team. When has Connecticut ever produced a good athlete? And I'm not the only one who thinks this.[1] South Florida is definitely the better team, but cold weather and inexperience may slow them down. They'll still win easily.

Me:
To Win: South Florida
Against the Spread: South Florida

PM 3.0:
To Win: South Florida
Against the Spread: South Florida

Game 4. USC @ Oregon (-3)
I was worried that Mark Sanchez off the bench might give SC the spark they needed to be a good football team again. Fortunately, he's not everything he was supposed to be. It looks like Booty's finger will be well enough, and he will lead his team to another mediocre performance. A note on USC: their big victories are against Nebraska (cupcake) and Notre Dame (wedding cake). They lost to Stanford (cheese puff) and almost lost to Arizona (lost little child). Oregon's beatdown of Michigan was impressive, but that was a Michigan team still recovering from the week 1 train wreck. Both teams are talented, but it's PAC-10 talent: either could win by 30 or flake out and lose to my high school team. I take USC, because they have more raw talent to start with.

Me:
To Win: USC
Against the Spread: USC

PM 3.0:
To Win: USC
Against the Spread: USC

Game 5a. Boston College @ Virginia Tech (-3)
See Game 4. Two teams that have not been all that impressive but, to their credit, have been winning a lot of games. I'm taking Virginia Tech on Thursday to knock off the first top 10 team this weekend, but it will be close.

Me:
To Win: Virginia Tech
Against the Spread: Boston College

PM 3.0:
To Win: Boston College
Against the Spread: Boston College

Game 5b. Kansas @ Texas A&M (+2.5)
I had to include this game for a number of reasons. First, this might be Kansas's only weekend in the top 10, so we must take a moment to recognize it. Second, I would like to note that Kansas is actually very good and undefeated for a reason (the same reason that BC is undefeated, but without the same level of respect). After Saturday, Kansas will have two cupcakes (Iowa State and Nebraska) and a road game in Stillwater before the final matchup against Missouri. Finally, I have included this game so I can point out that, while they are getting no love from the national media, the Aggies have lost only twice and are tied for first in the Big 12 South. The outcome of the game depends on the Aggie passing game. If Kansas can put 8 in the box all night, they win and cover the spread; if not, and A&M burns the secondary a time or two at Kyle Field, it could be very interesting. One last quick note on Kansas: they have covered the spread in each of the last five weeks.

Me:
To Win: I abstain
Against the Spread: I abstain

PM 3.0:
To Win: Kansas
Against the Spread: Kansas

The Rest:
The full list of picks includes each team's probability of winning and of beating the spread, along with projected yards. Obviously, if a team has a better than 50% chance of winning, then it is "favored".

PM 3.0

Prediction Model 3.0 is my first model to account for matchups. The method I have chosen is too simple, but I'm building by trial and error for now. The basic idea is that teams have relatively consistent run-to-pass ratios. I use time of possession and plays per second to estimate how many plays a team will have in a game (adjusted for how long its opponent will have the ball) and then estimate the number of run and pass plays each team will run. Using each team's average yards per run, completion percentage, and average yards per completion, adjusted for the other team's defensive strengths, I get an estimate of the total yards a team should gain. I then use the basic rating system from PM 2.11 and give a bonus to the team that will generate more yards.
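The post lists the inputs (time of possession, plays per second, run/pass mix, yards per run, completion percentage, yards per completion, and the opponent's defense) but not the exact formulas, so this is only a sketch of how they might combine. Two pieces are assumptions: the game clock is split in proportion to each team's usual time of possession, and each defense is described by hypothetical multipliers (what it allows divided by the national average). All class and field names are made up for illustration.

```python
from dataclasses import dataclass

GAME_SECONDS = 60 * 60  # regulation game clock


@dataclass
class Offense:
    seconds_of_possession: float  # average time of possession per game
    plays_per_second: float
    run_share: float              # fraction of plays that are runs
    yards_per_run: float
    completion_pct: float         # 0..1
    yards_per_completion: float


@dataclass
class Defense:
    # Hypothetical multipliers: what this defense allows divided by the national average
    run_yards_factor: float
    completion_factor: float
    completion_yards_factor: float


def projected_plays(offense, opponent):
    """Split the game clock in proportion to each team's usual possession time (assumption)."""
    share = offense.seconds_of_possession / (
        offense.seconds_of_possession + opponent.seconds_of_possession
    )
    return share * GAME_SECONDS * offense.plays_per_second


def projected_yards(offense, opponent, opponent_defense):
    """Total yards from the projected run and pass plays, adjusted for the opposing defense."""
    plays = projected_plays(offense, opponent)
    runs = plays * offense.run_share
    passes = plays - runs
    run_yards = runs * offense.yards_per_run * opponent_defense.run_yards_factor
    completions = passes * offense.completion_pct * opponent_defense.completion_factor
    pass_yards = (completions * offense.yards_per_completion
                  * opponent_defense.completion_yards_factor)
    return run_yards + pass_yards


team = Offense(1800, 0.04, 0.5, 4.0, 0.6, 12.0)
average_defense = Defense(1.0, 1.0, 1.0)
stout_run_defense = Defense(0.75, 1.0, 1.0)
yards_vs_average = projected_yards(team, team, average_defense)       # about 403 yards
yards_vs_stout = projected_yards(team, team, stout_run_defense)       # about 367 yards
```

Running `projected_yards` for both sides of a game gives the two yardage figures that decide which team gets the bonus.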

In these circumstances, the only real variables I have to decide on are the adjustments I will be using. I have decided to use a k of 3/sqrt(1+t), where t is the week in which the game took place. Because k shrinks each week, the ratings should stabilize as the season progresses, which I believe mirrors reality.
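For a sense of how quickly that step size shrinks, here is the schedule from the post tabulated for a few weeks. How k enters the PM 2.11 rating update isn't described, so this only computes k itself.

```python
import math


def k_factor(week):
    """Step size from the post: k = 3 / sqrt(1 + t), where t is the week of the game."""
    return 3.0 / math.sqrt(1 + week)


for week in (0, 3, 8, 15):
    print(week, k_factor(week))
# 0 3.0
# 3 1.5
# 8 1.0
# 15 0.75
```

The step halves by week 3 and is a third of its starting value by week 8, so late-season results move the ratings far less than early ones.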

The rating adjustment is Rating + 10*(team yards/opponent yards). I chose ten rather at random, but in practice it means that in the most extreme cases a team may have 5 to 10 points added to its estimated margin of victory.
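Here is the adjustment as a small function, plus the net effect on the predicted margin. Applying the adjustment to both teams is an assumption; under that reading the margin moves by 10(r - 1/r), where r is the yardage ratio, which gives about 5.3 points at a 1.3:1 yardage edge and 9.75 at 1.6:1, in line with the 5 to 10 point range above.

```python
def adjusted_rating(rating, team_yards, opponent_yards, weight=10.0):
    """The post's adjustment: Rating + 10 * (team yards / opponent yards)."""
    return rating + weight * team_yards / opponent_yards


def margin_shift(team_yards, opponent_yards, weight=10.0):
    """Change in predicted margin if BOTH teams get the adjustment (an assumption)."""
    ratio = team_yards / opponent_yards
    return weight * (ratio - 1 / ratio)


modest_edge = margin_shift(390, 300)  # 1.3:1 yardage edge, about 5.3 points
big_edge = margin_shift(480, 300)     # 1.6:1 yardage edge, 9.75 points
even_game = margin_shift(300, 300)    # equal yards, no shift
```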

The problem with a prediction model that adjusts for matchups is that it cannot be used to rank teams. A rating system requires that if A>B>C, then A>C, but once we take matchups into account, it is possible to have A>B>C and still C>A, if A matches up well against B but poorly against C. So the model can't rank teams; it can only predict the winner when two teams play. I have thought up a method of getting around that, but programming it will take some time.
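The ranking problem is easy to demonstrate with three made-up teams. The base ratings and the matchup bonus below are hypothetical. Plain ratings always produce a consistent order, but a single matchup bonus (C matching up well against A) creates a cycle where A beats B, B beats C, and C beats A.

```python
from itertools import permutations

RATINGS = {"A": 10.0, "B": 8.0, "C": 6.0}  # hypothetical base ratings
MATCHUP_BONUS = {("C", "A"): 5.0}          # hypothetical: C matches up well against A


def predicted_margin(x, y, bonus):
    """Rating difference plus any matchup bonus for x against y (and minus y's against x)."""
    return RATINGS[x] - RATINGS[y] + bonus.get((x, y), 0.0) - bonus.get((y, x), 0.0)


def is_transitive(teams, bonus):
    """False if some three teams form a cycle: x beats y, y beats z, and z beats x."""
    for x, y, z in permutations(teams, 3):
        if (predicted_margin(x, y, bonus) > 0
                and predicted_margin(y, z, bonus) > 0
                and predicted_margin(z, x, bonus) > 0):
            return False
    return True


print(is_transitive(RATINGS, {}))             # True: plain ratings order cleanly
print(is_transitive(RATINGS, MATCHUP_BONUS))  # False: A > B > C, yet C > A
```

Checking triples is enough here because every pair of teams gets a predicted winner; in that situation, any inconsistency in the order shows up as a three-team cycle.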