|
Post by Chris Clement on Oct 27, 2011 15:07:56 GMT -6
Now, I know many of you have English and history teachables, and might occasionally do some social studies, but for those of us who know what all the buttons on our calculators do, does anyone apply some basic stats stuff to their season results?
I went through the 3.5 games we currently have on Hudl (I know, he's really slow to upload, it kills me sometimes. It's our first year with it.) and I exported everything into Excel. I scrapped the columns I didn't want, got rid of the defense and kicking stuff, got rid of plays that I didn't get good numbers for, intentional safeties, and got rid of turnovers (I already know those are a problem, and I didn't just want to put zeroes in). I ended up with 226 plays that I had Gn/Ls values for.
I went in thinking we accumulated a ton of yards, but we had trouble sustaining drives because we had a lot of negative plays, and our passing game was our QB hucking it deep to his best friend (another issue), while our RB wouldn't run between the tackles if he could help it.
First I ran the mean (average), and it came out at 5.6. Fantastic, right? But the median came out at 2.5, so half the time we get 2 yards or fewer. Running some counts, we had 63 plays (28%) for 0 yards, mostly incompletions, and 96 plays (42%) for 0 or a loss. Only 43% of plays went for 4 or more.
The real kick in the pants was the standard deviation: 11.4. The standard deviation was TWICE the mean. In most circles, that's through the roof, but I don't know if it is for football; I have no basis for comparison.
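For anyone who wants to run the same summary outside Excel, here is a minimal Python sketch. The `summarize` function and the sample list are made up for illustration; they are not the team's data or the exact Excel steps described above.

```python
import statistics

def summarize(gains):
    """Return basic summary statistics for a list of per-play yardage."""
    n = len(gains)
    return {
        "n": n,
        "mean": statistics.mean(gains),
        "median": statistics.median(gains),
        "stdev": statistics.stdev(gains),  # sample standard deviation
        "share_zero_or_loss": sum(1 for g in gains if g <= 0) / n,
        "share_4_plus": sum(1 for g in gains if g >= 4) / n,
    }

# Hypothetical gain/loss values, one per play (NOT real game data).
sample_gains = [0, 0, -2, 3, 5, 0, 12, 2, 45, -1, 4, 0]
for name, value in summarize(sample_gains).items():
    print(name, value)
```

Comparing `mean` against `median` and `share_zero_or_loss` side by side is the quickest way to see whether a healthy average is being propped up by a few long plays.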
I made a histogram, but it was too noisy, especially with the artificial peak at 0 from the incompletions, which skewed the scale of the axes. There was obviously a long right tail, the product of our deep passes and our dancing RB.
Has anyone done something similar that they would be willing to share? I want to know how far off we are, and I have a thesis that average yards per play is a somewhat overrated statistic if we don't consider the central tendency, because you need first downs more than just bulk yardage. General thoughts on the matter would also be great.
|
|
|
Post by coachcb on Oct 27, 2011 15:20:16 GMT -6
You are going to have big standard deviations when doing this kind of statistical analysis on football because of outliers. Think about it, most teams bang along, 4-5 yards a pop and break a few for a big gain. Now, a gain of ten yards doesn't seem like much but it's double the mean and four times the size of your median in this situation. Now, think about the effect of a 20+ yard gain. You're going to have your negative plays in football but it's pretty rare for them to be greater than a 5-10 yard loss. Your data is always going to be skewed and you'll always have outliers.
This is where it's good to shave the top and bottom 5% of your data off. Or, maybe leave the bottom 5% but lop off the top 5%.
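A rough sketch of that trimming step in Python, assuming the gains are just a list of yardage values. The `trim` helper and its parameter names are hypothetical, and the sample numbers are invented.

```python
import statistics

def trim(gains, lower_pct=0, upper_pct=5):
    """Drop the lowest `lower_pct` and highest `upper_pct` percent of plays."""
    ordered = sorted(gains)
    n = len(ordered)
    drop_low = n * lower_pct // 100   # integer math avoids float rounding
    drop_high = n * upper_pct // 100
    return ordered[drop_low:n - drop_high]

# Hypothetical gains (NOT real game data): 20 plays with one 60-yard outlier.
sample = [-3, 0, 0, 0, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9, 12, 25, 60]

top_trimmed = trim(sample)                           # lop off only the top 5%
both_trimmed = trim(sample, lower_pct=5, upper_pct=5)  # shave both ends

print(statistics.mean(sample), statistics.mean(top_trimmed))
```

With small samples the integer cut can round down to zero plays, so it is worth checking how many plays were actually dropped.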
|
|
|
Post by cqmiller on Oct 27, 2011 15:35:47 GMT -6
I'm with CB...
Any of those "explosive plays" (For us runs longer than 15 and passes longer than 20) need to be dropped. If you are looking for a true down-by-down estimate you cannot include any of the plays that break for that long. It makes all of your averages and stats completely skewed.
Think about the good college/NFL teams that run zone. You will see a RB with 35 carries for 70 yards (2 yd avg.) at the beginning of the 4th quarter, and if you watch the game you are thinking that the running game stinks. Then you flip the channel and 5 minutes later you see 40 carries for 175 yards and 2 TD's. All because of 1 really big gain late in the 4th quarter. Completely makes the stats lie.
I equate it to what happened to Oregon vs. LSU this season. They have HAMMERED most of the teams they have played because those quick RB's will break a 90 yd run untouched or a 75 yd run untouched. Against LSU, they never got those HUGE plays and they looked inept on the ground as a result. Anyone who just looks at the numbers after the game will not have any idea as to what actually happened. A 4 TD win could have been from just 4 breakaway runs.
Look at the Broncos last week... 58 minutes of TERRIBLE offense and 2 minutes of okay offense gives them a win.
|
|
|
Post by Chris Clement on Oct 27, 2011 15:40:42 GMT -6
Well, after getting rid of the top 5% (my long tail), I had a new mean of 3.8, and a SD of 6.9. Not a whole lot better. Median dropped to 2, but that was a given. I still have a lot of plays in the 20-something range, because that's where those ridiculous heaves into triple coverage land.
I did this for our defense (n=181), and for the raw data I got Mean = 7.3, Median = 4 (this was scary), SD = 12.67.
This is a much tighter distribution, but still not great in absolute terms. With the top 5% gone, we get Mean = 5.1, Median = 3, SD = 8.00.
Which isn't much different. Of course, our defense gave up long drives as well as big plays, so they aren't a great source of data.
|
|
|
Post by planck on Oct 27, 2011 15:53:44 GMT -6
Bear in mind that this will always have a positive long tail; that is to say, you're much more likely to gain 1, 2, 20, or 99 yards on a play than to lose 1, 2, 20, or 99 yards. It's going to be an asymmetrical distribution (as you seem to have discovered). I haven't had the time, but I would wager a guess that yardage gains are some kind of beta distribution; if you're analyzing them as though they were a normal distribution (i.e. using z scores or something), you'd get really weird results.
You might gain more insight by constructing a histogram rather than looking at the summative statistics. At least then you'd have some idea of the relative prevalence of the different yardage bins or could calculate some useful percentiles.
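As a sketch of the binning and percentile idea, assuming 5-yard bins and invented data (the function names are hypothetical). It prints a crude text histogram and the quartile cut points using only the standard library.

```python
from collections import Counter
import statistics

def bin_counts(gains, width=5):
    """Count plays per yardage bin; each bin is keyed by its lower edge."""
    return Counter((g // width) * width for g in gains)

def text_histogram(gains, width=5):
    """Render one row of '#' marks per non-empty bin, lowest bin first."""
    counts = bin_counts(gains, width)
    rows = []
    for low in sorted(counts):
        rows.append(f"{low:>4} to {low + width - 1:>3}: {'#' * counts[low]}")
    return "\n".join(rows)

# Hypothetical gains (NOT real game data).
sample = [-3, 0, 0, 0, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9, 12, 25, 60]

print(text_histogram(sample))
print(statistics.quantiles(sample, n=4))  # 25th, 50th, 75th percentile cut points
```

Floor division puts a 3-yard loss in the "-5 to -1" bin, so losses and gains never share a bin.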
Anyway, if you don't know the distribution then the summative stats may be either misleading or flat out useless.
|
|
|
Post by tekart on Oct 27, 2011 16:24:20 GMT -6
I read this post and then my head exploded!
|
|
|
Post by mattyg2787 on Oct 27, 2011 16:36:42 GMT -6
My head hurts
|
|
|
Post by spreadattack on Oct 27, 2011 16:43:19 GMT -6
I tried to work with standard deviations versus averages a few years ago and had mixed results for the reasons stated above -- a big standard deviation isn't *necessarily* bad. If your plays all go for 8, 8, 8, 80, 8, 80 -- you wouldn't complain, whereas if your plays all go for 0, 0, 1, -1, 2, 0, you have a pretty tight standard deviation.
I do think looking at the skewness and kurtosis of the distributions is useful to get a sense of how asymmetrical you are -- indeed, you might want it to be as asymmetrical as possible, as you'd have lots of big plays and no negatives -- and to get a sense of how consistent your results are when plotted.
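A small sketch of those two measures, computed from the plain population-moment formulas (libraries may apply slightly different bias corrections). The function name is made up; the two lists are the play sequences given above.

```python
import statistics

def skew_and_excess_kurtosis(values):
    """Population skewness and excess kurtosis from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skew, excess_kurtosis

boom_or_steady = [8, 8, 8, 80, 8, 80]  # wide spread, nothing to complain about
going_nowhere = [0, 0, 1, -1, 2, 0]    # tight spread, bad offense

for plays in (boom_or_steady, going_nowhere):
    print(statistics.pstdev(plays), skew_and_excess_kurtosis(plays))
```

A positive skew means the long tail is on the gain side, which is what an offense wants.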
I think there's a lot of bright future with this but I've sort of stalled on how to move forward.
|
|
|
Post by Chris Clement on Oct 27, 2011 16:47:45 GMT -6
I bet if you did just running plays, it would look like a beta distribution with a high beta value, but I don't have a high enough n currently to try that. I cut out incompletions, and the graph was a lot tidier, albeit kind of useless, but there were still a lot of 25-yard jump balls. Incompletions are inherently going to give you a big-time kurtosis problem (spikes in the graph).
One point, you said we are more likely to gain 1 or 2 than lose 1 or 2. You have not seen this team. We would work very hard to change your mind.
|
|
|
Post by pmeisel on Oct 27, 2011 20:12:53 GMT -6
Guys, I have studied advanced statistics, and I am not quite sure how I would use a probability distribution to study this. The behaviors look like a beta, gamma, or chi-square could fit, but I am not sure how that would be useful.
A better way to approach the problem might be to set some football-related criteria for play results, and then apply yes-or-no, Bernoulli-type statistics to them. Example: play did or did not make 4 yards (our arbitrary target for success for staying on track to make first downs). Play did or did not make 10 yards (our arbitrary target for 2nd/3rd and long plays). Play did or did not make 25 yards (our arbitrary target for explosive plays).
By stringing together series of did/didn't probabilities, you can model the likelihood of successful scoring drives, and the probable yardage gained before you are forced to punt.
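One way the did/didn't idea could be sketched in Python. This is a deliberately crude model, not the analysis described above: it assumes every play is an independent trial with the same success probability, and that a series converts only if all three plays before the punt hit the 4-yard target. The function names are hypothetical; the 43% figure is the 4-or-more rate quoted earlier in the thread.

```python
from math import comb

def prob_at_least(successes, plays, p):
    """Binomial tail: chance of at least `successes` hits in `plays` independent tries."""
    return sum(
        comb(plays, k) * p ** k * (1 - p) ** (plays - k)
        for k in range(successes, plays + 1)
    )

def prob_consecutive_conversions(series, p_convert):
    """Chance of converting `series` sets of downs in a row, assuming independence."""
    return p_convert ** series

# Assumption: 43% of plays reach the 4-yard target, and three such plays
# in a row are needed to move the chains.
p_series = prob_at_least(3, 3, 0.43)
p_drive = prob_consecutive_conversions(4, p_series)  # four first downs in a row
print(p_series, p_drive)
```

A real model would also credit the series whenever a single play gains 10 or more, which is where the separate 10-yard and 25-yard targets come in.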
I did a little analysis like this for fun, probably 35 years ago, but didn't keep it. Back then, of course, most coaches ridiculed analytical thinking (math ain't futbaw).... My interest was the wisdom of Woody's ball control philosophy (and Chuck Knox, and some others) vs. the passing game (70s Chargers and Raiders, for example).
My conclusion at the time was that zero and loss plays require an offense to average much higher gains on its successful plays, so if you pass much, then short gains on runs (dive plays, 3 yards and dust) will not fit into a successful pattern in your offense. You can be Woody, or Bill Walsh -- Chuck Knox, or Don Coryell -- but you can't be both at the same time.
|
|
|
Post by Chris Clement on Oct 27, 2011 21:16:32 GMT -6
Well, that certainly passes the sniff test. But can we determine the combinations of factors that make a "good" offense, and more importantly, can we use that to make an offense better?
I don't know if we can simply apply Boolean values of success or failure; a 9 yard gain on 3rd and 15 is worse than a 2 yard gain on 3rd and 1.
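One possible way around that objection is to make the success test depend on down and distance. The sketch below is only an illustration: the function name, the 50% early-down share, and treating 3rd down as the conversion down are all assumptions, not anything proposed in the thread.

```python
def is_success(down, distance, gain, conversion_down=3, early_share=0.5):
    """Situational yes/no: did the play do its job for this down and distance?"""
    if gain >= distance:
        return True   # moved the chains, always a success
    if down >= conversion_down:
        return False  # on the conversion down, only a first down counts
    # Earlier downs: require a share of the remaining distance (assumed 50%).
    return gain >= early_share * distance

print(is_success(3, 15, 9))  # 9 yards on 3rd and 15
print(is_success(3, 1, 2))   # 2 yards on 3rd and 1
```

The result is still a Boolean, so it can feed the same did/didn't counts, but the bar moves with the situation.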
|
|
|
Post by calkayne on Oct 27, 2011 22:54:30 GMT -6
I looked at the statistics of our games last season and noticed that I could not relate passing downs to running downs.
The scale of the yards gained on a long completion versus the many incompletions doesn't allow for an accurate description of the efficiency of an Offence when it is analysed in a data pool that includes the running game. If you were to remove the upper and lower gains/losses, then you are also reducing your data pool to such an extent that you are artificially making the numbers fit.
However, if you allocate the plays to zones as per your base coverage, you can then evaluate the efficiency of the game/season with some accuracy. You could also do this for TFLs by Zoning the Blocking Zone to see where the weak spots are. But it is a lot of work to get a template into working order in a way that also makes sense.
As you have pointed out, this sort of analysis is great when you are looking at the entire season or the productivity of a philosophy over a number of seasons, but looking at individual games would also mean evaluating the D&D situations, not just overall efficiency.
|
|
|
Post by postcrack on Oct 27, 2011 23:39:19 GMT -6
Wow...I just read those and now I'm sober! Thanks!
|
|
|
Post by CoachCP on Oct 28, 2011 6:53:45 GMT -6
I've actually been thinking about creating an advanced situational Excel dashboard. chandoo.org/wp/ has some great tips on this. I do marketing and I recently took an Edward Tufte class on graphics, charts, and other super graphics. I'm hoping, sometime soon, to apply this to some of our statistics and see if the charts can help me see a few trends that would be hard to see when looking at raw data.
|
|
|
Post by John Knight on Oct 28, 2011 6:56:59 GMT -6
|
|
|
Post by realdawg on Oct 28, 2011 8:57:48 GMT -6
Anyone besides me wondering why dc hasn't changed his screenname to hcohio?
|
|
|
Post by Chris Clement on Oct 28, 2011 9:01:19 GMT -6
I'm not sure that NEVER punting is the best strategy, especially in Canadian football; I think punting LESS is advisable, in general. Now, if you have a 16-year-old Ray Guy, punt more; if you have nobody who can kick it 10 yards, maybe you never kick it that season. Even for that coach, there may have been times where punting would have been advantageous, but he had probably gotten so much attention that it became a pride/ego thing.
Has anyone done more than just one or two seasons' worth to get a useful comparison? What about applying less common statistical functions, like kurtosis or variance? What n value would we need to get a useful trend? I'm thinking ~1000, which would require about 12-14 games with our no-huddle and clock rules.
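One rough way to think about the n question is the standard error of the mean, s / sqrt(n). The sketch below assumes plays behave like independent draws from one distribution, which is a generous assumption for football. The function names are hypothetical; 11.4 and 226 are the offense's figures from the first post.

```python
import math

def standard_error(stdev, n):
    """Standard error of the mean for n independent observations."""
    return stdev / math.sqrt(n)

def plays_needed(stdev, margin, z=1.96):
    """Plays needed so that z standard errors fit within +/- margin yards."""
    return math.ceil((z * stdev / margin) ** 2)

print(standard_error(11.4, 226))    # uncertainty in the mean at the current n
print(standard_error(11.4, 1000))   # uncertainty at the proposed n of ~1000
print(plays_needed(11.4, margin=0.5))
```

Because the error shrinks with the square root of n, quadrupling the number of plays only halves the uncertainty in the average.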
|
|
|
Post by coachks on Oct 28, 2011 9:14:48 GMT -6
Serious question, and this same thought applies when I read Smartfootball, FootballOutsiders, Phil Steele and anything else that jumps into the stats....
How does this help a coach? Knowing you average x ypp, your median is y, etc... How does this help coaching-wise? I understand how it's useful for predictive purposes, but how can this be transferred onto the practice field?
|
|
|
Post by planck on Oct 28, 2011 9:15:22 GMT -6
pmeisel wrote: "Guys, I have studied advanced statistics, and I am not quite sure how I would use a probability distribution to study this. The behaviors look like a beta, gamma, or chi-square could fit, but I am not sure how that would be useful. A better way to approach the problem might be to set some football-related criteria for play results, and then apply yes-or-no, Bernoulli-type statistics to them. Example: play did or did not make 4 yards (our arbitrary target for success for staying on track to make first downs). Play did or did not make 10 yards (our arbitrary target for 2nd/3rd and long plays). Play did or did not make 25 yards (our arbitrary target for explosive plays). By stringing together series of did/didn't probabilities, you can model the likelihood of successful scoring drives, and the probable yardage gained before you are forced to punt. I did a little analysis like this for fun, probably 35 years ago, but didn't keep it. Back then, of course, most coaches ridiculed analytical thinking (math ain't futbaw).... My interest was the wisdom of Woody's ball control philosophy (and Chuck Knox, and some others) vs. the passing game (70s Chargers and Raiders, for example). My conclusion at the time was that zero and loss plays require an offense to average much higher gains on its successful plays, so if you pass much, then short gains on runs (dive plays, 3 yards and dust) will not fit into a successful pattern in your offense. You can be Woody, or Bill Walsh -- Chuck Knox, or Don Coryell -- but you can't be both at the same time."
I like this idea quite a bit; the question then becomes what factors do you consider in calling a play a successful trial or not? Certainly down and distance are important - what about time on the clock, score differential, etc? It could be done, but you would need to really chart the heck out of your film.
My inner math nerd finds it interesting, but considering the investment of time and mental energy one would have to consider what would be the benefit (beyond being awesome).
|
|
|
Post by hamerhead on Oct 28, 2011 11:38:35 GMT -6
I read on here once before where somebody was talking about "Play Efficiency", which was pretty much just the % of times a play gained 4 yards, a first down, or a touchdown. I'd be tempted to eliminate "first down" from the equation. I guess if it gains 2 yards on 4th and 2 it's a good play, but if that's my threshold for success (4 yards) then shouldn't it still need 4 yards? I guess you can assume defenses play differently in those situations, thus "first down" is still an efficient play. Just throwing it out there, I may look at it this season.
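A minimal sketch of that "Play Efficiency" number, with a switch for leaving first downs out of the definition. The tuple layout, function name, and sample plays are all hypothetical.

```python
from collections import defaultdict

def efficiency_by_play(plays, target=4, count_first_downs=True):
    """Share of calls per play that gained `target` yards, a first down, or a TD.

    `plays` is an iterable of (play_name, gain, made_first_down, touchdown).
    """
    calls = defaultdict(int)
    wins = defaultdict(int)
    for name, gain, first_down, touchdown in plays:
        calls[name] += 1
        if gain >= target or touchdown or (count_first_downs and first_down):
            wins[name] += 1
    return {name: wins[name] / calls[name] for name in calls}

# Hypothetical charting data (NOT real game data).
charted = [
    ("power", 5, False, False),
    ("power", 1, False, False),
    ("iso", 2, True, False),   # 2 yards on 4th and 2
]
print(efficiency_by_play(charted))
print(efficiency_by_play(charted, count_first_downs=False))
```

Running it both ways shows how much of a play's efficiency comes from short-yardage conversions rather than 4-yard gains.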
The one stat I haven't kept that I tell myself every year I'm going to, is to track plays at practice vs plays in games, just to see the correlation. I really should start keeping those.
|
|
|
Post by CoachCP on Oct 28, 2011 13:19:48 GMT -6
coachks wrote: "Serious question, and this same thought applies when I read Smartfootball, FootballOutsiders, Phil Steele and anything else that jumps into the stats.... How does this help a coach? Knowing you average x ypp, your median is y, etc... How does this help coaching-wise? I understand how it's useful for predictive purposes, but how can this be transferred onto the practice field?"
In my opinion, the secret is what feeds into that stat. What hurts your average ypp? What increases it? In what scenario do those increase or decrease? Some of this is game planning and changes week to week, but if I know I get 4+ yards on power and 2 yards on Iso, then I need to call power more. If I spend twice as much time teaching Iso compared to power, then I either need to (1) become more efficient at teaching Iso or (2) shift my priorities to teaching another play. Maybe I find that I run power really well vs teams that load up the LOS, but I run Iso better versus a "vanilla" safe defense. That will affect my play calling or practice prep. I won't spend as much time hammering out the details on Iso vs multiple fronts if I previously spent a crap ton of time doing that. Instead, I'll just check to power in the game. These are just examples, but you can most definitely take advantage of this data.
|
|
|
Post by Chris Clement on Oct 28, 2011 13:53:14 GMT -6
To continue, a large standard deviation on power, for example, implies inconsistency. Maybe the kickout guy is trying too hard to murder the end, and misses, or one WR doesn't block nearly as well.
|
|
|
Post by coachks on Oct 28, 2011 15:01:41 GMT -6
Chris Clement wrote: "To continue, a large standard deviation on power, for example, implies inconsistency. Maybe the kickout guy is trying too hard to murder the end, and misses, or one WR doesn't block nearly as well."
Right, but don't you coach these points anyway? Between seeing it on film (look at our FB whiff on the end...) and teaching it in practice, how does knowing the standard deviation help you teach the kickout block? I can see the merit in seeing how productive a play is (Trap gains 5 a play, Belly is only gaining 3), but that is a far cry from what is going on in this thread. Just curious how this stuff can be converted to an asset on the field.
|
|
|
Post by calkayne on Oct 28, 2011 15:52:49 GMT -6
Something to keep in mind for those who are less than enthusiastic about going to this extent is that this can be the proof that you are looking for. (Otherwise you really wouldn't be doing it.)
We all believe our Theories and can argue them, but this kind of Statistical Analysis can prove your Theory. Is your Strategising successful or not? Are your decisions hurting the production? When and where is the team slumping or rolling?
It is not so much a case of "Johnny, now with the Lead Blunt on 3rd and 4 we attain a Median of 4.657yds when on the Left Hash, we'll do that" but more a case of "Is the coaching working or not".
It's self-analysis that you can't argue against. The question is how do you optimise the Analysis so that it gives an unbiased result.
|
|
|
Post by Chris Clement on Oct 28, 2011 19:31:09 GMT -6
Right, it tells you what you may have already known or suspected. It doesn't tell you how to teach a kickout, but it tells you that something weird is going on with that play, or if your offense is adhering to your philosophy and perception. It's like watching the film, and you notice the game didn't actually go the way it felt like it was going at the time.
|
|
|
Post by coachd5085 on Oct 29, 2011 9:01:35 GMT -6
Chris Clement wrote: "To continue, a large standard deviation on power, for example, implies inconsistency. Maybe the kickout guy is trying too hard to murder the end, and misses, or one WR doesn't block nearly as well."
coachks wrote: "I can see the merit in seeing how productive a play is (Trap gains 5 a play, Belly is only gaining 3), but that is a far cry from what is going on in this thread."
One underlying theme here is the devaluation of the "average" statistic. For example, trap gaining "5 a play" is a relatively meaningless piece of information if you are referring to the mathematical average. As others have noted, I can run trap 10 times, get stoned on 9 of them, pop one from my own 1 yard line, and average 9.9 a play. However, examining the mode and median numbers shows I don't run trap terribly well. I think that might be one way this is useful. You can evaluate the plays in accordance with your definition of success, like cclement using mode and median stats to show that his offense has been "hit or miss".
If it is quick and easy to do, you can also kind of set up a menu of things you want to look at. For example, if the median for trap was 5, and belly only 3, you might look at WHY, to see if there is anything you can do about it. Sometimes it might just be situational, and you can't really do much about it (for example, trap might be your 3rd and long play, and so it gains 5+ 50% of the time. However, you ran it on 3rd and 8 25% of the time, where belly was primarily a 1st down play).
I do think that, because of the TREMENDOUS number of variables involved combined with limited data, statistical analysis isn't the end-all, be-all. But some data mining post season might help you prep for the next season. In season would be more challenging because of time issues; however, I will say that I KNOW that our perceptions and our gut are often wrong. We think something is working when it really isn't. We think someone is productive when they really aren't, etc.
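The trap example above is easy to check with Python's `statistics` module: nine stuffed runs and one 99-yard touchdown from the 1 yard line.

```python
import statistics

# Ten trap calls: stoned for 0 nine times, then one 99-yard run.
trap = [0] * 9 + [99]

print(statistics.mean(trap))    # the flattering average
print(statistics.median(trap))  # what a typical call gains
print(statistics.mode(trap))    # the most common result
```

The mean comes out at 9.9 while both the median and the mode are 0, which is the whole argument for looking past the average.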
Now I am not sure how beneficial analyzing the standard deviations would be, since remaining consistent is not necessarily the goal ( you WANT plays to POP)
|
|
|
Post by Chris Clement on Oct 29, 2011 13:19:40 GMT -6
I think a high average with large deviation is good; it means you're getting lots of plays that pop, but a medium or low average with high deviation should imply inconsistency. The problem is that the big plays are being overvalued. I don't care if the play goes 85 yards or 90, but I definitely care if it goes for 3 or -2. There must be some way of weighting it without creating bogus results.
|
|
|
Post by coachd5085 on Oct 29, 2011 14:08:42 GMT -6
Chris Clement wrote: "I think a high average with large deviation is good; it means you're getting lots of plays that pop, but a medium or low average with high deviation should imply inconsistency. The problem is that the big plays are being overvalued. I don't care if the play goes 85 yards or 90, but I definitely care if it goes for 3 or -2. There must be some way of weighting it without creating bogus results."
Maybe, but at THIS point... having to constantly modify the data... what do we get from it? If you got "non bogus" results, what value do they provide?
|
|
|
Post by cqmiller on Oct 29, 2011 14:19:51 GMT -6
Decide what MAXIMUM value you will put into the computer, and any run longer than that MAX value will only count as that number. That would make the data much more useful in terms of median, STD Dev, and chi-squared values.
It would be up to you whether you are looking for a true mean or for consistency. If all you are looking for is consistency, then count any loss of more than 5 yards as -5 and any gain longer than 5 as +5. That would give you a range of 11 values to use, with 0 being the middle. You would then hope that you have a skewed set of data with a large number at the 3, 4, and 5 values, very few in the 0, -1, -2, -3, -4, and hopefully none in the -5.
Then you would have a better set of data to check consistency. If you are looking for explosiveness, then you would increase that +5 max value to a larger number. Again, set up your parameters to give you the best opportunity to answer the question you are asking.
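The capping described above can be sketched in a few lines of Python. The `cap` function and the sample gains are hypothetical; the -5/+5 limits are the ones suggested in the post.

```python
import statistics

def cap(gains, floor=-5, ceiling=5):
    """Clamp every gain into [floor, ceiling] so long plays stop dominating."""
    return [max(floor, min(ceiling, g)) for g in gains]

# Hypothetical gains (NOT real game data).
sample = [-12, -2, 0, 0, 3, 4, 4, 7, 22, 85]

consistency_view = cap(sample)              # 11 possible values, -5 through +5
explosive_view = cap(sample, ceiling=20)    # raise the cap to study big plays

print(consistency_view)
print(statistics.mean(consistency_view), statistics.pstdev(consistency_view))
```

Unlike trimming, capping keeps every play in the sample, so n stays the same and an 85-yard run still counts as a good result.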
|
|
|
Post by pmeisel on Oct 30, 2011 4:09:45 GMT -6
One thing you shouldn't do is let the numbers and the numerical methods become more significant than the real issue you are trying to analyze. It's easy to become immersed in the data and lose sight of the context.
For example, average yards gained, and % of time yards gained over target, for a particular play or series, can tell you whether it's working. If you averaged 9 yards a play, but that was one 90 yard run and you got stopped for zero 9 other times, it's not working. If you gain 10 yards or more 80% of the time you run tunnel screen, well, then run the hell out of it until they learn to stop it.
You need a certain amount of data to have faith in your conclusions, but I think the biggest value of the statistics is in analyzing specific questions like these -- not in a broad numerical summary of your game. The broad summary has no context -- it's a little bit like basketball, where the points scored when the game is close count, but no one cares about the scoring during "garbage time" when the game has been settled.
|
|