Player Evaluation: Modified Approximate Value Evaluation Model (MAVEM)

Front offices have spent years trying to perfect draft and player evaluation, searching for an upper hand in draft selections and draft trades, but the methods used for evaluation are outdated at best. Ethan Young has built a new model, the Modified Approximate Value Evaluation Model (MAVEM), that aims to more accurately capture the value of different draft picks.

NFL draft value charts are flawed and static: they don’t properly adjust for the cap or for actual output, and they ignore issues with how that output is calculated, such as positional and surplus comparisons. So, I set out to build a model that not only solves these problems, but can be used to evaluate any type of trade (player for pick, pick for pick, and player for player), free agent signing, or draft pick. I also took a dynamic approach with draft pick values, by using pre-draft grades rather than draft slot, allowing us to adjust for class strength and provide actionable intel as players fall on draft day.

[dt_divider style=”thick” /]A New Value Chart

This project is similar to my original Return on Draft Capital concepts. However, rather than just determining efficiency rates for player groupings (for example, measuring the efficiency of players above or below predictive Positional Slaytic Thresholds; more on that here), this model can evaluate both those groupings and any type of player acquisition.

A little over halfway through this project, I came across Kevin Meers and his work. For those who don’t know, Meers (now the Director of Research and Strategy for the Cleveland Browns) wrote a great article about Approximate Value (AV) outputs relative to draft position for the Harvard Sports Analysis Collective in 2011, and then used those principles to build a Draft Trade Chart. While at first I was a little disappointed to be beaten to the punch, I found it encouraging to see I was coming to the same conclusions he had, in that AV outcomes did not at all match traditional Draft Trade Value Charts. Moreover, there were enough differences in how we approached our charts, including additional improvements I wanted to add, that I thought it worth continuing forward with my chart; I’ll point out those differences as I go along.

[dt_divider style=”thick” /]Adjusting for Career Length

I started out by building a database of total Approximate Value for every player drafted since 1999. I omitted kickers and punters from this process because, as I explained in a previous article, past AV data says they aren’t worth drafting. I then divided each player’s total AV by the number of years they had played to adjust for different career lengths, creating AV per year (AVPY). This is the first major difference between our approaches. Meers uses “cAV”, which is a player’s weighted career AV. Pro Football Reference describes weighted career AV as “100% of the player’s best season, plus 95% of his 2nd-best season, plus 90% of his 3rd-best season, plus 85% of his 4th-best season” and so on.

There are a couple of problems with using cAV. The biggest is that, because cAV doesn’t account for career length, not every player is evaluated on a level playing field. Also, players currently in the league cannot be evaluated using cAV. I use AVPY to quantify value instead. As for weighted versus unweighted AV, I believe each season of a career should be weighted equally: given the unpredictable nature of professional football, consistency should be rewarded over variance.
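To make the difference concrete, here is a minimal sketch, using hypothetical per-season AV lines, of AVPY next to the PFR-style weighted cAV it replaces:

```python
import pandas as pd

# Hypothetical per-season AV data for two players (illustration only)
players = pd.DataFrame({
    "player": ["Player A", "Player B"],
    "season_av": [[12, 10, 9, 7], [11, 11, 10, 10, 9, 9, 8]],
})

def avpy(seasons):
    # Unweighted AV per year: total AV divided by years played
    return sum(seasons) / len(seasons)

def weighted_cav(seasons):
    # PFR-style weighting: 100% of the best season, 95% of the 2nd-best, 90% of the 3rd-best, and so on
    ordered = sorted(seasons, reverse=True)
    return sum(av * max(1.0 - 0.05 * i, 0.0) for i, av in enumerate(ordered))

players["AVPY"] = players["season_av"].apply(avpy)
players["cAV"] = players["season_av"].apply(weighted_cav)
print(players[["player", "AVPY", "cAV"]])
```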

However, though its issues are smaller than cAV’s, AVPY has some shortcomings that should be highlighted. The main one comes from a data collection standpoint. Scraping PFR’s Draft Finder (which gives you cAV, but not the years played needed to account for career length) takes only a couple of hours, whereas using the Player Finder instead (which gives unweighted AV and career length to divide by) took about a month to assemble due to holes and missing players.

(I don’t know why the Player Finder is so broken, but the biggest problem areas were at linebacker and defensive end, and there were smaller problems at nearly every other position as well. Once I discovered there were missing players, I had to manually go through each draft pick by pick and fill in the gaps. A number of players were also listed in the wrong draft class, which resulted in more data re-entry. Some examples of missing players: Von Miller, Jamie Collins, C.J. Mosley, Alec Ogletree, Thomas Davis, Lance Briggs, Aldon Smith, Ken Lucas, Anthony Barr, Cameron Heyward, almost the entire 2015 draft class, and hundreds more. So it’s easy to see why AV-based work is done in units of cAV, even though it is flawed.)

[dt_divider style=”thick” /]Positional Adjustments

Once the data set was quality checked and organized, I ran into another problem: I knew from my past RODC work that comparing AV data across positions can be very uneven, and I had previously created some small qualitative positional adjustments to counter that issue. But the problem was worse than I had thought, particularly at tight end. I had thought that maybe the PFR staff had a reason for the apparent discrepancies in value by position, but after reading more into how AV is calculated and then asking some of the PFR guys, it became clear there was no intent to inflate or deflate certain positions based on any sort of significant findings that certain positions were more important than others. Perfectly setting how each position should be valued is impossible, so, instead, I took a different approach to let talent level rather than position bias shine through.

So, with all this scraped data at my disposal, I started to calculate what would be needed to balance the apparent residual inflation and deflation between positions under the existing method. This is another difference between the Meers chart and mine, as he does not mention anything about his chart being position adjusted. I could have used percentiles at each position instead of raw adjustments, but that can get inexact for purposes of data comparison, and I also already use percentiles heavily in Slaytics.

 

Identifying Problems

First, I found the AVPY floors, total averages, starter floors, starter averages, and ceilings (which were determined by the average AVPY among the top three players) at each position.
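For readers who want to reproduce these reference lines, a minimal sketch follows. The file and column names are assumptions, as is how a “starter” is flagged; the ceiling follows the definition above (average AVPY of the top three players at each position).

```python
import pandas as pd

df = pd.read_csv("avpy_by_player.csv")  # assumed columns: position, avpy, is_starter

def position_lines(group):
    starters = group.loc[group["is_starter"], "avpy"]
    return pd.Series({
        "floor": group["avpy"].min(),
        "total_mean": group["avpy"].mean(),
        "starter_floor": starters.min(),
        "starter_average": starters.mean(),
        "ceiling": group["avpy"].nlargest(3).mean(),  # average of the top three players
    })

lines = df.groupby("position").apply(position_lines)
print(lines.round(2))
```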

AVPY by Position

The question of how each position should be valued relative to one another is an impossible one to resolve. Therefore, instead of going through and deciding which positions are more valuable than others, I will attempt to bring each position to as close to equivalency as I can. This is certainly not 100% accurate, but rather attempts to solve major flaws with AV positional comparisons, so as to better quantify how good a player is in the overall scheme of things.

To do that, these results obviously need to be adjusted. Here is the same graph in numerical form:

AVPY by Position
Position Floor Total Mean Starter Floor Starter Average Ceiling
QB 0.17 4.29 6.83 10.88 16.12
RB 0.09 3.21 4.80 8.48 12.77
WR 0.11 3.33 5.09 7.37 11.73
TE 0.10 1.99 3.55 5.06 8.53
OL 0.20 4.22 5.91 7.49 11.78
INT 0.25 3.41 5.70 7.67 15.03
EDGE 0.25 3.48 5.67 7.45 12.06
LB 0.25 3.79 5.67 7.46 13.61
CB 0.33 2.71 4.60 6.23 13.10
S 0.08 2.81 4.67 6.05 10.75
Average 0.18 3.32 5.25 7.41 12.55

When I first saw these results, I wanted to incorporate dynamic adjustments. Under such a system, the TEs closer to the ceiling would get a bigger adjustment than the ones closer to the total average, bringing them closer to their talent equivalents at other positions. There were a few hiccups with that, though. It’s something I may revisit later, but some of the initial results looked off on a basic eyeball test. So I decided instead to set uniform adjustments at each position. To do that, I found the total mean, starter floor, starter average, and ceiling of every position together. Then, for each position, I calculated the difference from the all-position average for each of the four tiers of valuation.

Distance from AVPY Mean Line by Position

Distance from AVPY Starter Floor Line by Position

Distance from AVPY Starter Average Line by Position

In all these graphs, the zero line is the average of every position. So, for example, the QB AVPY ceiling is 3.57 over the average AVPY ceiling of all positions in the aggregate. To compare the ceiling of QBs to the ceiling of other positions relative to the respective value of those positions, you would need to subtract 3.57, hence the -3.57 value in that chart.

Differential from Line (Average)
Position Total Mean Starter Floor Starter Average Ceiling
QB -0.97 -1.59 -3.47 -3.57
RB 0.11 0.45 -1.06 -0.22
WR 0.00 0.16 0.05 0.81
TE 1.33 1.70 2.35 4.01
OL -0.89 -0.66 -0.07 0.77
INT -0.09 -0.45 -0.26 -2.49
EDGE -0.16 -0.42 -0.03 0.49
LB -0.46 -0.42 -0.05 -1.06
CB 0.61 0.65 1.18 -0.56
S 0.51 0.58 1.36 1.80

So to find the positional adjustments, we need to take the distance each position is from the line in each of the four tiers and then take the mean. Instead of averaging them equally, I weighted the average toward the total mean, as I found that this line represents the data set better than the other lines do. I did this to try to find the adjustment that puts each position on as close to a level playing field as possible relative to the other positions. The other lines are still important, though, as they identify where the data is inflated or deflated.
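A minimal sketch of turning the four tier differentials into one uniform adjustment per position is below. The specific weights are illustrative assumptions; the only requirement from the text is that the total mean carries the most weight.

```python
import pandas as pd

# Differentials from the all-position lines (values taken from the table above)
diffs = pd.DataFrame({
    "position": ["QB", "TE"],
    "total_mean": [-0.97, 1.33],
    "starter_floor": [-1.59, 1.70],
    "starter_average": [-3.47, 2.35],
    "ceiling": [-3.57, 4.01],
}).set_index("position")

# Hypothetical weights, tilted toward the total mean
weights = {"total_mean": 0.5, "starter_floor": 0.2, "starter_average": 0.2, "ceiling": 0.1}
adjustment = sum(diffs[col] * w for col, w in weights.items())
print(adjustment)
```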

Fixing Problems

First off, the top end and starter average QBs are way off the line, which seems like an issue, but is actually OK. All these adjustments are weighted toward the total mean, so those QBs will be valued higher relative to players of the same skill level at other positions. And that isn’t a bad result considering the importance of the QB position.

TEs are a problem as well. The total mean and ceiling values are spread too far apart for one number to fit, so, to fix that, I raised the AVPY of every TE to the power of 1.115. This helped bring all four TE tiers closer to the other positions and greatly reduced the ceiling deficit, which is much bigger than the deficits on the other three lines. I didn’t choose 1.115 randomly; it was the middle ground that let a uniform adjustment fit each skew best. When I saw how effective this was at reducing big ceiling deficits, I decided to apply it to the WR and S groups as well, but with a smaller exponent.
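A quick sketch of why the exponent works: raising each TE’s AVPY to a power greater than 1 stretches the top of the distribution more than the middle, shrinking the ceiling deficit relative to other positions.

```python
# Exponent re-scaling applied to each individual TE's AVPY
def rescale_avpy(avpy, exponent=1.115):
    return avpy ** exponent

print(rescale_avpy(1.99))   # near the old TE total mean: moves only slightly
print(rescale_avpy(8.53))   # near the old TE ceiling: moves much more
```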

Offensive Lineman AV

The last and frankly biggest problem is the OL. The jump from the second highest total mean to the third smallest ceiling stood out significantly. The more I looked at it, the more I realized offensive lineman AV is very flawed. Seahawks OL Justin Britt came in ahead of five-time Pro Bowlers Joe Staley, Marshal Yanda, and Ryan Kalil, for example. You may think AV just loves Britt and that is a one off exception. But it isn’t, as David Bakhtiari, Ereck Flowers, Shane Olivea, and Justin Blalock also are above those guys. Jason Kelce, Branden Albert, and Richie Incognito all land outside the Top 150 (Britt is 20th).

To solve this problem, I created my own offensive lineman AV variant. Essentially, I rebalanced some of the existing inputs of AV and added a few of my own. My variant places Kelce, Albert, and Incognito all above Britt; Alex Mack goes from 110th to 33rd most valuable OL since 1999. It also has Joe Thomas at the top, while the old way didn’t even have him in the top five. It obviously takes more than a couple examples to prove accuracy and, while it’s not perfect, it is miles better than the old system.

Setting the Positional Adjustments

We should be set now. The interior DL ceiling may look off, but that’s just because of J.J. Watt. With the modified TE data and completely re-engineered OL results, the system has changed, as all the adjustment values fluctuate as the lines move. Here are the new results:

AVPY by Position
Position Floor Total Mean Starter Floor Starter Average Ceiling
QB 0.17 4.29 6.83 10.69 16.12
RB 0.09 3.21 4.80 8.48 12.77
WR 0.11 3.33 5.52 8.15 13.27
TE 0.07 2.36 4.31 6.55 11.91
OL 0.07 2.91 3.79 5.68 12.90
INT 0.25 3.37 5.70 7.67 15.03
EDGE 0.25 3.53 6.18 8.24 13.66
LB 0.25 3.79 5.80 7.76 13.61
CB 0.33 2.71 4.60 7.04 13.10
S 0.08 2.81 5.04 6.63 12.11
Average 0.17 3.23 5.26 7.69 13.45

 

Differential from Line (Average)
Position Total Mean Starter Floor Starter Average Ceiling
QB -1.06 -1.57 -3.00 -2.67
RB 0.02 0.46 -0.79 0.68
WR -0.10 -0.26 -0.46 0.18
TE 0.88 0.94 1.14 1.54
OL 0.32 1.46 2.01 0.55
INT -0.14 -0.44 0.02 -1.59
EDGE -0.30 -0.92 -0.55 -0.21
LB -0.56 -0.54 -0.07 -0.16
CB 0.52 0.66 0.65 0.34
S 0.42 0.22 1.06 1.34

Using these values, we can finally set our adjustments.

Position Adjustment
QB -1.70
RB 0.02
WR -0.12
TE 1.04
OL 0.69
INT -0.39
EDGE -0.35
LB -0.40
CB 0.52
S 0.69

These are not perfect, but we are trying to put each position on as close to a level playing field as possible, and this does a pretty good job. Let’s call this new outcome Modified Approximate Value Per Year, or MAVPY for short. The big winners here are the above-average to top-level QBs; most would agree quarterback is the most important position in football, so inflating their value is a nice side effect. Average-to-below-average starting OL are hurt the most, but the results are miles better than where they were when we initially started:

MAVPY by Position

[dt_divider style=”thick” /]Cap Adjustment

Before continuing on, and unlike other such systems, I want to cap-adjust these values. To do that, we need to find the value of cap space in terms of MAVPY. The cap is messy and constantly changing, so, to keep things simple, we will deal strictly with average annual salary. For now, let’s find the MAVPY value of $1 million in cap using free agency contracts and MAVPY returns. Here’s a snapshot of some of the results:

MAVPY per Million$ 2014/2015

Across the free agency data I collected, the average MAVPY per million was 0.85. Going back to past years and adjusting for increases in the salary cap gave similar results. So, all we need to do is multiply the projected cap hit of each draft slot by 0.85 and subtract the corresponding value from each pick on our existing chart.
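A minimal sketch of that adjustment, using the 0.85 MAVPY-per-$1M rate found above; the pick value and projected rookie cap number are hypothetical.

```python
MAVPY_PER_MILLION = 0.85  # rate estimated from the free agency data above

def cap_adjusted(pick_mavpy, projected_cap_millions):
    """Subtract the MAVPY equivalent of the pick's projected average annual cap hit."""
    return pick_mavpy - MAVPY_PER_MILLION * projected_cap_millions

print(cap_adjusted(8.0, 5.0))  # 8.0 - 4.25 = 3.75
```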

Value of a Roster Spot

There is also the issue of replacement level. Chase Stuart claims NFL replacement level is 3.36 AV, but I have a hard time believing that veterans off the street can consistently hit that mark. Remember, backup players aren’t in a position to generate AV unless an injury occurs, and teams clearly value these backups more than they do unsigned players. Perhaps he didn’t account for veterans who were cut in camp, but 3.36 seems more like minimum starter level than replacement level, and those are very different things. The uncertainty of replacement commodities also needs to be considered. Of course, that’s the problem with replacement level concepts: you can argue their semantics all day.

I wanted a less subjective approach, so I found the value of a roster spot by multiplying the average number of players on each roster who generated AV by the number of NFL teams, then averaging the MAVPY at that league-wide rank over a 16-year period. That came in at barely more than 0.50 MAVPY, and I subtracted it from every entry on the chart. So, instead of replacement level, we are adjusting for the value of a roster spot, along with another concept: surplus value.
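Here is a minimal sketch of that calculation under my reading of the description: count how many players per team generate AV in a season, take the MAVPY at that league-wide rank, and average across seasons. The file and column names are assumptions.

```python
import pandas as pd

seasons = pd.read_csv("mavpy_by_player_season.csv")  # assumed columns: season, team, mavpy

values = []
for _, grp in seasons.groupby("season"):
    # Average number of AV-generating players per roster, scaled to the whole league
    contributors_per_team = (grp["mavpy"] > 0).sum() / grp["team"].nunique()
    cutoff_rank = int(round(contributors_per_team * 32))  # 32 NFL teams
    ranked = grp["mavpy"].sort_values(ascending=False).reset_index(drop=True)
    values.append(ranked.iloc[cutoff_rank - 1])  # MAVPY of the last "roster spot" player

roster_spot_value = sum(values) / len(values)  # the article lands just above 0.50
```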

[dt_divider style=”thick” /]Surplus Value

Surplus value needs to be considered as well. There are only 53 spots on an NFL roster, with roughly 28 playing a major role. Surplus value is the additional value that players with higher AVs (better players) provide in a single roster spot; spreading the value one player has over several of those 28 roster spots isn’t going to help your team’s AV total. Naturally, you want to maximize your roster’s AV total with the assets you are given to acquire players: cap space and draft picks. Everything we’ve done so far in building these models is meant to quantify the efficiency of draft picks, trades, free agent acquisitions, or any group of players with a common trait you want to test. Efficiency is great and helps us avoid wasting assets, but going too far and fielding the most efficient team possible wouldn’t give us the highest AV total, as many acquisition assets would remain unused. Ideally, we can adjust for that. Earlier, we made positional adjustments that allowed comparison across our data set. Now, we are adjusting up and down our data set so that different MAVPY quantities are scaled correctly in quality-versus-quantity comparisons.

It’s a delicate line to walk, as no context is put on comparing AV in a quantity vs. quality approach. Is Ndamukong Suh worth the same as a package of Terrance Knighton and Ahtyba Rubin, with all other factors being equal? Or, assuming the contracts are the same on both sides, is Suh equal to a ridiculous package like Langston Moore, Ryon Bingham, Ethan Kelley, Markus Kuhn, Mike Martin, Junior Siavii, Beau Allen, Christian Ballard, Jay Bromley, Tank Carradine, Ryan Carrethers, and Joe Cohen? Because currently both examples are true.

Obviously, that scenario is ridiculous, but I don’t know if anyone can say for certain just how far off it is, and PFR doesn’t attempt to make the distinction either. I determined surplus value by comparing a few hundred random past contracts against each other. While this isn’t a perfect approach, basing surplus value on how teams pay for it is as close as we can get at this point. I actually found some interesting potential positional contract inefficiencies doing this, but I will explore that in detail later.

Anyway, after removing major outliers, it was clear surplus value did exist in contracts and at least needed a small adjustment. So, I took the average surplus results (after removing those outliers) from my contract comparisons and applied that to every MAVPY result in the system. This doesn’t change much on the surface, but it makes the MAVPY results much more accurate for quantity-versus-quality comparisons. With this change, Ndamukong Suh is now valued 267% higher than that crazy package of twelve players, which seems much more appropriate.

[dt_divider style=”thick” /]Building the Draft Chart

Now that we have adjusted for career length, position, cap, surplus, and roster spot value, we can start to build the chart.

My approach differs from the one used by Meers. Instead of taking the average output of each pick, I ordered my values by the output of each player in each draft class, rather than by their pick numbers. So for the 2012 class, instead of Andrew Luck’s MAVPY being at the top of the chart because he was picked first, Russell Wilson’s would be there, since he had the highest MAVPY in that draft class. I then ordered each draft class since 1999 by MAVPY and computed the average output of each ranking over that time period. Rather than valuing a player by the outcome of a certain draft slot, this values the player relative to the draft class, placing the emphasis on prospect evaluation rather than success at a specific pick location in the draft.
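A minimal sketch of that re-ordering is below; the file and column names are assumptions.

```python
import pandas as pd

drafted = pd.read_csv("mavpy_by_draftee.csv")  # assumed columns: draft_year, pick, mavpy

# Rank players within each draft class by MAVPY instead of by pick number
drafted["class_rank"] = (
    drafted.groupby("draft_year")["mavpy"]
    .rank(method="first", ascending=False)
    .astype(int)
)

# Expected MAVPY of the Nth-best player in a class, averaged across the 1999+ classes
expected_by_rank = drafted.groupby("class_rank")["mavpy"].mean()
print(expected_by_rank.head(10))
```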

Obviously, not every prospect will be evaluated correctly. Teams will always miss on players, and that should be accounted for. And because of these misses, more and more players who outperform their draft slot become available as the draft goes on. And while the chance to land the future top performers decreases significantly in the later rounds, the chance is still there, as evidenced in 2000 with Tom Brady going on to have the highest output despite being picked 199th overall.

To factor that in, I built a second chart. This chart takes the average MAVPY output of each draft slot and weights in the next four picks after it, in order to showcase what sort of talent went off the board in that range over a 16-year period. So, for example, with the first pick, I took 100% of the MAVPY average of past first overall picks, 90% of the average of second picks, and so on down to 60% of fifth picks. I then added those together and divided by four, as the weights sum to 400%. This weighting also smooths out gigantic outliers at certain slots. It’s not perfect, but it is a simple way to showcase what picks in certain ranges have actually amounted to over the last 16 years.
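As a sketch, the window calculation looks like this; the per-slot averages passed in are hypothetical.

```python
# Five-pick descending window: 100/90/80/70/60 percent, divided by 4 (weights sum to 400%)
def windowed_value(avg_mavpy_by_pick, pick):
    weights = [1.0, 0.9, 0.8, 0.7, 0.6]
    total = sum(w * avg_mavpy_by_pick[pick + i] for i, w in enumerate(weights))
    return total / 4.0

# Hypothetical averages for the top five slots
print(windowed_value({1: 7.0, 2: 6.5, 3: 6.0, 4: 5.8, 5: 5.5}, 1))
```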

So we have two charts. One designed to capture what each pick should amount to, and another designed to capture what a pick in a general range has actually amounted to. By balancing them with a weight towards the former, we can set the expected value of each pick.

Although my initial results are not as spread as Meers’s are due to being primarily sorted by output rather than pick, there are still some values slightly out of order due to implementing that second chart’s values and cap adjustments. The conventional approach to fixing that issue is building a curve that best fits the data. So I ran a regression on the expected result values against the corresponding pick number to find the best fit for the data set. The results were good, with an R² of 0.931137.

For those unsure what R² is, think of it as a score of how well the curve fits the data, with a perfect score being 1. For comparison, Meers’s curve had an R² of 0.91599. To compare this regression to my original data, or to any other chart values, I took the average and standard deviation of each chart being compared and used them to calculate the percentage over average of each pick on that chart. This puts every chart on a level playing field, which is great for comparison purposes. (Note: just because the percentages go negative doesn’t mean the values do.)
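A minimal sketch of the fit and the normalization follows. The exponential-decay form and the mean-based percentage-over-average are my assumptions about reasonable choices; the article does not state its exact functional form, and the input file is hypothetical.

```python
import numpy as np

picks = np.arange(1, 257)
expected = np.loadtxt("expected_value_by_pick.txt")  # hypothetical expected values by pick

# Fit value ~ a * exp(b * pick) by regressing log(value) on pick number
b, log_a = np.polyfit(picks, np.log(expected), 1)
fitted = np.exp(log_a + b * picks)

# R²: share of the variance in the expected values explained by the curve
ss_res = np.sum((expected - fitted) ** 2)
ss_tot = np.sum((expected - expected.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Put any chart on a common scale: each pick's percentage over the chart average
pct_over_avg = 100.0 * (fitted - fitted.mean()) / fitted.mean()
```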

Here is the regression, expressed as the percentage over each chart’s average value, compared to the Jimmy Johnson Trade Chart (JJTC) and the actual results calculated earlier:

Percentage over Chart Average at Each Draft Pick

As you can see, the regression fits the expected value data much better than the JJTC does. The JJTC seems to be especially off in the top-5 picks, and again in the 2nd and 3rd rounds. But, the regression line isn’t perfect either. The first pick is 169% overvalued, which is obviously a problem; we can do better than that. So, I manually built a chart, using abnormal deterioration concepts, to better fit the data rather than making the curve look pretty. This is not conventional, but one of the main reasons regressions are used to fit data is so values are ordered correctly, to avoid situations where the 35th pick is worth more than the 34th and so forth. I accomplished this with my manual approach, while staying much closer to actual expected value.

Here is how this new chart, which we will call the Modified Approximate Value Evaluation Model (MAVEM) Chart, turned out:

Percentage over Chart Average at Each Draft Pick

This new chart is much closer to the expected value results, as evidenced by this graph. The R² backs that up as well, coming in at 0.995. Of course, if the expected values themselves were inconsistent, it wouldn’t matter how accurate the line is; luckily, the expected value results only carry an error bar of about 6%, so this chart should stay very accurate as draft results get added in future years.

What does this mean in practical terms? Based on the new chart, the average draft pick is the 110th pick, and when you factor in the cap adjustment made already, that equates to a player close to Joseph Randle, Leonard Hankerson, or Desmond Bishop; those types of players are what should be expected from that slot. The first pick is worth 316% more than the 110th pick, and is valued closer to players like Adrian Peterson, Calvin Johnson, and Brian Urlacher.

[dt_divider style=”thick” /]Applications

Seems like we’ve thought of everything, right?

Well, there’s one more thing. Draft charts are conventionally listed by pick number, and that’s how our chart is formatted right now.

But what if we replaced these pick numbers with the evaluation grades given to each prospect? A versatile player-based model would account for relative class strength better than a static pick model, and would provide actionable intel on exactly how much MAVPY capital should be spent to acquire each prospect based on how the scouting department grades him. Knowing exactly how much to give up for a prospect you want to trade up for, without losing value, is extremely useful, and this approach can provide that.

While this idea fits any grading scale, take a simple 10-point scale as an example. A 9.9 grade could take the place of the #1 pick, and so on down with whatever methodology the user chooses. You can even build interesting quirks into these scales, like limiting a perfect 10 grade to superstar QB prospects only. And for a QB that a decision maker is 100% certain on, you could even remove the baked-in uncertainty from the top chart value that came from merging the two charts; this would identify the top possible value a player could have in draft capital. Obviously, exceeding that value in a trade-up would be an overpay, so it serves as a natural maximum for draft pick trades.
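As a sketch, swapping pick numbers for grades can be as simple as mapping grade steps to chart rows. The mapping below, a 9.9 grade to the top row and each 0.1 step to the next, is purely illustrative; any grading scale could be slotted in the same way.

```python
def grade_to_chart_value(grade, chart_values, top_grade=9.9, step=0.1):
    """chart_values: MAVPY values ordered best-first; grades map to rows by fixed steps."""
    row = int(round((top_grade - grade) / step))
    row = max(0, min(row, len(chart_values) - 1))
    return chart_values[row]

# e.g. a 9.7 grade maps to the third row of a hypothetical chart
print(grade_to_chart_value(9.7, [12.0, 10.5, 9.8, 9.2, 8.7]))
```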

A dynamic, prospect-based approach would not only fit each individual draft class more effectively, but also provide more usable data in real time on draft day. While this chart was designed with draft picks in mind, the same concepts can be used in player-for-pick trades, as well as player-for-player trades, since the values are all in terms of MAVPY. We can use the draft slot values rather than the dynamic prospect grade approach to keep things simple for now, but it’s an easy change: by accounting for a player’s salary, possible compensatory pick changes, and years left on his contract, we can look at almost any transaction.

Here is how a recent Jets-Broncos trade centered around Ryan Clady looks:

Trade Results
Team: Jets Broncos
Received: 5.51 2.40
Comp difference: 0.00 -0.16
Sent: 2.40 5.51
Net: 3.11 -3.27

 

((MAVPY Received – MAVPY of Contract Received (0.85 * Salary)) * Years Left on Contract)

+ Estimated Compensatory Pick MAVPY Difference

– ((MAVPY Sent – MAVPY of Contract Sent (0.85 * Salary)) * Years Left on Contract)

= Trade Net

Cap adjustment is already baked into the values on our draft chart, so we only need to cap-adjust the players in these models. So, if Ryan Clady plays up to his career average, the Jets will gain 3.11 MAV from this deal: 4.88 MAV gained this year, with 0.59 lost in each of the next three years. Clady has 3.11 MAV of leeway to make this trade worth it, which means he needs to be at least 70.5% of the player he was in Denver, or close to the level of former Saints OL Jammal Brown. Whether that will be the case going forward is a question for the Jets’ medical and pro scouting staffs, but this data can inform the process of making the deal.

This chart can be used for Return on Draft Capital purposes as well, which means we can evaluate the efficiency of any draft pick, or of any group of players with a common trait. Possibilities include prospects coming off serious knee injuries, prospects marked with off-the-field questions, small-school prospects at a certain position against their big-school counterparts, and so on. Basically, the RODC system can be used to identify untapped efficiencies, and inefficiencies to avoid, in the NFL player acquisition market. We can look at how individual teams evaluate individual positions, or evaluate them in general based on all their draft choices. We can even evaluate individual scouts using grades or big boards, identifying which positions they are best and worst at scouting, or just their general competency. I currently use these concepts in my talent-predictive Slaytics work to quantify how often players above and below my TLP/PST thresholds outperform their draft slot, but they can evaluate any person, system, or prospect grouping. So, expect more RODC-based work in the future.

So we have adjusted for career length, positional AV flaws, salary, surplus, and the value of a roster spot in a dynamic trade model that matches past AV outputs. Along the way, we built a new AV variant for offensive linemen to replace the old, flawed values. These concepts can be applied to evaluate any type of trade, free agent signing, or draft pick. We also offered some new ideas on how to apply them, such as using evaluation grades instead of draft slots to set draft chart values. The amount of actionable intel these concepts generate is very exciting; this piece serves as the introduction and background to the system, and future studies will put these concepts into action.

Follow Ethan on Twitter @NFLDrafter.

Want more Inside the Pylon? Subscribe to our podcasts, follow us on Twitter, like us on Facebook or catch us at our YouTube channel.
