“Fittest on Earth?”

It was back in 2013 that I first saw footage of Froning stepping out onto the tennis courts in Carson, on his way to sweeping the Sunday events and claiming the title of “Fittest on Earth.” I was immediately captivated by the sport and particularly keen on the title bestowed upon him. Claiming “Fittest on Earth” was bold and controversial for such a young sport compared to the likes of the Ironman and the decathlon. Was it fair for them to make such a claim?

This is the question I am attempting to answer with this project, and to do so, I believe 2 conditions must be met:
1.  We need a rigid definition for “fitness.”
2.  The Games must accurately test this definition.

1. Digging through the archives of the CrossFit Journal, you’ll come across an article from October 2002 entitled “What is Fitness?” This article provides just what we need. It lists 10 qualities that must be developed to establish well-rounded fitness:

  • Cardiovascular/Respiratory Endurance
  • Stamina
  • Strength
  • Flexibility
  • Power
  • Speed
  • Coordination
  • Agility
  • Balance
  • Accuracy

An athlete who is lacking in a particular ability has fitness that is biased towards one domain over another. This is what we’re trying to avoid. A good test of fitness should, therefore, be able to identify any of these deficiencies.

2. Did the 2017 CrossFit Games accurately test this definition? Let’s find out!

Data
The running hypothesis is that The Games must equally represent each of the 10 domains. In terms of scoring, each workout was weighted equally and, therefore, plays an equal role in determining the champion. I decided, then, that it would be best to rate each workout on the type and intensity of fitness being tested, then average these intensities across all workouts to determine how The Games broke down as a whole.

The difficulty, however, is that these fitness domains are impossible to measure objectively given the variable nature of CrossFit and the lack of physiological data available. I therefore decided to construct a survey and have experts in the field rate the type and intensity of fitness being tested by each workout. Asking individuals to consider 10 variables for 13 different workouts was likely to diminish my response rate and increase sampling error, so I boiled the 10 domains down to 3:

  • Power: Ability to apply maximal force quickly
  • Endurance: Ability to maintain output levels over longer time periods
  • Efficiency: A combination of innate athleticism and technical refinement. This is what allows Alec Smith to clean and jerk as much as Thor Bjornsson, despite having significantly less power.

I sent out a survey (sample question below) to CrossFit coaches all across the country asking them to rate each workout in each domain on a scale of 1 to 10 (10 being the most intense). For example, while a 5k and a 10k run are both endurance-focused events, the 10k is longer and would, therefore, garner a higher endurance rating.

With 50+ responses, it was time to finally determine if “Fittest on Earth” was a fair claim.

Analysis
While the survey asked the coaches to respond with their assessments of the individual workouts, my interest was in The Games as a whole. I started by calculating each respondent’s implied ratings of The Games, simply averaging their ratings across the individual workouts. Here is how the distributions broke down:

Power Average: 7.356
Endurance Average: 6.667
Efficiency Average: 7.793
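
The averaging step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the data layout, the function names (`implied_games_ratings`, `domain_averages`), and the two made-up respondents are hypothetical and are not the actual survey data.

```python
from statistics import mean

DOMAINS = ("power", "endurance", "efficiency")

def implied_games_ratings(responses):
    """Average one respondent's workout ratings into one rating per domain.

    `responses` maps workout name -> {domain: rating from 1 to 10}.
    """
    workouts = list(responses.values())
    return {d: mean([w[d] for w in workouts]) for d in DOMAINS}

def domain_averages(all_responses):
    """Average the implied Games ratings across all respondents."""
    per_respondent = [implied_games_ratings(r) for r in all_responses]
    return {d: mean([p[d] for p in per_respondent]) for d in DOMAINS}

# Made-up example: two respondents rating two workouts (NOT real survey data).
sample = [
    {"Workout A": {"power": 8, "endurance": 4, "efficiency": 6},
     "Workout B": {"power": 6, "endurance": 8, "efficiency": 8}},
    {"Workout A": {"power": 9, "endurance": 3, "efficiency": 7},
     "Workout B": {"power": 5, "endurance": 7, "efficiency": 9}},
]

print(domain_averages(sample))
```

Averaging within each respondent first keeps one value per coach per domain, which is the unit the distributions above are built from.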

We can see that there are some clear differences in the overall intensity of the 3 domains being tested by The Games. Our running hypothesis is that these should be the same, so we want to test whether the differences in means are statistically significant or can simply be chalked up to sampling error. To do this, we employ a one-way analysis of variance (ANOVA) test.
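
For readers curious about what the test computes, here is a minimal sketch of the F statistic behind a one-way ANOVA, written by hand with the standard library. The function name `one_way_anova_f` and the toy numbers are assumptions for illustration only; they are not the survey data. Turning F into a p-value additionally requires the F distribution, which a statistics package such as SciPy (`scipy.stats.f_oneway`) handles.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples.

    F is the ratio of between-group variance to within-group variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy data (NOT the survey responses): three groups of three ratings each.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

A large F means the group means differ by more than the spread within each group would explain.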

Upon performing the test, I determined that the differences in means were, in fact, statistically significant*. In other words, we can’t attribute these differences to sampling error, which implies that The Games did, indeed, have biases towards one modality or another. But what were these biases? To determine the specific relationships among the 3 domains, we need to employ another statistical test: the Tukey HSD (Honestly Significant Difference).

The Tukey HSD allows us to compare all pairs of means (Power-Endurance, Power-Efficiency, Endurance-Efficiency) to see which ones are significantly different. Upon performing the test, I determined that there were significant differences for the Power-Endurance and Endurance-Efficiency comparisons, but not for the Power-Efficiency comparison**. This implies that The Games did a good job of balancing the Power and Efficiency domains, but fell short in fully testing an athlete’s endurance capabilities.

Looking at this notion more closely, the 2017 Games only had 2 workouts with average finishing times greater than 20 minutes. These were the “Run-Swim-Run” and the “Cyclocross” events, which both tested an athlete’s cardiovascular/respiratory endurance. What The Games lacked this year was a long test (20+ min) of muscular stamina. “Murph” is a great example of such a test from years past. Josh Bridges and Kari Pearce, last year’s winners of “Murph”, had average finishes of 34th and 27th in these 2 cardio-centric workouts. Muscular stamina is a different type of test, and it wasn’t as well represented at The Games this year. But did this shortage of endurance testing have an impact on the leaderboard?

I started by categorizing each of the workouts into 1 of 3 domains based on which one was most dominant:

Power:                    Endurance:         Efficiency:
1 RM Snatch               Run-Swim-Run       Amanda .45
Assault Banger            Cyclocross         O-course
Strongman’s Fear          Heavy 17.5         Triple-G Chipper
Muscle-Up Clean Ladder    Madison Triplet    2223 Intervals
Fibonacci Final

I took these categories and created theoretical leaderboards based on the results of the events and the scoring system implemented by The Games. I then regressed the overall point totals on each domain’s point totals and assessed the R-squared values to see how well each domain explained the final results. If our theory holds true, we should see that power and efficiency do a better job of explaining the results (higher R-squared values) than endurance does.

Here are those values for the men:
Power: 0.6531
Endurance: 0.6859
Efficiency: 0.62

Very similar values across the board, suggesting that each domain had relatively equal predictive ability for the overall result. Part of the reason for this is the lack of variability at the top of the leaderboard. Fraser took the most points every single time…surprise, surprise.

And here they are for the women:
Power: 0.7634
Endurance: 0.6794
Efficiency: 0.8273

This is more like it! The highest rated domain of The Games did the best job of explaining the results, followed by power and endurance, respectively. This serves to substantiate the claim that endurance was underrepresented at The Games this year.
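
As a rough illustration of the R-squared values above: for a simple linear regression with one predictor, R-squared equals the squared correlation between the two variables. The sketch below assumes that setup; the function name `r_squared` and the point totals are made up for the example and are not the actual leaderboard data.

```python
def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (with intercept).

    With a single predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Hypothetical point totals for four athletes (NOT real Games results).
domain_points = [380, 300, 250, 210]      # points from one domain's workouts
overall_points = [1100, 950, 900, 700]    # overall points

print(round(r_squared(domain_points, overall_points), 4))
```

A value near 1 would mean that a domain’s leaderboard lines up almost perfectly with the overall standings.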

I didn’t want to just stop here, though. If The Games as a whole was not a perfect balance of fitness, perhaps there exists a subset of The Games that was…

I iterated through all of the possible combinations of the 13 workouts in search of the collection that minimized the variance amongst the intensities of the 3 domains. The following set of workouts was chosen:

  • Run-Swim-Run
  • Assault Banger
  • Strongman’s Fear
  • Muscle-up Clean Ladder
  • Madison Triplet

This is the distribution of the responses when only considering these workouts:

Power Average: 7.263
Endurance Average: 7.267
Efficiency Average: 7.259
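
The subset search described above can be sketched with `itertools.combinations`. This is a sketch under assumptions rather than the exact procedure used: it takes one mean rating per workout per domain, scores each subset by the variance of its three domain averages, and uses a hypothetical `min_size` parameter to rule out trivially small subsets. The ratings shown are invented.

```python
from itertools import combinations
from statistics import mean, pvariance

def most_balanced_subset(ratings, min_size=2):
    """Return (variance, subset) for the most balanced set of workouts.

    `ratings` maps workout name -> (power, endurance, efficiency) ratings.
    `min_size` is an assumed lower bound on how many workouts to keep.
    """
    names = sorted(ratings)
    best = None
    for size in range(min_size, len(names) + 1):
        for subset in combinations(names, size):
            # Average each domain's rating over the workouts in this subset.
            domain_means = [mean([ratings[w][i] for w in subset])
                            for i in range(3)]
            spread = pvariance(domain_means)
            if best is None or spread < best[0]:
                best = (spread, subset)
    return best

# Invented ratings for three workouts (NOT the survey results).
toy_ratings = {
    "A": (9, 3, 6),
    "B": (3, 9, 6),
    "C": (8, 8, 2),
}

print(most_balanced_subset(toy_ratings))
```

With 13 workouts there are only 2^13 = 8,192 subsets, so checking every one of them is cheap.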

Calculating the leaderboard based on these 5 workouts produces some interesting results:

Men:                  Women:
1. Fikowski           1. Toomey
2. Fraser             2. Reed-Beuerlein
3. Garard             3. Webb
4. Gudmundsson        4. Sigmundsdottir
5. Mayer              5. Thorisdottir

First and foremost, the big standout is that Fikowski and Fraser swap places. Given that Fraser received 1132 of a possible 1300 points during the weekend, I wouldn’t have thought there was any collection of 5 workouts where he comes up 2nd. Let’s take a closer look at where the difference is coming from…

The differential between the 2 was 10 points (Fikowski – 404, Fraser – 394). Fikowski blew out Fraser on “The Assault Banger”, Fraser blew out Fikowski on “The Muscle-Up Clean Ladder”, and they were both within a spot of each other on “Strongman’s Fear” and “The Madison Triplet”. This just leaves the purest endurance event of The Games, “The Run-Swim-Run”, where Fikowski took 1st and Fraser took 7th, a differential of 28 points. When endurance plays a stronger role, we see that Fraser is not as impenetrable as he seems. So do I believe Fikowski is the true “Fittest Man on Earth”? Not exactly…and I’ll dive into why in the “Limitations of Analysis” section.

On the women’s side, the big standout for me is Reed-Beuerlein’s jump from 7th to 2nd. While she lost 38 points to both Webb and Thorisdottir on the power-focused workouts, her top 5 finishes in both “The Run-Swim-Run” and “Madison Triplet” gained her 40 points and 76 points on the other 2 athletes, respectively. Much like Fikowski’s jump from 2nd to 1st, the extra focus on endurance plays into Reed-Beuerlein’s strength and drastically changes her finishing position.

Limitations of Analysis
The first limitation of this analysis is that it treats all of the workouts as independent events. In reality, 13 workouts over a span of 4 days is a great test of an athlete’s ability to recover and continue to perform at a high level. This is, in essence, muscular stamina. Therefore, the very nature of The Games is a test of an athlete’s ability to endure, and every workout following the first should be biased higher and higher towards being an endurance test. This would obviously have the effect of producing more balanced domains and a more well-rounded test of fitness overall. This is where the Fikowski-Fraser discussion comes back into play. The more The Games as a whole represents a balanced test of fitness, the better an athlete’s overall performance is as an indicator of their ability compared to the subset I generated.

The second limitation I want to dive into is the fact that this analysis excludes both The Open and Regionals. While each stage of the competition should address the 3 domains, The Games’ website describes the competition as a “Three Stage Journey”, and it should, therefore, be treated as such. The Open and Regionals weren’t included because the leaderboard and competition settings weren’t consistent and because I didn’t trust I’d get the responses I was hoping for by asking coaches to consider 25 workouts.

Last, but perhaps most important, are my assumptions about how the 10 domains could be boiled down to 3. My goal was to isolate specializations while staying as broad as possible. It’s easy to comprehend how powerful athletes will dominate heavy, explosive events, and athletes with good stamina will succeed in longer events, but it’s more difficult to define the quality that allows an athlete to perform 15 muscle-ups, or walk on their hands, or impeccably overhead squat. I called it “Efficiency,” but “Technique” and “Athleticism” could also have been viable descriptors. The connotations of these words may impact people’s opinions and affect the overall results of the survey.

Hope you enjoyed this project as much as I did. If you have any questions or ideas about how to further use the data, please feel free to comment or send me an email!

* F-value = 15.93, P-value = .000000518

** Power-Endurance P-value = .00214; Power-Efficiency P-value = .0825; Endurance-Efficiency P-value = .0000003