Cracking the Whip of the Long Tail

So, let’s take a sample of 100 million individuals, and assume that they’re all normally distributed along… whatever imaginary one-dimensional variable you want to… imagine.

Some definitions of the variables and functions I’ll be using:

Mean: μ

Standard Deviation: σ

Cumulative Distribution Function: F \left( x \right) = \Phi \left( \frac{x - \mu}{\sigma} \right) = \frac{1}{2} \left[ 1 + \textrm{erf} \left( \frac{x - \mu}{\sigma \sqrt 2} \right) \right]

Thus, we’d expect 68,268,949 (rounding down whenever I give an integer derived from a percentage times a population) within 1 standard deviation (plus or minus), 27,181,024 to be between 1 and 2 standard deviations, 4,280,046 to be between 2 and 3, 263,645 between 3 and 4, and 6,276 between 4 and 5.
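
For anyone who wants to check those band counts, here is a minimal sketch using only Python's standard library. The names `phi`, `band_fraction`, and `POPULATION` are mine, not part of the original calculation, and the counts are rounded down as described above.

```python
import math

POPULATION = 100_000_000  # sample size used in the text


def phi(z: float) -> float:
    """Standard normal CDF, written with erf as in the definition above."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def band_fraction(lo: float, hi: float) -> float:
    """Fraction of a normal population with lo < |z| <= hi (both tails)."""
    return 2.0 * (phi(hi) - phi(lo))


# Count the individuals in each one-sigma-wide band, rounding down.
band_counts = [
    math.floor(POPULATION * band_fraction(k, k + 1)) for k in range(5)
]

for k, count in enumerate(band_counts):
    print(f"between {k} and {k + 1} sigma: {count:,}")
```

Working in units of σ (that is, with z = (x − μ)/σ) means the actual mean and standard deviation never need to be specified.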

This leaves roughly 57 individuals farther than 5 standard deviations away from the mean or, looking at just the top end, 28 individuals more than 5 standard deviations above the mean. (Why 5? Because first, 5 is half of 10, and second, 28 individuals is a nice manageable number. Remember, this entire exercise is a very crude, back-of-the-envelope thought experiment.)

Let’s say that these 28 plus-five-plus sigma individuals are our ‘world class’ performers.

Now, let’s suppose that there’s a population of, why not, 5,000,000 individuals, the mean of which is just 5% of a standard deviation higher than that of the base population. Same standard deviation, so that the number of individuals from this population that exceed the +5+ threshold (as determined by the population as a whole [and yes, I realize a more rigorous model would ‘break out’ that overall 100 million into the various populations being supposed, i.e. that either adding in this 5 million or saying that 5 of those 100 millions is now like this would change the statistics of the overall / original 100 million but again this is an incredibly rough lunch-hour calculation here]) would be given by:

1 - F' \left( \mu +5 \sigma \right) = 1 - \Phi \left( \frac{5 \sigma + \mu -  \left( \mu + 0.05 \sigma \right)}{\sigma} \right) = 1 - \Phi \left( \frac{5 \sigma - 0.05 \sigma}{\sigma} \right)= 1 - \Phi \left( 4.95 \right)

– which, out of 5 million, is almost 2. About 1.86, so, yeah, that population would be a little bit over-represented among the ‘world class’ ranks. (I.e., instead of an expected 1.43, you’d see 1.86. Which, even though it’s a difference of less than a person, is almost a 30% increase over what you’d see were it not for that +5% of a standard deviation in the mean for the smaller population.)

What about a larger-than-normal standard deviation? Again, let’s say that it’s a +5% difference, i.e., that the 5 million strong population has a standard deviation 5% larger than the 100 million population. In this case, the number above the +5+ would be given by:

1 - F' \left( \mu +5 \sigma \right) = 1 - \Phi \left( \frac{5 \sigma}{1.05 \sigma} \right) = 1 - \Phi \left( \frac{5}{1.05} \right) \approx 1 - \Phi \left( 4.76 \right)

Out of a population again of 5 million people, that gives 4, almost 5 (~4.79) individuals at the ‘world class’ level. In other words, there’s now a 234% over-representation of the smaller population at the uppermost levels. (Note: ‘over-representation’ is used here w.r.t. each individual’s value for the hypothetical variable being IID.)
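
Both cases (a mean 5% of a standard deviation higher, and a standard deviation 5% larger) can be checked with a short sketch. The helper names `upper_tail` and `expected_above` are hypothetical, and I have used `math.erfc` rather than one minus the CDF because it keeps more precision this far out in the tail.

```python
import math

GROUP_SIZE = 5_000_000  # size of the smaller population in the text
THRESHOLD = 5.0         # 'world class' cutoff, in base-population sigmas


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def expected_above(mean_shift: float = 0.0, sd_ratio: float = 1.0) -> float:
    """Expected number of group members above THRESHOLD.

    mean_shift: group mean minus base mean, in base sigmas.
    sd_ratio:   group sigma divided by base sigma.
    """
    z = (THRESHOLD - mean_shift) / sd_ratio
    return GROUP_SIZE * upper_tail(z)


baseline = expected_above()                # same mean, same sigma
shifted = expected_above(mean_shift=0.05)  # mean higher by 0.05 sigma
wider = expected_above(sd_ratio=1.05)      # sigma larger by 5%

for label, value in [
    ("baseline", baseline),
    ("mean +0.05 sigma", shifted),
    ("sigma x 1.05", wider),
]:
    print(f"{label}: {value:.2f} expected, {value / baseline:.2f}x baseline")
```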

What does this mean? Practically, nothing – this is a thought experiment that’s a simplified-ad-absurdum version of an already unrealistic one-variable ‘model’ of something that’s known to be incredibly complex and interconnected (i.e., anything to do with people). (Sure, the idea’s applicable to anything with a normal distribution, but an idle passage related to people read while shelving books is what prompted this thought.)

But, since it’s gotta mean something, ’cuz otherwise I wasted my lunch hour, here are a couple of takeaways:

  1. Small differences in average (mean) values for a group compared to the larger population can be magnified at the upper and lower extremes (tails) in terms of over/under-representation of members of that group.
  2. Small differences in variance (standard deviation; I know that variance has a technical meaning but I’m looking for the closest plain-English equivalent to ‘standard deviation’) for a group (… compared to the larger population) can be greatly magnified at the upper and lower extremes (…).

 

The Buddy Bear Crawl; or, Finally a Practical Use for That PHYS-151 Problem!

What is the buddy bear crawl? This:

[Image: Buddy Bear Crawl IRL]
Or rather, it’s like this, only the person underneath has their rucksack switched to the front, and you have a ruck on, too, so they’re probably grabbing you around the neck.

A wild “classic problem from introductory physics” appears! (Seriously, this was one hell of an AHA! moment for me.)

(Specifically, Young & Freedman 5-38. [Question #2 in the linked pset.])

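So, for dragging something or someone along a flat surface with a pull directed at an angle θ above the horizontal (with m the mass being dragged, g the gravitational acceleration, and μ_k the coefficient of kinetic friction), you have: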

F = \frac{\mu _{k}mg }{\cos \theta +\mu _{k} \sin \theta }
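
In case the step behind that formula is useful: with N the normal force from the ground (my symbol, not one from the original problem), balancing the vertical and horizontal forces on the thing being dragged gives

N + F \sin \theta = mg, \qquad F \cos \theta = \mu _{k} N

and substituting the first into the second:

F \cos \theta = \mu _{k} \left( mg - F \sin \theta \right) \rightarrow F = \frac{\mu _{k}mg}{\cos \theta +\mu _{k} \sin \theta }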

Note that the above is for a constant velocity – to minimize the force required, set the derivative to 0:

\frac{\mathrm{d} }{\mathrm{d} \theta }\left ( \frac{\mu _{k}mg}{\cos \theta +\mu _{k}\sin \theta } \right ) = 0\rightarrow \frac{\mathrm{d} }{\mathrm{d} \theta }\left ( \cos \theta +\mu _{k}\sin \theta \right )^{-1} = 0

\frac{\sin \theta -\mu _{k}\cos \theta }{\left ( \cos \theta + \mu _{k}\sin \theta\right )^{2}}=0\rightarrow \sin \theta = \mu _{k} \cos \theta

\tan \theta = \mu _{k}\rightarrow \theta = \arctan \mu _{k}

What’s the coefficient of kinetic friction for dragging a person + rucksack along the ground? Based on the Journal of Strength and Conditioning Research, Volume 27 / Issue 5, May 2013, p. 1175-8, I’ll go with 0.33, giving:

\theta = \arctan 0.33 \approx 18^{\circ }
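
As a sanity check on the calculus, here is a small sketch that compares the analytic answer with a brute-force scan over angles. The function name `pull_force_per_weight` and the 0.01-degree grid are my own choices, and the force is reported as a fraction of the dragged weight mg so that no mass needs to be assumed.

```python
import math

MU_K = 0.33  # coefficient of kinetic friction used in the text


def pull_force_per_weight(theta_rad: float, mu: float = MU_K) -> float:
    """F / (m g) for a constant-velocity pull at theta above the horizontal."""
    return mu / (math.cos(theta_rad) + mu * math.sin(theta_rad))


# Analytic optimum from tan(theta) = mu_k.
theta_opt = math.atan(MU_K)

# Brute-force scan from 0 to 89.99 degrees in 0.01-degree steps.
angles = [math.radians(i / 100.0) for i in range(9000)]
theta_scan = min(angles, key=pull_force_per_weight)

print(f"analytic optimum:  {math.degrees(theta_opt):.2f} degrees")
print(f"grid-scan optimum: {math.degrees(theta_scan):.2f} degrees")
print(f"F/mg pulling flat:       {pull_force_per_weight(0.0):.4f}")
print(f"F/mg at the optimum:     {pull_force_per_weight(theta_opt):.4f}")
```

The saving over a flat pull is modest for a friction coefficient this small, which fits the back-of-the-envelope spirit of the whole thing.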

So, the next time you’ve got to pull someone wearing a rucksack (filled with bricks) while you’re wearing a rucksack (filled with bricks), while crawling like a bear the entire length of the soccer pitch in a public park (at three in the morning, after hydro burpees), remember – eighteen (18) degrees above the horizontal.