So, let’s take a sample of 100 million individuals, and assume that they’re all normally distributed along… whatever imaginary one-dimensional variable you want to… imagine.
Some definitions of the variables and functions I’ll be using:
Mean: μ
Standard Deviation: σ
Cumulative Distribution Function: $F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]$
Thus, we’d expect 68,268,949 (rounding down whenever I give an integer derived from a percentage times a population) within 1 standard deviation (plus or minus), 27,181,024 to be between 1 and 2 standard deviations, 4,280,046 to be between 2 and 3, 263,645 between 3 and 4, and 6,276 between 4 and 5.
This leaves roughly 57 individuals farther than 5 standard deviations away from the mean or, looking at just the top end, 28 individuals more than 5 standard deviations above the mean. (Why 5? Because first, 5 is half of 10, and second, 28 individuals is a nice manageable number. Remember, this entire exercise is a very crude, back-of-the-envelope thought experiment.)
Let’s say that these 28 plus-five-plus sigma individuals are our ‘world class’ performers.
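If you'd rather not trust my lunch-hour button-pushing, here's a minimal sketch in Python that reproduces the band counts, the ~57 beyond ±5σ, and the 28 'world class' individuals using only the standard library's `math.erf` and `math.erfc`. The helper names `share_within` and `share_above` are my own, hypothetical ones, and everything is in units of σ from the mean.

```python
import math

POPULATION = 100_000_000  # hypothetical base population from the text

def share_within(k: float) -> float:
    """P(|Z| < k): share of a normal population within k sigma of the mean.

    Equals F(mu + k*sigma) - F(mu - k*sigma) for a normal CDF F with mean mu, sd sigma.
    """
    return math.erf(k / math.sqrt(2))

def share_above(k: float) -> float:
    """P(Z > k): share more than k sigma above the mean (erfc stays accurate in the tail)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

# Head counts in each band [k-1, k) sigma from the mean, rounded down as in the text.
bands = [math.floor((share_within(k) - share_within(k - 1)) * POPULATION) for k in range(1, 6)]
beyond_5_either_way = 2 * share_above(5) * POPULATION  # both tails, not rounded
world_class = math.floor(share_above(5) * POPULATION)  # top tail only, rounded down

for k, n in enumerate(bands, start=1):
    print(f"{k - 1} to {k} sigma: {n:,}")
print(f"beyond 5 sigma (both tails): {beyond_5_either_way:.1f}")
print(f"above +5 sigma: {world_class}")
```

Using `erfc` for the tail rather than `1 - cdf` is a deliberate choice: out at 5σ, subtracting two numbers that are both very close to 1 throws away precision you don't need to lose.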
Now, let’s suppose that there’s a population of, why not, 5,000,000 individuals whose mean is just 5% of a standard deviation higher than that of the base population, with the same standard deviation. (And yes, I realize a more rigorous model would ‘break out’ that overall 100 million into the various populations being supposed, i.e., that either adding in these 5 million or saying that 5 million of those 100 million are now like this would change the statistics of the overall/original 100 million. But again, this is an incredibly rough lunch-hour calculation.) The number of individuals from this population who exceed the +5+ threshold, as determined by the population as a whole, would then be given by:

$$5{,}000{,}000 \times \frac{1}{2}\left[1 - \operatorname{erf}\left(\frac{(\mu + 5\sigma) - (\mu + 0.05\sigma)}{\sigma\sqrt{2}}\right)\right] = 5{,}000{,}000 \times \frac{1}{2}\left[1 - \operatorname{erf}\left(\frac{4.95}{\sqrt{2}}\right)\right]$$
– which, out of 5 million, is almost 2 (about 1.86), so, yeah, that population would be a little over-represented among the ‘world class’ ranks. (I.e., instead of an expected 1.43, you’d see 1.86. Even though that’s a difference of less than one person, it’s almost a 30% increase over what you’d see without that extra 0.05σ in the smaller population’s mean.)
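The same arithmetic as a quick sketch, assuming (as above) that the only difference is a mean shifted up by 0.05σ. The constant names and the `share_above` helper are hypothetical, mine rather than anything standard. The trick is that a cutoff 5 base-σ above the base mean sits only 4.95σ above the shifted mean.

```python
import math

SUB_POPULATION = 5_000_000
THRESHOLD = 5.0    # 'world class' cutoff: base mean + 5 base sigmas
MEAN_SHIFT = 0.05  # sub-population mean is 0.05 sigma higher; same sigma

def share_above(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Without the shift, the sub-population would just be a slice of the base population.
expected = SUB_POPULATION * share_above(THRESHOLD)
# With the shift, the cutoff is only THRESHOLD - MEAN_SHIFT sigmas above its own mean.
shifted = SUB_POPULATION * share_above(THRESHOLD - MEAN_SHIFT)
increase = shifted / expected - 1

print(f"expected {expected:.2f}, with shift {shifted:.2f}, increase {increase:.1%}")
```

This should land on the 1.43 and roughly 1.86 quoted above, i.e., an increase just shy of 30%.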
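What about a larger-than-usual standard deviation? Again, let’s say that it’s a +5% difference, i.e., that the 5-million-strong population has the same mean but a standard deviation 5% larger than that of the 100 million population. In this case, the number above the +5+ threshold would be given by:

$$5{,}000{,}000 \times \frac{1}{2}\left[1 - \operatorname{erf}\left(\frac{(\mu + 5\sigma) - \mu}{1.05\,\sigma\sqrt{2}}\right)\right] = 5{,}000{,}000 \times \frac{1}{2}\left[1 - \operatorname{erf}\left(\frac{5/1.05}{\sqrt{2}}\right)\right]$$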
Out of a population, again, of 5 million people, that gives 4, almost 5 (~4.79), individuals at the ‘world class’ level. In other words, the smaller population is now over-represented at the uppermost levels by 234%, i.e., it shows up about 3.3 times as often as the expected 1.43. (Note: ‘over-representation’ here is measured against what you’d expect if every individual’s value for the hypothetical variable were IID, i.e., drawn independently from one and the same base distribution.)
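And the wider-σ case, again as a rough sketch with hypothetical names, assuming the same mean and a σ 5% larger. Here the cutoff, still 5 base-σ above the mean, is only 5/1.05 ≈ 4.76 of the sub-population’s own σ away.

```python
import math

SUB_POPULATION = 5_000_000
THRESHOLD = 5.0  # cutoff in base-population sigmas above the (shared) mean
SD_RATIO = 1.05  # sub-population sigma is 5% larger; same mean

def share_above(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

expected = SUB_POPULATION * share_above(THRESHOLD)
# Measured in the sub-population's own (wider) sigmas, the cutoff is closer to the mean.
wider = SUB_POPULATION * share_above(THRESHOLD / SD_RATIO)
excess = wider / expected - 1  # over-representation relative to the IID expectation

print(f"expected {expected:.2f}, with wider sigma {wider:.2f}, over-representation {excess:.1%}")
```

Note how much bigger this effect is than the mean shift: stretching σ moves the cutoff about 0.24σ closer, versus 0.05σ for the shifted mean.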
What does this mean? Practically, nothing – this is a thought experiment, a simplified-ad-absurdum version of an already unrealistic one-variable ‘model’ of something that’s known to be incredibly complex and interconnected (i.e., anything to do with people. Sure, the idea applies to anything with a normal distribution, but an idle passage about people, read while shelving books, is what prompted this thought.)
But, since it’s gotta mean something, ‘cuz otherwise I wasted my lunch hour, here are a couple of takeaways:
- Small differences in average (mean) values for a group compared to the larger population can be magnified at the upper and lower extremes (tails) in terms of over/under-representation of members of that group.
- Small differences in variance (standard deviation; I know that variance has a technical meaning, but I’m looking for the closest plain-English equivalent to ‘standard deviation’) for a group (… compared to the larger population) can be greatly magnified at the upper and lower extremes (…).
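To see the ‘magnified at the extremes’ part directly, here’s a small sketch (hypothetical names again, same +0.05σ mean shift and ×1.05 σ as above) that prints the over-representation factor at cutoffs from +1σ to +5σ. It only looks at the upper tail; by symmetry the wider-σ effect is identical in the lower tail, while the upward mean shift turns into under-representation there.

```python
import math

MEAN_SHIFT = 0.05  # sub-population mean offset, in base-population sigmas
SD_RATIO = 1.05    # sub-population sigma relative to the base sigma

def share_above(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def representation(cutoff: float) -> tuple[float, float]:
    """Over-representation factors above base mean + cutoff * sigma.

    Returns (factor with shifted mean, factor with wider sigma); 1.0 means the
    sub-population shows up exactly in proportion to its size.
    """
    base = share_above(cutoff)
    return (share_above(cutoff - MEAN_SHIFT) / base,
            share_above(cutoff / SD_RATIO) / base)

cutoffs = [1, 2, 3, 4, 5]
mean_factors, sd_factors = zip(*(representation(k) for k in cutoffs))

print("cutoff    mean +0.05 sigma   sigma x1.05")
for k, m, s in zip(cutoffs, mean_factors, sd_factors):
    print(f"+{k} sigma   {m:16.3f}   {s:11.3f}")
```

Both columns grow as the cutoff moves out, and the σ column grows much faster, which is the whole point of the two takeaways.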