## Thursday, November 17, 2011

### Statistical Fallacy #176: Ignoring Selection Effects

I stumbled on a post at a credit-related blog. It opens with this bombastic first line:
"The average consumer is saddled with \$29,985 in student loan debt..."
Wow! That's a lot of debt! It's true that the U.S. population carries a giant amount of student loan debt -- even more than it carries in credit card debt. Last year, I wrote about the subject. But the figure above is pretty high. That statistic is drawn from "262,887 CreditKarma.com user scores." Sounds pretty robust. But some simple math reveals there's more to the story.

Facts:
• Total student loan debt in the U.S. is about \$1 trillion (~\$1,000,000,000,000).
• The U.S. population is 308,745,538. Of that, 24% are under 18, leaving 234,646,609 adult consumers.

Do some division (total debt over adult consumers), and you'll find that the average adult consumer has \$4,261.73 in student loan debt. That's more than \$25,000 less than the Credit Karma estimate!
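For anyone who wants to check the division, here is a minimal sketch in Python. The inputs are the figures quoted above; the variable names are my own, and the 24% under-18 share is a rounded number, so treat the result as a back-of-the-envelope estimate rather than an official statistic.

```python
# Figures quoted in the post. The under-18 share is rounded to 24%.
TOTAL_STUDENT_DEBT = 1_000_000_000_000  # about $1 trillion
US_POPULATION = 308_745_538
PERCENT_UNDER_18 = 24
CREDIT_KARMA_AVERAGE = 29_985           # average reported by the blog post

# Adults are everyone who is not under 18.
adults = round(US_POPULATION * (100 - PERCENT_UNDER_18) / 100)

# Average student loan debt if the total were spread over all adults.
debt_per_adult = TOTAL_STUDENT_DEBT / adults

# How far the site's average sits above the population-wide average.
gap = CREDIT_KARMA_AVERAGE - debt_per_adult

print(f"Adult consumers:             {adults:,}")
print(f"Student loan debt per adult: ${debt_per_adult:,.2f}")
print(f"Gap to Credit Karma figure:  ${gap:,.2f}")
```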

What went wrong? My guess: selection effects. Members of a site specializing in credit advice are not a random sample of the population. People who join are probably concerned about their credit... and people who are concerned about their credit probably have a lot of debt.
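To see how self-selection alone can inflate an average, here is a toy simulation. Everything in it is made up for illustration: the debt distribution, the `joins_site` rule, and the probabilities are assumptions of mine, not data from Credit Karma or anywhere else. The only point is the direction of the bias.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic population (assumption): 80% owe nothing, the rest owe an
# exponentially distributed amount with a mean of $20,000.
population = [
    0.0 if random.random() < 0.8 else random.expovariate(1 / 20_000)
    for _ in range(100_000)
]

def joins_site(debt):
    """Hypothetical sign-up rule: the more debt, the likelier to join."""
    probability = min(0.5, 0.01 + debt / 100_000)
    return random.random() < probability

# The "site members" are whoever self-selects in.
members = [debt for debt in population if joins_site(debt)]

population_mean = sum(population) / len(population)
member_mean = sum(members) / len(members)

print(f"Population average debt:  ${population_mean:,.0f}")
print(f"Site member average debt: ${member_mean:,.0f}")
```

With a sign-up rule like this, the member average comes out several times the population average, even though nobody's debt changed. Only who got counted did.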

Nothing personal against the writers for that site, as it would be an easy mistake to make (and they were very nice, even in response to my snarky comment pointing this out). But still, they should have been more careful. A quick test, multiplying their estimate of average debt by the number of adult consumers, implies that the U.S. would have a total of \$7,035,878,570,865 in student loans outstanding, about seven times the real figure. If it were true, that would be about 11% of the entire world GDP owed by American students!
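The same quick test takes a few lines of Python. Again, this is just a sketch using the numbers already quoted in this post, with variable names of my own choosing:

```python
# Figures quoted in the post.
CREDIT_KARMA_AVERAGE = 29_985           # claimed average debt per consumer
ADULT_CONSUMERS = 234_646_609           # U.S. adults, from the facts above
ACTUAL_TOTAL_DEBT = 1_000_000_000_000   # about $1 trillion

# If the claimed average held for every adult, this would be the total.
implied_total = CREDIT_KARMA_AVERAGE * ADULT_CONSUMERS

# How many times larger the implied total is than the real one.
ratio = implied_total / ACTUAL_TOTAL_DEBT

print(f"Implied total student debt: ${implied_total:,}")
print(f"Ratio to actual total:      {ratio:.2f}x")
```

Scaling a sample average back up to a population total like this is a cheap sanity check: if the result is wildly off from a known total, the sample probably doesn't represent the population.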

The lesson: look out for non-random sampling due to self-selection, or your numbers will be nonsense.