I had a wonderful birthday last week. I celebrated at Petco Park, watching my San Diego Padres eliminate the Los Angeles Dodgers. The whole thing was wild and fantastic.
Not surprisingly, Dodger fans started complaining about alleged unfairness while the players were still celebrating on the field, especially as it relates to the new playoff format — which added another wildcard team (the Philadelphia Phillies, not the Padres).
I could offer multiple counters to this complaint (for example, any “fairness” argument ought to consider the lack of equity among teams and their payrolls). However, unless you want to eliminate playoffs altogether, we have much too small a sample size even to evaluate how fair or unfair the new playoff system is. Moreover, if the Houston Astros, clearly the best American League team this season, win the World Series, the unfairness argument doesn’t really work.
That said, it won’t stop Dodgers fans from whining.
Our general difficulty in dealing with statistical questions like sample size is the subject of this week’s TBL.
If you like The Better Letter, please subscribe, share it, and forward it widely.
Thanks for reading.
A Collective Statistical Illiteracy
More than a century ago, H.G. Wells recognized that statistical thinking would become as necessary for good citizenship as being literate. And he was right.
Today, nearly everyone (in the developed world, at least) learns to read and write. Statistical thinking – how to understand information about risks and uncertainties in our complex world? Not nearly so common.
We suffer from a collective statistical illiteracy.
We are all prone to this innumeracy, which is “the mathematical counterpart of illiteracy,” according to Douglas Hofstadter. It describes “a person’s inability to make sense of the numbers that run their lives.” Although Hofstadter coined the term, mathematician John Allen Paulos popularized the concept with his book, Innumeracy: Mathematical Illiteracy and Its Consequences. While illiteracy strikes mostly the uneducated, we are all prone to innumeracy, no matter how high our status or education.
We all tend toward innumeracy generally, and we’re especially bad at probability.
If a weather forecaster says that there is an 80 percent chance of rain today and it remains sunny, instead of waiting to see if it rains 80 out of 100 times when his or her forecast called for an 80 percent chance of rain (or even eight in ten), we race to conclude — perhaps based upon that single instance — that the forecaster isn’t any good.
To do math, neither maturity nor knowledge of human nature and experience is required. All that is needed is the ability to perceive patterns, logical rules, and linkages. But because of the enormous sets of random variables involved in real life, patterns, logical rules, and linkages alone do not solve many actual puzzles. Correlation does not imply causation. Information may be cheap, but meaning is expensive and elusive. Insight is priceless.
In the early 1980s, the A&W restaurant chain introduced a new “Third Pounder” hamburger to challenge the Quarter Pounder from McDonald’s. Taste-tests showed that consumers thought it tasted better and it was cheaper, but it wasn’t selling. The company undertook some focus group research to figure out what the problem was. A&W discovered that consumers bought the Quarter Pounder over the Third Pounder because they wanted more meat. Because the “3” in ⅓ was smaller than the “4” in ¼, “customers believed they were being overcharged.”
We struggle with orders of magnitude, too. For example, it takes only about eleven-and-a-half days for a million seconds to tick away, whereas it takes almost thirty-two years for a billion seconds to pass. Few of us see the jump from a million to a billion as nearly so big.
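If you want to check that arithmetic yourself, here is a minimal Python sketch. The names are mine, and the 365.25-day year is an assumption I'm making to average in leap years.

```python
# Convert a million and a billion seconds into more familiar units.
SECONDS_PER_DAY = 60 * 60 * 24   # 86,400
DAYS_PER_YEAR = 365.25           # assumption: average year length, leap years included

million_in_days = 1_000_000 / SECONDS_PER_DAY
billion_in_years = 1_000_000_000 / (SECONDS_PER_DAY * DAYS_PER_YEAR)

print(f"A million seconds is about {million_in_days:.1f} days")
print(f"A billion seconds is about {billion_in_years:.1f} years")
```

That works out to about 11.6 days versus about 31.7 years: same three extra zeros, wildly different feel.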
We’re also prone to the gambler’s fallacy – we tend to think that randomness is somehow self-correcting (the idea that if a fair coin is fairly tossed nine times in a row and it comes up heads each time, tails is more likely on the tenth toss). However, as the financial services commercials take pains to point out (as the SEC requires), past performance is not indicative of future results. On the tenth toss, the probability remains 50 percent.
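One way to convince yourself is to count rather than argue. The sketch below (variable names are mine, and it assumes a fair coin with independent tosses) lists every possible sequence of ten tosses, keeps only those that open with nine straight heads, and asks how many of them end in tails.

```python
from fractions import Fraction
from itertools import product

# Every equally likely sequence of ten fair tosses ("H" or "T").
all_sequences = list(product("HT", repeat=10))

# Keep only the sequences that open with nine heads in a row.
after_nine_heads = [seq for seq in all_sequences if seq[:9] == ("H",) * 9]

# Of those, what share end in tails on the tenth toss?
tails_on_tenth = sum(1 for seq in after_nine_heads if seq[9] == "T")
p_tails = Fraction(tails_on_tenth, len(after_nine_heads))

print(p_tails)  # prints 1/2
```

Only two of the 1,024 sequences start with nine heads, and exactly one of the two ends in tails. The streak is rare, but once you are in it, the coin has no memory.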
Lotteries are, by definition, long-shots. Alexander Hamilton, however, recognized that most of us “would prefer a small chance of winning a great deal to a great chance of winning little.” As more people play, pots increase, and vice versa. A bad bet gets worse. So, the worse the odds of winning become, the more people want to play. For most people, the difference between one-in-three-million odds and one-in-three-hundred-million odds simply doesn’t matter.
With Powerball, the odds of winning its Grand Prize are one in 292,201,338. That means the expected value of a $2 ticket is just $1.35. It’s a pretty stupid play, but about half of us do it anyway (it’s a $91 billion endeavor). Remarkably, many have somehow convinced themselves it’s a good deal, and “you have to be in it to win it.”
To make matters worse, every state lottery is regressive, meaning that it takes a disproportionate toll on low-income citizens. According to Bankrate, players making more than fifty thousand dollars per year spend, on average, one percent of their annual income on lottery tickets; those making less than thirty thousand dollars spend thirteen percent.
Americans spend more on lottery tickets every year than on cigarettes, coffee, or smartphones, and they spend more on lottery tickets annually than on video streaming services, concert tickets, books, and movie tickets combined.
When we want something to be true, it’s easy for the human mind to create reasons why it is true. Or can be true. Or might be true. And it’s only a couple of bucks….
Most of us tend to filter out the bad and the failed to focus on the good. Casinos encourage this tendency by making sure that every slot machine win, no matter how small, comes with music and flashing lights. It isn’t hard to conclude that everyone’s winning. Losses and failures remain silent. So, we play even though it’s a losing proposition.
We also have trouble with scale.
At first glance, it looks like Bernie might be onto something; $25 billion in a quarter is a lot of money. But it’s not really a big deal when the scale of the businesses and the market sector are considered.
Back in 2008, then Exxon CEO Rex Tillerson noted that the company spends a billion dollars a day to run the business. It’s more now, and there are many more oil companies. Moreover, oil companies, over the last ten years, have been among the least profitable businesses overall and are far from the most profitable now. In terms of market returns, the energy sector is the worst performer within the S&P 500 over the decade ended September 30.[1]
From a mathematical perspective, at least, Bernie is full of it.
The conjunction fallacy is another common problem whereby we see the conjunction of two events as being more likely than either of the events individually. Consider the following typical example. A group of people was asked if it was more probable that Linda was a bank teller or a bank teller active in the feminist movement from the following data points: “Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice and participated in anti-nuclear demonstrations.” Fully 85 percent of respondents chose the latter, even though the probability of two things happening together can never be greater than that of the events occurring individually.
Now suppose that Company X has a workforce that is only 20 percent female. The base-rate fallacy would suggest that the company is discriminatory. But further analysis is required. If the applicant pool was only 10 percent female, Company X might have an exemplary record of hiring women.[2]
Let’s try a problem. We’ll start with the following assumptions: (1) The probability that a woman has breast cancer is one percent; (2) If a woman has breast cancer, the probability that she tests positive for it is 90 percent; and (3) If a woman does not have breast cancer, the probability that she nevertheless tests positive is nine percent.
If you know nothing else about them, how many women who test positive for breast cancer have the disease? Choose the best answer.
(1) nine in 10
(2) eight in 10
(3) one in 10
(4) one in 100
The best answer is (3).
That 90 percent of women with breast cancer get a positive result from a mammogram doesn’t mean that 90 percent of women with positive results have breast cancer. The high false-positive rate, combined with the disease’s overall prevalence of one percent, means that roughly nine out of 10 women with a worrying mammogram don’t have breast cancer.
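Here is the arithmetic as a minimal Python sketch, using only the three assumptions from the problem. The variable names are my own labels, not anything from the research.

```python
# The three assumptions from the problem above.
prevalence = 0.01            # P(cancer)
sensitivity = 0.90           # P(positive | cancer)
false_positive_rate = 0.09   # P(positive | no cancer)

# Share of all women who test positive and have cancer...
true_positives = prevalence * sensitivity
# ...and share who test positive but do not.
false_positives = (1 - prevalence) * false_positive_rate

# Bayes' rule: the fraction of positive tests that are true positives.
p_cancer_given_positive = true_positives / (true_positives + false_positives)

print(f"{p_cancer_given_positive:.1%}")  # prints 9.2%
```

A bit over nine percent, or roughly one in 10. The false positives swamp the true positives simply because there are so many more women without the disease to begin with.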
That so many of us screw up this sort of calculation is interesting, but not alarming. However, when German psychologist Gerd Gigerenzer asked a similar question to practicing gynecologists, the results were alarming. Only 21 percent got the right answer – worse than random guessing. It seems that a shocking proportion of doctors don’t understand garden-variety medical statistics.
Physicians’ innumeracy has enormous practical consequences. Months after receiving a false-positive mammogram, one in two women reported considerable anxiety about mammograms and breast cancer, while one in four reported that this anxiety affected their daily mood and functioning.
Practicing gynecologists should know the impact of breast cancer screening and, failing that, should be able to figure it out from the data!
Fortunately, a bit of training and a better framing of the question solved the problem.
Assume you conduct breast cancer screening using mammography in a certain region. You know the following information about the women in this region.
Ten out of every 1,000 women have breast cancer
Of these 10 women with breast cancer, 9 test positive
Of the 990 women without cancer, about 89 nevertheless test positive
After learning in a training session how to translate conditional probabilities into natural frequencies, the gynecologists’ confusion all but disappeared. A chart helps, too.
Nobel laureate Daniel Kahneman offers an interesting explanation for why it is so difficult for people generally to compute and deal with probabilities. He did not point to innumeracy. Instead, he said, “to compute probabilities you need to keep several possibilities in your mind at once. It’s difficult for most people. Typically, we have a single story with a theme. People have a sense of propensity, that the system is more likely to do one thing than the other, but it’s quite different from the probabilities where you have to think of two possibilities and weigh their relative chances of happening.”
In his famous 1974 Caltech commencement address, Richard Feynman talked about the scientific method as the best means to achieve progress. Even so, notice what he emphasized: “The first principle is that you must not fool yourself – and you are the easiest person to fool.” The examples above make Feynman’s point. It’s easy to fool ourselves, especially when we want to be fooled – we all really like to be right and have a vested interest in our supposed rightness.
If we are going to be data-driven (and that’s a very good thing), we need to check our work, our biases, and our framing very carefully, especially because we suck at math and are even worse at probability. Make sure you’ve got your math correct.
Remember, it isn’t just the other guy. We suffer from a collective statistical illiteracy.
Totally Worth It
Puzzle: What is the largest possible remainder that is obtainable when a two-digit number is divided by the sum of its digits? Solution below.
Feel free to contact me via rpseawright [at] gmail [dot] com or on Twitter (@rpseawright) and let me know what you like, what you don’t like, what you’d like to see changed, and what you’d add. Praise, condemnation, and feedback are always welcome.
This is the best thing I read this week. The saddest. The strongest. The sweetest. The pettiest. The funniest. The most interesting. The most insightful. The most nonsensical. The most unknown. The most sensible. The least surprising, unless it’s this or this. Wow. Pretty good evidence The New York Times mostly exists to tell progressives what they want to hear. Racism.
Don’t forget to subscribe and share TBL. Please.
Of course, the easiest way to share TBL is simply to forward it to a few dozen of your closest friends.
Solution: Since the remainder of a division is always less than the divisor used, we can begin our search for the largest possible remainder by looking at the largest possible divisor, that is the largest possible sum of the digits of a 2-digit number. The largest possible sum of digits is 18. And 99 divided by 18 leaves remainder 9. The next largest is 17, which could come from 89 or 98. Doing the division, we see 89 divided by 17 leaves remainder 4 and 98 divided by 17 leaves remainder 13. The next largest sum of digits is 16, which could come from 88, 97, 79. And division shows that 88 divided by 16 leaves remainder 8; 97 divided by 16 leaves remainder 1; and 79 divided by 16 leaves remainder 15 (the largest remainder so far). Any sum of digits below 16 will have remainder below 15, so the remainder of 15 that we have achieved must be the largest possible.
~ from The Ultimate Mathematical Challenge
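If you would rather not trust the reasoning, there are only 90 two-digit numbers, so a brute-force check is cheap. This is a minimal Python sketch; `digit_sum` is a helper name I made up for the occasion.

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(d) for d in str(n))

# Try every two-digit number and keep the one with the largest remainder.
best_number = max(range(10, 100), key=lambda n: n % digit_sum(n))
best_remainder = best_number % digit_sum(best_number)

print(best_number, best_remainder)  # prints 79 15
```

It lands on the same answer as the solution above: 79 divided by 16 leaves remainder 15.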
Please send me your nominees for this space to rpseawright [at] gmail [dot] com or via Twitter (@rpseawright).
The TBL Spotify playlist, made up of the songs featured here, now includes more than 235 songs and about 17 hours of great music. I urge you to listen in, sing along, and turn up the volume.
Benediction
Kacey Musgraves sang “Walk in Peace” at a tribute to John Prine at the Ryman last week in honor of the late songwriter’s 76th birthday. It’s today’s benediction.
Amen.
Thanks for reading.
Issue 126 (October 21, 2022)
[1] Most estimates suggest oil companies make about seven cents per gallon of gasoline sold. Meanwhile, the federal and state governments “earn,” on average, about 50 cents per gallon in taxes (over 72 cents here in California).
[2] If you want to learn more in this area, you might start with this paper on teaching statistics.
I think the Bankrate study fails the test of common sense. It reports that folks self-reporting an income of less than $30,000 spend 13% of household income on lottery tickets, 11% on alcohol, 13% on tobacco, and 4% on gambling. This totals 41%, which doesn't leave much income for food, clothing, and shelter. More likely, the people self-reporting are bad at math...