I recently came across a published report that made me wonder why people persist in claiming a causal relationship when the data doesn’t justify the assertion. Actually, the reasons aren’t all that hard to figure out. Usually it’s because the relationship seems obvious, and sometimes it’s because the person writing the report has a bias they wish to share.
But I’m getting ahead of myself. Let’s start with a couple of definitions:
A correlation is simply a measure of the relationship between two variables. Pearson’s coefficient, commonly used to measure linear relationships between scale variables, will be 1 (or -1) for perfect correlation and 0 when there is no linear relationship. Other coefficients are used for different types of variables. Tools such as SPSS that calculate correlation coefficients generally also report whether the relationship is statistically significant, which is a separate question from the strength of the correlation: a weak correlation can still be significant in a large sample.
What correlation tells you is, given the value of one variable, what to expect for the value of the other.
Causality, on the other hand, is a statement that if the value of one variable is changed then the value of the second variable will change accordingly. Correlation is necessary, but not sufficient, for a cause-and-effect relationship.
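To make the definition of Pearson’s coefficient concrete, here is a minimal sketch in plain Python. The function name pearson_r and the small data sets are made up for illustration; they are not taken from any report discussed here. Note that the calculation treats the two variables symmetrically, so it cannot tell you which one, if either, is the cause.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustrative data: y rises or falls exactly linearly with x.
x = [1, 2, 3, 4, 5]
y_up = [2 * v + 1 for v in x]     # perfect positive linear relationship
y_down = [10 - 3 * v for v in x]  # perfect negative linear relationship

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_swapped = pearson_r(y_up, x)    # swapping the variables gives the same value
print(r_up, r_down, r_swapped)
```

Because swapping the arguments leaves the result unchanged, nothing in the coefficient itself encodes a direction of influence.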
It is easy to find good examples of correlations where assuming a causal relationship would be absurd. The Wikipedia article on the topic shows a chart of lemons imported from Mexico to the US plotted against total US highway fatalities. This is an example of a coincidental correlation.
Another type of misinterpretation occurs when the order of the cause and effect is reversed. Darrell Huff’s excellent “How to Lie with Statistics” discusses the relationship between smoking and college grades. Apparently the results were used to promote the idea that giving up smoking would lead to improved grades. But it is equally plausible that lower grades caused students to take up smoking.
We can get into trouble by using more sophisticated statistical techniques without paying enough attention to the meaning of the data and the variables being used to express results. Regression analysis is a powerful tool, but look at the correlations first. Even the jargon can encourage misinterpretation and misstatements; when one variable in your analysis is labeled the ‘dependent’ variable, it is easy to conclude causality where none exists.
More subtle problems can occur when some other factor is the cause for both the correlated variables. This article describes a study where eating breakfast was correlated with elementary school success. This could have resulted in the conclusion that breakfast eating caused them to be better learners. The article continues, “It turns out, however, that those who don’t eat breakfast are also more likely to be absent or tardy — and it is absenteeism that is playing a significant role in their poor performance. When researchers retested the breakfast theory, they found that, independent of other factors, breakfast only helps undernourished children perform better.” The article is from the Statistical Assessment Service – STATS – which is a non-partisan resource whose mission is to provide education on the use and abuse of science and statistics in the media.
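The hidden-factor problem can be sketched with a small simulation. Everything here is invented for illustration and is not data from the breakfast study: a hidden variable drives two measured variables that have no direct effect on each other. The raw correlation between the two is strong, but it largely disappears once the hidden variable is accounted for (here by correlating the residuals of simple linear fits, which is one common way to compute a partial correlation).

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, zs):
    """What is left of ys after removing a least-squares linear fit on zs."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    intercept = mean_y - slope * mean_z
    return [y - (intercept + slope * z) for y, z in zip(ys, zs)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 5000

# Hypothetical hidden factor (think of something like attendance).
hidden = [rng.gauss(0, 1) for _ in range(n)]
# Two measured variables, each driven by the hidden factor plus its own noise.
# Neither one influences the other.
a = [h + rng.gauss(0, 0.5) for h in hidden]
b = [h + rng.gauss(0, 0.5) for h in hidden]

raw_r = pearson_r(a, b)
partial_r = pearson_r(residuals(a, hidden), residuals(b, hidden))

print(f"raw correlation between a and b:      {raw_r:.2f}")
print(f"after controlling for hidden factor:  {partial_r:.2f}")
```

In real research the difficulty is that the hidden factor often was not measured at all, so it cannot simply be controlled for after the fact.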
Without access to the raw data, I can’t be sure which of these fallacies was behind the ill-considered statements that inspired this article. The Kauffman Foundation does some excellent work studying entrepreneurship. But their report on “The Use of Credit Card Debt by New Firms” draws some conclusions that are not justified by the data shown. The report states that “credit card debt reduces a firm’s probability of survival” (emphasis mine). It appears that the authors want to warn entrepreneurs to avoid using credit cards. All the more surprising, then, that two positive examples of credit card funding (Spike Lee and the Blair Witch Project movie) are named in the report. I don’t want to be hypercritical of Kauffman or the report, as there are some interesting and useful results presented. But from the data shown it seems equally likely that the businesses that failed were going to fail anyway, regardless of taking on credit card debt. In fact, businesses that failed during the three years of the study actually had lower credit card debt at the end of the first year. Perhaps they did not borrow aggressively enough!
How then do you avoid drawing the wrong conclusions about cause-and-effect? And how can you deliver results from research that provide useful guidance for actions that forward the organizational goals?
First, avoid making statements that suggest a correlation implies causality. Consider the other possibilities, such as reverse causality or another variable that wasn’t measured. However, don’t be too pedantic or academic either. It is often fair to say that there may be a cause-and-effect relationship. And frequently the changes that will positively impact one variable will be beneficial to the organization as long as they make sense on the face of it.
If you really need to confirm causality, you’ll generally need to do some sort of study that is repeated over time. By including the same people in the sample, you’ll have good assurance that changes you see in Overall Satisfaction can be connected with the changes you make from one wave to the next, such as an improvement in Speed of Connecting to a Customer Service Representative. If you don’t use the same people, you’ll have to take more care to make sure the samples are as comparable as possible.
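One reason for following the same people can be sketched with a simulation. All the numbers below are invented for illustration (a 0.3 point improvement in satisfaction, 500 respondents per wave), and the sketch only addresses how precisely the change is measured, not every threat to a causal conclusion. When the same respondents answer both waves, each person’s own baseline cancels out of the comparison, so the estimate of the change is much less noisy than when two separate samples are compared.

```python
import math
import random
import statistics

rng = random.Random(7)   # fixed seed so the sketch is repeatable
n = 500                  # hypothetical respondents per wave
true_change = 0.3        # hypothetical improvement between waves


def baseline():
    """A respondent's underlying satisfaction level (made-up scale)."""
    return rng.gauss(7.0, 1.5)


def noise():
    """Wave-to-wave measurement noise for one answer."""
    return rng.gauss(0.0, 0.3)


# Design 1: the same people answer both waves.
people = [baseline() for _ in range(n)]
wave1 = [p + noise() for p in people]
wave2 = [p + true_change + noise() for p in people]
diffs = [after - before for before, after in zip(wave1, wave2)]
paired_estimate = statistics.mean(diffs)
paired_se = statistics.stdev(diffs) / math.sqrt(n)

# Design 2: a fresh, independent sample in each wave.
sample1 = [baseline() + noise() for _ in range(n)]
sample2 = [baseline() + true_change + noise() for _ in range(n)]
indep_estimate = statistics.mean(sample2) - statistics.mean(sample1)
indep_se = math.sqrt(statistics.variance(sample1) / n
                     + statistics.variance(sample2) / n)

print(f"same people:      change = {paired_estimate:.2f}, std. error = {paired_se:.3f}")
print(f"separate samples: change = {indep_estimate:.2f}, std. error = {indep_se:.3f}")
```

Under these assumptions the standard error for the same-people design is several times smaller, which is why a modest real change is easier to detect with a panel than with two unrelated samples.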
For more examples that will help you critically review your own and others’ work, check out this great list of correlation/causality fallacies.
And finally, I couldn’t resist this cartoon on the topic from XKCD: