The last item in James Taranto’s “Best of the Web Today” column reminded me of some research I did nine years ago. Mr. Taranto describes some “research” done at the Yale School of Public Health. The “researchers” discovered a correlation between dentist density (dentists per 10,000 population) and obesity rates. Every additional dentist apparently reduces the obesity rate by one percentage point.
Those who follow my work (both of you) have read my rants about correlation versus causation. Correlation does not imply causation. Statistics are good for testing hypotheses, not developing them. The hypotheses to be tested are produced from some sort of model of an aspect of the real world. One does not look at the data first, then develop the hypotheses. The authors admit as much, but nevertheless their “research” was published in the Journal of the American Dental Association. (The full article is behind a paywall. I am unwilling to pay $15 for a copy.)
Based on my observations over the past 10 years, it appears that we are in the midst of a revolution, and not the good kind. Think about giving ten million monkeys a copy of SPSS and some data, and having each one run a study. Chances are excellent that about 500,000 of those studies will produce statistically significant results (at the five percent level) by chance alone. Yes, of course, 500,000 is five percent of ten million. Those results are no more valid than, say, a correlation between dentist density and obesity rates.
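The monkey arithmetic can be checked with a small simulation. The sketch below is not part of the original study. It assumes each "study" correlates two independent random series of 51 observations (one per state plus the District of Columbia, matching the data set used later) and applies the two-sided five percent critical value of the t distribution with 49 degrees of freedom, approximately 2.01. The function name, the seed and the number of simulated studies are illustrative choices.

```python
import math
import random

# Approximate two-sided 5% critical value of the t distribution with 49 d.f.
# It is only appropriate for n_obs = 51 (51 observations minus 2).
T_CRIT_49 = 2.0096

def spurious_share(n_studies=2000, n_obs=51, seed=2005):
    """Share of studies run on pure noise that look significant at the 5% level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        y = [rng.gauss(0, 1) for _ in range(n_obs)]  # independent of x by construction
        mx, my = sum(x) / n_obs, sum(y) / n_obs
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        r = sxy / math.sqrt(sxx * syy)                    # Pearson correlation
        t = r * math.sqrt((n_obs - 2) / (1 - r * r))      # t-statistic for r
        if abs(t) > T_CRIT_49:
            hits += 1
    return hits / n_studies

share = spurious_share()
print(f"{share:.1%} of the noise-only studies were 'significant'")
```

The share should land close to five percent, even though no variable in any of these studies has anything to do with any other.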
But, as long as this is happening anyway, I offer this study. I put it together in 2005 and submitted it to the New Yorker. I didn’t expect them to publish it and my expectations were fulfilled. In fact, I had pretty much forgotten about it until today. I’ve edited it slightly, but not very much. Those who want the data in the usual Excel workbook can click here.
Suicide and New Yorker Subscriptions
An op-ed piece in the New York Times got me wondering. Showing the top three states in various categories, the article was a humorous statistical look at the state of the states. Two categories caught my attention: states with the lowest suicide rates and states with the highest subscription rates to the New Yorker. It happened that two of the three states were the same. Looking at data for 2002, the five states with the lowest suicide rates were Washington, D.C. (5.4 per 100,000), New Jersey (6.4), New York (6.4), Massachusetts (6.8) and Connecticut (7.5). States with the highest New Yorker subscription rates (per capita) were Washington, D.C. (0.025), Vermont (0.011), Massachusetts (0.009), New York (0.008) and Connecticut (0.007). With four of five states matching, the data was begging for more statistical massaging.
For those who want to avoid suicide, the states with the highest rates are Alaska (20.5) and Wyoming (21.1). While their relatively low populations (hence smaller denominators) account for part of this, it also seems that states with low levels of urbanization have higher suicide rates. You may draw your own conclusion from this.
The Levels Model
Whipping out my trusty software (Microsoft Excel 2003) I ran a linear regression of total suicides on population and total New Yorker subscriptions. The data was for each state and the District of Columbia for the year 2002. The results were striking:
Total suicides = 62.7 + 0.00012 (Population) – 0.0073 (New Yorker subscriptions)
The t-statistics were 27.26 (population) and 7.76 (New Yorker subscriptions). And the adjusted R-squared was 0.96. Our simple little regression explained 96 percent of the variation in suicides!
And the implications are staggering. Apparently persuading 1,000 more people to read the New Yorker will reduce the number of suicides by 7.3. The economic cost of suicide affects everyone by reducing total national output and income. This means there is justification for the government subsidizing New Yorker subscriptions. (I have to add that some benefit-cost analysis should be done before implementing this proposal.)
The population variable is doing most of the heavy lifting in this equation. The means of the variables are 620.7 (total suicides), 5,653,953.9 (population) and 17,635.7 (New Yorker subscriptions). The contribution of each term to total suicides is
Total suicides = 62.7 + 0.00012 (5,653,953.9) – 0.0073 (17,635.7)
Total suicides = 62.7 + 686.2 – 128.2 = 620.7
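The arithmetic above can be reproduced in a few lines. One caution: multiplying the means by the coefficients as printed (0.00012 and 0.0073) gives roughly 678.5 and 128.7, not 686.2 and 128.2, and a fitted value near 612.4 rather than the sample mean of 620.7. The contributions shown were presumably computed from the unrounded coefficients in the workbook. The Python sketch below uses only the rounded figures from the text; the variable names are illustrative.

```python
# Figures as printed in the text (the coefficients are rounded).
intercept = 62.7
b_pop = 0.00012          # suicides per additional resident
b_ny = -0.0073           # suicides per additional New Yorker subscription
mean_pop = 5_653_953.9
mean_ny = 17_635.7

pop_term = b_pop * mean_pop      # contribution of population at its mean
ny_term = b_ny * mean_ny         # contribution of subscriptions at their mean
fitted = intercept + pop_term + ny_term

print(f"population term:   {pop_term:8.1f}")
print(f"subscription term: {ny_term:8.1f}")
print(f"fitted suicides:   {fitted:8.1f}")

# Effect of 1,000 additional subscriptions, holding population fixed.
per_thousand = b_ny * 1000
print(f"change per 1,000 subscriptions: {per_thousand:.1f}")
```

Either way, the population term is several times larger than the subscription term.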
The Rate Model
To compensate for the large impact of population, I did a second regression. Using the suicide rate (suicides per capita) as the dependent variable and the New Yorker subscription rate (per capita New Yorker subscriptions) as the independent variable, I got this:
Suicide rate = 0.000135 – 0.003567 (New Yorker subscription rate)
Thus a one percentage point increase in the New Yorker subscription rate will reduce the suicide rate by 0.00003567. This is actually impressive since the constant is 0.000135.
However, we need to look at the means of the two variables. The mean for New Yorker subscription rates is 0.00332. A one percentage point increase would be phenomenal, resulting in about four times the current mean. In fact, the percentage increase is 301%. Similarly the mean for the suicide rate is 0.000123. A reduction of 0.00003567 is about 29%. And that, of course, means we can calculate the elasticity (in absolute value) of suicide rates with respect to the New Yorker subscription rate:
ξ = 29%/301% = 0.096
Not much bang for the buck there (although, again, a full benefit-cost analysis is needed before pursuing this policy recommendation).
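The elasticity can also be computed directly at the means, as the slope times the ratio of the two means. The sketch below uses only the rounded figures quoted above, so the last digit may differ from a calculation on the underlying data. The negative sign simply records that the two rates move in opposite directions; the variable names are illustrative.

```python
slope = -0.003567             # coefficient on the subscription rate
mean_sub_rate = 0.00332       # mean New Yorker subscriptions per capita
mean_suicide_rate = 0.000123  # mean suicides per capita

delta_sub = 0.01                      # a one percentage point increase
delta_suicide = slope * delta_sub     # implied change in the suicide rate

pct_sub = delta_sub / mean_sub_rate
pct_suicide = delta_suicide / mean_suicide_rate

# Equivalent to slope * mean_sub_rate / mean_suicide_rate.
elasticity = pct_suicide / pct_sub

print(f"subscription rate change: {pct_sub:.0%}")
print(f"suicide rate change:      {pct_suicide:.0%}")
print(f"elasticity at the means:  {elasticity:.3f}")
```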
The conclusion is clear. If you have friends who are depressed and frequently talk about suicide, buy them subscriptions to the New Yorker.
There is, of course, the inevitable question of cause-and-effect. Our model assumes that subscriptions to the New Yorker are the cause of changes in the suicide rate. An alternative hypothesis is that those who are less likely to commit suicide have a higher propensity to read the New Yorker. Without additional data it is impossible to test the direction of causality.
Suicide data from the American Association of Suicidology (www.suicidology.org). New Yorker subscription data from the Audit Bureau of Circulations (www.accessabc.com, now the Alliance for Audited Media, http://www.auditedmedia.com). Population data from the U.S. Bureau of the Census.