Canaries in a Coal Mine III. Is the Trend Real?




Yesterday’s blog explained that the seemingly ‘senseless,’ ‘random’ nature of most shooting rampages is not senseless at all. Instead, the shooter is motivated by the logic of ‘social substitutability.’ In other words, random mass shootings are a variety of suicide terrorism. The terrorist’s aim is not to kill a specific person but to strike at an organization or, even more broadly, a social institution or society as a whole. Thus, workplace rampagers want to murder the company or strike at the corporate culture. School shooters aim at the school or university. Anti-government rampagers target the state’s representatives (cops, a legislator) or offices (e.g., the IRS). The Oklahoma City bomber Timothy McVeigh belongs to the latter category, although, unlike most rampage shooters, he attempted to avoid being killed or arrested.

These three are the most common motives in the US Political Violence Database (USPVD). Others include attacks on people belonging to a particular category (e.g., defined by ethnicity, race, gender, or sexual orientation) and on institutions (e.g., religious ones). The FBI classifies such attacks as ‘hate crimes’ (and has published data on them since 1992). However, the FBI lists thousands of such incidents, including fairly minor ones, while my database only includes those that caused at least one fatality. More generally, because the requirements for inclusion in the USPVD differ from those used by law-enforcement agencies, and because their data are typically available only for the last couple of decades, I was unable to use their publications as data sources.

For this reason, I collected data on killing rampages as part of my overall investigation of American political violence, relying primarily on computerized searches of newspapers. Because I am interested in trends, it is not necessary (or even possible) to chase down every single incident of political violence. I explain the logic and the methodology in my JPR paper, for those interested in technical details.

Of course, the big question is whether my sample is fair: whether the trends we see are real or the result of a flawed sampling procedure. I devoted quite a lot of effort during data collection to this question and discussed it at length in the paper.

I used two approaches to detect sampling biases. The first is a kind of triangulation: if we have two samples that are independent of each other, we can estimate what proportion of events we have captured. In my former life as an ecologist I used this procedure to estimate population numbers (it is called the ‘mark-recapture method’). In the JPR paper I used two datasets. One was constructed by the University of Oklahoma historian Paul Gilje, who used a variety of secondary sources to build his American Riots Database. The second dataset I collected myself by searching the Hartford Courant. I chose this newspaper because Paul did not use it at all, so it gives us an independent view of the incidence of riots. Also, its coverage, unlike that of the New York Times, goes back to the very beginning of the American Republic (and even beyond).
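
The simplest two-source version of mark-recapture is the Lincoln-Petersen estimator, sketched below. This is an illustration of the general idea, not necessarily the exact estimator used in the JPR paper; the function name and the counts are hypothetical. It assumes the two sources record events independently of each other.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> tuple[float, float]:
    """Two-source mark-recapture (Lincoln-Petersen) estimate.

    n1: events recorded in source 1 (e.g., a riot database)
    n2: events recorded in source 2 (an independent newspaper search)
    m:  events recorded in both sources
    Returns (estimated total number of events, estimated share captured by source 1).
    """
    if m == 0:
        raise ValueError("the sources share no events; the estimator is undefined")
    n_total = n1 * n2 / m   # total events, assuming the two sources sample independently
    coverage_1 = m / n2     # share of source-2 events that source 1 also caught (= n1 / n_total)
    return n_total, coverage_1


# Hypothetical counts, for illustration only
print(lincoln_petersen(n1=120, n2=80, m=40))  # (240.0, 0.5)
```

The logic is the ecologist's: the fraction of the second "catch" that was already "marked" by the first source estimates how much of the whole population the first source covers.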

Both datasets showed very similar dynamics of riot incidence (a long-term secular wave, and a 50-year cycle superimposed on top of it). They differed in detail, but not in the overall pattern. So the implication is that the pattern is real.

I also applied the mark-recapture method to these two datasets and estimated that, overall, my database was capturing roughly 50 percent of all riots.

However, some riots were relatively minor, with only one or two people killed, while others were much more spectacular, with dozens or even hundreds of deaths. It stands to reason that bloodier riots would be better reported and would have a greater chance of being captured in my database. Indeed, the analysis showed that the USPVD captured 70 percent of riots in which three or more people were killed, and once fatalities exceeded ten, the probability of missing an event rapidly went to zero.
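
Capture rates by severity can be estimated the same way, by splitting the independently documented events into fatality bins and asking what share of each bin also appears in the database. A minimal sketch, assuming each event has already been matched across sources; the function, bins, and events are hypothetical.

```python
def coverage_by_severity(events, bins):
    """Share of independently documented events that also appear in the database, by severity.

    events: iterable of (deaths, in_database) pairs for events in the independent source
    bins:   list of (label, low, high), bounds inclusive; high=None means no upper bound
    """
    caught = {label: 0 for label, _, _ in bins}
    total = {label: 0 for label, _, _ in bins}
    for deaths, in_database in events:
        for label, low, high in bins:
            if deaths >= low and (high is None or deaths <= high):
                total[label] += 1
                caught[label] += int(in_database)
                break
    return {label: caught[label] / total[label] for label in total if total[label]}


# Hypothetical events: (fatalities, also found in the database?)
events = [(1, False), (1, True), (2, False), (2, True),
          (3, False), (4, True), (5, True), (12, True), (20, True)]
bins = [("1-2", 1, 2), ("3-10", 3, 10), ("11+", 11, None)]
print(coverage_by_severity(events, bins))
# {'1-2': 0.5, '3-10': 0.6666666666666666, '11+': 1.0}
```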

We can now use the same approach to determine whether there is a bias affecting killing rampages. In an ideal world, we would collect data from an alternative source and then run a mark-recapture analysis. For example, since I relied primarily on the New York Times for data, we could search the Los Angeles Times and check whether the degree of overlap between the two newspapers increased between 1960 and 2010. If it did, that would indicate that initially the NYT and LAT missed more geographically distant rampages and that their coverage later improved. In that case, at least part of the rise would be due to improved coverage.
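
The proposed NYT/LAT check amounts to computing the overlap decade by decade. A minimal sketch, assuming each newspaper's rampage reports have already been matched to shared event identifiers (a hypothetical preprocessing step); the function name and data are illustrative only.

```python
def nyt_coverage_by_decade(nyt, lat):
    """For each decade, the share of LA Times-reported events that the NYT also reported.

    nyt, lat: dict mapping decade -> set of event identifiers found in that newspaper.
    If the NYT's reach improved over time, this share should rise from decade to decade.
    """
    shares = {}
    for decade in sorted(nyt.keys() & lat.keys()):
        if lat[decade]:
            shares[decade] = len(nyt[decade] & lat[decade]) / len(lat[decade])
    return shares


# Hypothetical event identifiers, for illustration only
nyt = {1960: {"a", "b"}, 2000: {"c", "d", "e"}}
lat = {1960: {"a", "x", "y", "z"}, 2000: {"c", "d", "w"}}
print(nyt_coverage_by_decade(nyt, lat))  # {1960: 0.25, 2000: 0.6666666666666666}
```

In this made-up example the share rises, which is the signature of improving coverage; a roughly flat series would point the other way.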

Perhaps I will eventually do this kind of analysis, but right now I am busy with other things (and believe me, it is not a matter of a couple of days, or even a couple of weeks – I spent literally two years of my life building the USPVD). But what I can do right away is check for bias by looking separately at ‘small’ and ‘large’ events in the data (that is, with few or with many fatalities).

The FBI defines ‘mass homicide’ as an incident that causes four or more fatalities. So let’s divide the rampages into those with 3 or fewer deaths and those with 4 or more deaths. When we plot the incidence of these two kinds, scaled by the US population, we see this:
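
The split-and-scale step can be sketched as below: bin events by decade, classify each as small or large at the four-fatality cutoff, and divide by population. The function name, the per-100-million unit, and the round population figures are assumptions for illustration, not the USPVD's actual processing.

```python
from collections import Counter


def incidence_by_size(rampages, population, threshold=4):
    """Per-decade incidence of 'small' and 'large' rampages, per 100 million people.

    rampages:   iterable of (year, deaths)
    population: dict decade -> US population in millions
    threshold:  deaths at or above this count make an event 'large'
                (4, following the mass-homicide cutoff)
    """
    small, large = Counter(), Counter()
    for year, deaths in rampages:
        decade = year // 10 * 10
        if deaths >= threshold:
            large[decade] += 1
        else:
            small[decade] += 1
    return {
        decade: (100 * small[decade] / pop, 100 * large[decade] / pop)
        for decade, pop in sorted(population.items())
    }


# Hypothetical events and round population figures, for illustration only
rampages = [(1965, 2), (1968, 5), (2003, 1), (2005, 3), (2007, 6), (2009, 4)]
population = {1960: 200, 2000: 300}
print(incidence_by_size(rampages, population))
```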

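[Figure: incidence of rampages with 3 or fewer deaths and with 4 or more deaths, scaled by US population]

What we see is that the two curves move in parallel (minor fluctuations aside). If New York Times coverage had expanded over time, we would expect large events to be over-represented in the past: bloody rampages are newsworthy enough to be reported even when coverage is patchy, while smaller ones are the first to slip through. We don’t see such a pattern, so I doubt very much that there is a systematic bias in the way the NYT and other newspapers report these events. The rapid and accelerating rise in rampages is real.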
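
One way to make "the curves move in parallel" testable is to track the share of small events over time and fit a trend line: under the improving-coverage hypothesis that share should rise. A minimal sketch using an ordinary least-squares slope; the function name and counts are hypothetical, and this is not presented as the test used in the original analysis.

```python
def small_share_slope(counts):
    """Least-squares slope of the share of small events, per decade.

    counts: dict decade -> (n_small, n_large); needs at least two decades.
    If early coverage missed mostly small events, this share should rise over time
    (positive slope). A slope near zero is what parallel curves look like.
    """
    decades = sorted(counts)
    xs = [(d - decades[0]) / 10 for d in decades]
    ys = [counts[d][0] / sum(counts[d]) for d in decades]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical counts, for illustration only
parallel = {1960: (2, 2), 1970: (3, 3), 1980: (5, 5)}    # both sizes grow together
improving = {1960: (1, 3), 1970: (2, 2), 1980: (3, 1)}   # small events 'appear' later
print(small_share_slope(parallel), small_share_slope(improving))  # 0.0 0.25
```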

I think what’s going on is that because of their ‘senseless’ nature, rampages capture the imagination of both news reporters and readers, and for this reason are particularly likely to be reported. Note that I still don’t think that my database captures all such events. For one thing, I used a particular set of search terms, and if a newspaper article used none of them, I would miss the event. But it looks as though the probability of a rampage being recorded in the database does not change with time. Again, and it is worth repeating because most people not steeped in statistics don’t realize it: when characterizing time trends, it is not necessary to have a complete sample; it is perfectly sufficient to have a fair, unbiased one.
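
The point about complete versus unbiased samples is simple arithmetic: if every event has the same chance p of being recorded, p cancels out of any ratio between periods. The sketch below contrasts a constant p with one that improves over time; all numbers are hypothetical.

```python
def fold_change(true_counts, capture_probs):
    """Last-period over first-period ratio of *expected recorded* events.

    Expected recorded events = true events * probability an event gets recorded.
    """
    recorded = [n * p for n, p in zip(true_counts, capture_probs)]
    return recorded[-1] / recorded[0]


true_counts = [10, 20, 40, 100]       # hypothetical true counts: a tenfold rise
unbiased = [0.5, 0.5, 0.5, 0.5]       # incomplete, but the same p in every period
improving = [0.2, 0.3, 0.4, 0.5]      # coverage that gets better over time

print(fold_change(true_counts, unbiased))   # 10.0: the true trend is recovered
print(fold_change(true_counts, improving))  # 25.0: the trend is inflated
```

Missing half the events changes nothing about the trend; only a capture probability that drifts over time does.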

This suggests to me that when I get around to using a mark-recapture approach on this question, I will similarly not find a geographic bias in coverage. And if anybody wants to prove me wrong, you are welcome to do it yourself. The USPVD data have been posted on the JPR site and on my Cliodynamics website.

There is another way we can approach the issue of whether the sample is representative. Grant Duwe of the Minnesota Department of Corrections recently collected data on mass murder patterns using FBI reports and news sources (PDF of the publication). This is a very valuable dataset. Unfortunately, he does not distinguish mass homicides motivated by personal or criminal reasons from the ‘senseless’ ones (that is, those in which the principle of social substitutability was operating). This is a critical distinction for my purposes, so I cannot use his data. On the other hand, Duwe also looked at the extent of coverage of such crimes by the news media.

Here’s one look at his data:

Figure 1. The Annual Number of Stories on Mass Murders Presented by the New York Times from 1900-1999.

This graph shows that there were relatively frequent reports of mass murder during the 1920s and 1930s, then they declined to almost nothing, started appearing again in the 1950s, and took off after the 1960s. Is this due to NYT paying more attention to such events at some point?

Let’s take a look at the next graph:

Figure 2. The Annual Average Number of Stories Per Mass Murder Presented by the New York Times from 1900-1999.

Indeed, during the 1920s and 1930s the NYT devoted fewer articles to each mass murder than it did after 1960. It’s hard to tell whether there is a trend after 1960 (I wish the data were plotted by 5-year intervals), but it doesn’t look that way. And if there is one, it is certainly not enough to explain the 10-fold increase that we see in the number of rampages scaled by population.
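
Re-plotting by 5-year intervals would mean pooling the annual counts before dividing. A minimal sketch, assuming annual story and event counts of the kind Duwe tabulated; the function name and numbers are hypothetical.

```python
def stories_per_event(annual, width=5):
    """Pool annual counts into `width`-year periods: total stories / total events.

    annual: dict year -> (n_stories, n_events). Periods with no events are skipped.
    """
    stories, events = {}, {}
    for year, (n_stories, n_events) in annual.items():
        start = year - year % width
        stories[start] = stories.get(start, 0) + n_stories
        events[start] = events.get(start, 0) + n_events
    return {start: stories[start] / events[start] for start in sorted(stories) if events[start]}


# Hypothetical annual counts, for illustration only
annual = {1960: (4, 1), 1961: (0, 0), 1963: (6, 2), 1965: (9, 3), 1969: (3, 1)}
print(stories_per_event(annual))  # {1960: 3.3333333333333335, 1965: 3.0}
```

Dividing pooled totals, rather than averaging the annual ratios, keeps years with a single event (or none) from dominating the picture.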

What is even more telling is that, as Duwe explains, this increased media attention was due to the changing nature of mass murder. In the first half of the twentieth century most cases of mass murder involved family members or people personally known to the shooter, which were not as intrinsically interesting to the NYT and its readers. These are also the kinds of mass murder that I excluded from the USPVD. So it would be interesting to determine whether the number of articles per event, for the kind of events that go into my database, changed with time. Most likely, it didn’t.

In conclusion, all the evidence so far suggests that the pattern is real. In fact, it is so strong (a tenfold increase) that it would take truly heroic biases in news coverage to create it from nothing.
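
How heroic? Under the simplifying assumption that the recorded rate R is the true rate λ times a capture probability p, a tenfold rise produced entirely by coverage would require the early capture probability to be at most one tenth:

```latex
R_t = p_t \, \lambda_t
\quad\Longrightarrow\quad
\frac{R_{\text{late}}}{R_{\text{early}}}
  = \frac{p_{\text{late}}}{p_{\text{early}}} \cdot \frac{\lambda_{\text{late}}}{\lambda_{\text{early}}}

\text{If } \lambda_{\text{late}} = \lambda_{\text{early}} \text{ and } \frac{R_{\text{late}}}{R_{\text{early}}} = 10,
\text{ then } p_{\text{early}} = \frac{p_{\text{late}}}{10} \le 0.1 .
```

That is, newspapers would have had to miss at least nine out of ten ‘random’ rampages early on, even if they caught every single one later.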

Also, remember that the focus is on ‘random’ rampages. I am sorry to keep harping on this point, but it’s important. It is quite possible that the other kinds of rampage, in which family members are murdered (“Man kills wife, kids, self”) or crime-related ones (Reservoir Dogs), did not change much in frequency over the last decades. Adding those together with ‘random’ killing rampages, then, would dilute the trend.
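
The dilution effect is easy to see with made-up numbers: pool a category that rises tenfold with a larger one that stays flat, and the combined series rises far less. The function name and rates are hypothetical.

```python
def pooled_fold_change(random_early, random_late, other_early, other_late):
    """Fold change in all rampages when 'random' ones are pooled with other kinds."""
    return (random_late + other_late) / (random_early + other_early)


# Hypothetical rates per 100 million people, for illustration only:
# 'random' rampages rise tenfold (1 -> 10), family and crime-related ones stay at 9.
print(pooled_fold_change(1, 10, 9, 9))  # 1.9
```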

To be continued…


My comment is published here (

Peter Turchin

Mato, thanks for the comment and critique. One comment, however. I do not claim that indiscriminate mass murder (IMM) is directly caused by increasing inequality in the US. Rather, as you said, inequality is another manifestation of deeper processes working their way through the society. So inequality and IMM frequency are caused by something else. I have described this ‘something else’ in the JPR article, my article on the Conversation, and some blogs already, but a more complete explanation is in the book on which I am currently working.


May want to also consider Cramer, “Madness, Deinstitutionalization & Murder”

Peter Turchin

Thank you for this article. It looks well reasoned and is certainly highly relevant to the topic. I will be writing a blog on alternative explanations, and this is one of the most important ones that I need to discuss.


Glad to contribute. My own work has focused on understanding long-term historical violence (the first chapter of my first book is here), but I am trying to track current situations.

I do think the structural-demographic approach in your Freakonomics blog encapsulates the link to mental illness, and I suspect the two are linked. I’ll keep an eye on my email for the next post.

Peter Turchin

Again, thanks! As you may know I have a great interest in frontiers and frontier violence, so I am looking forward to reading your book.


Interesting post. Coincidentally, I recently came across this post in which the author claims mass murder (defined as 3 or more fatalities in a single incident) is not increasing. No mention is made of how their data were sampled, so I’m not sure what to make of it. Perhaps there is a dilution effect going on (which you alluded to in your post).


Is it fair to remove family mass murders when looking for causality? Suppose we’re talking about 2 crazy individuals, one who grew up in the early 1900s and one who grew up in modern times. If the 1904 guy grew up on a farm and was not required to attend school, he is naturally going to be more inclined to rampage against his perceived agitators, his family. The modern guy will likely have attended public school and would be more likely to target that institution. In both cases a crazy person is rampaging against their environment, but you are excluding some of these incidents. I do agree that not all family mass murders would fall into the same category defined by motivation, so it would not be a clean data set to include them all either. But excluding them all seems to have some problems too.


© Peter Turchin 2023 All rights reserved
