Good Statistical Methods Is No Substitute for Bad Data


The Database of Religious History (DRH) team has recently published the first (and only, as far as I know) analytic article based on the DRH data, Religion and Ecology: A pilot study employing the database of religious history. Their results were quite disappointing. As they write in the Abstract:

However, when it came to analyzing DRH and ecological variables extracted from paleoclimate reconstructions, we were unable to replicate previous findings. We explore possible reasons for this discrepancy, including inaccurate climate reconstructions, ecological variables being unrepresentative of the standard climate, aggregation method and—perhaps most significantly—the inadequacy of existing statistical tools for dealing with the kind of complex and highly interrelated cultures that have characterized most of human history.

And their main conclusion is that “New and better methods are needed to disambiguate the messiness of history.”

I disagree. The main problem here is not in the statistical methods, but in the DRH data themselves. Before I explain, I need to acknowledge my previous history with DRH. I also want to make some general comments on the role of critique in science, and how it can be used productively.

As some readers of this blog know, the initial group that started working on Seshat: Global History Databank included Ted Slingerland (who is now the project director of DRH). However, as we were laying the foundations of the work to come, it became clear that there were irreconcilable differences between us, partly stemming from a clash of personalities, but mainly because we disagreed on the basic issues of methodology. As a result, we split and each team proceeded to build its database in the way it thought best (I’ll say more about these divergent approaches below).

Later on, the DRH team, and Slingerland in particular, made their disapproval of our empirical approach public, by publishing several articles criticizing our data and results. For example, see Historians Respond to Whitehouse et al. (2019), “Complex Societies Precede Moralizing Gods Throughout World History”. In general, criticizing methods and the appropriateness of conclusions is an intrinsic part of doing science, because how would we otherwise reject bad methods and bad theories in favor of good ones? Our critics, however, greatly exceeded the boundaries of appropriate scientific discourse. Thus, Slingerland et al. accused us—falsely—of scientific misconduct. Most seriously, their critique included an allegation that Seshat researchers lied about the involvement of one of our expert historians in checking data. If you are interested in the sordid details, read our response to their critique.

Well, let Big God be their judge or, alternatively, let karma take care of it.

Critique is very important in science; we just need to follow several good rules: criticize methods and conclusions, not people; promote dialogue by inviting responses to your criticisms; don’t use extra-scientific power to suppress other people’s opinions, but let logic and facts do your job; and so on. My remarks below are offered in this spirit.

Returning to my critique of the Religion and Ecology article, it stems from the fundamental disagreement between the Seshat and DRH approaches (which led to the split) on how the knowledge of historians about past societies should—and could—be translated into data that can be analyzed with statistical methods. The DRH approach has been to rely entirely on historians to build the database “bottom up,” by filling in the boxes online. Unfortunately, this approach leads to several serious problems, I’d say even fatal flaws.

First, operationalizing religious variables is a difficult business, and it takes a lot of effort. Some (non-religious) variables are easy to define; for example, did the warriors in the coded society use swords in battle? But what does it mean to answer the question on whether “supernatural monitoring is present”? What is “supernatural”? What is “monitoring”? Who is monitored? Who monitors? For what kind of actions? Or even (impious) thoughts? These are all hard questions to answer. What is going on in the mind of an expert quickly flipping through dozens of boxes, when they choose between “yes”, “no”, or “field doesn’t know”?

I’ve written before about this problem with the DRH. Because Slingerland and colleagues used it as an example to criticize Seshat, the specific example I discussed in that post was whether the religion of the late Shang China (1250–1045 BCE) had a moralizing god, or not. At that time there was disagreement between three DRH experts, with one saying “no” and two saying “yes.” Since then two more entries were added, both by Ruiliang Liu, who coded “field doesn’t know.” So now we have the complete spectrum of all possible answers! What’s going on here? From our experience in the Seshat project, one problem is that each scholar simply understood the question in a different way. It takes a lot of effort to explain the subtleties involved in answering questions like this. The Seshat project developed a methodology to deal with this problem, see the section on “Data gathering and collation” in this article:

Turchin, P., H. Whitehouse, J. Larson, et al (2022). “Explaining the rise of moralizing religions: a test of competing hypotheses using the Seshat Databank.” Religion, Brain & Behavior: 1-28. doi:10.1080/2153599X.2022.2065345

Another reason for apparent diverging answers is that historians had different units in mind. Thus, the Robert Eno entry focused narrowly on the late Shang period (1250–1046 BCE), while the Lothar von Falkenhausen entry covered the period 1750–850 BCE, during which a lot of religious evolution happened in Ancient China. I return to the question of divergent units below.
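To make the two problems above concrete, here is a minimal Python sketch using only the numbers quoted in this post: the five late Shang answers (one “no”, two “yes”, two “field doesn’t know”) and the two entry periods. The function names `modal_share` and `overlap_fraction` are my own illustrative choices, not part of DRH or Seshat, and the agreement measure is deliberately crude.

```python
from collections import Counter

# The five DRH answers for late Shang China as described in the text:
# one "no", two "yes", and two "field doesn't know".
answers = ["no", "yes", "yes", "field doesn't know", "field doesn't know"]

def modal_share(codes):
    """Return the share of coders who gave the most common answer."""
    counts = Counter(codes)
    return max(counts.values()) / len(codes)

def overlap_fraction(span_a, span_b):
    """Fraction of span_b that span_a covers.

    Spans are (start, end) in years BCE, so start > end.
    """
    start = min(span_a[0], span_b[0])  # later of the two starts (BCE counts down)
    end = max(span_a[1], span_b[1])    # earlier of the two ends
    overlap = max(0, start - end)
    return overlap / (span_b[0] - span_b[1])

eno = (1250, 1046)           # entry focused on the late Shang period
falkenhausen = (1750, 850)   # entry covering a much longer period

share = modal_share(answers)                    # 2 of 5 coders
coverage = overlap_fraction(eno, falkenhausen)  # 204 of 900 years

print(f"Largest agreeing share of coders: {share:.0%}")
print(f"Share of the longer entry's period covered by the shorter one: {coverage:.0%}")
```

No answer is backed by more than two of the five entries, and the narrower entry spans less than a quarter of the broader one, so the two “units” are hardly the same thing.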

Second, relying entirely on historian volunteers to add data results in an unbalanced sample. There are lots of scholars working on Greek history and religion, and few specialists on the religion of Scythians, for example. Or even Achaemenid Persia, despite the significance of this early mega-empire for understanding social evolution.

Furthermore, in addition to a lopsided geographic sample, the temporal sampling in DRH is not continuous. This creates problems for analysis, because cultural evolution is about change in time, and how can we understand this evolution if we only have a sample that mainly consists of gaps?
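One simple way to quantify such gaps is to ask what share of a time window is covered by at least one entry. The sketch below is only an illustration: `covered_fraction` is a hypothetical helper, the 2000 BCE to 0 window is an arbitrary choice, and the third entry span is invented for the example (the first two are the periods mentioned above).

```python
def covered_fraction(entries, window):
    """Share of `window` covered by the union of entry spans.

    Spans and window are (start, end) in years BCE, so start > end.
    """
    w_start, w_end = window
    # Clip each span to the window and drop spans entirely outside it.
    clipped = sorted(
        ((min(s, w_start), max(e, w_end)) for s, e in entries
         if min(s, w_start) > max(e, w_end)),
        reverse=True,  # earliest (largest BCE year) first
    )
    covered = 0
    cursor = w_start  # earliest year not yet counted
    for s, e in clipped:
        s = min(s, cursor)  # skip the part already counted
        if s > e:
            covered += s - e
            cursor = e
    return covered / (w_start - w_end)

# Two spans from the text plus one invented span (hypothetical, for illustration).
example_entries = [(1750, 850), (1250, 1046), (500, 300)]
example_window = (2000, 0)  # arbitrary window, an assumption

fraction = covered_fraction(example_entries, example_window)
print(f"Covered share of the window: {fraction:.0%}")
```

Overlapping entries are counted once, so adding more entries for an already well-studied period does nothing to close the gaps elsewhere.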

Third, historians have their own interests, and they write about what interests them. This is as it should be, but their entries in the DRH cover a bewildering array of topics, some on states and empires, others on small religious sects and cults, yet others on particular temples, sanctuaries, and churches, on surviving documents and inscriptions, or even on geographic features, such as mountains.

The resulting hodge-podge of data addressing a bewildering variety of units cannot be subjected to meaningful statistical analysis. It’s worse than comparing apples and oranges (both are fruit, after all):

[Image. Source: Wikimedia]

Contrast this with a successful database, like the Standard Cross-Cultural Sample (SCCS), in which all data address one class of units: “cultures.” The success of SCCS is attested by literally hundreds of analytic articles that used this database. Similarly, the central unit in Seshat is a “polity.” Although Seshat is much younger than SCCS, there are already more than a dozen analytic articles based on it, with at least four independent groups of analysts (that I know of), in addition to the Seshat team.

Because of these multiple problems, DRH currently is not a database. An attempt to treat the information in it as “data” suffers from what is known in computer science as the GIGO (“garbage in, garbage out”) problem. The sad results from the Religion and Ecology article confirm this general principle. When we started building historical databases (before splitting up), one of our central goals was to use these data to test cultural evolutionary theories, such as the Big Gods Theory. This is clearly impossible given the current state of the DRH.

I don’t want to end this critique in a completely negative way. What can be done, in particular toward the goal of testing rival theories? The DRH contains a ton of potentially useful information (if not data). This information needs to be curated by addressing the three problems that I explained above. It’s possible to do, as our experience in Seshat shows. Religion is a minor part of Seshat, and I, for one, would be very interested in seeing additional variables coded in a way that would make them amenable to analysis. If the DRH team is interested, I would be ready to act as a consultant in such an endeavor.


© Peter Turchin 2023 All rights reserved
