Amazon recently released its Good Omens miniseries, based on the book co-written by Neil Gaiman and Terry Pratchett. Around the time of its release, I happened to be attending a course at the Digital Humanities Summer Institute on Stylometry with R, where we were learning how to use statistics to analyze style and attribute authorship. For a mini-project, I found a way to combine my love of fantasy literature with my burgeoning skills in the programming language R: I decided to see if I could figure out which sections of Good Omens were written by Gaiman and which by Pratchett.
Gaiman has been asked this question before, and he describes nine weeks of feverish, glorious collaboration filled with writing, rewriting, swapping, and editing of sections. He concludes: “People still ask us who wrote what, and, mostly, we've forgotten.” Well, stylometry can help!
Using a training set of texts by Pratchett and Gaiman, I analyzed Good Omens with the R package stylo. Specifically, I ran a rolling classification using the nearest shrunken centroids (NSC) method, with 50 features and 5,000 words per slice. The figure below shows my results. The words of the novel progress along the x-axis. The pattern below the horizontal white line represents the signal from the author to whom the program attributed the majority of each slice (Gaiman is in red and Pratchett is in green). The fainter pattern on top roughly shows how much signal there is from the other author. Together, the two add up to 100% in each section of the text.
As a control, I also ran the classifier on Moving Pictures by Pratchett and Coraline by Gaiman, which the algorithm indeed classified as almost exclusively by Pratchett and by Gaiman, respectively.
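For readers curious about what a rolling classification does under the hood, here is a rough sketch of the idea. It is written in Python rather than R, and it is not what stylo runs: it swaps in a plain nearest-centroid rule for nearest shrunken centroids, builds one centroid per author from that author's whole training text, and assumes a slice overlap of 4,500 words (my analysis above only specifies the slice size and the number of features). All function names are made up for illustration.

```python
from collections import Counter


def rolling_slices(words, slice_size=5000, overlap=4500):
    """Cut a word list into overlapping windows (the overlap value is an assumption)."""
    if not 0 <= overlap < slice_size:
        raise ValueError("overlap must be non-negative and smaller than slice_size")
    step = slice_size - overlap
    last_start = max(len(words) - slice_size, 0)
    return [words[s:s + slice_size] for s in range(0, last_start + 1, step)]


def top_features(texts, n=50):
    """Return the n most frequent words across all training texts."""
    counts = Counter()
    for words in texts:
        counts.update(words)
    return [word for word, _ in counts.most_common(n)]


def relative_freqs(words, features):
    """Relative frequency of each feature word in a list of words."""
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[f] / total for f in features]


def attribute(slice_words, centroids, features):
    """Pick the author whose centroid is closest (squared Euclidean distance)."""
    vec = relative_freqs(slice_words, features)

    def sq_dist(author):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[author]))

    return min(centroids, key=sq_dist)


def classify_rolling(test_words, training, n_features=50,
                     slice_size=5000, overlap=4500):
    """Label every rolling slice of test_words with its nearest author.

    `training` maps an author name to that author's training text as a word list.
    """
    features = top_features(training.values(), n_features)
    centroids = {author: relative_freqs(words, features)
                 for author, words in training.items()}
    slices = rolling_slices(test_words, slice_size, overlap)
    return [attribute(s, centroids, features) for s in slices]
```

Calling `classify_rolling(novel_words, {"Gaiman": gaiman_words, "Pratchett": pratchett_words})` on lowercased, tokenized word lists would return one author label per slice, which is roughly the information a plot like mine colors along the x-axis. The real method also reports how strong each author's signal is, which this sketch leaves out.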
I then divided the text into 5,000-word chunks and re-read each one to figure out what was happening in it. Here is a version of my visualization with a rough description of what is going on in each section. Enjoy! And check out both the miniseries and the book (although if you are reading this post about stylometry, it’s probably because you are already a fan of Good Omens).
I had so much fun revisiting this amazing book.
Update: Neil Gaiman retweeted my above analysis and contributed some significant insight.
I had chosen Moving Pictures and Coraline as controls using the R function sample(), which picks randomly among the options it is given. (I use the sample() function prominently in my outfit generator.) I did notice that there was a tiny bar of Gaiman signal detected in Moving Pictures, but I had assumed it was just noise. Little did I know I was uncovering literary secrets: Gaiman tweeted that he did indeed have some small input into Moving Pictures. He suggested I test Pratchett’s novel Pyramids too, which I did, with the results below.
I’d like to finish by highlighting what Neil Gaiman himself pointed out about the analysis of Good Omens. The overlays are a beautiful testament to the collaborative nature of the book. Even in areas where one author’s signal dominates, the other author is present. Both Gaiman and Pratchett are detectable all over their shared work. That’s a pretty great accomplishment for a collaboration.