Quill and Quire

Book culture

Text mining the novel

Andrew Piper

A group of researchers from 21 academic and non-academic institutes, led by Andrew Piper, associate professor and William Dawson Scholar of Languages, Literatures, and Cultures at McGill University, is undertaking the “first large-scale quantitative history of the novel” using a decidedly modern method: text mining.

NovelTM is described as a “digital humanities initiative” that aims to understand the importance of the novel in society using advanced computational analysis. Text mining uses algorithms to identify and track recurring words, trends, and themes in written material, and converts those patterns into statistical data. While a traditional literature course might use a handful of books to study a specific topic – how the use of language has changed over time, for example – Piper says NovelTM will examine close to one million texts. “These techniques allow us to better understand the types of vocabulary that are unique to the novel at different points in time,” he says in an email. “We don’t start with preconceptions about what matters to novels, but look to use these new tools to understand what novels are saying and how.”
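To give a rough sense of what turning words into statistical data can look like, here is a minimal Python sketch. It is not NovelTM's method, which the project has not described in detail here; the function names, the ratio-based ranking, and the toy sentences are all assumptions made for illustration. It counts how often each word appears in two small groups of texts and ranks the words that are far more common in one group than in the other.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(texts):
    """Return each word's share of all tokens across a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def distinctive_words(target_texts, reference_texts, top_n=3):
    """Rank words by how much more frequent they are in target than in reference."""
    target = relative_frequencies(target_texts)
    reference = relative_frequencies(reference_texts)
    floor = 1e-6  # assumed stand-in frequency for words absent from the reference
    ratios = {word: freq / reference.get(word, floor) for word, freq in target.items()}
    return sorted(ratios, key=ratios.get, reverse=True)[:top_n]


# Hypothetical toy "corpora"; a real study would load full texts from files.
older_texts = ["The carriage waited by the manor.", "The carriage left the manor."]
newer_texts = ["The phone rang in the office.", "The phone buzzed in the car."]

print(distinctive_words(older_texts, newer_texts, top_n=2))
```

A simple frequency ratio is only one of many possible measures of distinctiveness. At the scale of a million texts, researchers would more likely rely on established statistical tests and dedicated text-analysis libraries.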

The application of modern analytics to classic literature is nothing new to Piper, who, as director of McGill’s .txtLAB, has spearheaded other projects including “The Poetic Body,” which examines the relationship between poetry and aging, and “Conversational Reading,” an exploration of the lingering effect of Jane Austen’s dialogue on modern writing.

“I became interested in this topic for two main reasons. The first is to see what new kinds of reading technologies can tell us about the books that we care about,” says Piper. “The second is the fact that there are way more novels out there than anyone will ever be able to read. If we are going to make claims about the novel’s cultural and social significance over the last three centuries, we are going to need ways of accounting for a far greater number than we have in the past.”

Piper is joined by co-investigators and partners from across North America and Europe, including academics specializing in literature, computer science, media and culture, philosophy, and quantitative linguistics, as well as experts in e-publishing, cartography and geoinformation, and humanities computing.

The seven-year project, which received more than $1.8 million in funding from the Social Sciences and Humanities Research Council of Canada, will focus on a different theme each year. The first is “genre,” which Piper says will explore what makes novels unique compared to other kinds of writing. “But we’ll also be looking at questions of periodization (how should we be thinking about the evolution of the genre, changes in style over time, etc.), plot, character, and narrative.”

In addition to producing scholarly papers and a series of books for each theme, NovelTM will create a “methods commons” – an online resource that will provide the methodological foundations of literary text mining, which Piper hopes will encourage others to begin their own projects.

“I think we’re on the brink of a major shift in how we think about and study the literary past,” says Piper. “This is just a first step.”