Kalani Craig, Ph.D.

Fall 2024 H699 Week 4

Week 4: Text Echoes

  1. Week 4 Overview
  2. Week 4 Reading and Discussion
  3. Week 4 Lab: AntConc, Voyant, and other corpus-linguistics tools

Week 4 Overview

Textual Echoes

One of the earliest computational treatments of humanities text came in the form of corpus linguistics. In a nutshell, corpus linguistics explores specific linguistic features across a corpus of representative texts, which helps us understand how particular elements of a language appear in context.

Reading: We have two shorter complementary articles, one on what corpus linguistics is and how it’s been used in the past, and one that provides an example of corpus linguistics at work in a digital history project. See Week 4 Reading and Discussion for guided questions and links to the reading.

Lab: We’ll work with AntConc, a focused corpus-linguistics program for Mac or PC that lets you explore a corpus on your own desktop. You’ll compare AntConc’s output to Voyant’s output for the same corpus. We also have two additional labs from The Programming Historian, one intermediate and one advanced, available for your use. See Week 4 Lab: AntConc, Voyant, and other corpus-linguistics tools for a full at-home walkthrough. This week’s lab is designed for you to explore at home and then troubleshoot and discuss in class. Note that the reading is shorter to make time for that.

Collaborative data management: No student presentation.

Week 4 Reading and Discussion

Theory and Methods Reading: Chapter 33, “The History of Corpus Linguistics,” from The Oxford Handbook of the History of Linguistics, https://doi.org/10.1093/oxfordhb/9780199585847.013.0034 (open-source text available online)

Exemplar Reading: Lincoln Mullen, America’s Public Bible (https://americaspublicbible.supdigital.org/):

  • Preface
  • Introduction: Commenting on America’s public Bible
  • Methods: The how and the why of finding biblical quotations

Discussion:

  • Does corpus linguistics readily support answers to questions about change and continuity over time without any methodological adaptations? For instance:
    • How does the construction of corpora differ between linguists and historians using corpus linguistics?
    • How does the use of keyness and other statistical measures shape linguistic research vs historical research?
  • How effective is corpus linguistics as a methodological approach to the exploration of change and continuity over time when we are working with languages outside of the Western European world and/or multilingual projects?
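
For the keyness question above, it may help to see how small the underlying calculation is. The sketch below uses log-likelihood (G2), one common keyness statistic, to compare word frequencies in a target corpus against a reference corpus. The function names, the simple tokenizer, and the two toy "corpora" are invented for illustration; real tools add options (minimum frequencies, effect sizes, stop lists) that this sketch leaves out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """G2 keyness score.

    a, b: frequency of the word in the target / reference corpus
    c, d: total token count of the target / reference corpus
    """
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:
        total += a * math.log(a / expected_target)
    if b > 0:
        total += b * math.log(b / expected_reference)
    return 2 * total


def keyness(target_text, reference_text):
    """Rank words that are relatively more frequent in the target corpus."""
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    c, d = sum(target.values()), sum(reference.values())
    scores = {}
    for word, a in target.items():
        b = reference[word]
        # Keep only "positive" keywords: overused in the target corpus.
        if a / c > b / d:
            scores[word] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy corpora, invented for this example only.
sermons = "bible verse quoted bible sermon bible"
newspaper = "news market price news report weather"
for word, score in keyness(sermons, newspaper):
    print(f"{word:10s} {score:.2f}")
```

Notice that the score depends entirely on which reference corpus you choose. That choice is where the linguist's and the historian's priorities can diverge.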

Further resources

  • Jo Guldi, The Dangerous Art of Text Mining: A Methodology for Digital History, Cambridge University Press (2023). https://doi.org/10.1017/9781009263016. Editorial note: it is fabulous, and also a longer read than we can handle while still doing an at-home tutorial. If you are going to engage in a text-mining project, this is a must-read. If you are not going to engage in text mining, we’ll come back to Chapter 1 (“Why Textual Data From the Past is Dangerous”) in Week 12 when we talk about LLMs and AI.
  • Quinn Dombrowski, Multilingual DH, https://quinndombrowski.com/projects/multilingual-dh/

Week 4 Lab: AntConc, Voyant, and other corpus-linguistics tools

Quick overview

Everyone should

  • Download AntConc and follow the “Corpus Analysis with Antconc” tutorial by Heather Froehlich at https://programminghistorian.org/en/lessons/corpus-analysis-with-antconc
  • Upload the movie-review corpus in Froehlich’s AntConc tutorial to Voyant (https://voyant-tools.org/), then:
    1. Use the keyword-in-context (KWIC) feature of Voyant in the lower-right-hand corner to get a feel for the different user-interface choices Voyant’s team and the AntConc team made
    2. Look in Voyant’s tool guide at https://voyant-tools.org/docs/#!/guide/tools for the features that parallel the ones you used in your AntConc tutorial
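
If you want to see what a keyword-in-context view is doing under the hood, here is a minimal sketch in Python. It is not how AntConc or Voyant implement KWIC. The function name `kwic`, the character-based context window, and the sample sentence are all assumptions made for illustration.

```python
import re


def kwic(text, keyword, width=30):
    """Return one formatted line per hit: left context, keyword, right context."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Sample sentence invented for this example only.
sample = ("The film was long, but the film was never dull. "
          "A shorter film might have lost its charm.")
for line in kwic(sample, "film", width=20):
    print(line)
```

The window here is measured in characters, which keeps the code short. Concordancers typically also let you sort hits by the words to the left or right of the keyword, which is often where the interesting patterns show up.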

UPDATED: Struggling with the user interface? Courtesy of REDACTED, here’s the most recent YouTube tutorial from Laurence Anthony, AntConc’s lead (and only) programmer: https://youtu.be/_GSlwIO5QZE?si=M9dwCq1XQxbG1cGG

Extra resources:

These Programming Historian tutorials cover other aspects of corpus linguistics at an intermediate or advanced level:

  • Intermediate: https://programminghistorian.org/en/lessons/detecting-text-reuse-with-passim
  • Advanced: https://programminghistorian.org/en/lessons/text-mining-with-extracted-features
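
To get a feel for what "detecting text reuse" involves before starting the intermediate tutorial, here is a deliberately simple sketch that looks for shared word n-grams between two passages. It is not passim's algorithm, and it is not the method used in America’s Public Bible. The function names, the choice of 4-word n-grams, and the candidate passage are assumptions made for illustration.

```python
import re


def ngrams(text, n=4):
    """Return the set of n-word sequences in the text (lowercased)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(source, candidate, n=4):
    """N-grams that appear in both passages."""
    return ngrams(source, n) & ngrams(candidate, n)


def overlap_score(source, candidate, n=4):
    """Fraction of the candidate's n-grams that also occur in the source."""
    candidate_grams = ngrams(candidate, n)
    if not candidate_grams:
        return 0.0
    return len(ngrams(source, n) & candidate_grams) / len(candidate_grams)


# Source verse: Genesis 1:1 (KJV). The candidate passage is invented.
verse = "In the beginning God created the heaven and the earth"
candidate = ("The preacher reminded us that in the beginning God created "
             "the heaven and the earth, and so must we begin.")

print(len(shared_ngrams(verse, candidate)))
print(round(overlap_score(verse, candidate), 2))
```

Exact n-gram matching like this misses paraphrase, OCR errors, and spelling variation. Handling those cases is a large part of what the more sophisticated tools are for.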

Kalani Craig, 2025