Week 4: Text Echoes
- Week 4 Overview
- Week 4 Reading and Discussion
- Week 4 Lab: AntConc, Voyant, and other corpus-linguistics tools
Week 4 Overview
Textual Echoes
One of the earliest computational treatments of humanities text came in the
form of corpus linguistics. In a nutshell, corpus linguistics explores specific
linguistic features across a corpus of representative texts to help us
understand how particular elements of a language appear in context.
Reading: We have two shorter, complementary readings: one on what corpus
linguistics is and how it has been used in the past, and one that shows
corpus linguistics at work in a digital history project. See Week 4
Reading and Discussion for guiding questions and links to the readings.
Lab: We’ll work with AntConc, a focused corpus-linguistics program for Mac
or PC that lets you explore a corpus on your own computer. You’ll compare
AntConc’s output with Voyant’s output for the same corpus. We also have two
optional labs from The Programming Historian, one intermediate and one
advanced. See Week 4 Lab: AntConc, Voyant, and other corpus-linguistics
tools for a full at-home walkthrough. This week’s lab is designed for you to
work through at home and then troubleshoot and discuss in class; the reading
is shorter to make time for that.
Collaborative data management: No student presentation this week.
Week 4 Reading and Discussion
Theory and Methods Reading: Chapter 33, “The History of Corpus
Linguistics,” from The Oxford Handbook of the History of Linguistics,
https://doi.org/10.1093/oxfordhb/9780199585847.013.0034 (full text
available online)
Exemplar Reading: Lincoln Mullen, America’s Public Bible
(https://americaspublicbible.supdigital.org/):
- Preface
- Introduction: Commenting on America’s public Bible
- Methods: The how and the why of finding biblical quotations
Discussion:
- Does corpus linguistics readily support answers to questions about change and continuity over time without any methodological adaptations? For instance:
  - How does the construction of corpora differ between linguists and historians using corpus linguistics?
  - How does the use of keyness and other statistical measures shape linguistic research vs. historical research?
  - How effective is corpus linguistics as a methodological approach to the exploration of change and continuity over time when we are working with languages outside of the Western European world and/or multilingual projects?
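To ground the keyness question, here is a minimal Python sketch of one common keyness measure, the log-likelihood (G²) statistic, which compares how often a word occurs in a target corpus against a reference corpus. This is an illustration of the general technique, not necessarily the exact measure or settings AntConc or Voyant use by default, and the counts in the usage line are hypothetical.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) keyness score for one word across two corpora.

    freq_target, freq_ref: occurrences of the word in each corpus
    size_target, size_ref: total number of tokens in each corpus
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected counts if the word were spread evenly across both corpora,
    # in proportion to each corpus's size.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    # An observed count of zero contributes nothing (x * ln(x) -> 0 as x -> 0).
    if freq_target > 0:
        g2 += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * g2

# Hypothetical counts: a word appears 50 times in a 10,000-token target corpus
# and 10 times in a 10,000-token reference corpus.
print(round(log_likelihood(50, 10_000, 10, 10_000), 2))  # → 29.11
```

The score says how surprising the difference is, not which corpus uses the word more; for that, compare the relative frequencies (here 50 vs. 10 per 10,000 tokens). As a rough guide, a G² above 3.84 corresponds to p < 0.05 for a single comparison, but a keyword list tests thousands of words at once, which is one reason the choice of measure and cutoff can shape what a researcher ends up treating as "key."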
Further resources
- Jo Guldi, The Dangerous Art of Text Mining: A Methodology for Digital History, Cambridge University Press (2023). https://doi.org/10.1017/9781009263016. Editorial note: it is fabulous, but it is too long to read alongside an at-home tutorial. If you are going to take on a text-mining project, this is a must-read. If you are not, we’ll still come back to Chapter 1 (“Why Textual Data From the Past is Dangerous”) in Week 12 when we talk about LLMs and AI.
- Quinn Dombrowski, Multilingual DH, https://quinndombrowski.com/projects/multilingual-dh/
Week 4 Lab: AntConc, Voyant, and other corpus-linguistics tools
Quick overview
Everyone should:
- Download AntConc and follow the “Corpus Analysis with Antconc” tutorial by Heather Froehlich at https://programminghistorian.org/en/lessons/corpus-analysis-with-antconc
- Upload the movie-review corpus from Froehlich’s AntConc tutorial to Voyant (https://voyant-tools.org/) and use Voyant’s keyword-in-context (KWIC) view in the lower-right-hand corner to get a feel for the different user-interface choices the Voyant and AntConc teams made
- Use Voyant’s tool guide (https://voyant-tools.org/docs/#!/guide/tools) to find the Voyant counterparts of the features you used in your AntConc tutorial
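If you are curious what a KWIC view is doing under the hood, here is a minimal Python sketch of a keyword-in-context concordance. It assumes a very simple tokenizer (lowercase, letters and apostrophes only) and a fixed window of words on each side; AntConc and Voyant use their own, more configurable tokenization, so their output will not match this exactly. The `kwic` and `tokenize` functions and the sample sentence are hypothetical, written only for illustration.

```python
import re

def tokenize(text):
    # Simplifying assumption: a token is a run of letters or apostrophes,
    # lowercased; punctuation and numbers are discarded.
    return re.findall(r"[a-z']+", text.lower())

def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits

sample = ("The plot was dull, but the acting was brilliant. "
          "A dull script can sink even a brilliant cast.")
for left, node, right in kwic(sample, "dull"):
    # Right-align the left context so the keyword lines up in one column,
    # the way concordance tools display it.
    print(f"{left:>25} | {node} | {right}")
```

Notice that this naive version lets context run across sentence boundaries ("was brilliant a dull"); deciding whether to allow that is one of the interface and tokenization choices worth comparing between AntConc and Voyant.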
UPDATED: Struggling with the user interface? Courtesy of REDACTED, here’s the most
recent YouTube tutorial from Laurence Anthony, AntConc’s lead (and only)
programmer: https://youtu.be/_GSlwIO5QZE
These optional Programming Historian lessons cover other aspects of corpus
linguistics at the intermediate and advanced levels:
- Intermediate: https://programminghistorian.org/en/lessons/detecting-text-reuse-with-passim
- Advanced: https://programminghistorian.org/en/lessons/text-mining-with-extracted-features