As with last week, we’ll be focusing on some of the “What’s next” in digital history. Unlike last week, we’ll be looking at a fairly well-established idea of what’s next: machine learning, in which humans give a machine a task and a dataset, and the task has built-in loops where the machine rewrites some of what it’s expected to do or look for. Auto-image-detection is fairly new in digital humanities, but machine learning has been around for a while, so we’ll tackle the image-analysis-as-new aspect of AI with machine learning rather than with generative AI like Bing or MidJourney.
Reading: Our independent reading looks at how image-based machine learning works from both a technical perspective (Arnold) and a historian’s cultural/analytical perspective (Tilton).
Lab: Our lab this week uses Google Colab and a computer-vision machine learning task to classify newspaper advertisements. The computer vision here combines textual and visual analysis, so it’s a nice bridge from the largely textual focus we’ve had so far to a more image-focused analytical process. **We’ll also tackle AI ethics using this machine-learning tutorial. Keep an eye out for that at the end of the lab.**
Go back to the Programming Historian tutorial and look at the 2 paragraphs on “Transfer Learning”, which says: a machine-learning process trained on one dataset can then “transfer” the classification strategies it “learned” to the training of a new, unfamiliar dataset.
OPTIONAL BUT REALLY GREAT: For a long-watch that ties the technical development of LLMs to the ethics of LLM use, check out Emily Bender (a famous computational linguist) on “ChatGP-why: When, if ever, is synthetic text safe, appropriate, and desirable?”.
This talk is about Bender’s take on uses of ChatGPT and does a great job of:
I recommend that you watch this at some point soon and save this for your own records.
The key to understanding new and emerging technologies is to find an analog in what you already know. Here, we want to think about pixels.
A pixel in a photograph, stripped to its most essential characteristic, is a single block surrounded by 8 other blocks:
1 | 2 | 3 |
---|---|---|
4 | PX | 5 |
6 | 7 | 8 |
Each pixel in a photograph has a red/green/blue numeric value that, mixed together using additive light-based color theory, displays a color that is visible to a human being. THIS GREEN COLOR, for instance, has a Red value of 6 (out of 255), a Green value of 255, and a Blue value of 6. Because these are numbers, computer vision can compare the numbers for each pixel and its surrounding pixels one by one and use that in a giant dataset to look for patterns that are visible in numeric form to the computer and in visual light-based-color form to the human eye.
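If it helps to see that comparison as code, here is a minimal sketch in plain Python. The tiny 3x3 image, the white background, and the helper names `neighbors` and `color_distance` are all made up for illustration; a real computer-vision model works with far more pixels and learns its own comparisons rather than using a hand-written one like this.

```python
# Each pixel is a (Red, Green, Blue) tuple, with each value from 0 to 255.
GREEN = (6, 255, 6)        # the green described above
WHITE = (255, 255, 255)    # illustrative background color (an assumption)

# A hypothetical 3x3 image: the center pixel plus its 8 surrounding pixels.
image = [
    [WHITE, WHITE, WHITE],
    [WHITE, GREEN, WHITE],
    [WHITE, WHITE, GREEN],
]

def neighbors(img, row, col):
    """Return the (up to 8) pixels that surround img[row][col]."""
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the pixel itself
            r, c = row + dr, col + dc
            # Pixels on the edge of the image have fewer neighbors.
            if 0 <= r < len(img) and 0 <= c < len(img[0]):
                found.append(img[r][c])
    return found

def color_distance(a, b):
    """Add up how far apart two pixels are on each color channel."""
    return sum(abs(x - y) for x, y in zip(a, b))

center = image[1][1]
around = neighbors(image, 1, 1)
distances = [color_distance(center, px) for px in around]

# A distance of 0 means "same color as the center pixel";
# a large distance means a sharp change, such as an edge.
print(distances)
```

The point of the sketch is only that "looking at" a pixel and its neighbors is arithmetic on numbers: the one neighbor that shares the center's green comes out with a distance of 0, and every white neighbor comes out with the same large distance.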
Then read Taylor Arnold, Lauren Tilton, Justin Wigard. “Understanding Peanuts and Schulzian Symmetry: Panel Detection, Caption Detection, and Gag Panels in 17,897 Comic Strips Through Distant Viewing.” Journal of Cultural Analytics, vol. 8, no. 3, Sept. 2023, https://doi.org/10.22148/001c.87560.
As you read, consider two overview concepts and track 3 related specific things in the article.
As you read, we’ll use prompts that are similar to last week.
This week, our lab will focus on the technical steps to classify advertisements in a newspaper using machine learning: Daniel van Strien, Kaspar Beelen, Melvin Wevers, Thomas Smits, and Katherine McDonough, “Computer Vision for the Humanities: An Introduction to Deep Learning for Image Classification (Part 1),” Programming Historian 11 (2022), https://doi.org/10.46430/phen0101 .
You may or may not have the facility to do this as a hands-on tutorial, but you should all read this at least once.
For everyone
Giles Bergel et al and their library of visual computational analysis: https://www.robots.ox.ac.uk/~vgg/software/
If you want to try using Google Colab WHICH WORKS THIS WEEK WITH MINOR CHANGES | If the thought of customizing Google Colab or installing a new piece of software makes you itch
---|---
There are 2 places where you’ll get errors, both of which have cells that contain this code:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn')
Replace everything in those cells with the following, then run them:
!pip install seaborn
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns  # import the seaborn library
plt.style.use('seaborn-v0_8')  # set the seaborn style
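The error comes from a rename: newer versions of Matplotlib call this style `seaborn-v0_8` rather than `seaborn`. If you would rather not depend on which version Colab happens to have installed, the sketch below picks whichever name is available. The helper names `pick_style` and `apply_seaborn_style` are hypothetical (they are not part of the tutorial or of Matplotlib), and falling back to Matplotlib's `default` style is an assumption about what you would want if neither name exists.

```python
def pick_style(available, preferred=("seaborn-v0_8", "seaborn")):
    """Return the first preferred style name found in `available`.

    `available` is a list of style names, such as plt.style.available.
    Falls back to "default" (an assumption) if none of the names match.
    """
    for name in preferred:
        if name in available:
            return name
    return "default"

def apply_seaborn_style():
    """Apply whichever seaborn-like style this Matplotlib version offers."""
    import matplotlib.pyplot as plt  # imported here so pick_style works on its own
    plt.style.use(pick_style(plt.style.available))
```

In a Colab cell, you would keep the `%matplotlib inline` line and then call `apply_seaborn_style()` in place of the `plt.style.use(...)` line.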
This site built with Foundation 6. Kalani Craig, 2025