Week 12: Text and AI

Week 12 Overview
Week 12 Reading and Discussion
Week 12 Lab: Using Generative AI
Collaborative Data Week 12 Wells

Week 12 Overview

Text, Large Language Models and AI

Digital historians get two kinds of questions. The first, “Can you help me with my project?”, aligns well with the consultative and advising model we’ve been learning all semester. The second, “What’s next?”, will be the focus of Weeks 12 and 13. Because AI is the bleeding-edge technology right now, and because it is a complicated subject with both technical and ethical dimensions, we’ll spend two weeks using AI as a model for how to evaluate and engage with the emerging technologies you’ll encounter in the future.

Reading: Our independent reading will break down the ways that Large Language Models (LLMs) have developed over time, so that you can practice using your existing knowledge as a way to understand new technologies.

Lab: Our lab this week walks through the technical constraints and steps involved in actually using a GPT model (GPT-2, several generations old at this point), so that you have the resources to run generative-AI approaches in Google Colab (or on your computer at home, if it is powerful enough). This week’s lab is designed for you to explore at home and then troubleshoot and discuss in class. Note that the reading is designed to make the lab more understandable.

Collaborative data management: NAME REDACTED

Teaching with/in the age of AI: José Antonio Bowen and C. Edward Watson, Teaching with AI: A Practical Guide to a New Era of Human Learning. https://iucat.iu.edu/catalog/20645687

Week 12 Reading and Discussion

What even is generative AI? How does it work?

The key to understanding new and emerging technologies is to find an analog in what you already know. Step back briefly into the world of distribution analysis (topic modeling) and keywords in context (corpus linguistics). Anchor yourself in how word distribution works and how it is useful for finding trends.
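As a refresher on that anchor, here is a minimal sketch of the two ideas in plain Python: counting how words are distributed in a text, and pulling out keywords in context. The sample text and the function names are invented for illustration; they do not come from the reading or the lab.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_distribution(tokens, top_n=5):
    """Count how often each word occurs and return the most common ones."""
    return Counter(tokens).most_common(top_n)


def keywords_in_context(tokens, keyword, window=3):
    """Return each occurrence of keyword with `window` words on either side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Hypothetical sample text, invented for illustration only.
sample = (
    "The railroad reached the town in spring. "
    "The town council praised the railroad, "
    "and the railroad changed the town."
)

tokens = tokenize(sample)
print(word_distribution(tokens, top_n=3))
for line in keywords_in_context(tokens, "railroad", window=2):
    print(line)
```

The counts tell you which words dominate the text; the context lines tell you what those words are doing there. Keep both in mind as you read about how language models treat words.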

Then read Andreas Stöffelbauer, “How Large Language Models work: From zero to ChatGPT,” in the “Data Science at Microsoft” series on Medium, https://medium.com/data-science-at-microsoft/how-large-language-models-work-91c362f5b78f

As you read, consider two overview concepts and track three related details in the article.

Details
- what makes sense based on your existing training as a digital historian and on your existing technical skillset (which varies for each one of you)?
- what doesn’t make sense or is worded in such technical language that you get lost?
- where do the analogies and simplifications the author provides help you navigate the parts that don’t make sense?
Overview concepts
- How do you become a translator for other people who are less well-trained in digital history?
  - We each have to find our own idiom when we’re describing how digital history works and helping other people who aren’t trained in any aspect of digital history encounter and assess new tools.
- What can you learn from how Stöffelbauer uses analogies to create your own?
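One possible analogy of your own, building on word distribution: a large language model is, at its core, trained to predict the next word. The sketch below is a toy stand-in for that idea, not how GPT models are actually built. It only counts which word follows which in a tiny invented text, then generates new text from those counts. All names and the sample sentence are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_bigram_counts(tokens):
    """Map each word to a Counter of the words that follow it."""
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def next_word_probabilities(following, word):
    """Turn the follow-up counts for `word` into probabilities."""
    counts = following.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def generate(following, start, length=8, seed=0):
    """Generate text by repeatedly sampling a likely next word."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    words = [start]
    for _ in range(length):
        counts = following.get(words[-1])
        if not counts:
            break  # the last word was never followed by anything
        choices, weights = zip(*counts.items())
        words.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(words)


# Hypothetical training text, invented for illustration only.
text = "the archive holds the letters and the archive holds the maps"
tokens = text.split()

following = build_bigram_counts(tokens)
print(next_word_probabilities(following, "the"))
print(generate(following, "the"))
```

The toy model only ever looks at one previous word and a dozen words of training text. A useful exercise for finding your own idiom is to describe, in plain language, what changes when the context grows much longer and the training text much larger.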

Week 12 Lab: Using Generative AI

Lab background

This week, our lab will focus on the technical steps to get a generative-AI GPT model working: Chantal Brousseau, “Interrogating a National Narrative with GPT-2,” Programming Historian 11 (2022), https://doi.org/10.46430/phen0104 .

You may or may not have the facility to do this as a hands-on tutorial, but you should all read this at least once.

For everyone

Read. Carefully. Twice.
Try to make as much sense of the analytical and speculative-text portions as possible.
Consider how you will handle emerging technology in your future:
1. What are your motivations for using new technology?
2. What do you envision your role will be as a collaborator when you leave IU, and how does that help you think about framing yourself as some sort of technologist?
3. After reading about how generative AI functions, how do you feel about the use of generative AI? (We’ll dedicate a full week to generative-AI ethics in Week 13.)

If you want to try using Google Colab or Miniconda

Read. Carefully. Go all the way through once before trying the hands-on version.
Search for a GPT-2 Google Colab notebook and check whether its commands are similar enough to those in the second half of the tutorial that you can use the tutorial and the notebook together to get your environment set up and running without errors.

If the thought of customizing Google Colab or installing a new piece of software makes you itch

Read. Carefully. Twice.
Pay special attention to the commands themselves, the order in which they are presented, and how the syntax works.
Focus on the fact that you are learning a second language, not learning a computer-specific skill.

Collaborative Data Week 12

NB: This week was claimed by a student, who led us in extracting metadata from a series of newspaper articles in order to set up a text-analysis dataset.

Fall 2024 H699 Week 12