Content
The explosion of digital text represents an unprecedented opportunity for social
research! This course surveys text pre-processing and text analysis at three levels: tokens, documents, and corpora. Topics include:
- Pre-Processing: cleaning, pre-processing, exploratory analysis
- Analyzing Tokens: vectorization, word embeddings, tagging and naming
- Analyzing Documents: text classification, sentiment analysis
- Analyzing Corpora: latent topics, semantic networks
Students will examine how text analysis is used to conduct social science. They will also apply
text analysis to social science questions. Special attention will be given to generative AI and the
ways it is rapidly augmenting text analysis.
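As a rough illustration of the first two topics (pre-processing and vectorization), the sketch below cleans two short texts and turns them into a document-term count matrix. It is not taken from the course materials: the example sentences, the tiny stop-word list, and the function names `preprocess` and `vectorize` are all assumptions made up for this illustration, and it uses only the Python standard library.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "for"}


def preprocess(text):
    """Lowercase the text, keep runs of letters as tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def vectorize(docs):
    """Return a sorted vocabulary and a document-term count matrix."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)  # missing words count as 0
        matrix.append([counts[w] for w in vocab])
    return vocab, matrix


# Made-up example documents
docs = [
    "The economy is growing.",
    "Voters worry about the economy and jobs.",
]
vocab, matrix = vectorize(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each row of the matrix corresponds to one document and each column to one word in the vocabulary, which is the kind of representation that document-level methods such as classification typically start from.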
Audience
This course is designed for IDAS students; students in IMAS or other graduate programs
are also welcome to join.
Requirements
No prerequisites and no prior coding experience are required. Sample code will be provided, and as your instructor, I will walk you through each exercise, step by step. No fear! Let’s start coding!
Course Materials
- Syllabus – Text Analysis for the Social Sciences
- Module 1 – Text Analysis
- Module 2 – Latent Topic Modeling
- Module 3 – Machine Learning
- Module 4 – Semantic Networks
YouTube Tutorials
- Coding Task #1.1: Pre-processing
- Coding Task #1.2: Sentiment Analysis
- Coding Task #1.3: Keyword Analysis
- Coding Task #2.1: Latent Topic Model
- Coding Task #2.2: Comparative Analysis using Latent Topics
- SVM Classification of Offensive Tweets
- Semantic Network Analysis of a Bipartite Network
Semesters Taught
Spring 2024 – National Chengchi University
