We have curated a corpus of more than 1.25 million articles from over 10 Kazakhstani news outlets. Spanning Kazakh, Russian, and English, this dataset offers detailed insight into the politics, economics, and culture of Central Asia. Our lab is currently seeking collaborators to co-author high-impact research papers using this data.
Corpus At-a-Glance
- Volume: 1.25M+ articles.
- Sources: 10+ major sites (state-run and private).
- Languages: Russian, Kazakh, English.
- Scope: Comprehensive coverage of politics, society, culture, and economics.
Project Resources
Supported Projects
- Qandy Qantar – Nygmet Ibadildin
- Russian loanwords in Kazakh-language legal texts – Rakhiya Toxanbayeva
- Eurasian Regional Integration – Anar Shaikenova
- Children's Mental Health – Aigerim Mussabalinova
- Kazakhstani fighters in the Ukraine War – Dana Nurgazinova
- Ethnic minorities in Kazakhstan – Niamh Friel
- Political economy of foreign rents – Assylzat Karabayeva
Text Analysis and Statistical Hypothesis Testing
- Text Analysis Methods
  - topic modeling
  - sentiment analysis
  - word embeddings and large language models (LLMs)
  - semantic networks
- Statistical Hypothesis Testing
  - tests of differences in the proportions of topic frequencies, keyword frequencies, or sentiment levels across groups
  - regression analyses of topic frequencies, keyword frequencies, or sentiment levels that include groups as well as other covariates
  - time series of topics or keywords to show trends over time or contrast trends across groups
- Additional Resources
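To make the first kind of hypothesis test concrete, here is a minimal sketch of a two-sided two-proportion z-test, for example comparing how often a keyword appears in state-run versus private outlets. It is an illustration only, not our lab's actual pipeline: the function name `two_proportion_z_test` and the article counts are made up for the example, and the test assumes reasonably large, independent samples.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    hits_a / n_a: articles containing the keyword / total articles in group A.
    hits_b / n_b: the same counts for group B.
    Returns (z statistic, two-sided p-value).
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: keyword found in 120 of 1,000 state-run articles
# and in 90 of 1,000 private-outlet articles.
z, p = two_proportion_z_test(120, 1000, 90, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The same comparison extends to topic proportions or to the share of articles above a sentiment threshold. When covariates such as outlet, language, or year also matter, a regression model is usually the better choice.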
How to Collaborate
Our lab offers a “front-end/back-end” collaboration model designed to strengthen your manuscript and prepare it for submission to a top-tier journal:
- Your Role on the Front-End
  - Define the research question.
  - Provide the theoretical framework.
  - Draft the “front-end” of the paper (Intro, Lit Review, Theory: approx. 3,000 words).
- Our Role on the Back-End
  - Query articles from the corpus.
  - Consult with you to design a research method and select measures.
  - Perform text analysis.
  - Draft the “back-end” (Methodology, Data, Analysis: approx. 3,000 words).
If you would like to collaborate, we propose these first few steps:
- Contact Dr. Reidhead by email at reidhead@g.nccu.edu.tw. Briefly mention your topic and desired research outcome (e.g., journal article, MA thesis, chapter in an edited volume). We will then schedule a time to discuss your project.
- We will create a project folder for you. If the first step involves keyword searches, we will ask you to specify your keywords and search parameters. We will pull the relevant articles for you and save them to your folder.
- If you want to proceed qualitatively from there, then you can move forward on your own. If you want to employ additional quantitative text analysis methods, we can consult again and decide on next steps.
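To help you think about the keywords and search parameters we will ask for, here is a rough sketch of what a keyword search might look like. Everything in it is assumed for illustration: the `Article` fields, the `search_articles` function, and the sample records are hypothetical and do not describe the real corpus schema or our actual tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    # Hypothetical record layout; the real corpus may differ.
    outlet: str
    language: str
    published: date
    text: str


def search_articles(articles, keywords, start=None, end=None, language=None):
    """Return articles that contain any keyword and satisfy the optional filters."""
    lowered = [k.lower() for k in keywords]
    matches = []
    for article in articles:
        if language is not None and article.language != language:
            continue
        if start is not None and article.published < start:
            continue
        if end is not None and article.published > end:
            continue
        text = article.text.lower()
        # Plain substring matching, so a stem such as "protest" also
        # matches "protests"; it can also over-match unrelated words.
        if any(k in text for k in lowered):
            matches.append(article)
    return matches


# Made-up sample records.
sample = [
    Article("outlet_a", "en", date(2022, 1, 10), "Protests began in Almaty over fuel prices."),
    Article("outlet_b", "ru", date(2022, 1, 12), "Протесты продолжились в Алматы."),
    Article("outlet_a", "en", date(2021, 6, 1), "The economy grew last quarter."),
]

results = search_articles(
    sample,
    keywords=["protest", "протест"],
    start=date(2022, 1, 1),
    end=date(2022, 1, 31),
)
for article in results:
    print(article.outlet, article.language, article.published)
```

In practice it helps to list keywords in each language you want covered, along with a date range and any outlet or language restrictions. Kazakh and Russian are heavily inflected, so word stems often work better than full word forms.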
