Dartmouth College is an elite R1 university that promotes critical thinking, creativity, and collaboration. Students in Dartmouth's Program in Quantitative Social Science (QSS) apply statistical, computational, and mathematical tools to social science questions. The University of California at Berkeley is a top-ranked public university with exceptional support for computational research collaborations between faculty and students through the Discovery program, the D-Lab, BIDS, and the Data Science major.
The number of journals and articles published has increased enormously over the past 20 years, making it increasingly difficult for scholars to keep up with the literature. Research in many academic fields requires reviewing the literature to document what we know about a phenomenon and what gaps exist in our knowledge. We are developing a flexible and reproducible method to review academic literature that takes advantage of massive online collections containing nearly all articles published in academic journals (e.g., JSTOR, MathSciNet, Web of Science, MEDLINE). The goal is to harness computers to review the entire corpus of published literature by charting engagement with specific theories or topics over time and across subfields. This computational method stands in sharp contrast to the time-honored practice of human reading, which can cover only a small fraction of the published corpus.
Professor Heather Haveman (UC Berkeley) and Dr. Jaren Haber (Georgetown University) are analyzing hundreds of thousands of academic articles gathered from JSTOR, the leading online repository of journal articles for the social sciences. Specifically, we are developing a method to construct, validate, and apply dictionaries, which are lists of concepts (unigrams, bigrams, and trigrams) related to a specific theory or topic. Our method draws on inductive computational text analysis, specifically word-embedding models (Word2Vec) and hierarchical clustering. Will you join us?
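To give a sense of what applying a dictionary involves, here is a minimal Python sketch that counts dictionary terms (unigrams, bigrams, and trigrams) in article text and charts the rate of engagement by year. It is an illustration only, not the project's actual pipeline: the dictionary terms, the toy articles, and all function names are hypothetical, and it skips the Word2Vec and hierarchical clustering steps used to build and validate real dictionaries.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def dictionary_hits(text, dictionary):
    """Count occurrences of dictionary terms (1- to 3-grams) in one text.

    Assumption: naive lowercase/whitespace tokenization; a real pipeline
    would also strip punctuation and handle overlapping terms deliberately.
    """
    tokens = text.lower().split()
    hits = 0
    for n in (1, 2, 3):
        hits += sum(1 for gram in ngrams(tokens, n) if gram in dictionary)
    return hits


def engagement_by_year(articles, dictionary):
    """Return dictionary hits per 1,000 tokens for each year.

    `articles` is a list of (year, text) pairs.
    """
    hits = defaultdict(int)
    tokens = defaultdict(int)
    for year, text in articles:
        hits[year] += dictionary_hits(text, dictionary)
        tokens[year] += len(text.split())
    return {y: 1000 * hits[y] / tokens[y] for y in sorted(hits) if tokens[y]}


# Hypothetical dictionary and toy corpus, invented for this example.
TOY_DICTIONARY = {"legitimacy", "institutional logic", "taken for granted"}
TOY_ARTICLES = [
    (1990, "firms seek legitimacy from peers"),
    (2000, "an institutional logic is taken for granted and confers legitimacy"),
]

if __name__ == "__main__":
    for year, rate in engagement_by_year(TOY_ARTICLES, TOY_DICTIONARY).items():
        print(year, round(rate, 1))
```

Normalizing by token count rather than reporting raw hits keeps years with more published text from looking artificially more engaged with a theory.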