Eviction research has become a prominent topic over the past decade, highlighting the precarious conditions that low-income households face as costs rise. The consequences of an eviction are dire, impacting a family’s future housing options, health, economic well being, and generational outcomes (Benfer et al. 2021; Franzese 2018; Gaetz, Ward, and Kimura 2019), requiring an in-depth study on how these events perpetuate social stratification and relate to other social phenomena.
Research in this field has been limited due to the dearth of reliable data. Institutions like Princeton’s Eviction Lab and the Legal Services Corporation have attempted to collect roughly half of available eviction data across the country, however they rely primarily on third-party providers and courts that have prepared readily-curated data (Ashley Gromis et al. 2022; Bernstein and Youngren 2021; Eviction Lab 2018). This leaves massive gaps in information largely due to the fact that most eviction data are buried within court documents. The solution to this problem requires a data science approach to scrape and mine the text of remaining court documents and advance research with tools such as demographic estimation to determine disparities and record linkage across other datasets to predict the outcomes of eviction.
In 2019, The Evictions Study was established to fill this gap to collect a full count of eviction data using data science tools like Natural Language Processing to mine electronically scanned court records and Bayesian demographic estimation to determine racial, ethnic, and gender differences in who is evicted. Currently, the Evictions Study has accumulated 33 states of data that require extensive curation to determine the rates and trends of eviction across different states, accounting for unique local political landscapes, with the goal of updating these data and creating a panel dataset linked to other administrative data to advance scholarly research and inform policy change.
We have developed a pilot coding pipeline, mostly in R and part in python, within several GitHub repos and we need help in three areas:
(1) At a ground level, we need help improving this pipeline, which involves web scraping court records and data, analyzing the nuances of each state's eviction condition, cleaning data, estimating demographics, and profiling each state's eviction situation.
(2) At a more advanced level, we need help improving our NLP process, machine learning analysis, and record linkage to determine drivers of eviction at the households and neighborhood level.
(3) We also need help in public communication by improving our website evictions.study to host new maps and reports and also improving our visualizations and maps.