Emma Lurie, a third-year Ph.D. student in UC Berkeley’s School of Information, found her academic calling in the political campaigns waged during the 2016 elections. Her relatives and other people she knew were deluging her with political emails that, to her, were blatantly untrue.
“That made me wonder who started the emails in the first place and why did the recipients think they were true,” said Lurie, who specializes in data science and information policy.
She describes her field of study as “online civic infrastructure,” which includes how voters find information and the role that online platforms play in recommending information to voters. Lurie is looking into the “gray area” of misinformation and how it travels through the population.
A week before the 2020 California state election, Lurie said she heard about a rumor that Google was favoring certain sides of issues the company was interested in, specifically Prop. 22, which exempts app‑based transportation and delivery companies from providing employee benefits to certain drivers, and Prop. 24, which expands the state’s consumer data privacy laws and allows consumers to direct businesses to not share their personal information. Both measures passed.
Based on the rumor, Lurie began collecting pages of Google results and looking through the search results to see if bias could be identified. “After 2016, it seemed like the right thing to do,” she said.
The Evaluating Bias on Election-Related Google Search Results project is now moving along and Lurie is getting data science support from a group of students in Berkeley’s Data Science Undergraduate Studies through the Data Science Discovery program.
For Vyoma Raman, a third-year student majoring in computer science and interdisciplinary studies with a minor in data science, the project was an opportunity to tackle a qualitative social science question by applying computational methods. In this case, she used topic modeling to glean different themes and recurring ideas from the search results, then cluster those snippets together and sort the text into different groups.
Raman used this approach to look at two propositions: Prop. 15, which would have increased funding for K‑12 public schools, community colleges and local governments by requiring owners of commercial and industrial property to pay taxes based on the current value instead of purchase price; and Prop. 16, which would have repealed Prop. 209, passed in 1996, and allowed diversity to be considered as a factor in public employment, education and contracting decisions. Both measures failed.
For Prop. 15, the search results largely fell into two distinct groups, either increased funding for schools or higher taxes. “I saw the most interpretable results for this proposition and they fit with what I saw as the reality of the election,” Raman said. She is still working on exploratory analysis of the search results surrounding Prop. 16.
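To give a sense of what sorting search snippets into groups can look like in code, here is a minimal sketch. It is not the team's pipeline: it swaps in a simple k-means-style clustering over word counts as a stand-in for topic modeling, and the snippets, the stopword list and the function names are all invented for illustration.

```python
import math
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "an", "of", "for", "and", "to", "on", "in", "by"}

# Invented snippets, loosely modeled on the two themes described for Prop. 15.
SNIPPETS = [
    "measure increases funding for public schools and community colleges",
    "opponents warn of higher property taxes on commercial property",
    "more funding for schools and local governments",
    "higher taxes for business property owners",
]

def vectorize(text):
    """Turn a snippet into a bag-of-words count vector."""
    words = text.lower().split()
    return Counter(w for w in words if w.isalpha() and w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(snippets, k=2, max_rounds=10):
    """Group snippets into k clusters; returns one cluster label per snippet."""
    vectors = [vectorize(s) for s in snippets]
    # Seed with the first snippet, then add whichever snippet is least
    # similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = None
    for _ in range(max_rounds):
        # Assign each snippet to its most similar centroid.
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each centroid as the summed word counts of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = sum(members, Counter())
    return labels

if __name__ == "__main__":
    for label, snippet in zip(cluster(SNIPPETS), SNIPPETS):
        print(label, snippet)
```

On these made-up snippets, the two funding-themed lines land in one group and the two tax-themed lines in the other. Real search results are far messier, which is part of why interpreting the groups still takes human judgment.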
“I really like the Discovery program as I get to try things out, like computational social science, and now I realize that this is definitely what I want to do,” said Raman, who is interested in applying her data science expertise to defend human rights. “This is real world stuff and adds a dimension to my studies by showing what data science is like in practice. We’re going through so many stages of the research process.”
For the Discovery students, that starts with cleaning up the data. Unlike in the Data 8 introductory class that uses a neatly packaged dataset for teaching, Lurie gave her team a lot of messy data, as well as some potentially messy questions.
“We’re evaluating ideas of what bias is and what it means,” said Anna Gueorguieva, a third-year undergraduate student majoring in data science and legal studies. “We have to define it first and determine what it means in a mathematical sense to develop the algorithms for evaluating the bias.”
For example, is it bias if the search results take information from the California secretary of state’s proposition analysis, which is neutral, and selectively use only some of those words to imply bias on the part of the source? Gueorguieva asked. Or do the results reflect which side is buying more ads on Google? “How we define bias will lead to different results,” Gueorguieva said.
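A small sketch shows how much the choice of definition can matter. Both scoring functions below are hypothetical and are not the team's metrics. They assume each search result has already been hand-labeled "support", "oppose" or "neutral", and the example page of results is invented.

```python
# Hypothetical stance labels for one page of results, listed in rank order.
STANCES = ["oppose", "support", "support", "neutral"]

SIGN = {"support": 1, "oppose": -1, "neutral": 0}

def share_score(stances):
    """Definition A: every result counts equally.

    Returns a value in [-1, 1]; positive means more supporting results.
    """
    return sum(SIGN[s] for s in stances) / len(stances)

def rank_weighted_score(stances):
    """Definition B: higher-ranked results count more (weight = 1 / rank).

    Returns a value in [-1, 1]; positive means supporting results dominate
    once position on the page is taken into account.
    """
    weights = [1 / rank for rank in range(1, len(stances) + 1)]
    total = sum(w * SIGN[s] for w, s in zip(weights, stances))
    return total / sum(weights)

if __name__ == "__main__":
    print("equal weight:", share_score(STANCES))
    print("rank weighted:", rank_weighted_score(STANCES))
```

With the same four labels, the equal-weight score comes out positive while the rank-weighted score comes out negative, because the single opposing result sits at the top of the page. Neither number is the "right" one; each encodes a different assumption about what readers actually see.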
So far, the group has not found any evidence of intentional bias, Lurie said.
“But what we did find is that the results sometimes included summaries from the secretary of state’s election information pages,” Lurie said. “This is concerning from my point of view because this takes results from election authorities and can make them seem not neutral, which can make our democratic institutions weaker.”
The work is also important because voters don’t usually get a lot of direct information on the propositions, unlike races between political candidates.
“With low levels of information, is it more or less likely that an election can be bought? This area is very understudied,” Lurie said. “We hope to help focus more attention on how local elections are represented in search engine results.”
California’s propositions date back to the 1910s when Gov. Hiram Johnson implemented them as part of his ethical reforms. One of the first propositions, passed in 1911, gave women in California the right to vote. The ability of backers to put propositions on the ballot has been called “direct democracy.”
Lurie praised the Discovery team for their hunger to tackle difficult problems and their excitement at the opportunity to work with messy data.
“They’re a great team, they’re great at picking up new things and asking critical questions,” she said. “They have autonomy and ownership over their pieces of the project.”
But the project has involved more than just data science. Lurie has been mentoring the students in how to conduct research, assigning them two papers to read each week and discuss as a group. The students declined her offer to cut the assignment to one paper a week, she said. Lurie also holds office hours for students to discuss any issues that come up. The students are also contributing to a research paper Lurie is writing about the work.
“It’s really incredible, we’re learning so much,” Raman said. “I’m so lucky to have been working with Emma.”
Gueorguieva agreed, saying she’s learning a lot, including how to formulate research questions. “It’s also valuable because I can see what kind of mentor I want to be,” said Gueorguieva, who plans to pursue a Ph.D. and work to advance social justice using data science. “It’s been awesome.”