UC Berkeley’s fifth annual National Workshop on Data Science Education brought together national and international perspectives on teaching undergraduate data science. The workshop covered topics ranging from curriculum development to human contexts and ethics to student-led data science groups and programs.
“What you’re building at Berkeley in data science continues to be really exceptional on many levels, so it remains highly relevant, even as an established program,” said participant Julia Koschinsky, executive director of the Center for Spatial Data Science at the University of Chicago.
The conference began with sessions focused on data science at Berkeley. Presenters included John DeNero, associate dean of undergraduate studies for the Division of Computing, Data Science, and Society (CDSS), who explained the structure of the Data Science Undergraduate Studies program and its signature Data 8 course. In another session, recent Berkeley graduate Carlos Ortiz shared how he and student Ciara Acosta worked to remove barriers to entry for students underrepresented in STEM fields through the Data Scholars program and successfully scaled its “Foundations” course from 30 to over 100 students in one academic year.
“We at CDSS have put a ton of time and energy into making our materials shareable and engaging other undergraduate educators on the topics, and people have really responded to that,” said Eric Van Dusen, outreach and technology lead for the Data Science Undergraduate Studies program and the workshop organizer.
Held from June 27 to July 1, the workshop was carried out in a hybrid format for the first time, with in-person presentations and training on the first two days, online panels on the next two days and technology demonstrations on the final day.
Inviting diverse perspectives on teaching data science
Data 8, the foundational course for Berkeley’s data science major, has been widely adopted at other institutions. In the “International Adopters of Data 8” session, panelists spoke of their experience teaching versions of the course in France, Colombia, Germany and South Africa.
In addition to shedding light on the challenges and opportunities inherent in adapting Data 8 to different academic and cultural settings, the presentations reflected the spirit of collaboration evident throughout the workshop. One presenter, Camilo Andrés De la Cruz Arboleda, a research professor at Universidad Externado de Colombia, has been working with his team to translate the open-source Data 8 lab materials into Spanish, making the course more accessible for many students worldwide. And Paul van Staden of the University of Pretoria in South Africa thanked Berkeley’s Deborah Nolan, associate dean for faculty for CDSS, for twice visiting his university and helping to develop its data science program.
Presentations from NSF Big Data Hubs focused on building data science programs in the United States. Moderated by Renata Rawlings-Goss, executive director of the South Big Data Hub based at Georgia Institute of Technology, the group provided a zoomed-out view of their work over the past five years, examining how data science programs have developed across the country. And Florence Hudson, executive director of the Northeast Big Data Hub at Columbia University, moderated a panel on increasing capacity for student-driven data science communities nationwide.
Two panels with representatives from Tuskegee University shared the university’s work to develop a student-driven data science program and its new, multi-year partnership with Berkeley.
Claudia von Vacano, executive director of Berkeley’s D-Lab, moderated a panel on diversity and inclusion in data science that presented research from the NSF-funded project “Undergraduate Data Science at Scale,” for which D-Lab’s David Harding serves as the principal investigator. Working with colleagues at Mills College and the University of Maryland, Baltimore County, the project examines how successfully the Berkeley data science model – as implemented at Berkeley and other universities – serves students with identities or backgrounds underrepresented in STEM fields. Panelists presented their findings and recommendations for improving the classroom and data science community experience for underrepresented students.
Building community with far-reaching impact
Throughout the conference, attendees and panelists from community colleges, public and private universities, and even high schools engaged in conversations about data science pedagogy.
“I’ve never experienced this radical extent of actually being serious about inviting all of us to join in,” said Koschinsky, who attended virtually last year and in person this year. “I don’t think I’ve ever talked to so many people during breakfast, lunch, coffee breaks and evenings at an academic event. It was basically four days of 12 hours of engagement where everyone’s on fire.”
In her presentation on CDSS, Jennifer Chayes, associate provost for the division, underscored the importance of collaboration between educators and between institutions, emphasizing that the collective work of educators is what achieves meaningful impact at scale.
Participant George Avirappattu, associate professor of mathematics at Kean University, noted the tremendous resources pooled together in the workshop and its worldwide following among interdisciplinary faculty and staff. “I am overwhelmed by what I am seeing and learning and how many people I am meeting. I want to channel as much as I can back to my colleagues in New Jersey,” he said.
For more information and to view session recordings from this year’s event, visit the workshop’s website.