September 20, 2018

Now that the Fall term has begun to settle down, I would like to share a snapshot of the activities in and around the Division of Data Sciences reflecting the connectedness of research, learning, and community at Berkeley.

In the next few weeks, students will be able to declare the Data Science major approved last Spring for the College of Letters and Science. An experienced advising team is in place, working through 780 student “pre-declaration” surveys and training a host of peer advisors. They have started seeing the more senior students to get them ready to declare, while working towards the many freshman intendeds. A multidisciplinary faculty Governance Committee is finalizing policy decisions and processes, while many departments (over 45 in total) are working hand-in-hand with the Division to formulate documents of shared understanding that serve all our students. During the coming year, we expect a minor to complement the major, along with our graduate offerings.

In many ways the major is the tip of the iceberg, so let me share more of what’s beneath. Data 8, Foundations of Data Science, opened the term in Zellerbach Hall to give all interested students a chance to see what it is about and to consider critical thinking through data. While the level of interest was even greater than what we could support, Data 8 has settled to 1,300 students from most of the majors across campus, experiencing a blend of large lecture in Wheeler, online experience, and 46 small hands-on labs. Taught by David Wagner and Ramesh Sridharan, plus a team of 38 Undergraduate Student Instructors, 45 tutors, and over a hundred student assistants, it provides a welcoming learning environment despite its size.

Last year, 2,243 students completed Data 8, thanks to Ani Adhikari, John DeNero, summer instructors, and amazing student teams. Like the campus, Data 8 continues to draw a rich and broad demographic, majority female. A plurality of its students, 30%, are heading to Social Sciences majors and 11% to Business. Dozens of Connector courses interweave with Data 8, bringing new perspectives to 924 students last year, on subjects spanning from juvenile justice to immunotherapy of cancer. We launched an online version, Data 8X, on edX/BerkeleyX, serving tens of thousands of learners across the world. Our Civil and Environmental Engineering and Cognitive Science majors have already been revised to use Data 8 as a requirement, with others, including Public Health, soon to follow and dozens using it to fulfill existing requirements or as an elective.

A wonderful part of the faculty development around data science has been the summer short-courses and workshops. Once again, the 4-day Data Science Pedagogy and Practice Workshop in June brought together faculty and instructors from throughout campus, this time with an emphasis on developing new Modules. A new workshop introducing graduate students to data science pedagogy broadened our active teaching community. Particularly exciting and essential to our degree programs was a new Human Contexts and Ethics of Data Teaching workshop, attended by 38 members of our community from across the disciplines. Each part of the program is strongly shaped by teams of students, working closely with staff, faculty, and graduate researchers. We also held an NSF-sponsored workshop bringing faculty from over 40 diverse universities nationwide to help them adopt the Berkeley data science pedagogy.

Students working in the Data Science Modules program have joined up with faculty throughout campus to help them offer new, typically 1-2 week, enhancements to existing courses. Last year, in fields such as Public Health, Legal Studies, Psychology, Linguistics, and Ethnic Studies, Modules in 24 courses offered a taste of data science to 1,750 students, 76% of them in Social Sciences and 11% in Arts and Humanities courses. Joanna Reed’s Sociology 130 AC Social Inequalities class builds maps and visualizes socioeconomic and demographic variation across the East Bay using qualitative observations that students collect themselves, fused with traditional census tract sources. In Susan Lin’s Linguistics 110, Introduction to Phonetics and Phonology, students record their own speech data and use a guided program, created by the Modules team, to analyze and visualize their findings. In Amy Tick’s Rhetoric R1A, The Craft of Writing, students analyze how politicians’ speeches change over time and use the conclusions they draw to illustrate a social theory for the mutability of human moral reasoning. Rudy Mendoza-Denton brought Modules into Psychology 167AC, Stigma and Prejudice, to allow students to see how they might approach psychological questions using big data. A new team of approximately 30 students is already helping faculty create new Modules this Fall.

The more advanced class, Data 100: Principles and Techniques of Data Science, co-taught by Statistics and EECS faculty (Bin Yu, Deb Nolan, Fernando Perez, Joseph Gonzalez, Joe Hellerstein, and Josh Hug, so far), expanded this fall to serve 763 students, along with 43 in the pilot master's-level version; enrollment is beginning to stabilize after growing from 100 to 264 to 619 per semester. This course provides a gateway into the upper division of the Data Science major and other student pathways. A host of “data enabled” advanced courses have been introduced across campus, furthering these pathways. Particularly important are the new Human Contexts and Ethics courses taught by faculty from the humanities, social sciences, and professional schools, including Cathryn Carson and Margo Boenig-Liptsin’s HIST C182C/STS C100/ISF C100G, Human Contexts and Ethics of Data, Deirdre Mulligan’s INFO 188, Beyond the Data: Humans and Values, and Michel Laguerre’s AFRICAM/AMERST C134, Information Technology and Society, which together are serving more than 300 students this term.

Hand-in-hand with the education program, Research Discoveries link up enthusiastic undergraduates with faculty, graduate students, post-docs, and partners in the public sector to contribute to important research projects. Last year, 165 students participated in 44 such projects, developing insights on topics ranging from semantics of Sumerian texts to driving away air pollution in Mexico City (grand prize at the UN Data Challenge for Climate Action) to tropical rainforest ecology to distant life in the universe to helping campus achieve its equal opportunity employment goals. We have already received applications from over 500 students to participate in this semester’s approximately 50 projects.

Particularly exciting this term is a new Data Collaboratives grant from Schmidt Futures that seeks to identify brilliant student initiatives, seeded in Modules, Discovery, and Data Scholars, that can grow into open Data Collaboratives for positive social impact – moving from data to knowledge to action. As a start, Berkeley teams are diving into the California Safe Drinking Water Challenge, which the West Big Data Innovation Hub (led out of Berkeley, in partnership with UC San Diego and University of Washington) was crucial in bringing together; it really started flowing with the California Water Data Hackathon in collaboration with the Berkeley Institute for Data Science (BIDS).

Embodying our commitment to keep participation broad from day one in Berkeley data science, the Division’s Data Scholars program is beginning its third semester in partnership with the Social Sciences D-Lab to educate the widest community of future data scientists. Building on the successful Biology Scholars and CS Scholars programs, Data Scholars extends its richness and support through a series of seminars along students’ educational journey: Foundations connecting to Data 8, Pathways opening up options, and Discovery bringing in research.

We welcome several new faculty to Berkeley connected to data science. Ziad Obermeyer, a physician and researcher who works at the intersection of machine learning and health, joined the School of Public Health. Thomas Philip, the new faculty director of the teacher education program, includes a focus on data science education at the high school level. Jiantao Jiao, joining the Electrical Engineering division, works on statistical machine learning in high-dimensional and nonparametric settings and associated applications. Niloufar Salehi joins the School of Information with interests in social computing, technologically mediated collective action, and digital labor. Peter Sudmant, in Integrative Biology, uses computational, statistical, and experimental methods to interrogate genetic and molecular phenotypic diversity at both the organismal and cellular level. Others have joined or will join Berkeley as well, so stay posted.

Again this term, BIDS and the Division are co-hosting Distinguished Lectures in Data Science, so you can get a sense of some of the exciting data science research underway at Berkeley – join us at BIDS in 190 Doe Library on Tuesdays at 4:00.

We are looking forward to an exciting term ahead.

David Culler
Interim Dean for Data Sciences