Judy Cameron speaks about DataJam at the National Workshop on Data Science Education on June 22 in Berkeley, Calif. (Photo/ Catherine Cramer)

IBM, Oracle and other technology companies realized in 2013 that data science was increasingly important to their organizations. The problem? Not enough people were trained in the field. They wanted young students to learn about it early so they’d study it in college.

One solution? The DataJam competition. Today, teams of high school students and their college mentors choose a research question that's relevant to their community, then explore how data science can help answer it. Colleges find potential students. Businesses tap into a personnel pipeline.

“Every single industry needs data scientists,” Judy Cameron, a University of Pittsburgh professor and director of the nonprofit Pittsburgh DataWorks, which runs DataJam, said June 22 at the National Workshop on Data Science Education. “It’s a win-win-win for students and educators, for businesses and for communities.”

Through project-based programs like DataJam, students learn how data science can help solve real-world problems, said Ashley Atkins, West Big Data Innovation Hub executive director. They gain skills like how to communicate about data to people with non-technical backgrounds, and they see in practice how to consider ethics and the impact of their choices on society, she said.

DataJam, the University of Washington’s Data Science for Social Good program and Berkeley’s Data Science Discovery program, all affiliated with the West Big Data Innovation Hub, are creating Data Science Experiential Pathways to connect their work. The effort aims to build and expand project-based learning opportunities, broaden participation and strengthen workforce pipelines. Launching this fall with a focus on transportation projects, the pathways initiative will serve students in middle and high school, community and four-year colleges, and graduate schools.

“We need to better understand the [ethics] implications of this as a field, and we need to be doing much, much more to train students to think about these issues,” said Jennifer Noll, a National Science Foundation (NSF) program director. Project-based programs do just that, but very few data science education awards fund these efforts, she said. “There’s an opportunity there.”

The national workshop is an annual data science education conference organized by UC Berkeley’s College of Computing, Data Science, and Society with support from Microsoft and the West Big Data Innovation Hub. It was held in Berkeley, Calif., June 20 through June 23.

The state of project-based data science programs

According to a preliminary analysis by Noll, NSF has awarded 293 data science education grants since 2007. Nearly a quarter of those (70 awards) were labeled possible project-based learning awards, and most were aimed at undergraduate and graduate students, Noll said.

Few of these awards have targeted teachers or K-12 students, groups that need to develop interest and capacity in learning data science to expand the pipeline, Noll said. There also needs to be more research into the hows and whys of project-based learning, she said.

Only one of NSF’s 70 data science project-based learning awards focused on data ethics, Noll said. But these kinds of programs are uniquely positioned to help students learn how and when to think about the possible impacts of data creation, bias, governance and more, she said.

Data ethics issues “don’t occupy much time in courses often, and so students are seemingly very unprepared as they enter the workforce or enter undergraduate programs or graduate programs that consider these issues,” Noll said.

While most of the awards were interdisciplinary, Noll still sees opportunities to broaden participation by connecting data science to subjects like history, journalism and art. Panelists at the national workshop said they see that same opportunity and are acting on it.

Sarah Stone, Ashley Atkins, Judy Cameron, Catherine Cramer and Anthony Suen celebrate the upcoming launch of the Data Science Experiential Pathways program on June 22 in Berkeley, Calif. (Photo/ Ashley Atkins)

Doing ‘something meaningful’ with data science

Sarah Stone, executive director of the University of Washington’s eScience Institute, said 40 percent of participants in the institute’s Data Science for Social Good summer project-based learning program come from non-STEM majors. Her team looks for applicants who want to solve a problem using data rather than students who only want to experiment with cutting-edge data tools.

“There are a lot of students out in our environments who have some amount of programming skill or who would be interested to pick it up in order to do something meaningful, to engage on a question where they can impact their community,” Stone said. “I would just encourage us all to think about how we capture those students.”

These projects and programs can make concrete, positive advances for local communities and broader society. For example, students who participated in Berkeley’s Data Science Discovery program have helped improve transparency around police misconduct, make sense of drinking water regulations for regulators and providers, and increase access to heart health data.

Project partners at Berkeley range from nonprofits like the Environmental Defense Fund to government agencies like the National Aeronautics and Space Administration. These efforts offer free data science expertise to organizations that may lack it in-house or could benefit from more of it.

“Students are even more engaged if the project has a real impact – if they’re working with real stakeholders,” said Anthony Suen, director of the Data Science Discovery program at Berkeley.