Application Deadlines
Priority Deadline: January 23
Final Deadline: January 30
Team Placement Deadline: February 5
Application Instructions
Use this application portal to browse this semester’s projects and submit your application.
- Browse the projects and click the name of any project that interests you. You will be redirected to an application page with more information about that project.
- Submit applications for up to 10 projects that interest you. Some projects may close their applications early due to high interest, so apply as soon as you can.
Program Requirements
The program offers variable units on a P/NP basis.
In order to receive a passing grade:
- Students may participate in only 1 Discovery project per semester.
- Students are expected to commit 6-12 hours per week to their research project (confirm the exact time commitment with the project partner). 1 unit of academic credit is available for every 3 hours of work per week, so 6 hours per week corresponds to 2 units. Students will be enrolled in a DATA 198 course for units when they join a project.
- All student teams are required to submit a Mid-Term Progress Report (sent to you as a Google Form) and to give a Final Presentation during RRR week.
Have any questions? Email us at ds-discovery@berkeley.edu.
Highlighted Projects
Academia / Education
FHA: Faculty Hiring Analysis | Moore Accuracy Lab
Collective Intelligence for Continuous Improvement | UC Berkeley CalHOPE
Linking Scientific Articles to Media Mentions | UC Berkeley
Ten Strands/CAELI - How Data can Accelerate Environmental Literacy for All CA TK-12 Students | Ten Strands and California Environmental Literacy Initiative (CAELI) (Full)
HERE to Promote Student Well-being in K-12 Education | UC Berkeley Greater Good Science Center
Identifying trends and insights from conference abstracts | Regeneron Pharmaceuticals
Predicting Insights from Conference Abstracts | Regeneron Pharmaceuticals
Visualizing Trends in Student Experience at Research Universities (Spring 2023) | Student Experience in the Research University (SERU) Consortium, CSHE, UC Berkeley
The impact of personal networks on advancing environmental education in schools | The Hebrew University of Jerusalem
Toward Computational Literature Reviews: Analyzing Organizational Theories through Citations and Concepts | Dartmouth College and UC Berkeley
Better Learning Through Better Learning Spaces | University of California, Berkeley (Full)
Access patterns of large distributed data system | LBNL
Data Visualization: Equity in College and University Athletics | Accelerate Equity (affiliated with non-profit organization Athlete Ally) (Full)
LENS: Learning the Educational Needs of Students | NWEA (Full)
OUSD Budget Analysis | Oakland Education Association (Full)
Business / Economics
Diversity, Dominance, & Discrimination: "Dominance Terms" in Venture Capital Contracts | University of California, Berkeley, Culture, Diversity, and Intergroup Relations Lab (CDIRL)
Seven Million Demand Elasticities | Arizona State University, the San Francisco Fed, and the University of Chicago
Workforce Analytics - Employee Engagement | Petaluma Health Center (Full)
Workforce Analytics - Benefit Utilization | Petaluma Health Center (Full)
Workforce Analytics - Time in Meetings vs. Productivity | Petaluma Health Center (Full, no longer taking applications)
Separating Financial Fact from Opinion | Berkeley Law (Full)
A SkyDeck backed startup empowering young stock market investors to have access to the same information as institutional investors | Wisdm (Full)
Understanding the food retail landscape: Are corporate businesses killing small businesses in Mexico? | UC Berkeley (Full)
Environment / Sustainability
Impacts of localized artificial enhancement of sea ice albedo in Arctic for the Fire-weather Worldwide | Climformatics Inc.
Optimizing the climate benefits of seaweed farming | Environmental Defense Fund
Valuation of Nature Capital | Zulu Forest Sciences
High spatial resolution mapping of emissions and air | UC Berkeley Department of Chemistry (Full)
Role of climate change in infectious disease pandemic | Climformatics Inc.
Geospatial Analysis for Wildland-Urban Interface fire | University of California, Berkeley
Building localized climate prediction product | Climformatics Inc.
Analysis of Snowfall Storms Dynamics in the Western US | UCB, ESPM (Full)
Impact of Climate Change on Mangrove Forests | IBM (Full)
ChatBot for Drinking Water Regulations | California State Water Resources Control Board (Full)
Can planting trees help fight climate change? - Building a data-driven framework for evaluating natural climate solutions | UC Berkeley (Full)
Climate Synapse Carbon Mitigation Platform | Tusher Initiative, Haas School of Business (Full)
How much CO2 does our ecosystem breathe in and breathe out? | UC Berkeley (Full)
Fighting Climate Super-Pollutants | Project Climate at Berkeley Law (Full)
Humanities
Photogrammetric model volume studies for forest fire fuel load analysis | Dept. Anthropology
AIrish | UC Berkeley Department of English, Irish Studies
PhiloBiblon: From siloed databases to linked open data via Wikibase | PhiloBiblon (Full)
FactGrid Cuneiform | FactGrid Cuneiform
Text Analysis of Securities and Exchange Commission Comments and Rules | UC Berkeley School of Law
LMC: Language Model Calibration | Moore Accuracy Lab
The Project on Arms Trade History | Berkeley History Department
Improving the Sumerian Language Linguistic Annotation Pipeline & lexicalizing the CDLI | Cuneiform Digital Library Initiative (CDLI)
Investigating the Ethics of AI Art | Denova Labs (Full)
Industry
PAYGo Lab: scalable data analytics of off-grid solar receivables payments in Africa | Catalyst Energy Advisors
Catalyzing a global movement of food systems leaders, powered by plants and data | Plant Futures Initiative (Full)
Project membership | YMCA of the East Bay (Full)
Project AEI | Koer A.I., Inc
DSG Interns - Various Projects | Data for Social Good Foundation (Full)
Technovation's data dashboard | Technovation (Full)
Creating employee-rated company review dataset | University of Colorado Denver (Full)
ZeeMee Year End Analytics and Visualizations | ZeeMee (Full)
Square's Partner-Referred Sellers | Square (Full, no longer taking applications)
Analyzing API Call Activity | Block / Square (Full, no longer taking applications)
Telematics Data Dashboard | Honda Development & Manufacturing of America LLC (Full, no longer taking applications)
Applying Data to Advance Augmented Reality | Geopogo AR+ (Full)
Community App | Baby2Baby (Full)
ImpactMapping | Impact Circles (Full)
Determining success | Callisto (Full)
CAA Members Discovery | Cal Alumni Association (Full)
Self-service data ingestion framework | Regeneron Pharmaceuticals (Full)
Fathers' UpLift Data Dashboard | Fathers' UpLift (Full)
Automated Quality Control and Analysis of Histopathology Images | Merck (Full)
Natural Sciences
Reconstructing the evolution of river systems across the Cretaceous-Paleogene boundary 66 million years ago | Earth and Planetary Science (EPS), Berkeley Geochronology Center
Ensuring authenticity of biological imaging data | University of California, Berkeley (Full)
Un-supervised behavioural classification of laboratory mice | Dan Lab at UC Berkeley (Full)
Creating a data product using Intelligent Semantic Search | Regeneron Pharmaceuticals
Network modeling to infer sparse gene co-expression estimates from single-cell sequencing data | Merck Research Laboratory; Data and Genome Sciences Department
Building allele analyzer from human genomes for CRISPR gene therapy | Clelland lab, UCSF Dept of Neurology; Weill Neurosciences Institute
Benchmark dataset generation for discovery biologics | Merck
Deep learning for pharmacokinetics and pharmacodynamics | Merck
IB 32 gecko notebook | IB 32 Bioinspired Design
Pupil Power - Developing Predictive Models of Pupil Size | DEVCOM Army Research Laboratory (Full)
Building computational tools to analyze life at single-molecule resolution | Tjian-Darzacq Lab, University of California, Berkeley (Full)
Automated Genome Manipulation Pipeline | Regeneron Pharmaceuticals (Full)
Automated Next Generation Sequencing Workflow | Regeneron Pharmaceuticals (Full)
Host Pathogen interaction in aging | University of California, Berkeley (Full)
Estimating human-technology team consensus and creativity from physiological signals | US DEVCOM Army Research Laboratory (Full)
Physical Sciences / Engineering
Anomalies detection on variable frequency drive | Powerside
New approaches to reconstructing ancient continental motions | Swanson-Hysell Group
Space Weather Drivers: Understanding Ionospheric Variability using Satellite Data | Space Sciences Laboratory (SSL) (Full)
Machine Learning-based design and modeling of Analog/Mixed-Signal Circuits | Berkeley Wireless Research Center
Real-Time Damage Assessment for Aerial Drone Imagery and Videos using AI | SpatialGIS
Using Machine Learning for 2D Seismic Facies Classification Along the Pacific Outer Continental Shelf | Bureau of Ocean Energy Management (BOEM), Pacific Region (Full)
Reconstructing 2D CT dicom images into 3D volumes and perform registration-based quality check | Merck
Creating Machine Readable Well Test Data to Support Reservoir Evaluation and CCUS Efforts | Bureau of Ocean Energy Management (BOEM), Pacific Region (Full)
Automated Optical Structure Recognition and Activity Extraction | Merck Research Laboratories
Machine Learning for Optimised Magnetic Field Computation | UC Berkeley
Fuel management in pebble bed reactors | Nuclear Engineering Department (Full)
Using vision transformer for retinal video object detection and instance-level semantic segmentations | C. Light Technologies
Machine learn with time series data to identify abnormality of air conditioner | BART (Full)
ADAM: Asteroid Discovery, Analysis, and Mapping Platform - Data Inconsistencies in the MPC | B612 Foundation (Full)
Categorization of journal entries of Operation Control Center (OCC) | BART (Full)
ML for micro robots and solar sails | Berkeley Autonomous Microsystems Lab (Full)
Public Health / Medicine
SFFD EMS and Community Paramedicine | San Francisco Fire Department (Full)
The Power of Health in Africa: Remote sensing of power quality and reliability in Congolese health facilities | Renewable and Appropriate Energy Lab, UC Berkeley
Analyzing Hospital Prices and Hospital Market Areas | UC Berkeley Petris Center (Full)
Creating patient summary using NLG | Innovaccer (Full)
Social Listening on Vaccine Confidence in Southeast Asia | UC Berkeley School of Public Health
Neuroprotective targets for treating Glaucoma | UC Berkeley School of Optometry & Vision Science
Generating 3D models of the human heart from biomedical images using deep learning | Shadden Lab (Full)
Using Data Science and Text Mining to Improve Classification and Analysis of Healthy Start Grantee Progress Reports | Health Resources & Services Administration, Maternal & Child Health Bureau, Division of Healthy Start & Perinatal Services (Full)
The Causes and Consequences of Physician Misconduct | Haas School of Business
Spirometry AI | Fisher Center for Business Analytics, Haas School of Business (Full)
Meta-analysis methods for publicly available datasets for Inflammatory bowel disease | Merck
Healthcare Pricing | UC Berkeley
Developing analysis pipeline for functional calcium imaging dataset | Kaufer Laboratory, Dept of Integrative Biology, UC Berkeley
Machine learning from brain scans for predicting TBI outcomes | Neural Systems and Data Science Lab (Full)
AHA Data Science - Analysis of > 13 million patient records | American Heart Association (Full)
Hospital Affiliations in California: Trends and Impacts | Petris Center for Health Care Market and Consumer Welfare (Full)
Deep learning brain data for detecting Alzheimer Disease | Center for Human Sleep Science (Full)
Deriving eye gaze metrics from functional MRI data | UC Berkeley Dept of Psychology (Full)
Denoising to improve drug discovery assay models | Merck & Co (Full)
Social Sciences
Multispecies Cities Studio Datathon | UC Berkeley
Current Status of the Arctic-related research cooperation between U.S., Russian, and Chinese | US Arctic Research Commission
Mining and Mapping Racial Attitudes in National Surveys | Goldman School of Public Policy
Ensuring the Longevity of Just As Special's Foster Care Resource Database | Just As Special (Full)
Longitudinal data harmonization of child development in Latin America | University of California, Berkeley School of Public Health
Global Poverty and Practice Minor Organizations (GPP Orgs) Database | UC Berkeley Global Poverty and Practice Minor (Full)
The Child and Adolescent Needs and Strengths (CANS) and Youth Outcomes | Aspiranet (Full)
Supreme Court Opinion Draft Viewer | UC Berkeley School of Law (Full)
Idiographic Dynamics Lab | Idiographic Dynamics Lab
The Eviction Research Network | The Eviction Research Network
Istanpolis: Visualizing local Christian communities of Ottoman Istanbul | UC Berkeley, Department of History
Natural Language Processing for raw naturalistic audio data to learn about parent-child interaction in low-income households | Center for Effective Global Action (CEGA), UC Berkeley
Understanding Racial Disparities in Behavioral Health Emergency Response | Risk Resilience Research (Full)
Filling Gaps in Social Demographic Data using Machine Learning, Energy Consumption, and Alternative Data | East Bay Community Energy (Full)
Analyzing internet broadband availability to close the digital divide | EducationSuperHighway
UN Common Country Analysis (CCA) | United Nations Development Coordination Office
Cooperation Frameworks (CF) Good Practices Database | United Nations Development Coordination Office
NGO Insights Dashboard | DaanMatch (Full)
We Have to Talk: Resolving Conflict Through Spoken Conversation | Social and Moral Judgment Lab (SOMO) (Full)
Continued Research in D10 and San Francisco Districts | Wu Yee Children's Services
Cleaning criminal justice data | EPIC Data Lab at Berkeley
Simulating Police Responses to Calls for Service | Berkeley Police Department (Full)
Diversity, Equity, Inclusion, and Belonging Megastudy | Berkeley Culture Center
Bay Area Arrest Trends: Perceptions vs Reality | Impact Justice (Full)
A risk model for Social-Political Conflict in US States | School of Information and Breakwater Strategy (Full)
UNSDG Information Management System | United Nations Development Coordination Office (Full)
United Nations Country Team Report | United Nations Development Coordination Office (Full)
Diversity Tagging and Scoring for Films and TV | Media Metadata Research Lab (mmrl) (Full)