Every time you visit the doctor and watch her document your complete medical history, your data are being captured by huge electronic health record (EHR) systems on the backend. Although these data are typically captured to communicate medical information and support the business of healthcare, recent years have seen growing interest in mining them for new insights in precision medicine. However, a major roadblock to repurposing these data is that they are mostly unstructured free text rather than well organized and ready to analyze.

A prior Discovery Project cohort (Taline Mardirossian, Saransh Gupta, and Rohan Narain, all Cal Class of ’20) successfully demonstrated that standard text classification techniques could achieve human-level performance on the task of converting colonoscopy reports to Mayo Scores, a standard scoring system used to assess disease activity in Inflammatory Bowel Disease (IBD). Their work is currently being presented at national conferences and prepared for publication.

We propose to extend this information extraction pipeline to other clinical text domains: CT/MRI scans, clinical notes, pathology reports, and beyond. The output of your models will be the key variables needed to understand the real-world effectiveness of IBD treatments, although the models themselves will likely be disease agnostic. We see this work as a first step towards unlocking the full information content hidden within EHR systems and accelerating clinical research across the full landscape of diseases.

View our work here

Term
Fall 2020
Topic
Public Health