The BERT language model has achieved state-of-the-art performance on a variety of Natural Language Understanding (NLU) tasks such as question answering. Such models have been trained on broad corpora such as the English Wikipedia and tested on general question-answering datasets such as SQuAD. As a consequence, they may not perform as well in specialty domains such as clinical medicine. Although multiple models have been developed that 'fine-tune' BERT's performance for biomedical domains using publicly available text data such as biomedical journal abstracts (e.g. BioBERT), the expansive subdomain that is 'real-world clinical practice' (as captured by electronic health record systems) remains distinctly different. This is in large part due to the prevalence of acronyms, misspellings, and other highly idiosyncratic features that distinguish clinical notes from peer-reviewed text. We propose to perform additional fine-tuning on a fully-identified corpus of 75M notes of clinical text at UCSF. The obvious use case will be question answering on clinical notes for the purposes of information extraction to support actual clinical practice as well as research; however, we will also explore other potential use cases, including clinical reasoning.