A deep learning–enabled workflow to estimate real-world progression-free survival in patients with metastatic breast cancer
May 29 2024
Background: Progression-free survival (PFS) is a common endpoint in oncology clinical trials. Outside the trial setting, real-world PFS (rwPFS) has been assessed in several ways, including manual abstraction of clinical records, natural language processing (NLP) of oncology notes and radiology reports, and longitudinal analysis of radiologic images. Here we develop and validate a semi-automated workflow that combines NLP of clinical notes with structured electronic health record (EHR) data to evaluate rwPFS in patients with metastatic breast cancer (mBC).

Methods: This study analyzed de-identified EHR data using nference nSights. The data were extracted following privacy-preserving protocols and in compliance with HIPAA. The overall cohort included 316 patients with HR-positive, HER2-negative mBC who initiated combination therapy with palbociclib and letrozole between January 1, 2015, and December 31, 2021. We developed and implemented an ensemble of deep-learning NLP frameworks to capture progression events from unstructured clinical notes and radiology reports. A change in the line of therapy, as determined from structured drug order and administration records, was also counted as a progression event. We evaluated the progression-event capture workflow against manually curated “ground-truth” datasets at both the sentence level (N = 1000) and the patient level (N = 100), calculating sensitivity, specificity, precision, accuracy, and F1 score. Progression events and censoring events (death, loss to follow-up, and end of study period) were used to compute rwPFS.

Results: At the sentence level, progression events were captured from clinical notes and radiology reports with a sensitivity of 99.8%, a specificity of 96.7%, and an accuracy of 98.2% (Table).
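The sentence-level evaluation reported above compares the workflow's output with the curated labels. As a minimal sketch, assuming one binary label per sentence (True = the sentence reports progression), the five metrics follow directly from confusion-matrix counts. The function names and toy labels are hypothetical and are not the study data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # truth = progression, workflow = progression
    fp: int  # truth = no progression, workflow = progression
    tn: int  # truth = no progression, workflow = no progression
    fn: int  # truth = progression, workflow = no progression


def count_confusion(truth: list[bool], predicted: list[bool]) -> ConfusionCounts:
    """Tally paired sentence labels (True = mentions progression)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = list(zip(truth, predicted))
    return ConfusionCounts(
        tp=sum(t and p for t, p in pairs),
        fp=sum((not t) and p for t, p in pairs),
        tn=sum((not t) and (not p) for t, p in pairs),
        fn=sum(t and (not p) for t, p in pairs),
    )


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a metric is undefined (zero denominator).
    return num / den if den else float("nan")


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # a.k.a. recall
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "precision": _ratio(c.tp, c.tp + c.fp),    # positive predictive value
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        # Equivalent to the harmonic mean of precision and sensitivity.
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


# Toy example (hypothetical labels, not study data).
truth = [True, True, True, False, False]
predicted = [True, True, False, False, True]
metrics = classification_metrics(count_confusion(truth, predicted))
print(metrics)
```

On these toy labels, sensitivity, precision, and F1 are all 2/3, specificity is 0.5, and accuracy is 0.6. Writing F1 as 2TP/(2TP + FP + FN) avoids a separate division by precision + sensitivity.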
At the patient level, the date of initial progression was correctly captured within a ±30-day window with a sensitivity of 92.5%, a specificity of 83.0%, and an accuracy of 88.0% (Table). In a sample of 100 patients, median rwPFS was 25 months (95% CI, 15–35 months) by manual curation and 22 months (95% CI, 15–35 months) by the semi-automated workflow. In the overall cohort, median rwPFS was 20 months (95% CI, 18–25 months).

Conclusions: An ensemble of NLP algorithms extracted progression events from clinical notes and radiology reports with high accuracy. The semi-automated workflow enabled rapid and accurate determination of rwPFS in patients with mBC receiving combination therapy with palbociclib and letrozole. Further evaluation of this workflow for estimating rwPFS in other cancers and therapeutic settings is warranted.
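The rwPFS computation described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the nSights implementation: each patient is summarized by a hypothetical `PatientTimeline` record; the event is the earlier of the NLP-detected progression date and the start of the next line of therapy; otherwise the patient is censored. Death is treated as censoring, as listed in the abstract; many PFS definitions instead count death as an event, so adapt that branch if your definition differs. The abstract does not name the survival estimator, so the Kaplan-Meier product-limit estimator used here is an assumption, as is the 365.25/12-day month; confidence intervals are omitted.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumption: convert day counts to months using the average Gregorian month length.
DAYS_PER_MONTH = 365.25 / 12


@dataclass
class PatientTimeline:
    """Hypothetical per-patient summary; field names are illustrative only."""
    start: date                      # initiation of palbociclib + letrozole
    nlp_progression: Optional[date]  # earliest progression found by the NLP ensemble
    next_line_start: Optional[date]  # line-of-therapy change from structured orders
    censor_date: date                # death, last follow-up, or end of study period


def time_and_event(p: PatientTimeline) -> tuple[float, bool]:
    """Return (months from start, True if progression observed, False if censored)."""
    # Keep only progression signals that occur on or before the censoring date.
    signals = [d for d in (p.nlp_progression, p.next_line_start)
               if d is not None and d <= p.censor_date]
    if signals:
        return (min(signals) - p.start).days / DAYS_PER_MONTH, True
    return (p.censor_date - p.start).days / DAYS_PER_MONTH, False


def km_median(samples: list[tuple[float, bool]]) -> Optional[float]:
    """Kaplan-Meier median: first event time at which estimated S(t) <= 0.5.

    Returns None when the curve never reaches 0.5 (median not estimable).
    """
    survival = 1.0
    for t in sorted({time for time, event in samples if event}):
        at_risk = sum(1 for time, _ in samples if time >= t)  # censored at t still at risk
        events = sum(1 for time, event in samples if event and time == t)
        survival *= 1 - events / at_risk
        if survival <= 0.5:
            return t
    return None


# Hypothetical patients (not study data).
cohort = [
    PatientTimeline(date(2018, 1, 1), date(2018, 7, 1), date(2018, 6, 1), date(2020, 1, 1)),
    PatientTimeline(date(2019, 3, 1), None, None, date(2021, 12, 31)),
]
samples = [time_and_event(p) for p in cohort]

# Toy (months, progressed) pairs for the median calculation.
toy = [(5.0, True), (8.0, False), (12.0, True), (20.0, True), (25.0, False)]
median = km_median(toy)
```

For the first hypothetical patient, the line-of-therapy change precedes the NLP-detected progression, so it defines the event; the second patient has no progression signal and is censored. On the five toy samples, the estimated survival falls from 0.8 to about 0.53 and then to about 0.27, so the median is 20 months.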