A deep learning–enabled workflow to estimate real-world progression-free survival in patients with metastatic breast cancer

May 29, 2024

Background: Progression-free survival (PFS) is frequently measured in oncology clinical trials. Outside the trial setting, several strategies have been used to assess real-world PFS (rwPFS), including manual abstraction of clinical records, natural language processing (NLP) of oncology notes and radiology reports, and longitudinal analysis of radiologic images. Here, we develop and validate a new semi-automated workflow that combines NLP of clinical notes with structured electronic health record (EHR) data to facilitate the evaluation of rwPFS in patients with metastatic breast cancer (mBC).

Methods: This study analyzed de-identified EHR data using nference nSights. The data were extracted following privacy-preserving protocols and are HIPAA compliant. The overall cohort included 316 patients with HR-positive, HER2-negative mBC who initiated palbociclib and letrozole combination therapy between January 1, 2015 and December 31, 2021. We developed and implemented an ensemble of deep-learning NLP frameworks to capture progression events from unstructured clinical notes and radiology reports. A change in the line of therapy, as determined by structured drug order/administration records, was also considered a progression event. We used manually curated “ground-truth” datasets to evaluate the performance of the progression-event capture workflow at the level of both sentences (N = 1000) and patients (N = 100) by calculating sensitivity, specificity, precision, accuracy, and F1 score. Progression events and censoring events (death, loss to follow-up, end of study period) were used to compute rwPFS.

Results: At the sentence level, progression events were captured from clinical notes and radiology reports with a sensitivity of 99.8%, specificity of 96.7%, and accuracy of 98.2% (Table). At the patient level, initial progression was correctly captured within a window of ±30 days with a sensitivity of 92.5%, specificity of 83.0%, and accuracy of 88.0% (Table). In a sample of 100 patients, the median rwPFS was 25 months (95% CI, 15-35 months) by manual curation and 22 months (95% CI, 15-35 months) by the semi-automated workflow. In the overall cohort, median rwPFS was 20 months (95% CI, 18-25 months).

Conclusions: An ensemble of NLP algorithms extracted progression events from clinical notes and radiology reports with high accuracy. A semi-automated workflow enabled rapid and accurate determination of rwPFS in mBC patients receiving a combination therapy regimen. Further evaluation of this workflow to estimate rwPFS in other cancers and therapeutic settings is warranted.
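To make the rwPFS computation described in the Methods concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Kaplan-Meier estimator (the abstract does not name the estimator), an average month length of 30.4375 days, and a hypothetical per-patient record; the `Patient` fields, `to_observation`, and `km_median` are illustrative names that do not appear in the source. Following the abstract, a progression event is the earliest of an NLP-captured progression or a change in line of therapy, and death, loss to follow-up, and end of study period are treated as censoring.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

DAYS_PER_MONTH = 30.4375  # assumption: average month length


@dataclass
class Patient:
    # Hypothetical schema; the abstract does not describe one.
    index_date: date                 # start of combination therapy
    nlp_progression: Optional[date]  # earliest progression found in notes/reports
    therapy_change: Optional[date]   # next line of therapy in structured records
    censor_date: date                # earliest of death, loss to follow-up, study end


def to_observation(p: Patient) -> Tuple[float, bool]:
    """Return (months from index date, progressed?) for one patient."""
    candidates = [
        d for d in (p.nlp_progression, p.therapy_change)
        if d is not None and d <= p.censor_date
    ]
    if candidates:
        end, event = min(candidates), True
    else:
        end, event = p.censor_date, False
    return (end - p.index_date).days / DAYS_PER_MONTH, event


def km_median(obs: List[Tuple[float, bool]]) -> Optional[float]:
    """Kaplan-Meier median: first time the survival estimate falls to 0.5 or below."""
    survival = 1.0
    at_risk = len(obs)
    for t in sorted({time for time, _ in obs}):
        events = sum(1 for time, ev in obs if time == t and ev)
        leaving = sum(1 for time, _ in obs if time == t)
        if events:
            survival *= 1 - events / at_risk
            if survival <= 0.5:
                return t
        at_risk -= leaving  # events and censored patients both leave the risk set
    return None  # median not reached
```

Calling `km_median([to_observation(p) for p in cohort])` on a list of `Patient` records returns the median in months, or `None` when fewer than half of the patients are estimated to have progressed. Confidence intervals, as reported in the Results, would need an additional method that this sketch does not cover.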

Authors:

Gowtham Varma, Rohit Kumar Yenukoti, Praveen Kumar-M, Bandlamudi Sai Ashrit, K Purushotham, Subash C, Sunil Kumar Ravi, Verghese Kurien, Avinash Aman, Mithun Manoharan, Shashank Jaiswal, Akash Anand, Rakesh Barve, Viswanathan Thiagarajan, Patrick Lenehan, Scott A. Soefje, Venky Soundararajan

Mayo Clinic, Rochester, MN; nference Labs, Bengaluru, India; nference, Cambridge, MA

Correspondence to:

Gowtham Varma (gowtham.varma@nference.net)