
Augmented curation of disease diagnoses and medications for patients with hepatocellular carcinoma.

June 7 2023

Background: Unstructured electronic health record (EHR) data contain nuanced, rich insights for Real World Evidence (RWE) studies; however, harnessing these data at scale can be challenging. Here, we evaluated two Natural Language Processing-based curation ("augmented curation") models that identify medical conditions and medications in the clinical notes of patients with hepatocellular carcinoma at City of Hope National Medical Center.

Methods: We deployed two augmented curation models, originally trained on de-identified EHR data from the nference nSights platform (a federated data platform comprising clinical data from several Academic Medical Centers), on clinical notes in the City of Hope POSEIDON platform, an oncology insights engine. We assessed their performance using manual chart review as the reference. We then calculated an Enrichment Factor (EF) score for augmented curation model extraction, together with its 95% confidence intervals. The EF score was defined as the number of patients captured by either augmented curation or structured data, divided by the number of patients captured by structured data alone.

Results: The augmented curation models captured conditions and medications from clinical notes with F1 scores of 0.93 and 0.95, respectively. Compared with structured EHR data alone, the disease diagnosis model captured significantly more individuals with signs and symptoms such as vomiting (EF: 22.8, 95% CI: [15.2, 41.1]), weight loss (EF: 5.0, 95% CI: [4.1, 6.6]), nausea (EF: 4.3, 95% CI: [3.1, 6.9]), and edema (EF: 3.1, 95% CI: [2.7, 3.7]). Similarly, the medication use augmented curation model captured significantly more individuals receiving medications such as interferon, aspirin, and pembrolizumab.
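The EF definition in the Methods reduces to a ratio of two patient counts. The sketch below assumes each patient is represented by two hypothetical boolean flags (captured by augmented curation, captured by structured data). The abstract does not state how its 95% confidence intervals were computed, so the percentile bootstrap over patients used here is an assumption, not the study's method, and the toy cohort is invented for illustration.

```python
import random
from typing import List, Sequence, Tuple


def enrichment_factor(ac_flags: Sequence[bool], sd_flags: Sequence[bool]) -> float:
    """EF = patients captured by AC or structured data / patients captured by structured data."""
    if len(ac_flags) != len(sd_flags):
        raise ValueError("flag sequences must describe the same patients")
    union = sum(1 for a, s in zip(ac_flags, sd_flags) if a or s)
    structured = sum(1 for s in sd_flags if s)
    if structured == 0:
        raise ZeroDivisionError("no patients captured by structured data")
    return union / structured


def bootstrap_ef_ci(
    ac_flags: Sequence[bool],
    sd_flags: Sequence[bool],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for EF, resampling patients with replacement (assumed method)."""
    rng = random.Random(seed)
    n = len(ac_flags)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        sd_sample = [sd_flags[i] for i in idx]
        if not any(sd_sample):
            continue  # EF is undefined when no resampled patient has structured-data evidence
        ac_sample = [ac_flags[i] for i in idx]
        stats.append(enrichment_factor(ac_sample, sd_sample))
    if not stats:
        raise ValueError("every bootstrap resample lacked structured-data cases")
    stats.sort()
    # Nearest-rank percentiles of the bootstrap distribution
    lo = stats[int(alpha / 2 * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical cohort of 10 patients for one condition (e.g. a symptom)
    ac = [True, False, True, True, True, False, False, False, False, False]
    sd = [True, True, False, False, False, False, False, False, False, False]
    print(enrichment_factor(ac, sd))  # 2.5: 5 patients in the union / 2 in structured data
    print(bootstrap_ef_ci(ac, sd))
```

Because the union always contains every structured-data patient, EF is at least 1 by construction; an EF of 1 would mean the clinical notes identified no additional patients.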
From a Cox proportional hazards analysis, we found that a survival model using both augmented curation-based features and structured data achieved a concordance of 0.745 (95% CI: [0.741, 0.748]), compared with 0.722 (95% CI: [0.717, 0.726]) for a model based on structured data alone. In addition, the augmented curation-based survival model identified jaundice (HR: 2.0, 95% CI: [1.5, 2.7]) as a significant risk factor for mortality, which the structured data-based survival model did not detect.

Conclusions: Overall, this study shows that augmented curation models can accurately capture comorbidities and medications from unstructured clinical notes, and that these extracted covariates are correlated with clinically meaningful endpoints such as survival. We recommend using augmented curation as a standard insight generation approach in RWE study protocols.
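Concordance for a Cox proportional hazards model is commonly reported as Harrell's C-index: among comparable patient pairs, the fraction in which the patient with the higher predicted risk experiences the event first. The abstract does not specify which concordance estimator was used, so this minimal sketch of Harrell's C is illustrative only. The risk scores would come from a fitted model (for example, its linear predictor), the toy data are invented, ties in predicted risk count as one half, and pairs tied in follow-up time are skipped for simplicity.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C: share of comparable pairs in which the higher-risk patient fails first."""
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:  # pairs tied in time are skipped in this simple version
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical follow-up times (months), event indicators (1 = death), and model risk scores
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risk = [0.9, 0.4, 0.5, 0.1]
    print(harrell_c_index(times, events, risk))  # 0.8: 4 of 5 comparable pairs concordant
```

In practice, survival libraries such as lifelines provide `lifelines.utils.concordance_index`, which expects scores where larger values mean longer survival, so risk scores would be negated before being passed in.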


Wui Ip, Colin Pawlowski, Vineet Mathew, Mayank Choudhary, Michiel Niesen, Akash Anand, Allen Mao, Ananth Peddinti, Chetan Kancharla, Isaac Kunz, Joseph D. Bonner, Matthew Boyle, Olamilekan Jinadu, Alex Pozhitkov, Xiaoyu Xia, Raul Sarmiento, Stacy Berger, Tarrah Kirkpatrick, Venky Soundararajan, Samir Courdy

nference, Cambridge, MA; nference Labs, Bengaluru, India; City of Hope National Medical Center, Duarte, CA; City of Hope Comprehensive Cancer Center, Duarte, CA


City of Hope

Correspondence to:

Venky Soundararajan (venky@nference.net)
