
How nference Protects Patient Privacy – and Why De-identification Matters

nference Product Management

August 14, 2023

As artificial intelligence (AI) is increasingly used in healthcare, privacy is a major concern. No one wants their most personal information made public or sold. Yet real-world healthcare data is essential to furthering our understanding of disease and to developing algorithms that help accelerate scientific research, predict disease, improve diagnoses, and optimize treatment.

De-identification Techniques: Protecting Patient Privacy

Since nference deals with sensitive healthcare data day in and day out, we are often asked: How is patients’ privacy protected when data is accessed through the nSights platform?

The answer is that we use best-in-class federated AI to de-identify structured and unstructured data in electronic medical records. Our de-identification technology detects personal health information and transforms that information by replacing it with suitable surrogates. We refer to this technique as “hiding in plain sight.”

For example, all names are transformed in a manner that is consistent with format, gender, and ethnicity. So, a clinical note that says, “Ms. Lopez visited New York General Hospital for her routine checkup” becomes “Ms. Hernandez visited Mass General Hospital for her routine checkup.”
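As a rough illustration of consistent surrogate replacement, the sketch below picks a replacement surname deterministically for a given patient, so the same original name always maps to the same surrogate within that patient’s record. It is not nference’s implementation: the surrogate list, the `pick_surrogate` and `replace_surname` helpers, the keyed-hash selection, and the secret key are all assumptions made for this example. It also assumes the name has already been detected and that the candidate list has already been filtered to match format, gender, and ethnicity.

```python
import hashlib
import hmac
import re
from typing import Sequence

# Hypothetical surrogate dictionary; assumed to be pre-filtered so every
# entry is a plausible match for the original name's format and demographics.
SURROGATE_SURNAMES = ["Hernandez", "Garcia", "Martinez", "Rodriguez"]


def pick_surrogate(original: str, patient_id: str, secret_key: bytes,
                   candidates: Sequence[str]) -> str:
    """Choose a surrogate deterministically from (patient, original name)."""
    # Never map a name onto itself.
    pool = [c for c in candidates if c.lower() != original.lower()]
    # A keyed hash makes the choice repeatable but not guessable without the key.
    message = f"{patient_id}|{original.lower()}".encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).digest()
    return pool[int.from_bytes(digest[:8], "big") % len(pool)]


def replace_surname(note: str, surname: str, patient_id: str,
                    secret_key: bytes) -> str:
    """Replace every whole-word occurrence of an already-detected surname."""
    surrogate = pick_surrogate(surname, patient_id, secret_key, SURROGATE_SURNAMES)
    return re.sub(rf"\b{re.escape(surname)}\b", surrogate, note)


note = "Ms. Lopez visited for her routine checkup. Ms. Lopez reports no pain."
print(replace_surname(note, "Lopez", "patient-001", b"example-secret-key"))
```

Because the choice depends only on the key, the patient identifier, and the original name, every note for that patient receives the same surrogate without any lookup table of real names being stored alongside the data.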

Dates are similarly transformed. For example, March 5th, 2014 becomes February 27th, 2014, and 03-05-2014 becomes 02-27-2014. The shift is randomized per patient (up to 30 days) and then applied consistently to every date in that patient’s record, so the intervals between events are preserved. Locations and organizations are also replaced with suitable surrogates chosen from a predefined dictionary.
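The per-patient date shift can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of the production system: the `patient_shift_days` and `shift_date` helpers, the keyed hash used to derive the offset, the secret key, and the choice to allow shifts in either direction while excluding a zero-day shift are all hypothetical. The only details taken from the text above are the 30-day bound and the requirement that one patient’s dates all move by the same amount.

```python
import hashlib
import hmac
from datetime import date, timedelta

MAX_SHIFT_DAYS = 30  # "up to 30 days", per the description above


def patient_shift_days(patient_id: str, secret_key: bytes) -> int:
    """Derive a stable per-patient offset in [-30, 30], never 0 (assumption)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).digest()
    magnitude = int.from_bytes(digest[:8], "big") % MAX_SHIFT_DAYS + 1  # 1..30
    sign = -1 if digest[8] & 1 else 1  # direction taken from a separate byte
    return sign * magnitude


def shift_date(d: date, patient_id: str, secret_key: bytes) -> date:
    """Apply the patient's single offset to any date in their record."""
    return d + timedelta(days=patient_shift_days(patient_id, secret_key))


key = b"example-secret-key"
admitted = shift_date(date(2014, 3, 5), "patient-001", key)
discharged = shift_date(date(2014, 3, 19), "patient-001", key)
print((discharged - admitted).days)  # the 14-day interval is unchanged
```

The example in the text (March 5th to February 27th) corresponds to a shift of six days earlier. Because one offset is applied to every date for a patient, durations such as length of stay or time between a prescription and an adverse event remain intact.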

These surrogates are used throughout the patient’s entire electronic medical record to ensure consistency and preserve data richness for researchers. Unlike de-identification approaches that redact text, this approach removes nothing outright, so patient privacy is preserved with no loss of the rich clinical phenotypic information that often exists exclusively in unstructured form. Retaining that information is often critical for research and real-world evidence generation.

Data Security and Accessibility Through the nSights Platform

We also safeguard patient data, which comes from our academic medical center and health system partnerships. Our federated data platform, nSights, enables insights from data that encapsulates the collective wisdom and experience of millions of scientists and physicians worldwide. Rather than sending the data for users to store and access on their own platforms – which increases the risk of data leaks – the data remains safely behind firewalls at the source.

Even though the data is fully de-identified, it remains accessible only on nSights, deployed within the cloud of our health system partners. The data will never leave that environment. As a result, we’re able to refresh the data in real time within the platform, which facilitates analysis that depends on more recent patient encounters, such as investigating the safety and efficacy of newly approved drugs in the real world.

Dedication to Privacy in the Age of AI

Through the nSights federated data platform, users of all backgrounds can leverage the de-identified data from our health system partners. Users without technical coding experience can still build and characterize complex patient cohorts based on any component of the underlying clinical data (including information from clinical notes, ECG and echo parameters, imaging DICOM headers, radiological findings, and more).

Users with data science and other technical skills can use our code-first analytics workspaces. They can build predictive AI models and query and analyze data directly using Python, R, SQL, and other languages, while provisioning the computing resources their work requires (including GPUs) on demand through our scalable computing infrastructure.

As we reimagine biomedicine in the age of AI, nference is dedicated to putting patient privacy first. Our goal is to provide the most comprehensive biological and clinical insights possible – and this means ensuring that all types of healthcare data are de-identified consistently across the platform.

Learn more about how nference built a best-in-class de-identification tool for EMRs through ensemble learning.
