Healthcare and life science organizations are increasingly working with large-scale, multimodal datasets that include structured records, clinical notes, diagnostic images, and PDF documents.
Sharing this data for research and AI development requires rigorous de-identification to ensure patient privacy — without compromising the ability to extract insights across time and modalities.
In this webinar, experts from John Snow Labs and Databricks will demonstrate an end-to-end solution for automating the de-identification and tokenization of medical data with regulatory-grade accuracy. You’ll learn how to:
- Automatically de-identify structured data, unstructured text, DICOM & JPEG images, whole-slide pathology images (SVS), and PDFs using John Snow Labs’ industry-leading software and AI models
- Apply patient tokenization to enable linking of de-identified data across modalities and time points
- Use Databricks to run these capabilities at scale on large, real-world datasets
- Support HIPAA, GDPR, and other regulatory requirements for privacy-preserving research
This session is ideal for data scientists, clinical researchers, compliance teams, and healthcare IT leaders working with multimodal patient data who want to enable longitudinal, privacy-compliant research at scale.