About This Webinar

Healthcare and life science organizations are increasingly working with large-scale, multimodal datasets that include structured records, clinical notes, diagnostic images, and PDF documents.
Sharing this data for research and AI development requires rigorous de-identification that protects patient privacy without compromising the ability to extract insights across time and modalities.

In this webinar, experts from John Snow Labs and Databricks will demonstrate an end-to-end solution for automating the de-identification and tokenization of medical data with regulatory-grade accuracy. You’ll learn how to:

- Automatically de-identify structured data, unstructured text, DICOM & JPEG images, whole-slide pathology images (SVS), and PDFs using John Snow Labs’ industry-leading software and AI models
- Apply patient tokenization to enable linking of de-identified data across modalities and time points
- Use Databricks to process and scale these capabilities across large, real-world datasets
- Support HIPAA, GDPR, and other regulatory requirements for privacy-preserving research

This session is ideal for data scientists, clinical researchers, compliance teams, and healthcare IT leaders working with multimodal patient data who want to enable longitudinal, privacy-compliant research at scale.

When: Wednesday, July 16, 2025 · 2:00 p.m. · Eastern Time (US & Canada)
Duration: 1 hour
Language: English
Who can attend? Everyone
Dial-in available? (listen only): No
Featured Presenters
Solutions Architect, Databricks
Srikanth Kumar Rana is a seasoned Field Engineer at Databricks, bringing extensive experience in helping organizations unlock the full potential of data and AI. With a strong focus on empowering customers, Srikanth has consistently demonstrated expertise in complex deployments, driving adoption, and enabling businesses to achieve tangible outcomes on the Databricks Lakehouse platform.
Senior Data Scientist, Machine Learning Engineer, John Snow Labs
Youssef Mellah, Ph.D., is a Senior Data Scientist and Machine Learning Engineer at John Snow Labs with more than 8 years of experience in artificial intelligence, natural language processing, and deep learning. He specializes in building, training, and deploying regulatory-grade ML/DL models and large language models (LLMs) for healthcare and life sciences, including the de-identification and tokenization of multimodal medical data. Youssef has a strong track record of designing scalable, privacy-preserving AI solutions that enable compliant research and analytics across structured and unstructured data. He is passionate about advancing NLP technology, leading multidisciplinary teams, and transforming cutting-edge research into practical, real-world applications.
Hosted By
Data Science Salon webinar platform hosts Regulatory-Grade Multimodal Medical Data De-Identification and Tokenization
The DATA SCIENCE SALON is a unique vertical-focused data science conference that grew into a diverse community of senior data science, machine learning, and other technical specialists. We gather face-to-face and virtually to educate each other, illuminate best practices and innovate new solutions in a casual atmosphere.