
(BigData 2020) Cross-Cancer Genome Analysis on Cancer Classification Using Both Unsupervised and Supervised Ap

About This Webinar

Abstract: Many problems exist in the current cancer diagnosis pipeline, one of which is the alarmingly high over-diagnosis rate in breast, prostate, and lung cancer. By quantifying gene expression levels, next-generation sequencing techniques such as RNA-Seq give researchers and clinicians a more complete view of a cell's transcriptome. With the adoption of this new data source, cross-cancer methods for cancer diagnosis have become more viable. We use mutual information in conjunction with a Gaussian mixture model and t-SNE to evaluate the separability of cancer and non-cancer tissue samples from RNA-Seq expression data. The Gaussian mixture and t-SNE combination produced clear clusters without supervision, suggesting that tissue samples can be separated algorithmically. We then use a collection of deep neural networks to classify tissue origin and status from each sample's gene expression, using genes chosen with the mutual information technique above. First, we select the top 500 candidate genes without considering redundancy in the predictive information those genes carry. We then apply Recursive Feature Elimination (RFE) to select 200 genes, thereby accounting for covariation. The top 500 genes perform only slightly better than the 200 genes selected by RFE, and the two approaches achieve similar performance overall, indicating that only a small subset of genes is needed to identify status and origin. This work indicates that RNA sequencing data is a useful tool for cross-cancer studies. Next steps include incorporating more non-cancer data from other datasets to reduce bias in model training.
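The unsupervised separability check described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes scikit-learn, assumes the genes are ranked by mutual information with the cancer/non-cancer label, embedded with t-SNE, and then clustered with a two-component Gaussian mixture, and it uses synthetic data in place of real RNA-Seq counts. The function names `top_genes_by_mi` and `gmm_tsne_separability`, the choice of 50 genes, the perplexity, and the adjusted Rand index as the agreement score are all hypothetical choices for illustration.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture


def top_genes_by_mi(X, y, k, seed=0):
    """Hypothetical helper: indices of the k genes whose expression has the
    highest estimated mutual information with the class label y."""
    mi = mutual_info_classif(X, y, random_state=seed)
    return np.argsort(mi)[::-1][:k]


def gmm_tsne_separability(X, y, k=50, seed=0):
    """Hypothetical helper: embed the top-k MI genes in 2-D with t-SNE,
    cluster the embedding with a two-component Gaussian mixture (the mixture
    never sees the labels), then score cluster/label agreement."""
    genes = top_genes_by_mi(X, y, k, seed)
    # perplexity must be smaller than the number of samples
    emb = TSNE(n_components=2, perplexity=20, random_state=seed).fit_transform(X[:, genes])
    clusters = GaussianMixture(n_components=2, random_state=seed).fit_predict(emb)
    # Labels are used only here, to check how well the clusters match status.
    return genes, adjusted_rand_score(y, clusters)


# Synthetic stand-in for log-scaled expression data (assumption): 200 samples
# x 1000 genes, where only the first 20 genes shift in "cancer" samples (y == 1).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 1000))
X[:, :20] += 3.0 * y[:, None]

genes, ari = gmm_tsne_separability(X, y)
print(f"adjusted Rand index between GMM clusters and labels: {ari:.2f}")
```

Note that the gene ranking itself uses the labels, so only the clustering step is unsupervised. An adjusted Rand index near 1 would indicate that the mixture components line up with cancer and non-cancer status.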
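The two supervised gene-selection strategies (top-N genes by mutual information versus RFE over those candidates) can be compared with a sketch like the one below. Several pieces are assumptions rather than the authors' setup: the sizes are scaled down from 500/200 to 50/20 genes, a small scikit-learn `MLPClassifier` stands in for the paper's deep neural networks, and RFE ranks genes with `LogisticRegression` coefficients because `MLPClassifier` exposes no `coef_` or `feature_importances_`. The synthetic labels encode origin and status jointly (three hypothetical tissues, each normal or tumor, giving six classes). The function name `compare_gene_sets` is hypothetical.

```python
import numpy as np
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_gene_sets(X, y, k_top=50, k_rfe=20, seed=0):
    """Hypothetical helper: test accuracy of an MLP trained on (a) the top
    k_top genes by mutual information and (b) k_rfe genes kept by RFE."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)

    # (a) Rank genes by MI on the training split only; redundancy is ignored.
    mi = mutual_info_classif(X_tr, y_tr, random_state=seed)
    top = np.argsort(mi)[::-1][:k_top]

    # (b) RFE over the MI candidates: repeatedly refit a linear model and drop
    # the 10% of genes with the smallest coefficients, so correlated genes
    # compete with each other instead of all being kept.
    rfe = RFE(LogisticRegression(max_iter=2000), n_features_to_select=k_rfe, step=0.1)
    rfe.fit(X_tr[:, top], y_tr)
    rfe_genes = top[rfe.support_]

    scores = {}
    for name, genes in (("top_mi", top), ("rfe", rfe_genes)):
        clf = make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=seed))
        clf.fit(X_tr[:, genes], y_tr)
        scores[name] = clf.score(X_te[:, genes], y_te)
    return scores, rfe_genes


# Synthetic data (assumption): 6 classes x 50 samples x 500 genes; each class
# up-regulates its own block of 5 genes, and the remaining genes are noise.
rng = np.random.default_rng(1)
n_classes, n_per_class, n_genes = 6, 50, 500
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.normal(size=(y.size, n_genes))
for c in range(n_classes):
    X[y == c, c * 5:(c + 1) * 5] += 2.5

scores, rfe_genes = compare_gene_sets(X, y)
print(scores)
```

Fitting a linear model inside RFE keeps the elimination loop cheap. The trade-off is that the genes it keeps reflect linear importance, which may differ from what a deep network would find most useful.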

Authors: Jonathan Zhou (Horace Greeley High School, USA); Baldwin Chen (Ardsley High School, USA); Nianjun Zhou (IBM, USA)

Email: jozhou@students.ccsd.ws, baldwinchen@gmail.com, jzhou@us.ibm.com

Who can view: Everyone
Webinar Price: Free
Featured Presenters
Jonathan Zhou is a senior at Horace Greeley High School who is interested in the intersection of oncology and data science.
Hosted By
Services Society