Abstract: The chemical industry spends considerable cost and time developing a new compound with a targeted biological activity. On average, 10,000 candidates are prepared for each successful compound. Developers therefore need to discover initial candidates efficiently before actual synthesis, optimization, and evaluation. We developed a similarity-based chemical explainable AI (XAI) system that discovers probable compound spaces based on a mixture of multiple mutated exemplars and the bioassay existence ratio. Our system encodes 4.4k exemplars and 100M public-database compounds as vectors containing 41 features. Users input two sets of biologically active exemplars, each customized with differentiated features. Using vector distances, our XAI extracts compound spaces that are simultaneously similar to multiple customized exemplars and predicts their biological activity and target. The probability of each prediction is shown as the bioassay existence ratio, where a bioassay is the information on biological activity and target obtained from public databases or literature including a related specific text string. The basis of each prediction is explained by showing the biological activity and target of similar compounds included in the extracted spaces. The mixture of multiple mutated exemplars, together with the bioassay existence ratio shown as a probability alongside the basis of prediction, can help developers extract probable compound spaces having biological activity from unknown space. Extracting the spaces between two sets of 128 exemplars and 100M public-database compounds took 9 minutes on a single GPU when reading from HDD and 1.5 minutes when the data was held in memory. The bioassay existence ratio of the extracted spaces was 2 to 9 times higher than the public-database average. For 64 randomly selected compounds, the correlation coefficient between predicted and actual pIC50 (the intensity of biological activity) was 0.85. Our XAI discovered probable compound spaces within a large space at high speed and with high probability.
Authors: Takashi Isobe (Hitachi High Tech America, Inc., USA)
Email: takashi.isobe.sw@hitachi-hightech.com
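
Illustrative sketch (not the authors' implementation): the abstract describes extracting compounds that are simultaneously similar to two exemplar sets by vector distance, then reporting the bioassay existence ratio of the extracted space. A minimal Python version of that idea might look like the following. The Euclidean metric, the nearest-exemplar rule, the fixed `radius` threshold, and every function and variable name here are assumptions made for illustration; the abstract does not specify them.

```python
import math

N_FEATURES = 41  # feature count stated in the abstract


def min_dist(vec, exemplars):
    """Distance from one compound vector to its nearest exemplar.

    Euclidean distance is an assumption; the abstract only says
    "vectors' distances".
    """
    return min(math.dist(vec, e) for e in exemplars)


def extract_space(db, set_a, set_b, radius):
    """Indices of compounds within `radius` of BOTH exemplar sets.

    `radius` is a hypothetical similarity threshold.
    """
    return [
        i
        for i, vec in enumerate(db)
        if min_dist(vec, set_a) <= radius and min_dist(vec, set_b) <= radius
    ]


def existence_ratio(has_bioassay, indices):
    """Fraction of the selected compounds that have a bioassay record."""
    if not indices:
        return 0.0
    return sum(1 for i in indices if has_bioassay[i]) / len(indices)


def toy_vector(x):
    """Hypothetical 41-feature vector that varies only in its first feature."""
    return [float(x)] + [0.0] * (N_FEATURES - 1)


# Toy database of four compounds and one exemplar per set (made-up values).
db = [toy_vector(0), toy_vector(1), toy_vector(2), toy_vector(10)]
set_a = [toy_vector(0)]
set_b = [toy_vector(2)]
has_bioassay = [False, True, True, False]

extracted = extract_space(db, set_a, set_b, radius=1.5)
ratio_extracted = existence_ratio(has_bioassay, extracted)
ratio_baseline = existence_ratio(has_bioassay, list(range(len(db))))

print("extracted indices:", extracted)
print("existence ratio (extracted):", ratio_extracted)
print("existence ratio (whole database):", ratio_baseline)
```

In this toy data only the compound lying between the two exemplars is kept, and its existence ratio can be compared against the whole-database baseline in the same way the abstract compares extracted spaces with the public average. A brute-force scan like this would not scale to 100M compounds; the reported timings rely on GPU execution, which this sketch does not attempt to reproduce.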